
# **Artificial Intelligence for Business**
Artificial intelligence is transforming the business world by automating tasks, improving decision-making, and driving innovation, with machine learning at the center of this shift. Machine learning algorithms analyze large volumes of data and generate predictions, and with Python and libraries such as Scikit-Learn, companies can develop and evaluate models efficiently. Utilities like `train_test_split` support model validation by holding out part of the data for testing, while practical applications extend to areas such as marketing, fraud detection, and supply chain management. Despite this potential, adopting AI requires investment in talent, infrastructure, and organizational culture, as well as rigorous attention to ethics, governance, and accountability, so that its use is transparent, fair, and beneficial to society as a whole.
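
To make the validation step concrete, here is a minimal sketch of a hold-out split with `train_test_split`. The data is synthetic and the names `X_demo` and `y_demo` are assumptions made for illustration; they are not part of the notebook's datasets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 100 rows, 3 features, 1 numeric target.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Hold out 20% of the rows for testing; random_state makes the split reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)
```

The model is fitted only on the training rows, so the score on the held-out rows estimates how it would behave on data it has not seen.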

In [2]:
# Importing all the necessary libraries:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import mean_squared_error, accuracy_score, classification_report

## **Example: Wine study**

In [4]:
# Loading the dataset from URL:
url = 'https://gist.githubusercontent.com/tijptjik/9408623/raw/b237fa5848349a14a14e5d4107dc7897c21951f5/wine.csv'
df = pd.read_csv(url)

# Displaying the DataFrame:
df.head()

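| | Wine | Alcohol | Malic.acid | Ash | Acl | Mg | Phenols | Flavanoids | Nonflavanoid.phenols | Proanth | Color.int | Hue | OD | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 2 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 4 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |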


In [5]:
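# Dividing the data between input (X) and output (y) variables.
# Note: X drops 'Flavanoids' but still contains the target column
# 'Nonflavanoid.phenols', so the model can read the answer directly from its
# inputs (target leakage). This explains the near-zero MSE reported below.
X = df.drop('Flavanoids', axis=1)
y = df['Nonflavanoid.phenols']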

In [6]:
# Dividing the data into training and test:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating the model:
# RandomForestRegressor is used because the target is a continuous value.
model = RandomForestRegressor(n_estimators=100, random_state=42)

# Training the model:
model.fit(X_train, y_train)

# Predictions of the test data:
y_pred = model.predict(X_test)

# Evaluating the model:
# Mean squared error is used because this is a regression task.
mse = mean_squared_error(y_test, y_pred)

# Printing the results:
print(f'Mean Squared Error: {mse}')

Mean Squared Error: 8.709166666667317e-06
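
An error this close to zero is consistent with the target column still being present among the input features, since `X` is built by dropping `'Flavanoids'` while `y` is `'Nonflavanoid.phenols'`. The sketch below illustrates the effect on a synthetic table; the column names `feature_a`, `feature_b`, `target` and the helper `holdout_mse` are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in table (assumption): the target depends only weakly on the features.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'feature_a': rng.normal(size=300),
    'feature_b': rng.normal(size=300),
})
toy['target'] = 0.5 * toy['feature_a'] + rng.normal(scale=1.0, size=300)


def holdout_mse(X, y):
    """Fit a random forest on 80% of the rows and return the test-set MSE."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
    forest = RandomForestRegressor(n_estimators=100, random_state=42)
    forest.fit(X_tr, y_tr)
    return mean_squared_error(y_te, forest.predict(X_te))


y_toy = toy['target']
mse_leaky = holdout_mse(toy, y_toy)                         # target left inside X
mse_clean = holdout_mse(toy.drop('target', axis=1), y_toy)  # target removed from X

print(f'MSE with target in X:    {mse_leaky:.4f}')
print(f'MSE with target removed: {mse_clean:.4f}')
```

In the wine cell, the equivalent change would be to build the features with `df.drop('Nonflavanoid.phenols', axis=1)`. The resulting error would differ from the value printed above and is not reproduced here.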


## **Example: Predicting Purchasing Behavior**
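A practical example of AI in business is predicting customer purchasing behavior in an e-commerce store: a machine learning algorithm analyzes purchasing data and predicts whether a customer will buy a particular product. The dataset loaded below is a sample of Amazon product listings rather than per-customer purchase records, so the cells that follow fit a regression model to the listed `initial_price` instead.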
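
For the buy-or-not framing described above, a classifier is the natural fit. The following is a minimal sketch on synthetic data. The columns `pages_viewed`, `minutes_on_site`, `past_purchases`, and `purchased`, as well as the labelling rule, are assumptions invented for illustration and do not come from any dataset used in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic customer data (assumption): none of these columns come from a real dataset.
rng = np.random.default_rng(42)
n = 500
customers = pd.DataFrame({
    'pages_viewed': rng.integers(1, 30, size=n),
    'minutes_on_site': rng.uniform(0.5, 60.0, size=n),
    'past_purchases': rng.integers(0, 10, size=n),
})

# Hypothetical labelling rule plus noise, used only so the example has a signal to learn.
score = (
    0.1 * customers['pages_viewed']
    + 0.4 * customers['past_purchases']
    + rng.normal(scale=1.0, size=n)
)
customers['purchased'] = (score > score.median()).astype(int)

# Features and binary target; stratify keeps the class balance in both splits.
X_c = customers.drop('purchased', axis=1)
y_c = customers['purchased']
X_tr, X_te, y_tr, y_te = train_test_split(
    X_c, y_c, test_size=0.2, random_state=42, stratify=y_c
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

acc = accuracy_score(y_te, y_hat)
print(f'Accuracy: {acc:.2f}')
print(classification_report(y_te, y_hat))
```

With a binary target, `accuracy_score` and `classification_report` are appropriate metrics, whereas the regression cells in this notebook report mean squared error.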

In [7]:
url = 'https://raw.githubusercontent.com/luminati-io/eCommerce-dataset-samples/refs/heads/main/amazon-products.csv'

# Loading the dataset:
df = pd.read_csv(url)
df.head()

| | timestamp | title | seller_name | brand | description | initial_price | final_price | currency | availability | reviews_count | ... | root_bs_category | bs_category | bs_rank | badge | subcategory_rank | amazon_choice | images | product_details | prices_breakdown | country_of_origin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-08-08 00:00:00.000 | Saucony Men's Kinvara 13 Running Shoe | Orv███tor███ | Saucony | When it comes to lightweight speed, nothing cr... | | "57.79" | USD | In Stock | 702 | ... | | | | | | | | | | |
| 1 | 2023-08-09 00:00:00.000 | Kishigo Premium Black Series Heavy Duty Unisex... | Ama███.co███ | Kishigo | The Kishigo Premium Black Series Heavy Duty Ve... | | "28.5" | USD | In Stock | 916 | ... | | | | | | | | | | |
| 2 | 2024-02-04 00:00:00.000 | TWINSLUXES Solar Post Cap Lights Outdoor - Wat... | Twi███uxe███ | TWINSLUXES | Solar Post Cap Lights Waterproof LED Fence Pos... | "49.99" | "33.99" | USD | In Stock | 3178 | ... | | | | | | | | | | |
| 3 | 2024-06-09 00:00:00.000 | Accutire MS-4021B Digital Tire Pressure Gauge ... | Cit███ran███Dir██████ | Accutire | About this item Heavy duty construction and ru... | 1.795000000000000e+01 | 1.795000000000000e+01 | USD | In Stock | 8034 | ... | Automotive | Tire Repair Tools | 50.0 | | [{"subcategory_name":"Automotive","subcategory... | False | | | | |
| 4 | 2024-01-16 00:00:00.000 | SAURA LIFE SCIENCE Adivasi Ayurvedic Neelgiri ... | PRA███ EN███PRI███ | SAURA LIFE SCIENCE | This extraordinary fusion is designed to nouri... | "1299" | "799" | INR | In stock | 5 | ... | | | | | | | | | | |


In [10]:
# Dividing the data into input (X) and output (y) variables:
# Convert 'initial_price' and 'final_price' to numeric, removing quotes and coercing errors to NaN
df['initial_price'] = pd.to_numeric(df['initial_price'].astype(str).str.replace('"', ''), errors='coerce')
df['final_price'] = pd.to_numeric(df['final_price'].astype(str).str.replace('"', ''), errors='coerce')

# Drop rows where the target variable 'initial_price' is NaN
# This ensures a clean target for training
df_cleaned = df.dropna(subset=['initial_price']).copy()

# Define features (X) and target (y) from the cleaned DataFrame
y = df_cleaned['initial_price']

# Select only numerical features from X. This automatically handles dropping non-numeric columns like 'timestamp', 'title', etc.
X = df_cleaned.drop('initial_price', axis=1).select_dtypes(include=['number'])

# Impute missing values in the numerical features of X with 0 (or another strategy like mean/median).
# This is important for models that don't handle NaNs directly.
X = X.fillna(0)

# Splitting the data into training and testing sets:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating the model:
# 'initial_price' is a continuous variable, so RandomForestRegressor is more appropriate than RandomForestClassifier.
model = RandomForestRegressor(n_estimators=100, random_state=42)

# Training the model
model.fit(X_train, y_train)

# Predictions on the test set
y_pred = model.predict(X_test)

# Evaluating the model:
# For regression, Mean Squared Error (MSE) is a common metric.
mse = mean_squared_error(y_test, y_pred)

# Printing the results:
print(f'Mean Squared Error: {mse}')
# Accuracy and Classification Report are not suitable for regression tasks.
# Other regression metrics like R2 score, MAE could also be computed.

Mean Squared Error: 4898.552087092272
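
MSE is expressed in squared units of the target, which makes it hard to read directly. Taking the square root of the value above gives an RMSE of roughly 70 price units, and because the sample rows include both USD and INR prices, that figure mixes currencies. The sketch below shows how RMSE, MAE, and R² could be computed alongside MSE. The arrays `y_true_demo` and `y_pred_demo` are illustrative values, not results from the notebook; in the cell above they would be `y_test` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative values only (assumption); in the notebook these would be y_test and y_pred.
y_true_demo = np.array([10.0, 20.0, 30.0, 40.0])
y_pred_demo = np.array([12.0, 18.0, 33.0, 37.0])

mse_demo = mean_squared_error(y_true_demo, y_pred_demo)
rmse_demo = np.sqrt(mse_demo)  # same units as the target, e.g. a price
mae_demo = mean_absolute_error(y_true_demo, y_pred_demo)
r2_demo = r2_score(y_true_demo, y_pred_demo)  # share of target variance explained

print(f'MSE:  {mse_demo:.3f}')
print(f'RMSE: {rmse_demo:.3f}')
print(f'MAE:  {mae_demo:.3f}')
print(f'R2:   {r2_demo:.3f}')
```

RMSE and MAE are easier to compare against typical prices, while R² indicates how much better the model does than always predicting the mean.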
