# Coffee Shop Revenue Prediction with Random Forest Regressor

In this notebook, we'll build a machine learning model that predicts a coffee shop's daily revenue (`Daily_Revenue`) from six operational features, using a Random Forest Regressor tuned with a grid search. We'll follow these steps:

1. Data Loading and EDA
2. Feature Engineering
3. Model Training with Hyperparameter Tuning
4. Model Evaluation
5. Saving the Model

In [36]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

## 1. Data Loading and EDA

In [37]:
df = pd.read_csv('coffee_shop_revenue.csv')

In [38]:
df.shape

(2000, 7)

In [39]:
df.head()

| | Number_of_Customers_Per_Day | Average_Order_Value | Operating_Hours_Per_Day | Number_of_Employees | Marketing_Spend_Per_Day | Location_Foot_Traffic | Daily_Revenue |
|---|---|---|---|---|---|---|---|
| 0 | 152 | 6.74 | 14 | 4 | 106.62 | 97 | 1547.81 |
| 1 | 485 | 4.5 | 12 | 8 | 57.83 | 744 | 2084.68 |
| 2 | 398 | 9.09 | 6 | 6 | 91.76 | 636 | 3118.39 |
| 3 | 320 | 8.48 | 17 | 4 | 462.63 | 770 | 2912.2 |
| 4 | 156 | 7.44 | 17 | 2 | 412.52 | 232 | 1663.42 |


In [40]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 7 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Number_of_Customers_Per_Day  2000 non-null   int64  
 1   Average_Order_Value          2000 non-null   float64
 2   Operating_Hours_Per_Day      2000 non-null   int64  
 3   Number_of_Employees          2000 non-null   int64  
 4   Marketing_Spend_Per_Day      2000 non-null   float64
 5   Location_Foot_Traffic        2000 non-null   int64  
 6   Daily_Revenue                2000 non-null   float64
dtypes: float64(3), int64(4)
memory usage: 109.5 KB


In [41]:
df.isnull().sum()

| Column | Missing values |
|---|---|
| Number_of_Customers_Per_Day | 0 |
| Average_Order_Value | 0 |
| Operating_Hours_Per_Day | 0 |
| Number_of_Employees | 0 |
| Marketing_Spend_Per_Day | 0 |
| Location_Foot_Traffic | 0 |
| Daily_Revenue | 0 |


In [42]:
df.duplicated().sum()

np.int64(0)
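
With no missing values and no duplicate rows, a natural next EDA step is to check how strongly each feature tracks the target. The sketch below is one way to do that; the helper name `correlation_with_target` is hypothetical, and the demo frame is synthetic stand-in data (not the real `coffee_shop_revenue.csv`), so the printed numbers say nothing about the actual dataset.

```python
import numpy as np
import pandas as pd


def correlation_with_target(frame: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each numeric feature with the target, strongest first."""
    corr = frame.select_dtypes("number").corr()[target].drop(target)
    # Order by absolute strength so strong negative correlations are not buried.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Synthetic stand-in data (assumption: values are invented for illustration only).
rng = np.random.default_rng(0)
customers = rng.integers(50, 500, size=200)
order_value = rng.uniform(2.5, 10.0, size=200)
demo = pd.DataFrame({
    "Number_of_Customers_Per_Day": customers,
    "Average_Order_Value": order_value,
    "Number_of_Employees": rng.integers(2, 15, size=200),
    "Daily_Revenue": customers * order_value,
})

print(correlation_with_target(demo, "Daily_Revenue"))
```

On the notebook's own data, the equivalent call would be `correlation_with_target(df, 'Daily_Revenue')`.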

## 2. Feature Engineering

In [43]:
X = df.drop('Daily_Revenue', axis=1)
y = df['Daily_Revenue']

In [44]:
X.shape

(2000, 6)

In [45]:
y.shape

(2000,)

## 3. Model Training with Hyperparameter Tuning

In [46]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

In [47]:
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()

In [48]:
from sklearn.model_selection import GridSearchCV
param_grid = {
    'n_estimators': [100,200],
    'max_depth': [10,20,None],
    'min_samples_split': [2,5],
    'min_samples_leaf': [1,2]
}

In [49]:
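# Exhaustive search over param_grid; with no cv argument, GridSearchCV uses 5-fold cross-validation.
ge_search = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    n_jobs=-1,   # use all available CPU cores
    verbose=1    # print a one-line summary of the planned fits
)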

In [50]:
ge_search.fit(X_train, y_train)

Fitting 5 folds for each of 24 candidates, totalling 120 fits
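
The 24 candidates in the log are the 2 × 3 × 2 × 2 combinations in `param_grid`, each scored with the default 5-fold cross-validation. After fitting, the winning combination and its mean cross-validated score are available on the search object. The following is a minimal sketch of how they could be read out; `summarize_search` is a hypothetical helper, and the demo uses a tiny synthetic problem with a reduced grid rather than the notebook's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def summarize_search(search: GridSearchCV) -> dict:
    """Collect the best hyperparameters and their mean cross-validated score."""
    return {
        "best_params": search.best_params_,
        # For a regressor with default scoring, this is the mean R^2 across folds.
        "best_cv_score": float(search.best_score_),
    }


# Tiny synthetic regression problem so the sketch runs on its own (assumption).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(120, 3))
y_demo = 3.0 * X_demo[:, 0] + X_demo[:, 1] + rng.normal(0, 0.05, size=120)

demo_search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_depth": [3, None]},
    cv=5,
)
demo_search.fit(X_demo, y_demo)

print(summarize_search(demo_search))
```

In this notebook the same call would be `summarize_search(ge_search)`. Because `refit=True` is the default, `ge_search.predict` uses the best estimator refit on the full training set.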


In [51]:
y_pred = ge_search.predict(X_test)

## 4. Model Evaluation

In [52]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

0.9477723207775639
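
R² alone does not show how large the errors are in revenue units. As a complement, the sketch below reports mean absolute error and root mean squared error alongside R²; `regression_report` is a hypothetical helper and the three-value inputs are made up purely to show the arithmetic.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred) -> dict:
    """Return R^2, MAE, and RMSE for a set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "r2": float(r2_score(y_true, y_pred)),
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "rmse": float(np.sqrt(mse)),  # same units as the target
    }


# Made-up values for illustration; errors are +10, -10, and 0.
report = regression_report([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
print(report)
```

For the model above, the call would be `regression_report(y_test, y_pred)`.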

In [53]:
import pickle as pk

In [56]:
with open('model.pkl', 'wb') as fs:
    pk.dump(ge_search, fs)

## 5. Save the Trained Model
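
A saved model is only useful if it can be loaded back. The sketch below shows a pickle round trip under stated assumptions: `save_and_reload` is a hypothetical helper, and a plain dictionary stands in for the fitted search object so the example runs without training anything. It writes to a temporary directory so it does not touch `model.pkl`.

```python
import os
import pickle
import tempfile


def save_and_reload(obj, path: str):
    """Pickle obj to path, then load it back and return the restored copy."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in object (assumption); in the notebook this would be the fitted search.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_model.pkl")
    restored = save_and_reload({"n_estimators": 100, "max_depth": None}, demo_path)

print(restored)
```

To reuse the notebook's model later, open `model.pkl` in `'rb'` mode, call `pickle.load`, and then call `.predict` on a frame with the same six feature columns. Only unpickle files from a trusted source, and ideally load with the same scikit-learn version that was used for training.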