# Restaurant Occupancy Prediction Project

#### Develop and deploy a web-based ML application using random forest regression and the Flask framework
This notebook applies `RandomForestRegressor` to build a model that predicts the number of customers in a restaurant one hour in the future. The final model is then integrated into a Flask web application for restaurant managers and chefs to use.

## Project Overview
#### Machine Learning Development
The purpose of this project is to build a machine learning model that predicts the number of patrons (as a whole number of persons) who will be present in a restaurant one hour after the query time. The model is intended for use by the cafeteria’s head chef. It is trained on the following features, collected every 10 minutes around the clock for six months:
- The current day (a simple ordinal-encoded categorical variable with values 0-4 with 0 representing Monday, 1 representing Tuesday, 2 representing Wednesday, 3 representing Thursday, and 4 representing Friday)
- The current time (a float giving the hour of day on a 24-hour clock, e.g. 0.166667 for 00:10)
- The current cafeteria occupancy (1 hour before predicted time)
- The cafeteria occupancy at the same time from the previous day (24 hours before predicted time)
- The cafeteria occupancy at the same time from the previous week (1 week before predicted time)
- The cafeteria occupancy at the same time from the previous month (1 month before predicted time)

The model predicts the following response:
- A restaurant's occupancy one hour from the query time.
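
The lagged features and the target described above could be derived from a raw occupancy series roughly as follows. This is a minimal sketch, not the preprocessing used to produce `cleanTrain.csv`: the function `build_features`, the synthetic series, and the choice of 28 days for "one month" are all assumptions made for illustration. It also assumes a gap-free series with one reading every 10 minutes and keeps weekends, whereas the dataset only contains days 0-4.

```python
import numpy as np
import pandas as pd

STEPS_PER_HOUR = 6                    # one reading every 10 minutes
STEPS_PER_DAY = 24 * STEPS_PER_HOUR   # 144 readings per day
HORIZON = STEPS_PER_HOUR              # predict one hour ahead


def build_features(occ: pd.Series, month_days: int = 28) -> pd.DataFrame:
    """Hypothetical helper: build the six features and the target.

    `occ` is occupancy indexed by a gap-free 10-minute DatetimeIndex.
    Lags are measured back from the predicted time (query time + 1 hour),
    so each lag is shortened by HORIZON steps. `month_days=28` is an
    assumption; the source does not define "one month".
    """
    idx = occ.index
    feats = pd.DataFrame(
        {
            "day": idx.dayofweek.to_numpy(),                       # 0 = Monday
            "curr_time": (idx.hour + idx.minute / 60).to_numpy(),  # hour as float
            "curr_occ": occ.to_numpy(),
            "occ_24": occ.shift(STEPS_PER_DAY - HORIZON).to_numpy(),
            "occ_week": occ.shift(7 * STEPS_PER_DAY - HORIZON).to_numpy(),
            "occ_month": occ.shift(month_days * STEPS_PER_DAY - HORIZON).to_numpy(),
            "target": occ.shift(-HORIZON).to_numpy(),              # occupancy in 1 hour
        },
        index=idx,
    )
    # Rows without a full history or without a future value are dropped.
    return feats.dropna()


# Synthetic 40-day series (illustration only): the value equals the row position.
idx = pd.date_range("2024-01-01", periods=40 * STEPS_PER_DAY, freq="10min")
occ = pd.Series(np.arange(len(idx), dtype=float), index=idx)
feats = build_features(occ)
print(feats.head())
```

Because the synthetic values are row positions, the gap between `target` and `occ_24` in any row is exactly 144 steps, i.e. 24 hours, which is a quick way to check that the lags are aligned to the predicted time rather than the query time.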

#### Machine Learning Model Deployment with Flask

To enhance usability, the model is integrated into a web application developed with Flask, a micro web framework for deploying web apps.

### About the data

The dataset consists of six months of cleaned restaurant occupancy data in `cleanTrain.csv` and `cleanTest.csv`.

### 1.1 Machine Learning Model Development 

### Import libraries 

In [12]:
# common imports
import numpy as np
import pandas as pd
import pickle

### Load Data 
Start by loading the dataset and shaping it so that it's suitable for use in machine learning. 

In [4]:
cleanTrain = pd.read_csv('Data/cleanTrain.csv', index_col=0)
cleanTest = pd.read_csv('Data/cleanTest.csv', index_col=0)

cleanTrain

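| Unnamed: 0 | day | curr_time | curr_occ | occ_24 | occ_week | occ_month | target |
|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.166667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.333333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.500000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.666667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 7861 | 2.0 | 22.166667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7862 | 2.0 | 22.333333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7863 | 2.0 | 22.500000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7864 | 2.0 | 22.666667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |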


#### Train/test split
The data is already split into a training file and a test file, so this step only separates the feature columns from the target column in each.

In [5]:
# Obtain X_train and y_train from clean data set
X_train = cleanTrain[['day', 'curr_time', 'curr_occ', 'occ_24', 'occ_week', 'occ_month']]
y_train = cleanTrain[['target']]

# Obtain X_test and y_test from clean data set
X_test = cleanTest[['day', 'curr_time', 'curr_occ', 'occ_24', 'occ_week', 'occ_month']]
y_test = cleanTest[['target']]

#### Hyperparameter tuning 
Train a `RandomForestRegressor` using three rounds of grid search, each narrowing the grid around the previous best result, to identify the optimal hyperparameter values.

In [6]:
# Train ML Models
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
# -----
# Coarse-Grained RandomForestRegressor GridSearch
# -----
param_grid = {'max_depth': [2,4,8,16,32],
              'n_estimators': [1,5,10,30,90,150,300],
              'min_samples_split':[2,4,6,8]
             }
grid_search_cv_rf = GridSearchCV(RandomForestRegressor(random_state=42),
                              param_grid, verbose=10, cv=3)
grid_search_cv_rf.fit(X_train, np.array(y_train).flatten())
print("The best parameters are: ", grid_search_cv_rf.best_params_)

Fitting 3 folds for each of 140 candidates, totalling 420 fits
[CV 1/3; 1/140] START max_depth=2, min_samples_split=2, n_estimators=1..........
[CV 1/3; 1/140] END max_depth=2, min_samples_split=2, n_estimators=1;, score=0.726 total time=   0.0s
[CV 2/3; 1/140] START max_depth=2, min_samples_split=2, n_estimators=1..........
[CV 2/3; 1/140] END max_depth=2, min_samples_split=2, n_estimators=1;, score=0.738 total time=   0.0s
[CV 3/3; 1/140] START max_depth=2, min_samples_split=2, n_estimators=1..........
[CV 3/3; 1/140] END max_depth=2, min_samples_split=2, n_estimators=1;, score=0.706 total time=   0.0s
[CV 1/3; 2/140] START max_depth=2, min_samples_split=2, n_estimators=5..........
[CV 1/3; 2/140] END max_depth=2, min_samples_split=2, n_estimators=5;, score=0.755 total time=   0.0s
[CV 2/3; 2/140] START max_depth=2, min_samples_split=2, n_estimators=5..........
[CV 2/3; 2/140] END max_depth=2, min_samples_split=2, n_estimators=5;, score=0.783 total time=   0.0s
[CV 3/3; 2/140] START 

In [7]:
# -----
# More Refined RandomForestRegressor GridSearch
# -----
param_grid = {'max_depth': [8, 12, 16, 22, 30],
              'n_estimators': [200,300,400],
              'min_samples_split':[2, 3]
             }
grid_search_cv_rf = GridSearchCV(RandomForestRegressor(random_state=42),
                              param_grid, verbose=10, cv=3)
grid_search_cv_rf.fit(X_train, np.array(y_train).flatten())
print("The best parameters are: ", grid_search_cv_rf.best_params_)

Fitting 3 folds for each of 30 candidates, totalling 90 fits
[CV 1/3; 1/30] START max_depth=8, min_samples_split=2, n_estimators=200.........
[CV 1/3; 1/30] END max_depth=8, min_samples_split=2, n_estimators=200;, score=0.896 total time=   3.3s
[CV 2/3; 1/30] START max_depth=8, min_samples_split=2, n_estimators=200.........
[CV 2/3; 1/30] END max_depth=8, min_samples_split=2, n_estimators=200;, score=0.908 total time=   3.3s
[CV 3/3; 1/30] START max_depth=8, min_samples_split=2, n_estimators=200.........
[CV 3/3; 1/30] END max_depth=8, min_samples_split=2, n_estimators=200;, score=0.879 total time=   3.4s
[CV 1/3; 2/30] START max_depth=8, min_samples_split=2, n_estimators=300.........
[CV 1/3; 2/30] END max_depth=8, min_samples_split=2, n_estimators=300;, score=0.896 total time=   4.9s
[CV 2/3; 2/30] START max_depth=8, min_samples_split=2, n_estimators=300.........
[CV 2/3; 2/30] END max_depth=8, min_samples_split=2, n_estimators=300;, score=0.909 total time=   4.9s
[CV 3/3; 2/30] STAR

In [8]:
# -----
# Final Refined RandomForestRegressor GridSearch
# -----
param_grid = {'max_depth': [16],
              'n_estimators': [250, 300, 350],
              'min_samples_split':[2, 3, 4]
             }
grid_search_cv_rf = GridSearchCV(RandomForestRegressor(random_state=42),
                              param_grid, verbose=10, cv=3)
grid_search_cv_rf.fit(X_train, np.array(y_train).flatten())
print("The best parameters are: ", grid_search_cv_rf.best_params_)

Fitting 3 folds for each of 9 candidates, totalling 27 fits
[CV 1/3; 1/9] START max_depth=16, min_samples_split=2, n_estimators=250.........
[CV 1/3; 1/9] END max_depth=16, min_samples_split=2, n_estimators=250;, score=0.904 total time=  10.6s
[CV 2/3; 1/9] START max_depth=16, min_samples_split=2, n_estimators=250.........
[CV 2/3; 1/9] END max_depth=16, min_samples_split=2, n_estimators=250;, score=0.916 total time=  12.9s
[CV 3/3; 1/9] START max_depth=16, min_samples_split=2, n_estimators=250.........
[CV 3/3; 1/9] END max_depth=16, min_samples_split=2, n_estimators=250;, score=0.894 total time=  10.6s
[CV 1/3; 2/9] START max_depth=16, min_samples_split=2, n_estimators=300.........
[CV 1/3; 2/9] END max_depth=16, min_samples_split=2, n_estimators=300;, score=0.904 total time=  12.5s
[CV 2/3; 2/9] START max_depth=16, min_samples_split=2, n_estimators=300.........
[CV 2/3; 2/9] END max_depth=16, min_samples_split=2, n_estimators=300;, score=0.916 total time=  12.6s
[CV 3/3; 2/9] START 
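
Each round above narrows the grid around the previous best result. Rather than reading the verbose log, the fold scores can be tabulated from `cv_results_` to decide where to refine next. The sketch below is illustrative only: it uses a small synthetic dataset from `make_regression` and a tiny grid so that it runs quickly, and the names `search`, `summary`, and `best_model` are not part of the original notebook. With no `scoring` argument, `GridSearchCV` uses the estimator's own `score` method, which for regressors is R², so the `score=` values in the logs above are R² values.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train / y_train (illustration only).
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=42)

# Deliberately tiny grid so the example runs in a few seconds.
param_grid = {"max_depth": [2, 8], "n_estimators": [5, 20]}
search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid, cv=3)
search.fit(X, y)

# One row per candidate, best candidate first.
results = pd.DataFrame(search.cv_results_)
cols = ["param_max_depth", "param_n_estimators",
        "mean_test_score", "std_test_score", "rank_test_score"]
summary = results[cols].sort_values("rank_test_score")
print(summary)

# With the default refit=True, the best candidate is already refit
# on the full training data and available as best_estimator_.
best_model = search.best_estimator_
print(search.best_params_)
```

Since `refit=True` is the default, `best_estimator_` from the last round could be used directly instead of constructing a new `RandomForestRegressor` from `best_params_`; both routes train the same configuration on the full training set.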

#### Final Model training and evaluation 

In [9]:
import math

# Create final model with optimal hyperparameter values
from sklearn.metrics import mean_squared_error
finalModel = RandomForestRegressor(random_state=42, max_depth=grid_search_cv_rf.best_params_['max_depth'],
                                  n_estimators=grid_search_cv_rf.best_params_['n_estimators'],
                                  min_samples_split=grid_search_cv_rf.best_params_['min_samples_split'])

# Train model
finalModel.fit(X_train, np.array(y_train).flatten())

# Get final model predictions
y_preds = finalModel.predict(X_test)

# Calculate the root mean squared error (RMSE) on the test set
gen_err = math.sqrt(round(mean_squared_error(y_test, y_preds),4))
print("Error: ", gen_err)

Error:  5.471032809260058


#### Model Interpretation 

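The final model was trained with the best hyperparameter values from the grid search and then evaluated on the test set using the root mean squared error (RMSE), which is what the square root of `mean_squared_error` yields. The model had an RMSE of 5.47, meaning its predictions are typically off by about 5 persons. Because RMSE weights large errors more heavily, individual predictions can be off by more.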
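
The evaluation cell above takes the square root of the mean squared error, which is the RMSE rather than the MAE. The two metrics generally differ on the same predictions, as this small worked example shows. The numbers are made up for illustration and are not taken from the test set.

```python
import math

from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up occupancy values (illustration only).
y_true = [10, 20, 30, 40]
y_pred = [12, 18, 33, 35]   # errors: +2, -2, +3, -5

# MAE averages the absolute errors: (2 + 2 + 3 + 5) / 4
mae = mean_absolute_error(y_true, y_pred)

# RMSE is the square root of the mean squared error: sqrt((4 + 4 + 9 + 25) / 4)
rmse = math.sqrt(mean_squared_error(y_true, y_pred))

print(f"MAE:  {mae:.2f}")    # MAE:  3.00
print(f"RMSE: {rmse:.2f}")   # RMSE: 3.24
```

RMSE is never smaller than MAE, and the gap widens when a few predictions are far off, so reporting both gives a sense of how uneven the errors are.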

#### Export and test final model 

Export the trained model to a pickle file called `finalModel.pkl`.

In [10]:
# Use a context manager so the file is flushed and closed after writing
with open('finalModel.pkl', 'wb') as f:
    pickle.dump(finalModel, f)

Test the pickle file.

In [11]:
# Create Data Frame
data = pd.DataFrame({'day': [1.0], 'curr_time': [8.0],
     'curr_occ': [50.0], 'occ_24': [48.0], 'occ_week': [52.0],
     'occ_month': [54.0]})

# open the file and load the trained model (the file is closed automatically)
with open("finalModel.pkl", "rb") as file:
    trained_model = pickle.load(file)

# predict
prediction = trained_model.predict(data)
prediction

print("The model prediction states that", round(prediction[0]), "people will be present in the restaurant in 1 hour.")

The model prediction states that 29 people will be present in the restaurant in 1 hour.


### 1.2 Restaurant Web-Based Application Development
To enable restaurant managers and chefs to make predictions, the model was incorporated into a web application written in Python and served with the Flask framework.
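
The general pattern of serving a pickled model from Flask can be sketched as below. This is not the project's `application.py`, which renders HTML templates; the `create_app` factory, the `/predict` JSON route, the response field name, and the `ConstantModel` stand-in are all hypothetical and exist only to keep the example self-contained. Only the six feature names come from the notebook.

```python
import pandas as pd
from flask import Flask, jsonify, request

# Feature columns in the order used for training in this notebook.
FEATURES = ["day", "curr_time", "curr_occ", "occ_24", "occ_week", "occ_month"]


def create_app(model):
    """Hypothetical factory: wrap any object with a .predict() method."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        try:
            # Build a one-row frame with named columns, as in the pickle test above.
            row = {name: [float(payload[name])] for name in FEATURES}
        except (KeyError, TypeError, ValueError):
            return jsonify(error=f"expected numeric fields: {FEATURES}"), 400
        prediction = model.predict(pd.DataFrame(row))[0]
        # Report whole persons, matching the notebook's rounding.
        return jsonify(predicted_occupancy=round(float(prediction)))

    return app


class ConstantModel:
    """Stand-in for the unpickled model (illustration only)."""

    def predict(self, X):
        return [29.4] * len(X)


app = create_app(ConstantModel())
client = app.test_client()   # exercises the route without starting a server
response = client.post("/predict", json={
    "day": 1, "curr_time": 8.0, "curr_occ": 50,
    "occ_24": 48, "occ_week": 52, "occ_month": 54,
})
print(response.get_json())
```

In a real deployment the stand-in would be replaced by the object loaded from `finalModel.pkl` with `pickle.load`, and the model would be loaded once at startup rather than on every request.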

### 1.3 Localhost Deployment with Flask

Follow the steps below to deploy the ML model using Flask:
- Locate the `restaurant_application` folder included in the restaurant occupancy prediction project. Replace the existing `finalModel.pkl` file with your own pickle file, also named `finalModel.pkl`.
- In your terminal/command prompt, navigate to the `restaurant_application` directory.
- Follow [these instructions](https://flask.palletsprojects.com/en/2.0.x/installation/#:~:text=Virtual%20environments) to create a virtual environment within the directory and to activate the environment.
    - Note that you will only need to create a virtual environment once, but you will activate your virtual environment every time you work on or deploy the web application.
- Install the dependencies listed in `requirements.txt` within your virtual environment.
    - Note that you will only need to install the dependencies once.
- Check to be sure that your directory includes the following:
    - `application.py`
    - `finalModel.pkl`
    - `static`
        - `monday_curve_noisy.png`
        - `monday_curve.png`
        - `PredictionCurve.png`
        - `style.css` (webpage styling)
    - `templates`
        - `about.html` (html for 'about' view)
        - `base.html` (base html template for others to extend)
        - `cafeOccupancyPredictor.html` (html for 'project' view)
        - `predict.html` (html for prediction view)
    - `venv` (the virtual environment, needed to run on localhost)
    - `requirements.txt` (used once to install the dependencies; not needed at run time for local deployment)
- Execute the following commands in your terminal/command prompt in order to run the application (see [Run the Application](https://flask.palletsprojects.com/en/2.0.x/tutorial/factory/) section for help):
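    - `export FLASK_ENV=development` on macOS/Linux, or `set FLASK_ENV=development` on Windows (Command Prompt)
        - Note that `FLASK_ENV` applies to the Flask 2.0.x line linked here; it was deprecated in Flask 2.2 and removed in 2.3, where `flask --debug run` is used instead.
    - `export FLASK_APP=application` on macOS/Linux, or `set FLASK_APP=application` on Windows (Command Prompt)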
    - `flask run`
- Visit the URL returned upon running the flask web application to see your predictive model in action.