# Group 1 Advanced Python Group Project Explanation Notebook
This document explains, step by step, how we developed the first part of the Group Project: Flask. Each section of the notebook gives a short explanation followed by the corresponding code.

GitHub Repo: https://github.com/felixhommels/mcsbt-adv-python-gp

PythonAnywhere Link: flizerflix.pythonanywhere.com

### Developing the Machine Learning Model

After having found the dataset, the first step was to import all the libraries used for the model. 

In [1]:
#From model.py
import os
import pickle

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

- Since we are building a machine learning model, we first loaded the data and dropped the columns that add no value to the model, in this case the Order_ID.
- To make sure that we do not train on incomplete data, we also dropped all rows with missing values.
- The dataset contains categorical variables and the aim was to use a linear regression, so we used one-hot encoding ("dummies") to turn them into numerical indicator columns that the Linear Regression can work with.
- Next we split the data into the features X and the target Y (Delivery_Time_min).
- Lastly, we used scikit-learn to split the data into training and testing sets (80/20, with a fixed random_state so the split is reproducible).



In [2]:
#From model.py

data = pd.read_csv("data/Food_Delivery_times.csv")

data.drop(columns=["Order_ID"], inplace=True)
data.dropna(inplace=True)

#We have some categorical variables - for linear regression we need to convert them to numerical variables
data = pd.get_dummies(data)

X = data.drop(columns=["Delivery_Time_min"])
Y = data["Delivery_Time_min"]

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=42)
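
To make the encoding step more concrete, here is a minimal sketch on a made-up three-row frame. The values are invented for illustration and are not taken from the dataset. It shows how pd.get_dummies leaves numeric columns untouched and expands each categorical column into one indicator column per category.

```python
import pandas as pd

# Hypothetical toy rows, invented for illustration only
toy = pd.DataFrame({
    "Distance_km": [2.5, 7.1, 4.0],
    "Weather": ["Clear", "Rainy", "Clear"],
    "Vehicle_Type": ["Bike", "Car", "Scooter"],
})

encoded = pd.get_dummies(toy)

# Numeric columns are kept; each category becomes its own indicator column
print(list(encoded.columns))
```

The generated column names (for example Weather_Clear) follow the same column_category pattern as the keys the prediction route expects later on.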

- Next we created a Linear Regression model and fit it to our X and Y training data
- To see what happened "under the hood", we examined the coefficient of each variable to better understand which variables were key for the prediction
- We printed each feature name together with its coefficient

In [None]:
#From model.py

model = LinearRegression()

model.fit(X_train, Y_train)

#During development we wanted to see the coefficients of the different features
coefficients = model.coef_
feature_names = X.columns

for feature, coef in zip(feature_names, coefficients):
    print(f"{feature}: {coef}")
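
When there are many features, it can help to rank the coefficients by magnitude instead of reading them in column order. The following is only a sketch on synthetic data (the features a and b and their weights are made up), not part of model.py. Keep in mind that coefficient sizes depend on the scale of each feature, so a ranking like this is only a rough guide.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data: the target depends strongly on "a" and weakly on "b"
rng = np.random.default_rng(0)
X_toy = pd.DataFrame({"a": rng.normal(size=50), "b": rng.normal(size=50)})
y_toy = 3.0 * X_toy["a"] - 0.5 * X_toy["b"]

toy_model = LinearRegression().fit(X_toy, y_toy)

# Pair each coefficient with its feature name and sort by absolute value
ranked = pd.Series(toy_model.coef_, index=X_toy.columns).sort_values(key=abs, ascending=False)
print(ranked)
```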

- Next we made the model predict Y based on our X test data
- We then measured the performance using the mean squared error and R^2
- During development, the R^2 of the model was approx. 0.83, which is an acceptable score
- Lastly, we saved the model to a pickle file which we could use in our Flask backend

In [None]:
#Predicting the test set results
Y_pred = model.predict(X_test)

mse = mean_squared_error(Y_test, Y_pred)
r2 = r2_score(Y_test, Y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R2 Score: {r2}") #During testing was approx .83 which is acceptable

#Saving the model to pickle file (the with block closes the file after writing)
with open("food_delivery_model.pkl", "wb") as model_file:
    pickle.dump(model, model_file)
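
A quick way to check that a pickled model survives the round trip is to load it back and compare predictions. The sketch below uses a tiny made-up model and a temporary file (both are assumptions for illustration), so it does not touch food_delivery_model.pkl.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny made-up regression problem: y = 2x + 1
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([1.0, 3.0, 5.0, 7.0])
toy_model = LinearRegression().fit(X_toy, y_toy)

# Save to a temporary location and load it back
path = os.path.join(tempfile.mkdtemp(), "toy_model.pkl")
with open(path, "wb") as f:
    pickle.dump(toy_model, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored model should give the same predictions as the original
same = np.allclose(toy_model.predict(X_toy), restored.predict(X_toy))
print(same)
```

Pickle files should only be loaded from sources you trust, and ideally with the same scikit-learn version that was used to create them.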

### Developing the Backend Routes & Templates for the Home Route and Predict Route

First, we needed to import all of the libraries required for a functioning backend.

In [None]:
#From app.py

from flask import Flask, request, jsonify, render_template
import pickle
import numpy as np
import pandas as pd
import os
import subprocess

- We then created variables that hold the correct paths for the base directory, the pickled model and the data file used later in the backend - this is vital for successful deployment later
- We then created an app using Flask and configured it with DEBUG=True, which enables Flask's debug mode so that the development server reloads when the code changes while running locally

In [None]:
#From app.py

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
MODEL_PATH = os.path.join(BASE_DIR, "models", "food_delivery_model.pkl")
DATA_PATH = os.path.join(BASE_DIR, "data", "Food_Delivery_Times.csv")

app = Flask(__name__)
app.config["DEBUG"] = True

- The home route was then created
- The home route simply renders an HTML document in the front end which gives some information on the endpoints, such as which route accepts which kind of input parameters (query, path, body)
- For the HTML code, please visit the repository.

In [None]:
#From app.py

@app.route("/")
def home():
    return render_template("welcome.html")

Next, the prediction route was created. **This route accepts the parameters within the request body.** Note that this route is accessible through a template-rendered front end as well as through a plain request to the URL with a JSON body - we wanted to keep the flexibility to have both. Let's dive deeper into the code:
- This route accepts both GET and POST (GET only renders the front-end form for the user, while POST is used by the form as well as by direct requests to send the input data to the prediction model)
- The function checks whether the request is a POST and then determines whether the data came from the front-end form or from a JSON body
- All of the input parameters are retrieved from the data
- The input data is formatted so that it can be loaded into a dataframe
- The input dataframe (pandas) is reindexed so that the columns of input_df match exactly the ones the model was trained on
- The prediction is then made
- Since the output display differs depending on whether the front end or JSON was used, the function decides whether to return a jsonify response or to render the "output" template (prediction.html)
- Again, for the HTML, please visit the repo

In [None]:
#From app.py

#Body parameter route
@app.route("/predict", methods=["GET", "POST"])
def predict():
    if request.method == "POST":
        # Load the trained model; the with block closes the file afterwards
        with open(MODEL_PATH, "rb") as model_file:
            model = pickle.load(model_file)
        
        if request.is_json:
            data = request.get_json()
        else:
            data = request.form

        distance_km = data.get('Distance_km')
        weather_clear = data.get('Weather_Clear')
        weather_foggy = data.get('Weather_Foggy')
        weather_rainy = data.get('Weather_Rainy')
        weather_snowy = data.get('Weather_Snowy')
        weather_windy = data.get('Weather_Windy')
        traffic_level_low = data.get('Traffic_Level_Low')
        traffic_level_medium = data.get('Traffic_Level_Medium')
        traffic_level_high = data.get('Traffic_Level_High')
        time_of_day_afternoon = data.get('Time_of_Day_Afternoon')
        time_of_day_evening = data.get('Time_of_Day_Evening')
        time_of_day_morning = data.get('Time_of_Day_Morning')
        time_of_day_night = data.get('Time_of_Day_Night')
        vehicle_type_bike = data.get('Vehicle_Type_Bike')
        vehicle_type_car = data.get('Vehicle_Type_Car')
        vehicle_type_scooter = data.get('Vehicle_Type_Scooter')
        preparation_time_min = data.get('Preparation_Time_min')
        courier_experience_yrs = data.get('Courier_Experience_yrs')

        input_data = {
            'Distance_km': [distance_km],
            'Weather_Clear': [weather_clear],
            'Weather_Foggy': [weather_foggy],
            'Weather_Rainy': [weather_rainy],
            'Weather_Snowy': [weather_snowy],
            'Weather_Windy': [weather_windy],
            'Traffic_Level_Low': [traffic_level_low],
            'Traffic_Level_Medium': [traffic_level_medium],
            'Traffic_Level_High': [traffic_level_high],
            'Time_of_Day_Afternoon': [time_of_day_afternoon],
            'Time_of_Day_Evening': [time_of_day_evening],
            'Time_of_Day_Morning': [time_of_day_morning],
            'Time_of_Day_Night': [time_of_day_night],
            'Vehicle_Type_Bike': [vehicle_type_bike],
            'Vehicle_Type_Car': [vehicle_type_car],
            'Vehicle_Type_Scooter': [vehicle_type_scooter],
            'Preparation_Time_min': [preparation_time_min],
            'Courier_Experience_yrs': [courier_experience_yrs]
        }

        input_df = pd.DataFrame(input_data)
        input_df = input_df.reindex(columns=model.feature_names_in_, fill_value=0)
        prediction = model.predict(input_df)

        if request.is_json:
            return jsonify({"predicted_delivery_time": round(prediction[0], 2)})
        else:
            return render_template("prediction.html", prediction=round(prediction[0], 2))

    return render_template("prediction.html", prediction=None)
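
The column alignment step can be illustrated in isolation. This is a minimal sketch with a hypothetical, shortened list of training columns (in app.py the list comes from model.feature_names_in_) and an invented request row. It shows that reindex puts the columns in training order, fills columns that are absent with 0 and drops columns the model never saw.

```python
import pandas as pd

# Hypothetical training columns; in app.py these come from model.feature_names_in_
training_columns = ["Distance_km", "Weather_Clear", "Weather_Rainy", "Preparation_Time_min"]

# Invented request: different column order, no Weather_Rainy, one unknown field
request_row = pd.DataFrame({
    "Preparation_Time_min": [12],
    "Distance_km": [4.2],
    "Weather_Clear": [1],
    "Unknown_Field": [99],
})

aligned = request_row.reindex(columns=training_columns, fill_value=0)
print(aligned.to_dict(orient="records"))
```

Note that fill_value only applies to columns that are missing from the frame altogether. A column that is present but holds None is kept as it is.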

Next, the statistics route was created. **This route accepts query parameters.** Please note that this route does not have a front end and does not use render_template: it is a GET-only route that returns its results directly as JSON. Let's dive into the code step by step:
- First, the function reads the query arguments and then loads the data file from which it will later calculate the statistics
- If courier_experience is given, it is converted to a float; if the conversion fails, the route returns a 400 error
- A set of nested functions follows, each responsible for calculating statistics for a certain weather, vehicle type, courier_experience, time of day or traffic level
- Note that since courier_experience is a number, we had to create bins of ranges to compute statistics around it
- We instantiated a stats dictionary to accumulate statistics if there is more than one query parameter
- If statements invoke the relevant statistics functions depending on the query parameters and add the results to the stats dictionary
- The stats are returned as JSON
- *Note that you cannot query the same category twice (e.g. weather = "clear" and weather = "foggy" at the same time, since a delivery in the data cannot have both)*
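
The binning of courier_experience can be sketched on its own. The sample values below are invented; the bin edges and labels are the ones used in the route. With pd.cut the intervals are open on the left and closed on the right by default, so a value of exactly 0 falls outside the first bin.

```python
import pandas as pd

# Invented sample of courier experience values in years
years = pd.Series([0, 0.5, 1, 2, 3, 4, 7, 10])

# Same edges and labels as in the statistics route: (0, 1], (1, 3], (3, 5], (5, 10]
groups = pd.cut(years, bins=[0, 1, 3, 5, 10], labels=["0-1", "1-3", "3-5", "5-10"])
print(groups.tolist())
```

This matches the if/elif chain in the route, which also treats 0 and values above 10 as belonging to no group.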

In [None]:
#From app.py

@app.route("/statistics", methods=["GET"])
def statistics():
    # Get query parameters
    weather = request.args.get('weather')
    traffic = request.args.get('traffic')
    time_of_day = request.args.get('time_of_day')
    vehicle_type = request.args.get('vehicle_type')
    courier_experience = request.args.get('courier_experience')

    # Load the data
    data = pd.read_csv(DATA_PATH)

    if courier_experience is not None:
        try:
            courier_experience = float(courier_experience)
        except ValueError:
            return jsonify({"error": "Invalid value for courier_experience. Must be a number."}), 400

    def get_weather_stats(data, weather):
        if weather:
            data = data[data['Weather'] == weather.title()]
        return {
            "average_distance_km": data["Distance_km"].mean(),
            "average_preparation_time_min": data["Preparation_Time_min"].mean(),
            "average_courier_experience_yrs": data["Courier_Experience_yrs"].mean(),
            "average_delivery_time_min": data["Delivery_Time_min"].mean()
        }

    def get_traffic_stats(data, traffic):
        if traffic:
            data = data[data['Traffic_Level'] == traffic.title()]
        return {
            "average_distance_km": data["Distance_km"].mean(),
            "average_preparation_time_min": data["Preparation_Time_min"].mean(),
            "average_courier_experience_yrs": data["Courier_Experience_yrs"].mean(),
            "average_delivery_time_min": data["Delivery_Time_min"].mean()
        }

    def get_time_of_day_stats(data, time_of_day):
        if time_of_day:
            data = data[data['Time_of_Day'] == time_of_day.title()]
        return {
            "average_distance_km": data["Distance_km"].mean(),
            "average_preparation_time_min": data["Preparation_Time_min"].mean(),
            "average_courier_experience_yrs": data["Courier_Experience_yrs"].mean(),
            "average_delivery_time_min": data["Delivery_Time_min"].mean()
        }

    def get_vehicle_type_stats(data, vehicle_type):
        if vehicle_type:
            data = data[data['Vehicle_Type'] == vehicle_type.title()]
        return {
            "average_distance_km": data["Distance_km"].mean(),
            "average_preparation_time_min": data["Preparation_Time_min"].mean(),
            "average_courier_experience_yrs": data["Courier_Experience_yrs"].mean(),
            "average_delivery_time_min": data["Delivery_Time_min"].mean()
        }

    def get_courier_experience_stats(data, courier_experience):
        data["Courier_Experience_Group"] = pd.cut(data["Courier_Experience_yrs"], bins=[0, 1, 3, 5, 10], labels=["0-1", "1-3", "3-5", "5-10"])
        if courier_experience is not None:
            if 0 < courier_experience <= 1:
                experience_group = "0-1"
            elif 1 < courier_experience <= 3:
                experience_group = "1-3"
            elif 3 < courier_experience <= 5:
                experience_group = "3-5"
            elif 5 < courier_experience <= 10:
                experience_group = "5-10"
            else:
                return None

            data = data[data["Courier_Experience_Group"] == experience_group]
            return {
                "average_distance_km": data["Distance_km"].mean(),
                "average_preparation_time_min": data["Preparation_Time_min"].mean(),
                "average_delivery_time_min": data["Delivery_Time_min"].mean()
            }

    stats = {}

    if weather:
        w_stats = get_weather_stats(data, weather)
        stats[f"weather_stats_{weather}"] = w_stats
    if traffic:
        t_stats = get_traffic_stats(data, traffic)
        stats[f"traffic_stats_{traffic}"] = t_stats
    if time_of_day:
        tod_stats = get_time_of_day_stats(data, time_of_day)
        stats[f"time_of_day_stats_{time_of_day}"] = tod_stats
    if vehicle_type:
        v_stats = get_vehicle_type_stats(data, vehicle_type)
        stats[f"vehicle_type_stats_{vehicle_type}"] = v_stats
    if courier_experience is not None:
        c_stats = get_courier_experience_stats(data, courier_experience)
        stats[f"courier_experience_stats_{courier_experience}"] = c_stats

    return jsonify(stats)
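
The four category functions above share the same shape, so they could be folded into one helper. The following is only a sketch of that idea, not the code in app.py: category_stats and STAT_COLUMNS are hypothetical names, the toy rows are invented, and the helper averages fewer columns than the route does to keep the example short.

```python
import pandas as pd

# Hypothetical list of numeric columns to average
STAT_COLUMNS = ["Distance_km", "Preparation_Time_min", "Delivery_Time_min"]

def category_stats(df, column, value):
    """Average the numeric columns over the rows where `column` equals `value`."""
    subset = df[df[column] == value.title()]
    return {f"average_{name.lower()}": float(subset[name].mean()) for name in STAT_COLUMNS}

# Invented toy rows for illustration
toy = pd.DataFrame({
    "Weather": ["Clear", "Rainy", "Clear"],
    "Distance_km": [2.0, 6.0, 4.0],
    "Preparation_Time_min": [10, 20, 14],
    "Delivery_Time_min": [20, 50, 30],
})

clear_stats = category_stats(toy, "Weather", "clear")
print(clear_stats)
```

With such a helper, each if statement in the route would only need to pass the column name and the query value.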

Next, we created the "Order ID" route. **This is the path parameter route**. It allows users to fetch the information of a certain order.
- It is a GET-only route and expects the order ID as an integer in the path
- The data gets loaded and we filter it based on the Order_ID which was provided in the path
- Lastly, we return a jsonify response with the matching records

In [None]:
#From app.py

#Path parameter route
@app.route("/data/<int:order_id>", methods=["GET"])
def data(order_id):
    # Use a separate name for the dataframe so it does not shadow the view function
    df = pd.read_csv(DATA_PATH)
    df['Order_ID'] = df['Order_ID'].astype(int)
    return jsonify(df[df['Order_ID'] == order_id].to_dict(orient="records"))
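
To show what the filter and to_dict(orient="records") produce, here is a small sketch on an invented two-row frame. The lookup function is a hypothetical stand-in for the body of the route.

```python
import pandas as pd

# Invented toy orders for illustration
toy = pd.DataFrame({"Order_ID": [101, 102], "Delivery_Time_min": [35, 48]})

def lookup(df, order_id):
    """Return the rows matching order_id as a list of dictionaries."""
    return df[df["Order_ID"] == order_id].to_dict(orient="records")

found = lookup(toy, 102)
missing = lookup(toy, 999)
print(found, missing)
```

An unknown ID produces an empty list, so the route as written answers with an empty JSON array rather than an error status in that case.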

### Webhook and CI/CD

Lastly, we created a webhook function which allows us to implement a CI/CD pipeline for the PythonAnywhere deployment. It handles the incoming webhook notifications from GitHub and automatically pulls the latest version of the repo whenever changes are pushed. The route only accepts POST requests. More details:
- First, we had to provide the paths to the repo on PythonAnywhere as well as to the WSGI file
- The function checks whether the incoming request is JSON, since webhook payloads are sent in JSON format
- If the payload contains repository information, it extracts the repo name
- Then the function changes directory to where the repo and files are stored
- The code then executes a git pull, which pulls the changes from GitHub, and "restarts" the web app by running the touch command on the WSGI file

In [None]:
#From app.py

#Webhook route
@app.route("/webhook", methods=["POST"])
def webhook():
    path_repo = "/home/flizerflix/mcsbt-adv-python-gp"
    servidor_web = "/var/www/flizerflix_pythonanywhere_com_wsgi.py"

    if request.is_json:
        payload = request.json

        if "repository" in payload:
            repo_name = payload["repository"]["name"]

            try:
                os.chdir(path_repo)
            except FileNotFoundError:
                return jsonify({"message": "The directory of the repository does not exist!"}), 404

            try:
                subprocess.run(["git", "pull"], check=True)
                subprocess.run(["touch", servidor_web], check=True)
                return jsonify({"message": f"Successfully pulled latest changes for {repo_name}"}), 200
            except subprocess.CalledProcessError as e:
                return jsonify({"message": "Git pull failed!", "error": str(e)}), 500
        else:
            return jsonify({"message": "No repository information in payload"}), 400
    else:
        return jsonify({"message": "Invalid request, expected JSON"}), 400
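
The route above accepts any JSON POST that contains a repository key. GitHub can sign each delivery with a shared secret and sends the result in the X-Hub-Signature-256 header as sha256= followed by the hex HMAC of the raw request body. The sketch below shows how such a check could look. It is not part of app.py: signature_is_valid, the secret and the sample body are all hypothetical.

```python
import hashlib
import hmac

def signature_is_valid(secret: bytes, body: bytes, header_value: str) -> bool:
    """Compare the received signature header with the HMAC-SHA256 of the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, header_value)

# Hypothetical secret and payload, for illustration only
secret = b"example-secret"
body = b'{"repository": {"name": "mcsbt-adv-python-gp"}}'
good_header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()

print(signature_is_valid(secret, body, good_header))
print(signature_is_valid(secret, body + b" ", good_header))
```

In a Flask route, the raw body would come from request.get_data() and the header from request.headers.get("X-Hub-Signature-256"). The secret would need to be configured both in the GitHub webhook settings and on the server.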

Below you can see that the last two pushes were successful!

   ![Alt text](webhook_proof.jpg)