# Running the code locally

## Trainer + Prediction API

### Train the model

``` bash
make run_locally
```

This should create a `model.joblib` file on your machine

### Start the API locally

``` bash
make run_api
```

Then go to the root entry point http://127.0.0.1:5000/

Generate an error with the prediction entry point without params http://127.0.0.1:5000/predict_fare

Call the API properly with all parameters using the "api test notebook.ipynb" in the notebooks directory

## Website

### Start the website locally

Start the web server serving your website

``` bash
make run_website
```

Then go to http://localhost:8501

The prediction may take a while to be visible since the Prediction API targeted by the URL is hosted on Heroku and is probably sleeping

You may change the URL so that the website requests a prediction from your own API

# Running the code in the cloud

## Trainer + Prediction API

### Train the model on AI Platform

You will probably need to replace the name of the bucket in params.py, and make sure that a CSV with the expected name exists at the expected location in your bucket:
* BUCKET_NAME = 'le-wagon-data'
* BUCKET_TRAIN_DATA_PATH = 'data/data_train_1k.csv'

Also in the Makefile:
* BUCKET_NAME=le-wagon-data


You can change the experiment name
* experiment_name = "[DE] [Berlin] [gmanchon] minimalistic trainer"

``` bash
make gcp_submit_training
```

Then go to https://console.cloud.google.com to verify that the job completed successfully and that your bucket contains the model.joblib

### Deploy the API on Heroku


``` bash
git init
git add .
git commit -m 'initial commit'
heroku create pred-api-492 --region eu
heroku ps:scale web=1
git push heroku master
```

Then go to the root entry point https://pred-api-492.herokuapp.com/

Generate an error with the prediction entry point without params https://pred-api-492.herokuapp.com/predict_fare

Call the API properly with all parameters using the "api test notebook.ipynb" in the notebooks directory

Note: I encountered a weird error in production, which was fixed by pinning the package versions in requirements.txt (numpy==1.18.5 and scikit-learn==0.22). The fact that the error occurred in production only, and not on my machine, was an indication that the package versions could be the issue. I retrieved the versions installed on my machine using `pip freeze | grep scikit`
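
The same version lookup can be sketched in Python. This assumes Python 3.8 or later (for `importlib.metadata`); the helper name `pin_lines` is made up for this example.

```python
from importlib import metadata


def pin_lines(package_names, get_version=metadata.version):
    """Return 'name==version' lines for the packages that are installed."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={get_version(name)}")
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment: skip it
            continue
    return lines


# Print the pins for the two packages mentioned in the note
print("\n".join(pin_lines(["numpy", "scikit-learn"])))
```

The printed lines can be pasted into requirements.txt so that production installs the same versions as your machine.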

## Website

### Deploy the website on Heroku

``` bash
git init
git add .
git commit -m 'initial commit'
heroku create website-492 --region eu
heroku ps:scale web=1
git push heroku master
```

Then go to https://website-492.herokuapp.com/

The prediction may take a while to be visible since the Prediction API targeted by the URL is hosted on Heroku and is probably sleeping


You may change the URL so that the website requests a prediction from your own API

# Workflow


## gather some data

* access to several databases
* use open data
* use API & scraping
* ask your developer team for more data

## explore the data using a notebook

* clean the data
* do some exploratory dataviz
* use git as soon as possible (save your work)

## start to work with a notebook + a package

* start to package your code in python files
* working on a package makes it easier to collaborate with teammates than working on a notebook
* with git, code changes are easier to read in python files than in a notebook (a notebook is stored as JSON, and the state of the cells is saved along with their content)
* call your package from the notebook (on your machine using `pip install -e .`)
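
To illustrate why notebooks produce noisy diffs, here is a minimal sketch that removes the saved cell state from a notebook loaded as a dict. It assumes the nbformat 4 JSON layout; `strip_outputs` and the in-memory `example` notebook are made up for this illustration.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict without outputs and execution counts."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Tiny in-memory notebook in the nbformat 4 layout, for illustration only
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "source": ["1 + 1"],
         "execution_count": 7,
         "outputs": [{"output_type": "execute_result",
                      "data": {"text/plain": ["2"]},
                      "execution_count": 7, "metadata": {}}]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# The code cell now only carries its source, which is what a reviewer cares about
print(json.dumps(strip_outputs(example)["cells"][1], indent=1))
```

In practice a dedicated tool such as nbstripout can do this automatically, but moving the code to python files avoids the problem altogether.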

## start to do some training

* maybe train your first models on google colab, up to 10h
* either with all the code in the notebook (downside: no package for collaboration + code organisation/visibility + production deployment)
* or importing your package
  * option 1:
    * put your code on google drive + let the google drive app do its sync magic + import your code from the notebook
    * nice in theory but I have not tested this option
  * option 2:
    * import the code from the notebook using `pip install git+https://github.com/yourlogin/yourepo`
    * more cumbersome : you need to push your code to github each time you modify it (but since you are performing a training it makes sense to save the code)

## do some serious training

* if you require more cpu or ram
* move your training code from the notebook to the package
* use the wagon-make-package-tool to generate the project
* or start from a data science project template from github
* use the AI Platform

## now you have a model

* or more models, and you stored the params used in order to train them and the performance metrics in MLFlow

## deploy your model to production

* first option: you only want to target internet users
  * create a Website that will load your model
  * if your package is small enough (at most 500 MB), use Heroku, which is very simple to use
  * if not, use Google Cloud Run (the TAs will help you during the project week)
* second option: you only want to target developers from other companies
  * create a Prediction API
  * deploy it on Heroku if the package is small enough
  * use Google Cloud Run otherwise
* third option: you want to target both developers and internet users
  * we suggest you split the projects
  * the Prediction API is deployed on Heroku if small enough, or on Google Cloud Run (GCR) otherwise
  * the Website calls the Prediction API and is deployed on Heroku
* fourth option: you target both developers and internet users
  * but you want your user experience to be as smooth as possible
  * the website and the prediction API both load the model and are deployed on Heroku or GCR

# Project 1 : Trainer + Prediction API

## Trainer

### .gitignore

This file lists all the files that we do not want to store in git: temporary files that do not add any value to the project

*/version.txt
*.pyc
*.swp
build/
dist/
.coverage
.ipynb_checkpoints
*.iml
groupama/data/base_pno.*
.DS_Store
.idea/
mlruns
*.egg-info


### Makefile

This file contains directives that save us from having to remember the commands to run in the command line

install_requirements:
	@pip install -r requirements.txt

test:
	@coverage run -m pytest tests/*.py
	@coverage report -m --omit=$(VIRTUAL_ENV)/lib/python*

clean:
	@rm -f */version.txt
	@rm -f .coverage
	@rm -fr */__pycache__ */*.pyc __pycache__
	@rm -fr build dist
	@rm -fr MinTrainer-*.dist-info
	@rm -fr MinTrainer.egg-info

install:
	@pip install . -U

all: clean install test

# ----------------------------------
#      MODEL DIRECTIVES
# ----------------------------------

run_locally:
	python -m MinTrainer.trainer

# bucket
BUCKET_NAME=le-wagon-data

# training folder
BUCKET_TRAINING_FOLDER=trainings

# training params
REGION=europe-west1

# app environment
PYTHON_VERSION=3.7
FRAMEWORK=scikit-learn
RUNTIME_VERSION=2.2

# package params
PACKAGE_NAME=MinTrainer
FILENAME=trainer

# pred
# PRED_FILENAME=predict

##### Job - - - - - - - - - - - - - - - - - - - - - - - - -

JOB_NAME=mintrainer_$(shell date +'%Y%m%d_%H%M%S')

gcp_submit_training:
	gcloud ai-platform jobs submit training ${JOB_NAME} \
		--job-dir gs://${BUCKET_NAME}/${BUCKET_TRAINING_FOLDER} \
		--package-path ${PACKAGE_NAME} \
		--module-name ${PACKAGE_NAME}.${FILENAME} \
		--python-version=${PYTHON_VERSION} \
		--runtime-version=${RUNTIME_VERSION} \
		--region ${REGION} \
		--stream-logs


### requirements.txt

This file lists the packages required by the project, so that your teammates can install them and so that the AI Platform and Heroku can install what is needed to run your app

pip>=9
setuptools>=26
wheel>=0.29
pandas
pytest
coverage
flake8
black
yapf
python-gitlab
twine
six>=1.13.0
numpy
scikit-learn
joblib
memoized-property
mlflow
s3fs
gcsfs
google-cloud-storage
termcolor


### setup.py

This file tells the AI Platform and Heroku how to install the package and its requirements

from setuptools import find_packages
from setuptools import setup

with open('requirements.txt') as f:
    content = f.readlines()
requirements = [x.strip() for x in content if 'git+' not in x]

setup(name='MinTrainer',
      version="1.0",
      description="Project Description",
      packages=find_packages(),
      test_suite='tests',
      scripts=['scripts/MinTrainer-run'],
      install_requires=requirements)
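
The list comprehension in this setup.py keeps every line that does not contain `git+`, so blank lines or comments in requirements.txt would also end up in `install_requires`. A slightly stricter filter could look like the sketch below; `parse_requirements` is a hypothetical helper, not part of the project.

```python
def parse_requirements(lines):
    """Keep installable requirement lines; drop blanks, comments and git+ URLs."""
    requirements = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "git+" in line:
            continue
        requirements.append(line)
    return requirements


sample = [
    "pandas\n",
    "\n",
    "# dev tools\n",
    "git+https://github.com/yourlogin/yourepo\n",
    "joblib\n",
]
print(parse_requirements(sample))  # ['pandas', 'joblib']
```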


## Prediction API

### app.py

This file contains the api served by the Flask server

from flask import Flask, request
from markupsafe import escape  # escape is no longer exported by recent Flask versions

import pandas as pd

import joblib

# create flask app
app = Flask(__name__)


@app.route('/')
def hello():
    # get param from http://127.0.0.1:5000/?name=value
    name = request.args.get("name", "World")
    return f'Hello, {escape(name)}!'


# @app.route('/toto')
# def hello():
#     return '''
#     <!DOCTYPE>
#     <html>
#         <head>
#             <title>My super page</title>
#         </head>
#         <body>
#             <div>
#                 This is a Le Wagon API site, please use rather the /predict_fare entry point
#                 <img src="https://dwj199mwkel52.cloudfront.net/assets/core/home/coding-school-that-cares-alumni-025e665def0e2f5a9a539cd2f8762fedbd4c5074a725ebed08570a5bdacc45f7.jpg">
#             </div>
#         </body>
#     </html>
#     '''


@app.route('/predict_fare', methods=['GET'])
def predict_fare():

    # get request arguments
    key = request.args.get('key')
    pickup_datetime = request.args.get('pickup_datetime')
    pickup_longitude = float(request.args.get('pickup_longitude'))
    pickup_latitude = float(request.args.get('pickup_latitude'))
    dropoff_longitude = float(request.args.get('dropoff_longitude'))
    dropoff_latitude = float(request.args.get('dropoff_latitude'))
    passenger_count = int(request.args.get('passenger_count'))

    # build X ⚠️ beware of the order of the parameters ⚠️
    X = pd.DataFrame({
        "Unnamed: 0": [0],  # These are not used by the model
        "key": [key],  # but they are required by the pipeline as it is coded
        "pickup_datetime": [pickup_datetime],
        "pickup_longitude": [pickup_longitude],
        "pickup_latitude": [pickup_latitude],
        "dropoff_longitude": [dropoff_longitude],
        "dropoff_latitude": [dropoff_latitude],
        "passenger_count": [passenger_count]})

    # print(X_test.dtypes)

    # TODO: get model from GCP

    # pipeline = get_model_from_gcp()
    pipeline = joblib.load('model.joblib')

    # make prediction
    results = pipeline.predict(X)

    # convert response from numpy to python type
    pred = float(results[0])

    return dict(
        prediction=pred)


if __name__ == '__main__':
    app.run(host='127.0.0.1', port=5000, debug=True)
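
`predict_fare` converts the query parameters directly, so a missing parameter surfaces as a server error. As a sketch of one way to report the problem more clearly, the hypothetical helper below validates the parameters first. The names `REQUIRED_PARAMS` and `parse_fare_params` are assumptions made for this example; only the parameter names and types come from app.py.

```python
# Expected query parameters and the type each one is converted to
REQUIRED_PARAMS = {
    "key": str,
    "pickup_datetime": str,
    "pickup_longitude": float,
    "pickup_latitude": float,
    "dropoff_longitude": float,
    "dropoff_latitude": float,
    "passenger_count": int,
}


def parse_fare_params(args):
    """Convert query-string values to typed values, or raise ValueError."""
    missing = [name for name in REQUIRED_PARAMS if args.get(name) is None]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    parsed = {}
    for name, cast in REQUIRED_PARAMS.items():
        try:
            parsed[name] = cast(args[name])
        except ValueError:
            raise ValueError(f"invalid value for {name}: {args[name]!r}") from None
    return parsed
```

Inside `predict_fare`, one could call `parse_fare_params(request.args)` in a `try` block and return the error message with a 400 status when a `ValueError` is raised, instead of letting the request fail with a 500.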


### api_test.py

This code can be placed in a notebook or executed from the command line in order to test the API

import requests

url = "http://127.0.0.1:5000/predict_fare"
# url = "https://pred-api-492.herokuapp.com/predict_fare"

params = dict(
    key="2012-10-06 12:10:20.0000001",  # this is unused by the model
    pickup_datetime="2012-10-06 12:10:20 UTC",
    # New York City: longitude is around -74, latitude around 40.7
    pickup_longitude=-73.9798156,
    pickup_latitude=40.7614327,
    dropoff_longitude=-73.8803331,
    dropoff_latitude=40.6513111,
    passenger_count=2)

requests.get(url, params=params).json()
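
`requests` URL-encodes the values passed in `params` itself, so values should be given unencoded. The sketch below uses the standard library's `urlencode` to illustrate the principle. It approximates what happens to the query string and is not a claim about the exact bytes `requests` sends.

```python
from urllib.parse import parse_qs, urlencode

params = {
    "pickup_datetime": "2012-10-06 12:10:20 UTC",
    "passenger_count": 2,
}

# Spaces and colons are encoded for us
query = urlencode(params)
print(query)

# A value that is already percent-encoded gets encoded a second time:
# the "%" of "%20" becomes "%25"
double = urlencode({"key": "2012-10-06%2012:10:20"})
print(double)

# Decoding the query gives back the original, unencoded value
print(parse_qs(query)["pickup_datetime"])
```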


### Procfile

This file tells Heroku what command to run in order to start the Flask server running the API

web: python -m flask run --host=0.0.0.0 --port $PORT
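
Heroku provides the port to listen on through the `PORT` environment variable, which is why the Procfile passes `--port $PORT`. If the server were started from Python instead, the same variable could be read as in this minimal sketch. `get_port` is a hypothetical helper, and the default of 5000 mirrors the local port used in app.py.

```python
import os


def get_port(environ=os.environ, default=5000):
    """Return the port from the PORT environment variable, or a local default."""
    return int(environ.get("PORT", default))


print(get_port())
```

For example, `app.run(host="0.0.0.0", port=get_port())` would listen on the Heroku-provided port in production and on 5000 locally.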


# Project 2 : Website using prediction API

## Prediction website

### app.py

This is the entry point of the project, which is run by Streamlit in order to serve the website

import streamlit as st

import requests

import datetime

# retrieve prediction parameters

"# Manhattan ride parameters"

"Select a date and time"

date = st.date_input("Select a pickup date", datetime.date(2012, 10, 6))

time = st.time_input("Select a pickup time", datetime.time(8, 45))

pickup_datetime = f"{date} {time} UTC"  # "2012-10-06%2012:10:20%20UTC"

"Select a pickup location"

pickup_longitude = st.number_input("Pickup longitude", value=-73.9798156)
pickup_latitude = st.number_input("Pickup latitude", value=40.7614327)

"Select a dropoff location"

dropoff_longitude = st.number_input("Dropoff longitude", value=-73.8803331)
dropoff_latitude = st.number_input("Dropoff latitude", value=40.6513111)

"Select a passenger count"

passenger_count = st.slider("Passenger count", 1, 10, 3)

# request prediction from api

url = "https://taxifaremodelapi.herokuapp.com/predict_fare"

params = dict(
    key="2012-10-06 12:10:20.0000001",  # this is unused by the model
    pickup_datetime=pickup_datetime,
    pickup_longitude=pickup_longitude,
    pickup_latitude=pickup_latitude,
    dropoff_longitude=dropoff_longitude,
    dropoff_latitude=dropoff_latitude,
    passenger_count=passenger_count)

response = requests.get(url, params=params).json()

prediction = response['prediction']

# display response to user

f"Predicted ride cost: {prediction}"
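
Since the Prediction API may be asleep when the website calls it, the first request can fail or time out. A minimal retry sketch follows; `call_with_retry` and its parameters are hypothetical and not part of the project.

```python
import time


def call_with_retry(fetch, attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            if attempt < attempts - 1:
                # Give the API some time to wake up before trying again
                sleep(delay_seconds)
    raise last_error
```

In app.py this could wrap the request, for example `call_with_retry(lambda: requests.get(url, params=params, timeout=30).json())`, so that a slow wake-up does not immediately break the page.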


### Makefile

This file lists directives that you can run from the command line, for example `make run_website`, which launches the web server serving your website


run_website:
	streamlit run app.py


### MANIFEST.in

This file lists the files that will be included in the package, along with all the .py files, when it is received by Heroku

include requirements.txt


### Procfile

This file tells Heroku what command to run in order to start your project
Here we run both setup.sh and Streamlit in order to start the web server serving your website to users

web: sh setup.sh && streamlit run app.py


### README.md

This is just an informative file used by GitHub in order to display information about your project

# Usage

``` bash
make run_website              # launch website
streamlit run app.py          # launch website
```

### requirements.txt

This file lists the packages required in order to run the project
Your teammates can `pip install -r requirements.txt` in order to install all the packages when they clone your project
This file is also used by setup.py in order to install all the packages on Heroku

streamlit
requests


### setup.py

This file is used by Heroku in order to install the package
The file will list the contents of requirements.txt and install each included package
It will also make sure that the files listed in MANIFEST.in are included in the delivered package, otherwise only .py files will be included

from setuptools import setup, find_packages

with open("requirements.txt") as f:
    content = f.readlines()
requirements = [x.strip() for x in content]

setup(name="taxifare prediction website",
      version="1.0",
      description="package description",
      packages=find_packages(),
      include_package_data=True,  # includes in package files from MANIFEST.in
      install_requires=requirements)


### setup.sh

This file is used by Heroku in order to indicate to Streamlit the $PORT on which to run

mkdir -p ~/.streamlit/

echo "\
[general]\n\
email = \"${HEROKU_EMAIL_ADDRESS}\"\n\
" > ~/.streamlit/credentials.toml

echo "\
[server]\n\
headless = true\n\
enableCORS = false\n\
port = $PORT\n\
" > ~/.streamlit/config.toml
