# Week 5 Notes

## 5.1 [Intro / Session overview](github.com/kemaldahha/machine-learning-course/blob/main/01-intro.md)


This week we take the churn prediction model from week 4, which currently lives inside a Jupyter notebook, and deploy it. To do that, we save the model to a file and then load and use it outside the notebook.

We have the Jupyter notebook with the trained model. Once the model is saved to a file, we can wrap it in a churn service: a web service that other programs can send requests to.

The saved churn prediction model will be put inside a web service using Flask (a Python framework for creating web services). We will isolate the dependencies of this web service so they do not interfere with other services on our machine. To this end, we'll create a dedicated environment for the Python dependencies using Pipenv. Then we add another layer on top for the system dependencies, using Docker. Finally, the container is deployed to the cloud (AWS Elastic Beanstalk).

<img src=architecture.png width=600>


## 5.2 [Saving and loading the model](github.com/kemaldahha/machine-learning-course/blob/main/02-pickle.md)


Here is the code from last week. Right now it lives in a Jupyter notebook, so we cannot use it from a web service.

In [1]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split, KFold
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

In [2]:
df = pd.read_csv("data-week-3.csv")

df.columns = df.columns.str.lower().str.replace(" ", "_")

categorical_columns = list(df.dtypes[df.dtypes == "object"].index)

for c in categorical_columns:
    df[c] = df[c].str.lower().str.replace(" ", "_")

df.totalcharges = pd.to_numeric(df.totalcharges, errors="coerce")
df.totalcharges = df.totalcharges.fillna(0)

df.churn = (df.churn == "yes").astype(int)

In [3]:
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=1)

In [4]:
numerical = ["tenure", "monthlycharges", "totalcharges"]

In [6]:
categorical = [
    'gender',
    "seniorcitizen",
    'partner',
    'dependents',
    'phoneservice',
    'multiplelines',
    'internetservice',
    'onlinesecurity',
    'onlinebackup',
    'deviceprotection',
    'techsupport',
    'streamingtv',
    'streamingmovies',
    'contract',
    'paperlessbilling',
    'paymentmethod',
]

In [16]:
def train(df_train, y_train, C=1.0):
    dicts = df_train[categorical + numerical].to_dict(orient="records")
    
    dv = DictVectorizer(sparse=False)
    X_train = dv.fit_transform(dicts)
    
    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X_train, y_train)

    return dv, model

In [17]:
def predict(df, dv, model):
    dicts = df[categorical + numerical].to_dict(orient="records")

    X = dv.transform(dicts)
    y_pred = model.predict_proba(X)[:, 1]

    return y_pred

In [18]:
C = 1.0
n_splits = 5

In [19]:
kfold = KFold(n_splits=n_splits, shuffle=True, random_state=1)

scores = []

for train_idx, val_idx in kfold.split(df_full_train):
    df_train = df_full_train.iloc[train_idx]
    df_val = df_full_train.iloc[val_idx]

    y_train = df_train.churn.values
    y_val = df_val.churn.values

    dv, model = train(df_train, y_train, C=C)
    y_pred = predict(df_val, dv, model)

    auc = roc_auc_score(y_val, y_pred)
    scores.append(auc)

print(f"C={C} {np.mean(scores):.3f} +- {np.std(scores):.3f}")


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

C=1.0 0.842 +- 0.007


In [20]:
scores

[np.float64(0.8446632807655171),
 np.float64(0.8452295225797907),
 np.float64(0.833257074051776),
 np.float64(0.8346889588795804),
 np.float64(0.8517617897147877)]
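
As a quick check of what the summary line reports, the mean and spread can be recomputed from the five fold scores with the standard library alone. This is a minimal sketch: the scores are copied from the output above as plain floats, and `pstdev` is used because `np.std` computes the population standard deviation by default.

```python
from statistics import fmean, pstdev

# Fold AUCs copied from the notebook output above (as plain floats).
scores = [
    0.8446632807655171,
    0.8452295225797907,
    0.833257074051776,
    0.8346889588795804,
    0.8517617897147877,
]

mean_auc = fmean(scores)
# Population standard deviation (divide by n), like np.std with its default ddof=0.
std_auc = pstdev(scores)

print(f"mean={mean_auc:.3f} std={std_auc:.3f}")
```

A small standard deviation relative to the mean suggests the model's performance is stable across folds.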

We will save the model using `pickle`:

In [22]:
import pickle

In [24]:
output_file = f"model_C={C}.bin"

In [26]:
f_out = open(output_file, "wb")
pickle.dump((dv, model), f_out)
f_out.close()

The same can be written with a `with` statement, which closes the file automatically:

In [28]:
with open(output_file, "wb") as f_out:
    pickle.dump((dv, model), f_out)

This is how we can load the model:

In [43]:
input_file = "model_C=1.0.bin"

In [44]:
with open(input_file, "rb") as f_in:
    dv, model = pickle.load(f_in)

In [45]:
model
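
The save/load round trip can be tried without scikit-learn. The snippet below is a minimal sketch, not the notebook's code: `fake_dv` and `fake_model` are hypothetical stand-ins for the fitted `DictVectorizer` and `LogisticRegression`, and the file is written to a temporary directory. Unpickling the real file also requires scikit-learn to be installed, and `pickle.load` should only be used on files you trust.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-ins for the fitted DictVectorizer and model.
fake_dv = {"feature_names": ["contract=two_year", "tenure"]}
fake_model = {"coef": [-1.2, -0.05], "intercept": 0.3}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model_C=1.0.bin"

    # Save both objects together as one tuple, as in the notebook.
    with open(model_path, "wb") as f_out:
        pickle.dump((fake_dv, fake_model), f_out)

    # Load them back; tuple unpacking restores them in the same order.
    with open(model_path, "rb") as f_in:
        loaded_dv, loaded_model = pickle.load(f_in)

print(loaded_dv == fake_dv, loaded_model == fake_model)
```

Saving the vectorizer and the model in one tuple keeps them in sync, since the model only makes sense with the exact feature encoding it was trained on.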

Now let's say we have this customer:

In [61]:
customer = {
    'gender': 'male',
    'seniorcitizen': 0,
    'partner': 'yes',
    'dependents': 'yes',
    'phoneservice': 'yes',
    'multiplelines': 'no',
    'internetservice': 'no',
    'onlinesecurity': 'no_internet_service',
    'onlinebackup': 'no_internet_service',
    'deviceprotection': 'no_internet_service',
    'techsupport': 'no_internet_service',
    'streamingtv': 'no_internet_service',
    'streamingmovies': 'no_internet_service',
    'contract': 'two_year',
    'paperlessbilling': 'no',
    'paymentmethod': 'mailed_check',
    'tenure': 12,
    'monthlycharges': 19.7,
    'totalcharges': 258.35
}

We can predict churn as follows:

In [62]:
X = dv.transform([customer])
model.predict_proba(X)[0, 1]


np.float64(0.02597332043593159)
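
The two prediction steps can be wrapped in one helper that a script or service could call. This is a sketch under assumptions: `predict_single` and the `0.5` decision threshold are illustrative choices that do not appear in the notebook, and the stub classes are hypothetical placeholders that only let the example run without the pickled model.

```python
def predict_single(customer, dv, model, threshold=0.5):
    """Score one customer dict; the 0.5 threshold is an assumption."""
    X = dv.transform([customer])
    # [0][1] works for NumPy arrays and for nested lists alike.
    probability = float(model.predict_proba(X)[0][1])
    return {"churn_probability": probability, "churn": probability >= threshold}


# Hypothetical stubs that mimic the two methods used above.
class StubVectorizer:
    def transform(self, dicts):
        return [[d["tenure"]] for d in dicts]


class StubModel:
    def predict_proba(self, X):
        # Fixed probabilities per row: [P(no churn), P(churn)].
        return [[0.974, 0.026] for _ in X]


result = predict_single({"tenure": 12}, StubVectorizer(), StubModel())
print(result)
```

With the real objects, the call would be `predict_single(customer, dv, model)` using the `dv` and `model` loaded from the pickle file.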

So now we can save a model, load it, and use it. But we want this code in separate Python files rather than in the notebook. We can create [train.py](train.py) and [predict.py](predict.py) and run them with Python. Next we will create a web service that uses these files.


## 5.3 [Web services: introduction to Flask](github.com/kemaldahha/machine-learning-course/blob/main/03-flask-intro.md)


Flask is a Python framework for creating web services. We want to encapsulate our model inside a web service called the churn service.

A web service is a way for two devices to communicate over a network. A client sends some information to the web service in a request, and the web service sends information back in a response. In our case, we will send information about a customer and the web service will return a churn prediction.
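
The exchange can be pictured as JSON text going in and JSON text coming out. The sketch below runs entirely in one process and involves no network. `handle_request` is a hypothetical handler, the field names in the response are assumptions, and the probability is a fixed placeholder rather than a model output.

```python
import json


def handle_request(request_body: str) -> str:
    """Hypothetical service-side handler: JSON text in, JSON text out."""
    customer = json.loads(request_body)
    response = {
        # Placeholder value, not a prediction from the trained model.
        "churn_probability": 0.026,
        # Echo the received field names to show the request was parsed.
        "received_fields": sorted(customer),
    }
    return json.dumps(response)


# Client side: serialise the customer and "send" it to the service.
request_body = json.dumps({"contract": "two_year", "tenure": 12})
response_body = handle_request(request_body)
response = json.loads(response_body)
print(response)
```

In a real deployment the two `json.dumps`/`json.loads` pairs sit on opposite ends of an HTTP connection, and the web framework handles the transport between them.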

We can create a file called `ping.py`:

```python
from flask import Flask

app = Flask("ping")

@app.route("/ping", methods=["GET"])
def ping():
    return "PONG"

if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=9696)
```

If we run this file, we can open `127.0.0.1:9696/ping` or `localhost:9696/ping` in the browser, or request it from the command line with `curl`, and it will return `PONG`. This is a minimal example of a web service built with Flask.
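
The endpoint can also be checked without starting a server, using Flask's built-in test client. This is a minimal sketch that redefines the same `ping` app so the snippet is self-contained; it assumes Flask is installed.

```python
from flask import Flask

app = Flask("ping")


@app.route("/ping", methods=["GET"])
def ping():
    return "PONG"


# The test client sends the request in-process, so no server or port is needed.
with app.test_client() as client:
    response = client.get("/ping")
    status = response.status_code
    body = response.get_data(as_text=True)

print(status, body)
```

This is handy for quick checks while developing, before trying the service from a browser or with `curl`.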

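The same endpoint can also be written with FastAPI. Here the return value is JSON-encoded by default, so the response body is `"PONG"` including the quotes:
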
```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/ping")
async def ping():
    return "PONG"

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=9696)
```


## 5.4 [Serving the churn model with Flask](github.com/kemaldahha/machine-learning-course/blob/main/04-flask-deployment.md)



## 5.5 [Python virtual environment: Pipenv](github.com/kemaldahha/machine-learning-course/blob/main/05-pipenv.md)



## 5.6 [Environment management: Docker](github.com/kemaldahha/machine-learning-course/blob/main/06-docker.md)



## 5.7 [Deployment to the cloud: AWS Elastic Beanstalk (optional)](github.com/kemaldahha/machine-learning-course/blob/main/07-aws-eb.md)



## 5.8 [Summary](github.com/kemaldahha/machine-learning-course/blob/main/08-summary.md)


No notes.


## 5.9 [Explore more](github.com/kemaldahha/machine-learning-course/blob/main/09-explore-more.md)



## 5.10 [Homework](github.com/kemaldahha/machine-learning-course/blob/main/homework.md)

Go to [week_5_notes.ipynb](week_5_notes.ipynb)