# Diabetes Predictive Model: Development and Deployment
### Mohammed Mahyoub 
This Jupyter Notebook is a companion lab for a guest lecture on machine learning deployment at the University of Alabama at Birmingham, Alabama, USA.

### Install required packages for the project environment 

In [None]:
#installing packages 
%pip install pandas scipy numpy scikit-learn joblib flasgger flask

In [None]:
import os 
parent_folder = os.getcwd()

### Get data 

We will use a diabetes dataset from the Microsoft Machine Learning data repository.

1. Import data.

In [None]:
import pandas as pd
dataset = pd.read_csv('https://aka.ms/diabetes-data')
dataset.drop('PatientID', axis = 1, inplace = True)
dataset.head()

2. Explore data

In [None]:
dataset.info()

In [None]:
dataset['Diabetic'].value_counts()

3. Sample a subset of the data for deployment testing


In [None]:
deployment_dataset = dataset.sample(n = 100)
deployment_dataset.drop('Diabetic', axis = 'columns', inplace = True)
deployment_dataset.to_csv('deployment_dataset.csv', index = False)

4. Save the development dataset to the current directory

In [None]:
dataset.to_csv('diabetes.csv', index = False)

### Model Development

1. Development folder

In [None]:
import os 
import shutil

dev_folder = 'Development'
os.makedirs(dev_folder, exist_ok = True)

shutil.copy( os.path.join(parent_folder, 'diabetes.csv'), os.path.join(parent_folder, dev_folder, 'diabetes.csv'))



2. Training script

In [None]:
%%writefile $parent_folder/$dev_folder/train_diabetes.py

# Packages 
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, accuracy_score, f1_score, precision_score, recall_score
import joblib
import json


# Dataset
dataset = pd.read_csv('./diabetes.csv')
X = dataset.drop('Diabetic', axis = 'columns')
y = dataset['Diabetic']

# Preprocessing 
numeric_features = list(range(X.shape[1]))
numeric_transformer = Pipeline(steps=[('scaler', MinMaxScaler())])

preprocessor = ColumnTransformer(transformers=[('num', numeric_transformer, numeric_features)])

# Training 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('randomforest', RandomForestClassifier())])
model = pipeline.fit(X_train, y_train)
y_hat = model.predict(X_test)
model_performance = {
    'Description': 'Diabetes prediction model. Version 1.',
    'Author': 'Mohammed Mahyoub',
    'Metrics': {
        'Accuracy': round(accuracy_score(y_test, y_hat), 2),
        'AUC': round(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]), 2),
        'f1-score': round(f1_score(y_test, y_hat), 2),
        'Precision': round(precision_score(y_test, y_hat), 2),
        'Recall': round(recall_score(y_test, y_hat), 2),
    },
}

# Save performance report
with open('./model_performance.json', 'w') as perf_file:
    json.dump(model_performance, perf_file, indent = 4)

# Save trained pipeline
joblib.dump(model, './diabetes-predict.pkl')

3. Run training script

In [None]:

os.chdir(os.path.join(parent_folder, dev_folder))
%run train_diabetes.py 

In [None]:
os.chdir(parent_folder)
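
The training script writes its metrics to `model_performance.json` in the Development folder. A minimal sketch for reading them back from the notebook is shown below. The helper names `load_metrics` and `format_metrics` are hypothetical, and the path assumes the folder layout created in the steps above.

```python
import json
from pathlib import Path


def load_metrics(perf_path):
    """Return the 'Metrics' dictionary stored in a model_performance.json file."""
    with open(perf_path, encoding="utf-8") as perf_file:
        report = json.load(perf_file)
    return report["Metrics"]


def format_metrics(metrics):
    """Render metrics as aligned 'name : value' lines, one per metric."""
    width = max(len(name) for name in metrics)
    return "\n".join(f"{name:<{width}} : {value:.2f}" for name, value in metrics.items())


# Path assumed from the steps above; adjust it if your folders differ.
perf_path = Path("Development") / "model_performance.json"
if perf_path.exists():
    print(format_metrics(load_metrics(perf_path)))
```

Because `train_test_split` is called without a fixed seed, the values will likely differ slightly from run to run.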

### Model Deployment - Flask and Swagger

1. Deployment folder

In [None]:
import os 
import shutil

dep_folder = 'Deployment'
os.makedirs(dep_folder, exist_ok = True)

model_artifact_loc = os.path.join(parent_folder, dev_folder, 'diabetes-predict.pkl')
test_dataset_loc = os.path.join(parent_folder, 'deployment_dataset.csv')

shutil.copy(model_artifact_loc, os.path.join(parent_folder, dep_folder, 'diabetes-predict.pkl'))
shutil.copy(test_dataset_loc, os.path.join(parent_folder, dep_folder, 'test_dataset.csv'))

2. Deployment code: API and scoring

In [None]:
%%writefile $parent_folder/$dep_folder/deploy_diabetes.py

# Packages 
from flask import Flask, request
import numpy as np
import pandas as pd
import joblib
import flasgger 
from flasgger import Swagger


# Create app and wrap it with Swagger framework
app = Flask(__name__)
Swagger(app)

# Load model
model = joblib.load('./diabetes-predict.pkl')

# On-demand prediction 
@app.route('/predict', methods = ["GET"])
def ondemand_predict():
    """
    Endpoint for ondemand diabetes prediction. Single entry.
    ---
    parameters:
        - name: Pregnancies
          in: query
          type: number
          required: true
        - name: Plasma Glucose
          in: query
          type: number
          required: true  
        - name: Diastolic Blood Pressure
          in: query
          type: number
          required: true
        - name: Triceps Thickness
          in: query
          type: number
          required: true
        - name: Serum Insulin
          in: query
          type: number
          required: true
        - name: BMI
          in: query
          type: number
          required: true
        - name: Diabetes Pedigree 
          in: query
          type: number
          required: true
        - name: Age
          in: query
          type: number
          required: true
    responses:
        200:
            description: "Prediction"

    """
    Pregnancies = float(request.args.get("Pregnancies"))
    PlasmaGlucose = float(request.args.get("Plasma Glucose"))
    DiastolicBloodPressure = float(request.args.get("Diastolic Blood Pressure"))
    TricepsThickness = float(request.args.get("Triceps Thickness"))
    SerumInsulin = float(request.args.get("Serum Insulin"))
    BMI = float(request.args.get("BMI"))
    DiabetesPedigree = float(request.args.get("Diabetes Pedigree"))
    Age = float(request.args.get("Age"))

    input_features = np.array([[
        Pregnancies,
        PlasmaGlucose,
        DiastolicBloodPressure,
        TricepsThickness,
        SerumInsulin,
        BMI,
        DiabetesPedigree,
        Age]])

    prediction = model.predict(input_features)

    labels = {0: 'Non-Diabetic', 1: 'Diabetic'}

    return labels[prediction[0]]

@app.route('/predict_batch', methods = ["POST"])
def batch_predict():
  """
  Endpoint for batch prediction. Batch of patients.
  ---
  parameters:
    - name: file
      in: formData
      type: file
      required: true
  responses:
      200:
        description: "Batch Prediction"
  """

  batch_df = pd.read_csv(request.files.get("file"))
  predictions = model.predict(batch_df)
  labels = {0: 'Non-Diabetic', 1: 'Diabetic'}
  predictions = [labels[p] for p in predictions]
  
  return str(predictions)



if __name__ == '__main__':
    app.run(debug = True, host = '0.0.0.0', port = 80)
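
The on-demand endpoint converts each query parameter with `float(request.args.get(...))`. That call raises a `TypeError` when a parameter is missing and a `ValueError` when it is not numeric. The sketch below shows one possible way to validate the inputs first. `parse_features` and `FEATURE_NAMES` are hypothetical names, not part of the script above, and the helper works on any mapping with a `.get` method, such as `request.args`.

```python
# Parameter names in the order the model expects, as used in deploy_diabetes.py.
FEATURE_NAMES = [
    "Pregnancies",
    "Plasma Glucose",
    "Diastolic Blood Pressure",
    "Triceps Thickness",
    "Serum Insulin",
    "BMI",
    "Diabetes Pedigree",
    "Age",
]


def parse_features(args):
    """Convert query parameters to floats in model order.

    Returns (values, errors); exactly one of the two lists is empty.
    """
    values, errors = [], []
    for name in FEATURE_NAMES:
        raw = args.get(name)
        if raw is None:
            errors.append(f"missing parameter: {name}")
            continue
        try:
            values.append(float(raw))
        except ValueError:
            errors.append(f"not a number: {name}={raw!r}")
    if errors:
        return [], errors
    return values, []
```

Inside `ondemand_predict`, one option would be to call `parse_features(request.args)` and return the error list with a 400 status instead of letting the exception propagate.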


3. Run deployment script
> Note: Open the URL that Flask prints when the server starts and append /apidocs to reach the Swagger UI.

In [None]:
os.chdir(os.path.join(parent_folder, dep_folder))
%run deploy_diabetes.py

In [None]:
os.chdir(parent_folder)
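
Besides the Swagger page, the `/predict` endpoint can be called directly with a query string. The sketch below only builds the URL, using the parameter names from the endpoint's docstring. The helper name `build_predict_url`, the host `localhost`, and the patient values are illustrative assumptions.

```python
from urllib.parse import urlencode


def build_predict_url(base_url, features):
    """Return the full /predict URL with the features encoded as a query string."""
    return f"{base_url.rstrip('/')}/predict?{urlencode(features)}"


# Illustrative values only, not taken from the dataset.
example_patient = {
    "Pregnancies": 2,
    "Plasma Glucose": 120,
    "Diastolic Blood Pressure": 70,
    "Triceps Thickness": 30,
    "Serum Insulin": 100,
    "BMI": 28.5,
    "Diabetes Pedigree": 0.5,
    "Age": 35,
}

url = build_predict_url("http://localhost:80", example_patient)
print(url)
```

While the server is running, `urllib.request.urlopen(url).read().decode()` should return the predicted label. Since `app.run` blocks the cell that started it, send the request from a separate terminal or notebook.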

### Model Deployment - Docker Container 

1. Requirements (packages)

In [None]:
%pip freeze > $parent_folder/$dep_folder/requirements.txt
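
`pip freeze` records every package installed in the notebook environment, which may include many that the API does not need. As an alternative, the sketch below pins only the packages the deployment script relies on. The package list is an assumption based on the imports in `deploy_diabetes.py` and the install cell at the top, and `pinned_requirements` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages imported by deploy_diabetes.py, plus scikit-learn, which joblib
# needs in order to unpickle the trained pipeline.
DEPLOY_PACKAGES = ["flask", "flasgger", "numpy", "pandas", "scikit-learn", "joblib"]


def pinned_requirements(packages):
    """Return 'name==version' lines; packages that are not installed stay unpinned."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(name)
    return lines


requirements_text = "\n".join(pinned_requirements(DEPLOY_PACKAGES)) + "\n"
print(requirements_text)
```

To use it, write `requirements_text` to `requirements.txt` in the Deployment folder instead of running the `%pip freeze` cell.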

2. Dockerfile

In [None]:
%%writefile $parent_folder/$dep_folder/Dockerfile 

FROM python:3.9
WORKDIR /usr/webapp/diabetesapp
COPY . .
RUN pip install -r requirements.txt
EXPOSE 80
CMD ["python", "deploy_diabetes.py"]



3. Build image

> Note: On Windows, run this command from a terminal or Command Prompt instead of a %%bash cell.

In [None]:
%%bash
docker build -t diabetes_webapp ./Deployment

4. Run container

> Note: Append /apidocs to the URL printed when the container starts.

In [None]:
%%bash
docker container run -p 80:80 diabetes_webapp

In [None]:
%%bash
docker container ps

In [None]:
%%bash
docker stop distracted_tesla   # Replace with your container's name; it is different on each run. Take it from the NAMES column in the output above.

### Model Deployment in the Cloud

We will use Microsoft Azure to build our image and register it in Azure Container Registry. Then we will create a web app using Azure App Service. A similar approach works with other cloud providers (GCP, AWS, etc.). In practice, we would usually use Azure Machine Learning to build and deploy ML models as managed real-time or batch endpoints.

This part assumes that you have a functioning Azure subscription. 

1. Log in to Azure with the Azure CLI

In [None]:
%%bash
az login 

2. Create a resource group for the project

In [None]:
%%bash 
az group create -l eastus -n rg-ml

3. Create a container registry

In [None]:
%%bash
az acr create -n crmldeployment -g rg-ml --admin-enabled true --sku Standard 

4. Build the Docker image from the artifacts saved in the Deployment folder

In [None]:
%%bash
az acr build -t diabeteswebappcr:{{.Run.ID}} -r crmldeployment ./Deployment 

5. List the repositories and tags in the container registry

In [None]:
%%bash 
az acr repository list -n crmldeployment

In [None]:
%%bash
az acr repository show-tags -n crmldeployment --repository diabeteswebappcr

6. Create an App Service plan. We use the free tier (F1) for this tutorial

In [None]:
%%bash
az appservice plan create -g rg-ml -n depplan --is-linux --sku F1

7. Create the webapp to be hosted on Azure. Replace the `ca1` tag with the tag listed for your build in step 5

In [None]:
%%bash 
az webapp create -g rg-ml -p depplan -n diabeteswebappmm -i crmldeployment.azurecr.io/diabeteswebappcr:ca1  
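
The image tag comes from `{{.Run.ID}}`, so it changes with every build. Rather than typing it by hand, the image reference could be assembled from the JSON list that `az acr repository show-tags` prints. The sketch below assumes that the last tag in the list is the one to deploy, and `image_reference` is a hypothetical helper.

```python
import json


def image_reference(registry, repository, tags_json):
    """Build '<registry>.azurecr.io/<repository>:<tag>' from show-tags JSON output.

    Assumes the last tag in the list is the one to deploy.
    """
    tags = json.loads(tags_json)
    if not tags:
        raise ValueError(f"no tags found for repository {repository!r}")
    return f"{registry}.azurecr.io/{repository}:{tags[-1]}"


# 'ca1' mirrors the tag used in step 7; your run ID will differ.
print(image_reference("crmldeployment", "diabeteswebappcr", '["ca1"]'))
```

The tag order returned by the CLI should be checked before relying on the last entry; the `--orderby` option of `show-tags` may help make it explicit.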

8. Access the webapp
> Note: The root URL initially returns an access error. Append "/apidocs" to the URL to open the Swagger UI, as we did when running locally.

In [None]:
%%bash
az webapp browse --name diabeteswebappmm -g rg-ml

9. Delete the resource group to clean up resources. I do this because I created the webapp only for illustration and don't want to incur costs beyond this lecture. In a real scenario, this step destroys everything in the resource group, so be careful!

In [None]:
%%bash
az group delete -g rg-ml --no-wait --yes --force-deletion-types Microsoft.Compute/virtualMachines

<h1 align='center'> Thank you! </h1>