## MLFLOW MODEL DEPLOYMENT 

**The Goal**:
    Deploy an XGBoost MLflow model in a Docker container

## Data preprocessing

In [1]:
# Suppressing warnings

import warnings
warnings.filterwarnings('ignore')

# importing the libraries
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBRegressor
import mlflow.xgboost
import requests
import mlflow

In [2]:
# importing data (raw string so the backslashes in the Windows path are not read as escapes)
data = pd.read_csv(r'C:\MINE\DATA SCIENCE\my datasets\insurance.csv')
data.head()

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,female,27.9,0,yes,southwest,16884.924
1,18,male,33.77,1,no,southeast,1725.5523
2,28,male,33.0,3,no,southeast,4449.462
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.88,0,no,northwest,3866.8552


In [3]:
data.shape

(1338, 7)

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1338 entries, 0 to 1337
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1338 non-null   int64  
 1   sex       1338 non-null   object 
 2   bmi       1338 non-null   float64
 3   children  1338 non-null   int64  
 4   smoker    1338 non-null   object 
 5   region    1338 non-null   object 
 6   charges   1338 non-null   float64
dtypes: float64(2), int64(2), object(3)
memory usage: 73.3+ KB


In [5]:
# list the categorical variables
cat_var = data.select_dtypes(include=['object']).columns

In [6]:
# encode the categorical columns (sex, region, smoker) with a label encoder

labelencoder_X = LabelEncoder()
data['sex'] = labelencoder_X.fit_transform(data['sex'])
data['region'] = labelencoder_X.fit_transform(data['region'])
data['smoker'] = labelencoder_X.fit_transform(data['smoker'])
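
For reference, `LabelEncoder` assigns integer codes by sorting the distinct labels, so the mapping can be reproduced by hand. The sketch below is a plain-Python illustration of that rule (the helper `label_codes` is hypothetical and not part of scikit-learn). It helps when decoding the integers that appear in the JSON payloads later on. Because the cell above reuses one encoder for all three columns, only the last fit (`smoker`) remains in `labelencoder_X.classes_`.

```python
def label_codes(values):
    """Map each distinct label to its rank in sorted order (the rule LabelEncoder follows)."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}


# Label values taken from the data.head() output above
sex_codes = label_codes(["female", "male", "male", "male", "male"])
smoker_codes = label_codes(["yes", "no", "no", "no", "no"])

print(sex_codes)     # {'female': 0, 'male': 1}
print(smoker_codes)  # {'no': 0, 'yes': 1}
```

Keeping one encoder per column would let you call `inverse_transform` on each of them later.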

In [8]:
## Squared correlation of each feature with the target variable (charges)
(data.corr()**2)["charges"].sort_values(ascending = False)[1:]

smoker      0.619765
age         0.089406
bmi         0.039339
children    0.004624
sex         0.003282
region      0.000039
Name: charges, dtype: float64

Smoker, age and bmi are the three variables most strongly correlated with our target variable.
My observation matches the correlation. Correlation only measures linear association, so this shortcut doesn't work all the time. BE CAREFUL!

## ML FLOW

In [9]:
# Defining an MLflow experiment
try:
    mlflow.create_experiment("insurance charges 1")
    experiment = mlflow.get_experiment_by_name("insurance charges 1")
except mlflow.exceptions.MlflowException:
    # the experiment already exists, so just select it
    experiment = mlflow.set_experiment("insurance charges 1")

In [10]:
print(experiment)

<Experiment: artifact_location='file:///C:/Users/Alimat%20sadia/Mlops-projects/P03/mlruns/1', experiment_id='1', lifecycle_stage='active', name='insurance charges 1', tags={}>


In [11]:
exp_id = experiment.experiment_id

In [12]:
# splitting data into target and independent variables
x = data.drop('charges',axis=1)
y = data['charges']

In [13]:
# splitting the dataset into training and test set

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

In [14]:
# define the parameter values
paramters = {"n_estimators": [50, 100, 200, 500],
             "max_depth": [5, 7, 10]}

##  XGBoost Model

In [15]:
# Logging the experiment manually

with mlflow.start_run(experiment_id=exp_id, run_name=" Parent XGBoostRegressor", nested=True):
    for estimator_param in paramters['n_estimators']:
        for mdepth_param in paramters['max_depth']:
            with mlflow.start_run(experiment_id=exp_id, run_name=" Child XGBoostRegressor", nested=True):
                xgb_model = XGBRegressor(max_depth=mdepth_param, n_estimators=estimator_param)
                xgb_model.fit(X_train, y_train)

                y_pred = xgb_model.predict(X_test)

                # log the parameters
                mlflow.log_param("n_estimator", estimator_param)
                mlflow.log_param("max depth", mdepth_param)

                # log the R2 score
                mlflow.log_metric("R2", r2_score(y_test, y_pred))

                # Logging training data
                mlflow.log_artifact(local_path=r'C:\MINE\DATA SCIENCE\my datasets\insurance.csv')
                # Logging training code
                mlflow.log_artifact(local_path=r'C:\Users\Alimat sadia\Mlops-projects\P03\insurance-charges.py')

                # saving model
                mlflow.xgboost.log_model(xgb_model, 'XGBModel')

The code above creates a folder named **mlruns** that contains the artifacts, parameters and metrics of each run. You can compare the runs by running *mlflow ui* in your terminal.<br>
Now copy the **XGBModel** folder of the best-performing run into the notebook's working directory.
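
As a sanity check on what to expect in the MLflow UI: the nested loops in the training cell launch one child run per (`n_estimators`, `max_depth`) pair. The short sketch below enumerates the same pairs with `itertools.product`. It only illustrates the size of the grid and is not part of the original training code.

```python
from itertools import product

# Same grid as in the notebook (variable name kept as in the notebook)
paramters = {"n_estimators": [50, 100, 200, 500],
             "max_depth": [5, 7, 10]}

# Every (n_estimators, max_depth) pair; each pair corresponds to one child run
grid = list(product(paramters["n_estimators"], paramters["max_depth"]))

print(len(grid))  # 12
print(grid[:3])   # [(50, 5), (50, 7), (50, 10)]
```

So 12 child runs should appear under the parent run, and the one with the highest R2 is the folder to copy.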

## Serving With REST APIs & Docker

In [16]:
# load your best model
model = mlflow.xgboost.load_model("XGBModel")

Run the following in your conda command line: **mlflow models serve -m XGBModel/**<br>
Once it starts successfully, the web server is available at http://127.0.0.1:5000

In [17]:
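## Testing our model on the first two rows of the test set
# Note: xgb_model is the last model fitted in the grid loop above, not the loaded best model (`model`)
test_df = X_test.head(2)
xgb_model.predict(test_df)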

array([8752.548, 8196.461], dtype=float32)

In [18]:
test_json = test_df.to_json(orient='split')
test_json

'{"columns":["age","sex","bmi","children","smoker","region"],"index":[578,610],"data":[[52,1,30.2,1,0,3],[47,0,29.37,1,0,2]]}'
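
The request in this notebook posts the raw `split`-orientation JSON, which matches the scoring server of the MLflow version used here. To my knowledge, MLflow 2.x servers instead expect the frame under a top-level key such as `dataframe_split`. The sketch below shows that wrapping with the standard library only. The helper name `wrap_for_mlflow2` is hypothetical, and you should check the payload format against the MLflow version you have installed.

```python
import json

# Payload in pandas "split" orientation, copied from the cell output above
split_payload = {
    "columns": ["age", "sex", "bmi", "children", "smoker", "region"],
    "index": [578, 610],
    "data": [[52, 1, 30.2, 1, 0, 3], [47, 0, 29.37, 1, 0, 2]],
}


def wrap_for_mlflow2(payload):
    """Wrap a split-orientation dict under 'dataframe_split' (assumed MLflow 2.x format)."""
    return json.dumps({"dataframe_split": payload})


body = wrap_for_mlflow2(split_payload)
print(body[:40])
```

With the notebook's own variables, the equivalent would be `json.dumps({"dataframe_split": json.loads(test_json)})`.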

In [19]:
# testing the served model with a POST request

result = requests.post(url="http://127.0.0.1:5000/invocations",
                       data=test_json,
                       headers={'Content-Type':'application/json'})

In [21]:
result.json()

[13028.3466796875, 10437.822265625]

Great !

## Deploy the api as a docker container

To package the MLflow model as a Docker image, run this command:<br>
     **mlflow models build-docker -m "runs:/ab61c11a19d54f93894ddd5ed0a8e431/XGBModel/" -n "insurance-app"**<br>
where ab61c11a19d54f93894ddd5ed0a8e431 is the run ID of your preferred model.<br>
This creates an image named "insurance-app", visible in your Docker dashboard.<br>
Start a container from the image with: **docker run -ip 8000:8080 <image_id>**<br>
The model is now available at http://127.0.0.1:8000
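
The container can take a few seconds to start, so a request sent right after `docker run` may be refused. The sketch below polls the server before posting. It assumes the MLflow scoring server exposes a `/ping` health endpoint (true as far as I know, but worth verifying for your version). The helper `wait_until_ready` is hypothetical and uses only the standard library.

```python
import time
import urllib.request


def wait_until_ready(base_url, attempts=10, delay=1.0):
    """Poll <base_url>/ping until it answers HTTP 200 or the attempts run out."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/ping", timeout=2) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout or HTTP error: the server is not ready yet
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, call `wait_until_ready("http://127.0.0.1:8000")` and send the POST request only if it returns `True`.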

In [22]:
# testing our deployed model

result = requests.post(url="http://127.0.0.1:8000/invocations",
                       data=test_json,
                       headers={'Content-Type':'application/json'})

result.json()

[13028.3466796875, 10437.822265625]

Congratulations!! Now you know how to deploy an MLflow model in Docker.