# Step 2: Model Building & Evaluation
Using the training and test data sets constructed in the `Code/1_data_ingestion_and_preparation.ipynb` Jupyter notebook, this notebook builds an XGBoost classifier for a scenario similar to the one described in the [Predictive Maintenance Template](https://gallery.cortanaintelligence.com/Collection/Predictive-Maintenance-Template-3) to predict failure in aircraft engines. We then store the model for deployment as an Azure web service, which we build in the bottom section of this notebook.


Dataset derived from:
https://data.nasa.gov/Aerospace/CMAPSS-Jet-Engine-Simulated-Data/ff5v-kuh6/data


In [None]:

# Ensure you have the dependencies for this notebook
#%pip install -r xgboost_classification_mlflow.txt

In [None]:
import time
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.metrics import confusion_matrix, recall_score, precision_score, accuracy_score
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import ConfusionMatrixDisplay
import xgboost as xgb
import itertools
import random
import string
import json
import sklearn

from azureml.core import Model
from azureml.core.resource_configuration import ResourceConfiguration

import pickle

import mlflow
import mlflow.xgboost
from mlflow.deployments import get_deploy_client

import math 
import warnings


In [None]:

warnings.simplefilter("ignore")

In [None]:

# These file names detail the data files. 
TRAIN_DATA = 'PM_train_files.pkl'
TEST_DATA = 'PM_test_files.pkl'

# We'll serialize the model in json format
LSTM_MODEL = 'modellstm.json'

# and store the weights in h5
MODEL_WEIGHTS = 'modellstm.h5'

In [None]:
train_df = pd.read_pickle(TRAIN_DATA)
train_df.head(10)

In [None]:
test_df = pd.read_pickle(TEST_DATA)

test_df.head(10)

In [None]:
y_train = train_df[["label1"]]
X_train = train_df.drop(["RUL","label1","label2","id"],axis=1)
#y_train.head()
X_train.head()

In [None]:
y_test = test_df[["label1"]]
X_test = test_df.drop(["RUL","label1","label2","id"],axis=1)
#y_test.head()
X_test.head()

In [None]:
%%time
regressor=xgb.XGBClassifier(eval_metric='logloss')

#=========================================================================
# randomly sample the grid below to search for good hyperparameters
#=========================================================================

# set up our search grid
param_grid = {"max_depth":    [4, 6, 8, 10, 12, 14],
              "n_estimators": [400, 600, 800, 1000, 1200, 1400],
              "learning_rate": [0.0075, 0.015]}

# RandomizedSearchCV samples n_iter=10 combinations by default and scores each with 5-fold CV
search = RandomizedSearchCV(regressor, param_grid, cv=5).fit(X_train, y_train)

print("The best hyperparameters are ",search.best_params_)
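To see why a randomized search is used here, it helps to count the grid. The sketch below uses only the standard library to enumerate the same grid and draw ten combinations at random, mirroring the default `n_iter=10` of `RandomizedSearchCV`. The helper names and the fixed seed are illustrative assumptions and are not part of the notebook.

```python
import itertools
import random

# Same search grid as in the cell above
param_grid = {
    "max_depth": [4, 6, 8, 10, 12, 14],
    "n_estimators": [400, 600, 800, 1000, 1200, 1400],
    "learning_rate": [0.0075, 0.015],
}

def all_combinations(grid):
    """Enumerate every combination, as an exhaustive grid search would."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*grid.values())]

def sample_combinations(grid, n_iter=10, seed=0):
    """Draw n_iter distinct combinations (hypothetical helper, stdlib only)."""
    rng = random.Random(seed)
    return rng.sample(all_combinations(grid), n_iter)

full_grid = all_combinations(param_grid)
sampled = sample_combinations(param_grid)

# 6 * 6 * 2 = 72 candidates in total
print(len(full_grid), "combinations in the full grid")
print(len(sampled), "combinations sampled")
```

With `cv=5`, ten sampled candidates mean 50 model fits plus one final refit, compared with 360 fits for the full grid.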

In [None]:
experiment_name="XGBoost-PD-Classification"
mlflow.set_experiment(experiment_name=experiment_name)

In [None]:
mlflow.xgboost.autolog()
run = mlflow.start_run()

In [None]:
regressor=xgb.XGBClassifier(learning_rate = search.best_params_["learning_rate"],
                           n_estimators  = search.best_params_["n_estimators"],
                           max_depth     = search.best_params_["max_depth"],
                           eval_metric='logloss')

regressor.fit(X_train, y_train)

In [None]:
train_predictions = regressor.predict(X_train)

In [None]:
cm = confusion_matrix(y_train, train_predictions)
display(cm)
cm_display = ConfusionMatrixDisplay(cm,display_labels=regressor.classes_).plot(cmap="Blues", values_format='')

In [None]:
# compute precision, recall, accuracy and F1 on the training set
precision = precision_score(y_train, train_predictions)
recall = recall_score(y_train, train_predictions)
accuracy = accuracy_score(y_train, train_predictions)
f1 = 2 * (precision * recall) / (precision + recall)
print( 'Train Precision: ', precision, '\n', 'Train Recall: ', recall, '\n', 'Train F1 Score:', f1,'\n', 'Train Accuracy Score:', accuracy)


In [None]:
test_predictions = regressor.predict(X_test)

In [None]:
cm = confusion_matrix(y_test, test_predictions)
display(cm)
cm_display = ConfusionMatrixDisplay(cm,display_labels=regressor.classes_).plot(cmap="Blues", values_format='')

In [None]:
# compute precision, recall, accuracy and F1 on the test set
precision = precision_score(y_test, test_predictions)
recall = recall_score(y_test, test_predictions)
accuracy = accuracy_score(y_test, test_predictions)
f1 = 2 * (precision * recall) / (precision + recall)
print( 'Test Precision: ', precision, '\n', 'Test Recall: ', recall, '\n', 'Test F1 Score:', f1,'\n', 'Test Accuracy Score:', accuracy)
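The four scores above can all be derived from the confusion matrix. The following is a minimal sketch, assuming binary labels and the scikit-learn layout `[[tn, fp], [fn, tp]]`; the function name and the counts are made up for illustration and are not results from this notebook.

```python
def classification_metrics(tn, fp, fn, tp):
    """Derive precision, recall, F1 and accuracy from confusion-matrix counts."""
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical counts, chosen only to make the arithmetic easy to follow
metrics = classification_metrics(tn=88, fp=2, fn=2, tp=8)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a binary problem, `cm.ravel()` on the matrix computed above yields the counts in the order `tn, fp, fn, tp`, so they could be passed straight into a helper like this one.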


In [None]:
from xgboost import plot_importance
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
plt.rcParams.update({'font.size': 16})

fig, ax = plt.subplots(figsize=(12,6))
plot_importance(regressor, max_num_features=8, ax=ax)
plt.show();

In [None]:
mlflow.end_run()

In [None]:
run = mlflow.get_run(run.info.run_id)
pd.DataFrame(data=[run.data.params], index=["Value"]).T

In [None]:
client = mlflow.tracking.MlflowClient()
client.list_artifacts(run_id=run.info.run_id)

# Step 3: Register and Deploy

#### Creating models from an existing run
If you have an MLflow model logged inside a run and want to register it in a registry, you can do so using the experiment and run ID information from the run. Let's look up the most recent run of our experiment:

In [None]:
exp = mlflow.get_experiment_by_name(experiment_name)
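# sort explicitly so that the most recent run comes first
last_run = mlflow.search_runs([exp.experiment_id], order_by=["attributes.start_time DESC"], output_format="list")[0]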
print(last_run.info.run_id)

You can now register the model from the run's artifact URI:

In [None]:
model_name = "xgb_PD_Classifier"
artifact_path = "model"

mlflow.register_model(f"runs:/{last_run.info.run_id}/{artifact_path}", model_name)
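Two MLflow URI schemes appear in this notebook: `runs:/` points at an artifact logged inside a run, and `models:/` points at a version of a registered model. The sketch below only builds the strings so that the difference is visible; the helper names and the run ID are hypothetical.

```python
def run_model_uri(run_id, artifact_path="model"):
    """URI of a model artifact logged inside a run."""
    return f"runs:/{run_id}/{artifact_path}"

def registered_model_uri(model_name, version):
    """URI of a specific version of a registered model."""
    return f"models:/{model_name}/{version}"

example_run_id = "0123456789abcdef"  # hypothetical run ID, for illustration only

print(run_model_uri(example_run_id))
print(registered_model_uri("xgb_PD_Classifier", 1))
```

`mlflow.register_model` returns a `ModelVersion` object whose `version` attribute holds the newly assigned version number, so keeping that return value is one way to avoid hard-coding the version later.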

Online endpoints have the concepts of **Endpoint** and **Deployment**. An endpoint represents the API that customers use to consume the model, while a deployment is a specific implementation of that API. This distinction allows users to decouple the API from the implementation and to change the underlying implementation without affecting the consumer.
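The separation can be pictured with a toy model. This is only a conceptual sketch, not how Azure Machine Learning implements routing: the `ToyEndpoint` class, the deployment names, and the lambda "models" are all hypothetical.

```python
import random
from dataclasses import dataclass, field

@dataclass
class ToyEndpoint:
    """A stable API name in front of interchangeable deployments."""
    name: str
    deployments: dict = field(default_factory=dict)  # deployment name -> callable
    traffic: dict = field(default_factory=dict)      # deployment name -> percent

    def route(self, payload, rng=random):
        """Pick a deployment according to the traffic split and call it."""
        names = list(self.traffic)
        weights = [self.traffic[n] for n in names]
        chosen = rng.choices(names, weights=weights, k=1)[0]
        return chosen, self.deployments[chosen](payload)

toy_endpoint = ToyEndpoint("pd-xgb-classifier")
toy_endpoint.deployments["default"] = lambda payload: "prediction from model v1"
toy_endpoint.deployments["candidate"] = lambda payload: "prediction from model v2"

# All traffic goes to "default"; the consumer only ever sees the endpoint
toy_endpoint.traffic = {"default": 100, "candidate": 0}
chosen, result = toy_endpoint.route({"s1": 1.0})
print(chosen, "->", result)
```

Changing the `traffic` dictionary swaps the implementation without the caller of `route` changing anything, which is the decoupling described above.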

In [None]:
# Creating a unique endpoint name by including a random suffix
allowed_chars = string.ascii_lowercase + string.digits
endpoint_suffix = "".join(random.choice(allowed_chars) for x in range(5))
endpoint_name = "PD-XGB-Classifier-" + endpoint_suffix

print(f"Endpoint name: {endpoint_name}")

First, let's create an MLflow deployment client for Azure Machine Learning:

In [None]:
deployment_client = get_deploy_client(mlflow.get_tracking_uri())

Let's create the endpoint with basic configuration:

In [None]:
endpoint = deployment_client.create_endpoint(endpoint_name)

We can get the scoring URI from the endpoint:

In [None]:
scoring_uri = deployment_client.get_endpoint(endpoint=endpoint_name)["properties"][
    "scoringUri"
]
print(scoring_uri)

To configure the hardware requirements of your deployment, you need to create a JSON file with the desired configuration:

In [None]:
deployment_name = "default"

In [None]:
deploy_config = {
    "instance_type": "Standard_DS3_v2",
    "instance_count": 1,
}

Write the configuration to a file:

In [None]:

deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deploy_config))

The method **create_deployment** creates a deployment using the settings in the configuration file. We name this deployment "default". This step may take 10-20 minutes; you can also monitor it in the Azure ML portal under Endpoints.


In [None]:
version = 1

deployment = deployment_client.create_deployment(
    name=deployment_name,
    endpoint=endpoint_name,
    model_uri=f"models:/{model_name}/{version}",
    config={"deploy-config-file": deployment_config_path},
)

By default, new deployments receive none of the traffic from the endpoint. Let's assign all of it to the deployment:

In [None]:
traffic_config = {"traffic": {deployment_name: 100}}


Let's write the configuration to a file:

In [None]:
traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))
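Both configuration files are written the same way, so the pattern could be factored into one helper. The sketch below assumes a hypothetical `write_json_config` function and writes into a temporary directory, then reads the files back to confirm that they contain valid JSON.

```python
import json
import tempfile
from pathlib import Path

def write_json_config(config, path):
    """Serialize a config dict to a JSON file and return its path."""
    path = Path(path)
    with path.open("w") as outfile:
        json.dump(config, outfile, indent=2)
    return path

with tempfile.TemporaryDirectory() as tmp_dir:
    deploy_path = write_json_config(
        {"instance_type": "Standard_DS3_v2", "instance_count": 1},
        Path(tmp_dir) / "deployment_config.json",
    )
    traffic_path = write_json_config(
        {"traffic": {"default": 100}},
        Path(tmp_dir) / "traffic_config.json",
    )
    # Read the files back before the temporary directory is removed
    reloaded_deploy = json.loads(deploy_path.read_text())
    reloaded_traffic = json.loads(traffic_path.read_text())

print(reloaded_deploy)
print(reloaded_traffic)
```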
    

We are going to use the key `endpoint-config-file` to update the configuration:

In [None]:
deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)
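Once traffic is routed, the endpoint can be called at the scoring URI. The sketch below only builds a request body and does not send it. It assumes the `input_data` layout with `columns`, `index` and `data` that Azure Machine Learning documents for MLflow models deployed without a scoring script; the helper name, the column names and the values are hypothetical.

```python
import json

def build_scoring_payload(columns, rows):
    """Build a request body in the assumed `input_data` layout."""
    if any(len(row) != len(columns) for row in rows):
        raise ValueError("every row must have one value per column")
    return {
        "input_data": {
            "columns": list(columns),
            "index": list(range(len(rows))),
            "data": [list(row) for row in rows],
        }
    }

# Hypothetical feature names and values; real requests need the same
# columns that X_train had when the model was fitted
payload = build_scoring_payload(
    columns=["cycle", "s1", "s2"],
    rows=[[1, 0.5, 0.25], [2, 0.75, 0.5]],
)
body = json.dumps(payload)
print(body)
```

Sending the body additionally requires the scoring URI printed earlier and an authentication key for the endpoint, neither of which is shown here.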

In [None]:
# Uncomment to delete the endpoint and its deployments when you are done
#deployment_client.delete_endpoint(endpoint_name)