# Model Packaging

- Model evaluation and interpretability metrics
- Production testing methods
- Package ML models
- Inference ready models

**Model evaluation and interpretability metrics**: These let us understand and validate ML models and estimate the business value they will produce.

Commonly used metrics:

- Cross-validation (stratified cross-validation, leave-one-out cross-validation, and K-fold cross-validation)
//K-fold is not a good choice if you have a very large training dataset or if the model needs a lot of time, CPU, and/or GPU processing to train, because the model is retrained once per fold.//

- Precision

- Recall

- F score

- Confusion matrix

- AUC-ROC (uses the true positive rate, TPR, and the false positive rate, FPR)
//An ROC curve depicts the TPR versus FPR for different thresholds for classification. Lowering the threshold for classification enables more items to be classified as positive, which in turn increases both false positives and true positives. The Area Under the Curve (AUC) is a metric used to quantify the effectiveness or ability of a classifier to distinguish between classes and is used to summarize the ROC curve.//

//If the AUC value is 1, the classifier correctly distinguishes between all the positive and negative class points. If the AUC value is 0, the classifier ranks every negative point above every positive point, so its predictions are perfectly inverted. An AUC value of 0.5 corresponds to a random classifier that cannot separate the classes. Because the AUC summarizes the ROC curve over all thresholds, no threshold has to be set manually to compute it.//
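
To make the definitions above concrete, here is a minimal pure-Python sketch of the confusion-matrix counts, precision, recall, F score, and AUC. The labels and scores are made-up illustration values, and the helper names (`confusion_counts`, `precision_recall_f1`, `roc_auc`) are hypothetical; in practice a library such as scikit-learn would normally be used instead.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 computed from the confusion-matrix counts."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and classifier scores, for illustration only
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

# Thresholding at 0.5 turns scores into hard predictions
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)

# Reversing the ranking pushes the AUC towards 0 instead of 1
auc_reversed = roc_auc(y_true, [1 - s for s in scores])

print(confusion_counts(y_true, y_pred), precision, recall, f1, auc, auc_reversed)
```

Precision, recall, and the F score depend on the chosen threshold (0.5 here), whereas the AUC is computed from the ranking of the scores alone.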



In [None]:
import pandas as pd
import numpy as np
import warnings
import pickle
from math import sqrt
warnings.filterwarnings('ignore')
from azureml.core.run import Run
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.model import Model
from azureml.core.authentication import ServicePrincipalAuthentication

In [None]:
# Connect to Workspace
ws = Workspace.from_config()
print(ws)

## Download scaler and model from workspace

In [None]:
# download() fetches the registered model files and returns their local path
scaler = Model(ws,'scaler').download(exist_ok=True)

In [None]:
svc_model = Model(ws,'support-vector-classifier').download(exist_ok=True)

## Load files

In [None]:
with open('scaler.pkl', 'rb') as file:
    scaler = pickle.load(file)

In [None]:
# Load the ONNX model into an ONNX Runtime inference session
import onnxruntime as rt
import numpy
sess = rt.InferenceSession("svc.onnx")

In [None]:
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name

## Inference on test data


In [None]:
test_data = np.array([34.927778, 0.24, 7.3899, 83, 16.1000, 1])
# these are values for Temperature_C, Humidity, Wind_speed_kmph, Wind_bearing_degrees, 
# Visibility_km, Pressure_millibars, and Current_weather_condition.

In [None]:
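# Scale data with the statistics the scaler learned during training.
# Use transform(), not fit_transform(): refitting on a single row would
# discard the training statistics.
test_data = scaler.transform(test_data.reshape(1, 6))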
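
When scaling a single row at inference time, it matters which statistics are used. The sketch below uses plain NumPy as a stand-in for a StandardScaler-style scaler (an assumption, since the type of the saved scaler is not shown here) and made-up training rows. It compares reusing the training mean and standard deviation with refitting on the single row.

```python
import numpy as np

# Hypothetical training rows (made-up values) standing in for the data
# the saved scaler was originally fitted on.
train = np.array([[10.0, 0.50], [20.0, 0.70], [30.0, 0.90]])
train_mean = train.mean(axis=0)
train_std = train.std(axis=0)

# A single row to score at inference time
row = np.array([[34.0, 0.24]])

# Reusing the training statistics keeps the row's position relative to
# the training data.
scaled_with_train_stats = (row - train_mean) / train_std

# Refitting on the single row: its mean is the row itself, so every
# feature collapses to 0 and the information in the row is lost.
row_mean = row.mean(axis=0)
row_std = row.std(axis=0)
row_std[row_std == 0] = 1.0  # guard against division by zero
refit_on_row = (row - row_mean) / row_std

print(scaled_with_train_stats)
print(refit_on_row)
```

The second result is all zeros regardless of the input values, which is why a scaler loaded for inference is normally applied without refitting.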

In [None]:
pred_onx = sess.run([label_name], {input_name: test_data.astype(numpy.float32)})[0]

In [None]:
pred_onx[0]

## Testing methods

- Batch testing (the model is usually served as a serialized file, which is loaded as an object and used to run inference on test data)

- A/B testing (Z-test, G-test)

- Stage test or shadow test
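
As an illustration of the Z-test mentioned for A/B testing, the following sketch runs a two-proportion z-test on the success counts of two model variants. The counts are made-up, and the function name `two_proportion_z_test` is hypothetical; it assumes samples large enough for the normal approximation to hold.

```python
from math import erf, sqrt


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two success rates."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled success rate under the null hypothesis that A and B are equal
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF expressed through the error function
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Made-up example: model A was correct on 850 of 1000 requests,
# model B on 880 of 1000.
z, p_value = two_proportion_z_test(success_a=850, n_a=1000, success_b=880, n_b=1000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the two variants is unlikely to be due to chance alone; the significance level to compare it against is a choice for the team running the test.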




## Testing with CI/CD 

- Upon a successful run of an ML pipeline, CI/CD pipelines can trigger a new model's A/B test in the staging environment.

- When a new model is trained, it is beneficial to set aside a dataset separate from the test set and measure the model's performance on it against suitable metrics; this step can be fully automated.

- CI/CD pipelines can trigger ML pipelines at a set time each day to train a new model or fine-tune an existing model on live or real-time data.

- CI/CD pipelines can monitor the performance of the deployed model in production; monitoring can be started by time-based triggers or by manual triggers (initiated by the team members responsible for quality assurance).

- CI/CD pipelines can provision two or more staging environments to run A/B tests on distinct datasets, which makes testing more diverse and comprehensive.
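
One way to automate the "measure against suitable metrics" step is a small gate that a CI/CD job calls after training. The sketch below is only an illustration: the function `should_promote`, the metric names, and the thresholds are all hypothetical and not part of any CI/CD product.

```python
def should_promote(candidate, baseline, primary="f1", min_gain=0.01, floors=None):
    """Decide whether a candidate model may replace the current baseline.

    candidate, baseline: dicts mapping metric name to value (higher is better).
    primary: metric on which the candidate must beat the baseline by min_gain.
    floors: optional dict of absolute minimum values for guardrail metrics.
    """
    floors = floors or {}
    # Guardrail metrics must not fall below their absolute floors
    for name, floor in floors.items():
        if candidate.get(name, 0.0) < floor:
            return False
    # The primary metric must improve on the baseline by at least min_gain
    return candidate[primary] - baseline[primary] >= min_gain


# Made-up metric values measured on the held-out dataset
baseline = {"f1": 0.81, "recall": 0.78}
candidate = {"f1": 0.84, "recall": 0.80}

decision = should_promote(candidate, baseline, floors={"recall": 0.75})
print("promote" if decision else "keep baseline")
```

A pipeline step could fail the build or skip deployment when the gate returns `False`, so that only models passing the check reach the staging environment.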

## ML model packaging

To serve models, they need to be packaged into software artifacts that can be shipped to testing or production environments. Usually, these artifacts take the form of a file, a set of files, or a container. This allows the software to be environment- and deployment-agnostic.

- A packaged model can be deployed in a virtual machine, a serverless setup, a serverless container environment, a streaming service, microservices, or batch services.

- ML model interoperability is the ability of two or more models or components to exchange information and to use it to learn or fine-tune from each other and operate efficiently. The exchanged information can be data, software artifacts, or model parameters. It enables models to fine-tune, retrain, or adapt to various environments based on the experience of other software artifacts.


## Ways to Package ML models

1. **Serialization** 

- Serialization is the method of converting an object or a data structure (for example, variables, arrays, and tuples) into a storable artifact, such as a file or a memory buffer, that can be transported or transmitted across computer networks. Its main purpose is to let the serialized file be reconstructed into its previous data structure (for example, a serialized file into an ML model variable) in a different environment. This way, a newly trained ML model can be serialized into a file and exported to a new environment, where it can be deserialized back into an ML model variable or data structure for ML inference.

- Examples: .pkl, .h5, .onnx, .pb, .zip (for Spark ML)


2. **Packetizing or containerizing**

- Using Docker and Kubernetes.
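
To illustrate the serialization route (item 1), here is a minimal round trip with Python's `pickle` module. The parameter dictionary and the `predict` helper are toy stand-ins for a trained model, invented for this example; a real model object would be serialized the same way or with a format-specific exporter such as ONNX.

```python
import pickle

# Toy "model": a dictionary of parameters (made-up values)
model_params = {"weights": [0.4, -0.2, 0.1], "bias": 0.05}


def predict(params, features):
    """Linear score computed from the stored weights and bias."""
    return sum(w * x for w, x in zip(params["weights"], features)) + params["bias"]


# Serialize into a bytes buffer; writing these bytes to disk would give a .pkl file
payload = pickle.dumps(model_params)

# In the target environment, deserialize the bytes back into the same structure
restored = pickle.loads(payload)

features = [1.0, 2.0, 3.0]
print(predict(model_params, features), predict(restored, features))
```

The restored object behaves exactly like the original. Note that pickle data should only be loaded from trusted sources, because unpickling can execute arbitrary code.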