![image](https://docs.google.com/uc?export=download&id=1Nh6vjig-ADM1RMbaHrj_GItD3O1ZW4Vr)
# Analysis with Machine Learning - MINE4206

# **Final Lab - Model Deployment**
## **Objectives**
- Learn to use Flask as a simple deployment option.
- Learn to use AWS SageMaker and MLflow as a more advanced deployment option.

## **Importing the libraries required for the lab**
- This part of the workshop assumes that you already have your AWS access keys configured and an S3 bucket created.
- Before starting, upload the MLflow runs directory to the bucket with `aws s3 sync ./mlruns/ s3://${BUCKET_NAME}`.
- Then build the serving container image and push it to the ECR repository with `mlflow sagemaker build-and-push-container`.

In [1]:
import boto3

import mlflow.sagemaker as mfs

from dotenv import load_dotenv, find_dotenv
import os
import json

import numpy as np
import pandas as pd

from tqdm.auto import tqdm

from sklearn.metrics import (classification_report, accuracy_score)

In [2]:
load_dotenv(find_dotenv())

True

## **Configuring environment variables**

- This part sets the variables needed to run and deploy the model on AWS:
    - The app name, which we choose ourselves.
    - The ARN of the execution role, obtained from AWS IAM.
    - The URL of the image in ECR used to create the container.
    - The region of the bucket, which is also used for the deployment.
    - The run ID of the model created by MLflow.
    - The name of the model we created in MLflow.
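
Since every later cell depends on these values, a quick check that none of them is missing can save a failed deployment. The following is a minimal sketch, assuming the variable names read with `os.getenv` in this notebook; `missing_env_vars` is a hypothetical helper, not part of MLflow or boto3.

```python
import os

# Variable names as read with os.getenv in the configuration cell of this notebook.
REQUIRED_VARS = (
    "APP_NAME",
    "EXECUTION_ROLE_ARN",
    "IMAGE_ECR_URL",
    "REGION",
    "S3_BUCKET_NAME",
    "RUN_ID",
)


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]
```

Calling `missing_env_vars()` right after `load_dotenv(find_dotenv())` returns an empty list when the `.env` file defines all six variables.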

In [3]:
app_name = os.getenv("APP_NAME")
execution_role_arn = os.getenv("EXECUTION_ROLE_ARN")
image_ecr_url = os.getenv("IMAGE_ECR_URL")
region = os.getenv("REGION")

s3_bucket_name = os.getenv("S3_BUCKET_NAME")
run_id = os.getenv("RUN_ID")

model_name = "Decision Tree Model"

experiment_id = "1"

In [4]:
model_uri = "s3://{}/{}/{}/artifacts/{}/".format(
    s3_bucket_name, experiment_id, run_id, model_name
)

In [5]:
model_uri

's3://mlflow-sagemaker/1/aab67419c5bf477488352d5a976e0664/artifacts/Decision Tree Model/'
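
The URI above follows the layout of the synced `mlruns` directory: bucket, experiment ID, run ID, then the `artifacts` folder and the model name. Because the same pattern is needed again when the model is updated, it can be wrapped in a small function. This is only a sketch; `build_model_uri` is a hypothetical helper name.

```python
def build_model_uri(bucket, experiment_id, run_id, artifact_path):
    """Compose the S3 URI of a model logged by MLflow and synced to a bucket.

    Mirrors the format string used in this notebook:
    s3://<bucket>/<experiment_id>/<run_id>/artifacts/<artifact_path>/
    """
    return "s3://{}/{}/{}/artifacts/{}/".format(
        bucket, experiment_id, run_id, artifact_path
    )


# Values copied from the output shown above.
uri = build_model_uri(
    "mlflow-sagemaker",
    "1",
    "aab67419c5bf477488352d5a976e0664",
    "Decision Tree Model",
)
print(uri)
```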

## **Deployment and validation**
- Here we use the variables defined in the cells above to deploy the model to our AWS SageMaker application.
- Note: the deployment takes a while to complete.

In [6]:
mfs.deploy(
    app_name=app_name,
    model_uri=model_uri,
    execution_role_arn=execution_role_arn,
    region_name=region,
    image_url=image_ecr_url,
    mode=mfs.DEPLOYMENT_MODE_CREATE
)

2021/05/25 20:50:17 INFO mlflow.sagemaker: Using the python_function flavor for deployment!
2021/05/25 20:50:17 INFO mlflow.sagemaker: No model data bucket specified, using the default bucket
2021/05/25 20:50:18 INFO mlflow.sagemaker: Default bucket `mlflow-sagemaker-us-east-2-962145169713` already exists. Skipping creation.
2021/05/25 20:50:20 INFO mlflow.sagemaker: tag response: {'ResponseMetadata': {'RequestId': 'WVJFRX4J7TV8HZ1Y', 'HostId': 'YFz0F7Kv8LGKt5bPEDsd3T7RIeAxdfgm+xO0GNaPhWlmonfRwfbCTvHrlG/80SWFfN7QYI8OytY=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'YFz0F7Kv8LGKt5bPEDsd3T7RIeAxdfgm+xO0GNaPhWlmonfRwfbCTvHrlG/80SWFfN7QYI8OytY=', 'x-amz-request-id': 'WVJFRX4J7TV8HZ1Y', 'date': 'Tue, 25 May 2021 20:50:21 GMT', 'server': 'AmazonS3', 'content-length': '0'}, 'RetryAttempts': 0}}
2021/05/25 20:50:20 INFO mlflow.sagemaker: Creating new endpoint with name: mlops-sagemaker ...
2021/05/25 20:50:20 INFO mlflow.sagemaker: Created model with arn: arn:aws:sagemaker:us-east-2

- In this part we validate the deployed model by sending test data to the endpoint created in the previous step.
- Calling `invoke_endpoint` requires the following parameters:
    - `EndpointName`: the name of the application we deployed.
    - `Body`: the data we want predictions for.
    - `ContentType`: the MIME type of the data being sent. **Be careful: raw data cannot be sent. In this example we send data in the JSON split format generated by pandas.**
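
To make the split format concrete, the sketch below builds a payload by hand with the standard library, using the three keys (`columns`, `index`, `data`) that `DataFrame.to_json(orient="split")` emits. The helper `to_split_payload` and the two column names are hypothetical and serve only as an illustration; in the notebook itself pandas produces the payload.

```python
import json


def to_split_payload(columns, rows, index=None):
    """Serialize tabular data in the layout of pandas' orient="split" JSON.

    columns: list of column names
    rows:    list of rows, each with one value per column
    index:   optional row labels; defaults to 0..len(rows)-1
    """
    index = list(range(len(rows))) if index is None else list(index)
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("every row needs one value per column")
    return json.dumps(
        {"columns": list(columns), "index": index, "data": [list(r) for r in rows]}
    )


# Two rows with two illustrative (made-up) feature columns.
payload = to_split_payload(["free", "win"], [[0.0, 0.31], [0.12, 0.0]])
print(payload)
```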

In [7]:
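def query(input_json):
    """Send a pandas-split JSON payload to the endpoint and return its predictions."""
    # Runtime client used to invoke deployed SageMaker endpoints
    client = boto3.session.Session().client(
        "sagemaker-runtime", region
    )

    response = client.invoke_endpoint(
        EndpointName=app_name,
        Body=input_json,
        ContentType="application/json; format=pandas-split"
    )

    # The response body is a stream holding a JSON-encoded list of predictions
    y_pred = response["Body"].read().decode("utf-8")
    y_pred = json.loads(y_pred)
    return y_pred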

In [8]:
X_test = pd.read_csv("data/X_test.csv.gz")
y_test = pd.read_csv("data/y_test.csv.gz")

In [9]:
X_test.shape

(1115, 8672)

In [10]:
X_test.head()

Unnamed: 0,00,000,000pes,008704050406,0089,0121,01223585236,01223585334,0125698789,02,...,ó_,û_,û_thanks,ûªm,ûªt,ûªve,ûï,ûïharry,ûò,ûówell
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [11]:
batch_size = 32
predictions = []

# 34 full batches of 32 rows cover 1088 of the 1115 rows; the last 27 rows are not scored
for f in tqdm(range(34)):
    sample = X_test.iloc[f * batch_size: (f + 1) * batch_size].to_json(orient="split")
    
    y_pred = query(sample)
    predictions.extend(y_pred)


In [12]:
len(predictions)

1088
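
The loop uses a fixed count of 34 batches, so it returns 1088 predictions for the 1115 rows of `X_test`. A minimal sketch of a batching helper that also covers the short final batch is shown here; `batch_bounds` is a hypothetical name, and using it would change the row count behind the metrics reported in this notebook.

```python
def batch_bounds(n_rows, batch_size):
    """Yield (start, stop) row offsets covering all rows, including a short last batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


# Shape taken from X_test above: 1115 rows, batches of 32.
bounds = list(batch_bounds(1115, 32))
print(len(bounds), bounds[-1])
```

In the notebook, the loop body would then read `X_test.iloc[start:stop].to_json(orient="split")` for each `(start, stop)` pair.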

In [13]:
print(classification_report(y_test[:len(predictions)], predictions))

              precision    recall  f1-score   support

           0       0.97      0.98      0.98       945
           1       0.86      0.80      0.83       143

    accuracy                           0.96      1088
   macro avg       0.91      0.89      0.90      1088
weighted avg       0.96      0.96      0.96      1088



In [14]:
accuracy_score(y_test[:len(predictions)], predictions)

0.9568014705882353
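
The `weighted avg` row of the report weights each class metric by its support (945 and 143 samples here). The sketch below recomputes it from the rounded per-class values printed above, so it only approximates what scikit-learn derives from the raw predictions; `weighted_average` is a hypothetical helper.

```python
def weighted_average(values, supports):
    """Average per-class metric values, weighting each class by its sample count."""
    total = sum(supports)
    if total == 0:
        raise ValueError("supports must not sum to zero")
    return sum(v * s for v, s in zip(values, supports)) / total


# Per-class precision and recall copied from the report above (already rounded).
supports = [945, 143]
weighted_precision = weighted_average([0.97, 0.86], supports)
weighted_recall = weighted_average([0.98, 0.80], supports)
print(round(weighted_precision, 2), round(weighted_recall, 2))
```

Both values round to 0.96, matching the report. Because class 0 dominates the test set, the weighted averages stay close to the class 0 scores even though class 1 is noticeably weaker.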

## **Updating the Model**
- Here we need a few new values (the run ID and the name of the new model) and we change the deployment mode; the remaining parameters stay the same.

In [42]:
new_run_id = os.getenv("NEW_RUN_ID")
new_model_name = "Log Reg Model"

new_model_uri = "s3://{}/{}/{}/artifacts/{}/".format(
    s3_bucket_name, experiment_id, new_run_id, new_model_name
)

In [44]:
mfs.deploy(
    app_name=app_name,
    model_uri=new_model_uri,
    execution_role_arn=execution_role_arn,
    region_name=region,
    image_url=image_ecr_url,
    mode=mfs.DEPLOYMENT_MODE_REPLACE
)

2021/05/12 22:05:26 INFO mlflow.sagemaker: Using the python_function flavor for deployment!
2021/05/12 22:05:27 INFO mlflow.sagemaker: No model data bucket specified, using the default bucket
2021/05/12 22:05:28 INFO mlflow.sagemaker: Default bucket `mlflow-sagemaker-us-east-1-962145169713` already exists. Skipping creation.
2021/05/12 22:05:29 INFO mlflow.sagemaker: tag response: {'ResponseMetadata': {'RequestId': 'TCE5RJ3951X72B6N', 'HostId': 'Tq35IebrFln+pg7nKsGNP3Aorx68j79r+elCP2IfkviuRXSWw6VCcZTt5Cr0+qBO8ctE4wWxREk=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'Tq35IebrFln+pg7nKsGNP3Aorx68j79r+elCP2IfkviuRXSWw6VCcZTt5Cr0+qBO8ctE4wWxREk=', 'x-amz-request-id': 'TCE5RJ3951X72B6N', 'date': 'Thu, 13 May 2021 03:05:30 GMT', 'content-length': '0', 'server': 'AmazonS3'}, 'RetryAttempts': 0}}
2021/05/12 22:05:29 INFO mlflow.sagemaker: Found active endpoint with arn: arn:aws:sagemaker:us-east-1:962145169713:endpoint/mlops-sagemaker. Updating...
2021/05/12 22:05:30 INFO mlflow.sage
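
The two `deploy` calls in this notebook differ in their `mode`: create for a new endpoint, replace for an existing one. A sketch of choosing between them automatically follows. It assumes a boto3 SageMaker client (for example `boto3.client("sagemaker", region_name=region)`) and its `list_endpoints` call with the `NameContains` filter; the helper names are hypothetical, and the returned strings are assumed to correspond to `mfs.DEPLOYMENT_MODE_CREATE` and `mfs.DEPLOYMENT_MODE_REPLACE`.

```python
def endpoint_exists(sagemaker_client, endpoint_name):
    """Return True if an endpoint with exactly this name is listed.

    Only the first page of results is inspected, which is assumed to be
    enough when few endpoints share the name fragment.
    """
    response = sagemaker_client.list_endpoints(NameContains=endpoint_name)
    return any(
        endpoint["EndpointName"] == endpoint_name
        for endpoint in response.get("Endpoints", [])
    )


def choose_deployment_mode(sagemaker_client, endpoint_name):
    """Pick "replace" for an existing endpoint and "create" otherwise."""
    if endpoint_exists(sagemaker_client, endpoint_name):
        return "replace"  # assumed value of mfs.DEPLOYMENT_MODE_REPLACE
    return "create"  # assumed value of mfs.DEPLOYMENT_MODE_CREATE
```

The client is passed in as an argument so the logic can be exercised with a stub instead of a live AWS account.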

In [45]:
batch_size = 32
predictions = []

# Same 34 full batches as before, so both models are scored on the same 1088 rows
for f in tqdm(range(34)):
    sample = X_test.iloc[f * batch_size: (f + 1) * batch_size].to_json(orient="split")
    
    y_pred = query(sample)
    predictions.extend(y_pred)


In [46]:
print(classification_report(y_test[:len(predictions)], predictions))

              precision    recall  f1-score   support

           0       0.96      1.00      0.98       945
           1       0.97      0.74      0.84       143

    accuracy                           0.96      1088
   macro avg       0.97      0.87      0.91      1088
weighted avg       0.96      0.96      0.96      1088



In [47]:
accuracy_score(y_test[:len(predictions)], predictions)

0.9632352941176471
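
The two models reach similar accuracy but trade precision against recall on class 1: the decision tree scored 0.86 precision and 0.80 recall, the logistic regression 0.97 and 0.74. The F1 score is the harmonic mean of the two, and the sketch below recomputes it from the rounded values printed in the reports; `f1_from` is a hypothetical helper.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Class 1 precision and recall copied from the two classification reports.
tree_f1 = f1_from(0.86, 0.80)
logreg_f1 = f1_from(0.97, 0.74)
print(round(tree_f1, 2), round(logreg_f1, 2))
```

The results round to 0.83 and 0.84, in line with the `f1-score` column of each report, which shows how the higher precision of the second model roughly offsets its lower recall.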

## **Deleting the Model**

In [49]:
mfs.delete(app_name=app_name, region_name=region)

2021/05/12 22:19:23 INFO mlflow.sagemaker: Deleted endpoint with arn: arn:aws:sagemaker:us-east-1:962145169713:endpoint/mlops-sagemaker
2021/05/12 22:19:23 INFO mlflow.sagemaker: Waiting for the delete operation to complete...
2021/05/12 22:19:24 INFO mlflow.sagemaker: Deletion is still in progress. Current endpoint status: Deleting
2021/05/12 22:19:29 INFO mlflow.sagemaker: The deletion operation completed successfully with message: "The SageMaker endpoint was deleted successfully."
2021/05/12 22:19:29 INFO mlflow.sagemaker: Cleaning up unused resources...
2021/05/12 22:19:29 INFO mlflow.sagemaker: Deleted associated endpoint configuration with arn: arn:aws:sagemaker:us-east-1:962145169713:endpoint-config/mlops-sagemaker-config-abofu7ewjqxu6jclspkipfa
2021/05/12 22:19:30 INFO mlflow.sagemaker: Deleted associated model with arn: arn:aws:sagemaker:us-east-1:962145169713:model/mlops-sagemaker-model-vxxns3owtfwe2qxqx7fhja
