# <font color='blue'>Data Science Academy</font>
# <font color='blue'>Deploying Machine Learning Models</font>

# <font color='blue'>Amazon SageMaker</font>
## <font color='blue'>Lab</font>
### <font color='blue'>Deploying a Model to Predict Diseases Using Electronic Medical Records</font>

## Part 3 - Second Model Version and Endpoint Creation

In [1]:
# Python language version
from platform import python_version
print('Python version used in this Jupyter Notebook:', python_version())

Python version used in this Jupyter Notebook: 3.7.10


## Imports 

https://pypi.org/project/boto/

https://sagemaker.readthedocs.io/en/stable/

In [2]:
# Imports
import os
import json
import sagemaker
import boto3
import numpy as np
import pandas as pd
from sagemaker.serializers import CSVSerializer
from sagemaker.inputs import TrainingInput
from sagemaker.predictor import Predictor
from sagemaker import get_execution_role

In [3]:
sagemaker.__version__

'2.31.0'

## Loading the Data

In [4]:
# Get a boto3 session (the AWS SDK session; the SageMaker session is created later)
session = boto3.Session()

In [5]:
s3 = session.resource('s3')

In [6]:
s3

s3.ServiceResource()

In [7]:
from sagemaker import get_execution_role
role = get_execution_role()
print(role)

Couldn't call 'get_role' to get Role ARN from role name AmazonSageMaker-ExecutionRole-20210330T151228 to get Role path.
Assuming role was created in SageMaker AWS console, as the name contains `AmazonSageMaker-ExecutionRole`. Defaulting to Role ARN with service-role in path. If this Role ARN is incorrect, please add IAM read permissions to your role or supply the Role Arn directly.


arn:aws:iam::879456481532:role/service-role/AmazonSageMaker-ExecutionRole-20210330T151228


In [8]:
# Change this to the name of your bucket
s3_bucket = 'dsa-deploy-app'
prefix = 'dados'

In [9]:
raiz = 's3://{}/{}/'.format(s3_bucket, prefix)
print(raiz)

s3://dsa-deploy-app/dados/
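The training channels that follow are built by appending file names to `raiz`. Below is a minimal stdlib sketch of how such an `s3://bucket/prefix/` URI is composed and how it splits back into a bucket and an object key. `build_s3_uri` and `split_s3_uri` are hypothetical helpers for illustration and are not part of the SageMaker SDK.

```python
from urllib.parse import urlparse


def build_s3_uri(bucket: str, prefix: str, filename: str = "") -> str:
    """Compose an s3:// URI the same way the notebook builds `raiz` (hypothetical helper)."""
    return "s3://{}/{}/{}".format(bucket, prefix.strip("/"), filename)


def split_s3_uri(uri: str) -> tuple:
    """Split an s3:// URI into (bucket, object key) (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError("not an s3:// URI: " + uri)
    # netloc is the bucket; the path (minus its leading slash) is the object key
    return parsed.netloc, parsed.path.lstrip("/")


uri = build_s3_uri("dsa-deploy-app", "dados", "treino.csv")
print(uri)
print(split_s3_uri(uri))
```

Calling `build_s3_uri("dsa-deploy-app", "dados")` with no file name reproduces the `raiz` value printed above.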


In [10]:
dados_treino = TrainingInput(s3_data = raiz + 'treino.csv', content_type = 'csv')
dados_teste = TrainingInput(s3_data = raiz + 'teste.csv', content_type = 'csv')

In [11]:
print(json.dumps(dados_treino.__dict__, indent = 2))

{
  "config": {
    "DataSource": {
      "S3DataSource": {
        "S3DataType": "S3Prefix",
        "S3Uri": "s3://dsa-deploy-app/dados/treino.csv",
        "S3DataDistributionType": "FullyReplicated"
      }
    },
    "ContentType": "csv"
  }
}


In [12]:
print(json.dumps(dados_teste.__dict__, indent = 2))

{
  "config": {
    "DataSource": {
      "S3DataSource": {
        "S3DataType": "S3Prefix",
        "S3Uri": "s3://dsa-deploy-app/dados/teste.csv",
        "S3DataDistributionType": "FullyReplicated"
      }
    },
    "ContentType": "csv"
  }
}
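Both channels use `"S3DataType": "S3Prefix"` and `"FullyReplicated"`:

- **`S3Prefix`:** SageMaker treats `S3Uri` as a key-name prefix and feeds every object whose key starts with it to the training job.
- **`FullyReplicated`:** the matched data is copied to every training instance.

The stdlib sketch below illustrates only the prefix rule. It uses a hypothetical bucket listing to show that a stray object such as `dados/treino.csv.bak` would also be picked up by the `treino.csv` channel.

```python
def objects_for_prefix(keys, prefix):
    """Return the keys an S3Prefix channel would match (plain string-prefix match)."""
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical bucket listing, for illustration only
keys = [
    "dados/treino.csv",
    "dados/treino.csv.bak",
    "dados/teste.csv",
    "artefatos/model.tar.gz",
]
print(objects_for_prefix(keys, "dados/treino.csv"))
print(objects_for_prefix(keys, "dados/teste.csv"))
```

Keeping only the intended file under each prefix avoids silently training on extra data.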


## Building and Training the Model

In [13]:
# Retrieve the URI of the built-in XGBoost training container image
# https://sagemaker.readthedocs.io/en/stable/api/utility/image_uris.html
container_uri = sagemaker.image_uris.retrieve(region = session.region_name, 
                                              framework = 'xgboost', 
                                              version = '1.0-1', 
                                              image_scope = 'training')

In [14]:
# Estimator arguments
sagemaker_execution_role = role
sagemaker_session = sagemaker.Session()

In [15]:
# Create the estimator
# https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html
xgb = sagemaker.estimator.Estimator(image_uri = container_uri,
                                    role = sagemaker_execution_role, 
                                    instance_count = 2, 
                                    instance_type = 'ml.m5.large',
                                    output_path = 's3://{}/artefatos'.format(s3_bucket),
                                    sagemaker_session = sagemaker_session,
                                    base_job_name = 'classifier')

In [16]:
# Set the hyperparameters
# https://docs.aws.amazon.com/pt_br/sagemaker/latest/dg/xgboost_hyperparameters.html
xgb.set_hyperparameters(objective = 'binary:logistic', num_round = 100)
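With `objective = 'binary:logistic'`, XGBoost combines the tree outputs (starting from a base score) into a raw margin. It then passes that margin through the logistic (sigmoid) function, so the endpoint later returns a probability between 0 and 1 rather than a class label. The prediction cells further down turn that probability into a class with a 0.5 cutoff.

Below is a minimal stdlib sketch of that mapping. The margin values are made up for illustration.

```python
import math


def sigmoid(margin: float) -> float:
    """Logistic function applied by binary:logistic to a raw margin, giving P(class = 1)."""
    return 1.0 / (1.0 + math.exp(-margin))


def to_class(prob: float, threshold: float = 0.5) -> int:
    """Same rule as the notebook: probability below the threshold -> 0, otherwise -> 1."""
    return 0 if prob < threshold else 1


for margin in (-2.0, 0.0, 1.5):  # hypothetical raw margins
    prob = sigmoid(margin)
    print(margin, round(prob, 4), to_class(prob))
```

A margin of exactly 0 yields a probability of 0.5, which the notebook's `< 0.5` test assigns to class 1.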

In [17]:
# Training (the test set is passed as the validation channel)
xgb.fit({'train': dados_treino, 'validation': dados_teste})

2021-04-01 17:02:19 Starting - Starting the training job...
2021-04-01 17:02:42 Starting - Launching requested ML instances
ProfilerReport-1617296539: InProgress
......
2021-04-01 17:03:43 Starting - Preparing the instances for training............
2021-04-01 17:05:50 Downloading - Downloading input data
2021-04-01 17:05:50 Training - Downloading the training image...
2021-04-01 17:06:05 Training - Training image download completed. Training in progress.
INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:Failed to parse hyperparameter objective value binary:logistic to Json.
Returning the value itself
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Running XGBoost Sagemaker in algorithm mode
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is ','
INFO:ro
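The `Failed to parse hyperparameter objective value binary:logistic to Json` line is informational, not an error. The CreateTrainingJob API carries hyperparameters as strings. The log shows the container first trying to decode each value as JSON and, when that fails, keeping the raw string.

Below is a stdlib sketch of that fallback as the log message describes it. It is an illustration, not the container's actual source code.

```python
import json


def parse_hyperparameter(raw: str):
    """Decode a string hyperparameter as JSON if possible, otherwise keep the raw string."""
    try:
        return json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return raw


# String values as they would reach the container from set_hyperparameters(...)
raw_params = {"objective": "binary:logistic", "num_round": "100"}
parsed = {name: parse_hyperparameter(value) for name, value in raw_params.items()}
print(parsed)  # {'objective': 'binary:logistic', 'num_round': 100}
```

`num_round` decodes to the integer 100, while `objective` is not valid JSON and falls back to its string value, which is exactly what the log reports.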

## Creating the Endpoint from the Model

In [18]:
# Deploy the trained model by creating an endpoint
# https://docs.aws.amazon.com/pt_br/sagemaker/latest/dg/xgboost.html
xgb_predictor = xgb.deploy(initial_instance_count = 2, instance_type = 'ml.m5.large')

-------------!

## Predictions from the Endpoint

In [19]:
csv_serializer = CSVSerializer()

In [20]:
predictor = Predictor(endpoint_name = xgb_predictor.endpoint_name, serializer = csv_serializer)
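`CSVSerializer` turns the input into a `text/csv` request body, so a one-dimensional feature array becomes a single comma-separated line with no header.

Below is a stdlib approximation of that payload for one patient, using the four feature values of the test row sampled later in the notebook. `to_csv_payload` is a hypothetical helper, and the real serializer's number formatting may differ.

```python
def to_csv_payload(features) -> str:
    """Approximate a text/csv body for one row: values joined by commas, no header."""
    return ",".join(str(value) for value in features)


# Feature order the model was trained on:
# bmi, diastolic_bp_change, systolic_bp_change, respiratory_rate
patient = [-0.02286428, -0.49665455, 2.15375335, -0.06731361]
print(to_csv_payload(patient))
# -0.02286428,-0.49665455,2.15375335,-0.06731361
```

The label column is deliberately absent: the endpoint receives only the features, in the same order as the training columns.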

In [21]:
# Reading an s3:// path with pandas requires the s3fs package to be installed
df_teste = pd.read_csv(raiz + 'teste.csv', names = ['class', 'bmi', 'diastolic_bp_change', 'systolic_bp_change', 'respiratory_rate'])

In [22]:
df_teste.head()

|   | class | bmi | diastolic_bp_change | systolic_bp_change | respiratory_rate |
|---|---|---|---|---|---|
| 0 | 0 | -0.940089 | -0.403964 | -0.279542 | -0.817379 |
| 1 | 0 | -0.502614 | -0.665582 | 0.131742 | -0.36245 |
| 2 | 0 | 1.078473 | 0.347981 | 0.228029 | -0.817379 |
| 3 | 1 | -0.636164 | -0.251491 | 0.587034 | -0.817379 |
| 4 | 1 | -0.528479 | 2.037253 | 1.383463 | 0.185934 |
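`teste.csv` has no header row and stores the label in the first column. That is why the cell passes `names` explicitly and why later cells slice `row[1:]` to get the features. This layout matches what the built-in XGBoost algorithm expects for CSV training data: target in the first column, no header record.

Below is a stdlib sketch of the same label/feature split, applied to two rows copied from the output above.

```python
import csv
import io

# Two rows in the same layout as teste.csv: class label first, then the four features
raw = (
    "0,-0.940089,-0.403964,-0.279542,-0.817379\n"
    "1,-0.636164,-0.251491,0.587034,-0.817379\n"
)

labels, features = [], []
for row in csv.reader(io.StringIO(raw)):
    labels.append(int(float(row[0])))             # column 0: class (0 = not diabetic, 1 = diabetic)
    features.append([float(v) for v in row[1:]])  # columns 1-4: model inputs

print(labels)
print(features[1])
```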


In [23]:
X = df_teste.sample(1)
X

|      | class | bmi | diastolic_bp_change | systolic_bp_change | respiratory_rate |
|---|---|---|---|---|---|
| 1017 | 0 | -0.022864 | -0.496655 | 2.153753 | -0.067314 |


In [24]:
X = X.values[0]
X[1:]

array([-0.02286428, -0.49665455,  2.15375335, -0.06731361])

In [25]:
paciente = X[1:]
paciente

array([-0.02286428, -0.49665455,  2.15375335, -0.06731361])

In [26]:
# Predict a single patient
predicted_class_prob = predictor.predict(paciente).decode('utf-8')
if float(predicted_class_prob) < 0.5:
    print('Prediction = Not Diabetic')
else:
    print('Prediction = Diabetic')
print()

Prediction = Not Diabetic
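Without an explicit deserializer, `predictor.predict` returns the raw response body as bytes. That is why the cell calls `.decode('utf-8')` before `float(...)`.

Below is a small, testable sketch of that post-processing, fed with hypothetical response bodies. `response_to_label` is an illustrative helper, not part of the SageMaker SDK.

```python
def response_to_label(body: bytes, threshold: float = 0.5) -> str:
    """Turn an endpoint response body (a probability as text) into the notebook's label."""
    prob = float(body.decode("utf-8"))
    return "Not Diabetic" if prob < threshold else "Diabetic"


# Hypothetical response bodies, for illustration only
print(response_to_label(b"0.1873"))  # Not Diabetic
print(response_to_label(b"0.62"))    # Diabetic
```

Factoring the decode-and-threshold step into a function makes it testable without calling the endpoint.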



## Evaluating the Model

In [27]:
# Predict every patient in the test dataset (one endpoint request per row)
predictions = []
expected = []
correct = 0
for row in df_teste.values:
    expected_class = row[0]    # column 0 holds the true class
    payload = row[1:]          # remaining columns are the features
    predicted_class_prob = predictor.predict(payload).decode('utf-8')
    # Apply the 0.5 threshold to the returned probability
    predicted_class = 0 if float(predicted_class_prob) < 0.5 else 1
    if predicted_class == expected_class:
        correct += 1
    predictions.append(predicted_class)
    expected.append(expected_class)

In [28]:
print('Accuracy = {:.2f}%'.format(correct / len(predictions) * 100))

Accuracy = 77.72%


#### Confusion Matrix

In [29]:
expected = pd.Series(np.array(expected))
predictions = pd.Series(np.array(predictions))
pd.crosstab(expected, predictions, rownames = ['Actual'], colnames = ['Predicted'], margins = True)

| Actual / Predicted | 0 | 1 | All |
|---|---|---|---|
| 0.0 | 1909 | 71 | 1980 |
| 1.0 | 483 | 24 | 507 |
| All | 2392 | 95 | 2487 |
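The 77.72% accuracy can be recomputed from this table. A few standard metrics, which the notebook itself does not report, make the behavior on class 1 explicit:

- recall and precision for class 1;
- specificity;
- the accuracy of always predicting class 0.

Below is a stdlib sketch with the counts copied from the crosstab above.

```python
# Counts copied from the crosstab above (rows = actual, columns = predicted)
tn, fp = 1909, 71   # actual 0 predicted as 0 / predicted as 1
fn, tp = 483, 24    # actual 1 predicted as 0 / predicted as 1
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
recall_1 = tp / (tp + fn)               # share of actual class-1 patients the model flags
precision_1 = tp / (tp + fp)            # share of class-1 predictions that are correct
specificity = tn / (tn + fp)            # share of actual class-0 patients predicted as 0
majority_baseline = (tn + fp) / total   # accuracy of always predicting class 0

for name, value in [
    ("accuracy", accuracy),
    ("recall (class 1)", recall_1),
    ("precision (class 1)", precision_1),
    ("specificity", specificity),
    ("always-0 baseline", majority_baseline),
]:
    print("{:<20} {:.2%}".format(name, value))
```

On this test set, these counts give:

- a class-1 recall of about 4.73%;
- an always-predict-0 baseline of about 79.61%, which is higher than the model's 77.72%.

So accuracy alone overstates how useful this model version is for detecting diabetic patients.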


# End