In [1]:
!wget http://amazon-sagemaker.com/dependencies/dependencies.zip -O dependencies.zip
!unzip -o dependencies.zip


--2022-04-26 19:20:56--  http://amazon-sagemaker.com/dependencies/dependencies.zip
Resolving amazon-sagemaker.com (amazon-sagemaker.com)... 13.227.92.71, 13.227.92.86, 13.227.92.94, ...
Connecting to amazon-sagemaker.com (amazon-sagemaker.com)|13.227.92.71|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://amazon-sagemaker.com/dependencies/dependencies.zip [following]
--2022-04-26 19:20:56--  https://amazon-sagemaker.com/dependencies/dependencies.zip
Connecting to amazon-sagemaker.com (amazon-sagemaker.com)|13.227.92.71|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11904 (12K) [application/zip]
Saving to: ‘dependencies.zip’


2022-04-26 19:20:56 (6.41 MB/s) - ‘dependencies.zip’ saved [11904/11904]

Archive:  dependencies.zip
   creating: serving/
  inflating: serving/.DS_Store       
   creating: serving/custom_inference/
  inflating: serving/setup.py        
   creating: serving/.ipynb_checkpoints/
  inflating:

In [2]:
import os
import datetime
import sagemaker
import sagemaker_utils
import numpy as np
import matplotlib.pyplot as plt
from time import gmtime, strftime
from sklearn.metrics import confusion_matrix
from sagemaker import Session, get_execution_role
from sagemaker.estimator import Estimator
from sagemaker.processing import Processor, ProcessingInput, ProcessingOutput
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter, CategoricalParameter
from sagemaker.inputs import TrainingInput, CreateModelInput, TransformInput
from sagemaker.workflow.steps import ProcessingStep, TrainingStep, CreateModelStep, TransformStep
from sagemaker.workflow.parameters import ParameterString, ParameterFloat
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.properties import PropertyFile
from sagemaker.model_metrics import MetricsSource, ModelMetrics
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.functions import JsonGet




In [3]:
sagemaker.__version__

'2.88.0'

In [5]:
session = Session()
#sagemaker_role = get_execution_role()

data_file = 'Data sets/churn.txt'

region = session.boto_region_name
account_id = session.account_id()
bucket = session.default_bucket()

prefix = 'churn-clf'
datasets_prefix = f'{prefix}/datasets'
processed_data_prefix = f'{prefix}/processed'
eval_prefix = f'{prefix}/eval'
transformed_data_prefix = f'{prefix}/transformed'
images_directory = f'{prefix}/images'
code_prefix = f'{prefix}/code'
model_prefix = f'{prefix}/models'


In [7]:
images_directory

'churn-clf/images'

To build the Docker containers we will use AWS CodeBuild. Because the base images are pulled from Docker Hub, we may hit an error indicating that too many requests have been made; for more details on this limit, see this link.

To avoid that error we need to authenticate. This requires a Docker Hub account; replace 'usuario' and 'contraseña' in the next cell with your own username and password.

In [8]:
secret_name = 'dockerhub'
sagemaker_utils.create_secret(secret_name,'usuario','contraseña')

INFO: Secret created
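
The implementation of `sagemaker_utils.create_secret` is not shown in this notebook. As a rough sketch of what such a helper might do, assuming the credentials are stored as a JSON document in AWS Secrets Manager: the function name `create_dockerhub_secret` and the `username`/`password` key names below are illustrative assumptions, not the helper's actual code.

```python
import json


def create_dockerhub_secret(name, username, password, client=None):
    """Store Docker Hub credentials as a JSON secret and return the secret ARN."""
    if client is None:
        # Imported lazily so the function can also be exercised with a stub client.
        import boto3
        client = boto3.client("secretsmanager")

    # Assumption: whatever consumes the secret reads the keys "username" and "password".
    secret_string = json.dumps({"username": username, "password": password})
    response = client.create_secret(Name=name, SecretString=secret_string)
    return response["ARN"]
```

Calling it a second time with the same name would raise an error from Secrets Manager, so a fuller version would probably need to handle an already existing secret.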


We will need an execution role for the AWS CodeBuild project. If the notebook is running with sufficient permissions to create an IAM role, we can create the role simply by running the following method; otherwise it has to be created manually.

In [12]:
policy_document={
        "Version": "2012-10-17",
        "Statement": [               
            {
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:CompleteLayerUpload",
                    "ecr:GetAuthorizationToken",
                    "ecr:InitiateLayerUpload",
                    "ecr:PutImage",
                    "ecr:UploadLayerPart",
                    "ecr:BatchGetImage",
                    "ecr:GetDownloadUrlForLayer",
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "secretsmanager:GetSecretValue"
                ],
                "Resource": "*"
            }
        ]
    }

codebuild_role = sagemaker_utils.create_codebuild_execution_role('CodeBuildExecutionRole', policy_document)


INFO: Role does not exist, creating it...
INFO: Role created: arn:aws:iam::829825986145:role/CodeBuildExecutionRole
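
For readers who have to create the role by hand, the following sketch shows one way it could be done with boto3. It assumes the role needs a trust policy for the `codebuild.amazonaws.com` service principal plus the permissions document as an inline policy; the function names and the inline policy name are hypothetical, and this is not the code of `sagemaker_utils.create_codebuild_execution_role`.

```python
import json


def codebuild_trust_policy():
    """Trust policy that lets the CodeBuild service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "codebuild.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_codebuild_role(role_name, policy_document, client=None):
    """Create the role, attach the permissions inline, and return the role ARN."""
    if client is None:
        # Imported lazily so the function can also be exercised with a stub client.
        import boto3
        client = boto3.client("iam")

    role = client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(codebuild_trust_policy()),
    )
    # Hypothetical naming convention for the inline policy.
    client.put_role_policy(
        RoleName=role_name,
        PolicyName=f"{role_name}Policy",
        PolicyDocument=json.dumps(policy_document),
    )
    return role["Role"]["Arn"]
```

This sketch does not check whether the role already exists, which the helper used above appears to do judging by its log output.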


We specify the dependencies required by each of the Docker containers we will create.

In [9]:
docker_images = {'Processing':{'libraries':{'pandas':'1.2.4',
                                            'numpy':'1.20.2',
                                            'scikit-learn':'0.24.2'}},
                 'Training':{'libraries':{'pandas':'1.2.4',
                                          'numpy':'1.20.2',
                                          'scikit-learn':'0.24.2',
                                          'sagemaker-training':'3.9.2'}},
                 'Inference':{'libraries':{'pandas':'1.2.4',
                                           'numpy':'1.20.2',
                                           'scikit-learn':'0.24.2',
                                           'multi-model-server':'1.1.8',                            
                                           'sagemaker-inference':'1.5.11',
                                           'boto3':'1.21.43',
                                           'itsdangerous':'2.0.1'},
                              'dependencies':[('serving','/opt/ml/serving')],
                              'others':['RUN pip install -e /opt/ml/serving',
                                        'LABEL com.amazonaws.sagemaker.capabilities.multi-models=false',
                                        'LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true'],
                              'entrypoint':['python','/opt/ml/serving/custom_inference/serving.py'],
                              'cmd':['serve']}}
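
To make the meaning of each key more concrete, here is a sketch of how a specification like the one above could be turned into a Dockerfile. The function `render_dockerfile` is hypothetical, and the mapping it applies (`libraries` to pinned `pip install`, `dependencies` to `COPY`, `others` copied verbatim, `entrypoint` and `cmd` in exec form) is an assumption about how `sagemaker_utils` interprets these keys.

```python
import json


def render_dockerfile(base_image, libraries, dependencies=(), others=(),
                      entrypoint=None, cmd=None):
    """Render a Dockerfile as text from an image specification."""
    lines = [f"FROM {base_image}"]

    if libraries:
        # Pin every library to the exact version given in the specification.
        pins = " ".join(f"{name}=={version}" for name, version in libraries.items())
        lines.append(f"RUN pip install --no-cache-dir {pins}")

    # Each dependency is a (source, target) pair copied into the image.
    for source, target in dependencies:
        lines.append(f"COPY {source} {target}")

    # Extra instructions are passed through unchanged.
    lines.extend(others)

    # json.dumps produces the double-quoted list that Docker's exec form expects.
    if entrypoint:
        lines.append(f"ENTRYPOINT {json.dumps(entrypoint)}")
    if cmd:
        lines.append(f"CMD {json.dumps(cmd)}")

    return "\n".join(lines) + "\n"


spec = {
    "libraries": {"pandas": "1.2.4", "numpy": "1.20.2", "scikit-learn": "0.24.2"},
    "dependencies": [("serving", "/opt/ml/serving")],
    "others": ["RUN pip install -e /opt/ml/serving"],
    "entrypoint": ["python", "/opt/ml/serving/custom_inference/serving.py"],
    "cmd": ["serve"],
}
print(render_dockerfile("python:3.7.6-slim-buster", **spec))
```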

We build the Docker images and publish them to Amazon Elastic Container Registry so that they can later be used in the jobs we will create and launch in Amazon SageMaker.

For more details on using Docker containers with Amazon SageMaker and on building your own containers, see the documentation.

In [13]:
for image in docker_images:
    parameters = {'image_name': f'{prefix}-{image.lower()}',
                  'base_image': 'python:3.7.6-slim-buster',
                  's3_path': f's3://{bucket}/{images_directory}',
                  'role': codebuild_role,  
                  #'secret': secret_name,
                  'wait': False}
    
    parameters.update(docker_images[image])
    
    docker_images[image]['build_id'] = sagemaker_utils.create_docker_image(**parameters)  


In [15]:
parameters

{'image_name': 'churn-clf-inference',
 'base_image': 'python:3.7.6-slim-buster',
 's3_path': 's3://sagemaker-us-east-1-829825986145/churn-clf/images',
 'role': 'arn:aws:iam::829825986145:role/CodeBuildExecutionRole',
 'wait': False,
 'libraries': {'pandas': '1.2.4',
  'numpy': '1.20.2',
  'scikit-learn': '0.24.2',
  'multi-model-server': '1.1.8',
  'sagemaker-inference': '1.5.11',
  'boto3': '1.21.43',
  'itsdangerous': '2.0.1'},
 'dependencies': [('serving', '/opt/ml/serving')],
 'others': ['RUN pip install -e /opt/ml/serving',
  'LABEL com.amazonaws.sagemaker.capabilities.multi-models=false',
  'LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true'],
 'entrypoint': ['python', '/opt/ml/serving/custom_inference/serving.py'],
 'cmd': ['serve']}

Because the containers are built asynchronously, we wait until all three builds have finished.

In [14]:
image_uris = sagemaker_utils.wait_for_build([docker_images[image]['build_id'] for image in docker_images])
for image in docker_images:
    docker_images[image]['image_uri'] = image_uris[docker_images[image]['build_id']]


churn-clf-processing-build-image................FAILED!
churn-clf-inference-build-image.................SUCCEEDED!
churn-clf-training-build-image..................STOPPED!
💥 Building docker images


Exception: Building some images failed!
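
When a build fails as in the output above, it helps to see the final status of each build individually. The sketch below polls CodeBuild until every build reaches a terminal state. It is an illustration of the waiting pattern, not the implementation of `sagemaker_utils.wait_for_build`; the function name `wait_for_builds` and the `NOT_FOUND` marker are assumptions.

```python
import time

# Build statuses after which CodeBuild will not change the build any further.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "FAULT", "STOPPED", "TIMED_OUT"}


def wait_for_builds(build_ids, client=None, poll_seconds=15):
    """Poll CodeBuild until all builds finish; return {build_id: final_status}."""
    if client is None:
        # Imported lazily so the function can also be exercised with a stub client.
        import boto3
        client = boto3.client("codebuild")

    statuses = {}
    pending = list(build_ids)
    while pending:
        response = client.batch_get_builds(ids=pending)
        for build in response["builds"]:
            if build["buildStatus"] in TERMINAL_STATUSES:
                statuses[build["id"]] = build["buildStatus"]
        # Unknown IDs would never finish, so record them instead of looping forever.
        for missing in response.get("buildsNotFound", []):
            statuses[missing] = "NOT_FOUND"

        pending = [build_id for build_id in pending if build_id not in statuses]
        if pending:
            time.sleep(poll_seconds)
    return statuses
```

With the returned dictionary, the unsuccessful builds can be listed with `[b for b, s in statuses.items() if s != "SUCCEEDED"]` and then inspected in the CodeBuild console or its logs before retrying.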

Chapter 4
Data preparation
Objective
Before we can use Amazon SageMaker to process our data, train a model, or tune an algorithm, we must first upload the data to Amazon S3. That is what we do in this chapter; afterwards we create a SageMaker Processing Job that applies the transformations our dataset needs in preparation for training our machine learning models.

Before creating the Processing Job that prepares the data for model training, we must upload the data to an Amazon S3 bucket. We will upload the file churn.txt, located in the Data sets folder.

The Jupyter Notebook from the Introduction section must have been run beforehand, since it downloads the dataset; otherwise we will get an error because the file cannot be found.

In [None]:
!wget http://amazon-sagemaker.com/datasets/DKD2e_data_sets.zip -O DKD2e_data_sets.zip

--2022-04-26 18:35:40--  http://amazon-sagemaker.com/datasets/DKD2e_data_sets.zip
Resolving amazon-sagemaker.com (amazon-sagemaker.com)... 13.227.92.115, 13.227.92.71, 13.227.92.86, ...
Connecting to amazon-sagemaker.com (amazon-sagemaker.com)|13.227.92.115|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://amazon-sagemaker.com/datasets/DKD2e_data_sets.zip [following]
--2022-04-26 18:35:41--  https://amazon-sagemaker.com/datasets/DKD2e_data_sets.zip
Connecting to amazon-sagemaker.com (amazon-sagemaker.com)|13.227.92.115|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1003616 (980K) [application/zip]
Saving to: ‘DKD2e_data_sets.zip’


2022-04-26 18:35:41 (7.30 MB/s) - ‘DKD2e_data_sets.zip’ saved [1003616/1003616]



In [None]:
!unzip -o DKD2e_data_sets.zip

Archive:  DKD2e_data_sets.zip
 extracting: Data sets/adult.zip     
  inflating: Data sets/cars.txt      
  inflating: Data sets/cars2.txt     
  inflating: Data sets/cereals.CSV   
  inflating: Data sets/churn.txt     
  inflating: Data sets/ClassifyRisk  
  inflating: Data sets/ClassifyRisk - Missing.txt  
 extracting: Data sets/DKD2e data sets.zip  
  inflating: Data sets/nn1.txt       


In [19]:
data_file, bucket, datasets_prefix

('Data sets/churn.txt',
 'sagemaker-us-east-1-829825986145',
 'churn-clf/datasets')

In [20]:
data_s3_path = sagemaker_utils.upload(data_file, f's3://{bucket}/{datasets_prefix}')


Uploading: 100%|██████████| 438k/438k [00:01<00:00, 239kB/s]
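
The upload step can be sketched with boto3 directly. The version below checks that the local file exists before contacting S3, which gives a clearer message than a failed upload when the dataset has not been downloaded yet. The function name `upload_to_s3` is hypothetical, and this is only an approximation of what `sagemaker_utils.upload` might do.

```python
import os
from urllib.parse import urlparse


def upload_to_s3(local_path, s3_prefix_uri, client=None):
    """Upload a local file under an s3://bucket/prefix URI and return the object URI."""
    if not os.path.isfile(local_path):
        raise FileNotFoundError(f"Dataset not found: {local_path}")

    parsed = urlparse(s3_prefix_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {s3_prefix_uri}")

    bucket = parsed.netloc
    prefix = parsed.path.strip("/")
    filename = os.path.basename(local_path)
    # The object key keeps the original file name under the given prefix.
    key = f"{prefix}/{filename}" if prefix else filename

    if client is None:
        # Imported lazily so the function can also be exercised with a stub client.
        import boto3
        client = boto3.client("s3")

    client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

For the values used in this notebook, a call such as `upload_to_s3(data_file, f's3://{bucket}/{datasets_prefix}')` would place the file at the key `churn-clf/datasets/churn.txt` in the default bucket.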
