# Create and run a training job (AWS SDK for Python (Boto 3))

To train a model, Amazon SageMaker uses the CreateTrainingJob API. The AWS SDK for Python (Boto 3) provides the corresponding create_training_job method.

In [28]:
%%time
import sagemaker
import boto3
import copy
import time
from time import gmtime, strftime
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker import get_execution_role

In [21]:
role = get_execution_role() # get_execution_role retrieves the IAM role you created when you created the notebook instance
region = boto3.Session().region_name
bucket= 'lbk-analytics-dev' # Replace with your s3 bucket name
prefix = 'sagemaker/xgboost-mnist' # Used as part of the path in the bucket where you store data
bucket_path = 'https://{}.s3.amazonaws.com'.format(bucket) # Derived from the bucket name above so the two stay in sync

In [22]:
container = get_image_uri(boto3.Session().region_name, 'xgboost' , '1.0-1')

'get_image_uri' method will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


In [23]:
#Ensure that the train and validation data folders generated above are reflected in the "InputDataConfig" parameter below.
common_training_params = \
{
    "AlgorithmSpecification" : {
        "TrainingImage" : container,
        "TrainingInputMode" : "File"
    },
    "RoleArn" : role,
    "OutputDataConfig" : {
        "S3OutputPath" : bucket_path + "/" + prefix + "/xgboost"
    },
    "ResourceConfig" : {
        "InstanceCount" : 1 ,
        "InstanceType" : "ml.m4.xlarge" ,
        "VolumeSizeInGB" : 5
    },
    "HyperParameters" : {
        "max_depth" : "5" ,
        "eta" : "0.2" ,
        "gamma" : "4" ,
        "min_child_weight" : "6" ,
        "silent" : "0" ,
        "objective" : "multi:softmax" ,
        "num_class" : "10" ,
        "num_round" : "10"
    },
    "StoppingCondition" : {
        "MaxRuntimeInSeconds" : 86400
    },
    "InputDataConfig" : [
        {
            "ChannelName" : "train" ,
            "DataSource" : {
                "S3DataSource" : {
                    "S3DataType" : "S3Prefix" ,
                    "S3Uri" : bucket_path + "/" + prefix+ '/train/' ,
                    "S3DataDistributionType" : "FullyReplicated"
                }
            },
            "ContentType" : "text/csv" ,
            "CompressionType" : "None"
        },
        {
            "ChannelName" : "validation" ,
            "DataSource" : {
                "S3DataSource" : {
                    "S3DataType" : "S3Prefix" ,
                    "S3Uri" : bucket_path + "/" + prefix+ '/validation/' ,
                    "S3DataDistributionType" : "FullyReplicated"
                }
            },
            "ContentType" : "text/csv" ,
            "CompressionType" : "None"
        }
    ]
}
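
The two `InputDataConfig` entries above differ only in the channel name and the S3 folder. As a minimal sketch, the repetition could be factored into a small helper. `make_s3_channel` and `make_input_data_config` are hypothetical names introduced here for illustration; they are not part of Boto 3 or the SageMaker SDK.

```python
def make_s3_channel(name, s3_uri, content_type="text/csv"):
    """Build one InputDataConfig entry for data stored under an S3 prefix.

    Hypothetical helper: it only assembles the same dictionary shape
    used in the training parameters above.
    """
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": content_type,
        "CompressionType": "None",
    }


def make_input_data_config(bucket_path, prefix, channels=("train", "validation")):
    """Return one channel per name, each pointing at <bucket_path>/<prefix>/<name>/."""
    return [
        make_s3_channel(name, "{}/{}/{}/".format(bucket_path, prefix, name))
        for name in channels
    ]
```

With these helpers, the `"InputDataConfig"` value could be written as `make_input_data_config(bucket_path, prefix)`, assuming the train and validation folders follow the layout used in this notebook.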

In [29]:
#training job params
training_job_name = 'sagemaker-xgboost-' + strftime( "%Y-%m-%d-%H-%M-%S" ,gmtime())
print( "Job name is:" , training_job_name)
training_job_params = copy.deepcopy(common_training_params)
training_job_params[ 'TrainingJobName' ] = training_job_name
training_job_params[ 'ResourceConfig' ][ 'InstanceCount' ] = 1

Job name is: sagemaker-xgboost-2020-07-29-15-26-22


In [30]:
%%time
region = boto3.Session().region_name
sm = boto3.Session().client( 'sagemaker' )
sm.create_training_job(**training_job_params)
status = sm.describe_training_job(TrainingJobName=training_job_name)[ 'TrainingJobStatus' ]
print(status)
sm.get_waiter( 'training_job_completed_or_stopped' ).wait(TrainingJobName=training_job_name)
status =sm.describe_training_job(TrainingJobName=training_job_name)[ 'TrainingJobStatus' ]
print( "Training job ended with status: " + status)
if status == 'Failed' :
    message =sm.describe_training_job(TrainingJobName=training_job_name)[ 'FailureReason' ]
    print( 'Training failed with the following error:{}' .format(message))
    raise Exception( 'Training job failed' )


InProgress
Training job ended with status: Completed
CPU times: user 305 ms, sys: 0 ns, total: 305 ms
Wall time: 8min 1s


# Deploy the model to Amazon SageMaker hosting services (AWS SDK for Python (Boto 3))

1. Create a model in Amazon SageMaker: send a CreateModel request to provide information such as the location of the S3 bucket that contains your model artifacts and the registry path of the image that contains the inference code.
2. Create an endpoint configuration: send a CreateEndpointConfig request to provide the resource configuration for hosting. This includes the type and number of ML compute instances to launch for the model deployment.
3. Create an endpoint: send a CreateEndpoint request to create an endpoint. Amazon SageMaker launches the ML compute instances and deploys the model, then returns an endpoint. Applications can send requests to this endpoint to get inferences.
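
The three steps above can be sketched as a single function that takes an already-created SageMaker client. This is only an illustration of the call order: `deploy_model` is a hypothetical name, and the default instance type and the `AllTraffic` variant name are assumptions that mirror the values used in this notebook.

```python
def deploy_model(sm_client, model_name, image_uri, model_data_url, role_arn,
                 endpoint_config_name, endpoint_name,
                 instance_type="ml.m4.xlarge", instance_count=1):
    """Run CreateModel, CreateEndpointConfig and CreateEndpoint in order.

    sm_client is expected to behave like boto3.client('sagemaker').
    Returns the ARN of the endpoint, which is still being created when
    this function returns.
    """
    # Step 1: register the model artifacts and the inference image.
    sm_client.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role_arn,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
    )
    # Step 2: describe the hosting resources for the model.
    sm_client.create_endpoint_config(
        EndpointConfigName=endpoint_config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
            "InitialVariantWeight": 1,
        }],
    )
    # Step 3: ask SageMaker to launch the instances and deploy the model.
    response = sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name,
    )
    return response["EndpointArn"]
```

Passing the client in as an argument keeps the function free of session setup and makes it easy to exercise with a stand-in object before running it against a real account.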

In [None]:
model_name = training_job_name + '-mod'
info = sm.describe_training_job(TrainingJobName=training_job_name)
model_data = info[ 'ModelArtifacts' ][ 'S3ModelArtifacts' ]
print(model_data)
primary_container = {
    'Image' : container,
    'ModelDataUrl' : model_data
}

create_model_response = sm.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    PrimaryContainer = primary_container)

print(create_model_response[ 'ModelArn' ])

In [None]:
endpoint_config_name = 'DEMO-XGBoostEndpointConfig-' + strftime( "%Y-%m-%d-%H-%M-%S" , gmtime())
print(endpoint_config_name)
create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[{
        'InstanceType' : 'ml.m4.xlarge' ,
        'InitialVariantWeight' : 1 ,
        'InitialInstanceCount' : 1 ,
        'ModelName' :model_name,
        'VariantName' : 'AllTraffic' }])

print( "Endpoint Config Arn: " + create_endpoint_config_response[ 'EndpointConfigArn' ])

In [None]:
%%time

endpoint_name = 'DEMO-XGBoostEndpoint-' + strftime( "%Y-%m-%d-%H-%M-%S" ,gmtime())
print(endpoint_name)
create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)

print(create_endpoint_response[ 'EndpointArn' ])
resp = sm.describe_endpoint(EndpointName=endpoint_name)
status = resp[ 'EndpointStatus' ]
print( "Status: " + status)
while status== 'Creating' :
    time.sleep( 60 )
    resp = sm.describe_endpoint(EndpointName=endpoint_name)
    status = resp[ 'EndpointStatus' ]
    print( "Status: " + status)
    
print( "Arn: " + resp[ 'EndpointArn' ])
print( "Status: " + status)
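
Once the endpoint status is `InService`, applications can request inferences from it. The following is a minimal sketch, assuming a runtime client created with `boto3.client('sagemaker-runtime')` and a model that accepts `text/csv` rows. `predict_csv` is a hypothetical helper, and the response parsing assumes the container returns predictions separated by commas or newlines, which may differ between algorithm versions.

```python
def predict_csv(runtime_client, endpoint_name, rows):
    """Send rows of feature values as CSV and return predictions as floats.

    runtime_client is expected to behave like
    boto3.client('sagemaker-runtime'); rows is an iterable of iterables.
    """
    # One CSV line per row, with no header and no label column.
    payload = "\n".join(",".join(str(value) for value in row) for row in rows)
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    # The response body is a stream; read and decode it before parsing.
    text = response["Body"].read().decode("utf-8")
    # Assumption: predictions are separated by commas or by newlines.
    return [float(item) for item in text.replace("\n", ",").split(",") if item.strip()]
```

For example, `predict_csv(boto3.client('sagemaker-runtime'), endpoint_name, test_rows)` would return one prediction per row, where `test_rows` is a placeholder for feature rows taken from your own test set.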

# Deploy a model with batch transform (SDK for Python (Boto 3))

To create a batch transform job, give the job a name and specify where the input data (the test set) is stored and where to store the job's output.

In [None]:
batch_job_name = 'xgboost-mnist-batch-' + strftime( "%Y-%m-%d-%H-%M-%S" , gmtime())
batch_input = 's3://{}/{}/test/examples' .format(bucket, prefix)

print(batch_input)
batch_output = 's3://{}/{}/batch-inference' .format(bucket, prefix)

In [None]:
request = \
{
    "TransformJobName" : batch_job_name,
    "ModelName" : model_name,
    "BatchStrategy" : "MultiRecord" ,
    "TransformOutput" : {
        "S3OutputPath" : batch_output
    },
    "TransformInput" : {
        "DataSource" : {
            "S3DataSource" : {
                "S3DataType" : "S3Prefix" ,
                "S3Uri" : batch_input
            }
        },
        "ContentType" : "text/csv" ,
        "SplitType" : "Line" ,
        "CompressionType" : "None"
    },
    "TransformResources" : {
        "InstanceType" : "ml.m4.xlarge" ,
        "InstanceCount" : 1
    }
}

In [None]:
sm.create_transform_job(**request)
while True:
    response = sm.describe_transform_job(TransformJobName=batch_job_name)
    status = response['TransformJobStatus']
    # 'Stopped' is also a final state; without this check the loop would never exit.
    if status in ('Completed', 'Stopped'):
        print("Transform job ended with status: " + status)
        break
    if status == 'Failed':
        message = response['FailureReason']
        print('Transform failed with the following error: {}'.format(message))
        raise Exception('Transform job failed')
    print("Transform job is still in status: " + status)
    time.sleep(30)
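
The polling pattern used for the endpoint and for the transform job can be written once as a small helper with a timeout. This is a sketch under stated assumptions: `wait_for_status` is a hypothetical function, and the default in-progress statuses (`InProgress`, `Stopping`) match transform and training jobs, so an endpoint would need `("Creating",)` instead.

```python
import time


def wait_for_status(describe, status_key, in_progress=("InProgress", "Stopping"),
                    poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll describe() until the status leaves the in-progress set.

    describe is a zero-argument callable that returns a dict containing
    status_key. Returns the last response, or raises TimeoutError once
    roughly timeout_seconds have been spent waiting.
    """
    waited = 0
    while True:
        response = describe()
        status = response[status_key]
        if status not in in_progress:
            # Final state reached: the caller decides how to treat it.
            return response
        if waited >= timeout_seconds:
            raise TimeoutError(
                "Still in status {} after {} seconds".format(status, waited))
        sleep(poll_seconds)
        waited += poll_seconds
```

For the transform job above, a call could look like `wait_for_status(lambda: sm.describe_transform_job(TransformJobName=batch_job_name), 'TransformJobStatus')`, followed by a check of the returned status and, on failure, of `FailureReason`.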