# Summary

- How to look up the Amazon ECR container image for the XGBoost algorithm
- How to build an Estimator and define its hyperparameters
- How to specify training and validation data
- How to train the model
- How to deploy the model
- How to make predictions

In [5]:
import sagemaker

session = sagemaker.Session()
role = sagemaker.get_execution_role()

s3_output_path = 's3://rsjainaimlmodels/bikerental/output'
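The three S3 locations in this notebook (model output, training data, validation data) all live under the same bucket and `bikerental` prefix. A minimal sketch of deriving them from one place so the bucket name is typed only once; `s3_paths` is a hypothetical helper, and the `output` / `train/` / `val/` layout simply mirrors the paths hard-coded in the cells below.

```python
def s3_paths(bucket: str, prefix: str) -> dict:
    """Build the output/train/validation S3 URIs used in this notebook (hypothetical helper)."""
    base = f"s3://{bucket}/{prefix.strip('/')}"
    return {
        "output": f"{base}/output",       # where model artifacts are written
        "train": f"{base}/train/",        # S3Prefix for the training channel
        "validation": f"{base}/val/",     # S3Prefix for the validation channel
    }

paths = s3_paths("rsjainaimlmodels", "bikerental")
print(paths["output"])  # s3://rsjainaimlmodels/bikerental/output
```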

## Get Container Reference & ECR Registry Path

In [4]:
# Get the ECR image URI of the built-in XGBoost container for this region and version
container = sagemaker.amazon.amazon_estimator.get_image_uri(
    session.boto_region_name,  # region of the current session
    "xgboost",                 # algorithm (ECR repository) name
    "0.90-1",                  # algorithm version
)

print(container)

683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:0.90-1-cpu-py3
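The printed URI encodes the registry account, the region, the repository and a tag that carries the algorithm version (`0.90-1`) plus the CPU/Python variant. A small sketch that splits such a URI into those parts; the regular expression is an assumption that covers standard `<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>` URIs only, not every registry host format.

```python
import re

# Assumed layout of a private ECR image URI: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>.+)$"
)

def parse_ecr_uri(uri: str) -> dict:
    """Split an ECR image URI into registry account, region, repository and tag."""
    match = ECR_URI.match(uri)
    if match is None:
        raise ValueError(f"not an ECR image URI: {uri}")
    return match.groupdict()

parts = parse_ecr_uri("683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:0.90-1-cpu-py3")
print(parts["region"], parts["tag"])  # us-east-1 0.90-1-cpu-py3
```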


# Build Model

## Define Estimator

In [7]:
xgboostEstimator = sagemaker.estimator.Estimator(
    container,                           # training image URI from above
    role,                                # IAM role SageMaker assumes for the job
    train_instance_type='ml.m4.xlarge',
    train_instance_count=1,
    output_path=s3_output_path,          # model artifacts are written here
    sagemaker_session=session,
    base_job_name='xgboost-bikerental-v1',
)
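`base_job_name` is only a prefix: each training job needs a unique name, so a timestamp is appended to it, and the result must satisfy SageMaker's job-name rules (at most 63 characters, alphanumerics separated by hyphens). A sketch of that naming step; `make_job_name` is hypothetical and its timestamp suffix is an assumption modelled on the SDK's style, not its exact code.

```python
import re
from datetime import datetime

# Training job names: alphanumerics separated by hyphens, at most 63 characters.
JOB_NAME_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")

def make_job_name(base: str, now: datetime) -> str:
    """Append a millisecond timestamp to a base job name (hypothetical helper)."""
    stamp = now.strftime("%Y-%m-%d-%H-%M-%S") + f"-{now.microsecond // 1000:03d}"
    name = f"{base}-{stamp}"
    if len(name) > 63 or not JOB_NAME_RE.match(name):
        raise ValueError(f"invalid training job name: {name!r}")
    return name

print(make_job_name("xgboost-bikerental-v1", datetime(2020, 5, 15, 19, 48, 11, 123000)))
# xgboost-bikerental-v1-2020-05-15-19-48-11-123
```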

## Define Hyperparameters

In [15]:
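xgboostEstimator.set_hyperparameters(
    max_depth=5,                   # maximum depth of each tree
    objective='reg:squarederror',  # XGBoost 0.90+ name of the deprecated 'reg:linear'
    num_round=150,                 # number of boosting rounds
)
xgboostEstimator.hyperparameters()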

{'max_depth': 5, 'num_round': 150, 'objective': 'reg:linear'}
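The training log further down contains `Failed to parse hyperparameter objective value reg:linear to Json. Returning the value itself`. That line suggests the round trip: the job request carries hyperparameters as a string-to-string map, and the container tries to JSON-decode each value, falling back to the raw string. A sketch emulating that behaviour; `to_wire` and `from_wire` are hypothetical names, and stringifying with `str()` is an assumption.

```python
import json

def to_wire(hyperparameters: dict) -> dict:
    """Stringify keys and values, since the request's HyperParameters map holds strings only."""
    return {str(k): str(v) for k, v in hyperparameters.items()}

def from_wire(value: str):
    """Decode a value as JSON if possible, otherwise keep the raw string."""
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return value  # e.g. 'reg:squarederror' is not valid JSON

wire = to_wire({"max_depth": 5, "objective": "reg:squarederror", "num_round": 150})
decoded = {k: from_wire(v) for k, v in wire.items()}
print(wire)     # {'max_depth': '5', 'objective': 'reg:squarederror', 'num_round': '150'}
print(decoded)  # {'max_depth': 5, 'objective': 'reg:squarederror', 'num_round': 150}
```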

## Set Training & Validation Data Paths

In [22]:
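# Each channel reads every object under its S3 prefix; Pipe mode streams the data
# into the container instead of copying it to local disk first
training_input_config = sagemaker.session.s3_input(
    s3_data='s3://rsjainaimlmodels/bikerental/train/',
    content_type='csv',
    s3_data_type='S3Prefix',
    input_mode='Pipe',
)

validation_input_config = sagemaker.session.s3_input(
    s3_data='s3://rsjainaimlmodels/bikerental/val/',
    content_type='csv',
    s3_data_type='S3Prefix',
    input_mode='Pipe',
)

# The dict keys become the channel names seen by the container (/opt/ml/input/data/<name>)
data_channels = {'train': training_input_config,
                 'validation': validation_input_config}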

In [23]:
print (training_input_config.config)

{'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix', 'S3Uri': 's3://rsjainaimlmodels/bikerental/train/', 'S3DataDistributionType': 'FullyReplicated'}}, 'ContentType': 'csv', 'InputMode': 'Pipe'}
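When `fit()` is called in the next step, the SDK turns the estimator settings, hyperparameters and channel configs into a CreateTrainingJob API request. A sketch of roughly how the notebook's values map onto that request's fields; it is illustrative rather than the SDK's code, the real request also needs fields the SDK fills in (for example `ResourceConfig.VolumeSizeInGB` and `StoppingCondition`), and the job name and role ARN below are hypothetical placeholders.

```python
def training_job_request(job_name, image, role_arn, output_path,
                         instance_type, instance_count, hyperparameters, channels,
                         input_mode="File"):
    """Assemble a CreateTrainingJob-style request dict (illustrative, incomplete)."""
    return {
        "TrainingJobName": job_name,
        # Job-level input mode; a channel's own InputMode ('Pipe' here) overrides it.
        "AlgorithmSpecification": {"TrainingImage": image,
                                   "TrainingInputMode": input_mode},
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_path},
        "ResourceConfig": {"InstanceType": instance_type,
                           "InstanceCount": instance_count},
        # The API's HyperParameters field is a string-to-string map.
        "HyperParameters": {str(k): str(v) for k, v in hyperparameters.items()},
        # Each key of data_channels becomes the ChannelName of one InputDataConfig entry.
        "InputDataConfig": [dict(config, ChannelName=name)
                            for name, config in channels.items()],
    }

# Channel config in the shape printed above (the validation channel looks the same).
train_config = {
    "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                    "S3Uri": "s3://rsjainaimlmodels/bikerental/train/",
                                    "S3DataDistributionType": "FullyReplicated"}},
    "ContentType": "csv",
    "InputMode": "Pipe",
}

request = training_job_request(
    job_name="xgboost-bikerental-v1-example",                         # hypothetical
    image="683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:0.90-1-cpu-py3",
    role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",   # hypothetical
    output_path="s3://rsjainaimlmodels/bikerental/output",
    instance_type="ml.m4.xlarge",
    instance_count=1,
    hyperparameters={"max_depth": 5, "objective": "reg:squarederror", "num_round": 150},
    channels={"train": train_config},
)
print(request["InputDataConfig"][0]["ChannelName"])  # train
```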


## Train the model

In [26]:
xgboostEstimator.fit(data_channels)

2020-05-15 19:48:11 Starting - Starting the training job...
2020-05-15 19:48:14 Starting - Launching requested ML instances.........
2020-05-15 19:49:54 Starting - Preparing the instances for training......
2020-05-15 19:50:43 Downloading - Downloading input data...
2020-05-15 19:51:38 Training - Training image download completed. Training in progress.
2020-05-15 19:51:38 Uploading - Uploading generated training model.
INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:Failed to parse hyperparameter objective value reg:linear to Json.
Returning the value itself
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Running XGBoost Sagemaker in algorithm mode
INFO:root:Pipe path /opt/ml/input/data/train found.
INFO:root:Pipe path /opt/ml/input/data/validation found.
INFO:root:Single node training.

2020-05-15 19:51:45 Completed - Training job completed
Training seconds: 62
Billable seconds: 62
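The status lines in the log carry timestamps, so the time between them can be measured. A sketch that parses a few lines copied from the log above and computes the job's wall-clock time and the gaps between consecutive status lines; the `YYYY-MM-DD HH:MM:SS Phase - message` layout is taken from this log and is an assumption, not a documented format.

```python
from datetime import datetime

# Status lines copied from the training log above.
STATUS_LINES = [
    "2020-05-15 19:48:11 Starting - Starting the training job...",
    "2020-05-15 19:50:43 Downloading - Downloading input data...",
    "2020-05-15 19:51:38 Training - Training image download completed. Training in progress.",
    "2020-05-15 19:51:45 Completed - Training job completed",
]

def parse_status(line: str):
    """Return (timestamp, phase) from a 'YYYY-MM-DD HH:MM:SS Phase - message' line."""
    stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    phase = line[20:].split(" - ", 1)[0]
    return stamp, phase

events = [parse_status(line) for line in STATUS_LINES]
wall_clock = (events[-1][0] - events[0][0]).total_seconds()
print(f"wall clock: {wall_clock:.0f} s")  # wall clock: 214 s

# Gap between each status line and the next one.
for (start, phase), (end, _) in zip(events, events[1:]):
    print(f"{phase}: {(end - start).total_seconds():.0f} s")
# Starting: 152 s
# Downloading: 55 s
# Training: 7 s
```

Compared with the 62 billable seconds reported, most of the roughly 3.5 minutes of wall-clock time passed before the Training status appeared.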


## Deploy the model

In [28]:
predictor = xgboostEstimator.deploy(initial_instance_count=1,
                                    instance_type='ml.m4.xlarge',
                                    endpoint_name='xgboost-bikerental-v1')

---------------!

In [29]:
print (predictor.endpoint)

xgboost-bikerental-v1
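Once the endpoint is in service it can be called from any client, not only through the `predictor` object. A sketch using boto3's `sagemaker-runtime` client, assuming AWS credentials that are allowed to invoke the endpoint; `invoke_request` and `predict_with_boto3` are hypothetical helpers, and parsing the body as a single float assumes one record per request.

```python
def invoke_request(endpoint_name: str, features: list) -> dict:
    """Build keyword arguments for the SageMaker runtime InvokeEndpoint call."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "text/csv",
        "Body": ",".join(str(x) for x in features).encode("utf-8"),
    }

def predict_with_boto3(endpoint_name: str, features: list) -> float:
    """Call a live endpoint directly (requires AWS credentials and a deployed endpoint)."""
    import boto3  # imported lazily so invoke_request() works without boto3 installed
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(**invoke_request(endpoint_name, features))
    return float(response["Body"].read().decode("utf-8"))  # assumes one prediction
```

Usage: `predict_with_boto3('xgboost-bikerental-v1', [1,0,0,1,9.84,14.395,81,0.0,2011,1,1,0,0,0])`.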


## Run Predictions

In [30]:
from sagemaker.predictor import csv_serializer

# Send requests as CSV: comma-separated feature values, one line per record
predictor.content_type = 'text/csv'
predictor.serializer = csv_serializer
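Roughly what the CSV serializer does with the list passed to `predict()`: a sketch written from scratch, not the SDK's source. The nested-list branch assumes one CSV line per record, the layout the built-in XGBoost container reads for multi-record CSV requests; `to_csv_body` is a hypothetical name.

```python
def to_csv_body(data) -> str:
    """Serialize one record (flat list) or several records (list of lists) as CSV text."""
    def row(values):
        return ",".join(str(v) for v in values)
    if data and isinstance(data[0], (list, tuple)):
        return "\n".join(row(r) for r in data)  # one line per record
    return row(data)

print(to_csv_body([1, 0, 0, 1, 9.84, 14.395, 81, 0.0, 2011, 1, 1, 0, 0, 0]))
# 1,0,0,1,9.84,14.395,81,0.0,2011,1,1,0,0,0
```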

In [31]:
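# One record of 14 feature values, in the training CSV's column order (without the label column)
pred_booking_count = predictor.predict([1,0,0,1,9.84,14.395,81,0.0,2011,1,1,0,0,0])
print(pred_booking_count)  # raw response bytes, since no deserializer is set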

b'34.38656234741211'
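With no deserializer configured, the prediction comes back as raw bytes, as the `b'...'` output shows. A sketch that decodes such a response into floats and, if a whole number of bookings is wanted, rounds it; `parse_predictions` is hypothetical, and accepting both commas and newlines as separators for multi-record responses is an assumption.

```python
import re

def parse_predictions(body: bytes) -> list:
    """Decode a CSV prediction response into floats (comma- or newline-separated)."""
    text = body.decode("utf-8").strip()
    return [float(tok) for tok in re.split(r"[,\n]+", text) if tok.strip()]

pred = parse_predictions(b'34.38656234741211')
print(round(pred[0]))  # 34
```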


## Run Predictions on Test Data