# Offline Workflow
Run through the notebook to:
- build a dummy trainable image
- push it to Amazon ECR
- create dummy data and upload it to Amazon S3
- train with Amazon SageMaker

https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html

In [1]:
import boto3
import sagemaker
# from sagemaker import get_execution_role

### Set up the execution account and roles:

In [3]:
# No need to create the ECR repository by hand: build_and_push.sh creates it if it does not exist.
#     349934695336.dkr.ecr.us-west-2.amazonaws.com/basic-sagemaker-train
ecr_namespace = '349934695336.dkr.ecr.us-west-2.amazonaws.com/'
prefix = 'basic-sagemaker-train' # sagemaker training job prefix
ecr_repository_name = 'basic-sagemaker-train'

#role arn: arn:aws:iam::349934695336:role/basic-sagemaker-role
role = 'arn:aws:iam::349934695336:role/basic-sagemaker-role'
account_id = role.split(':')[4]  #349934695336
region = 'us-west-2'

sagemaker_session = sagemaker.session.Session()
bucket = sagemaker_session.default_bucket()

print(account_id)
print(region)
print(role)
print(bucket)

349934695336
us-west-2
arn:aws:iam::349934695336:role/basic-sagemaker-role
sagemaker-us-west-2-349934695336
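
The cell above takes the account ID from the fifth colon-separated field of the role ARN. A slightly more defensive version of that split might look like the sketch below. The `Arn` tuple and `parse_arn` helper are hypothetical names introduced for illustration; they are not part of boto3 or the SageMaker SDK.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    """Fields of an ARN: arn:partition:service:region:account-id:resource."""
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Hypothetical helper: split an ARN into its named fields."""
    # maxsplit=5 keeps any ':' characters inside the resource part intact.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return Arn(*parts[1:])


parsed = parse_arn("arn:aws:iam::349934695336:role/basic-sagemaker-role")
print(parsed.account_id)
print(parsed.resource)
```

IAM is a global service, so the region field of a role ARN is empty. That is why `region` is still set by hand in the notebook.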


### Build the train-able dummy image and push to ECR:

In [None]:
# print Dockerfile
! pygmentize ../docker/Dockerfile

In [5]:
# print build and push script:
! pygmentize ../scripts/build_and_push.sh

ACCOUNT_ID=$1
REGION=$2
REPO_NAME=$3

docker build -f ../docker/Dockerfile -t $REPO_NAME ../docker

docker tag $REPO_NAME $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO_NAME:latest

$(aws ecr get-login --no-include-email --registry-ids $ACCOUNT_ID)

aws ecr describe-repositories --repository-names $REPO_NAME || aws ecr create-repository --repository-name $REPO_NAME

docker push $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO_NAME:latest


In [4]:
%%capture
! ../scripts/build_and_push.sh $account_id $region $ecr_repository_name
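
The script logs in with `aws ecr get-login`, which exists in AWS CLI version 1 but was removed in version 2. With version 2 the usual replacement is `aws ecr get-login-password` piped into `docker login`. The sketch below shows that login step from Python, assuming the `aws` and `docker` executables are on the PATH. `ecr_registry`, `ecr_login` and the injectable `run` parameter are hypothetical names for illustration, not part of the original script.

```python
import subprocess


def ecr_registry(account_id: str, region: str) -> str:
    """Registry host name used by the docker tag/push commands in the script."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_login(account_id: str, region: str, run=subprocess.run) -> str:
    """Hypothetical helper: log docker in to ECR using AWS CLI v2 commands."""
    registry = ecr_registry(account_id, region)
    # Step 1: ask the AWS CLI for a temporary registry password.
    token = run(
        ["aws", "ecr", "get-login-password", "--region", region],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # Step 2: hand the password to docker on stdin so it never appears in argv.
    run(
        ["docker", "login", "--username", "AWS", "--password-stdin", registry],
        check=True, input=token, text=True,
    )
    return registry
```

Calling `ecr_login(account_id, region)` before the `docker push` step would stand in for the `$(aws ecr get-login ...)` line. The `run` parameter only exists so the function can be exercised without real credentials.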

### Prepare training data:

In [21]:
# We upload some dummy data to Amazon S3, in order to define our S3-based training channels.
! echo "val1, val2, val3" > dummy.csv
print(sagemaker_session.upload_data('dummy.csv', bucket, prefix + '/train'))
print(sagemaker_session.upload_data('dummy.csv', bucket, prefix + '/val'))
# remove after upload
! rm dummy.csv

s3://sagemaker-us-west-2-349934695336/basic-sagemaker-train/train/dummy.csv
s3://sagemaker-us-west-2-349934695336/basic-sagemaker-train/val/dummy.csv
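
The same dummy file can be produced without shell commands. The sketch below writes it with the `csv` module into a temporary directory, so no cleanup step is needed, and reconstructs the URIs printed above. `write_dummy_csv` and `expected_s3_uri` are hypothetical helpers. The URI layout (`s3://<bucket>/<key_prefix>/<file name>`) is inferred from the output above rather than taken from the SDK source.

```python
import csv
import posixpath
import tempfile
from pathlib import Path


def write_dummy_csv(directory: str, name: str = "dummy.csv") -> Path:
    """Write a one-row CSV file and return its path."""
    path = Path(directory) / name
    with path.open("w", newline="") as f:
        # csv.writer emits "val1,val2,val3" without the spaces that echo added.
        csv.writer(f).writerow(["val1", "val2", "val3"])
    return path


def expected_s3_uri(bucket: str, key_prefix: str, local_path: Path) -> str:
    """Assumed layout of the URI returned for a single uploaded file."""
    return "s3://" + posixpath.join(bucket, key_prefix, local_path.name)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = write_dummy_csv(tmp)
    for folder in ("train", "val"):
        # In the notebook, this is where sagemaker_session.upload_data(...) runs.
        print(expected_s3_uri("sagemaker-us-west-2-349934695336",
                              "basic-sagemaker-train/" + folder, csv_path))
```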


### Training with Amazon SageMaker

In [19]:
# Training with SageMaker requires the ECR path of the training image.
image_uri = '{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest'.format(account_id, region, ecr_repository_name)
print(image_uri)

349934695336.dkr.ecr.us-west-2.amazonaws.com/basic-sagemaker-train:latest


Finally, we run the training job by calling the fit() method of the generic Estimator object defined in the Amazon SageMaker Python SDK (https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/estimator.py). Under the hood, this calls the CreateTrainingJob API (https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTrainingJob.html).
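
To make the link to CreateTrainingJob concrete, the sketch below builds a request body of roughly the shape the SDK would send for this notebook. It is an approximation, not the SDK's exact output. `build_training_job_request` is a hypothetical helper, and the job name, output path, volume size and runtime limit are assumed values chosen for illustration.

```python
def build_training_job_request(job_name, image_uri, role_arn, channels,
                               output_path, hyperparameters,
                               instance_type="ml.m5.xlarge", instance_count=1):
    """Hypothetical helper: assemble a CreateTrainingJob request dictionary."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": name,
                "ContentType": "text/csv",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
            for name, uri in channels.items()
        ],
        "OutputDataConfig": {"S3OutputPath": output_path},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,  # assumed value
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},  # assumed value
        # The API only accepts string values for hyperparameters.
        "HyperParameters": {k: str(v) for k, v in hyperparameters.items()},
    }


bucket = "sagemaker-us-west-2-349934695336"
request = build_training_job_request(
    job_name="basic-sagemaker-train-example",  # assumed name
    image_uri="349934695336.dkr.ecr.us-west-2.amazonaws.com/basic-sagemaker-train:latest",
    role_arn="arn:aws:iam::349934695336:role/basic-sagemaker-role",
    channels={
        "train": f"s3://{bucket}/basic-sagemaker-train/train/",
        "validation": f"s3://{bucket}/basic-sagemaker-train/val/",
    },
    output_path=f"s3://{bucket}/",  # assumed output location
    hyperparameters={"hp1": "value1", "hp2": 300, "hp3": 0.001},
)
print(request["HyperParameters"])
```

Passing such a dictionary to `boto3.client("sagemaker").create_training_job(**request)` would submit the job directly. The Estimator is the more convenient route because it also generates the job name and streams the logs.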

In [26]:
est = sagemaker.estimator.Estimator(image_uri,
                                    role, 
                                    instance_count=1, 
                                    instance_type='ml.m5.xlarge',
                                    base_job_name=prefix)

In [27]:
est.set_hyperparameters(hp1='value1',
                        hp2=300,
                        hp3=0.001)
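
Inside the training container these values arrive as strings (the training log in this notebook shows `'hp2': '300'`), and SageMaker writes them to `/opt/ml/input/config/hyperparameters.json`. The sketch below shows how a training script could read them and restore the numeric types. `load_hyperparameters` is a hypothetical helper, and the casts assume the three hyperparameters set above.

```python
import json
from pathlib import Path

# Location where SageMaker places the hyperparameters inside the container.
HYPERPARAMS_PATH = Path("/opt/ml/input/config/hyperparameters.json")


def load_hyperparameters(path=HYPERPARAMS_PATH):
    """Hypothetical helper: read hyperparameters and cast them back to numbers."""
    raw = json.loads(Path(path).read_text())
    return {
        "hp1": raw["hp1"],         # stays a string
        "hp2": int(raw["hp2"]),    # arrives as '300'
        "hp3": float(raw["hp3"]),  # arrives as '0.001'
    }
```

Outside a container the function can be pointed at any JSON file with the same keys, which makes the parsing logic easy to check locally.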

In [29]:
train_config = sagemaker.inputs.TrainingInput('s3://{0}/{1}/train/'.format(bucket, prefix), content_type='text/csv')
val_config = sagemaker.inputs.TrainingInput('s3://{0}/{1}/val/'.format(bucket, prefix), content_type='text/csv')

est.fit({'train': train_config, 'validation': val_config })

2020-09-08 07:30:20 Starting - Starting the training job...
2020-09-08 07:30:22 Starting - Launching requested ML instances......
2020-09-08 07:31:48 Starting - Preparing the instances for training......
2020-09-08 07:32:45 Downloading - Downloading input data
2020-09-08 07:32:45 Training - Downloading the training image...
2020-09-08 07:33:24 Training - Training image download completed. Training in progress................
2020-09-08 07:35:56 Uploading - Uploading generated training model.
Running training...

Hyperparameters configuration:
{'hp1': 'value1', 'hp2': '300', 'hp3': '0.001'}

Input data configuration:
{'train': {'ContentType': 'text/csv',
           'RecordWrapperType': 'None',
           'S3DistributionType': 'FullyReplicated',
           'TrainingInputMode': 'File'},
 'validation': {'ContentType': 'text/csv',
                'RecordWrapperType': 'None',
                'S3DistributionType': 'FullyReplicated',
                'Tra