<h1>Distributed Image Classification</h1>

This notebook shows how to run the Apache MXNet example image classification task as a distributed training job on Amazon SageMaker.

Let's start by setting some variables and getting the execution role.

In [1]:
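from sagemaker import get_execution_role

# IAM role of this notebook instance; SageMaker assumes it to run the training job
role = get_execution_role()
# S3 bucket and key prefix under which training code and model artifacts are stored
bucket_name = 'immersionday-sagemaker-test'
bucket_key_prefix = 'distributedimgclass'

print(role)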

arn:aws:iam::825935527263:role/service-role/AmazonSageMaker-ExecutionRole-20181106T101932
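The bucket and prefix set above are combined in the training cell below into an output path and a code location. Judging from the SM_MODULE_DIR value in the training log, each job's code is uploaded under the code location in a folder named after the job, and the job name is base_job_name followed by a timestamp with milliseconds. The following is a minimal sketch of that naming; the helpers `job_name` and `s3_prefixes` are hypothetical and only mimic the pattern visible in the log, not the SDK's own implementation.

```python
from datetime import datetime


def job_name(base, now):
    """Hypothetical helper: append a '%Y-%m-%d-%H-%M-%S-<ms>' timestamp to a base name."""
    stamp = now.strftime("%Y-%m-%d-%H-%M-%S") + "-{:03d}".format(now.microsecond // 1000)
    return "{}-{}".format(base, stamp)


def s3_prefixes(bucket, prefix, job):
    """Hypothetical helper: build the output path, code location and per-job code prefix."""
    output_path = "s3://{0}/{1}/output".format(bucket, prefix)
    code_location = "s3://{0}/{1}/code".format(bucket, prefix)
    return output_path, code_location, "{}/{}".format(code_location, job)


name = job_name("imgclass-mxnet", datetime(2019, 6, 3, 16, 35, 23, 190000))
print(name)
print(s3_prefixes("immersionday-sagemaker-test", "distributedimgclass", name))
```

With the timestamp of the run below, the job name comes out as imgclass-mxnet-2019-06-03-16-35-23-190, which matches the job folder in the logged SM_MODULE_DIR.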


<h2>Training the model with MXNet</h2>

Training runs in the prebuilt MXNet container provided by Amazon SageMaker. First, let's look at the script that defines our neural network.

In [None]:
!cat 'sources/image_classification.py'
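The script's contents are not reproduced here. As a hypothetical sketch, an entry point like this one can read its hyperparameters with argparse, because SageMaker passes them to the script as command-line flags (the training log below lists them under SM_USER_ARGS). The flag names mirror the estimator's hyperparameters; the `parse_args` helper and its defaults are assumptions, not code taken from image_classification.py.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser for the hyperparameters passed as command-line flags."""
    parser = argparse.ArgumentParser(description="MXNet image classification (sketch)")
    # Flag names mirror the estimator's hyperparameters; defaults are illustrative.
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--lr", type=float, default=0.1)
    parser.add_argument("--model", type=str, default="vgg11")
    parser.add_argument("--kvstore", type=str, default="device")
    parser.add_argument("--dataset", type=str, default="cifar10")
    # parse_known_args tolerates extra flags instead of exiting with an error.
    args, _unknown = parser.parse_known_args(argv)
    return args


# The argument list below is copied from SM_USER_ARGS in the training log.
args = parse_args(["--batch-size", "32", "--dataset", "cifar10", "--epochs", "1",
                   "--kvstore", "device", "--lr", "0.1", "--model", "vgg11"])
print(args.batch_size, args.lr, args.model, args.kvstore)
```

argparse turns "--batch-size" into the attribute batch_size, so the hyphenated hyperparameter names used in the estimator need no special handling in the script.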

Now we can launch training with the MXNet estimator of the SageMaker Python SDK. The job uses two ml.p3.8xlarge instances and enables MXNet parameter servers for distributed training.
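It can help to see what the training container receives from these estimator arguments. The log below reports the hyperparameters under SM_USER_ARGS as "--key value" pairs sorted by key, and the parameter-server switch under SM_FRAMEWORK_PARAMS as sagemaker_parameter_server_enabled. The simplified sketch below reproduces both values; `container_view` and `to_cmd_args` are hypothetical helpers that only approximate what the SDK and the container actually do, and the routing of "sagemaker_"-prefixed keys is an assumption inferred from the log.

```python
import json


def container_view(hyperparameters, distributions):
    """Hypothetical: split estimator settings into script args and framework params."""
    hps = dict(hyperparameters)
    # Assumption: the parameter-server switch travels as an extra
    # hyperparameter with a 'sagemaker_' prefix.
    if "parameter_server" in distributions:
        hps["sagemaker_parameter_server_enabled"] = distributions["parameter_server"].get("enabled", False)
    # 'sagemaker_'-prefixed keys stay with the container; the rest go to the script.
    framework = {k: v for k, v in hps.items() if k.startswith("sagemaker_")}
    user = {k: v for k, v in hps.items() if not k.startswith("sagemaker_")}
    return user, framework


def to_cmd_args(hyperparameters):
    """Turn {'batch-size': 32, ...} into ['--batch-size', '32', ...], sorted by key."""
    args = []
    for key in sorted(hyperparameters):
        args += ["--" + key, str(hyperparameters[key])]
    return args


user, framework = container_view(
    {'batch-size': 32, 'epochs': 1, 'lr': 0.1, 'model': 'vgg11',
     'kvstore': 'dist_sync_device', 'dataset': 'cifar10'},
    {'parameter_server': {'enabled': True}})
print(json.dumps(to_cmd_args(user), separators=(",", ":")))
print(json.dumps(framework, separators=(",", ":")))
```

With the values from the cell below, the sketch yields --kvstore dist_sync_device, while the log shows --kvstore device, so the logged run appears to have used a different kvstore setting.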

In [8]:
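from sagemaker.mxnet import MXNet

# S3 locations for the trained model artifacts and the uploaded training code
output_path = 's3://{0}/{1}/output'.format(bucket_name, bucket_key_prefix)
code_location = 's3://{0}/{1}/code'.format(bucket_name, bucket_key_prefix)

m = MXNet('image_classification.py',             # entry-point script inside source_dir
          role=role,
          source_dir='sources/',
          train_instance_count=2,                  # two instances for distributed training
          train_instance_type='ml.p3.8xlarge',
          output_path=output_path,
          base_job_name='imgclass-mxnet',
          framework_version='1.3.0',
          code_location=code_location,
          py_version='py3',
          # start MXNet parameter servers on the training instances
          distributions={'parameter_server': {'enabled': True}},
          # passed to the script as command-line arguments
          hyperparameters={'batch-size': 32,
                           'epochs': 1,
                           'lr': 0.1,
                           'model': 'vgg11',
                           'kvstore': 'dist_sync_device',
                           'dataset': 'cifar10'})

# No input channels are given: the script downloads CIFAR-10 itself (see the log)
m.fit()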

2019-06-03 16:35:23 Starting - Starting the training job...
2019-06-03 16:35:24 Starting - Launching requested ML instances......
2019-06-03 16:36:27 Starting - Preparing the instances for training......
2019-06-03 16:37:35 Downloading - Downloading input data
2019-06-03 16:37:35 Training - Downloading the training image...
2019-06-03 16:38:11 Training - Training image download completed. Training in progress.
2019-06-03 16:38:12,648 sagemaker-containers INFO     Imported framework sagemaker_mxnet_container.training
2019-06-03 16:38:12,693 sagemaker_mxnet_container.training INFO     MXNet training environment: {'SM_HP_DATASET': 'cifar10', 'SM_USER_ARGS': '["--batch-size","32","--dataset","cifar10","--epochs","1","--kvstore","device","--lr","0.1","--model","vgg11"]', 'SM_FRAMEWORK_PARAMS': '{"sagemaker_parameter_server_enabled":true}', 'SM_HP_EPOCHS': '1', 'SM_MODULE_DIR': 's3://immersionday-sagemaker-test/distributedimgclass/code/imgclass-mxnet-2019-06-03-16-35-23-190/sou

INFO:root:downloaded http://data.mxnet.io/mxnet/data/cifar10.zip into data/cifar10.zip successfully
[16:38:32] src/io/iter_image_recordio_2.cc:170: ImageRecordIOParser2: data/cifar/train.rec, use 4 threads for decoding..
[16:38:35] src/io/iter_image_recordio_2.cc:170: ImageRecordIOParser2: data/cifar/test.rec, use 4 threads for decoding..
[16:38:46] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:109: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:Epoch[0] Batch [49]	Speed: 4554.926035 samples/sec	accuracy=0.102344, top_k_accuracy_5=0.490625
INFO:root:Epoch[0] Batch [99]	Speed: 4940.514710 samples/sec	accuracy=0.098750, top_k_accuracy_5=0.498984
INFO:root:Epoch[0] Batch [149]	Speed: 4148.540414 samples/sec	accuracy=0.099062, top_k_accuracy_5=0.501198
INFO:root:Epoch[0] Batch [199]#01