## Training SageMaker Models Using DGL with the MXNet Backend
The **SageMaker Python SDK** makes it easy to train DGL models. In this example, we train a simple graph neural network using the [DMLC DGL API](https://github.com/dmlc/dgl.git) and the [Cora dataset](https://relational.fit.cvut.cz/dataset/CORA). The Cora dataset describes a citation network: it consists of 2708 scientific publications, each classified into one of seven classes, connected by 5429 citation links. The task is to train a node classification model on this dataset.

For more details about graph neural networks and this example, see https://docs.dgl.ai/en/latest/tutorials/models/1_gnn/1_gcn.html

### Prepare
First, we install the necessary packages.

In [2]:
!conda install -y boto3
!conda install -c anaconda -y botocore

Solving environment: done


  current version: 4.5.12
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/ec2-user/anaconda3/envs/mxnet_p36

  added / updated specs: 
    - boto3


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    boto3-1.10.19              |             py_0          91 KB
    botocore-1.13.19           |             py_0         3.3 MB
    ------------------------------------------------------------
                                           Total:         3.4 MB

The following packages will be UPDATED:

    boto3:    1.9.234-py_0  --> 1.10.19-py_0
    botocore: 1.12.234-py_0 --> 1.13.19-py_0


Downloading and Extracting Packages
boto3-1.10.19        | 91 KB     | ##################################### | 100% 
botocore-1.13.19     | 3.3 MB    | #########################

### Setup
We define a few variables that are used later in the example.

In [3]:
import sagemaker
from sagemaker import get_execution_role
from sagemaker.session import Session

# Setup session
sess = sagemaker.Session()

# S3 bucket for saving code and model artifacts.
# Feel free to specify a different bucket here if you wish.
bucket = sess.default_bucket()

# Location to put your custom code.
custom_code_upload_location = 'customcode'

# IAM execution role that gives SageMaker access to resources in your AWS account.
# We can use the SageMaker Python SDK to get the role from our notebook environment. 
role = get_execution_role()
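
The cells shown here define `custom_code_upload_location` but do not pass it to the estimator. As a minimal sketch of how a bucket name and a prefix such as `customcode` could be combined into an S3 URI, the helper below (`s3_uri` is a hypothetical name, not part of the SageMaker SDK) joins the parts without duplicate slashes. The bucket name is a placeholder; in the notebook it would come from `sess.default_bucket()`. The resulting string could then be supplied to the estimator, for example through its `code_location` argument, assuming the SDK version in use accepts it.

```python
def s3_uri(bucket, *parts):
    """Join a bucket name and key parts into an s3:// URI (hypothetical helper)."""
    # Drop empty parts and surrounding slashes so the key has no '//' runs.
    cleaned = [p.strip("/") for p in parts if p and p.strip("/")]
    key = "/".join(cleaned)
    if key:
        return "s3://{}/{}".format(bucket, key)
    return "s3://{}".format(bucket)


# Placeholder bucket name for illustration only.
example_bucket = "my-example-bucket"
code_location = s3_uri(example_bucket, "customcode")
print(code_location)  # s3://my-example-bucket/customcode
```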

### The training script
The mxnet_gcn.py script provides all the code we need for training a SageMaker model. 

In [4]:
!cat mxnet_gcn.py

#!/usr/bin/env python
# coding: utf-8

"""GCN using DGL nn package
References:
- Semi-Supervised Classification with Graph Convolutional Networks
- Paper: https://arxiv.org/abs/1609.02907
- Code: https://github.com/tkipf/gcn
"""
import mxnet as mx
from mxnet import gluon
import os
import argparse
import dgl
from dgl.nn.mxnet import GraphConv

import time
import json
import numpy as np
from mxnet import gluon

from dgl import DGLGraph
from dgl.data import register_data_args, load_data

import collections
class GCN(gluon.Block):
    def __init__(self,
                 g,
                 in_feats,
                 n_hidden,
                 n_classes,
                 n_layers,
                 activation,
                 dropout):
        super(GCN, self).__init__()
        self.g = g
        self.layers = gluon.nn.Sequential()
        # input layer
        self.layers.add(GraphConv(in_feats, n_hidden, activation=activation))
        # hidden laye

### SageMaker's estimator class
The SageMaker Estimator allows us to run a single-machine training job in SageMaker, using CPU or GPU-based instances.

When we create the estimator, we pass in the filename of our training script and the name of our IAM execution role. We also provide a few other parameters. train_instance_count and train_instance_type determine the number and type of SageMaker instances that will be used for the training job. The hyperparameters parameter is a dict of values that will be passed to your training script. You can see how to access these values in the mxnet_gcn.py script above.
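
Because the listing of mxnet_gcn.py above is cut off before its argument handling, here is a minimal sketch of how a training script can read such hyperparameters. It assumes the container passes each entry of the hyperparameters dict as a command-line argument (for example `--dataset cora`) and exposes the model directory in the `SM_MODEL_DIR` environment variable. The `--n-epochs` and `--lr` options and the function name `parse_hyperparameters` are hypothetical and are not taken from the original script.

```python
import argparse
import os


def parse_hyperparameters(argv=None):
    """Parse hyperparameters passed to the entry point as command-line flags."""
    parser = argparse.ArgumentParser(description="Illustrative GCN training arguments")
    # 'dataset' matches the key set in the notebook's params dict.
    parser.add_argument("--dataset", type=str, default="cora")
    # The next two options are hypothetical examples of further hyperparameters.
    parser.add_argument("--n-epochs", type=int, default=200)
    parser.add_argument("--lr", type=float, default=0.01)
    # Assumption: the training container sets SM_MODEL_DIR; fall back to a local default.
    parser.add_argument("--model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    # Ignore flags this sketch does not know about instead of failing.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the arguments produced by hyperparameters={'dataset': 'cora'}.
args = parse_hyperparameters(["--dataset", "cora"])
print(args.dataset, args.n_epochs, args.lr)
```

Using `parse_known_args` rather than `parse_args` keeps the sketch tolerant of extra flags that the container may add.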

Here we use the official Docker image for this example. See https://github.com/aws/sagemaker-mxnet-container for more information.


In [5]:
from sagemaker.mxnet.estimator import MXNet

CODE_PATH = 'mxnet_gcn.py'

account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
docker_name = 'beta-mxnet-training'
docker_tag = '1.6.0-py3-gpu-build'
image = '{}.dkr.ecr.{}.amazonaws.com/{}:{}'.format(account, region, docker_name, docker_tag)
print(image)

params = {}
params['dataset'] = 'cora'
estimator = MXNet(entry_point=CODE_PATH,
                        role=role, 
                        train_instance_count=1, 
                        train_instance_type='ml.p3.2xlarge',
                        image_name=image,
                        hyperparameters=params,
                        sagemaker_session=sess)

No framework_version specified, defaulting to version 1.2. This is not the latest supported version. If you would like to use version 1.4.1, please add framework_version=1.4.1 to your constructor.
The Python 2 mxnet images will be soon deprecated and may not be supported for newer upcoming versions of the mxnet images.
Please set the argument "py_version='py3'" to use the Python 3 mxnet image.


397262719838.dkr.ecr.us-east-2.amazonaws.com/beta-mxnet-training:1.6.0-py3-gpu-build


### Running the Training Job
After we've constructed our Estimator object, we can fit it using SageMaker (the dataset will be downloaded automatically). Below we run SageMaker training on one channel: training-code, which holds the code to run.

In [6]:
estimator.fit()

2019-11-27 01:19:57 Starting - Starting the training job...
2019-11-27 01:19:58 Starting - Launching requested ML instances......
2019-11-27 01:21:22 Starting - Preparing the instances for training......
2019-11-27 01:22:04 Downloading - Downloading input data
2019-11-27 01:22:04 Training - Downloading the training image.........
2019-11-27 01:23:45 Training - Training image download completed. Training in progress.
2019-11-27 01:23:46,223 sagemaker-containers INFO     Imported framework sagemaker_mxnet_container.training
2019-11-27 01:23:46,249 sagemaker_mxnet_container.training INFO     MXNet training environment: {'SM_HOSTS': '["algo-1"]', 'SM_NETWORK_INTERFACE_NAME': 'eth0', 'SM_HPS': '{"dataset":"cora"}', 'SM_USER_ENTRY_POINT': 'mxnet_gcn.py', 'SM_FRAMEWORK_PARAMS': '{}', 'SM_RESOURCE_CONFIG': '{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"eth0"}', 'SM_INPUT_DATA_CONFIG': '{}', 'SM_OUTPUT_DATA_DIR': '/opt/ml/output/data', 'SM_CHANNELS': '[]', '


2019-11-27 01:24:16 Uploading - Uploading generated training model
2019-11-27 01:24:16 Completed - Training job completed
Training seconds: 138
Billable seconds: 138


## Output
You can find the model training output in the SageMaker console by searching for the training job and looking up the address listed under 'S3 model artifact'.
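
The same address is also available in the notebook through the estimator's `model_data` attribute once `fit()` has completed. As a minimal sketch, assuming the artifact address has the usual `s3://bucket/key` form, the code below splits such a URI into bucket and key and shows how the file could be fetched with boto3. The function names and the example URI are hypothetical placeholders, and the download function is defined but not called here because it needs AWS credentials.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: {!r}".format(uri))
    return parsed.netloc, parsed.path.lstrip("/")


def download_model_artifact(uri, local_path="model.tar.gz"):
    """Download the artifact to local_path (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing helper works without boto3

    bucket, key = split_s3_uri(uri)
    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path


# Placeholder URI for illustration; in the notebook, use estimator.model_data.
example_uri = "s3://my-example-bucket/example-job/output/model.tar.gz"
print(split_s3_uri(example_uri))
# ('my-example-bucket', 'example-job/output/model.tar.gz')
```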