# Using Keras Scripts in SageMaker - Quickstart

Starting with TensorFlow version 1.11, you can use SageMaker's TensorFlow containers to train TensorFlow scripts the same way you would train outside SageMaker. This feature is named **Script Mode**. 

This example is adapted from 
[Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow](https://github.com/sherjilozair/char-rnn-tensorflow). 
You can use the same technique for other scripts or repositories, including 
[TensorFlow Model Zoo](https://github.com/tensorflow/models) and 
[TensorFlow benchmark scripts](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks).

For this notebook we use the Keras version of char-rnn, [char-rnn-keras](https://github.com/ekzhang/char-rnn-keras). We’ll train an RNN character-level language model: we give the RNN a large chunk of text and ask it to model the probability distribution of the next character given the sequence of previous characters. This then lets us generate new text one character at a time.
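
To make "model the next character, then generate one character at a time" concrete, here is a minimal sketch that swaps the RNN for a much simpler stand-in: a table of character-pair counts. It is not the char-rnn-keras model, and the function names `fit_bigram_counts` and `generate` are hypothetical. Only the sampling loop carries over to the real setting.

```python
import random
from collections import Counter, defaultdict


def fit_bigram_counts(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev_char, next_char in zip(text, text[1:]):
        counts[prev_char][next_char] += 1
    return counts


def generate(counts, seed_char, length, rng):
    """Generate text one character at a time by sampling from the counts."""
    out = [seed_char]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:  # no observed continuation, so stop early
            break
        chars = list(followers)
        weights = [followers[c] for c in chars]
        # Sample the next character in proportion to how often it was seen.
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


counts = fit_bigram_counts("hello hello hello")  # toy training text
sample = generate(counts, "h", 10, random.Random(0))
print(sample)
```

An RNN replaces the count table with a learned function of the whole preceding context, but generation still proceeds by sampling one character and feeding it back in.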

As a working example, suppose we had a vocabulary of only four letters, “helo”, and wanted to train an RNN on the training sequence “hello”. This training sequence is in fact a source of four separate training examples: (1) “e” should be likely given the context “h”, (2) “l” should be likely given the context “he”, (3) “l” should also be likely given the context “hel”, and (4) “o” should be likely given the context “hell”.
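
The four training examples can be enumerated mechanically. The following sketch uses a hypothetical helper, `next_char_examples`, which is not part of char-rnn-keras, to split a sequence into (context, target) pairs:

```python
def next_char_examples(sequence):
    """Return (context, target) pairs for next-character prediction."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


examples = next_char_examples("hello")
for context, target in examples:
    print(f"context={context!r} -> target={target!r}")
```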

### Get the data
For training data, we use plain text versions of Sherlock Holmes stories.
Let's create a folder named **sherlock** to store our dataset:

In [19]:
import os
data_dir = os.path.join(os.getcwd(), 'sherlock')

os.makedirs(data_dir, exist_ok=True)

We need to download the dataset to this folder:

In [20]:
!wget https://sherlock-holm.es/stories/plain-text/cnus.txt --force-directories --output-document=sherlock/input.txt

--2019-08-06 00:41:31--  https://sherlock-holm.es/stories/plain-text/cnus.txt
Resolving sherlock-holm.es (sherlock-holm.es)... 78.46.175.31, 2a01:4f8:c0c:1dea::2
Connecting to sherlock-holm.es (sherlock-holm.es)|78.46.175.31|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3382026 (3.2M) [text/plain]
Saving to: ‘sherlock/input.txt’


2019-08-06 00:41:32 (4.28 MB/s) - ‘sherlock/input.txt’ saved [3382026/3382026]
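
A character-level model usually works on integer indices rather than raw characters. The sketch below shows one common way to build that mapping. The helper names `build_vocab` and `encode` are hypothetical, and the actual preprocessing in char-rnn-keras may differ.

```python
def build_vocab(text):
    """Map each distinct character to an integer index (sorted for stability)."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}


def encode(text, char_to_idx):
    """Convert a string into a list of integer indices."""
    return [char_to_idx[ch] for ch in text]


sample_text = "hello"  # stand-in for the contents of sherlock/input.txt
char_to_idx = build_vocab(sample_text)
encoded = encode(sample_text, char_to_idx)
print(char_to_idx)
print(encoded)
```

To apply this to the downloaded dataset, read `sherlock/input.txt` into a string and pass it in place of `sample_text`.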



The training script executes in the container as shown below:

```bash
python train.py --epochs 1 --data_dir /opt/ml/input/data/training --model_dir /opt/ml/model
```

## Test locally using SageMaker Python SDK TensorFlow Estimator

You can use the SageMaker Python SDK [`TensorFlow`](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/README.rst#training-with-tensorflow) estimator to easily train locally and in SageMaker. 

For this notebook, we use Keras with the TensorFlow backend.

Let's start by setting the training script arguments `--epochs` and `--data_dir` as hyperparameters. The only changes made to the original code so that the script runs natively in SageMaker were converting the `data_dir`, `model_dir`, and `log_dir` variables into arguments that can be passed to the script.

In [7]:
hyperparameters = {'epochs': 1, 'data_dir': '/opt/ml/input/data/training'}
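
On the script side, these hyperparameters arrive as ordinary command-line arguments. The sketch below shows roughly how a script like `train.py` could declare them with `argparse`. The default values and the `parse_args` helper are assumptions made for illustration, not the actual char-rnn-keras code.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments that the notebook passes as hyperparameters."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=1)
    # The default paths below are placeholders for running outside SageMaker.
    parser.add_argument('--data_dir', type=str, default='./data')
    parser.add_argument('--model_dir', type=str, default='./model')
    parser.add_argument('--log_dir', type=str, default='./logs')
    return parser.parse_args(argv)


args = parse_args(['--epochs', '1', '--data_dir', '/opt/ml/input/data/training'])
print(args.epochs, args.data_dir, args.model_dir)
```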

This notebook shows how to use the SageMaker Python SDK to run your code in a local container before deploying to SageMaker's managed training or hosting environments. Just change your estimator's `train_instance_type` to `local` or `local_gpu`. For more information, see: https://github.com/aws/sagemaker-python-sdk#local-mode.

To use this feature, you need to install docker-compose (and nvidia-docker if training with a GPU). Running the following script installs docker-compose or nvidia-docker-compose and configures the notebook environment for you.

Note, you can only run a single local notebook at a time.

In [3]:
!/bin/bash ./setup.sh

The user has root access.
nvidia-docker2 already installed. We are good to go!
SageMaker instance route table setup is ok. We are good to go.
SageMaker instance routing for Docker is ok. We are good to go!


To train locally, you set `train_instance_type` to [local](https://github.com/aws/sagemaker-python-sdk#local-mode):

In [5]:
train_instance_type='local'

We create the `TensorFlow` Estimator, passing the flag `script_mode=True`:

In [8]:
import os

import sagemaker
from sagemaker.tensorflow import TensorFlow


estimator = TensorFlow(entry_point='train.py',
                       source_dir='char-rnn-keras',
                       train_instance_type=train_instance_type,
                       train_instance_count=1,
                       hyperparameters=hyperparameters,
                       role=sagemaker.get_execution_role(), # Passes to the container the AWS role that you are using on this notebook
                       framework_version='1.13',
                       py_version='py3',
                       script_mode=True)

To start a training job, we call `estimator.fit(inputs)`, where `inputs` is a dictionary whose keys, named **channels**, 
have values pointing to the data location. `estimator.fit(inputs)` downloads the CPU version of the TensorFlow Python 3 container locally and simulates a SageMaker training job. 
When training starts, the TensorFlow container executes **train.py**, passing `hyperparameters` and `model_dir` as script arguments, executing the example as follows:
```bash
python -m train --epochs 1 --data_dir /opt/ml/input/data/training --model_dir /opt/ml/model
```
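
To see how the `hyperparameters` dictionary relates to the command line above, here is a simplified illustration. It is not the container's actual implementation, and `to_cli_args` is a hypothetical helper. Each key becomes a `--key value` pair, and `--model_dir` is appended separately.

```python
def to_cli_args(hyperparameters):
    """Turn a hyperparameters dict into a flat list of command-line arguments."""
    args = []
    for name, value in hyperparameters.items():
        args.extend([f'--{name}', str(value)])
    return args


hyperparameters = {'epochs': 1, 'data_dir': '/opt/ml/input/data/training'}
cli_args = to_cli_args(hyperparameters) + ['--model_dir', '/opt/ml/model']
print(' '.join(['python', 'train.py'] + cli_args))
```

This is why the keys in `hyperparameters` need to match the argument names that the training script declares.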


In [None]:
inputs = {'training': f'file://{data_dir}'}

estimator.fit(inputs)

Creating tmpve_cymz9_algo-1-ntdvh_1 ... done
Attaching to tmpve_cymz9_algo-1-ntdvh_1
algo-1-ntdvh_1  | 2019-08-06 00:52:13,508 sagemaker-containers INFO     Imported framework sagemaker_tensorflow_container.training
algo-1-ntdvh_1  | 2019-08-06 00:52:13,516 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-ntdvh_1  | 2019-08-06 00:52:13,795 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-ntdvh_1  | 2019-08-06 00:52:13,815 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-ntdvh_1  | 2019-08-06 00:52:13,837 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-ntdvh_1  | 2019-08-06 00:52:13,852 sagemaker-containers INFO     Invoking user script
algo-1-ntdvh_1  |
algo-1-ntdvh_1  | Training Env:
algo-1-ntdvh_1  |
algo-1-ntdvh_1  | {
algo-1-ntdvh_1  |

algo-1-ntdvh_1  | _________________________________________________________________
algo-1-ntdvh_1  | Layer (type)                 Output Shape              Param #   
algo-1-ntdvh_1  | embedding_1 (Embedding)      (16, 64, 512)             49664     
algo-1-ntdvh_1  | _________________________________________________________________
algo-1-ntdvh_1  | lstm_1 (LSTM)                (16, 64, 256)             787456    
algo-1-ntdvh_1  | _________________________________________________________________
algo-1-ntdvh_1  | dropout_1 (Dropout)          (16, 64, 256)             0         
algo-1-ntdvh_1  | _________________________________________________________________
algo-1-ntdvh_1  | lstm_2 (LSTM)                (16, 64, 256)             525312    
algo-1-ntdvh_1  | _________________________________________________________________
algo-1-ntdvh_1  | dropout_2 (Dropout)          (16, 64, 256) 

algo-1-ntdvh_1  | Batch 81: loss = 3.0461, acc = 0.22168
algo-1-ntdvh_1  | Batch 82: loss = 2.9970, acc = 0.23145
algo-1-ntdvh_1  | Batch 83: loss = 2.9985, acc = 0.23926
algo-1-ntdvh_1  | Batch 84: loss = 2.9449, acc = 0.23828
algo-1-ntdvh_1  | Batch 85: loss = 2.9602, acc = 0.23633
algo-1-ntdvh_1  | Batch 86: loss = 3.0469, acc = 0.22461
algo-1-ntdvh_1  | Batch 87: loss = 3.0300, acc = 0.22754
algo-1-ntdvh_1  | Batch 88: loss = 2.9836, acc = 0.23535
algo-1-ntdvh_1  | Batch 89: loss = 3.0335, acc = 0.22266
algo-1-ntdvh_1  | Batch 90: loss = 3.0083, acc = 0.23438
algo-1-ntdvh_1  | Batch 91: loss = 3.0222, acc = 0.22168
algo-1-ntdvh_1  | Batch 92: loss = 3.0141, acc = 0.22656
algo-1-ntdvh_1  | Batch 93: loss = 2.9874, acc = 0.23828
algo-1-ntdvh_1  | Batch 94: loss = 3.0193, acc = 0.23730
algo-1-ntdvh_1  | Batch 95: loss = 2.9829, acc = 0.22656
algo-

algo-1-ntdvh_1  | Batch 204: loss = 2.6393, acc = 0.27344
algo-1-ntdvh_1  | Batch 205: loss = 2.5555, acc = 0.30957
algo-1-ntdvh_1  | Batch 206: loss = 2.5879, acc = 0.29883
algo-1-ntdvh_1  | Batch 207: loss = 2.6487, acc = 0.28320
algo-1-ntdvh_1  | Batch 208: loss = 2.6074, acc = 0.29004
algo-1-ntdvh_1  | Batch 209: loss = 2.6020, acc = 0.29980
algo-1-ntdvh_1  | Batch 210: loss = 2.6301, acc = 0.28223
algo-1-ntdvh_1  | Batch 211: loss = 2.5491, acc = 0.30566
algo-1-ntdvh_1  | Batch 212: loss = 2.5628, acc = 0.29688
algo-1-ntdvh_1  | Batch 213: loss = 2.6006, acc = 0.28418
algo-1-ntdvh_1  | Batch 214: loss = 2.5105, acc = 0.29688
algo-1-ntdvh_1  | Batch 215: loss = 2.5659, acc = 0.28906
algo-1-ntdvh_1  | Batch 216: loss = 2.5284, acc = 0.30273
algo-1-ntdvh_1  | Batch 217: loss = 2.5266, acc = 0.30078
algo-1-ntdvh_1  | Batch 218: loss = 2.4649, acc = 0.3

algo-1-ntdvh_1  | Batch 327: loss = 2.2384, acc = 0.37695
algo-1-ntdvh_1  | Batch 328: loss = 2.2480, acc = 0.34766
algo-1-ntdvh_1  | Batch 329: loss = 2.1969, acc = 0.36133
algo-1-ntdvh_1  | Batch 330: loss = 2.1623, acc = 0.36426
algo-1-ntdvh_1  | Batch 331: loss = 2.2602, acc = 0.35156
algo-1-ntdvh_1  | Batch 332: loss = 2.2320, acc = 0.34863
algo-1-ntdvh_1  | Batch 333: loss = 2.1872, acc = 0.36523
algo-1-ntdvh_1  | Batch 334: loss = 2.2266, acc = 0.36230
algo-1-ntdvh_1  | Batch 335: loss = 2.1811, acc = 0.35742
algo-1-ntdvh_1  | Batch 336: loss = 2.2507, acc = 0.36230
algo-1-ntdvh_1  | Batch 337: loss = 2.4575, acc = 0.35840
algo-1-ntdvh_1  | Batch 338: loss = 2.2330, acc = 0.36133
algo-1-ntdvh_1  | Batch 339: loss = 2.2271, acc = 0.37500
algo-1-ntdvh_1  | Batch 340: loss = 2.2896, acc = 0.36816
algo-1-ntdvh_1  | Batch 341: loss = 2.2304, acc = 0.3

algo-1-ntdvh_1  | Batch 450: loss = 2.0345, acc = 0.40820
algo-1-ntdvh_1  | Batch 451: loss = 2.0849, acc = 0.41992
algo-1-ntdvh_1  | Batch 452: loss = 2.1143, acc = 0.38672
algo-1-ntdvh_1  | Batch 453: loss = 2.0777, acc = 0.43848
algo-1-ntdvh_1  | Batch 454: loss = 2.0156, acc = 0.42676
algo-1-ntdvh_1  | Batch 455: loss = 2.0854, acc = 0.41602
algo-1-ntdvh_1  | Batch 456: loss = 2.0443, acc = 0.42188
algo-1-ntdvh_1  | Batch 457: loss = 2.0030, acc = 0.44434
algo-1-ntdvh_1  | Batch 458: loss = 1.9636, acc = 0.43555
algo-1-ntdvh_1  | Batch 459: loss = 1.9582, acc = 0.41992
algo-1-ntdvh_1  | Batch 460: loss = 2.0705, acc = 0.41406
algo-1-ntdvh_1  | Batch 461: loss = 1.9716, acc = 0.42090
algo-1-ntdvh_1  | Batch 462: loss = 1.9016, acc = 0.44434
algo-1-ntdvh_1  | Batch 463: loss = 1.9884, acc = 0.42969
algo-1-ntdvh_1  | Batch 464: loss = 2.0433, acc = 0.4

algo-1-ntdvh_1  | Batch 573: loss = 1.8678, acc = 0.44922
algo-1-ntdvh_1  | Batch 574: loss = 1.9523, acc = 0.43555
algo-1-ntdvh_1  | Batch 575: loss = 1.8100, acc = 0.46289
algo-1-ntdvh_1  | Batch 576: loss = 1.9449, acc = 0.40723
algo-1-ntdvh_1  | Batch 577: loss = 1.8800, acc = 0.45215
algo-1-ntdvh_1  | Batch 578: loss = 1.8986, acc = 0.44238
algo-1-ntdvh_1  | Batch 579: loss = 1.8648, acc = 0.45508
algo-1-ntdvh_1  | Batch 580: loss = 1.9098, acc = 0.43945
algo-1-ntdvh_1  | Batch 581: loss = 1.9233, acc = 0.43848
algo-1-ntdvh_1  | Batch 582: loss = 1.9592, acc = 0.42676
algo-1-ntdvh_1  | Batch 583: loss = 1.8525, acc = 0.42871
algo-1-ntdvh_1  | Batch 584: loss = 1.8345, acc = 0.46680
algo-1-ntdvh_1  | Batch 585: loss = 1.9353, acc = 0.44336
algo-1-ntdvh_1  | Batch 586: loss = 1.9282, acc = 0.43359
algo-1-ntdvh_1  | Batch 587: loss = 1.8138, acc = 0.4

algo-1-ntdvh_1  | Batch 696: loss = 1.9508, acc = 0.47559
algo-1-ntdvh_1  | Batch 697: loss = 1.7494, acc = 0.47363
algo-1-ntdvh_1  | Batch 698: loss = 1.8118, acc = 0.45117
algo-1-ntdvh_1  | Batch 699: loss = 1.8047, acc = 0.45801
algo-1-ntdvh_1  | Batch 700: loss = 1.7614, acc = 0.45898
algo-1-ntdvh_1  | Batch 701: loss = 1.8407, acc = 0.45898
algo-1-ntdvh_1  | Batch 702: loss = 1.8838, acc = 0.43652
algo-1-ntdvh_1  | Batch 703: loss = 1.8103, acc = 0.47656
algo-1-ntdvh_1  | Batch 704: loss = 1.8396, acc = 0.46582
algo-1-ntdvh_1  | Batch 705: loss = 1.7752, acc = 0.47656
algo-1-ntdvh_1  | Batch 706: loss = 1.8081, acc = 0.45996
algo-1-ntdvh_1  | Batch 707: loss = 1.7903, acc = 0.47266
algo-1-ntdvh_1  | Batch 708: loss = 1.8136, acc = 0.46387
algo-1-ntdvh_1  | Batch 709: loss = 1.8803, acc = 0.43555
algo-1-ntdvh_1  | Batch 710: loss = 1.8016, acc = 0.4

algo-1-ntdvh_1  | Batch 819: loss = 1.6922, acc = 0.50488
algo-1-ntdvh_1  | Batch 820: loss = 1.6820, acc = 0.50293
algo-1-ntdvh_1  | Batch 821: loss = 1.6839, acc = 0.48926
algo-1-ntdvh_1  | Batch 822: loss = 1.7368, acc = 0.50098
algo-1-ntdvh_1  | Batch 823: loss = 1.6642, acc = 0.51367
algo-1-ntdvh_1  | Batch 824: loss = 1.8472, acc = 0.46094
algo-1-ntdvh_1  | Batch 825: loss = 1.7225, acc = 0.49121
algo-1-ntdvh_1  | Batch 826: loss = 1.6646, acc = 0.51367
algo-1-ntdvh_1  | Batch 827: loss = 1.7426, acc = 0.48633
algo-1-ntdvh_1  | Batch 828: loss = 1.7278, acc = 0.50586
algo-1-ntdvh_1  | Batch 829: loss = 1.7828, acc = 0.47070
algo-1-ntdvh_1  | Batch 830: loss = 1.7569, acc = 0.49805
algo-1-ntdvh_1  | Batch 831: loss = 1.8359, acc = 0.46094
algo-1-ntdvh_1  | Batch 832: loss = 1.6896, acc = 0.49121
algo-1-ntdvh_1  | Batch 833: loss = 1.7111, acc = 0.4

algo-1-ntdvh_1  | Batch 942: loss = 1.6229, acc = 0.53125
algo-1-ntdvh_1  | Batch 943: loss = 1.6894, acc = 0.50391
algo-1-ntdvh_1  | Batch 944: loss = 1.6966, acc = 0.49707
algo-1-ntdvh_1  | Batch 945: loss = 1.7212, acc = 0.49121
algo-1-ntdvh_1  | Batch 946: loss = 1.6916, acc = 0.48926
algo-1-ntdvh_1  | Batch 947: loss = 1.6747, acc = 0.50488
algo-1-ntdvh_1  | Batch 948: loss = 1.7277, acc = 0.48730
algo-1-ntdvh_1  | Batch 949: loss = 1.7355, acc = 0.47852
algo-1-ntdvh_1  | Batch 950: loss = 1.6690, acc = 0.50684
algo-1-ntdvh_1  | Batch 951: loss = 1.6204, acc = 0.51465
algo-1-ntdvh_1  | Batch 952: loss = 1.6669, acc = 0.49121
algo-1-ntdvh_1  | Batch 953: loss = 1.6806, acc = 0.50293
algo-1-ntdvh_1  | Batch 954: loss = 1.6830, acc = 0.49316
algo-1-ntdvh_1  | Batch 955: loss = 1.7017, acc = 0.48926
algo-1-ntdvh_1  | Batch 956: loss = 1.6076, acc = 0.5

algo-1-ntdvh_1  | Batch 1064: loss = 1.6963, acc = 0.50488
algo-1-ntdvh_1  | Batch 1065: loss = 1.7574, acc = 0.46680
algo-1-ntdvh_1  | Batch 1066: loss = 1.6468, acc = 0.51172
algo-1-ntdvh_1  | Batch 1067: loss = 1.7546, acc = 0.50684
algo-1-ntdvh_1  | Batch 1068: loss = 1.6564, acc = 0.51562
algo-1-ntdvh_1  | Batch 1069: loss = 1.7919, acc = 0.48828
algo-1-ntdvh_1  | Batch 1070: loss = 1.6112, acc = 0.50781
algo-1-ntdvh_1  | Batch 1071: loss = 1.6832, acc = 0.46582
algo-1-ntdvh_1  | Batch 1072: loss = 1.5681, acc = 0.52539
algo-1-ntdvh_1  | Batch 1073: loss = 1.7242, acc = 0.49414
algo-1-ntdvh_1  | Batch 1074: loss = 1.6083, acc = 0.51172
algo-1-ntdvh_1  | Batch 1075: loss = 1.6215, acc = 0.52246
algo-1-ntdvh_1  | Batch 1076: loss = 1.6169, acc = 0.50391
algo-1-ntdvh_1  | Batch 1077: loss = 1.5919, acc = 0.52148
algo-1-ntdvh_1  | Batch 1078: loss = 1.

algo-1-ntdvh_1  | Batch 1185: loss = 1.5478, acc = 0.51855
algo-1-ntdvh_1  | Batch 1186: loss = 1.5756, acc = 0.52832
algo-1-ntdvh_1  | Batch 1187: loss = 1.5928, acc = 0.53613
algo-1-ntdvh_1  | Batch 1188: loss = 1.6037, acc = 0.50391
algo-1-ntdvh_1  | Batch 1189: loss = 1.6122, acc = 0.51367
algo-1-ntdvh_1  | Batch 1190: loss = 1.6270, acc = 0.49609
algo-1-ntdvh_1  | Batch 1191: loss = 1.6510, acc = 0.50098
algo-1-ntdvh_1  | Batch 1192: loss = 1.5415, acc = 0.54785
algo-1-ntdvh_1  | Batch 1193: loss = 1.6684, acc = 0.49121
algo-1-ntdvh_1  | Batch 1194: loss = 1.6467, acc = 0.50879
algo-1-ntdvh_1  | Batch 1195: loss = 1.5441, acc = 0.52148
algo-1-ntdvh_1  | Batch 1196: loss = 1.6829, acc = 0.51270
algo-1-ntdvh_1  | Batch 1197: loss = 1.6431, acc = 0.50488
algo-1-ntdvh_1  | Batch 1198: loss = 1.5608, acc = 0.54297
algo-1-ntdvh_1  | Batch 1199: loss = 1.

Let's explain the values of `--data_dir` and `--model_dir` in more detail:

- **/opt/ml/input/data/training** is the directory inside the container where the training data is downloaded. The data is downloaded to this folder because `training` is the channel name defined in ```estimator.fit({'training': inputs})```. See [training data](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-trainingdata) for more information. 

- **/opt/ml/model** is the directory where the script saves models, checkpoints, or any other data. Any data saved in this folder is uploaded to the S3 bucket defined for training. See [model data](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-envvariables) for more information.
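
The relationship between channel names and container directories can be sketched as follows. The helper `channel_dir` is hypothetical and only restates the path convention described above. The `testing` channel is an invented example and is not used in this notebook.

```python
import posixpath

# Root directory under which the container places each input channel.
INPUT_DATA_ROOT = '/opt/ml/input/data'


def channel_dir(channel_name):
    """Return the container directory for a given input channel name."""
    return posixpath.join(INPUT_DATA_ROOT, channel_name)


print(channel_dir('training'))
print(channel_dir('testing'))  # hypothetical second channel
```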

### Reading additional information from the container

Often, a user script needs additional information from the container that is not available in ```hyperparameters```.
SageMaker containers write this information as **environment variables** that are available inside the script.

For example, the example above can read information about the `training` channel provided in the training job request by adding the environment variable `SM_CHANNEL_TRAINING` as the default value for the `--data_dir` argument:

```python
import argparse
import os

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Default --data_dir to the `training` channel location from the environment
    parser.add_argument('--data_dir', type=str, default=os.environ['SM_CHANNEL_TRAINING'])
```

Script mode displays the list of available environment variables in the training logs. You can find the [entire list here](https://github.com/aws/sagemaker-containers/blob/master/README.rst#list-of-provided-environment-variables-by-sagemaker-containers).
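
Indexing `os.environ` directly raises a `KeyError` when the variable is not set, for example when the script runs outside a SageMaker container. The sketch below falls back to local paths instead. The helper `resolve_dirs` and the fallback paths are assumptions, and it assumes the container also sets `SM_MODEL_DIR` (check the list linked above).

```python
import os


def resolve_dirs(env=None):
    """Read data and model directories from the environment, with local fallbacks."""
    env = os.environ if env is None else env
    return {
        # Fallback values are placeholders for runs outside SageMaker.
        'data_dir': env.get('SM_CHANNEL_TRAINING', './sherlock'),
        'model_dir': env.get('SM_MODEL_DIR', './model'),
    }


inside_container = resolve_dirs({
    'SM_CHANNEL_TRAINING': '/opt/ml/input/data/training',
    'SM_MODEL_DIR': '/opt/ml/model',
})
outside_container = resolve_dirs({})
print(inside_container)
print(outside_container)
```

The returned values can then be used as the `default=` for the corresponding `argparse` arguments, so that explicit command-line arguments still take precedence.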

# Training in SageMaker

After you test the training job locally, upload the dataset to an S3 bucket so SageMaker can access the data during training:

In [1]:
import sagemaker

inputs = sagemaker.Session().upload_data(path='sherlock', key_prefix='datasets/sherlock')

The returned variable `inputs` above is a string containing an S3 location that SageMaker Training has permission to read data from.

In [2]:
inputs

's3://sagemaker-us-east-1-951232522638/datasets/sherlock'

To train in SageMaker:
- Change the estimator argument `train_instance_type` to any SageMaker ML instance type available for training.
- Set the `training` channel to an S3 location.

In [None]:
estimator = TensorFlow(entry_point='train.py',
                       source_dir='char-rnn-keras',
                       train_instance_type='ml.c4.xlarge', # Executes training in a ml.c4.xlarge instance
                       train_instance_count=1,
                       hyperparameters=hyperparameters,
                       role=sagemaker.get_execution_role(),
                       framework_version='1.13',
                       py_version='py3',
                       script_mode=True)
             

estimator.fit({'training': inputs})

2019-08-06 01:11:38 Starting - Starting the training job...
2019-08-06 01:11:40 Starting - Launching requested ML instances.........
2019-08-06 01:13:12 Starting - Preparing the instances for training...
2019-08-06 01:14:04 Downloading - Downloading input data...
2019-08-06 01:14:21 Training - Downloading the training image.....
2019-08-06 01:15:15,744 sagemaker-containers INFO     Imported framework sagemaker_tensorflow_container.training
2019-08-06 01:15:15,749 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2019-08-06 01:15:16,248 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2019-08-06 01:15:16,264 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2019-08-06 01:15:16,281 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2019-08-06 01:15:16,294 sagemaker-containers INFO     Invoking user script

Training Env:

2019-08-06 01:15:11 Training - Training image download completed. Training in progress.
Batch 10: loss = 3.2813, acc = 0.14453
Batch 11: loss = 3.2297, acc = 0.16309
Batch 12: loss = 3.1750, acc = 0.21973
Batch 13: loss = 3.0742, acc = 0.23438
Batch 14: loss = 3.0730, acc = 0.25195
Batch 15: loss = 3.0202, acc = 0.25684
Batch 16: loss = 3.0665, acc = 0.24707
Batch 17: loss = 3.0691, acc = 0.24707
Batch 18: loss = 3.0796, acc = 0.23730
Batch 19: loss = 3.0855, acc = 0.24902
Batch 20: loss = 2.9943, acc = 0.24609
Batch 21: loss = 3.0144, acc = 0.22363
Batch 22: loss = 2.9809, acc = 0.23047
Batch 23: loss = 3.0437, acc = 0.23828
Batch 24: loss = 3.0613, acc = 0.22070
Batch 25: loss = 3.0478, acc = 0.23535
Batch 26: loss = 3.0594, acc = 0.23633
Batch 27: loss = 2.9946, acc = 0.24023
Batch 28: loss = 3.0302, acc = 0.24121

Batch 190: loss = 2.5372, acc = 0.30273
Batch 191: loss = 2.5891, acc = 0.28711
Batch 192: loss = 2.5545, acc = 0.31055
Batch 193: loss = 2.5352, acc = 0.31348
Batch 194: loss = 2.5111, acc = 0.31152
Batch 195: loss = 2.4848, acc = 0.31738
Batch 196: loss = 2.4978, acc = 0.29785
Batch 197: loss = 2.5141, acc = 0.29883
Batch 198: loss = 2.5707, acc = 0.29297
Batch 199: loss = 2.4941, acc = 0.31836
Batch 200: loss = 2.5923, acc = 0.29492
Batch 201: loss = 2.4563, acc = 0.32617
Batch 202: loss = 2.5391, acc = 0.31152
Batch 203: loss = 2.6007, acc = 0.30078
Batch 204: loss = 2.5293, acc = 0.30371
Batch 205: loss = 2.4706, acc = 0.33398
Batch 206: loss = 2.5158, acc = 0.33301
Batch 207: loss = 2.5410, acc = 0.30176
Batch 208: loss = 2.5038, acc = 0.31934
Batch 209: loss = 2.5244, acc = 0.31836
Batch 210: loss

Batch 371: loss = 1.9771, acc = 0.43750
Batch 372: loss = 2.1215, acc = 0.40137
Batch 373: loss = 2.0347, acc = 0.42871
Batch 374: loss = 2.1547, acc = 0.37305
Batch 375: loss = 2.1239, acc = 0.38672
Batch 376: loss = 2.1350, acc = 0.38770
Batch 377: loss = 2.1635, acc = 0.38086
Batch 378: loss = 2.0774, acc = 0.41016
Batch 379: loss = 2.1139, acc = 0.41113
Batch 380: loss = 2.1575, acc = 0.37891
Batch 381: loss = 2.1538, acc = 0.38379
Batch 382: loss = 2.1042, acc = 0.39355
Batch 383: loss = 2.1218, acc = 0.38184
Batch 384: loss = 2.1437, acc = 0.38574
Batch 385: loss = 2.1335, acc = 0.38477
Batch 386: loss = 2.2512, acc = 0.37207
Batch 387: loss = 2.0765, acc = 0.40137
Batch 388: loss = 2.0371, acc = 0.41504
Batch 389: loss = 2.0416, acc = 0.41895
Batch 390: loss = 2.1350, acc = 0.39258
Batch 391: loss

Batch 553: loss = 1.9270, acc = 0.43945
Batch 554: loss = 1.9029, acc = 0.43652
Batch 555: loss = 1.9506, acc = 0.43359
Batch 556: loss = 1.9567, acc = 0.43555
Batch 557: loss = 1.8270, acc = 0.46191
Batch 558: loss = 1.8273, acc = 0.45215
Batch 559: loss = 1.9470, acc = 0.43750
Batch 560: loss = 1.8604, acc = 0.44336
Batch 561: loss = 1.9725, acc = 0.42383
Batch 562: loss = 1.9562, acc = 0.42969
Batch 563: loss = 1.8906, acc = 0.44727
Batch 564: loss = 1.8509, acc = 0.46094
Batch 565: loss = 1.9720, acc = 0.41699
Batch 566: loss = 1.9447, acc = 0.43164
Batch 567: loss = 1.8517, acc = 0.45215
Batch 568: loss = 1.9443, acc = 0.44043
Batch 569: loss = 1.9875, acc = 0.41113
Batch 570: loss = 1.9377, acc = 0.42480
Batch 571: loss = 1.9097, acc = 0.42285
Batch 572: loss = 1.8421, acc = 0.43945
Batch 573: loss

Batch 733: loss = 1.7662, acc = 0.47266
Batch 734: loss = 1.8110, acc = 0.49121
Batch 735: loss = 1.8874, acc = 0.44434
Batch 736: loss = 1.8992, acc = 0.43945
Batch 737: loss = 1.7888, acc = 0.47363
Batch 738: loss = 1.7895, acc = 0.46484
Batch 739: loss = 1.8390, acc = 0.45605
Batch 740: loss = 1.8654, acc = 0.44629
Batch 741: loss = 1.8551, acc = 0.45605
Batch 742: loss = 1.8291, acc = 0.43652
Batch 743: loss = 1.7723, acc = 0.49609
Batch 744: loss = 1.8402, acc = 0.46582
Batch 745: loss = 1.8937, acc = 0.44238
Batch 746: loss = 1.9155, acc = 0.44141
Batch 747: loss = 1.9286, acc = 0.44141
Batch 748: loss = 1.8628, acc = 0.47266
Batch 749: loss = 1.8188, acc = 0.46973
Batch 750: loss = 1.9733, acc = 0.43457
Batch 751: loss = 1.8658, acc = 0.47461
Batch 752: loss = 1.7925, acc = 0.48145
Batch 753: loss

Batch 901: loss = 1.7037, acc = 0.50195
Batch 902: loss = 1.7251, acc = 0.51270
Batch 903: loss = 1.7112, acc = 0.47949
Batch 904: loss = 1.7177, acc = 0.47754
Batch 905: loss = 1.8658, acc = 0.45508
Batch 906: loss = 1.7263, acc = 0.48438
Batch 907: loss = 1.8066, acc = 0.47656
Batch 908: loss = 1.7988, acc = 0.46973
Batch 909: loss = 1.6407, acc = 0.50000
Batch 910: loss = 1.7720, acc = 0.47656
Batch 911: loss = 1.7030, acc = 0.49219
Batch 912: loss = 1.7897, acc = 0.47070
Batch 913: loss = 1.6903, acc = 0.47852
Batch 914: loss = 1.6968, acc = 0.49512
Batch 915: loss = 1.7233, acc = 0.46973
Batch 916: loss = 1.7381, acc = 0.48047
Batch 917: loss = 1.7281, acc = 0.48730
Batch 918: loss = 1.7631, acc = 0.48535
