# Use Script Mode to train any TensorFlow script from GitHub in SageMaker
In this tutorial, you train a TensorFlow script in SageMaker using the new Script Mode TensorFlow container.

For this example, you use [Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow](https://github.com/sherjilozair/char-rnn-tensorflow), but you can use the same technique for other scripts or repositories, such as the [TensorFlow Model Zoo](https://github.com/tensorflow/models) and the [TensorFlow benchmark scripts](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks).

## Set up the environment
Let's start by creating a SageMaker session and specifying the following:

- The S3 bucket and prefix to use for training and model data. The bucket should be in the same region as the Notebook Instance, training instance(s), and hosting instance(s). This example uses the default bucket that a SageMaker Session creates.
- The IAM role that allows SageMaker services to access your data. For more information about using IAM roles in SageMaker, see [Amazon SageMaker Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html).

In [6]:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()

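# On a SageMaker notebook instance, you can look up the role instead:
# role = sagemaker.get_execution_role()

# Name of an existing IAM role that SageMaker can assume
role = 'SageMakerRole'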

## Clone the repository
Run the following command to clone the repository that contains the example:

In [2]:
!git clone https://github.com/sherjilozair/char-rnn-tensorflow > /dev/null 2>&1

This repository includes a README.md with an overview of the project, requirements, and basic usage.

## Get the data
For training data, use plain text versions of Sherlock Holmes stories.

In [3]:
!mkdir sherlock
!wget https://sherlock-holm.es/stories/plain-text/cnus.txt --force-directories --output-document=sherlock/input.txt

mkdir: sherlock: File exists
--2018-11-17 16:46:46--  https://sherlock-holm.es/stories/plain-text/cnus.txt
Resolving sherlock-holm.es (sherlock-holm.es)... 78.46.175.31
Connecting to sherlock-holm.es (sherlock-holm.es)|78.46.175.31|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3382026 (3.2M) [text/plain]
Saving to: ‘sherlock/input.txt’


2018-11-17 16:46:49 (1.49 MB/s) - ‘sherlock/input.txt’ saved [3382026/3382026]
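
Because the model works at the character level, it can help to check the size of the corpus and its character vocabulary before training. The following is a minimal sketch, not part of the original repository: `corpus_stats` is a hypothetical helper, and the inline sample stands in for the downloaded file so the snippet runs on its own.

```python
from collections import Counter


def corpus_stats(text):
    """Return basic statistics for a character-level training corpus."""
    counts = Counter(text)
    return {
        'num_chars': len(text),                # total characters in the corpus
        'vocab_size': len(counts),             # number of distinct characters
        'most_common': counts.most_common(3),  # three most frequent characters
    }


# Small inline sample so the sketch runs without the downloaded file.
sample = 'the adventures of sherlock holmes'
stats = corpus_stats(sample)
print(stats['num_chars'], stats['vocab_size'])
```

To run it on the real data, read `sherlock/input.txt` with `open(..., encoding='utf-8')` and pass the contents to `corpus_stats`.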



## Test locally
Use [Local Mode](https://github.com/aws/sagemaker-python-sdk#local-mode) to run the script locally in the notebook instance before you run a SageMaker training job:

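**Note:** This first run is expected to fail, because `train.py` does not yet accept the `--model_dir` argument that script mode passes. The fix follows the error output.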

In [11]:
import os
from sagemaker.tensorflow import TensorFlow

hyperparameters = {'num_epochs': 1,
                   'data_dir': '/opt/ml/input/data/training'}

estimator = TensorFlow(entry_point='train.py',
                       source_dir='char-rnn-tensorflow',
                       framework_version='1.11',
                       py_version='py3',
                       train_instance_type='local',  # Run in local mode
                       train_instance_count=1,
                       hyperparameters=hyperparameters,
                       role=role)

# The 'training' channel points at the local sherlock directory
estimator.fit({'training': 'file://%s' % os.path.join(os.getcwd(), 'sherlock')})

INFO:sagemaker:Creating training-job with name: sagemaker-tensorflow-scriptmode-2018-11-18-00-57-30-541


Creating tmpuvgw0qfu_algo-1-YLCIL_1_3672e6e48602 ... done
Attaching to tmpuvgw0qfu_algo-1-YLCIL_1_8755ed331d1d
algo-1-YLCIL_1_8755ed331d1d | 2018-11-18 00:57:35,133 sagemaker-containers INFO     Imported framework sagemaker_tensorflow_container.training
algo-1-YLCIL_1_8755ed331d1d | 2018-11-18 00:57:35,151 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-YLCIL_1_8755ed331d1d | 2018-11-18 00:57:35,166 sagemaker_tensorflow_container.training INFO     parameter_server_enabled is False
algo-1-YLCIL_1_8755ed331d1d | 2018-11-18 00:57:35,167 sagemaker_tensorflow_container.training INFO     Running training job without parameter servers
algo-1-YLCIL_1_8755ed331d1d | 2018-11-18 00:57:35,661 sagemaker-containers INFO     Module train does not provide a setup.py.
algo-1-YLCIL_1_8755ed331d1d | Generating setup.py
algo-1-YLCIL_1_8755ed331d1d | 2018-11-18 00:57:35,662 sagemaker-containers INF

algo-1-YLCIL_1_8755ed331d1d |   adding 'train.py'
algo-1-YLCIL_1_8755ed331d1d |   adding 'utils.py'
algo-1-YLCIL_1_8755ed331d1d |   adding '.git/HEAD'
algo-1-YLCIL_1_8755ed331d1d |   adding '.git/config'
algo-1-YLCIL_1_8755ed331d1d |   adding '.git/description'
algo-1-YLCIL_1_8755ed331d1d |   adding '.git/index'
algo-1-YLCIL_1_8755ed331d1d |   adding '.git/packed-refs'
algo-1-YLCIL_1_8755ed331d1d |   adding '.git/hooks/applypatch-msg.sample'
algo-1-YLCIL_1_8755ed331d1d |   adding '.git/hooks/commit-msg.sample'
algo-1-YLCIL_1_8755ed331d1d |   adding '.git/hooks/fsmonitor-watchman.sample'
algo-1-YLCIL_1_8755ed331d1d |   adding '.git/hooks/post-update.sample'
algo-1-YLCIL_1_8755ed331d1d |   adding '.git/hooks/pre-applypatch.sample'
algo-1-YLCIL_1_8755ed331d1d |   adding '.git/hooks/pre-commit.sample'
algo-1-YLCIL_1_8755ed331d1d |   adding '.git/hooks/pre-push.sampl

algo-1-YLCIL_1_8755ed331d1d | 2018-11-18 00:57:38,417 sagemaker-containers ERROR    ExecuteUserScriptError:
algo-1-YLCIL_1_8755ed331d1d | Command "/usr/bin/python -m train --data_dir /opt/ml/input/data/training --model_dir s3://sagemaker-us-west-2-369233609183/sagemaker-tensorflow-scriptmode-2018-11-18-00-57-30-541/model --num_epochs 1 --save_dir /opt/ml/model"
algo-1-YLCIL_1_8755ed331d1d | usage: train.py [-h] [--data_dir DATA_DIR] [--save_dir SAVE_DIR]
algo-1-YLCIL_1_8755ed331d1d |                 [--log_dir LOG_DIR] [--save_every SAVE_EVERY]
algo-1-YLCIL_1_8755ed331d1d |                 [--init_from INIT_FROM] [--model MODEL] [--rnn_size RNN_SIZE]
algo-1-YLCIL_1_8755ed331d1d |                 [--num_layers NUM_LAYERS] [--seq_length SEQ_LENGTH]
algo-1-YLCIL_1_8755ed331d1d |                 [--batch_size BATCH_SIZE] [--num_epochs NUM_EPOCHS]
algo-1-YLCIL_1_8755ed331d1d |                 [--grad_clip GRAD_CLIP] [--

RuntimeError: Failed to run: ['docker-compose', '-f', '/private/var/folders/r4/6vbcymq564x9g4_bsq1ystss0hvddl/T/tmpuvgw0qfu/docker-compose.yaml', 'up', '--build', '--abort-on-container-exit'], Process exited with code: 1

## Handle the model directory

Script mode passes a `--model_dir` argument to the training script, so the script must accept it. In this example, open `train.py` and rename `save_dir` to `model_dir`. The following command makes that change:

In [16]:
!sed -i -e 's/save_dir/model_dir/g' char-rnn-tensorflow/train.py
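
The rename matters because `argparse` rejects any argument that the parser does not define. The following is a minimal sketch of a parser that accepts `--model_dir`, assuming a reduced version of the script's arguments. The `build_parser` helper, the default values, and the S3 path are illustrative placeholders, not taken from the repository.

```python
import argparse


def build_parser():
    """Build a reduced parser that accepts the arguments script mode passes."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', type=str, default='data/tinyshakespeare')
    # Renamed from --save_dir so the script accepts the --model_dir
    # argument that script mode passes.
    parser.add_argument('--model_dir', type=str, default='save')
    parser.add_argument('--num_epochs', type=int, default=50)  # placeholder default
    return parser


# Hypothetical argument list, shaped like the command in the failed run.
args = build_parser().parse_args([
    '--data_dir', '/opt/ml/input/data/training',
    '--model_dir', 's3://example-bucket/example-job/model',
    '--num_epochs', '1',
])
print(args.model_dir)
```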

## Run the local test again

In [18]:
import os
from sagemaker.tensorflow import TensorFlow

hyperparameters = {'num_epochs': 1,
                   'data_dir': '/opt/ml/input/data/training'}

estimator = TensorFlow(entry_point='train.py',
                       source_dir='char-rnn-tensorflow',
                       framework_version='1.11',
                       py_version='py3',
                       train_instance_type='local',  # Run in local mode
                       train_instance_count=1,
                       hyperparameters=hyperparameters,
                       role=role)

# The 'training' channel points at the local sherlock directory
estimator.fit({'training': 'file://%s' % os.path.join(os.getcwd(), 'sherlock')})

INFO:sagemaker:Creating training-job with name: sagemaker-tensorflow-scriptmode-2018-11-18-01-04-51-826


Creating tmpeudjj1ri_algo-1-107J5_1_3019d52c778a ... done
Attaching to tmpeudjj1ri_algo-1-107J5_1_4ba1bfdba5ac
algo-1-107J5_1_4ba1bfdba5ac | 2018-11-18 01:04:55,792 sagemaker-containers INFO     Imported framework sagemaker_tensorflow_container.training
algo-1-107J5_1_4ba1bfdba5ac | 2018-11-18 01:04:55,815 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-107J5_1_4ba1bfdba5ac | 2018-11-18 01:04:55,838 sagemaker_tensorflow_container.training INFO     parameter_server_enabled is False
algo-1-107J5_1_4ba1bfdba5ac | 2018-11-18 01:04:55,838 sagemaker_tensorflow_container.training INFO     Running training job without parameter servers
algo-1-107J5_1_4ba1bfdba5ac | 2018-11-18 01:04:56,510 sagemaker-containers INFO     Module train does not provide a setup.py.
algo-1-107J5_1_4ba1bfdba5ac | Generating setup.py
algo-1-107J5_1_4ba1bfdba5ac | 2018-11-18 01:04:56,511 sagemaker-containers INF

algo-1-107J5_1_4ba1bfdba5ac |   creating build/bdist.linux-x86_64/wheel/.git/refs
algo-1-107J5_1_4ba1bfdba5ac |   creating build/bdist.linux-x86_64/wheel/.git/refs/heads
algo-1-107J5_1_4ba1bfdba5ac |   copying build/lib/.git/refs/heads/master -> build/bdist.linux-x86_64/wheel/.git/refs/heads
algo-1-107J5_1_4ba1bfdba5ac |   creating build/bdist.linux-x86_64/wheel/.git/refs/remotes
algo-1-107J5_1_4ba1bfdba5ac |   creating build/bdist.linux-x86_64/wheel/.git/refs/remotes/origin
algo-1-107J5_1_4ba1bfdba5ac |   copying build/lib/.git/refs/remotes/origin/HEAD -> build/bdist.linux-x86_64/wheel/.git/refs/remotes/origin
algo-1-107J5_1_4ba1bfdba5ac |   creating build/bdist.linux-x86_64/wheel/.git/objects
algo-1-107J5_1_4ba1bfdba5ac |   creating build/bdist.linux-x86_64/wheel/.git/objects/pack
algo-1-107J5_1_4ba1bfdba5ac |   copying build/lib/.git/objects/pack/pack-066ab304ffc043ea536c4521ed763915603ee3d6.pac

algo-1-107J5_1_4ba1bfdba5ac |   adding 'pip-egg-info/train.egg-info/SOURCES.txt'
algo-1-107J5_1_4ba1bfdba5ac |   adding 'pip-egg-info/train.egg-info/dependency_links.txt'
algo-1-107J5_1_4ba1bfdba5ac |   adding 'pip-egg-info/train.egg-info/top_level.txt'
algo-1-107J5_1_4ba1bfdba5ac |   adding 'save/.gitignore'
algo-1-107J5_1_4ba1bfdba5ac |   adding 'train.egg-info/PKG-INFO'
algo-1-107J5_1_4ba1bfdba5ac |   adding 'train.egg-info/SOURCES.txt'
algo-1-107J5_1_4ba1bfdba5ac |   adding 'train.egg-info/dependency_links.txt'
algo-1-107J5_1_4ba1bfdba5ac |   adding 'train.egg-info/top_level.txt'
algo-1-107J5_1_4ba1bfdba5ac |   adding 'train-1.0.0.dist-info/LICENSE.md'
algo-1-107J5_1_4ba1bfdba5ac |   adding 'train-1.0.0.dist-info/METADATA'
algo-1-107J5_1_4ba1bfdba5ac |   adding 'train-1.0.0.dist-info/WHEEL'
algo-1-107J5_1_4ba1bfdba5ac |   adding 'train-1.0.0.dist-info/top_level.txt'
algo

algo-1-107J5_1_4ba1bfdba5ac | 2018-11-18 01:04:59,113 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-107J5_1_4ba1bfdba5ac | 2018-11-18 01:04:59,135 sagemaker-containers INFO     Invoking user script
algo-1-107J5_1_4ba1bfdba5ac |
algo-1-107J5_1_4ba1bfdba5ac | Training Env:
algo-1-107J5_1_4ba1bfdba5ac |
algo-1-107J5_1_4ba1bfdba5ac | {
algo-1-107J5_1_4ba1bfdba5ac |     "additional_framework_parameters": {},
algo-1-107J5_1_4ba1bfdba5ac |     "channel_input_dirs": {
algo-1-107J5_1_4ba1bfdba5ac |         "training": "/opt/ml/input/data/training"
algo-1-107J5_1_4ba1bfdba5ac |     },
algo-1-107J5_1_4ba1bfdba5ac |     "current_host": "algo-1-107J5",
algo-1-107J5_1_4ba1bfdba5ac |     "framework_module": "sagemaker_tensorflow_container.training:main",
algo-1-107J5_1_4ba1bfdba5ac |     "hosts": [
algo-1-107J5_1_4ba1bfdba5ac |         "

algo-1-107J5_1_4ba1bfdba5ac | 9/1352 (epoch 0), train_loss = 3.119, time/batch = 0.363
algo-1-107J5_1_4ba1bfdba5ac | 10/1352 (epoch 0), train_loss = 3.055, time/batch = 0.261
algo-1-107J5_1_4ba1bfdba5ac | 11/1352 (epoch 0), train_loss = 3.042, time/batch = 0.193
algo-1-107J5_1_4ba1bfdba5ac | 12/1352 (epoch 0), train_loss = 3.038, time/batch = 0.173
algo-1-107J5_1_4ba1bfdba5ac | 13/1352 (epoch 0), train_loss = 3.025, time/batch = 0.281
algo-1-107J5_1_4ba1bfdba5ac | 14/1352 (epoch 0), train_loss = 3.039, time/batch = 0.200
algo-1-107J5_1_4ba1bfdba5ac | 15/1352 (epoch 0), train_loss = 2.988, time/batch = 0.512
algo-1-107J5_1_4ba1bfdba5ac | 16/1352 (epoch 0), train_loss = 3.024, time/batch = 0.250
algo-1-107J5_1_4ba1bfdba5ac | 17/1352 (epoch 0), train_loss = 3.029, time/batch = 0.183
algo-1-107J5_1_4ba1bfdba5ac | 18/1352 (epoch 0), train_loss = 2.996, time/batch = 0.224
algo-1-107J5_1_4ba1bfdba5a

algo-1-107J5_1_4ba1bfdba5ac | 95/1352 (epoch 0), train_loss = 2.832, time/batch = 0.167
algo-1-107J5_1_4ba1bfdba5ac | 96/1352 (epoch 0), train_loss = 2.803, time/batch = 0.142
algo-1-107J5_1_4ba1bfdba5ac | 97/1352 (epoch 0), train_loss = 2.806, time/batch = 0.148
algo-1-107J5_1_4ba1bfdba5ac | 98/1352 (epoch 0), train_loss = 2.798, time/batch = 0.135
algo-1-107J5_1_4ba1bfdba5ac | 99/1352 (epoch 0), train_loss = 2.843, time/batch = 0.147
algo-1-107J5_1_4ba1bfdba5ac | 100/1352 (epoch 0), train_loss = 2.744, time/batch = 0.154
algo-1-107J5_1_4ba1bfdba5ac | 101/1352 (epoch 0), train_loss = 2.774, time/batch = 0.158
algo-1-107J5_1_4ba1bfdba5ac | 102/1352 (epoch 0), train_loss = 2.764, time/batch = 0.124
algo-1-107J5_1_4ba1bfdba5ac | 103/1352 (epoch 0), train_loss = 2.744, time/batch = 0.148
algo-1-107J5_1_4ba1bfdba5ac | 104/1352 (epoch 0), train_loss = 2.717, time/batch = 0.136
algo-1-107J5_1_4ba1b

algo-1-107J5_1_4ba1bfdba5ac | 180/1352 (epoch 0), train_loss = 2.221, time/batch = 0.141
algo-1-107J5_1_4ba1bfdba5ac | 181/1352 (epoch 0), train_loss = 2.190, time/batch = 0.128
algo-1-107J5_1_4ba1bfdba5ac | 182/1352 (epoch 0), train_loss = 2.280, time/batch = 0.211
algo-1-107J5_1_4ba1bfdba5ac | 183/1352 (epoch 0), train_loss = 2.280, time/batch = 0.256
algo-1-107J5_1_4ba1bfdba5ac | 184/1352 (epoch 0), train_loss = 2.308, time/batch = 0.167
algo-1-107J5_1_4ba1bfdba5ac | 185/1352 (epoch 0), train_loss = 2.283, time/batch = 0.184
algo-1-107J5_1_4ba1bfdba5ac | 186/1352 (epoch 0), train_loss = 2.326, time/batch = 0.213
algo-1-107J5_1_4ba1bfdba5ac | 187/1352 (epoch 0), train_loss = 2.258, time/batch = 0.130
algo-1-107J5_1_4ba1bfdba5ac | 188/1352 (epoch 0), train_loss = 2.225, time/batch = 0.138
algo-1-107J5_1_4ba1bfdba5ac | 189/1352 (epoch 0), train_loss = 2.178, time/batch = 0.122
algo-1-107J5_1_

algo-1-107J5_1_4ba1bfdba5ac | 264/1352 (epoch 0), train_loss = 2.038, time/batch = 0.169
algo-1-107J5_1_4ba1bfdba5ac | 265/1352 (epoch 0), train_loss = 2.106, time/batch = 0.116
algo-1-107J5_1_4ba1bfdba5ac | 266/1352 (epoch 0), train_loss = 2.056, time/batch = 0.150
algo-1-107J5_1_4ba1bfdba5ac | 267/1352 (epoch 0), train_loss = 2.013, time/batch = 0.302
algo-1-107J5_1_4ba1bfdba5ac | 268/1352 (epoch 0), train_loss = 2.080, time/batch = 0.211
algo-1-107J5_1_4ba1bfdba5ac | 269/1352 (epoch 0), train_loss = 2.064, time/batch = 0.244
algo-1-107J5_1_4ba1bfdba5ac | 270/1352 (epoch 0), train_loss = 2.100, time/batch = 0.157
algo-1-107J5_1_4ba1bfdba5ac | 271/1352 (epoch 0), train_loss = 2.041, time/batch = 0.150
algo-1-107J5_1_4ba1bfdba5ac | 272/1352 (epoch 0), train_loss = 2.087, time/batch = 0.128
algo-1-107J5_1_4ba1bfdba5ac | 273/1352 (epoch 0), train_loss = 2.147, time/batch = 0.141
algo-1-107J5_1_

algo-1-107J5_1_4ba1bfdba5ac | 348/1352 (epoch 0), train_loss = 2.101, time/batch = 0.208
algo-1-107J5_1_4ba1bfdba5ac | 349/1352 (epoch 0), train_loss = 1.997, time/batch = 0.119
algo-1-107J5_1_4ba1bfdba5ac | 350/1352 (epoch 0), train_loss = 1.978, time/batch = 0.137
algo-1-107J5_1_4ba1bfdba5ac | 351/1352 (epoch 0), train_loss = 1.944, time/batch = 0.143
algo-1-107J5_1_4ba1bfdba5ac | 352/1352 (epoch 0), train_loss = 1.950, time/batch = 0.174
algo-1-107J5_1_4ba1bfdba5ac | 353/1352 (epoch 0), train_loss = 1.945, time/batch = 0.122
algo-1-107J5_1_4ba1bfdba5ac | 354/1352 (epoch 0), train_loss = 1.985, time/batch = 0.157
algo-1-107J5_1_4ba1bfdba5ac | 355/1352 (epoch 0), train_loss = 2.011, time/batch = 0.180
algo-1-107J5_1_4ba1bfdba5ac | 356/1352 (epoch 0), train_loss = 1.997, time/batch = 0.130
algo-1-107J5_1_4ba1bfdba5ac | 357/1352 (epoch 0), train_loss = 1.932, time/batch = 0.120
algo-1-107J5_1_

algo-1-107J5_1_4ba1bfdba5ac | 432/1352 (epoch 0), train_loss = 1.906, time/batch = 0.288
algo-1-107J5_1_4ba1bfdba5ac | 433/1352 (epoch 0), train_loss = 1.910, time/batch = 0.176
algo-1-107J5_1_4ba1bfdba5ac | 434/1352 (epoch 0), train_loss = 1.934, time/batch = 0.249
algo-1-107J5_1_4ba1bfdba5ac | 435/1352 (epoch 0), train_loss = 1.994, time/batch = 0.166
algo-1-107J5_1_4ba1bfdba5ac | 436/1352 (epoch 0), train_loss = 1.961, time/batch = 0.156
algo-1-107J5_1_4ba1bfdba5ac | 437/1352 (epoch 0), train_loss = 1.950, time/batch = 0.184
algo-1-107J5_1_4ba1bfdba5ac | 438/1352 (epoch 0), train_loss = 1.953, time/batch = 0.155
algo-1-107J5_1_4ba1bfdba5ac | 439/1352 (epoch 0), train_loss = 1.966, time/batch = 0.380
algo-1-107J5_1_4ba1bfdba5ac | 440/1352 (epoch 0), train_loss = 1.909, time/batch = 0.215
algo-1-107J5_1_4ba1bfdba5ac | 441/1352 (epoch 0), train_loss = 1.953, time/batch = 0.152
algo-1-107J5_1_

algo-1-107J5_1_4ba1bfdba5ac | 517/1352 (epoch 0), train_loss = 1.905, time/batch = 0.152
algo-1-107J5_1_4ba1bfdba5ac | 518/1352 (epoch 0), train_loss = 1.882, time/batch = 0.176
algo-1-107J5_1_4ba1bfdba5ac | 519/1352 (epoch 0), train_loss = 1.785, time/batch = 0.128
algo-1-107J5_1_4ba1bfdba5ac | 520/1352 (epoch 0), train_loss = 1.891, time/batch = 0.134
algo-1-107J5_1_4ba1bfdba5ac | 521/1352 (epoch 0), train_loss = 1.878, time/batch = 0.127
algo-1-107J5_1_4ba1bfdba5ac | 522/1352 (epoch 0), train_loss = 1.890, time/batch = 0.138
algo-1-107J5_1_4ba1bfdba5ac | 523/1352 (epoch 0), train_loss = 1.854, time/batch = 0.127
algo-1-107J5_1_4ba1bfdba5ac | 524/1352 (epoch 0), train_loss = 1.859, time/batch = 0.121
algo-1-107J5_1_4ba1bfdba5ac | 525/1352 (epoch 0), train_loss = 1.857, time/batch = 0.155
algo-1-107J5_1_4ba1bfdba5ac | 526/1352 (epoch 0), train_loss = 1.828, time/batch = 0.118
algo-1-107J5_1_

algo-1-107J5_1_4ba1bfdba5ac | 602/1352 (epoch 0), train_loss = 1.822, time/batch = 0.148
algo-1-107J5_1_4ba1bfdba5ac | 603/1352 (epoch 0), train_loss = 1.812, time/batch = 0.134
algo-1-107J5_1_4ba1bfdba5ac | 604/1352 (epoch 0), train_loss = 1.854, time/batch = 0.135
algo-1-107J5_1_4ba1bfdba5ac | 605/1352 (epoch 0), train_loss = 1.801, time/batch = 0.182
algo-1-107J5_1_4ba1bfdba5ac | 606/1352 (epoch 0), train_loss = 1.848, time/batch = 0.173
algo-1-107J5_1_4ba1bfdba5ac | 607/1352 (epoch 0), train_loss = 1.783, time/batch = 0.140
algo-1-107J5_1_4ba1bfdba5ac | 608/1352 (epoch 0), train_loss = 1.849, time/batch = 0.164
algo-1-107J5_1_4ba1bfdba5ac | 609/1352 (epoch 0), train_loss = 1.820, time/batch = 0.132
algo-1-107J5_1_4ba1bfdba5ac | 610/1352 (epoch 0), train_loss = 1.777, time/batch = 0.132
algo-1-107J5_1_4ba1bfdba5ac | 611/1352 (epoch 0), train_loss = 1.880, time/batch = 0.130
algo-1-107J5_1_

algo-1-107J5_1_4ba1bfdba5ac | 687/1352 (epoch 0), train_loss = 1.796, time/batch = 0.150
algo-1-107J5_1_4ba1bfdba5ac | 688/1352 (epoch 0), train_loss = 1.770, time/batch = 0.134
algo-1-107J5_1_4ba1bfdba5ac | 689/1352 (epoch 0), train_loss = 1.806, time/batch = 0.127
algo-1-107J5_1_4ba1bfdba5ac | 690/1352 (epoch 0), train_loss = 1.784, time/batch = 0.138
algo-1-107J5_1_4ba1bfdba5ac | 691/1352 (epoch 0), train_loss = 1.775, time/batch = 0.151
algo-1-107J5_1_4ba1bfdba5ac | 692/1352 (epoch 0), train_loss = 1.793, time/batch = 0.131
algo-1-107J5_1_4ba1bfdba5ac | 693/1352 (epoch 0), train_loss = 1.797, time/batch = 0.139
algo-1-107J5_1_4ba1bfdba5ac | 694/1352 (epoch 0), train_loss = 1.745, time/batch = 0.135
algo-1-107J5_1_4ba1bfdba5ac | 695/1352 (epoch 0), train_loss = 1.752, time/batch = 0.129
algo-1-107J5_1_4ba1bfdba5ac | 696/1352 (epoch 0), train_loss = 1.769, time/batch = 0.121
algo-1-107J5_1_

algo-1-107J5_1_4ba1bfdba5ac | 772/1352 (epoch 0), train_loss = 1.808, time/batch = 0.147
algo-1-107J5_1_4ba1bfdba5ac | 773/1352 (epoch 0), train_loss = 1.777, time/batch = 0.129
algo-1-107J5_1_4ba1bfdba5ac | 774/1352 (epoch 0), train_loss = 1.816, time/batch = 0.184
algo-1-107J5_1_4ba1bfdba5ac | 775/1352 (epoch 0), train_loss = 1.798, time/batch = 0.272
algo-1-107J5_1_4ba1bfdba5ac | 776/1352 (epoch 0), train_loss = 1.748, time/batch = 0.112
algo-1-107J5_1_4ba1bfdba5ac | 777/1352 (epoch 0), train_loss = 1.752, time/batch = 0.148
algo-1-107J5_1_4ba1bfdba5ac | 778/1352 (epoch 0), train_loss = 1.750, time/batch = 0.132
algo-1-107J5_1_4ba1bfdba5ac | 779/1352 (epoch 0), train_loss = 1.801, time/batch = 0.128
algo-1-107J5_1_4ba1bfdba5ac | 780/1352 (epoch 0), train_loss = 1.769, time/batch = 0.118
algo-1-107J5_1_4ba1bfdba5ac | 781/1352 (epoch 0), train_loss = 1.858, time/batch = 0.123
algo-1-107J5_1_

algo-1-107J5_1_4ba1bfdba5ac | 857/1352 (epoch 0), train_loss = 1.735, time/batch = 0.240
algo-1-107J5_1_4ba1bfdba5ac | 858/1352 (epoch 0), train_loss = 1.756, time/batch = 0.174
algo-1-107J5_1_4ba1bfdba5ac | 859/1352 (epoch 0), train_loss = 1.747, time/batch = 0.495
algo-1-107J5_1_4ba1bfdba5ac | 860/1352 (epoch 0), train_loss = 1.736, time/batch = 0.286
algo-1-107J5_1_4ba1bfdba5ac | 861/1352 (epoch 0), train_loss = 1.789, time/batch = 0.225
algo-1-107J5_1_4ba1bfdba5ac | 862/1352 (epoch 0), train_loss = 1.791, time/batch = 0.143
algo-1-107J5_1_4ba1bfdba5ac | 863/1352 (epoch 0), train_loss = 1.775, time/batch = 0.160
algo-1-107J5_1_4ba1bfdba5ac | 864/1352 (epoch 0), train_loss = 1.703, time/batch = 0.303
algo-1-107J5_1_4ba1bfdba5ac | 865/1352 (epoch 0), train_loss = 1.805, time/batch = 0.322
algo-1-107J5_1_4ba1bfdba5ac | 866/1352 (epoch 0), train_loss = 1.796, time/batch = 0.242
algo-1-107J5_1_

algo-1-107J5_1_4ba1bfdba5ac | 942/1352 (epoch 0), train_loss = 1.622, time/batch = 0.153
algo-1-107J5_1_4ba1bfdba5ac | 943/1352 (epoch 0), train_loss = 1.754, time/batch = 0.123
algo-1-107J5_1_4ba1bfdba5ac | 944/1352 (epoch 0), train_loss = 1.690, time/batch = 0.146
algo-1-107J5_1_4ba1bfdba5ac | 945/1352 (epoch 0), train_loss = 1.808, time/batch = 0.143
algo-1-107J5_1_4ba1bfdba5ac | 946/1352 (epoch 0), train_loss = 1.708, time/batch = 0.126
algo-1-107J5_1_4ba1bfdba5ac | 947/1352 (epoch 0), train_loss = 1.658, time/batch = 0.131
algo-1-107J5_1_4ba1bfdba5ac | 948/1352 (epoch 0), train_loss = 1.731, time/batch = 0.136
algo-1-107J5_1_4ba1bfdba5ac | 949/1352 (epoch 0), train_loss = 1.707, time/batch = 0.128
algo-1-107J5_1_4ba1bfdba5ac | 950/1352 (epoch 0), train_loss = 1.723, time/batch = 0.128
algo-1-107J5_1_4ba1bfdba5ac | 951/1352 (epoch 0), train_loss = 1.686, time/batch = 0.134
algo-1-107J5_1_

algo-1-107J5_1_4ba1bfdba5ac | 1024/1352 (epoch 0), train_loss = 1.693, time/batch = 0.130
algo-1-107J5_1_4ba1bfdba5ac | 1025/1352 (epoch 0), train_loss = 1.643, time/batch = 0.135
algo-1-107J5_1_4ba1bfdba5ac | 1026/1352 (epoch 0), train_loss = 1.684, time/batch = 0.133
algo-1-107J5_1_4ba1bfdba5ac | 1027/1352 (epoch 0), train_loss = 1.659, time/batch = 0.124
algo-1-107J5_1_4ba1bfdba5ac | 1028/1352 (epoch 0), train_loss = 1.757, time/batch = 0.141
algo-1-107J5_1_4ba1bfdba5ac | 1029/1352 (epoch 0), train_loss = 1.777, time/batch = 0.138
algo-1-107J5_1_4ba1bfdba5ac | 1030/1352 (epoch 0), train_loss = 1.695, time/batch = 0.148
algo-1-107J5_1_4ba1bfdba5ac | 1031/1352 (epoch 0), train_loss = 1.728, time/batch = 0.120
algo-1-107J5_1_4ba1bfdba5ac | 1032/1352 (epoch 0), train_loss = 1.641, time/batch = 0.144
algo-1-107J5_1_4ba1bfdba5ac | 1033/1352 (epoch 0), train_loss = 1.644, time/batch = 0.132
algo-

algo-1-107J5_1_4ba1bfdba5ac | 1107/1352 (epoch 0), train_loss = 1.674, time/batch = 0.134
algo-1-107J5_1_4ba1bfdba5ac | 1108/1352 (epoch 0), train_loss = 1.652, time/batch = 0.127
algo-1-107J5_1_4ba1bfdba5ac | 1109/1352 (epoch 0), train_loss = 1.689, time/batch = 0.138
algo-1-107J5_1_4ba1bfdba5ac | 1110/1352 (epoch 0), train_loss = 1.666, time/batch = 0.131
algo-1-107J5_1_4ba1bfdba5ac | 1111/1352 (epoch 0), train_loss = 1.660, time/batch = 0.180
algo-1-107J5_1_4ba1bfdba5ac | 1112/1352 (epoch 0), train_loss = 1.727, time/batch = 0.156
algo-1-107J5_1_4ba1bfdba5ac | 1113/1352 (epoch 0), train_loss = 1.712, time/batch = 0.135
algo-1-107J5_1_4ba1bfdba5ac | 1114/1352 (epoch 0), train_loss = 1.664, time/batch = 0.139
algo-1-107J5_1_4ba1bfdba5ac | 1115/1352 (epoch 0), train_loss = 1.762, time/batch = 0.140
algo-1-107J5_1_4ba1bfdba5ac | 1116/1352 (epoch 0), train_loss = 1.621, time/batch = 0.186
algo-

algo-1-107J5_1_4ba1bfdba5ac | 1191/1352 (epoch 0), train_loss = 1.632, time/batch = 0.201
algo-1-107J5_1_4ba1bfdba5ac | 1192/1352 (epoch 0), train_loss = 1.591, time/batch = 0.310
algo-1-107J5_1_4ba1bfdba5ac | 1193/1352 (epoch 0), train_loss = 1.663, time/batch = 0.330
algo-1-107J5_1_4ba1bfdba5ac | 1194/1352 (epoch 0), train_loss = 1.644, time/batch = 0.217
algo-1-107J5_1_4ba1bfdba5ac | 1195/1352 (epoch 0), train_loss = 1.588, time/batch = 0.205
algo-1-107J5_1_4ba1bfdba5ac | 1196/1352 (epoch 0), train_loss = 1.669, time/batch = 0.199
algo-1-107J5_1_4ba1bfdba5ac | 1197/1352 (epoch 0), train_loss = 1.656, time/batch = 0.202
algo-1-107J5_1_4ba1bfdba5ac | 1198/1352 (epoch 0), train_loss = 1.670, time/batch = 0.191
algo-1-107J5_1_4ba1bfdba5ac | 1199/1352 (epoch 0), train_loss = 1.617, time/batch = 0.168
algo-1-107J5_1_4ba1bfdba5ac | 1200/1352 (epoch 0), train_loss = 1.632, time/batch = 0.208
algo-

algo-1-107J5_1_4ba1bfdba5ac | 1274/1352 (epoch 0), train_loss = 1.685, time/batch = 0.259
algo-1-107J5_1_4ba1bfdba5ac | 1275/1352 (epoch 0), train_loss = 1.716, time/batch = 0.378
algo-1-107J5_1_4ba1bfdba5ac | 1276/1352 (epoch 0), train_loss = 1.674, time/batch = 0.217
algo-1-107J5_1_4ba1bfdba5ac | 1277/1352 (epoch 0), train_loss = 1.617, time/batch = 0.308
algo-1-107J5_1_4ba1bfdba5ac | 1278/1352 (epoch 0), train_loss = 1.582, time/batch = 0.304
algo-1-107J5_1_4ba1bfdba5ac | 1279/1352 (epoch 0), train_loss = 1.588, time/batch = 0.293
algo-1-107J5_1_4ba1bfdba5ac | 1280/1352 (epoch 0), train_loss = 1.623, time/batch = 0.227
algo-1-107J5_1_4ba1bfdba5ac | 1281/1352 (epoch 0), train_loss = 1.762, time/batch = 0.141
algo-1-107J5_1_4ba1bfdba5ac | 1282/1352 (epoch 0), train_loss = 1.624, time/batch = 0.128
algo-1-107J5_1_4ba1bfdba5ac | 1283/1352 (epoch 0), train_loss = 1.662, time/batch = 0.129
algo-
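
The progress lines in this output follow a regular format, so the loss values can be extracted for plotting or comparison between runs. The following is a minimal sketch based only on the line format shown above. `parse_loss_line` and `LOSS_PATTERN` are hypothetical names, not part of the SageMaker SDK or the example repository.

```python
import re

# Matches progress lines such as:
# "9/1352 (epoch 0), train_loss = 3.119, time/batch = 0.363"
LOSS_PATTERN = re.compile(
    r'(\d+)/(\d+) \(epoch (\d+)\), train_loss = ([\d.]+), time/batch = ([\d.]+)'
)


def parse_loss_line(line):
    """Return (step, total_steps, epoch, loss, seconds_per_batch) or None."""
    match = LOSS_PATTERN.search(line)
    if match is None:
        return None  # not a progress line
    step, total, epoch, loss, seconds = match.groups()
    return int(step), int(total), int(epoch), float(loss), float(seconds)


line = ('algo-1-107J5_1_4ba1bfdba5ac | 9/1352 (epoch 0), '
        'train_loss = 3.119, time/batch = 0.363')
print(parse_loss_line(line))
```

Because the pattern uses `search`, the container name prefix in front of each line does not need to be removed first.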

## How Script Mode executes the script in the container

**Note:** Parts of this section predate the `model_dir` change. The training script must now handle the `model_dir` hyperparameter itself, as shown in the fix above.

The above cell downloads a Python 3 CPU container locally and simulates a SageMaker training job. When training starts, script mode installs the user script as a Python module. The module name matches the script name. In this case, **train.py** is transformed into a Python module named **train**.

After that, the Python interpreter executes the user module, passing **hyperparameters** as script arguments. The example above is executed as follows:
```bash
python -m train --num_epochs 1 --data_dir /opt/ml/input/data/training --save_dir /opt/ml/model
```
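
To make the mapping from hyperparameters to script arguments concrete, the following sketch flattens a hyperparameter dictionary into a command line. It is an illustration only and not the container's actual implementation: `hyperparameters_to_args` is a hypothetical helper, and it assumes that keys are passed through unchanged and in alphabetical order, as the command in the error log of the failed run suggests.

```python
def hyperparameters_to_args(hyperparameters):
    """Flatten a hyperparameter dict into a list of command-line arguments."""
    args = []
    for name in sorted(hyperparameters):
        # Each key becomes a --flag followed by its value as a string.
        args.extend(['--' + name, str(hyperparameters[name])])
    return args


hyperparameters = {'num_epochs': 1, 'data_dir': '/opt/ml/input/data/training'}
command = ['python', '-m', 'train'] + hyperparameters_to_args(hyperparameters)
print(' '.join(command))
```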

The **train** module consumes the hyperparameters using any argument parsing library. [The example we're using](https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/train.py#L11) uses the Python [argparse](https://docs.python.org/3/library/argparse.html) library:

```python
import argparse

parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
# Data and model checkpoints directories
parser.add_argument('--data_dir', type=str, default='data/tinyshakespeare', help='data directory containing input.txt with training examples')
parser.add_argument('--save_dir', type=str, default='save', help='directory to store checkpointed models')
...
args = parser.parse_args()
```


Let's explain the values of **--data_dir** and **--save_dir**:

- **/opt/ml/input/data/training** is the directory inside the container where the training data is downloaded. The data is downloaded to this folder because **training** is the channel name defined in ```estimator.fit({'training': inputs})```. See [training data](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-trainingdata) for more information. 

- **/opt/ml/model** is the directory where the script saves models, checkpoints, or any other data. Any data saved in this folder is uploaded to the S3 bucket defined for training. See [model data](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-envvariables) for more information.
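
The channel-to-directory mapping described above can be sketched as follows. This is an illustration under stated assumptions: `channel_dir` is a hypothetical helper, and the `validation` channel and the S3 paths are made up to show a second channel.

```python
import posixpath

# Base directory for input channels inside the container, as described above.
INPUT_DATA_DIR = '/opt/ml/input/data'


def channel_dir(channel_name):
    """Return the container directory for a named input channel."""
    return posixpath.join(INPUT_DATA_DIR, channel_name)


# Hypothetical inputs with a second 'validation' channel for illustration.
inputs = {'training': 's3://example-bucket/train',
          'validation': 's3://example-bucket/val'}
for name in inputs:
    print(name, '->', channel_dir(name))
```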

### Reading additional information from the container

Often, a user script needs additional information from the container that is not available in ```hyperparameters```.
SageMaker containers write this information as **environment variables** that are available inside the script.

For example, the script above can read the location of the **training** channel provided in the training job request by using the environment variable `SM_CHANNEL_TRAINING` as the default value for the `--data_dir` argument:

```python
import argparse
import os

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Default to the directory of the training channel, which the container
    # exposes through the SM_CHANNEL_TRAINING environment variable
    parser.add_argument('--data_dir', type=str, default=os.environ['SM_CHANNEL_TRAINING'])
```

Script mode displays the list of available environment variables in the training logs. You can find the [entire list here](https://github.com/aws/sagemaker-containers/blob/master/README.md#environment-variables-full-specification).
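
A script that reads these variables directly fails outside the container, where they are not set. One possible approach, sketched here, is to fall back to local paths. The `build_parser` helper, the fallback values, and the simulated environment are assumptions for illustration. `SM_CHANNEL_TRAINING` and `SM_MODEL_DIR` are taken from the environment variable list linked above.

```python
import argparse
import os


def build_parser(environ=os.environ):
    """Build a parser whose defaults come from SageMaker environment variables."""
    parser = argparse.ArgumentParser()
    # Fall back to local paths when the variables are not set, for example
    # when running the script outside a SageMaker container.
    parser.add_argument('--data_dir', type=str,
                        default=environ.get('SM_CHANNEL_TRAINING', 'data'))
    parser.add_argument('--model_dir', type=str,
                        default=environ.get('SM_MODEL_DIR', 'save'))
    return parser


# Simulated container environment for illustration.
fake_env = {'SM_CHANNEL_TRAINING': '/opt/ml/input/data/training',
            'SM_MODEL_DIR': '/opt/ml/model'}
args = build_parser(fake_env).parse_args([])
print(args.data_dir, args.model_dir)
```

Passing the environment as a parameter keeps the sketch easy to test. In a real script, calling `build_parser()` with no arguments reads from `os.environ`.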

## Train in SageMaker
After you test the training job locally, upload the dataset to an S3 bucket so SageMaker can access the data during training.


In [None]:
inputs = sagemaker_session.upload_data(path='sherlock', bucket=bucket, key_prefix='datasets/sherlock')
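
The value returned by `upload_data` is the S3 location that the training job reads from. As a rough sketch, and assuming the SDK's documented behavior that uploading a directory yields `s3://<bucket>/<key_prefix>`, the value of `inputs` should have the shape produced by this hypothetical helper. The bucket name below is a placeholder.

```python
def expected_s3_uri(bucket, key_prefix):
    """Compose the S3 URI expected for a directory upload (assumed format)."""
    return 's3://{}/{}'.format(bucket, key_prefix.strip('/'))


# Placeholder bucket name; the real one comes from default_bucket().
print(expected_s3_uri('example-bucket', 'datasets/sherlock'))
```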

To train in SageMaker, change the estimator argument `train_instance_type` to any SageMaker ML instance type available for training. For example:

In [None]:
hyperparameters = {'num_epochs': 1,
                   'data_dir': '/opt/ml/input/data/training'}

estimator = TensorFlow(entry_point='train.py',
                       source_dir='char-rnn-tensorflow',
                       framework_version='1.11',
                       py_version='py3',
                       train_instance_type='ml.c5.xlarge',  # Run on a SageMaker ML instance
                       train_instance_count=1,
                       hyperparameters=hyperparameters,
                       role=role)

# The 'training' channel points at the dataset uploaded to S3
estimator.fit({'training': inputs})

INFO:sagemaker:Creating training-job with name: sagemaker-tensorflow-scriptmode-2018-11-18-01-09-43-731


2018-11-18 01:09:45 Starting - Starting the training job...
2018-11-18 01:09:48 Starting - Launching requested ML instances......
2018-11-18 01:10:51 Starting - Preparing the instances for training...
2018-11-18 01:11:44 Downloading - Downloading input data
2018-11-18 01:11:44 Training - Training image download completed. Training in progress..
2018-11-18 01:11:44,373 sagemaker-containers INFO     Imported framework sagemaker_tensorflow_container.training
2018-11-18 01:11:44,376 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2018-11-18 01:11:44,386 sagemaker_tensorflow_container.training INFO     parameter_server_enabled is False
2018-11-18 01:11:44,386 sagemaker_tensorflow_container.training INFO     Running training job without parameter servers
2018-11-18 01:11:44,654 sagemaker-containers INFO     Module train does not provide a setup.py.
Generating setup.py
2018-11-18 01:11:44,654 sagemaker-co

0/1352 (epoch 0), train_loss = 4.597, time/batch = 7.273
model saved to s3://sagemaker-us-west-2-369233609183/sagemaker-tensorflow-scriptmode-2018-11-18-01-09-43-731/model/model.ckpt
1/1352 (epoch 0), train_loss = 4.536, time/batch = 0.095
2/1352 (epoch 0), train_loss = 4.379, time/batch = 0.091
3/1352 (epoch 0), train_loss = 3.984, time/batch = 0.097
4/1352 (epoch 0), train_loss = 3.578, time/batch = 0.089
5/1352 (epoch 0), train_loss = 3.419, time/batch = 0.134
6/1352 (epoch 0), train_loss = 3.336, time/batch = 0.092
7/1352 (epoch 0), train_loss = 3.232, time/batch = 0.090
8/1352 (epoch 0), train_loss = 3.183, time/batch = 0.092
9/1352 (epoch 0), train_loss = 3.088, time/batch = 0.095
10/1352 (epoch 0), train_loss = 3.044, time/batch = 0.096
11/1352 (epoch 0), train_loss = 3.036, time/batch = 0.101
12/1352 (epoch 0), train_loss = 3.059, time/batch = 0.092

198/1352 (epoch 0), train_loss = 2.184, time/batch = 0.105
199/1352 (epoch 0), train_loss = 2.182, time/batch = 0.089
200/1352 (epoch 0), train_loss = 2.221, time/batch = 0.093
201/1352 (epoch 0), train_loss = 2.209, time/batch = 0.091
202/1352 (epoch 0), train_loss = 2.211, time/batch = 0.141
203/1352 (epoch 0), train_loss = 2.183, time/batch = 0.089
204/1352 (epoch 0), train_loss = 2.170, time/batch = 0.094
205/1352 (epoch 0), train_loss = 2.148, time/batch = 0.094
206/1352 (epoch 0), train_loss = 2.169, time/batch = 0.086
207/1352 (epoch 0), train_loss = 2.165, time/batch = 0.088
208/1352 (epoch 0), train_loss = 2.131, time/batch = 0.098
209/1352 (epoch 0), train_loss = 2.176, time/batch = 0.095
210/1352 (epoch 0), train_loss = 2.131, time/batch = 0.095
211/1352 (epoch 0), train_loss = 2.219, time/batch = 0.094
212/1352 (epoch 0), train_loss = 2.143, tim

407/1352 (epoch 0), train_loss = 1.987, time/batch = 0.099
408/1352 (epoch 0), train_loss = 1.925, time/batch = 0.097
409/1352 (epoch 0), train_loss = 1.989, time/batch = 0.095
410/1352 (epoch 0), train_loss = 1.905, time/batch = 0.093
411/1352 (epoch 0), train_loss = 1.879, time/batch = 0.092
412/1352 (epoch 0), train_loss = 1.916, time/batch = 0.093
413/1352 (epoch 0), train_loss = 1.935, time/batch = 0.082
414/1352 (epoch 0), train_loss = 1.938, time/batch = 0.095
415/1352 (epoch 0), train_loss = 1.970, time/batch = 0.092
416/1352 (epoch 0), train_loss = 1.949, time/batch = 0.091
417/1352 (epoch 0), train_loss = 1.926, time/batch = 0.141
418/1352 (epoch 0), train_loss = 1.942, time/batch = 0.093
419/1352 (epoch 0), train_loss = 1.935, time/batch = 0.095
420/1352 (epoch 0), train_loss = 1.930, time/batch = 0.091
421/1352 (epoch 0), train_loss = 1.907, tim

564/1352 (epoch 0), train_loss = 1.813, time/batch = 0.089
565/1352 (epoch 0), train_loss = 1.818, time/batch = 0.099
566/1352 (epoch 0), train_loss = 1.855, time/batch = 0.095
567/1352 (epoch 0), train_loss = 1.903, time/batch = 0.098
568/1352 (epoch 0), train_loss = 1.821, time/batch = 0.113
569/1352 (epoch 0), train_loss = 1.801, time/batch = 0.123
570/1352 (epoch 0), train_loss = 1.788, time/batch = 0.094
571/1352 (epoch 0), train_loss = 1.820, time/batch = 0.099
572/1352 (epoch 0), train_loss = 1.786, time/batch = 0.098
573/1352 (epoch 0), train_loss = 1.843, time/batch = 0.095
574/1352 (epoch 0), train_loss = 1.831, time/batch = 0.097
575/1352 (epoch 0), train_loss = 1.807, time/batch = 0.093
576/1352 (epoch 0), train_loss = 1.839, time/batch = 0.097
577/1352 (epoch 0), train_loss = 1.847, time/batch = 0.100
578/1352 (epoch 0), train_loss = 1.880, tim

772/1352 (epoch 0), train_loss = 1.775, time/batch = 0.097
773/1352 (epoch 0), train_loss = 1.752, time/batch = 0.091
774/1352 (epoch 0), train_loss = 1.808, time/batch = 0.098
775/1352 (epoch 0), train_loss = 1.801, time/batch = 0.087
776/1352 (epoch 0), train_loss = 1.736, time/batch = 0.093
777/1352 (epoch 0), train_loss = 1.721, time/batch = 0.095
778/1352 (epoch 0), train_loss = 1.737, time/batch = 0.088
779/1352 (epoch 0), train_loss = 1.761, time/batch = 0.096
780/1352 (epoch 0), train_loss = 1.750, time/batch = 0.087
781/1352 (epoch 0), train_loss = 1.840, time/batch = 0.089
782/1352 (epoch 0), train_loss = 1.722, time/batch = 0.090
783/1352 (epoch 0), train_loss = 1.765, time/batch = 0.105
784/1352 (epoch 0), train_loss = 1.738, time/batch = 0.173
785/1352 (epoch 0), train_loss = 1.784, time/batch = 0.097
786/1352 (epoch 0), train_loss = 1.775, tim

982/1352 (epoch 0), train_loss = 1.618, time/batch = 0.099
983/1352 (epoch 0), train_loss = 1.662, time/batch = 0.093
984/1352 (epoch 0), train_loss = 1.696, time/batch = 0.092
985/1352 (epoch 0), train_loss = 1.779, time/batch = 0.094
986/1352 (epoch 0), train_loss = 1.670, time/batch = 0.094
987/1352 (epoch 0), train_loss = 1.728, time/batch = 0.094
988/1352 (epoch 0), train_loss = 1.682, time/batch = 0.097
989/1352 (epoch 0), train_loss = 1.614, time/batch = 0.095
990/1352 (epoch 0), train_loss = 1.714, time/batch = 0.097
991/1352 (epoch 0), train_loss = 1.635, time/batch = 0.096
992/1352 (epoch 0), train_loss = 1.738, time/batch = 0.100
993/1352 (epoch 0), train_loss = 1.663, time/batch = 0.096
994/1352 (epoch 0), train_loss = 1.694, time/batch = 0.087
995/1352 (epoch 0), train_loss = 1.671, time/batch = 0.095
996/1352 (epoch 0), train_loss = 1.686, tim

1179/1352 (epoch 0), train_loss = 1.710, time/batch = 0.096
1180/1352 (epoch 0), train_loss = 1.739, time/batch = 0.096
1181/1352 (epoch 0), train_loss = 1.651, time/batch = 0.091
1182/1352 (epoch 0), train_loss = 1.590, time/batch = 0.094
1183/1352 (epoch 0), train_loss = 1.652, time/batch = 0.136
1184/1352 (epoch 0), train_loss = 1.658, time/batch = 0.089
1185/1352 (epoch 0), train_loss = 1.578, time/batch = 0.097
1186/1352 (epoch 0), train_loss = 1.594, time/batch = 0.089
1187/1352 (epoch 0), train_loss = 1.662, time/batch = 0.089
1188/1352 (epoch 0), train_loss = 1.605, time/batch = 0.088
1189/1352 (epoch 0), train_loss = 1.632, time/batch = 0.093
1190/1352 (epoch 0), train_loss = 1.611, time/batch = 0.097
1191/1352 (epoch 0), train_loss = 1.620, time/batch = 0.096
1192/1352 (epoch 0), train_loss = 1.594, time/batch = 0.095
1193/1352 (epoch 0), train_lo