# Train and Host a Keras Sequential Model
## Using Pipe Mode datasets and distributed training with Horovod
This notebook shows how to train and host a Keras Sequential model on SageMaker. The model used for this notebook is a simple deep CNN that was extracted from [the Keras examples](https://github.com/keras-team/keras/blob/master/examples/cifar10_cnn.py).

## The dataset
The [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) is one of the most popular machine learning datasets. It consists of 60,000 32x32 images belonging to 10 different classes (6,000 images per class). Here are the classes in the dataset, as well as 10 random images from each:

![cifar10](https://maet3608.github.io/nuts-ml/_images/cifar10.png)

In this tutorial, we will train a deep CNN to recognize these images.

We'll compare training with File mode, training with Pipe mode datasets, and distributed training with Horovod.
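The three setups differ mainly in a few estimator arguments. The sketch below builds those arguments as plain dictionaries so the differences are easy to compare side by side. The helper name `estimator_kwargs` and the `processes_per_host` value are illustrative assumptions. `input_mode` and `distributions` are the argument names used by version 1.x of the SageMaker Python SDK and may differ in other versions.

```python
def estimator_kwargs(mode, instance_count=1):
    """Return hypothetical TensorFlow estimator arguments for one training setup.

    mode is one of 'file', 'pipe', or 'horovod'.
    """
    # File mode is the default: the dataset is copied to the instance before training.
    kwargs = {'train_instance_count': instance_count, 'input_mode': 'File'}
    if mode == 'file':
        pass
    elif mode == 'pipe':
        # Pipe mode streams the dataset to the training container instead of copying it.
        kwargs['input_mode'] = 'Pipe'
    elif mode == 'horovod':
        # Horovod runs on MPI; one process per host is an assumed, illustrative value.
        kwargs['distributions'] = {'mpi': {'enabled': True, 'processes_per_host': 1}}
    else:
        raise ValueError('unknown mode: {}'.format(mode))
    return kwargs


for mode in ('file', 'pipe', 'horovod'):
    print(mode, estimator_kwargs(mode))
```

These dictionaries could be unpacked into the estimator constructor with `**`, alongside the arguments that stay the same across all three runs.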

## Set up the environment

In [1]:
import os
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

role = get_execution_role()

## Download the CIFAR-10 dataset
Downloading the test and training data takes around 5 minutes.

In [2]:
import tensorflow as tf




In [3]:
tf.__version__

'1.15.0'

In [4]:
!pwd

/home/ec2-user/SageMaker/MLAI/script-mode


In [7]:
!pip install wget

Collecting wget
  Using cached wget-3.2.zip (10 kB)
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9681 sha256=bed51e161a34013e340f106ddd671cc20f9f58c2491554c0886a5d5ad916515e
  Stored in directory: /root/.cache/pip/wheels/90/1d/93/c863ee832230df5cfc25ca497b3e88e0ee3ea9e44adc46ac62
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.


In [8]:
import wget

In [7]:
!python generate_cifar10_tfrecords_v1.x.py --data-dir data/


Download from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz and extract.
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded cifar-10-python.tar.gz 170498071 bytes.
Generating data//train/train.tfrecords


Generating data//validation/validation.tfrecords
Generating data//eval/eval.tfrecords
Done!
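Before moving on, it can be worth confirming that the three TFRecord files named in the output above were actually written. The following is a minimal sketch of such a check. The helper name `missing_tfrecords` is hypothetical, and the relative paths are taken from the log lines above.

```python
import os

# Relative paths reported by the generation script's output.
EXPECTED_FILES = (
    os.path.join('train', 'train.tfrecords'),
    os.path.join('validation', 'validation.tfrecords'),
    os.path.join('eval', 'eval.tfrecords'),
)


def missing_tfrecords(data_dir):
    """Return the expected TFRecord files that do not exist under data_dir."""
    return [rel for rel in EXPECTED_FILES
            if not os.path.isfile(os.path.join(data_dir, rel))]
```

Calling `missing_tfrecords('data')` returns an empty list when all three files are present.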


## Create a training job using the sagemaker.TensorFlow estimator, running locally
To test that the code will work in SageMaker, we'll first use SageMaker local mode.
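The next cell chooses between `local` and `local_gpu` by calling `nvidia-smi`. On a machine where that executable is not installed, `subprocess.call` raises an exception instead of returning a non-zero code. A more defensive variant might look like the sketch below. The function name `pick_local_instance_type` is an assumption, not part of the SageMaker SDK.

```python
import shutil
import subprocess


def pick_local_instance_type():
    """Return 'local_gpu' if nvidia-smi exists and exits with code 0, else 'local'."""
    # shutil.which returns None when the executable is not on the PATH.
    if shutil.which('nvidia-smi') is None:
        return 'local'
    result = subprocess.run(['nvidia-smi'],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return 'local_gpu' if result.returncode == 0 else 'local'


print(pick_local_instance_type())
```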

In [9]:
from sagemaker.tensorflow import TensorFlow

import subprocess

# Default to CPU local mode; use GPU local mode if nvidia-smi runs successfully.
instance_type = 'local'
if subprocess.call('nvidia-smi') == 0:
    instance_type = 'local_gpu'
print(instance_type)

# A short run is enough to verify that the training script works.
local_hyperparameters = {'epochs': 2, 'batch-size': 64}

source_dir = os.path.join(os.getcwd(), 'source_dir')
estimator = TensorFlow(entry_point='cifar10_keras_main.py',
                       source_dir=source_dir,
                       role=role,
                       framework_version='1.12.0',
                       py_version='py3',
                       hyperparameters=local_hyperparameters,
                       train_instance_count=1, train_instance_type=instance_type)

local
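In script mode, SageMaker passes each hyperparameter to the entry point as a command-line argument, so `{'epochs': 2, 'batch-size': 64}` arrives as `--epochs 2 --batch-size 64`. The contents of `cifar10_keras_main.py` are not shown here, so the parser below is only a sketch of how such a script might read them. The function name and the default values are assumptions.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the hyperparameters passed on the command line.

    With argv=None, argparse reads sys.argv[1:], as it would inside the container.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)        # assumed default
    parser.add_argument('--batch-size', type=int, default=128)   # assumed default
    # parse_known_args ignores any extra arguments the script does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_hyperparameters(['--epochs', '2', '--batch-size', '64'])
print(args.epochs, args.batch_size)
```

Note that argparse exposes `--batch-size` as the attribute `batch_size`, with the hyphen replaced by an underscore.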


In [10]:
local_inputs = {'train' : 'file://'+os.getcwd()+'/data/train', 
                'validation' : 'file://'+os.getcwd()+'/data/validation', 
                'eval' : 'file://'+os.getcwd()+'/data/eval'}
estimator.fit(local_inputs)

Creating tmpiaba668c_algo-1-bs0ry_1 ... done
Attaching to tmpiaba668c_algo-1-bs0ry_1
algo-1-bs0ry_1  | 2020-05-05 19:08:00,915 sagemaker-containers INFO     Imported framework sagemaker_tensorflow_container.training
algo-1-bs0ry_1  | 2020-05-05 19:08:00,922 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-bs0ry_1  | 2020-05-05 19:08:01,848 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-bs0ry_1  | 2020-05-05 19:08:01,861 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-bs0ry_1  | 2020-05-05 19:08:01,871 sagemaker-containers INFO     Invoking user script
algo-1-bs0ry_1  |
algo-1-bs0ry_1  | Training Env:
algo-1-bs0ry_1  |
algo-1-bs0ry_1  | {
algo-1-bs0ry_1  |     "additional_framework_parameters": {},
algo-1-bs0ry_1  |     "channel_input_dirs": {
algo-1-bs0ry_1  |