# Using script mode training
This notebook shows how to train a Keras Sequential model on SageMaker using script mode. The model is a simple deep CNN taken from [the Keras examples](https://github.com/keras-team/keras/blob/master/examples/cifar10_cnn.py).

## The dataset
The [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) is one of the most popular machine learning datasets. It consists of 60,000 32x32 images belonging to 10 different classes (6,000 images per class). Here are the classes in the dataset, as well as 10 random images from each:

![cifar10](https://maet3608.github.io/nuts-ml/_images/cifar10.png)

In this tutorial, we will train a deep CNN to recognize these images.


## Set up the environment

In [5]:
import os
import sagemaker
from sagemaker import get_execution_role
import boto3
sagemaker_session = sagemaker.Session()

role = get_execution_role()

In [6]:
bucket = 'demo-saeed'  # replace with the name of an S3 bucket you own
prefix = 'sagemaker/script-mode'  # key prefix under which the dataset is uploaded

## Download the CIFAR-10 dataset
Downloading the test and training data takes around 5 minutes.

In [2]:
!pip install wget

Processing /home/ec2-user/.cache/pip/wheels/90/1d/93/c863ee832230df5cfc25ca497b3e88e0ee3ea9e44adc46ac62/wget-3.2-py3-none-any.whl
Installing collected packages: wget
Successfully installed wget-3.2
You should consider upgrading via the '/home/ec2-user/anaconda3/envs/tensorflow_p36/bin/python -m pip install --upgrade pip' command.


In [3]:
# import wget # for TF2
!mkdir data
!python generate_cifar10_tfrecords_v2.py --data-dir data/

mkdir: cannot create directory ‘data’: File exists

Download from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz and extract.
data/
100% [..................................................] 170498071 / 170498071
Generating data//train/train.tfrecords
Generating data//validation/validation.tfrecords
Generating data//eval/eval.tfrecords
Done!


## Run on SageMaker cloud

### Uploading the data to S3

In [7]:
dataset_location = sagemaker_session.upload_data(bucket=bucket, path='data', key_prefix=prefix)
display(dataset_location)

's3://demo-saeed/sagemaker/script-mode'

### Configuring metrics from the job logs
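SageMaker can parse training metrics directly from the job logs and publish them to CloudWatch. Each definition below pairs a metric name with a regular expression whose capture group extracts the metric value from a log line.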

In [8]:
keras_metric_definition = [
    {'Name': 'train:loss', 'Regex': '.*loss: ([0-9\\.]+) - acc: [0-9\\.]+.*'},
    {'Name': 'train:accuracy', 'Regex': '.*loss: [0-9\\.]+ - acc: ([0-9\\.]+).*'},
    {'Name': 'validation:accuracy', 'Regex': '.*step - loss: [0-9\\.]+ - acc: [0-9\\.]+ - val_loss: [0-9\\.]+ - val_acc: ([0-9\\.]+).*'},
    {'Name': 'validation:loss', 'Regex': '.*step - loss: [0-9\\.]+ - acc: [0-9\\.]+ - val_loss: ([0-9\\.]+) - val_acc: [0-9\\.]+.*'},
    {'Name': 'sec/steps', 'Regex': '.* - \\d+s (\\d+)[mu]s/step - loss: [0-9\\.]+ - acc: [0-9\\.]+ - val_loss: [0-9\\.]+ - val_acc: [0-9\\.]+'}
]
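
Before launching a job, it can help to check the expressions locally against a log line. The following is a minimal sketch: the sample line is hypothetical (written in the format that Keras prints at the end of an epoch), the `extract_metrics` helper is not part of SageMaker, and the sketch assumes the expressions are applied with search semantics.

```python
import re

# Hypothetical end-of-epoch progress line in the Keras log format (not taken from a real run).
sample_line = ("156/156 [==============================] - 12s 76ms/step"
               " - loss: 1.8963 - acc: 0.3030 - val_loss: 1.5723 - val_acc: 0.4279")

# The same expressions as in the metric definitions, written as raw strings.
metric_regexes = {
    'train:loss': r'.*loss: ([0-9\.]+) - acc: [0-9\.]+.*',
    'train:accuracy': r'.*loss: [0-9\.]+ - acc: ([0-9\.]+).*',
    'validation:accuracy': r'.*step - loss: [0-9\.]+ - acc: [0-9\.]+ - val_loss: [0-9\.]+ - val_acc: ([0-9\.]+).*',
    'validation:loss': r'.*step - loss: [0-9\.]+ - acc: [0-9\.]+ - val_loss: ([0-9\.]+) - val_acc: [0-9\.]+.*',
    'sec/steps': r'.* - \d+s (\d+)[mu]s/step - loss: [0-9\.]+ - acc: [0-9\.]+ - val_loss: [0-9\.]+ - val_acc: [0-9\.]+',
}


def extract_metrics(line, regexes):
    """Apply each regex to the line and return the captured values as floats."""
    found = {}
    for name, pattern in regexes.items():
        match = re.search(pattern, line)
        if match:
            found[name] = float(match.group(1))
    return found


extracted = extract_metrics(sample_line, metric_regexes)
for name, value in extracted.items():
    print(name, value)
```

Note that the `sec/steps` expression captures the number in front of `ms/step` or `us/step`, so the value it reports is the per-step time in the unit Keras printed, not in seconds.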

### Train image classification based on the cifar10 dataset

In [9]:
hyperparameters = {'epochs': 10, 'batch-size' : 256}

In [10]:
from sagemaker.tensorflow import TensorFlow


source_dir = os.path.join(os.getcwd(), 'source_dir')
estimator = TensorFlow(base_job_name='cifar10-tf',
                       entry_point='cifar10_keras_main.py',
                       source_dir=source_dir,
                       role=role,
                       framework_version='1.12.0',
                       py_version='py3',
                       hyperparameters=hyperparameters,
                       train_instance_count=1, train_instance_type='ml.p3.2xlarge',
                       metric_definitions=keras_metric_definition)
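
In script mode, SageMaker runs the entry point as an ordinary Python script: each hyperparameter arrives as a command-line argument, and each input channel supplied to `fit` is exposed through an `SM_CHANNEL_<NAME>` environment variable. The sketch below shows how an entry point might read these values. It is not the actual `cifar10_keras_main.py`; the `parse_args` helper and the default values are assumptions made for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and channel directories the way a script-mode entry point might."""
    parser = argparse.ArgumentParser()

    # Hyperparameters from the estimator's `hyperparameters` dict.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=128)

    # The TensorFlow container also passes a model_dir argument.
    parser.add_argument('--model_dir', type=str, default=None)

    # Channel directories; the environment variables are set inside the training container.
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--validation', type=str, default=os.environ.get('SM_CHANNEL_VALIDATION'))
    parser.add_argument('--eval', type=str, default=os.environ.get('SM_CHANNEL_EVAL'))

    # parse_known_args ignores any extra arguments the container may add.
    args, _ = parser.parse_known_args(argv)
    return args


# Simulate the arguments generated from {'epochs': 10, 'batch-size': 256}.
args = parse_args(['--epochs', '10', '--batch-size', '256'])
print(args.epochs, args.batch_size)
```

Outside a training container the channel variables are unset, so `args.train`, `args.validation`, and `args.eval` fall back to `None` unless passed explicitly, which makes it possible to run the same script locally with explicit paths.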

In [11]:
remote_inputs = {'train' : dataset_location+'/train', 'validation' : dataset_location+'/validation', 'eval' : dataset_location+'/eval'}
estimator.fit(remote_inputs, wait=True)

's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


2020-08-05 17:36:12 Starting - Starting the training job...
2020-08-05 17:36:14 Starting - Launching requested ML instances......
2020-08-05 17:37:32 Starting - Preparing the instances for training......
2020-08-05 17:38:35 Downloading - Downloading input data...
2020-08-05 17:38:56 Training - Downloading the training image..
2020-08-05 17:39:19,278 sagemaker-containers INFO     Imported framework sagemaker_tensorflow_container.training
2020-08-05 17:39:19,632 sagemaker-containers INFO     Invoking user script

Training Env:

{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "eval": "/opt/ml/input/data/eval",
        "validation": "/opt/ml/input/data/validation",
        "train": "/opt/ml/input/data/train"
    },
    "current_host": "algo-1",
    "framework_module": "sagemaker_tensorflow_container.training:main",
    "hosts": [
        "algo-1"
    ],
    "hyperparameters": {
        "batch-size": 256,
        "model_dir": "

### View the job training metrics
SageMaker used the regular expressions configured above to extract the job metrics from the logs and send them to CloudWatch.
You can view the job metrics directly in SageMaker Studio. In the left sidebar, select the SageMaker Experiment List, right-click _Unassigned trial components_, and open it in the trial component list. Then choose the latest training job and open it in trial details to see all the metrics that you defined to send to CloudWatch.
You can also use CloudWatch metrics, where you can change the period and configure the statistics.

In [None]:
from IPython.display import Markdown, display

link = 'https://console.aws.amazon.com/cloudwatch/home?region='+sagemaker_session.boto_region_name+'#metricsV2:query=%7B/aws/sagemaker/TrainingJobs,TrainingJobName%7D%20'+estimator.latest_training_job.job_name
display(Markdown('CloudWatch metrics: [link]('+link+')'))
display(Markdown('After you choose a metric, change the period to 1 Minute (Graphed Metrics -> Period)'))
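
The metrics can also be retrieved programmatically rather than through the console. The sketch below uses the `TrainingJobAnalytics` class from the SageMaker Python SDK; the `fetch_job_metrics` wrapper is a hypothetical helper, and the call only works with valid AWS credentials once the job has emitted metrics.

```python
def fetch_job_metrics(job_name, metric_names):
    """Return the CloudWatch metrics of a training job as a pandas DataFrame."""
    # Imported lazily so that defining this helper needs neither the SDK nor AWS credentials.
    from sagemaker.analytics import TrainingJobAnalytics

    analytics = TrainingJobAnalytics(training_job_name=job_name,
                                     metric_names=metric_names)
    return analytics.dataframe()


# Usage in this notebook, after training finishes (requires AWS credentials):
# metrics_df = fetch_job_metrics(estimator.latest_training_job.job_name,
#                                ['train:loss', 'validation:accuracy'])
# metrics_df.head()
```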