# Intro

This notebook served as the base to test the SageMaker refactored code for this project.

What you should know:

- During the dev process with SageMaker, we use the `sagemaker` **sdk** to call its functionalities ( this notebook )
- For productionizing, we use SageMaker through API calls via **boto3** ( `sagemaker_handling/sagemaker.py` via Jenkins )

The underlying ML code is exactly the same ( meaning the refactored code stays intact ); the only change is *how* we call SageMaker.

### Let's first set the proxy variables

In [1]:
import os

os.environ['HTTP_PROXY'] = "http://proxy-internet-aws-eu.subsidia.org:3128"
os.environ['HTTPS_PROXY'] = "http://proxy-internet-aws-eu.subsidia.org:3128"
os.environ['no_proxy'] = "169.254.169.254,127.0.0.1"

### Parameters & Configuration

In [2]:
RUN_ENV = "preprod"
ONLY_LAST = True

In [4]:
import subprocess
import sys
sys.path.insert(0,'..')

import src.config as cf

In [5]:
config_file = "../conf/prod.yml" if RUN_ENV=="prod" else "../conf/dev.yml"
config = cf.ProgramConfiguration(config_file, "../conf/functional.yml")

  self._config_tech = yaml.load(f)
  self._config_func = yaml.load(f)


## Build Docker images for train & serve

In [6]:
config.get_train_image_name()

'demand-forecast-prophet-training-dev'

* For dev purposes, when you're working on a notebook instance ( not via Jenkins ), the proxies of your Docker daemon may not be set correctly, so run this ( only once per Notebook instance session ):

In [8]:
%%sh
sudo su

cat <<EOF >> /etc/sysconfig/docker
export HTTPS_PROXY=http://proxy-internet-aws-eu.subsidia.org:3128
export HTTP_PROXY=http://proxy-internet-aws-eu.subsidia.org:3128
export NO_PROXY=169.254.169.254,127.0.0.1
EOF

service docker restart

Stopping docker: [  OK  ]
Starting docker:	.[  OK  ]


### Now let's build the image

#### First, a little something about what happens behind the scenes...

> The script in charge of building the image is `sagemaker_handling/build_image.sh`.
What it does is:

- Read the following arguments:
    - `algorithm_name`: the name you'd like to give the Docker image
    - `run_env`: prod or preprod. It also serves as an argument to the Docker image, because we need to propagate this variable from this script ( running on a SageMaker notebook instance, or on Jenkins *for prod* ) to the machine we pop for learning
    - `only_last`: True or False. Similarly, it serves as an argument to the Docker image
- Update the shell proxy variables
- Authenticate to ECR ( ForecastUnited account ID )
- Build the image from `sagemaker_handling/Dockerfile_train`
- Push the image to ECR

**==> You don't have to change this file unless you'd like to change the arguments !**
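To make the steps above more concrete, here is a rough Python sketch of the Docker commands a script like this ends up running. It is not the content of `build_image.sh`: the function name, the build-arg names `RUN_ENV` and `ONLY_LAST`, and the account ID `123456789012` are assumptions for illustration, and the proxy and ECR authentication steps are left out.

```python
import shlex


def build_image_commands(algorithm_name, run_env, only_last,
                         account_id, region="eu-west-1"):
    """Return the docker commands (as argument lists) for build, tag and push."""
    image_uri = (f"{account_id}.dkr.ecr.{region}.amazonaws.com/"
                 f"{algorithm_name}:latest")
    build = [
        "docker", "build",
        "-t", algorithm_name,
        "-f", "sagemaker_handling/Dockerfile_train",
        "--build-arg", f"RUN_ENV={run_env}",      # hypothetical ARG name
        "--build-arg", f"ONLY_LAST={only_last}",  # hypothetical ARG name
        ".",
    ]
    tag = ["docker", "tag", algorithm_name, image_uri]
    push = ["docker", "push", image_uri]
    return [build, tag, push]


# Placeholder account ID, not the real one
commands = build_image_commands("demand-forecast-prophet-training-dev",
                                "preprod", "True", account_id="123456789012")
for cmd in commands:
    # Only print the commands here; nothing is executed
    print(" ".join(shlex.quote(part) for part in cmd))
```

The point is only that `run_env` and `only_last` travel into the image at build time, which is why a change to either one requires a rebuild.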

> What happens in `sagemaker_handling/Dockerfile_train` ?

Well, we do the following:

- Copy the ML source code ( from `src/` and `conf/` ) to `/opt/program` on the Docker container running on the machine we'll pop for ML. Why `/opt/program` ? Because SageMaker expects all source code to be there ( it's the **WORKDIR** in the base Docker images we use; we can change it, but it's better to adhere to the norms ). We also copy the `sagemaker_handling/requirements_train_instance.txt` file to the image.

*PS: Any source code you don't COPY will not be available in the Docker image ==> not available in the ML instance machine you'll pop*

- We set the arguments ( remember ? `run_env` and `only_last`, which we provide when we build the image ) as environment variables ( in the Docker container on the machine we'll pop for ML )

- We set a bunch of SageMaker environment variables ( `SM_CHANNEL_TRAIN`, `SM_MODEL_DIR`, `SM_DATA_DIR` ) to tell SageMaker where to look for training data and where to put model artefacts.

*PS: SageMaker copies your training data from an S3 path you provide ( when you call SageMaker for training, see the `Estimator` class later on in this notebook ) to the path `SM_CHANNEL_TRAIN` on the container. So your ML code should not read data from S3, but from this local path ( again, local to the container ).*

- Lastly, we configure the container to run as a Python executable. What happens is:
When SageMaker is called to do training ( **sdk**: `Estimator`, or **boto3**: `create_training_job()` etc. ), it adds an argument called `train` when it runs the Docker image in a container. Since we configured the container to be a Python executable, what runs in the container is:
```bash
python train
```
which means that the file in `/opt/program` called `train` ( `/opt/program` in the container, we copied it from *here* `src/train` remember ? ) is executed. This `train` file is where your ML code should be ! More on this later...

*PS: We can change the name `train` if we'd like, through the SageMaker environment variable `SM_PROGRAM`, but again, the norm is to keep it like this.*

**==> You don't have to change this `sagemaker_handling/Dockerfile_train` file if you change your *ML* code ! Unless you want to copy more code in the Docker image, change the arguments, change SageMaker configurations etc.**
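Seen from inside the container, the setup described above boils down to reading a few environment variables and a local directory. The sketch below is only an illustration: the variable names `RUN_ENV` and `ONLY_LAST` and both helper functions are assumptions ( the real names live in `Dockerfile_train` and `src/` ), while `/opt/ml/input/data/training` and `/opt/ml/model` are the usual SageMaker locations, used here as fallbacks.

```python
import os


def read_container_settings(environ=os.environ):
    """Collect the settings the training code needs from environment variables."""
    return {
        # Hypothetical names for the two build arguments
        "run_env": environ.get("RUN_ENV", "preprod"),
        # Environment variables are strings, so "True"/"False" must be parsed
        "only_last": environ.get("ONLY_LAST", "True").strip().lower() == "true",
        # Local directory where SageMaker has copied the S3 training data
        "data_dir": environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/training"),
        # Local directory whose content SageMaker uploads as model artefacts
        "model_dir": environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    }


def list_training_files(data_dir):
    """List the local training files; empty when the directory is absent."""
    if not os.path.isdir(data_dir):
        return []
    return sorted(os.listdir(data_dir))


settings = read_container_settings({"RUN_ENV": "preprod", "ONLY_LAST": "True"})
print(settings)
print(list_training_files(settings["data_dir"]))
```

Note that nothing here talks to S3: by the time `train` runs, the data is already on the container's disk.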

#### NOW let's build the image !

In [24]:
IMAGE_NAME = config.get_train_image_name()
ONLY_LAST_ = str(ONLY_LAST)

In [25]:
%%sh -s "$IMAGE_NAME" "$RUN_ENV" "$ONLY_LAST_"

cd ..
sh sagemaker_handling/build_image.sh $1 $2 $3 

demand-forecast-prophet-training-dev
preprod
True
Login Succeeded
Sending build context to Docker daemon  9.798MB
Step 1/13 : FROM 150258775384.dkr.ecr.eu-west-1.amazonaws.com/mxnet-gluonts-training-0.3.3-cpu:latest
latest: Pulling from mxnet-gluonts-training-0.3.3-cpu
Digest: sha256:060009925c389f20c3be2b291cf1032097c7aaedff213456c789ed19a315056a
Status: Image is up to date for 150258775384.dkr.ecr.eu-west-1.amazonaws.com/mxnet-gluonts-training-0.3.3-cpu:latest
 ---> 31ee196156f7
Step 2/13 : COPY src/ /opt/program
 ---> af382287c2b8
Step 3/13 : COPY conf/ /opt/program
 ---> f7feabcad899
Step 4/13 : COPY sagemaker_handling/requirements_train_instance.txt /opt/program
 ---> 99638d63e5e7
Step 5/13 : RUN pip install -r requirements_train_instance.txt
 ---> Running in 12140322ea25
Collecting pyyaml==3.13 (from -r requirements_train_instance.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/9e/a3/1d13970c3f36777c583f136c136f804d70f500168edc1edea6daa7200769/PyYAML-3.13.ta




## Training

In [5]:
from sagemaker.estimator import Estimator

### Training config

In [26]:
role = config.get_global_role_arn()
image_name = config.get_train_docker_image()
bucket = config.get_train_bucket_input()
project_id = config.get_train_path_refined_data_input()
hyperparameters = config.get_train_hyperparameters()
train_instance_count = config.get_train_instance_count()
train_instance_type = config.get_train_instance_type()
security_group_ids = config.get_global_security_group_ids()
subnets = config.get_global_subnets()

In [27]:
print("- role:", role,
      "\n- image name:", image_name,
      "\n- bucket:", bucket,
      "\n- project_id:", project_id,
      "\n- hyperparameters:\n", hyperparameters,
      "\n- train_instance_count:", train_instance_count,
      "\n- train_instance_type:", train_instance_type,
      "\n- security_group_ids:", security_group_ids,
      "\n- subnets:", subnets
     )

- role: arn:aws:iam::150258775384:role/FORECAST-SAGEMAKER-DEV 
- image name: 150258775384.dkr.ecr.eu-west-1.amazonaws.com/demand-forecast-prophet-training-dev:latest 
- bucket: fcst-refined-demand-forecast-dev 
- project_id: specific/domyos_nov_2019/train_data_cutoff 
- hyperparameters:
 {'yearly_order': '27', "quaterly_order'": '5', 'weekly_seasonality': 'False', 'daily_seasonality': 'False', 'yearly_seasonality': 'False', 'n_changepoints': '36', 'changepoint_range': '0.6970858418088761', 'changepoint_prior_scale': '1.9131406810094054', 'seasonality_prior_scale': '2.0461437728104253'} 
- train_instance_count: 1 
- train_instance_type: ml.c5.4xlarge 
- security_group_ids: ['sg-0186b5ab868f43e42'] 
- subnets: ['subnet-0f87a7ed73f4ead6d', 'subnet-02c60aed04f0d4ee5']


In [28]:
# If you'd like to run the docker container locally ( on this SageMaker notebook instance ) instead of popping
# a machine, use 'local'. It's faster for checking your dev on small datasets, but /!\ careful: this is not a very
# powerful instance, so training may take a while.
# If not, let your configured ML instance be ( or change it here if you don't want to change it in the config file )

train_instance_type = 'local' #'ml.m5.2xlarge'

* The **Estimator** class from the SageMaker SDK pops the machine you specify and runs a container from the Docker image we built ( with the code in it ). This means that it executes the `train` file, remember ?
So, for your ML dev, you should only need to change the `train` file ( of course the file depends on other files, but you know what I mean: the `main` program which runs when you call SageMaker for training is in `train` ). Don't forget to rebuild the image ( i.e. run the `sagemaker_handling/build_image.sh` file ) EVERY TIME you change the code. You know why ? Because the new code needs to be copied to the Docker image.

In [None]:
estimator = Estimator(role=role,
                      train_instance_count=train_instance_count,
                      train_instance_type=train_instance_type,
                      image_name=image_name,
                      hyperparameters=hyperparameters,
                      security_group_ids=security_group_ids,
                      subnets=subnets
                      )

estimator.fit('s3://'+bucket+'/'+project_id)
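For comparison with the SDK call above, this is roughly what the same training job looks like as a **boto3** request. It is a sketch under assumptions, not the content of `sagemaker_handling/sagemaker.py`: the helper name, the job name, the output path, the volume size and the maximum runtime are illustrative values, and the role and image are placeholders. The request keys are the ones `create_training_job()` expects.

```python
def build_training_request(job_name, role, image, s3_input, s3_output,
                           instance_type, instance_count, hyperparameters,
                           security_group_ids, subnets):
    """Build the keyword arguments for boto3's create_training_job()."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": image,
                                   "TrainingInputMode": "File"},
        "RoleArn": role,
        # The API only accepts string values for hyperparameters
        "HyperParameters": {k: str(v) for k, v in hyperparameters.items()},
        "InputDataConfig": [{
            "ChannelName": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_input,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": s3_output},
        "ResourceConfig": {"InstanceType": instance_type,
                           "InstanceCount": instance_count,
                           "VolumeSizeInGB": 30},            # illustrative value
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},  # illustrative value
        "VpcConfig": {"SecurityGroupIds": security_group_ids,
                      "Subnets": subnets},
    }


request = build_training_request(
    job_name="example-training-job",                      # hypothetical
    role="arn:aws:iam::123456789012:role/example-role",   # placeholder
    image="123456789012.dkr.ecr.eu-west-1.amazonaws.com/example-image:latest",
    s3_input="s3://example-bucket/example-prefix",        # placeholder
    s3_output="s3://example-bucket/output",               # placeholder
    instance_type="ml.c5.4xlarge",
    instance_count=1,
    hyperparameters={"yearly_order": 27, "quaterly_order": 5},
    security_group_ids=["sg-00000000000000000"],
    subnets=["subnet-00000000000000000"],
)
print(sorted(request))

# With real values and credentials, the job would be started with:
# import boto3
# boto3.client("sagemaker").create_training_job(**request)
```

Keep in mind that `'local'` is a feature of the SDK only; a boto3 request needs a real `ml.*` instance type.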

## Hyperparameter optimisation

In [11]:
import boto3
from time import gmtime, strftime

from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner

This module works pretty much like Python's `skopt`: you specify your hyperparameter ranges.

In [20]:
tuning_job_name = 'demand-forecast-prophet-tuning-dev'  # + strftime("%d-%H-%M-%S", gmtime())

hyperparameter_ranges = {
    'yearly_order': IntegerParameter(26, 29),  # (1, 30)
    'quaterly_order': IntegerParameter(4, 6),  # (1, 10)
    # 'n_changepoints': IntegerParameter(30, 32),  # (1, 50)
    # 'changepoint_range': ContinuousParameter(0.65, 0.69),  # (0.6, 1.)
    # 'changepoint_prior_scale': ContinuousParameter(1.8, 1.9, scaling_type="Logarithmic"),  # 1e-2, 1e2
    # 'seasonality_prior_scale': ContinuousParameter(2.2, 2.4, scaling_type="Logarithmic")
}


What you must know is that your main training program ( i.e. `train`, you guessed it ! ) should expect the hyperparameters of your model as program arguments ( this is part of the SageMaker code refactoring you must do ).
Once you do this, the `HyperparameterTuner` SageMaker module works like magic ! It handles passing different hyperparameter combinations to your main script to test them.
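As an illustration of "hyperparameters as program arguments", here is a minimal `argparse` sketch. It assumes the arguments arrive as `--name value` pairs; the function name and the default values are made up for the example and are not taken from the project's `train` file.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse hyperparameters passed to the training program as --name value."""
    parser = argparse.ArgumentParser()
    # Same spelling as the keys used in hyperparameter_ranges
    parser.add_argument("--yearly_order", type=int, default=27)
    parser.add_argument("--quaterly_order", type=int, default=5)
    parser.add_argument("--n_changepoints", type=int, default=36)
    # parse_known_args ignores extra arguments instead of failing on them
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate one combination chosen by the tuner
args = parse_hyperparameters(["--yearly_order", "28", "--quaterly_order", "4"])
print(args.yearly_order, args.quaterly_order, args.n_changepoints)
```

The argument names must match the keys of `hyperparameter_ranges` exactly ( typos included ), otherwise the tuned values never reach your model.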

But of course, we need to tell the module which hyperparameter combination to choose, i.e. what metric we'd like to optimize ! SageMaker does this via regular expressions. So, in your training code, you need to print a metric at some point ( again, part of the SageMaker refactoring you must do ). Then, we tell SageMaker ( via the `HyperparameterTuner` call ) to extract the value of our metric for every combination, and to choose the one that best achieves our objective type ( i.e. *maximize* or *minimize* ).

Check out how we did that for this project here:
- The function **train_model_fn()** from `src/model.py` ( in the container it's in `/opt/program/model.py`, but you already know that by now if you've been following ;) ), called through our famous `train` script, does this at the end of training:

```python
print("\n--------------------------------\n")
print("cutoff_wape:", str(l_cutoff_wape))
print("global_wape:", str(global_wape))
print("\n--------------------------------\n")
```
So now, all we have to do is write the right regular expression to extract the **wape** metric we'd like to minimize:

In [21]:
objective_metric_name = 'global_wape'
objective_type = 'Minimize'
metric_definitions = [{'Name': 'global_wape',
                       'Regex': 'global_wape: ([0-9\\.]+)'}]
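Before launching a tuning job, it can be worth checking the regular expression locally against a log excerpt. The sketch below applies the same pattern with Python's `re` module; the sample log lines and their values are made up for the example.

```python
import re

# Same pattern as in metric_definitions
metric_regex = 'global_wape: ([0-9\\.]+)'

# Made-up log excerpt in the shape printed at the end of training
sample_log = "\n".join([
    "--------------------------------",
    "cutoff_wape: [0.31, 0.27]",
    "global_wape: 0.2874",
    "--------------------------------",
])

match = re.search(metric_regex, sample_log)
# The first capture group holds the metric value
global_wape = float(match.group(1)) if match else None
print(global_wape)
```

One caveat: the pattern only captures digits and dots, so a value printed in scientific notation ( e.g. `1e-05` ) would be cut at the `e`.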

In [22]:
tuner = HyperparameterTuner(estimator,
                            objective_metric_name,
                            hyperparameter_ranges,
                            metric_definitions,
                            max_jobs=9,
                            max_parallel_jobs=1,
                            objective_type=objective_type)

In [23]:
#tuner.fit({'training': 's3://'+bucket+'/'+project_id,
#          'test': 's3://'+bucket+'/'+project_id})

tuner.fit('s3://'+bucket+'/'+project_id)

In [24]:
boto3.client('sagemaker').describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuner.latest_tuning_job.job_name)['HyperParameterTuningJobStatus']

'InProgress'
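If you'd rather wait for the job from the notebook than re-run the cell above, a small polling loop does the trick. This is a sketch: the helper name, the polling interval and the fake responses used for the demo are assumptions, while the status values and the `BestTrainingJob` key come from the `describe_hyper_parameter_tuning_job` response.

```python
import time

# Statuses after which the tuning job will not change any more
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_tuning_job(describe, job_name, poll_seconds=60, max_polls=120):
    """Poll a tuning job until it ends; return (status, best training job name)."""
    status = None
    for _ in range(max_polls):
        response = describe(HyperParameterTuningJobName=job_name)
        status = response["HyperParameterTuningJobStatus"]
        if status in TERMINAL_STATUSES:
            best = response.get("BestTrainingJob", {}).get("TrainingJobName")
            return status, best
        time.sleep(poll_seconds)
    return status, None


# Demo with fake responses, so the sketch runs without AWS access
_fake_responses = iter([
    {"HyperParameterTuningJobStatus": "InProgress"},
    {"HyperParameterTuningJobStatus": "Completed",
     "BestTrainingJob": {"TrainingJobName": "example-job-003"}},
])


def fake_describe(HyperParameterTuningJobName):
    return next(_fake_responses)


result = wait_for_tuning_job(fake_describe, "example-tuning-job", poll_seconds=0)
print(result)
```

With a real job, you would pass `boto3.client('sagemaker').describe_hyper_parameter_tuning_job` as `describe` and `tuner.latest_tuning_job.job_name` as `job_name`.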

* You can monitor your tuning job in more *visual* detail right [here](https://eu-west-1.console.aws.amazon.com/sagemaker/home?region=eu-west-1#/hyper-tuning-jobs)

 -----------------------------------------------------------------------------------------------------------------
 
 *FbProphet* is too verbose ! Below is its output. Don't be surprised if your notebook gets very slow during training: the prints are too costly ! I did not find a way to reduce the verbosity level; if you do, by all means, do share and update the code :D

In [11]:
estimator = Estimator(role=role,
                      train_instance_count=train_instance_count,
                      train_instance_type=train_instance_type,
                      image_name=image_name,
                      hyperparameters=hyperparameters,
                      security_group_ids=security_group_ids,
                      subnets=subnets
                      )

estimator.fit('s3://'+bucket+'/'+project_id)

2020-02-02 22:11:46 Starting - Starting the training job...
2020-02-02 22:11:47 Starting - Launching requested ML instances...
2020-02-02 22:12:42 Starting - Preparing the instances for training......
2020-02-02 22:13:21 Downloading - Downloading input data...
2020-02-02 22:13:52 Training - Downloading the training image.....
Run Env: preprod
Only Last: True
Content /opt/ml/input/data/training ['train_data_cutoff_201942', 'train_data_cutoff_202004', 'train_data_cutoff_201814', 'train_data_cutoff_201941', 'train_data_cutoff_201935', 'train_data_cutoff_201943', 'train_data_cutoff_201902', 'train_data_cutoff_201912', 'train_data_cutoff_201950', 'train_data_cutoff_201922', 'train_data_cutoff_201944', 'train_data_cutoff_201937', 'train_data_cutoff_201948', 'train_data_cutoff_201949', 'train_data_cutoff_201824', 'train_data_cutoff_201844', 'train_data_cutoff_201924', 'train_data_cutoff_201928', 'train_data_cutoff_201930', 'train_data_cutoff_201945', 'train_data_cutoff_2

 -----------------------------------------------------------------------------------------------------------------