# Hyperparameter optimization of a Keras RNN with the Watson Machine Learning Python client

This notebook continues working with the RNN developed in [Predicting Oil Prices Using an RNN in Watson Studio](https://github.com/djccarew/timeseries-rnn-lab-part1). It contains the steps and code to demonstrate support for deep learning experiments in the Watson Machine Learning service, based on the RNN developed previously. It introduces commands for getting data, persisting training definitions, and running hyperparameter optimization.

This notebook is based on the example notebook
[From keras experiment to scoring with watson-machine-learning-client](https://dataplatform.ibm.com/analytics/notebooks/v2/1c9801fc-5063-4564-a756-75e99be47cd0/view?access_token=d38aa735e323be34260be5fcf65813cea1f5f8a17a256e1d2f23796fdcd11a7d), which follows roughly the same steps with a model based on the MNIST handwritten digits dataset.

## 1. Set up the environment

Before running the code in this notebook, complete the following setup tasks:

i. Create a Watson Machine Learning Service instance and associate it with the Watson Studio project that contains this notebook. Information on how to do this is [here](https://github.com/djccarew/timeseries-rnn-lab-part2)

ii. Create credentials (including HMAC keys) for the Cloud Object Storage instance associated with the Watson Studio project that contains this notebook. Information on how to do this is [here](https://github.com/djccarew/timeseries-rnn-lab-part2)

iii. Copy the credentials to a text file so that they can be easily pasted into this notebook. Information on how to do this is [here](https://github.com/djccarew/timeseries-rnn-lab-part2)

### 1.1 Work with Cloud Object Storage (COS)

Import `ibm_boto3`, IBM's fork of the boto3 library, which lets Python code manage Cloud Object Storage.

In [None]:
# Some required imports
import ibm_boto3
from ibm_botocore.client import Config
import os
import json
import warnings
import time

Add your COS credentials.

Copy the credentials you saved to a text file during setup into the cell below. Note that the variable ```cos_credentials``` is a Python dictionary and should be defined with your credentials as follows:

```
cos_credentials = {
  "apikey": "___",
  "cos_hmac_keys": {
    "access_key_id": "___",
    "secret_access_key": "___"
  },
  "endpoints": "https://cos-service.bluemix.net/endpoints",
  "iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - crn:v1:bluemix:public:cloud-object-storage:global:a/d86af7367f70fba4f306d3c19c469d89:6244216d-4578-4ac4-a6d8-baca423111f9::",
  "iam_apikey_name": "auto-generated-apikey-5ed63735-bc55-4c4d-8cc2-8ac6b38f554d",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/d86af7367f70fba4f306d3c19c469d89::serviceid:ServiceId-2c690700-a604-4ef3-b11b-34966debc9b2",
  "resource_instance_id": "crn:v1:bluemix:public:cloud-object-storage:global:a/d86af7367f70fba4f306d3c19c469d89:6244216d-4578-4ac4-a6d8-baca423111f9::"
}

```

In [None]:
# Copy and paste your Cloud Object Storage credentials here
## Start COS credentials
cos_credentials = {
   
}
## End COS credentials

api_key = cos_credentials['apikey']
service_instance_id = cos_credentials['resource_instance_id']
auth_endpoint = 'https://iam.bluemix.net/oidc/token'
service_endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'

Initialize the Cloud Object Storage (COS) client

In [None]:
cos = ibm_boto3.resource('s3',
                         ibm_api_key_id=api_key,
                         ibm_service_instance_id=service_instance_id,
                         ibm_auth_endpoint=auth_endpoint,
                         config=Config(signature_version='oauth'),
                         endpoint_url=service_endpoint)

Create the buckets needed to store training data and training results. 

**Important:** Bucket names have to be globally unique. Replace `nnnn` in the bucket names below with the last 4 digits of your phone number or something else unique.
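
If you would rather not pick a suffix by hand, a small helper can append one and sanity-check the result. This is a minimal sketch, not part of the original lab: `unique_bucket_names` and `is_valid_bucket_name` are hypothetical names, and the naming rules encoded in the regular expression (3 to 63 characters; lowercase letters, digits, hyphens and dots; starting and ending with a letter or digit) are the common S3-style DNS-compliant rules, assumed here rather than taken from the IBM COS documentation. A random suffix makes a collision unlikely but does not guarantee global uniqueness.

```python
import re
import uuid

# Assumed rule set: S3-style, DNS-compliant bucket names (3-63 characters,
# lowercase letters, digits, hyphens and dots, starting and ending with a
# letter or digit). Check the IBM COS documentation for the authoritative rules.
BUCKET_NAME_RE = re.compile(r'^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$')


def is_valid_bucket_name(name):
    """Return True if name matches the assumed bucket naming rules."""
    return bool(BUCKET_NAME_RE.match(name)) and '..' not in name


def unique_bucket_names(prefixes, suffix=None):
    """Append the same suffix (a random 4-hex-digit one by default) to each prefix."""
    suffix = suffix or uuid.uuid4().hex[:4]
    names = ['{}-{}'.format(p, suffix) for p in prefixes]
    for name in names:
        if not is_valid_bucket_name(name):
            raise ValueError('Invalid bucket name: {}'.format(name))
    return names


print(unique_bucket_names(['oilprice-rnn-data', 'oilprice-rnn-results'], suffix='1234'))
# prints ['oilprice-rnn-data-1234', 'oilprice-rnn-results-1234']
```

Using one shared suffix keeps the data and results buckets visibly paired.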


In [None]:
from ibm_botocore.exceptions import ClientError

# Important: Bucket names have to be globally unique -
# replace nnnn in the bucket names below with the last 4 digits of your phone number or something else unique.
buckets = ['oilprice-rnn-data-nnnn', 'oilprice-rnn-results-nnnn']
for bucket in buckets:
    # Only create buckets that do not already exist in this COS instance
    if cos.Bucket(bucket) not in cos.buckets.all():
        print('Creating bucket "{}"...'.format(bucket))
        try:
            cos.create_bucket(Bucket=bucket)
        except ClientError as e:
            print('Error: {}.'.format(e.response['Error']['Message']))

The buckets should now exist. List them to confirm:

In [None]:
print(list(cos.buckets.all()))

### 1.2 Download the oil price data and upload it to COS
We will work with the weekly oil prices for West Texas Intermediate (WTI) crude. Let's download the training data and upload it to the ```oilprice-rnn-data-nnnn``` bucket.

Run the code in the cell below to create the ```OILPRICE_RNN_DATA``` folder and download the data file from the GitHub repository.


In [None]:
!pip install wget
import wget, os

link = 'https://raw.githubusercontent.com/ibm-ai-education/timeseries-rnn-lab-part1/master/data/WCOILWTICO.csv'

# Create the local data folder if it does not exist yet
data_dir = 'OILPRICE_RNN_DATA'
if not os.path.isdir(data_dir):
    os.mkdir(data_dir)

# Download the CSV file only if it is not already present
data_file_path = os.path.join(data_dir, link.split('/')[-1])
if not os.path.isfile(data_file_path):
    wget.download(link, out=data_dir)

!ls OILPRICE_RNN_DATA


Upload the data file to the Cloud Object Storage bucket you just created

In [None]:
bucket_name = buckets[0]
bucket_obj = cos.Bucket(bucket_name)

# upload_file reads the file from the local path itself, so no explicit open() is needed
for filename in os.listdir(data_dir):
    bucket_obj.upload_file(os.path.join(data_dir, filename), filename)
    print('{} is uploaded.'.format(filename))

Verify that the data file was uploaded to Cloud Object Storage

In [None]:
for obj in bucket_obj.objects.all():
    print('Object key: {}'.format(obj.key))
    print('Object size (kb): {}'.format(obj.size/1024))

We are done with Cloud Object Storage and are ready to train the model.

### 1.3 Work with the Watson Machine Learning instance

In [None]:
# Required imports
import urllib3, requests, json, base64, time, os
warnings.filterwarnings('ignore')

Authenticate to the Watson Machine Learning service on IBM Cloud.

**Note:** Copy the Watson Machine Learning service credentials you saved to a text file during setup into the cell below. Note that the variable ```wml_credentials``` is a Python dictionary and should be defined with your credentials as follows:

```
wml_credentials = {
  "url": "https://ibm-watson-ml.mybluemix.net",
  "username": "___",
  "password": "___",
  "instance_id": "___"
}
```

In [None]:
# Copy and paste your Watson Machine Learning service credentials here
## Start WML service credentials
wml_credentials = {
 
}
## End WML service credentials


**Install watson-machine-learning-client from PyPI**

In [None]:
!pip install --upgrade watson-machine-learning-client

**Import watson-machine-learning-client and authenticate to service instance**

In [None]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient
client = WatsonMachineLearningAPIClient(wml_credentials)
print(client.version)

## 2. Training definitions

For the purpose of this example, a single Keras model definition of the RNN has been prepared.

### 2.1 Save training definition
Prepare training definition metadata

In [None]:
model_definition_metadata = {
            client.repository.DefinitionMetaNames.NAME: "OILPRICE-RNN",
            client.repository.DefinitionMetaNames.FRAMEWORK_NAME: "tensorflow",
            client.repository.DefinitionMetaNames.FRAMEWORK_VERSION: "1.5",
            client.repository.DefinitionMetaNames.RUNTIME_NAME: "python",
            client.repository.DefinitionMetaNames.RUNTIME_VERSION: "3.6",
            client.repository.DefinitionMetaNames.EXECUTION_COMMAND: "python3 oilprice_rnn.py"
            }

**Get the model definition source (a Python Keras script for the RNN) and the saved model archive used for deployment in section 5 from GitHub**

In [None]:
code_filename = 'OILPRICERNN.zip'

if os.path.isfile(code_filename):
    !ls 'OILPRICERNN.zip'
else:
    !wget https://github.com/ibm-ai-education/timeseries-rnn-lab-part2/raw/master/model-source/OILPRICERNN.zip
    !ls 'OILPRICERNN.zip'
    
model_filename = 'oilprice_rnn.tgz'
if os.path.isfile(model_filename):
    !ls 'oilprice_rnn.tgz'
else:
    !wget https://github.com/ibm-ai-education/timeseries-rnn-lab-part2/raw/master/model-source/oilprice_rnn.tgz
    !ls 'oilprice_rnn.tgz'
    

**Publish training definition in Watson Machine Learning repository**

In [None]:
definition_details = client.repository.store_definition(code_filename, model_definition_metadata)

definition_url = client.repository.get_definition_url(definition_details)
definition_uid = client.repository.get_definition_uid(definition_details)
print(definition_url)

## 3. Experiment definition

### 3.1 Save experiment

**Experiment configuration dictionary**

Create an experiment that will train models based on the previously stored training definition.

TRAINING_DATA_REFERENCE - location of training data

In [None]:
TRAINING_DATA_REFERENCE = {
                            "connection": {
                                "endpoint_url": service_endpoint,
                                "access_key_id": cos_credentials['cos_hmac_keys']['access_key_id'],
                                "secret_access_key": cos_credentials['cos_hmac_keys']['secret_access_key']
                            },
                            "source": {
                                "bucket": buckets[0],
                            },
                            "type": "s3"
}



TRAINING_RESULTS_REFERENCE - location of training results


In [None]:
TRAINING_RESULTS_REFERENCE = {
                                "connection": {
                                    "endpoint_url": service_endpoint,
                                    "access_key_id": cos_credentials['cos_hmac_keys']['access_key_id'],
                                    "secret_access_key": cos_credentials['cos_hmac_keys']['secret_access_key']
                                },
                                "target": {
                                    "bucket": buckets[1],
                                },
                                "type": "s3"
}
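
The two reference dictionaries above differ only in the bucket and in whether it appears under `source` or `target`. A hypothetical helper (`make_s3_reference` is not part of the Watson Machine Learning client) can build either one from the same credentials, which avoids copying the connection block twice. This sketch only reproduces the dictionary layout used in this notebook.

```python
def make_s3_reference(role, bucket, endpoint_url, hmac_keys):
    """Build a COS/S3 data reference in the layout used by this notebook.

    role: 'source' for training data, 'target' for training results.
    hmac_keys: dict with 'access_key_id' and 'secret_access_key'
    (in the notebook, cos_credentials['cos_hmac_keys']).
    """
    if role not in ('source', 'target'):
        raise ValueError("role must be 'source' or 'target'")
    return {
        "connection": {
            "endpoint_url": endpoint_url,
            "access_key_id": hmac_keys['access_key_id'],
            "secret_access_key": hmac_keys['secret_access_key'],
        },
        role: {"bucket": bucket},
        "type": "s3",
    }


# Usage with placeholder keys; in the notebook you would pass buckets[0] / buckets[1]
# and cos_credentials['cos_hmac_keys'].
endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
keys = {'access_key_id': '___', 'secret_access_key': '___'}
data_ref = make_s3_reference('source', 'oilprice-rnn-data-nnnn', endpoint, keys)
results_ref = make_s3_reference('target', 'oilprice-rnn-results-nnnn', endpoint, keys)
print(sorted(data_ref), sorted(results_ref))
# prints ['connection', 'source', 'type'] ['connection', 'target', 'type']
```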

Configure the hyperparameter optimizer for your experiment. The objective is to find the combination of hyperparameters that minimizes the *val_loss* metric (i.e., the mean squared error on the test data set), so *val_loss* is set as the optimizer objective.
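
For reference, here is how the two metrics used in this section are computed: mean squared error (the quantity the text above identifies as *val_loss*) and mean absolute error (the `mae` evaluation metric requested in the experiment metadata later in this section). A plain-Python sketch with toy values:

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences (what val_loss reports, per the text above)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences (the 'mae' evaluation metric)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values, not real prices
actual = [1.0, 2.0]
predicted = [1.0, 4.0]
print(mean_squared_error(actual, predicted))   # 2.0
print(mean_absolute_error(actual, predicted))  # 1.0
```

Because the errors are squared, MSE penalizes a few large misses more heavily than MAE does.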

The two hyperparameters to be optimized are:

i. **dropout_rate** - the dropout rate for the Dropout layer in the model

ii. **prev_periods** - the number of weeks of data used to predict the next week's price. If set to 1, the input for the prediction for *week n* is the price for *week n-1*. If set to 2, the input for the prediction for *week n* is the prices for *week n-2* and *week n-1*, and so on.
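
To make the effect of `prev_periods` concrete, the sketch below turns a price series into (input, target) pairs as described in the list above. It is an illustration only: `make_windows` is a hypothetical helper, and the actual windowing code inside `oilprice_rnn.py` is not shown in this notebook and may differ.

```python
def make_windows(series, prev_periods):
    """Split a price series into (inputs, targets) pairs.

    Each input holds the prev_periods prices that precede the target week,
    oldest first, e.g. [week n-2, week n-1] -> week n when prev_periods=2.
    """
    if prev_periods < 1:
        raise ValueError('prev_periods must be >= 1')
    inputs, targets = [], []
    for n in range(prev_periods, len(series)):
        inputs.append(series[n - prev_periods:n])
        targets.append(series[n])
    return inputs, targets


# Toy weekly prices, not real data
prices = [10, 11, 12, 13]
print(make_windows(prices, 1))  # ([[10], [11], [12]], [11, 12, 13])
print(make_windows(prices, 2))  # ([[10, 11], [11, 12]], [12, 13])
```

Each extra week of history removes one training example from the start of the series.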

**num_optimizer_steps** tells the optimizer how many models to train, each with a different combination of hyperparameter values. Here 4 are used in the interest of time. Normally you would use 6, since there are 6 possible combinations: 3 dropout rates (0.1, 0.3, 0.5) times 2 values of prev_periods (1, 2).
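
The count of 6 follows from the ranges in the HPO configuration below: `dropout_rate` runs from 0.1 to 0.5 in steps of 0.2, giving 3 values, and `prev_periods` runs from 1 to 2 in steps of 1, giving 2 values. This sketch enumerates that grid and then draws 4 combinations at random to mimic `num_optimizer_steps = 4` with the `random` method; `grid_values` is a hypothetical helper, and how the service actually samples combinations is an assumption here.

```python
import itertools
import random


def grid_values(min_value, max_value, step):
    """Enumerate min_value, min_value + step, ..., max_value (inclusive)."""
    count = int(round((max_value - min_value) / step)) + 1
    # round() hides floating-point noise such as 0.30000000000000004
    return [round(min_value + i * step, 10) for i in range(count)]


dropout_rates = grid_values(0.1, 0.5, 0.2)    # [0.1, 0.3, 0.5]
prev_periods_values = grid_values(1, 2, 1)    # [1, 2]
combinations = list(itertools.product(dropout_rates, prev_periods_values))
print(len(combinations))  # 6

# A random search with 4 steps tries 4 of these 6 combinations
random.seed(0)
print(random.sample(combinations, 4))
```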

In [None]:
HPO = {
        "method": {
            "name": "random",
            "parameters": [
                client.experiments.HPOMethodParam("objective", "val_loss"),
                client.experiments.HPOMethodParam("maximize_or_minimize", "minimize"),
                client.experiments.HPOMethodParam("num_optimizer_steps", 4)
            ]
        },
        "hyper_parameters": [
            client.experiments.HPOParameter('dropout_rate', min=0.1, max=0.5, step=0.2),
            client.experiments.HPOParameter('prev_periods', min=1, max=2, step=1)
        ]
     }       
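
Each training run has to receive the hyperparameter values the optimizer picked for it. How `oilprice_rnn.py` does this is not shown in this notebook; the sketch below assumes the values arrive as a flat JSON object in a file named `config.json` in the run's working directory. Treat the file name, its layout and the defaults as assumptions, and check the script inside `OILPRICERNN.zip` for the actual mechanism. `load_hyperparameters` is a hypothetical helper.

```python
import json
import os
import tempfile

# Hypothetical defaults used when a value is missing
DEFAULTS = {'dropout_rate': 0.3, 'prev_periods': 1}


def load_hyperparameters(path='config.json', defaults=DEFAULTS):
    """Read hyperparameter values from a JSON file, falling back to defaults.

    The file name and its flat layout are assumptions, not a documented contract.
    """
    params = dict(defaults)
    if os.path.isfile(path):
        with open(path) as f:
            params.update(json.load(f))
    # Coerce to the types the model code needs
    params['dropout_rate'] = float(params['dropout_rate'])
    params['prev_periods'] = int(params['prev_periods'])
    return params


# Usage: write an illustrative file to a temporary directory and read it back
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, 'config.json')
    with open(sample_path, 'w') as f:
        json.dump({'dropout_rate': '0.5', 'prev_periods': 2}, f)
    params = load_hyperparameters(sample_path)
print(params)  # {'dropout_rate': 0.5, 'prev_periods': 2}
```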

Configure your experiment. The experiment metadata links the previously stored training definition and specifies the ```compute_configuration``` used to run the training.

In [None]:
experiment_metadata = {
            client.repository.ExperimentMetaNames.NAME: "Oil Price RNN Experiment",
            client.repository.ExperimentMetaNames.DESCRIPTION: "Best model for RNN oil price forecaster.",
            client.repository.ExperimentMetaNames.AUTHOR_EMAIL: "yourname@youremail.com",
            client.repository.ExperimentMetaNames.EVALUATION_METRICS: ["mae"],
            client.repository.ExperimentMetaNames.TRAINING_DATA_REFERENCE: TRAINING_DATA_REFERENCE,
            client.repository.ExperimentMetaNames.TRAINING_RESULTS_REFERENCE: TRAINING_RESULTS_REFERENCE,
            client.repository.ExperimentMetaNames.TRAINING_REFERENCES: [
                        {
                            "name": "OILPRICE_RNN",
                            "training_definition_url": definition_url,
                            "compute_configuration": {"name": "k80x2"},
                            "hyper_parameters_optimization": HPO
                            
                        }],
            }

**Store experiment in Watson Machine Learning repository**

In [None]:
experiment_details = client.repository.store_experiment(meta_props=experiment_metadata)

experiment_uid = client.repository.get_experiment_uid(experiment_details)
print(experiment_uid)

## 4. Run experiment

### 4.1 Running experiments

This kicks off the experiment asynchronously. Monitor its progress with the cells below to know when it has completed.

In [None]:
experiment_run_details = client.experiments.run(experiment_uid, asynchronous=True)
experiment_run_uid = client.experiments.get_run_uid(experiment_run_details)
print(experiment_run_uid)

**Note:** The training runs will take a few minutes. Now is a good time for a break. You can check on the status of your run by running the code cell below.
Keep running that cell periodically until all the training runs are in the COMPLETED state, as shown below:
```
--------------------  ------------  ---------  --------------------  --------------------  ...
GUID (training)       NAME          STATE      SUBMITTED             FINISHED              ...
training-vw7UqMZiR    OILPRICE_RNN  completed  2018-04-14T13:46:10Z  2018-04-14T13:53:47Z  ...
training-vw7UqMZiR_0  OILPRICE_RNN  completed  2018-04-14T13:47:22Z  -                     ...
                                                                                           ...
training-vw7UqMZiR_1  OILPRICE_RNN  completed  2018-04-14T13:47:22Z  -                     ...
                                                                                           ...
training-vw7UqMZiR_2  OILPRICE_RNN  completed  2018-04-14T13:47:22Z  -                     ...
                                                                                           ...
training-vw7UqMZiR_3  OILPRICE_RNN  completed  2018-04-14T13:47:22Z  -                     ...                                                                         
                                                                                           ...
--------------------  ------------  ---------  --------------------  --------------------  ...
```

In [None]:
# Keep running this cell periodically until all the training runs are in the COMPLETED state, as illustrated above
client.experiments.list_training_runs(experiment_run_uid)
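
Instead of re-running the cell above by hand, you could poll in a loop. The sketch below is a generic helper, not a Watson Machine Learning API: `wait_until` and `all_completed` are hypothetical names, and `check` is any callable you write that fetches the current training-run states (for example by inspecting `client.experiments.get_run_details(experiment_run_uid)`; which field holds the state is not shown in this notebook). The usage example feeds it fake states.

```python
import time


def wait_until(check, timeout_s=1800, interval_s=60):
    """Call check() every interval_s seconds until it returns True.

    Returns True if check() succeeded before timeout_s elapsed, else False.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def all_completed(states):
    """True when there is at least one state and every state is 'completed' (any case)."""
    return len(states) > 0 and all(s.lower() == 'completed' for s in states)


# Usage with fake states instead of a real service call
fake_states = iter([['running', 'pending'], ['completed', 'running'], ['completed', 'completed']])
print(wait_until(lambda: all_completed(next(fake_states)), timeout_s=5, interval_s=0))  # True
```

If a run fails, `all_completed` never returns True and the loop only ends at the timeout, so in practice you would also check for failure states.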

Once the experiment has completed, the next order of business is to find out which training run performed best and which hyperparameter values it used.

All that information is available via the ```client.experiments.get_run_details(...)``` call.

In [None]:
experiment_run_details = client.experiments.get_run_details(experiment_run_uid)
# print(json.dumps(experiment_run_details, indent=2))

### 4.2 Assessing the results

Rather than navigate through the reams of information about the experiment, let's put the data we're interested in into a pandas DataFrame so it's easier to work with. We'll get the results of each training run and the hyperparameter values used.

**Note:** In practice you could export this to a CSV file or stick it in a database so you can peruse it later at your leisure.

In [None]:
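import pandas as pd

rows_list = []
for m in experiment_run_details['entity']['training_statuses']:
    # Skip training runs that have not reported any metrics yet
    if len(m['metrics']) == 0:
        continue
    # Use the metrics reported last (final epoch)
    last_metric = m['metrics'][-1]

    # Hyperparameter values used for this training run
    dropout_rate, prev_periods = None, None
    for h in m['hyper_parameters']:
        if h['name'] == 'dropout_rate':
            dropout_rate = h['double_value']
        elif h['name'] == 'prev_periods':
            prev_periods = h['int_value']

    # Collect the reported metric values by name
    values = {v['name']: v['value'] for v in last_metric['values']}
    # Prefer val_loss (the HPO objective); fall back to loss if that is all that was reported
    val_loss = values.get('val_loss', values.get('loss'))
    # Any other reported metric is treated as the mean absolute error
    other = [value for name, value in values.items() if name not in ('loss', 'val_loss')]
    val_mae = other[-1] if other else None

    rows_list.append([m['training_guid'], last_metric['phase'], val_mae, val_loss, dropout_rate, prev_periods])

metrics_df = pd.DataFrame(rows_list, columns=['GUID', 'PHASE', 'MAE', 'VAL LOSS', 'DROPOUT', 'PREV PERIODS'])
metrics_df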

And the winner is...

Look for the run that had the lowest validation loss (i.e., mean squared error) on the test data.

In [None]:
best_run_df = metrics_df.nsmallest(1, 'VAL LOSS')
best_run_df
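
The same selection can be sketched in plain Python directly on the run details, which is handy if you want only the winning run's GUID. `best_training_run` is a hypothetical helper; the nested dictionary below mirrors only the keys that the DataFrame cell above reads, and its GUIDs and numbers are made up. Unlike that cell, this sketch looks strictly for `val_loss` and skips runs that did not report it.

```python
def best_training_run(run_details, objective='val_loss'):
    """Return (training_guid, objective value) of the run with the lowest objective.

    Runs without metrics, or whose last metrics lack the objective, are skipped.
    Returns None if no run qualifies.
    """
    best = None
    for status in run_details['entity']['training_statuses']:
        if not status['metrics']:
            continue
        values = {v['name']: v['value'] for v in status['metrics'][-1]['values']}
        if objective not in values:
            continue
        if best is None or values[objective] < best[1]:
            best = (status['training_guid'], values[objective])
    return best


# Toy structure with made-up GUIDs and numbers, for illustration only
toy_details = {'entity': {'training_statuses': [
    {'training_guid': 'run_0', 'metrics': [{'values': [{'name': 'val_loss', 'value': 0.004}]}]},
    {'training_guid': 'run_1', 'metrics': [{'values': [{'name': 'val_loss', 'value': 0.002}]}]},
    {'training_guid': 'run_2', 'metrics': []},
]}}
print(best_training_run(toy_details))  # ('run_1', 0.002)
```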

## 5. Create online deployment

You can deploy the model as a web service (online) using the Watson Machine Learning service API

### 5.1 Store trained model

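Save the model in the Watson Machine Learning repository. Note that the model stored here is the saved model archive ```oilprice_rnn.tgz``` downloaded in section 2.1, not one of the models trained by the experiment above.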

In [None]:
metadata = {
    client.repository.ModelMetaNames.NAME: "Oil Price RNN Model",
    client.repository.ModelMetaNames.FRAMEWORK_NAME: "tensorflow",
    client.repository.ModelMetaNames.FRAMEWORK_VERSION: "1.11",
    client.repository.ModelMetaNames.FRAMEWORK_LIBRARIES: [{'name':'keras', 'version': '2.2.4'}]
}
published_model = client.repository.store_model( model=model_filename, meta_props=metadata )

### 5.2 Create online deployment from stored model

In [None]:
# Deploy the stored model as an online web service deployment
published_model_uid = client.repository.get_model_uid(published_model)
deployment_details = client.deployments.create(published_model_uid, name="Oil Price RNN Deployment" )

Extract the scoring endpoint from the deployment details

In [None]:
scoring_url = client.deployments.get_scoring_url(deployment_details)
print(scoring_url)

## 6. Scoring

Prepare sample scoring data to score the deployed model

We'll use the last week(s) of data in the data set as input to predict the price for the week of 4/6/2018 (the last week covered by the data set is 3/30/2018).

The input node of our model expects data with shape (1, n), where n is the value of the hyperparameter prev_periods, so the data to be scored must be shaped accordingly. The scoring code below assumes the deployed model was trained with prev_periods = 1.
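
For a model trained with a different `prev_periods`, the payload has to carry that many scaled prices, oldest first. The sketch below builds the same `{'values': [[...]]}` layout that the scoring cell below produces for `prev_periods = 1`, generalized to any window length. `build_scoring_payload` is a hypothetical helper, and whether the deployment accepts this layout for `prev_periods > 1` is an assumption to check against the model's actual input shape.

```python
def build_scoring_payload(scaled_prices, prev_periods):
    """Wrap the last prev_periods scaled prices in the notebook's payload layout.

    scaled_prices: flat list of scaled prices, oldest first.
    Result: {'values': [[[p_(n-prev_periods), ..., p_(n-1)]]]}, i.e. one
    record holding an array of shape (1, prev_periods).
    """
    if not 1 <= prev_periods <= len(scaled_prices):
        raise ValueError('prev_periods must be between 1 and len(scaled_prices)')
    window = [float(p) for p in scaled_prices[-prev_periods:]]
    return {'values': [[window]]}


# Toy scaled values, not real data
print(build_scoring_payload([0.2, 0.4, 0.6], 1))  # {'values': [[[0.6]]]}
print(build_scoring_payload([0.2, 0.4, 0.6], 2))  # {'values': [[[0.4, 0.6]]]}
```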

Read in original price data and scale it between 0 and 1

In [None]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
prices_df = pd.read_csv(data_file_path, index_col='DATE')

# Create a scaled version of the data with oil prices normalized between 0 and 1
values = prices_df['WCOILWTICO'].values.reshape(-1,1)
values = values.astype('float32')
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)
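
`MinMaxScaler(feature_range=(0, 1))` learns the minimum and maximum of the data and maps each value x to (x - min) / (max - min); `inverse_transform` applies x_scaled * (max - min) + min. The plain-Python sketch below spells out both directions with toy numbers (the helper names are hypothetical). Scaling the scoring input this way only matches the model if the training script scaled its data with the same minimum and maximum, which this notebook assumes but does not show.

```python
def fit_min_max(values):
    """Return (min, max) of the data, the parameters a min-max scaler learns."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError('cannot scale constant data')
    return lo, hi


def scale(x, lo, hi):
    """Map x into [0, 1]: (x - min) / (max - min)."""
    return (x - lo) / (hi - lo)


def unscale(x_scaled, lo, hi):
    """Inverse mapping back to the original units: x_scaled * (max - min) + min."""
    return x_scaled * (hi - lo) + lo


# Toy prices in USD, not real data
prices = [10.0, 20.0, 30.0]
lo, hi = fit_min_max(prices)
print([scale(p, lo, hi) for p in prices])  # [0.0, 0.5, 1.0]
print(unscale(0.5, lo, hi))                # 20.0
```

A predicted scaled value above 1 maps linearly to a price above the historical maximum, so the inverse transform extrapolates rather than clipping.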

Get the last week(s) of available data to predict the following week

In [None]:
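# prev_periods must match the value the deployed model was trained with (assumed to be 1 here)
prev_periods = 1
# Grab the last prev_periods weeks of scaled data and reshape into the shape expected by the model
scaled_last_prices = scaled[-prev_periods:, :]
scaled_last_prices = np.reshape(scaled_last_prices, (1, prev_periods))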


Prepare scoring payload and score.

In [None]:
scoring_data = {'values': [scaled_last_prices.tolist()]}
predictions = client.deployments.score(scoring_url, scoring_data)
scaled_prediction = predictions['values'][0][0][0]
print("Scaled prediction: " + str(scaled_prediction))

Convert scaled prediction to dollars

In [None]:
# Transform scaled prediction back to a USD price
next_price_inverse = scaler.inverse_transform([[scaled_prediction]])
print("Prediction in USD: " + str(next_price_inverse[0][0]))


## 7. Summary and next steps

You successfully completed this notebook! You learned how to use the watson-machine-learning-client to run experiments. Check out the [Online Documentation](https://dataplatform.ibm.com/docs/content/analyze-data/wml-setup.html) for more samples, tutorials, documentation, how-tos, and blog posts.

Copyright © 2017, 2018 IBM. This notebook and its source code are released under the terms of the MIT License