# Deep Learning Experiments using IBM Watson DLaaS
In this notebook, we illustrate how to run deep learning experiments using Deep Learning as a Service (DLaaS) capabilities in Watson Machine Learning.

In order to leverage IBM DLaaS, you will need to have two IBM Cloud services in addition to Watson Studio where you're running this notebook:
- **Cloud Object Storage (COS)**, which serves as storage for the training data as well as training results and logging/monitoring data. In fact, COS is needed for all Watson Studio projects.
- **Watson Machine Learning (WML)**, which handles storing the training and experiment information, executing the training runs and experiments, and deploying trained models.

Training a deep learning neural network involves the following steps:
1. Set up COS by defining buckets for reading the training data and buckets for writing the training results.
2. Create one or more training definitions, each of which outlines the neural network (NN) architecture and references the COS bucket containing the input training data and the COS bucket for writing training results.
3. **Optional** Execute a training run based on the training definition from step 2. This step only validates that the information is set up correctly for training. In practice, data scientists run experiments with hyperparameter optimization (HPO), which consist of multiple training definitions and multiple parameters to optimize.
4. Create an experiment definition that includes the training definition from step 2, the COS bucket with the input training data, the COS bucket for the training results, and the optimization configuration, such as the optimization algorithm, the parameters to vary, and the metrics to consider when comparing alternatives.
5. Monitor and visualize the experiment results, a critical step in understanding the performance of your models.

In the rest of this notebook, we will go through these steps and discuss the details for each step.

1. [IBM Cloud Object Storage](#cos)
2. [Watson Machine Learning Setup](#wml)
3. [Training Definition](#trainingdef)
4. [Training Run](#training_run)
5. [Experiment Definition](#exp)
6. [Summary](#summary)


## DLaaS Prerequisites
Please review the [DLaaS Prerequisites](https://github.com/joe4k/deeplearningutils/blob/master/DLaaS_Prerequisites.pdf) tutorial, which explains how to set up your IBM Cloud account and create the required IBM Cloud services so that you can run deep learning training experiments using IBM Watson DLaaS capabilities.



<a id="cos"></a>
## 1. IBM Cloud Object Storage
In this section, we explain how to work with Cloud Object Storage (COS) for purposes of running deep learning training experiments.

Please make sure you have gone through the [DLaaS Prerequisites](https://github.com/joe4k/deeplearningutils/blob/master/DLaaS_Prerequisites.pdf) tutorial before proceeding. Once you have completed those steps, you should have a Cloud Object Storage (COS) instance with the associated credentials.

### 1.1 COS Credentials
In the next cell, you need to specify the Cloud Object Storage instance credentials. The DLaaS Prerequisites tutorial explains how to get those credentials.

The following link also offers instructions for creating the credentials for your Cloud Object Storage instance:
https://github.com/biosopher/unofficial-watson-studio-python-utils/wiki/Save-COS-Credentials-to-cos_credentials.json

**Note** Make sure you use the {"HMAC":true} parameter when creating the credentials.

The COS credentials should look as follows:
```
{
  "apikey": "********************",
  "cos_hmac_keys": {
    "access_key_id": "*************************",
    "secret_access_key": "*************************"
  },
  "endpoints": "https://cos-service.bluemix.net/endpoints",
  "iam_apikey_description": "***********************",
  "iam_apikey_name": "*****************************",
  "iam_role_crn": "***************************",
  "iam_serviceid_crn": "****************************",
  "resource_instance_id": "********************************"
}
```

Additionally, you need to specify the service endpoint for your COS instance. To get that endpoint:

- Navigate to your COS instance on your IBM Cloud account
- Click on the Endpoint link in the left navigation column
- Copy the public endpoint corresponding to your COS location. If your location is us-geo, then select the public endpoint for us-geo.

The service endpoint would look as follows: 'https://s3-api.us-geo.objectstorage.softlayer.net'
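Before creating the COS client, it can help to confirm that the credentials dictionary actually contains the HMAC keys and that the endpoint is a full URL. The following is a minimal sketch under stated assumptions: `find_missing_cos_fields` is a hypothetical helper written for this notebook (it is not part of boto3 or any IBM SDK), and it only checks for the fields shown in the example credentials above.

```python
# Hypothetical helper: the name and the return shape are illustrative only.
def find_missing_cos_fields(credentials, endpoint):
    """Return a list of problems; an empty list means the basic fields are present."""
    problems = []
    hmac_keys = credentials.get("cos_hmac_keys")
    if not isinstance(hmac_keys, dict):
        # In the example above, this block appears when the credential
        # was created with the {"HMAC":true} parameter.
        problems.append("cos_hmac_keys is missing (was HMAC enabled when creating the credential?)")
    else:
        for key in ("access_key_id", "secret_access_key"):
            if not hmac_keys.get(key):
                problems.append("cos_hmac_keys.%s is empty" % key)
    if not str(endpoint).startswith("https://"):
        problems.append("service endpoint should be a full https:// URL")
    return problems
```

Calling `find_missing_cos_fields(cos_credentials, service_endpoint)` after filling in the next cell should return an empty list if both values have the expected shape; it does not verify that the keys are actually valid.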

In [None]:
# replace with your credentials
cos_credentials = {
  "apikey": "**************",
  "cos_hmac_keys": {
    "access_key_id": "**************",
    "secret_access_key": "**************"
  },
  "endpoints": "https://cos-service.bluemix.net/endpoints",
  "iam_apikey_description": "**************",
  "iam_apikey_name": "**************",
  "iam_role_crn": "**************",
  "iam_serviceid_crn": "**************",
  "resource_instance_id": "**************"
}
service_endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'

In [None]:
# The code was removed by Watson Studio for sharing.

In [None]:
# Load Python package to simplify working with COS
import boto3

In [None]:
# Define a client for your COS instance based on the credentials
cos_client = boto3.client('s3', 
                          endpoint_url = service_endpoint, 
                          aws_access_key_id=cos_credentials["cos_hmac_keys"]["access_key_id"], 
                          aws_secret_access_key=cos_credentials["cos_hmac_keys"]["secret_access_key"])

### 1.2 COS Utilities
In the next cell, we define multiple utilities that are useful when working with Cloud Object Storage.

- **get_all_buckets** returns all the buckets created in your COS instance.
- **get_objects_in_bucket** returns all the objects in a specific bucket in your COS instance.
- **create_unique_bucket** creates a new bucket in your COS instance.
- **upload_file_to_bucket** uploads a file from the local notebook environment to a bucket in your COS instance.
- **download_file_from_bucket** downloads a file from a bucket in your COS instance.
- **download_file_from_url** downloads a file from a given URL to the local notebook environment.
- **remove_all_files_from_dir** and **remove_file_from_dir** remove all files, or one named file, from a local directory; they are mainly used to clean up files that are no longer needed.

If the training data is provided via a URL, you can use download_file_from_url and upload_file_to_bucket to get the data into your COS bucket.

If the training data is provided in another COS bucket, you can use download_file_from_bucket and upload_file_to_bucket to copy it to your own COS bucket. It may be simpler to use the data directly from the bucket where it already resides instead of copying it.

In [None]:
# Load some required Python packages
import random
import string
import os
import urllib.request  # urlretrieve lives in the urllib.request submodule

# Return all buckets in your COS instance
def get_all_buckets(cos_client):
    response = cos_client.list_buckets()
    allbuckets = []
    for bucket in response['Buckets']:
        allbuckets.append(bucket['Name'])
    return allbuckets

# Return all the objects in a COS bucket
def get_objects_in_bucket(cos_client,bucket_name):
    return cos_client.list_objects(Bucket=bucket_name)

# Create a unique COS bucket
def create_unique_bucket(cos_client, bucket_prefix):
    # Create a random 10-character string;
    # the random suffix makes it more likely that the bucket name is unique
    lst = [random.choice(string.ascii_letters + string.digits) for n in range(10)]
    random_string = "".join(lst).lower()
    bucket = "%s-%s" % (bucket_prefix, random_string)
    
    #print("creating bucket: ", bucket)
    cos_client.create_bucket(Bucket=bucket)
    print("Bucket %s created" % bucket)
    return bucket

# Upload objects to COS bucket
def upload_file_to_bucket(cos_client,file,bucket):
    file_name = os.path.basename(file)
    print("Uploading %s to bucket: %s" % (file_name,bucket))
    cos_client.upload_file(file, bucket, file_name)

# Download objects from COS bucket
def download_file_from_bucket(cos_client, bucket, file_to_download, save_path, is_redownload=False):
    if not os.path.exists(save_path) or is_redownload:
        print("Downloading %s" % file_to_download)
        try:
            # The with block closes the file even if the download fails
            with open(save_path, 'wb') as file:
                cos_client.download_fileobj(bucket, file_to_download, file)
        except Exception as e:
            # botocore's ClientError carries the service response; other exceptions may not
            response = getattr(e, 'response', None)
            if response is not None:
                print("Detailed error: ", response)
            print('An error occurred downloading %s from %s' % (file_to_download, bucket))
            # Remove the partial file so that a later call retries the download
            if os.path.exists(save_path):
                os.remove(save_path)

# Download objects from a URL 
def download_file_from_url(file_url,save_directory=None):
    # If save directory provided then don't delete local downloads
    working_directory = "temp_cos_files"
    if save_directory is not None:
        working_directory = save_directory
    os.makedirs(working_directory, exist_ok=True)

    file_name = os.path.basename(file_url)
    # Sometimes the URL includes query parameters; split those off to get a valid file_name
    file_name = file_name.split('?')[0]
    # Delete the file if present, since an earlier download may have failed and left it corrupted
    file_path = os.path.join(working_directory, file_name)
    if os.path.exists(file_path):
        os.remove(file_path)

    file_path, _ = urllib.request.urlretrieve(file_url, file_path)
    stat_info = os.stat(file_path)
    print('Downloaded', file_path, stat_info.st_size, 'bytes.')
    
    
# Remove all files from the specified directory in the local environment
def remove_all_files_from_dir(dir):
    for f in os.listdir(dir):
        file_path = os.path.join(dir, f)
        if os.path.exists(file_path):
            os.remove(file_path)
            
# Remove a specific file from a given dir
def remove_file_from_dir(filename, dir):
    file_path = os.path.join(dir, filename)
    if os.path.exists(file_path):
        os.remove(file_path)
            

In [None]:
# List all buckets in your COS instance
buckets = get_all_buckets(cos_client)
print(buckets)

**Optional** 
Start of Optional section

Executing next cell is optional depending on whether buckets are already created or not.

If COS buckets exist with the input training data and output training results, then skip next cell.

Otherwise, run the next cell to create one COS bucket to store the input training data and another COS bucket to write the output training results.
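To make the naming scheme concrete, here is a minimal sketch of how a prefixed, randomized bucket name can be built and sanity-checked before any call to COS. Both functions are hypothetical helpers, not part of boto3. The validity rule is an assumption based on common S3-style naming constraints (3 to 63 characters, lowercase letters, digits, and hyphens, starting and ending with a letter or digit) and is deliberately conservative; consult the COS documentation for the authoritative rules.

```python
import random
import re
import string

def make_bucket_name(prefix, suffix_length=10, rng=random):
    """Build '<prefix>-<random suffix>' from lowercase letters and digits."""
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(rng.choice(alphabet) for _ in range(suffix_length))
    return "%s-%s" % (prefix, suffix)

# Assumed, conservative S3-style rule (see the lead-in above).
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")

def looks_like_valid_bucket_name(name):
    """Return True if the name satisfies the assumed naming rule."""
    return bool(_BUCKET_NAME_RE.match(name))
```

This variant draws directly from lowercase characters rather than lowercasing a mixed-case string, which keeps the suffix uniformly distributed; the resulting names have the same shape as those produced by create_unique_bucket.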

In [None]:
# Create two COS buckets
# One bucket to store the input training data
mnist_training_data_bucket_prefix = 'mnist-training-data'
mnist_training_data_bucket = create_unique_bucket(cos_client,mnist_training_data_bucket_prefix)
# One bucket to write the output training results
mnist_training_results_bucket_prefix = 'mnist-training-results'
mnist_training_results_bucket = create_unique_bucket(cos_client,mnist_training_results_bucket_prefix)

# Note that the create bucket method appends a random 10 character string so the bucket name is more likely to be unique


# Download training data from the following URLs and upload to COS bucket
data_links = ['http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
              'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
              'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
              'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz']

bucketName = mnist_training_data_bucket
working_dir = "mnist_files"
allfiles = []
for file_url in data_links:
    file_name = os.path.basename(file_url)
    # list.append returns None, so do not reassign allfiles
    allfiles.append(file_name)
    print("file url: %s " % file_url)
    print("file name: %s " % file_name)
    download_file_from_url(file_url,working_dir)
    file_path = os.path.join(working_dir, file_name)
    upload_file_to_bucket(cos_client,file_path,bucketName)

for f in allfiles:
    remove_file_from_dir(f, working_dir)

**Optional** End of Optional section

At this point, you should have two COS buckets, one for the input training data and one for the output training results.

The input training data bucket should also have the training data you'd like to use for your deep learning experiments.

In this specific notebook, we're using MNIST data but you could use your own data sets.

In the next cell, make sure you specify the correct names for the buckets.

In [None]:
# Specify the buckets for training data and training results
mnist_training_data_bucket = 'mnist-training-data-hxjh7iohms'
mnist_training_results_bucket = 'mnist-training-results-d4tklbfief'

In [None]:
# List all objects in the training data bucket to verify the required training files are in the bucket
objects = get_objects_in_bucket(cos_client,mnist_training_data_bucket)
contents = objects['Contents']
for c in contents:
    print('file: %s ' % c['Key'])

<a id="wml"></a>
## 2. Watson Machine Learning (WML) Setup
In this section, we set up access to WML, which we'll use to set up the deep learning experiments.

The Watson Machine Learning credentials can be obtained from your IBM Cloud account by finding the specific WML service instance and clicking Service credentials in the left navigation column.

For more details on creating a Watson Machine Learning service and getting the credentials, please consult the [DLaaS Prerequisites](https://github.com/joe4k/deeplearningutils/blob/master/DLaaS_Prerequisites.pdf) tutorial.

WML credentials look as follows:

```
wml_credentials = {
  "apikey": "************",
  "iam_apikey_description": "************",
  "iam_apikey_name": "************",
  "iam_role_crn": "************",
  "iam_serviceid_crn": "************",
  "instance_id": "************",
  "password": "************",
  "url": "************",
  "username": "************"
}
```

In [None]:
# Replace with your Watson Machine Learning credentials
wml_credentials = {
  "apikey": "**********",
  "iam_apikey_description": "**********",
  "iam_apikey_name": "**********",
  "iam_role_crn": "**********",
  "iam_serviceid_crn": "**********",
  "instance_id": "**********",
  "password": "**********",
  "url": "**********",
  "username": "**********"
}

In [None]:
# The code was removed by Watson Studio for sharing.

In [None]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

In [None]:
# Create WML client to point to your WML service instance
client = WatsonMachineLearningAPIClient(wml_credentials)
# Display the client version number.
print(client.version)

<a id="trainingdef"></a>
## 3. Training Definition
Deep learning scientists typically run many experiments to optimize various parameters of their deep learning models.

As described in [Watson Machine Learning Documentation](https://dataplatform.ibm.com/docs/content/analyze-data/ml_dlaas_working_with_experiments.html?audience=wdp&context=analytics), an experiment is a logical grouping of one or more training definitions. When an experiment is run, it creates training runs for each training definition that is part of the experiment.

Training definitions are the organizing principle for using deep learning functions in IBM Watson Machine Learning. A typical scenario might consist of dozens to hundreds of training definitions. Each training definition is defined individually and consists of the following parts: the neural network defined by using one of the supported deep learning frameworks and location of the IBM Cloud Object Storage that contains your data set. 

For more details on training definitions, consult the documentation:
https://dataplatform.cloud.ibm.com/docs/content/analyze-data/ml_dlaas_working_with_training_definitions.html

In the next section, we will define the training definitions to run. Note that there are some general parameters like model name, description, author, and runtime.

Note the **EXECUTION_COMMAND** which effectively specifies what exactly the model consists of. In this case, the actual neural network being trained is a convolutional neural network (CNN) modeled in the **convolutional_network.py** python script. Note that so far, we've referenced the script but didn't provide it yet.
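To illustrate how a training script can consume the flags that appear in this notebook's EXECUTION_COMMAND, here is a minimal sketch using argparse. It is not the actual contents of convolutional_network.py; the function name `parse_training_args` and the defaults are assumptions chosen to mirror the flag names and values used in the command.

```python
import argparse

def parse_training_args(argv=None):
    """Parse the command-line flags named in the EXECUTION_COMMAND (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="MNIST training arguments (sketch)")
    for flag in ("--trainImagesFile", "--trainLabelsFile",
                 "--testImagesFile", "--testLabelsFile"):
        parser.add_argument(flag, required=True, help="path to a gzipped MNIST file")
    # Defaults mirror the values passed in the example command
    parser.add_argument("--learningRate", type=float, default=0.001)
    parser.add_argument("--trainingIters", type=int, default=20000)
    return parser.parse_args(argv)
```

Passing `argv=None` makes argparse read `sys.argv`, which is what happens when the service runs the command; passing an explicit list is convenient for trying the parser inside the notebook.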

That script could be available via a web url and in that case, we can use wget or urllib to download it. Alternatively, the script could be available in Cloud Object Storage and in that case, we need to download it from COS.

The model should be represented as a zip file that consists of the python script and any other dependencies. For the current example, the model consists of a zip file that includes 2 python files, one is **convolutional_network.py** and the other is **input_data.py** which handles reading in the MNIST image data.
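If you are packaging your own scripts rather than downloading a prepared archive, the zip file can be built with Python's standard zipfile module. The sketch below is one way to do it; `build_model_zip` is a hypothetical helper, and placing the scripts at the top level of the archive is an assumption based on the execution command invoking the script by its bare file name.

```python
import os
import zipfile

def build_model_zip(zip_path, source_files):
    """Write source_files into zip_path at the archive root and return the stored names."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in source_files:
            # arcname drops any directory prefix so the scripts sit at the top level
            archive.write(path, arcname=os.path.basename(path))
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

For the example in this notebook, the call would take the paths to convolutional_network.py and input_data.py; the returned list lets you confirm what ended up in the archive.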

In [None]:
model_definition_1_metadata = {
            client.repository.DefinitionMetaNames.NAME: "mnist_tfmodel_hpo",
            client.repository.DefinitionMetaNames.DESCRIPTION: "tfmodel_description",
            client.repository.DefinitionMetaNames.AUTHOR_NAME: "Joe Kozhaya",
            client.repository.DefinitionMetaNames.AUTHOR_EMAIL: "kozhaya@us.ibm.com",
            client.repository.DefinitionMetaNames.FRAMEWORK_NAME: "tensorflow",
            client.repository.DefinitionMetaNames.FRAMEWORK_VERSION: "1.5",
            client.repository.DefinitionMetaNames.RUNTIME_NAME: "python",
            client.repository.DefinitionMetaNames.RUNTIME_VERSION: "3.5",
            client.repository.DefinitionMetaNames.EXECUTION_COMMAND: "python3 convolutional_network.py --trainImagesFile ${DATA_DIR}/train-images-idx3-ubyte.gz --trainLabelsFile ${DATA_DIR}/train-labels-idx1-ubyte.gz --testImagesFile ${DATA_DIR}/t10k-images-idx3-ubyte.gz --testLabelsFile ${DATA_DIR}/t10k-labels-idx1-ubyte.gz --learningRate 0.001 --trainingIters 20000"
            }

Download the model files into the local runtime. There are two common locations for the model files:
- On the web like in a github repo for example. In that case, we can download the model files using wget or urllib
- In Cloud Object Storage. In that case, we'll download the model files using COS client.

For the case where model files are local on a user's machine, then they can be uploaded to your COS instance either using COS console or via Watson Studio's Data integration.

In what follows, we illustrate the code for both scenarios.

In [None]:
# Download the model files from the web in a github repository
filename = 'tf-model-hpo.zip'
# Remove file from local dir to download most recent version
file_path = os.path.join('.', filename)
if os.path.exists(file_path):
    remove_file_from_dir(filename,'.')
file_url = 'https://github.com/pmservice/wml-sample-models/blob/master/tensorflow/hand-written-digit-recognition/definition/tf-model-hpo.zip?raw=true'
download_file_from_url(file_url,'.')

In [None]:
# Download the model files from COS
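bucket_project = 'dltutorial-donotdelete-pr-vaqsdvot2bm0jp'
file_to_download = 'tf-model.zip'
# Save the file in the current directory under the same name
save_path = os.path.join('.', file_to_download)
download_file_from_bucket(cos_client, bucket_project, file_to_download, save_path, is_redownload=False)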

In [None]:
# Verify the model files are downloaded to local environment
file_path = os.path.join('.', filename)
if os.path.exists(file_path):
    print(file_path)

#### 3.1 Model Definition in WML
In the next cell, we show how to store a model definition in Watson Machine Learning.

Specifically, we pass two parameters:
1. **filename** which is a zip file that contains the required Python scripts which represent the neural network model as well as any other dependency scripts. For example, it is common to include another Python script for parsing and manipulating input data sets. 
2. **model_metadata** which defines several parameters like the deep learning framework, version,  runtime, and the execution command.

In this example notebook, we've specified the filename as **tf-model-hpo.zip**, which is downloaded from a GitHub repository, and we defined the model metadata earlier, **model_definition_1_metadata**, to specify how to execute the scripts contained in the **filename**.

In [None]:
definition_1_details = client.repository.store_definition(filename, model_definition_1_metadata)

definition_1_url = client.repository.get_definition_url(definition_1_details)
definition_1_uid = client.repository.get_definition_uid(definition_1_details)
print(definition_1_url)

In this case we defined and stored only one model definition. However, it is possible to define multiple model definitions, where different definitions could correspond to different neural network architectures or to different parameters for the same NN architecture.

For example, you can define one model to apply a fully connected neural network and another model to apply a 2 layer CNN and a third model to apply a 4 layer CNN and so on. Alternatively, you can have the same NN architecture, like a 4 layer CNN, but one definition may apply one dropout value and the other definition applies a different dropout value.

Display list of stored model definitions.

In [None]:
client.repository.list_definitions()

<a id="training_run"></a>
## 4. Training Run
Now that we have a training definition for the model(s), we're ready to define the training run. To do so, we need to define where to find the input training data and where to write the output training results.

### 4.1 Training Run Config Parameters
Get a list of supported config parameters for training runs.

In [None]:
client.training.ConfigurationMetaNames.show()

For every training run, you need to specify where to get the input training data and where to write the training results. The input data should typically be stored in a Cloud Object Storage bucket with read permissions and the training results are typically written to a Cloud Object Storage bucket with read/write permissions.

Input training data information is specified with the __TRAINING_DATA_REFERENCE__ parameter which maps to __DATA_DIR__.

The location for output training results is specified with the __TRAINING_RESULTS_REFERENCE__ parameter, which maps to __RESULTS_DIR__.
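On the training script side, these two locations are typically read from the environment, as the `${DATA_DIR}` references in the execution command suggest. The following is a minimal sketch under that assumption. The helper names are hypothetical, and the variable names DATA_DIR and RESULTS_DIR are taken from the text above; check the service documentation for the exact spelling in your runtime.

```python
import os

def resolve_training_dirs(environ=os.environ):
    """Return (data_dir, results_dir), falling back to the current directory."""
    # Variable names follow the text above; treat them as assumptions.
    data_dir = environ.get("DATA_DIR", ".")
    results_dir = environ.get("RESULTS_DIR", ".")
    return data_dir, results_dir

def data_file(name, environ=os.environ):
    """Build the path of an input file inside the data directory."""
    data_dir, _ = resolve_training_dirs(environ)
    return os.path.join(data_dir, name)
```

The fallback to the current directory makes the same script usable for a quick local test, where neither variable is set.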

In [None]:
# Specify the parameters to connect to COS bucket containing the input training data
TRAINING_DATA_REFERENCE = {
    "connection": {
                    "endpoint_url": service_endpoint,
                    "aws_access_key_id": cos_credentials['cos_hmac_keys']['access_key_id'],
                    "aws_secret_access_key": cos_credentials['cos_hmac_keys']['secret_access_key']
    },
    "source": {
                "bucket": mnist_training_data_bucket,
    },
    "type": "s3"
}

In [None]:
# Specify the parameters to connect to COS bucket where to write training results
TRAINING_RESULTS_REFERENCE = {
    "connection": {
                    "endpoint_url": service_endpoint,
                    "aws_access_key_id": cos_credentials['cos_hmac_keys']['access_key_id'],
                    "aws_secret_access_key": cos_credentials['cos_hmac_keys']['secret_access_key']
    },
    "target": {
                "bucket": mnist_training_results_bucket,
    },
    "type": "s3"
}


<a id="exp"></a>
## 5. Experiment Definition
Practically, data scientists would run **experiments** where each experiment consists of one or more training runs. 

Typically, running an experiment involves varying one or more parameters (like learning rate, convolution filter size, regularization parameter, weight initialization, ...) and recording the metric(s) of interest for the variety of these parameters.

### 5.1 Experiment Parameters
Get a list of supported config parameters for Experiments.

In [None]:
client.repository.ExperimentMetaNames.show()

For every experiment, you need to specify where to get the input training data and where to write the training results. The input data should typically be stored in a Cloud Object Storage bucket with read permissions and the training results are typically written to a Cloud Object Storage bucket with read/write permissions.

Input training data information is specified with the __TRAINING_DATA_REFERENCE__ parameter which maps to __DATA_DIR__.

The location for output training results is specified with the __TRAINING_RESULTS_REFERENCE__ parameter, which maps to __RESULTS_DIR__.

As you can see from the Experiment config parameters command, at a minimum, the following information is required:
- NAME which is a name for the experiment
- TRAINING_REFERENCES
- TRAINING_DATA_REFERENCE which specifies the location where to read the input training data. 
- TRAINING_RESULTS_REFERENCE which specifies the location where to write the training results.
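Because the data and results references differ only in the bucket name and in whether the bucket sits under a "source" or a "target" key, they can be produced by one small function. This is a sketch for reducing duplication, not a WML client API; `make_cos_reference` is a hypothetical helper that returns the same dictionary shape used in the cells of this notebook.

```python
def make_cos_reference(endpoint_url, hmac_keys, bucket, role):
    """Build a COS reference dict; role is 'source' (input data) or 'target' (results)."""
    if role not in ("source", "target"):
        raise ValueError("role must be 'source' or 'target'")
    return {
        "connection": {
            "endpoint_url": endpoint_url,
            "aws_access_key_id": hmac_keys["access_key_id"],
            "aws_secret_access_key": hmac_keys["secret_access_key"],
        },
        role: {"bucket": bucket},
        "type": "s3",
    }
```

With this helper, the data reference would be built with `role="source"` and the training data bucket, and the results reference with `role="target"` and the results bucket, both using `cos_credentials["cos_hmac_keys"]` and `service_endpoint`.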



#### 5.1.1 TRAINING_DATA_REFERENCE
Specify where to read the input training data from.

In [None]:
TRAINING_DATA_REFERENCE = {
    "connection": {
                    "endpoint_url": service_endpoint,
                    "aws_access_key_id": cos_credentials['cos_hmac_keys']['access_key_id'],
                    "aws_secret_access_key": cos_credentials['cos_hmac_keys']['secret_access_key']
    },
    "source": {
                "bucket": mnist_training_data_bucket,
    },
    "type": "s3"
}

#### 5.1.2 TRAINING_RESULTS_REFERENCE
Specify where to write the training results to.

In [None]:
TRAINING_RESULTS_REFERENCE = {
    "connection": {
                    "endpoint_url": service_endpoint,
                    "aws_access_key_id": cos_credentials['cos_hmac_keys']['access_key_id'],
                    "aws_secret_access_key": cos_credentials['cos_hmac_keys']['secret_access_key']
    },
    "target": {
                "bucket": mnist_training_results_bucket,
    },
    "type": "s3"
}

#### 5.1.3 TRAINING_REFERENCES
Next, we need to specify the configuration parameters for the experiment.

An experiment consists of one or more training definitions and for each training definition, we can specify configuration for hyper parameter optimization.

Hyperparameter Optimization (HPO) is a mechanism for automatically exploring a search space of potential Hyperparameters, building a series of models and comparing the models using metrics of interest. To use HPO you must specify ranges of values to explore for each Hyperparameter.

Currently, two HPO algorithms are supported:
- **random** implements a simple algorithm which will randomly assign Hyperparameter values from the ranges specified for an experiment.
- **rbfopt** uses a technique called RBFOpt to explore the search space. RBFOpt is a Python library for black-box optimization (also known as derivative-free optimization). For more details, check the [user manual](https://github.com/coin-or/rbfopt/blob/master/manual.pdf)

In the configuration, we should select which algorithm to use for optimization (example below specifies **RBFOpt**), the objective to optimize (example below specifies **accuracy**), and the number of optimizer steps which sets an upper bound on the number of models which HPO will train (example below specifies **10**).

Note that the **accuracy** metric needs to be a metric reported by the model. So convolutional_network.py must contain code that computes the **accuracy** metric; otherwise HPO cannot optimize on it, because it needs that metric to compare models.

The second subsection of the HPO configuration lists the hyperparameters: you need to specify which hyperparameters to vary and how to vary them. In the example below, we select learning_rate, dropout, and batch_size as the parameters to vary.

Note again that these should be defined in the NN model as described in the script modeling that NN. 
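To get a feel for the size of the search space implied by min, max, and step values, the ranges can be enumerated locally. The sketch below uses the ranges from this notebook's HPO configuration and shows what random sampling from them looks like. It is an illustration only, not the service's implementation; treating the maximum as inclusive and the helper names are assumptions.

```python
import math
import random

def stepped_values(minimum, maximum, step):
    """List minimum, minimum + step, ... without exceeding maximum (assumed inclusive)."""
    count = int(math.floor((maximum - minimum) / step + 1e-9)) + 1
    # round() hides floating point noise such as 0.30000000000000004
    return [round(minimum + k * step, 10) for k in range(count)]

search_space = {
    "learning_rate": stepped_values(0.0001, 0.01, 0.0005),
    "dropout": stepped_values(0.01, 0.99, 0.1),
    "batch_size": stepped_values(32, 256, 32),
}

def sample_configuration(space, rng):
    """Pick one value per hyperparameter uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}

# Total number of distinct combinations in the grid
grid_size = 1
for values in search_space.values():
    grid_size *= len(values)
```

Comparing `grid_size` with the number of optimizer steps shows how small a fraction of the combinations an experiment actually trains, which is why the choice of search algorithm matters.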

For more information on hyper parameter optimization:
https://dataplatform.test.cloud.ibm.com/docs/content/analyze-data/ml_dlaas_hpo.html?context=wdp

In [None]:
HPO = {
        "method": {
            "name": "rbfopt", # name of the algo -- choose rbfopt
            "parameters": [
                client.experiments.HPOMethodParam("objective", "accuracy"),
                client.experiments.HPOMethodParam("maximize_or_minimize", "maximize"),
                client.experiments.HPOMethodParam("num_optimizer_steps", 10)
            ]
        },
        "hyper_parameters": [
            client.experiments.HPOParameter('learning_rate', min=0.0001, max=0.01, step=0.0005),
            client.experiments.HPOParameter('dropout', min=0.01, max=0.99, step=0.1),
            client.experiments.HPOParameter('batch_size', min=32, max=256, step=32)
        ]
     }          

Configure your experiment. TRAINING_REFERENCES links previously stored training definitions and provides information about compute_configuration that will be used to run the training.

**Note** Change the Experiment Name so it is easier to track in the Watson Studio Experiments assets.

In [None]:
experiment_metadata = {
    client.repository.ExperimentMetaNames.NAME: "MNIST-HPO-Experiment",
    client.repository.ExperimentMetaNames.AUTHOR_NAME: "IBM Watson",
    client.repository.ExperimentMetaNames.DESCRIPTION: "MNIST Tensorflow Experiment, 1 training definition, 10 HPO models",
    client.repository.ExperimentMetaNames.EVALUATION_METHOD: "multiclass",
    client.repository.ExperimentMetaNames.EVALUATION_METRICS: ["accuracy"],
    client.repository.ExperimentMetaNames.TRAINING_DATA_REFERENCE: TRAINING_DATA_REFERENCE,
    client.repository.ExperimentMetaNames.TRAINING_RESULTS_REFERENCE: TRAINING_RESULTS_REFERENCE,
    client.repository.ExperimentMetaNames.TRAINING_REFERENCES: [
        {
            "name": "HPO-MNIST",
            "training_definition_url": definition_1_url,
            "compute_configuration": {"name": "k80x2"},
            "hyper_parameters_optimization": HPO
        }
    ]
}

### 5.2 Experiment Utilities
In the next few cells, we illustrate various experiment utilities with Watson Machine Learning. We can store an experiment in WML repository, list all stored experiments, get experiment definition, and update experiment definition.

In [None]:
# get the details
experiment_details = client.repository.store_experiment(meta_props=experiment_metadata)

experiment_uid = client.repository.get_experiment_uid(experiment_details)
print(experiment_uid)

In [None]:
# List stored experiments
# dump the experiments w/ metadata to stdout 
client.repository.list_experiments()

In [None]:
import json
# Get Experiment definition
#details is a python dict
details = client.repository.get_experiment_details(experiment_uid)
print(json.dumps(details, indent=2))

### 5.3 Running Experiments
Once an experiment is defined, we can run the experiment and monitor the details.

In [None]:
# Run the experiment
experiment_run_details = client.experiments.run(experiment_uid, asynchronous=True)

In [None]:
# List experiment runs
client.experiments.list_runs()

In [None]:
# Get experiment run uid
experiment_run_uid = client.experiments.get_run_uid(experiment_run_details)
print(experiment_run_uid)

**LIST training runs triggered by experiment run**

Run the cell below several times during the run to see updates or monitor for a few minutes.
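If you prefer to wait until the run finishes rather than re-running a cell by hand, a small polling loop can do the waiting. The sketch below is generic: `poll_until` is a hypothetical helper, and the caller supplies both the function that fetches the current state and the test for a finished state, since the exact shape of the status returned by the service is not assumed here.

```python
import time

def poll_until(fetch_state, is_finished, interval_seconds=60, max_polls=30, sleep=time.sleep):
    """Call fetch_state repeatedly until is_finished(state) is true or max_polls is reached."""
    state = fetch_state()
    polls = 1
    while not is_finished(state) and polls < max_polls:
        sleep(interval_seconds)
        state = fetch_state()
        polls += 1
    # Returns the last state seen, whether or not it is a finished one
    return state
```

In this notebook, `fetch_state` could wrap a call to `client.experiments.get_status(experiment_run_uid)`; inspect what that call returns in your environment before writing the `is_finished` test. The `sleep` parameter exists so the loop can be tried quickly without real waiting.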

In [None]:
# Query the service about once a minute for roughly five minutes
import time

for i in range(60):
    if i % 10 == 0:
        client.experiments.list_training_runs(experiment_run_uid)
    else:
        time.sleep(6)

In [None]:
# Get experiment run details
experiment_run_details = client.experiments.get_run_details(experiment_run_uid)
print(json.dumps(experiment_run_details, indent=2))

In [None]:
# Get experiment run status
# It is useful to understand the status of the experiment while it runs in the background
client.experiments.get_status(experiment_run_uid)

In [None]:
# Print Experiment details
experiment_details = client.experiments.get_details(experiment_uid)
print(json.dumps(experiment_details, indent=2))

In [None]:
# Get Training Run UID
experiment_run_details = client.experiments.get_run_details(experiment_run_uid)
training_run_uids = client.experiments.get_training_uids(experiment_run_details)

for i in training_run_uids:
    print(i)

### 5.4 Monitoring Experiments
Once an experiment run has been triggered, use the following utilities to monitor it while it runs.

You can monitor an experiment run by calling client.experiments.monitor_logs(experiment_run_uid), which streams the training logs to the console.

Tip: You can also monitor a particular training run by calling client.training.monitor_logs(training_run_uid). To find the training_run_uid, call client.experiments.list_training_runs(experiment_run_uid).

In [None]:
client.experiments.monitor_logs(experiment_run_uid)

In [None]:
# Replace this example value with one of your own training run UIDs
training_run_uid = 'training-VHE39EKmR_2'
client.training.monitor_logs(training_run_uid)

### 5.5 Evaluation metrics

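You can get the evaluation metrics by running the cells below. The first cell returns the latest (final) metrics for each training run; the second returns all metrics recorded during the experiment run.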

In [None]:
metrics = client.experiments.get_latest_metrics(experiment_run_uid)
print(json.dumps(metrics, indent=2))

In [None]:
all_metrics = client.experiments.get_metrics(experiment_run_uid)
print(json.dumps(all_metrics, indent=2))
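
Once the latest metrics are available, a common next step is to pick the best training run on a given metric. The sketch below assumes that each entry has the same fields that the visualization cell later in this notebook reads (training_guid, training_reference_name, and metrics.values with name and value). The helper best_run, the metric names accuracy and loss, and the sample values are hypothetical and only there to make the example runnable on its own.

```python
def best_run(latest_metrics, metric_name, higher_is_better=True):
    """Return (training_guid, reference_name, value) for the best run on metric_name."""
    candidates = []
    for run in latest_metrics:
        for entry in run["metrics"]["values"]:
            if entry["name"] == metric_name:
                candidates.append(
                    (run["training_guid"], run["training_reference_name"], entry["value"])
                )
    if not candidates:
        return None  # no run reported this metric
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda c: c[2])


# Hypothetical sample data, not real experiment results
sample_metrics = [
    {"training_guid": "training-A", "training_reference_name": "run_a",
     "metrics": {"values": [{"name": "accuracy", "value": 0.91},
                            {"name": "loss", "value": 0.20}]}},
    {"training_guid": "training-B", "training_reference_name": "run_b",
     "metrics": {"values": [{"name": "accuracy", "value": 0.94},
                            {"name": "loss", "value": 0.25}]}},
]

best_accuracy = best_run(sample_metrics, "accuracy")
lowest_loss = best_run(sample_metrics, "loss", higher_is_better=False)
print(best_accuracy)
print(lowest_loss)
```

With real results, the call would be best_run(metrics, "<your metric name>"), using whichever metric names your training definitions actually report.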

### 5.6 **Optional** Visualization of Experiment Results
It is important to visualize the results of your experiments so you can understand the performance of your NN architecture as well as decide on next steps as you explore different architectures and/or parameters for your problem.

The following cells illustrate one visualization approach in Watson Studio by using plotly.

Another popular approach is to visualize the results with the Experiment Assistant in Watson Studio. For this to work, you need to add your Watson Machine Learning service to your Watson Studio project:
- Navigate to your project's home page.
- On your project page, click the **Settings** tab.
- On the **Settings** tab, scroll down to **Associated services**, click the __Add service__ drop-down (on the right side of the page), and select **Watson**.
- This opens a window listing all the Watson services, including the Machine Learning service. Click the Add link in the Machine Learning service tile.
- On the page that loads, select the __Existing__ tab and select the Machine Learning service where you ran your experiment.

Once you've associated the correct Watson Machine Learning service with your project, you can visualize the experiment results using the Experiment Assistant:
- Navigate to your project's home page.
- On your project page, click the **Assets** tab.
- Scroll down to **Experiments** and click the name of the experiment you just ran.
- This loads the Experiment Assistant, which shows an __Overview__ of the experiment, the associated __Training Runs__, and, under the __Compare Runs__ tab, charts showing how the metrics change across the different runs.


In [None]:
!pip install --quiet cufflinks

In [None]:
import os
import sys
import pandas as pd
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import cufflinks as cf
import plotly.graph_objs as go

In [None]:
init_notebook_mode(connected=True)
sys.path.append(os.environ["HOME"])

In [None]:
import pandas as pd

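# Flatten the latest metrics into one row per (training run, metric) pair
rows = []
for m in metrics:
    for v in m['metrics']['values']:
        rows.append({'GUID': m['training_guid'], 'NAME': m['training_reference_name'],
                     'METRIC NAME': v['name'], 'METRIC VALUE': v['value']})

# Build the frame from the collected rows (DataFrame.append is not available in pandas 2.0 and later)
metrics_df = pd.DataFrame(rows, columns=['GUID', 'NAME', 'METRIC NAME', 'METRIC VALUE'])
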
metrics_df
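
The table above is in long format, with one row per training run and metric. For a side-by-side comparison it can help to pivot it into wide format, with one row per run and one column per metric. The sketch below uses a small hypothetical frame with the same columns as metrics_df; the metric names accuracy and loss and all values are illustrative assumptions, not real results.

```python
import pandas as pd

# Hypothetical rows with the same columns as metrics_df; values are illustrative only
sample_df = pd.DataFrame(
    [
        {"GUID": "training-A", "NAME": "run_a", "METRIC NAME": "accuracy", "METRIC VALUE": 0.91},
        {"GUID": "training-A", "NAME": "run_a", "METRIC NAME": "loss", "METRIC VALUE": 0.20},
        {"GUID": "training-B", "NAME": "run_b", "METRIC NAME": "accuracy", "METRIC VALUE": 0.94},
        {"GUID": "training-B", "NAME": "run_b", "METRIC NAME": "loss", "METRIC VALUE": 0.25},
    ]
)

# One row per training run, one column per metric
wide_df = sample_df.pivot_table(
    index=["GUID", "NAME"], columns="METRIC NAME", values="METRIC VALUE", aggfunc="first"
)

# Sort so the run with the highest (hypothetical) accuracy comes first
wide_df = wide_df.sort_values("accuracy", ascending=False)
print(wide_df)
```

Applied to the real data, the same pivot_table call on metrics_df should give one column for each metric name that your training runs report.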

In [None]:
data = []

# One bar series per metric name; each bar is labelled "GUID (NAME)"
for metric_name in pd.unique(metrics_df['METRIC NAME']):
    subset = metrics_df[metrics_df['METRIC NAME'] == metric_name]
    data.append(go.Bar(x=subset['GUID'] + ' (' + subset['NAME'] + ')',
                       y=subset['METRIC VALUE'],
                       name=metric_name))


layout = go.Layout(
    barmode='group'
)

fig = go.Figure(data=data, layout=layout)

iplot(fig)

<a id="summary"></a>
## 6. Summary
This notebook outlined how to run deep learning experiments from Jupyter notebooks in Watson Studio.
See the [documentation](https://dataplatform.cloud.ibm.com/docs/content/analyze-data/wml-setup.html) for further details.

The following [GitHub repository](https://github.com/biosopher/unofficial-watson-studio-python-utils) also provides assets and utilities that simplify setting up and running deep learning experiments with Watson Machine Learning.


## Authors
**Joe Kozhaya** is an IBM Master Inventor and WorldWide Enablement lead for Watson Data & AI solutions.

Copyright © 2017, 2018 IBM. This notebook and its source code are released under the terms of the MIT License.