# Hyperparameter tuning

This notebook shows how to tune the hyperparameters of a recommendation model by using **Azure Machine Learning service** ([AML or AzureML](https://azure.microsoft.com/en-us/services/machine-learning-service/)).

Specifically, we use TensorFlow's high-level Estimator API to build a [wide-and-deep model](https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html) for a movie recommendation scenario, and we search for optimal hyperparameters via [AML hyperdrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters).

### Prerequisite

* azureml -- You can skip this if you already know what values of hyperparameters you want to use


For details about how to install and set up AML, see the following materials:
- [AML quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python)
- [Train a TensorFlow model](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-tensorflow)
- [Hyperparameter tuning](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters)

In [None]:
import sys
sys.path.append("../../")

import os
import shutil
import itertools

import pandas as pd

import tensorflow as tf

import azureml.core
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.train.dnn import TensorFlow
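from azureml.train.hyperdrive import (
    HyperDriveRunConfig, PrimaryMetricGoal, RandomParameterSampling,
    MedianStoppingPolicy, BanditPolicy, choice, loguniform
)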
from azureml.widgets import RunDetails

from reco_utils.dataset import movielens
from reco_utils.dataset.python_splitters import python_random_split

print("Azure ML SDK Version:", azureml.core.VERSION)
print("Tensorflow Version:", tf.__version__)

In [None]:
# top k items to recommend
TOP_K = 10

# Select Movielens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'

### Model Hyperparameter Tuning via AML

This section assumes you have already created an **Azure ML workspace** and have a `./aml_config/config.json` file to load the workspace from this notebook. If not, please follow the instructions in the [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) to create a workspace and make a `./aml_config/config.json` file containing:
```
{
    "subscription_id": "your-subscription-id",
    "resource_group": "your-resource-group",
    "workspace_name": "your-workspace-name"
}
```
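If you prefer to create this file from code, the following is a minimal sketch that uses only the Python standard library. The helper name `write_workspace_config` is hypothetical, and the values you pass in are your own subscription, resource group, and workspace names (the ones shown above are placeholders).

```python
import json
import os


def write_workspace_config(subscription_id, resource_group, workspace_name,
                           config_dir="./aml_config"):
    """Write the workspace settings to <config_dir>/config.json and return the path."""
    os.makedirs(config_dir, exist_ok=True)
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    config_path = os.path.join(config_dir, "config.json")
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config_path
```

Calling `write_workspace_config("your-subscription-id", "your-resource-group", "your-workspace-name")` should produce a file with the same three keys as the JSON shown above.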
  
In the following cells, we will
1. Create a remote compute target (gpu-cluster) if it does not exist already,
2. Mount data store and upload the training set, and
3. Run a hyperparameter tuning experiment.

First, let's connect to the workspace.

In [None]:
# Connect to a workspace
ws = Workspace.from_config()
print("Workspace name: ", ws.name)

Create a remote compute target

In [None]:
CLUSTER_NAME = 'gpu-cluster'

try:
    compute_target = ComputeTarget(workspace=ws, name=CLUSTER_NAME)
    print("Found existing compute target")
except ComputeTargetException:
    print("Creating a new compute target...")
    compute_config = AmlCompute.provisioning_configuration(
        vm_size='STANDARD_NC6',
        vm_priority='lowpriority',
        min_nodes=1,
        max_nodes=4
    )
    # create the cluster
    compute_target = ComputeTarget.create(ws, CLUSTER_NAME, compute_config)
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

# Use the 'status' property to get a detailed status for the current cluster. 
print(compute_target.status.serialize())

# Check list of aml-computes
compute_targets = ws.compute_targets
for name, ct in compute_targets.items():
    print(name, ct.type, ct.provisioning_state)

Prepare dataset
1. Download data and train/test split
2. Upload to storage

Next, upload the training set to the data store. This example uses the workspace's default **blob storage**.
  
We also prepare a training script, [wide_deep_training.py](../../reco_utils/aml/wide_deep_training.py), for the hyperparameter tuning. The script logs our target metrics (e.g. [RMSE](https://en.wikipedia.org/wiki/Root-mean-square_deviation)) to the AML experiment so that we can track the metrics and optimize them via **hyperdrive**.

```
TODO - maybe attach a code snippet here for description
1. logging part

2. wide and deep model

```
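As a rough illustration of the logging part, the sketch below computes RMSE with the standard library and hands it to a logging callback. The function names `compute_rmse` and `log_metric` are hypothetical, and this is not the code in `wide_deep_training.py`. Inside an AML run, the callback would typically be the `log` method of the object returned by `azureml.core.Run.get_context()`; here it defaults to `print` so the example runs locally.

```python
import math


def compute_rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences of ratings."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def log_metric(name, value, log_fn=print):
    """Report a metric through log_fn.

    Assumption: in an AML run, log_fn would be Run.get_context().log so that
    hyperdrive can read the value; it defaults to print for local use.
    """
    log_fn(name, value)


# Toy ratings and predictions, for illustration only
rmse_value = compute_rmse([4.0, 3.0, 5.0], [3.5, 3.0, 4.0])
log_metric("rmse", rmse_value)
```

The metric name passed to the logger is the name that hyperdrive looks up as its primary metric, so the two need to match exactly.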

In [None]:
data = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=['UserId','MovieId','Rating','Timestamp'],
    title_col=None,
    genres_col=None,  # TODO to use genres, should encode
)
data.head()

In [None]:
train_df, eval_df = python_random_split(data, ratio=0.75, seed=123)

In [None]:
DATA_DIR = "./data"
os.makedirs(DATA_DIR, exist_ok=True)

TRAIN_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_train.pkl"
EVAL_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_eval.pkl"
train_df.to_pickle(os.path.join(DATA_DIR, TRAIN_FILE_NAME))
eval_df.to_pickle(os.path.join(DATA_DIR, EVAL_FILE_NAME))

# Note, all the files under DATA_DIR will be uploaded
ds = ws.get_default_datastore()
ds.upload(
    src_dir=DATA_DIR,
    target_path='data',
    overwrite=True,
    show_progress=True
)

Prepare the training script. All the scripts in the folder will be uploaded.

In [None]:
SCRIPT_DIR = "../../reco_utils"
ENTRY_SCRIPT_NAME = "aml/wide_deep_training.py"
# SCRIPT_DIR = './aml_scripts'
# ENTRY_SCRIPT_NAME = 'wide_deep_training.py'

# os.makedirs(SCRIPT_DIR, exist_ok=True)
# TODO maybe upload the entire reco_utils folder? will that work? -- feedback
# shutil.copy('../../reco_utils/aml/wide_deep_training.py', SCRIPT_DIR)
# shutil.copy('../../reco_utils/aml/tf_log_hook.py', SCRIPT_DIR)
# shutil.copy('../../reco_utils/common/tf_utils.py', SCRIPT_DIR)
# shutil.copy('../../reco_utils/evaluation/python_evaluation.py', SCRIPT_DIR)

Now we define a search space for the hyperparameters. All the parameter values will be passed to the training script where they are parsed by `argparse`, e.g.:
```
TODO code snippet for argparse
```
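A minimal sketch of how such arguments might be parsed is shown below. It is not the parser in `wide_deep_training.py`: the flag names are taken from the search space in this notebook, but the types, the default values, and the helpers `str_to_bool` and `parse_hidden_units` are assumptions made for illustration.

```python
import argparse


def str_to_bool(text):
    """Convert 'True'/'False' style strings to bool (bool('False') would be True)."""
    if text.lower() in ("true", "1", "yes"):
        return True
    if text.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % text)


def parse_hidden_units(text):
    """Turn a comma-separated string such as '256,128' into [256, 128]."""
    return [int(unit) for unit in text.split(",")]


def build_parser():
    """Build a parser for a few of the flags used in this notebook."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-type", type=str, default="wide-deep")
    parser.add_argument("--linear-optimizer-lr", type=float, default=0.01)
    parser.add_argument("--dnn-optimizer-lr", type=float, default=0.01)
    parser.add_argument("--dnn-hidden-units", type=parse_hidden_units, default=[128, 128])
    parser.add_argument("--dnn-batch-norm", type=str_to_bool, default=True)
    return parser


# Simulate the command line that one sampled configuration would produce
args = build_parser().parse_args([
    "--model-type", "deep",
    "--dnn-optimizer-lr", "0.005",
    "--dnn-hidden-units", "256,128",
    "--dnn-batch-norm", "False",
])
print(args.model_type, args.dnn_optimizer_lr, args.dnn_hidden_units, args.dnn_batch_norm)
```

Passing the hidden-unit sizes as one comma-separated string keeps each candidate architecture a single `choice` value in the search space; the script then splits it back into a list.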
    
AML hyperdrive provides several search strategies, including `RandomParameterSampling`, `GridParameterSampling`, and `BayesianParameterSampling`. Details about each approach are beyond the scope of this notebook; you can find them in the [Azure documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters). Here, we use random sampling for simplicity.
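To make the random strategy concrete, here is a small standard-library sketch of what a single random draw amounts to, using a few of the parameters from the search space defined in the next cell. It does not call hyperdrive, and `sample_choice`, `sample_loguniform`, and `sample_configuration` are hypothetical helpers. It relies on the documented meaning of `loguniform(low, high)`, which returns exp(uniform(low, high)), so `loguniform(-6, -2)` covers learning rates from roughly 0.0025 to 0.135.

```python
import math
import random


def sample_choice(rng, *options):
    """Pick one of the discrete options uniformly at random."""
    return rng.choice(options)


def sample_loguniform(rng, low, high):
    """Draw exp(u) with u uniform in [low, high], so log(value) is uniform."""
    return math.exp(rng.uniform(low, high))


def sample_configuration(rng):
    """Draw one configuration from a subset of the notebook's search space."""
    return {
        "--model-type": sample_choice(rng, "wide", "deep", "wide-deep"),
        "--linear-optimizer-lr": sample_loguniform(rng, -6, -2),
        "--dnn-optimizer-lr": sample_loguniform(rng, -6, -2),
        "--dnn-hidden-units": sample_choice(rng, "256,128", "512,128,32"),
    }


rng = random.Random(42)  # fixed seed so that the draws are repeatable
for _ in range(3):
    print(sample_configuration(rng))
```

Each printed dictionary corresponds to the arguments of one training run; hyperdrive draws up to `max_total_runs` such configurations.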

> Note: Currently, this repo accepts either 'rmse' or 'mae' for `METRICS` as implemented in [tf_utils.py](../../reco_utils/common/tf_utils.py), but you can define any custom metric and use it with AML hyperdrive.

In [None]:
EXP_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_wide_deep_params"
METRICS_LIST = ['RMSE', 'NDCG']

script_params = {
    '--datastore': ds.as_mount(),
    '--train-datapath': "data/" + TRAIN_FILE_NAME,
    '--eval-datapath': "data/" + EVAL_FILE_NAME,
    '--user-col': 'UserId',
    '--item-col': 'MovieId',
    '--rating-col': 'Rating',
    '--timestamp-col': 'Timestamp',
#     '--item-feat-col': 'Genres',
    '--batch-size': 64,
    '--epochs': 50,
    '--metrics-list': METRICS_LIST,
}

# hyperparameters search space
hyper_params = {
    '--model-type': choice('wide', 'deep', 'wide-deep'),
    # Wide model hyperparameters
    '--linear-optimizer': choice('Ftrl', 'SGD'),
    '--linear-optimizer-lr': loguniform(-6, -2),
    # Deep model hyperparameters
    '--dnn-optimizer': choice('Adagrad', 'Adam'),
    '--dnn-optimizer-lr': loguniform(-6, -2),
    '--dnn-user-embedding-dim': choice(4, 16),
    '--dnn-item-embedding-dim': choice(8, 64),
    '--dnn-hidden-units': choice(
        "256,256,256,128",
        "256,128",
        "256,64,256",
        "512,128,32"
    ),
    '--dnn-batch-norm': choice(True, False),
}

ps = RandomParameterSampling(hyper_params)

We use `azureml.train.dnn.TensorFlow`, a custom AML `Estimator` class which uses a preset docker image in the cluster (see [Train a TensorFlow model](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-tensorflow) for more information).

Once you submit the experiment, you can see the progress from the notebook by using `azureml.widgets.RunDetails`. You can directly check the details from the Azure portal as well. To get the link, run `run.get_portal_url()`.

> Since we will do hyperparameter tuning, we create a `HyperDriveRunConfig` and pass it to the experiment object. If you already know which hyperparameters to use and still want to use AML for other purposes (e.g. model management), you can set the hyperparameter values directly in `script_params` and run the experiment with `run = exp.submit(est)` instead.

In [None]:
est = TensorFlow(
    source_directory=SCRIPT_DIR,
    entry_script=ENTRY_SCRIPT_NAME,
    script_params=script_params,
    compute_target=compute_target,
    use_gpu=True,
    conda_packages=['pandas', 'scikit-learn'],
)

# early termination policy
policy = MedianStoppingPolicy(delay_evaluation=5)
# BanditPolicy(evaluation_interval=2, slack_factor=0.1)

hd_config = HyperDriveRunConfig(
    estimator=est, 
    hyperparameter_sampling=ps,
    policy=policy,  
    primary_metric_name='rmse',
    primary_metric_goal=PrimaryMetricGoal.MINIMIZE, 
    max_total_runs=4,  #20,
    max_concurrent_runs=4
)

# Create an experiment to track the runs in the workspace
exp = Experiment(workspace=ws, name=EXP_NAME)

# Submit the hyperparameter tuning run and monitor its progress
run = exp.submit(config=hd_config)
# print(run.get_portal_url())
RunDetails(run).show()
run.wait_for_completion(show_output=True)

In [None]:
# Get the best run and print the metrics it logged
best_run = run.get_best_run_by_primary_metric()
print(best_run.get_metrics())


### Maybe show the worst model (or avg model) to demonstrate the importance of hyperparam tuning...


### TODO prediction...

In [None]:
run.cancel()

### Test

To load a registered model in the future,
```
from azureml.core.model import Model

model = Model(ws, 'model_name')
```

In [None]:
MODEL_DIR = './model'
# Placeholder name for the registered model (an assumption; choose your own)
MODEL_NAME = 'wide_deep_model'

best_run = run.get_best_run_by_primary_metric()
# Check model files uploaded during the run
print(best_run.get_file_names())

# Register the model in the workspace so that you can later query, examine, and deploy it.
# TODO check model path...
model = best_run.register_model(model_name=MODEL_NAME, model_path='./outputs/model')
print(model.name, model.id, model.version)

# Download the model to local. (alternatively, run.download_file(name=f, output_file_path=output_file_path))
os.makedirs(MODEL_DIR, exist_ok=True)
model.download(target_dir=MODEL_DIR)







In [None]:
from reco_utils.evaluation.python_evaluation import (
    rmse, mae, rsquared, exp_var,
    map_at_k, ndcg_at_k, precision_at_k, recall_at_k
)

# Prepare test data (the evaluation split created earlier)
X_test = eval_df.copy()
y_test = X_test.pop('Rating')

# test_input_fn = tf.estimator.inputs.pandas_input_fn(
#     x=X_test,
#     num_epochs=1,
#     shuffle=False
# )


In [None]:
model = tf.contrib.predictor.from_saved_model(MODEL_DIR+"/model/1546314741")
# model = tf.contrib.estimator.SavedModelEstimator(MODEL_DIR+"/model/1546314741")

# Convert input data into serialized Example strings.
examples = []
for index, row in X_test.iterrows():
    feature = {}
    for col, value in row.items():
        feature[col] = tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
    example = tf.train.Example(
        features=tf.train.Features(
            feature=feature
        )
    )
    examples.append(example.SerializeToString())

predictions = model({'inputs': examples})

In [None]:
import pandas as pd

# def predict_input_fn():
#     example = tf.train.Example()
#     example.features.feature['feature1'].bytes_list.value.extend(['yellow'])
#     example.features.feature['feature2'].float_list.value.extend([1.])
#     return {'inputs':tf.constant([example.SerializeToString()])}

# If all modes were exported, you can immediately evaluate and predict, or
# continue training. Otherwise only predict is available.
# See https://www.tensorflow.org/api_docs/python/tf/contrib/estimator/export_all_saved_models

# eval_results = model.evaluate(input_fn=input_fn, steps=1)
# print(eval_results)
# model.train(input_fn=input_fn, steps=20)



# `predictions` holds the dictionary returned by the predictor in the previous cell.
# Assumption: the exported regression signature returns the predicted ratings under 'outputs'.
pred_list = predictions['outputs'].flatten()
predictions = eval_df.copy()
predictions['prediction'] = pd.Series(pred_list).values
print(predictions.head())

cols = {
    'col_user': 'UserId',
    'col_item': 'MovieId',
    'col_rating': 'Rating',
    'col_prediction': 'prediction'
}

predictions.drop('Rating', axis=1, inplace=True)

eval_rmse = rmse(eval_df, predictions, **cols)
eval_mae = mae(eval_df, predictions, **cols)
eval_rsquared = rsquared(eval_df, predictions, **cols)
eval_exp_var = exp_var(eval_df, predictions, **cols)

print("RMSE:\t\t%f" % eval_rmse,
      "MAE:\t\t%f" % eval_mae,
      "rsquared:\t%f" % eval_rsquared,
      "exp var:\t%f" % eval_exp_var, sep='\n')

# Load the downloaded model and test
# with tf.Session() as sess:
#     tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], MODEL_DIR)

# #     test_input_fn = tf.estimator.inputs.pandas_input_fn(
# #         x=X_test,
# #         y=y_test,
# #         batch_size=BATCH_SIZE,
# #         num_epochs=1,
# #         shuffle=False
# #     )
    
#     input_x_holder =sess.graph.get_operation_by_name("input_example_tensor").outputs[0]
# #check your dnn classifier txt pb to know which operation you should use.
# predictions_holder = sess.graph.get_operation_by_name("dnn/binary_logistic_head/predictions/probabilities").outputs[0]
    
#     predictor = tf.contrib.predictor.from_saved_model(MODEL_DIR)
#         model_input = tf.train.Example(features=tf.train.Features( feature={"words": tf.train.Feature(int64_list=tf.train.Int64List(value=features_test_set)) })) 
#         model_input = model_input.SerializeToString()
#         output_dict = predictor({"predictor_inputs":[model_input]})
#         y_predicted = output_dict["pred_output_classes"][0]
#         output_dict['scores']

#         input_tensor=tf.get_default_graph().get_tensor_by_name("input_tensors:0")
#         model_input=input_tensor.SerializeToString()        
#         output_dict= predictor({"inputs":[model_input]})
        

        

In [None]:
# Clean-up resources
# Optionally, delete the Azure Managed Compute cluster
compute_target.delete()

# Delete the workspace and its dependent resources
ws.delete(delete_dependent_resources=True)

# Clean-up temporary local copies of the data and model files.
# Note: SCRIPT_DIR points to the repository's reco_utils folder, so it must not be deleted.
shutil.rmtree(DATA_DIR)
shutil.rmtree(MODEL_DIR)

### References

* [Fine-tune natural language processing models using Azure Machine Learning service](https://azure.microsoft.com/en-us/blog/fine-tune-natural-language-processing-models-using-azure-machine-learning-service/)
* [Training, hyperparameter tune, and deploy with TensorFlow](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb)
