# Tutorial #1: Train a text classification model with Azure Machine Learning

In this tutorial, you train a machine learning model both locally and on remote compute resources. You'll use the training and deployment workflow for Azure Machine Learning service (preview) in a Python Jupyter notebook.  You can then use the notebook as a template to train your own machine learning model with your own data. This tutorial is **part one of a two-part tutorial series**.  

This tutorial uses Azure Machine Learning to train a simple feed-forward neural network with [Keras](https://keras.io) on the Reuters newswire dataset that ships with Keras (`keras.datasets.reuters`). The dataset contains 11,228 newswires. Each one is encoded as a sequence of word indices and labeled with one of 46 topics. The goal is to create a multi-class classifier that predicts the topic of a given newswire.

Learn how to:

> * Set up your development environment
> * Access and examine the data
> * Train a simple neural network classifier locally using the popular Keras deep learning library
> * Train multiple models on a remote cluster
> * Review training results, find and register the best model

You'll learn how to select a model and deploy it in [part two of this tutorial](deploy-models.ipynb) later. 

## Prerequisites

Use [these instructions](https://aka.ms/aml-how-to-configure-environment) to:  
* Create a workspace and its configuration file (**config.json**)  
* Save your **config.json** to the same folder as this notebook

## Set up your development environment

All the setup for your development work can be accomplished in a Python notebook.  Setup includes:

* Importing Python packages
* Connecting to a workspace to enable communication between your local computer and remote resources
* Creating an experiment to track all your runs
* Creating a remote compute target to use for training

### Import packages

Import Python packages you need in this session. Also display the Azure Machine Learning SDK version.

In [2]:
%matplotlib inline
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

import azureml
from azureml.core import Workspace, Run

# check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

Azure ML SDK Version:  1.0.2


### Connect to workspace

Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **config.json** and loads the details into an object named `ws`.

In [3]:
# load workspace configuration from the config.json file in the current folder.
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, sep = '\t')

Found the config file in: /data/home/erimen_cse/AMLProjects/MachineLearningNotebooks/tutorials/config.json
azurenlp	westus2	mshsharedrg
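`Workspace.from_config()` only needs a few identifiers from **config.json**. As an assumption based on the file that `Workspace.write_config()` produces, a typical config.json holds the keys `subscription_id`, `resource_group`, and `workspace_name`. The hypothetical helper below is a minimal sketch that parses the file's text and fails early with a clear message if one of those keys is missing. It only checks the file and does not connect to Azure.

```python
import json

# Assumed keys of a workspace config.json (as written by Workspace.write_config()).
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def parse_workspace_config(text):
    """Hypothetical helper: parse config.json text and check for the required keys."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config.json is missing: " + ", ".join(missing))
    return config


# Usage in the notebook (reads the file next to this notebook):
# with open("config.json") as f:
#     print(parse_workspace_config(f.read())["workspace_name"])
```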


### Create experiment

Create an experiment to track the runs in your workspace. A workspace can have multiple experiments. 

In [5]:
experiment_name = 'keras-reuters'

from azureml.core import Experiment
exp = Experiment(workspace=ws, name=experiment_name)

### Create or attach an existing AmlCompute cluster
You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.

**Creation of AmlCompute takes approximately 5 minutes.** If an AmlCompute cluster with that name already exists in your workspace, this code skips the creation process.

In [29]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
import os

# choose a name for your cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "gpucluster")
# environment variables are strings, so convert the node counts to int
compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0))
compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4))

# This example uses a GPU VM (STANDARD_NC6). For a CPU VM, set the SKU to STANDARD_D2_V2
# (the estimators below also set use_gpu=True and install tensorflow-gpu).
vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "STANDARD_NC6")


if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
                                                                min_nodes = compute_min_nodes, 
                                                                max_nodes = compute_max_nodes)

    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
    
    # can poll for a minimum number of nodes and for a specific timeout. 
    # if no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
    
    # For a more detailed view of current AmlCompute status, use get_status()
    print(compute_target.get_status().serialize())

creating a new compute target...
Creating
Succeeded
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned
{'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-01-21T02:00:28.714000+00:00', 'creationTime': '2019-01-21T01:59:40.500222+00:00', 'currentNodeCount': 0, 'errors': None, 'modifiedTime': '2019-01-21T02:00:36.568963+00:00', 'nodeStateCounts': {'idleNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0, 'preparingNodeCount': 0, 'runningNodeCount': 0, 'unusableNodeCount': 0}, 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'targetNodeCount': 0, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}


You now have the necessary packages and compute resources to train a model in the cloud. 

## Explore data

Before you train a model, you need to understand the data that you are using to train it.  You also need to copy the data into the cloud so it can be accessed by your cloud training environment.  In this section you learn how to:

* Download the Reuters dataset
* Display some sample news text
* Upload data to the cloud

### Download the Reuters dataset

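Download the Reuters dataset (`reuters.npz`) and its word index (`reuters_word_index.json`) and save both files into a local `data-reuters` directory. The archive holds the newswires, encoded as sequences of word indices, together with their topic labels.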

In [14]:
import os
import urllib.request

os.makedirs('./data-reuters', exist_ok = True)

urllib.request.urlretrieve('https://s3.amazonaws.com/text-datasets/reuters.npz', filename='./data-reuters/reuters.npz')
urllib.request.urlretrieve('https://s3.amazonaws.com/text-datasets/reuters_word_index.json', filename='./data-reuters/reuters_word_index.json')

('./data-reuters/reuters_word_index.json',
 <http.client.HTTPMessage at 0x7fa2b4ac38d0>)
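Rerunning the cell above downloads both files again. As a minimal sketch of a cache-aware alternative using only the standard library, the hypothetical `download_if_missing` helper below fetches a file only when it is not already on disk.

```python
import os
import urllib.request


def download_if_missing(url, filename):
    """Hypothetical helper: download url to filename unless the file already exists."""
    if not os.path.exists(filename):
        # create the parent directory (e.g. ./data-reuters) if needed
        os.makedirs(os.path.dirname(filename) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, filename=filename)
    return filename


# Usage in the notebook:
# for name in ("reuters.npz", "reuters_word_index.json"):
#     download_if_missing("https://s3.amazonaws.com/text-datasets/" + name,
#                         "./data-reuters/" + name)
```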

The dataset files are now in the local `./data-reuters` directory.

### Upload data to the cloud

Now make the data accessible remotely by uploading it from your local machine into Azure, so it can be used for remote training. The datastore is a convenient construct associated with your workspace: you can upload and download data with it and access that data from your remote compute targets. It is backed by an Azure Blob storage account.

The Reuters files are uploaded into a directory named `reuters` at the root of the datastore.

In [23]:
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)

ds.upload(src_dir='./data-reuters', target_path='reuters', overwrite=True, show_progress=True)

AzureBlob azurenlp7906480779 azureml-blobstore-f0b90d41-6629-4f43-a97d-d06934eb98ca


$AZUREML_DATAREFERENCE_7e91fdf9959e4d4b89d8ea8435343120

You now have everything you need to start training a model. 

## Train a local model

Train a simple neural network classifier locally using Keras.

**Training locally can take a minute or two** depending on your computer configuration.

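For background on this approach, see [Text classification in Keras (Part 1): A simple Reuters news classifier](https://towardsdatascience.com/text-classification-in-keras-part-1-a-simple-reuters-news-classifier-9558d34d01d3).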

In [7]:
import keras
from keras.datasets import reuters

Using TensorFlow backend.


In [8]:
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=None, test_split=0.2)

Downloading data from https://s3.amazonaws.com/text-datasets/reuters.npz


In [9]:
word_index = reuters.get_word_index(path="reuters_word_index.json")

Downloading data from https://s3.amazonaws.com/text-datasets/reuters_word_index.json


In [20]:
print('# of Training Samples: {}'.format(len(x_train)))
print('# of Test Samples: {}'.format(len(x_test)))

num_classes = max(y_train) + 1
print('# of Classes: {}'.format(num_classes))
index_to_word = {}
for key, value in word_index.items():
    index_to_word[value] = key
print(' '.join([index_to_word[x] for x in x_train[8981]]))
print(y_train[0])

# of Training Samples: 8982
# of Test Samples: 2246
# of Classes: 46
the mths always march prerecorded interest attend that in simon shares a in season secretary simon better envisaged liquidation 3 could added mln a for shares further last in below mln 120 march crowley congressional make a cash barred and overnight that japan 124 lme direct and competing in talks proposal a one per important for withheld subsidiaries general 3 that although 1 108 by in shares states will provisionally 1p liquidation mln in 8 mln in political output as owners 50 vanadium system assets closing 3 by day mill ec next 3 in review 1990s novel over bhd miles a in offer pct dlrs
3
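The sample text printed above does not read like a news story because `reuters.load_data()` offsets the word indices. By default it reserves 0 for padding, 1 for a start marker, and 2 for out-of-vocabulary words, and shifts every word's rank in `word_index` by `index_from=3`. The cell above also prints the text of sample 8981 next to the label of sample 0. Below is a minimal sketch of a decoder that accounts for the offset, assuming those default `load_data()` arguments; the `<pad>`, `<start>`, `<oov>`, and `<unk>` markers are illustrative choices.

```python
def decode_newswire(sequence, word_index, index_from=3):
    """Turn a sequence of word indices from reuters.load_data() back into words.

    Assumes the load_data() defaults: 0 = padding, 1 = start marker,
    2 = out-of-vocabulary marker, and real words shifted by index_from.
    """
    reserved = {0: "<pad>", 1: "<start>", 2: "<oov>"}
    index_to_word = {index + index_from: word for word, index in word_index.items()}
    words = []
    for i in sequence:
        if i in reserved:
            words.append(reserved[i])
        else:
            # indices with no entry in word_index are shown as <unk>
            words.append(index_to_word.get(i, "<unk>"))
    return " ".join(words)


# Usage in the notebook, before x_train is vectorized:
# print(decode_newswire(x_train[0], word_index))
# print(y_train[0])
```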


In [None]:
# print the first 10 entries of the word index (the full index is large)
for key, value in list(word_index.items())[:10]:
    print(key, value)

In [21]:
import numpy
unique, counts = numpy.unique(y_train, return_counts=True)

In [22]:
counts

array([  55,  432,   74, 3159, 1949,   17,   48,   16,  139,  101,  124,
        390,   49,  172,   26,   20,  444,   39,   66,  549,  269,  100,
         15,   41,   62,   92,   24,   15,   48,   19,   45,   39,   32,
         11,   50,   10,   49,   19,   19,   24,   36,   30,   13,   21,
         12,   18])
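The label counts are heavily imbalanced. As a quick sanity check, the plain-Python sketch below takes the counts printed above and computes the accuracy a trivial classifier would get on the training labels by always predicting the most frequent topic. This gives a baseline for the accuracy numbers later in the tutorial, although the test-set distribution may differ slightly.

```python
# Per-class counts of y_train, copied from the output above (46 topics).
train_counts = [55, 432, 74, 3159, 1949, 17, 48, 16, 139, 101, 124,
                390, 49, 172, 26, 20, 444, 39, 66, 549, 269, 100,
                15, 41, 62, 92, 24, 15, 48, 19, 45, 39, 32,
                11, 50, 10, 49, 19, 19, 24, 36, 30, 13, 21,
                12, 18]

total = sum(train_counts)
# index of the largest count = the most frequent topic
majority_class = max(range(len(train_counts)), key=lambda c: train_counts[c])
baseline = train_counts[majority_class] / total

print(f"most frequent topic: {majority_class}")
print(f"{train_counts[majority_class]} of {total} samples -> baseline accuracy {baseline:.1%}")
```

For these counts the most frequent topic is 3, with 3,159 of 8,982 training samples (about 35%).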

In [11]:
from keras.preprocessing.text import Tokenizer

max_words = 10000

tokenizer = Tokenizer(num_words=max_words)
x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print(x_train[0])
print(len(x_train[0]))

print(y_train[0])
print(len(y_train[0]))

[0. 1. 0. ... 0. 0. 0.]
10000
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
46
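To make the two transformations above concrete, here is a plain-Python sketch of what `Tokenizer.sequences_to_matrix(..., mode='binary')` and `keras.utils.to_categorical` produce. It is an illustration on toy data, not the Keras implementation: each newswire becomes a vector of length `num_words` with a 1 at every word index that occurs (indices at or above `num_words` are dropped), and each label becomes a one-hot vector of length `num_classes`. With `max_words = 10000` and 46 classes, this matches the vector lengths printed above.

```python
def sequences_to_binary_matrix(sequences, num_words):
    """Sketch of mode='binary': 1.0 where a word index occurs, 0.0 elsewhere."""
    matrix = []
    for sequence in sequences:
        row = [0.0] * num_words
        for index in sequence:
            if index < num_words:  # words outside the vocabulary are ignored
                row[index] = 1.0   # repeated words still give 1.0
        matrix.append(row)
    return matrix


def to_one_hot(labels, num_classes):
    """Sketch of to_categorical: a 1.0 at the label's position."""
    return [[1.0 if c == label else 0.0 for c in range(num_classes)] for label in labels]


# Toy example: a vocabulary of 6 word indices and 4 classes.
print(sequences_to_binary_matrix([[1, 3, 3, 7]], num_words=6))  # [[0.0, 1.0, 0.0, 1.0, 0.0, 0.0]]
print(to_one_hot([3], num_classes=4))                           # [[0.0, 0.0, 0.0, 1.0]]
```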


In [12]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.metrics_names)
batch_size = 32
epochs = 3

history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_split=0.1)

score = model.evaluate(x_test, y_test, batch_size=batch_size, verbose=1)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

['loss', 'acc']
Train on 8083 samples, validate on 899 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3
Test loss: 0.8825503981548659
Test accuracy: 0.797417631397689
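Two numbers in this cell can be checked by hand. A `Dense` layer has one weight per input-unit pair plus one bias per unit, and `Dropout` and `Activation` add no parameters, so this model has about 5.1 million trainable parameters. With `validation_split=0.1`, Keras holds out the last 10% of the training samples, which gives the 8083/899 split in the log above. The arithmetic, as a small sketch:

```python
inputs, hidden, classes = 10000, 512, 46

# Dense(512) on 10,000 inputs, then Dense(46) on 512 inputs: weights + biases.
dense_hidden = inputs * hidden + hidden      # 5,120,512
dense_output = hidden * classes + classes    # 23,598
print("trainable parameters:", dense_hidden + dense_output)  # 5144110

# validation_split=0.1: the first 90% of samples train, the last 10% validate.
num_samples = 8982
split_at = int(num_samples * (1 - 0.1))
print("train:", split_at, "validate:", num_samples - split_at)  # train: 8083 validate: 899
```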


In [69]:
history.history['val_acc']

[0.7975528364849833, 0.8164627364400496, 0.8075639600218072]

With just a few lines of code, the model reaches about 80% accuracy on the test set.

## Train on a remote cluster

Now you can expand on this simple model by training it with different hyperparameters: the batch size and the number of epochs. This time you'll train the model on a remote resource.  

For this task, submit the job to the remote training cluster you set up earlier.  To submit a job you:
* Create a directory
* Create a training script
* Create an estimator object
* Submit the job 

### Create a directory

Create a directory to deliver the necessary code from your computer to the remote resource.

In [13]:
import os
script_folder = './keras-reuters'
os.makedirs(script_folder, exist_ok=True)

### Create a training script

To submit the job to the cluster, first create a training script. Run the following code to create the training script called `train.py` in the directory you just created. The script trains the same model as the local version, but it reads the data from the datastore, takes the batch size and number of epochs as command-line arguments, logs metrics to the run, and saves the trained model to the `outputs` folder.

In [72]:
%%writefile $script_folder/train.py

import argparse
import os
import numpy as np

import keras
from keras.datasets import reuters
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

from azureml.core import Run

# let user feed in 3 parameters: the location of the data files (from the datastore),
# the batch size, and the number of training epochs
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')
parser.add_argument('--batch-size', type=int, dest='bs', default=32, help='batch size')
parser.add_argument('--epochs', type=int, dest='epochs', default=3, help='num of epochs')

args = parser.parse_args()

data_folder = os.path.join(args.data_folder, 'reuters')
print('Data folder:', data_folder)

# load the train and test sets from the files uploaded to the datastore
(x_train, y_train), (x_test, y_test) = reuters.load_data(path=os.path.join(data_folder, 'reuters.npz'),
                                                         num_words=None,
                                                         test_split=0.2)
word_index = reuters.get_word_index(path=os.path.join(data_folder, 'reuters_word_index.json'))

print(x_train.shape, y_train.shape, x_test.shape, y_test.shape, sep = '\n')
num_classes = max(y_train) + 1
max_words = 10000

# encode each newswire as a binary bag-of-words vector over the top max_words words
tokenizer = Tokenizer(num_words=max_words)
x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')

# one-hot encode the topic labels
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

print(x_train.shape, y_train.shape, x_test.shape, y_test.shape, sep = '\n')

# get hold of the current run
run = Run.get_context()

print('Train a classification model')
model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.metrics_names)

history = model.fit(x_train, y_train, batch_size=args.bs, epochs=args.epochs, verbose=1, validation_split=0.1)

# log the validation accuracy of the last epoch (the primary metric for hyperparameter tuning);
# depending on the Keras version, the history key is 'val_acc' or 'val_accuracy'
val_acc_key = 'val_acc' if 'val_acc' in history.history else 'val_accuracy'
run.log('validation_acc', float(history.history[val_acc_key][-1]))

print('Predict the test set')
score = model.evaluate(x_test, y_test, batch_size=args.bs, verbose=1)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

run.log('batch size', int(args.bs))
run.log('epochs', int(args.epochs))
run.log('accuracy', float(score[1]))

# files saved in the outputs folder are automatically uploaded into the experiment record
os.makedirs('outputs', exist_ok=True)
model.save('outputs/reuters_model.h5')

Overwriting ./keras-reuters/train.py


### Create an estimator

An estimator object is used to submit the run.  Create your estimator by running the following code to define:

* The name of the estimator object, `est`
* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. 
* The compute target.  In this case you will use the AmlCompute you created
* The training script name, train.py
* Parameters required from the training script 
* Python packages needed for training

The `--data-folder` parameter is set to `ds.as_mount()`, so the datastore is mounted on the compute node and its mount path is passed to the training script.

In [60]:
from azureml.train.estimator import Estimator

script_params = {
    '--data-folder': ds.as_mount(),
    '--batch-size': 32,
    '--epochs' : 5
}

est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                use_docker = True,
                entry_script='train.py',
                conda_packages=['keras', 'tensorflow-gpu'],
                pip_packages = ['azure-cli-core<2.0.55'],
                use_gpu = True)
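The estimator passes `script_params` to `train.py` as command-line arguments. The run details further down list them as `--data-folder $AZUREML_DATAREFERENCE_workspaceblobstore --batch-size 32 --epochs 5`, where the data reference is replaced by the datastore's mount path on the compute node. Below is a minimal local sketch of how the script's `argparse` definitions turn such a command line into typed values; the mount path `/mnt/azureml/workspaceblobstore` is a made-up placeholder.

```python
import argparse

# Same argument definitions as in train.py.
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')
parser.add_argument('--batch-size', type=int, dest='bs', default=32, help='batch size')
parser.add_argument('--epochs', type=int, dest='epochs', default=3, help='num of epochs')

# Hypothetical command line; the mount path is a placeholder, not a real Azure path.
argv = ['--data-folder', '/mnt/azureml/workspaceblobstore', '--batch-size', '32', '--epochs', '5']
args = parser.parse_args(argv)
print(args.data_folder, args.bs, args.epochs)  # strings '32' and '5' become ints

# Omitted arguments fall back to the defaults in train.py.
print(parser.parse_args(['--data-folder', '.']).epochs)  # 3
```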

### Submit the job to the cluster

Run the experiment by submitting the estimator object.

In [61]:
run = exp.submit(config=est)
run

Experiment,Id,Type,Status,Details Page,Docs Page
keras-reuters,keras-reuters_1548056330117,azureml.scriptrun,Queued,Link to Azure Portal,Link to Documentation


Since the call is asynchronous, it returns immediately with a status such as **Queued**, **Preparing**, or **Running**.

## Monitor a remote run

In total, the first run takes **approximately 10 minutes**. For subsequent runs, as long as the script dependencies don't change, the same image is reused, so the container starts up much faster.

Here is what's happening while you wait:

- **Image creation**: A Docker image is created matching the Python environment specified by the estimator. The image is uploaded to the workspace. Image creation and uploading takes **about 5 minutes**. 

  This stage happens once for each Python environment, since the image is cached for subsequent runs. During image creation, logs are streamed to the run history. You can monitor the image creation progress using these logs.

- **Scaling**: If the remote cluster requires more nodes to execute the run than currently available, additional nodes are added automatically. Scaling typically takes **about 5 minutes.**

- **Running**: In this stage, the necessary scripts and files are sent to the compute target, then datastores are mounted or copied, then the `entry_script` is run. While the job is running, stdout and the ./logs directory are streamed to the run history. You can monitor the run's progress using these logs.

- **Post-Processing**: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.


You can check the progress of a running job in multiple ways. This tutorial uses a Jupyter widget as well as a `wait_for_completion` method. 

### Jupyter widget

Watch the progress of the run with a Jupyter widget.  Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes.

In [62]:
from azureml.widgets import RunDetails
RunDetails(run).show()


### Get log results upon completion

Model training and monitoring happen in the background. Wait until the model has completed training before running more code. Use `wait_for_completion` to show when the model training is complete.



In [63]:
run.wait_for_completion(show_output=False) # specify True for a verbose log

{'runId': 'keras-reuters_1548056330117',
 'target': 'gpucluster',
 'status': 'Completed',
 'startTimeUtc': '2019-01-21T07:47:29.153386Z',
 'endTimeUtc': '2019-01-21T07:51:21.866915Z',
 'properties': {'azureml.runsource': 'experiment',
  'ContentSnapshotId': '1f740b9f-fa4b-4213-9c66-71ce014e33b8'},
 'runDefinition': {'Script': 'train.py',
  'Arguments': ['--data-folder',
   '$AZUREML_DATAREFERENCE_workspaceblobstore',
   '--batch-size',
   '32',
   '--epochs',
   '5'],
  'SourceDirectoryDataStore': None,
  'Framework': 0,
  'Communicator': 0,
  'Target': 'gpucluster',
  'DataReferences': {'workspaceblobstore': {'DataStoreName': 'workspaceblobstore',
    'Mode': 'Mount',
    'PathOnDataStore': None,
    'PathOnCompute': None,
    'Overwrite': False}},
  'JobName': None,
  'AutoPrepareEnvironment': True,
  'MaxRunDurationSeconds': None,
  'NodeCount': 1,
  'Environment': {'Python': {'InterpreterPath': 'python',
    'UserManagedDependencies': False,
    'CondaDependencies': {'name': 'proje
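The dictionary returned by `wait_for_completion()` for a script run includes `startTimeUtc` and `endTimeUtc`, which show how long the run took once it started (queueing and image preparation before the start time are not included). A small sketch using only the standard library; the hypothetical `run_duration` helper assumes the timestamp format shown in the output above. With the timestamps shown there, the run took just under four minutes.

```python
from datetime import datetime

# Format of the timestamps in the run details, e.g. '2019-01-21T07:47:29.153386Z'.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def run_duration(details):
    """Hypothetical helper: time between startTimeUtc and endTimeUtc of a run."""
    start = datetime.strptime(details["startTimeUtc"], TIME_FORMAT)
    end = datetime.strptime(details["endTimeUtc"], TIME_FORMAT)
    return end - start


# Values copied from the output above; in the notebook you could instead pass
# the dictionary returned by run.wait_for_completion().
details = {"startTimeUtc": "2019-01-21T07:47:29.153386Z",
           "endTimeUtc": "2019-01-21T07:51:21.866915Z"}
print(run_duration(details))  # 0:03:52.713529
```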

### Display run results

You now have a model trained on a remote cluster.  Retrieve the accuracy of the model:

In [64]:
print(run.get_metrics())

{'batch size': 32, 'epochs': 5, 'accuracy': 0.8045414069456812}


You can see files associated with that run.

In [66]:
run.get_file_names()

['azureml-logs/60_control_log.txt',
 'azureml-logs/80_driver_log.txt',
 'outputs/reuters_model.h5',
 'azureml-logs/azureml.log',
 'azureml-logs/55_batchai_execution.txt']

## Register model

The last step in the training script wrote the file `outputs/reuters_model.h5` into a directory named `outputs` on the cluster VM where the job ran. `outputs` is a special directory: all content in it is automatically uploaded to your workspace and appears in the run record of the experiment. Hence, the model file is now also available in your workspace.

Register the model in the workspace so that you (or other collaborators) can later query, examine, and deploy this model.

In [None]:
# register the model file that the run saved in its outputs folder
model = run.register_model(model_name='keras_reuters', model_path='outputs/reuters_model.h5')
print(model.name, model.id, model.version, sep = '\t')

## Intelligent hyperparameter tuning (Optional)
We have trained the model with one set of hyperparameters. Now let's see how to tune hyperparameters by launching multiple runs on the cluster. First, let's define the parameter space using random sampling.

In [65]:
from azureml.train.hyperdrive import (RandomParameterSampling, BanditPolicy, HyperDriveRunConfig,
                                      PrimaryMetricGoal, choice)

ps = RandomParameterSampling(
    {
        '--batch-size': choice(25, 50, 100),
        '--epochs' : choice(1,3,5,10)
    }
)
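The search space above has 3 batch sizes and 4 epoch counts, so 12 possible combinations, while the run configuration caps the sweep at 8 runs. The sketch below illustrates random sampling in plain Python; it is not HyperDrive's implementation. Each run draws every hyperparameter value independently, so the same combination can come up more than once (the metrics printed later in this notebook include two runs with batch size 50 and 3 epochs).

```python
import itertools
import random

# Plain-Python copy of the search space defined in `ps` above.
space = {"--batch-size": [25, 50, 100], "--epochs": [1, 3, 5, 10]}

# Every possible combination of the two hyperparameters.
grid = list(itertools.product(*space.values()))
print(len(grid), "combinations")  # 12 combinations


def sample_configs(space, n, seed=None):
    """Illustrative random sampling: draw each value independently for n runs."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()} for _ in range(n)]


for config in sample_configs(space, 8, seed=42):
    print(config)
```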

Next, we will create a new estimator without the parameters above, since HyperDrive passes them in for each run. Note that we still need to keep the `--data-folder` parameter, since that's not a hyperparameter we will sweep.

In [73]:
from azureml.train.estimator import Estimator

script_params = {
    '--data-folder': ds.as_mount()
}

est_hd = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                use_docker = True,
                entry_script='train.py',
                conda_packages=['keras', 'tensorflow-gpu'],
                pip_packages = ['azure-cli-core<2.0.55'],
                use_gpu = True)

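Now we will define an early termination policy. With `evaluation_interval=2`, the `BanditPolicy` is applied every second time a run reports the primary metric (defined later). A run is terminated if its primary metric falls outside the slack allowed by `slack_factor=0.1` relative to the best-performing run so far; for a metric we maximize, that means a value below the best value divided by 1.1. This saves us from continuing to explore hyperparameters that don't show promise of helping reach our target metric.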

In [68]:
policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)
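To make the termination rule concrete: according to the Azure Machine Learning documentation for `BanditPolicy`, when the goal is to maximize the primary metric, a run whose value is smaller than the best run's value divided by `1 + slack_factor` is cancelled at an evaluation. The hypothetical helpers below only evaluate that formula; the best-run value 0.8 is an example number, not a result from this notebook.

```python
def bandit_threshold(best_metric, slack_factor):
    """Smallest primary-metric value that survives a BanditPolicy evaluation
    when the goal is MAXIMIZE (rule as described in the Azure ML docs)."""
    return best_metric / (1 + slack_factor)


def is_terminated(run_metric, best_metric, slack_factor):
    """True if a run falls outside the allowed slack of the best run."""
    return run_metric < bandit_threshold(best_metric, slack_factor)


# Example: the best run so far reports validation_acc = 0.8, slack_factor = 0.1.
print(round(bandit_threshold(0.8, 0.1), 4))  # 0.7273
print(is_terminated(0.75, 0.8, 0.1))         # False
print(is_terminated(0.70, 0.8, 0.1))         # True
```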

Now we are ready to configure a HyperDrive run configuration object and specify the primary metric, `validation_acc`, which is recorded in the training runs. If you look back at the training script, you will notice that this value is logged once per run, after the last epoch, so the early termination policy has little to act on here. We also tell the service that we want to maximize this value. Finally, we set the total number of runs to 8 and the maximum number of concurrent runs to 4, which is the same as the maximum number of nodes in our compute cluster.

In [75]:
htc = HyperDriveRunConfig(estimator=est_hd, 
                          hyperparameter_sampling=ps, 
                          policy=policy, 
                          primary_metric_name='validation_acc', 
                          primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, 
                          max_total_runs=8,
                          max_concurrent_runs=4)

In [76]:
htr = exp.submit(config=htc)

We can use a run history widget to show the progress. Be patient as this might take a while to complete.

In [77]:
RunDetails(htr).show()


In [78]:
htr.wait_for_completion(show_output=True)

RunId: keras-reuters_1548058646680

Execution Summary
RunId: keras-reuters_1548058646680



{'runId': 'keras-reuters_1548058646680',
 'target': 'gpucluster',
 'status': 'Completed',
 'endTimeUtc': '2019-01-21T08:28:39.000Z',
 'properties': {'primary_metric_config': '{"name": "validation_acc", "goal": "maximize"}',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive'},
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://azurenlp7906480779.blob.core.windows.net/azureml/ExperimentRun/dcid.keras-reuters_1548058646680/azureml-logs/hyperdrive.txt?sv=2018-03-28&sr=b&sig=QJ6po04QmWIhMsUaJWpk4TW4IWmtfGzoCm%2BZ9z9N0BA%3D&st=2019-01-21T19%3A47%3A52Z&se=2019-01-22T03%3A57%3A52Z&sp=r'}}

In [82]:
print(htr.get_metrics())

{'keras-reuters_1548058646680_0': {'validation_acc': [0.7964404881729831], 'batch size': [25], 'epochs': [10], 'accuracy': [0.8045414041326392]}, 'keras-reuters_1548058646680_1': {'validation_acc': [0.8109009944159409], 'batch size': [50], 'epochs': [3], 'accuracy': [0.8098842314811869]}, 'keras-reuters_1548058646680_2': {'validation_acc': [0.8109009944159409], 'batch size': [50], 'epochs': [3], 'accuracy': [0.8103294683585502]}, 'keras-reuters_1548058646680_3': {'validation_acc': [0.8186874333955555], 'batch size': [100], 'epochs': [3], 'accuracy': [0.807212823804531]}, 'keras-reuters_1548058646680_4': {'validation_acc': [0.7853170200370178], 'batch size': [50], 'epochs': [1], 'accuracy': [0.794300974264595]}, 'keras-reuters_1548058646680_5': {'validation_acc': [0.802002226550534], 'batch size': [25], 'epochs': [3], 'accuracy': [0.8049866433984344]}, 'keras-reuters_1548058646680_6': {'validation_acc': [0.7997775272081903], 'batch size': [100], 'epochs': [10], 'accuracy': [0.8045414015
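`htr.get_metrics()` returns one dictionary per child run, with each metric stored as a list. Below is a minimal sketch for ranking child runs by their last reported `validation_acc`. The three entries are copied from the output above, and the hypothetical `rank_runs` helper is only a local view for comparing runs side by side; use `get_best_run_by_primary_metric()` to pick the best run.

```python
def rank_runs(metrics, primary="validation_acc"):
    """Hypothetical helper: sort child runs by their last reported primary metric, best first."""
    scored = [(run_id, values[primary][-1])
              for run_id, values in metrics.items() if primary in values]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Three entries copied from the get_metrics() output above.
metrics = {
    "keras-reuters_1548058646680_1": {"validation_acc": [0.8109009944159409], "batch size": [50], "epochs": [3]},
    "keras-reuters_1548058646680_3": {"validation_acc": [0.8186874333955555], "batch size": [100], "epochs": [3]},
    "keras-reuters_1548058646680_4": {"validation_acc": [0.7853170200370178], "batch size": [50], "epochs": [1]},
}

for run_id, score in rank_runs(metrics):
    print(run_id, round(score, 4))
# In the notebook: rank_runs(htr.get_metrics())
```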

## Find and register best model
When all the jobs finish, we can find the run with the best value of the primary metric (`validation_acc`).

In [79]:
best_run = htr.get_best_run_by_primary_metric()

Now let's list the model files uploaded during the run.

In [80]:
print(best_run.get_file_names())

['azureml-logs/60_control_log.txt', 'azureml-logs/80_driver_log.txt', 'outputs/reuters_model.h5', 'azureml-logs/azureml.log', 'azureml-logs/55_batchai_execution.txt']


We can then register the model file `outputs/reuters_model.h5` from the best run as a model named `keras_reuters_best` under the workspace for deployment.

In [81]:
model = best_run.register_model(model_name='keras_reuters_best', model_path='outputs/reuters_model.h5')

Deprecated, use RunHistoryFacade.assets instead.
