# AutoML Run Recovery
In this notebook, we recover into Python the AutoML run that was created in the previous notebook, and use it to find the best child run discovered by the AutoML search.

The steps in this notebook are
- [import libraries](#import),
- [read in the Azure ML workspace](#workspace),
- [recover a run](#recover), and
- [get the results](#results).

## Imports  <a id='import'></a>

In [1]:
import os
import shutil
import json
import time
import pandas as pd
from azureml.core import Workspace, Experiment, Run, get_run
from azureml.widgets import RunDetails
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.run import AutoMLRun
import azureml.core
from msrest.exceptions import HttpOperationError
from get_auth import get_auth
print('azureml.core.VERSION={}'.format(azureml.core.VERSION))

tensorflow module is not present, models based on tensorflow would not work
azureml.core.VERSION=1.0.21


## Read in the Azure ML workspace  <a id='workspace'></a>
Read in the workspace created in a previous notebook.

In [2]:
auth = get_auth()
ws = Workspace.from_config(auth=auth)
ws_details = ws.get_details()
print('Name:\t\t{}\nLocation:\t{}'
      .format(ws_details['name'],
              ws_details['location']))

Trying to create Workspace with CLI Authentication
Found the config file in: /data/home/mabou/Source/Repos/MLHyperparameterTuning/aml_config/config.json
Name:		hypetuning
Location:	eastus


## Recover the run  <a id='recover'></a>
Get the experiment that ran the search.

In [3]:
exp = Experiment(workspace=ws, name='hypetuning')

Get the ID of the AutoML run created in the previous notebook. That ID was printed when the run was submitted, and it was also saved to a file. You can also find the ID on your experiment's page in the Azure Portal; to see it, you may need to add a `RunId` column to the experiment's table of runs.

In [4]:
run_id_path = "run_id.txt"
with open(run_id_path, "r") as fp:
    run_id = fp.read()
run_id

'AutoML_ef96bf9d-54e7-4c0e-b365-b2f507ef80d9'
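
If the file had been written with a trailing newline, the string read back would not match the run ID exactly. The following is a minimal sketch of a more defensive read, assuming the ID is the only content of the file. The temporary file and the ID `AutoML_example-run-id` are hypothetical stand-ins for `run_id.txt` and its contents.

```python
import os
import tempfile

# Hypothetical stand-in for run_id.txt, written with a trailing newline.
with tempfile.TemporaryDirectory() as tmp_dir:
    run_id_path = os.path.join(tmp_dir, "run_id.txt")
    with open(run_id_path, "w") as fp:
        fp.write("AutoML_example-run-id\n")

    # Strip surrounding whitespace so the ID matches exactly.
    with open(run_id_path, "r") as fp:
        run_id = fp.read().strip()

print(repr(run_id))
```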

Use the ID of the AutoML run to get a handle to it.

In [5]:
run = get_run(exp, run_id, rehydrate=True)
run

Experiment,Id,Type,Status,Details Page,Docs Page
hypetuning,AutoML_ef96bf9d-54e7-4c0e-b365-b2f507ef80d9,automl,Completed,Link to Azure Portal,Link to Documentation


## Get the results <a id='results'></a>
Get the metrics logged with each run.

In [14]:
run_metrics = run.get_metrics(recursive=True)

Get a series with each run's accuracy.

In [15]:
run_accuracy = pd.Series([x['accuracy'] for x in run_metrics.values()], index=run_metrics.keys(), name='accuracy')

Find the RunId of the best run.

In [16]:
best_run_id = run_accuracy.idxmax()
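
To illustrate the selection step without a workspace, here is a small sketch on a hand-written dictionary. It assumes the metrics are shaped as a mapping from run ID to a dictionary of metric values; the run IDs and numbers are made up, and the check for a missing `accuracy` key is an added precaution rather than part of the cells above.

```python
import pandas as pd

# Hypothetical metrics, shaped like {run_id: {metric_name: value}}.
run_metrics = {
    "run_0": {"accuracy": 0.71, "AUC": 0.80},
    "run_1": {"accuracy": 0.84, "AUC": 0.88},
    "run_2": {"accuracy": 0.79, "AUC": 0.91},
    "run_3": {"AUC": 0.50},  # no accuracy logged, so it is skipped below
}

# Keep only the runs that logged an accuracy value.
run_accuracy = pd.Series(
    {run_id: metrics["accuracy"]
     for run_id, metrics in run_metrics.items()
     if "accuracy" in metrics},
    name="accuracy",
)

# idxmax returns the index label (here, the run ID) of the largest value.
best_run_id = run_accuracy.idxmax()
print(best_run_id)
```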

Use it to recover the best run.

In [17]:
best_run = get_run(exp, best_run_id, rehydrate=True)
best_run

Experiment,Id,Type,Status,Details Page,Docs Page
hypetuning,AutoML_ef96bf9d-54e7-4c0e-b365-b2f507ef80d9_199,azureml.scriptrun,Completed,Link to Azure Portal,Link to Documentation


List the files stored with the best run.

In [21]:
best_run.get_file_names()

['accuracy_table',
 'automl_driver.py',
 'azureml-logs/55_batchai_execution.txt',
 'azureml-logs/60_control_log.txt',
 'azureml-logs/80_driver_log.txt',
 'azureml-logs/azureml.log',
 'confusion_matrix',
 'outputs/model.pkl']

Download the best run's model file.

In [24]:
best_run_model_path = os.path.join("outputs", best_run_id + ".pkl")
best_run.download_file("outputs/model.pkl", best_run_model_path)

In [26]:
from sklearn.externals import joblib
best_run_model = joblib.load(best_run_model_path)

In [27]:
best_run_model

Pipeline(memory=None,
     steps=[('datatransformer', DataTransformer(is_onnx_compatible=None, logger=None, task=None)), ('prefittedsoftvotingclassifier', PreFittedSoftVotingClassifier(classification_labels=None,
               estimators=[('LogisticRegression_35', Pipeline(memory=None,
     steps=[('standardscalerwrapper', <...666666666667, 0.26666666666666666, 0.06666666666666667, 0.06666666666666667, 0.13333333333333333]))])

Read in the test data.

In [33]:
data_path = "data"
test_path = os.path.join(data_path, "balanced_pairs_test.tsv")
test = pd.read_csv(test_path, sep='\t', encoding='latin1')

In [34]:
feature_columns = ["Text_x", "Text_y"]
label_column = "Label"
group_column = 'Id_x'
answerid_column = 'AnswerId_y'
test_X = (test.Text_x + ' ' + test.Text_y).values  # test[feature_columns]
test_y = test[label_column]

In [35]:
test['probabilities'] = best_run_model.predict_proba(test_X)[:, 1]

In [36]:
# Order the testing data by dupe Id and question AnswerId.
test.sort_values([group_column, answerid_column], inplace=True)

# Extract the ordered probabilities.
probabilities = (
    test.probabilities
    .groupby(test[group_column], sort=False)
    .apply(lambda x: tuple(x.values)))

# Get the individual records.
output_columns_x = ['Id_x', 'AnswerId_x', 'Text_x']
test_score = (test[output_columns_x]
              .drop_duplicates()
              .set_index(group_column))
test_score['probabilities'] = probabilities
test_score.reset_index(inplace=True)
test_score.columns = ['Id', 'AnswerId', 'Text', 'probabilities']
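
The sort-then-group step can be seen on a toy table. This sketch reuses the column names from the cell above, but the rows are hypothetical: two questions with two candidate answers each.

```python
import pandas as pd

# Hypothetical scored pairs: two questions (Id_x), two candidates each.
test = pd.DataFrame({
    "Id_x": [2, 1, 1, 2],
    "AnswerId_y": [20, 11, 10, 21],
    "probabilities": [0.3, 0.9, 0.1, 0.6],
})

# Sort so each group's probabilities follow AnswerId_y order.
test = test.sort_values(["Id_x", "AnswerId_y"])

# Collect one tuple of probabilities per question.
probabilities = (
    test.probabilities
    .groupby(test["Id_x"], sort=False)
    .apply(lambda x: tuple(x.values)))

print(probabilities)
```

Because the rows are sorted before grouping, the position of each probability within a tuple corresponds to the position of its `AnswerId_y` in sorted order.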

In [37]:
import numpy as np

def score_rank(scores):
    """Compute the ranks of the scores."""
    return pd.Series(scores).rank(ascending=False)


def label_index(label, label_order):
    """Compute the index of label in label_order."""
    loc = np.where(label == label_order)[0]
    if loc.shape[0] == 0:
        return None
    return loc[0]


def label_rank(label, scores, label_order):
    """Compute the rank of label using the scores."""
    loc = label_index(label, label_order)
    if loc is None:
        return len(scores) + 1
    return score_rank(scores)[loc]
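
As a quick check of how these helpers behave, the sketch below repeats their definitions and applies them to hypothetical data: three candidate answers with made-up scores. It uses a scalar label for simplicity, which is an assumption of this example.

```python
import numpy as np
import pandas as pd


def score_rank(scores):
    """Rank the scores, with rank 1 for the highest score."""
    return pd.Series(scores).rank(ascending=False)


def label_index(label, label_order):
    """Return the position of label in label_order, or None if absent."""
    loc = np.where(label == label_order)[0]
    if loc.shape[0] == 0:
        return None
    return loc[0]


def label_rank(label, scores, label_order):
    """Return the rank of label's score, or len(scores) + 1 if absent."""
    loc = label_index(label, label_order)
    if loc is None:
        return len(scores) + 1
    return score_rank(scores)[loc]


# Hypothetical candidates and their scores.
label_order = np.array([101, 102, 103])
scores = np.array([0.1, 0.7, 0.2])

found_rank = label_rank(102, scores, label_order)    # highest score
third_rank = label_rank(101, scores, label_order)    # lowest score
missing_rank = label_rank(999, scores, label_order)  # not a candidate
print(found_rank, third_rank, missing_rank)
```

A label that is missing from the candidates is ranked one place below the last candidate, so it counts against the mean rank without being dropped.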

In [39]:
print("Evaluating the model's performance.")

test_rank = test.groupby(group_column).apply(
    lambda x: label_rank(x.AnswerId_x.values,
                         x.probabilities.values,
                         x.AnswerId_y.values))

args_rank = 3
for i in range(1, args_rank+1):
    print('Accuracy @{} = {:.2%}'
          .format(i, (test_rank <= i).mean()))
mean_rank = test_rank.mean()
print('Mean Rank {:.4f}'.format(mean_rank))

Evaluating the model's performance.
Accuracy @1 = 18.66%
Accuracy @2 = 29.80%
Accuracy @3 = 34.50%
Mean Rank 19.2174
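
Accuracy @k here is the fraction of questions whose correct answer is ranked k or better. A minimal sketch of the same computation on hypothetical ranks (the five values below are made up, not taken from the test set):

```python
import pandas as pd

# Hypothetical ranks of the correct answer for five questions.
test_rank = pd.Series([1, 3, 2, 5, 1])

# Fraction of questions whose correct answer is ranked k or better.
accuracy_at = {k: (test_rank <= k).mean() for k in range(1, 4)}
mean_rank = test_rank.mean()

for k, acc in accuracy_at.items():
    print('Accuracy @{} = {:.2%}'.format(k, acc))
print('Mean Rank {:.4f}'.format(mean_rank))
```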


In [7]:
help(run)

Help on Run in module azureml.core.run object:

class Run(azureml._run_impl.run_base._RunBase)
 |  The base class for all experiment runs.
 |  
 |  A *run* represents a single trial of an experiment, and allows monitoring and logging.
 |  
 |  .. remarks::
 |  
 |      A *run* represents a single trial of an experiment.  A run is the object used to monitor the
 |      asynchronous execution of a trial, log metrics and store output of the trial,
 |      and to analyze results and access artifacts generated by the trial.
 |  
 |      Run is used inside of your experimentation code to log metrics and artifacts to the Run History service.
 |  
 |      Run is used  outside of your experiments monitor progress and to query and analyze
 |      the metrics and results that were generated.
 |  
 |      Functionality includes:
 |  
 |      *  Storing and retrieving metrics and data
 |      *  Uploading and downloading files
 |      *  Using tags as well as the child hierarchy for easy lookup of 

In [10]:
automl_run = AutoMLRun(exp, run_id)

In [13]:
help(automl_run)

Help on AutoMLRun in module azureml.train.automl.run object:

class AutoMLRun(azureml.core.run.Run)
 |  AutoMLRun has information of the experiment runs that correspond to the AutoML run.
 |  
 |  This class can be used to manage, check status, and retrieve run details
 |  once a AutoML run is submitted.
 |  
 |  :param experiment: The experiment associated to the run.
 |  :type experiement: azureml.core.Experiment
 |  :param run_id: The id associated to the run.
 |  :type run_id: str
 |  
 |  Method resolution order:
 |      AutoMLRun
 |      azureml.core.run.Run
 |      azureml._run_impl.run_base._RunBase
 |      azureml._logging.chained_identity.ChainedIdentity
 |      azureml.core._portal.HasRunPortal
 |      azureml.core._portal.HasExperimentPortal
 |      azureml.core._portal.HasWorkspacePortal
 |      azureml.core._portal.HasPortal
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, experiment, run_id, **kwargs)
 |      Initialize an AutoML run.
 |    

In [14]:
automl_output = automl_run.get_output()

In [16]:
automl_output[0]

Experiment,Id,Type,Status,Details Page,Docs Page
hypetuning,AutoML_ef96bf9d-54e7-4c0e-b365-b2f507ef80d9_199,azureml.scriptrun,Completed,Link to Azure Portal,Link to Documentation


In [17]:
automl_output[1]

Pipeline(memory=None,
     steps=[('datatransformer', DataTransformer(is_onnx_compatible=None, logger=None, task=None)), ('prefittedsoftvotingclassifier', PreFittedSoftVotingClassifier(classification_labels=None,
               estimators=[('LogisticRegression_35', Pipeline(memory=None,
     steps=[('standardscalerwrapper', <...666666666667, 0.26666666666666666, 0.06666666666666667, 0.06666666666666667, 0.13333333333333333]))])

In [18]:
dir(automl_run)

['DELIM',
 'EXPERIMENT_PATH',
 'PORTAL_URL',
 'RUN_PATH',
 'WORKSPACE_FMT',
 '_RUNSOURCE_PROPERTY',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_call_register',
 '_cleanup',
 '_client',
 '_container',
 '_context_manager',
 '_create',
 '_data_container_id',
 '_download_artifact_contents_to_string',
 '_dto_to_run',
 '_experiment',
 '_experiment_url',
 '_get_base_info_dict',
 '_heartbeat',
 '_identity',
 '_internal_run_dto',
 '_jasmine_client',
 '_kill',
 '_latest_status',
 '_load_scope',
 '_log_context',
 '_log_traceback',
 '_logger',
 '_outputs',
 '_register_kill_handler',
 '_registered_kill_handlers',
 '_rehydrate_runs',
 '_repr_html_',


In [19]:
automl_run.get_run_sdk_dependencies()

Iteration number is not passed, retrieve the environment for the parent run.
No issues found in the SDK package versions.


{'azureml-widgets': '1.0.21',
 'azureml-train': '1.0.21',
 'azureml-train-restclients-hyperdrive': '1.0.21',
 'azureml-train-core': '1.0.21',
 'azureml-train-automl': '1.0.21',
 'azureml-telemetry': '1.0.21',
 'azureml-sdk': '1.0.21',
 'azureml-pipeline': '1.0.21',
 'azureml-pipeline-steps': '1.0.21',
 'azureml-pipeline-core': '1.0.21.1',
 'azureml-dataprep': '1.0.17',
 'azureml-dataprep-native': '11.2.3',
 'azureml-core': '1.0.21',
 'azureml-contrib-notebook': '1.0.21.1'}

In [20]:
automl_run.get_properties()

{'num_iterations': '200',
 'training_type': 'TrainFull',
 'acquisition_function': 'EI',
 'primary_metric': 'accuracy',
 'train_split': '0',
 'MaxTimeSeconds': '0',
 'acquisition_parameter': '0',
 'num_cross_validation': None,
 'target': 'hypetuning',
 'DataPrepJsonString': None,
 'EnableSubsampling': 'False',
 'runTemplate': 'AutoML',
 'azureml.runsource': 'automl',
 'dependencies_versions': '{"azureml-widgets": "1.0.21", "azureml-train": "1.0.21", "azureml-train-restclients-hyperdrive": "1.0.21", "azureml-train-core": "1.0.21", "azureml-train-automl": "1.0.21", "azureml-telemetry": "1.0.21", "azureml-sdk": "1.0.21", "azureml-pipeline": "1.0.21", "azureml-pipeline-steps": "1.0.21", "azureml-pipeline-core": "1.0.21.1", "azureml-dataprep": "1.0.17", "azureml-dataprep-native": "11.2.3", "azureml-core": "1.0.21", "azureml-contrib-notebook": "1.0.21.1"}',
 'ContentSnapshotId': '99443c7d-67cc-4f1f-881c-38589a823756',
 'snapshotId': '99443c7d-67cc-4f1f-881c-38589a823756',
 'SetupRunId': 'Auto