Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# GenSen Deep Dive on AzureML
**Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning** [\[1\]](#References)

## What is sentence similarity?

Sentence similarity, or semantic textual similarity, deals with determining how similar two pieces of text are. This can take the form of assigning a score from 1 to 5. Related tasks are paraphrase or duplicate identification.

## How to evaluate?

[SentEval](https://arxiv.org/abs/1803.05449) [\[2\]](#References) is an evaluation toolkit for evaluating sentence representations. It includes 17 downstream tasks, including common semantic textual similarity tasks. The semantic textual similarity (**STS**) benchmark tasks from 2012-2016 (STS12, STS13, STS14, STS15, STS16, STSB) measure the relatedness of two sentences based on the cosine similarity of the two representations. The evaluation criterion is Pearson correlation.
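
As a rough illustration of the STS evaluation described above, the following sketch scores sentence pairs by the cosine similarity of their vectors and then compares those scores with gold ratings using Pearson correlation. The three-dimensional vectors and the gold scores are made-up toy values, and the helper names are ours; this is not SentEval's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy "sentence embeddings" for three sentence pairs (illustrative values only).
pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.1, 0.0]),
    ([1.0, 1.0, 0.0], [1.0, 0.0, 1.0]),
    ([0.0, 1.0, 0.0], [1.0, 0.0, 0.0]),
]
gold_scores = [4.8, 3.0, 1.0]  # hypothetical human similarity ratings

predicted = [cosine_similarity(u, v) for u, v in pairs]
sts_pearson = pearson_correlation(predicted, gold_scores)
print("Pearson correlation: {:.4f}".format(sts_pearson))
```

In a real evaluation the vectors would come from the sentence encoder and the gold scores from the STS dataset.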

The SICK relatedness (**SICK-R**) task trains a linear model to output a score from 1 to 5 indicating the relatedness of two sentences. The same dataset can also be treated as a three-class classification problem (**SICK-E**) using the entailment labels (classes are ‘entailment’, ‘contradiction’, and ‘neutral’). The evaluation metric is Pearson correlation for SICK-R and classification accuracy for SICK-E.

The Microsoft Research Paraphrase Corpus [(**MRPC**)](https://www.microsoft.com/en-us/download/details.aspx?id=52398) is a paraphrase identification dataset, where systems aim to identify whether two sentences are paraphrases of each other. The evaluation metrics are classification accuracy and F1.
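
The two paraphrase-identification metrics mentioned above can be sketched in a few lines. The labels below are invented for illustration (1 = paraphrase, 0 = not a paraphrase), and `accuracy_and_f1` is a helper name of our own.

```python
def accuracy_and_f1(gold, predicted):
    """Return (accuracy, F1 of the positive class) for binary labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)

    accuracy = correct / len(gold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Hypothetical gold labels and model predictions for six sentence pairs.
gold_labels = [1, 1, 0, 1, 0, 1]
model_predictions = [1, 0, 0, 1, 1, 1]

mrpc_accuracy, mrpc_f1 = accuracy_and_f1(gold_labels, model_predictions)
print("accuracy={:.3f} f1={:.3f}".format(mrpc_accuracy, mrpc_f1))
```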

## What is GenSen?

GenSen is a technique to learn general purpose, fixed-length representations of sentences via multi-task training. The GenSen model combines the benefits of diverse sentence-representation learning objectives into a single multi-task framework. According to its authors [\[1\]](#References), it is the first large-scale reusable sentence representation model obtained by combining a set of training objectives with this level of diversity, i.e. multi-lingual NMT, natural language inference, constituency parsing, and skip-thought vectors. These representations are useful for transfer and low-resource learning. GenSen is trained on several data sources with multiple training objectives on over 100 million sentences.

The GenSen model is most similar to that of Luong et al. (2015) [\[4\]](#References), who train a many-to-many **sequence-to-sequence** model on a diverse set of weakly related tasks that includes machine translation, constituency parsing, image captioning, sequence autoencoding, and intra-sentence skip-thoughts. However, there are two key differences. First, Luong et al. use an attention mechanism, which prevents learning a fixed-length vector representation for a sentence; GenSen does not. Second, GenSen aims to learn reusable sentence representations that transfer to other tasks, whereas Luong et al. aim for improvements on the same tasks on which the model is trained.

### Sequence to Sequence Learning

![Sequence to sequence learning examples - (left) machine translation and (right) constituent parsing](https://nlpbp.blob.core.windows.net/images/seq2seq.png)**Sequence to sequence learning examples - (left) machine translation and (right) constituent parsing**

Sequence to sequence learning (*seq2seq*) aims to directly model the conditional probability $p(y|x)$ of mapping an input sequence, $x_1,...,x_n$, into an output sequence, $y_1,...,y_m$. It accomplishes this goal through the *encoder-decoder* framework. As illustrated in the above figure, the *encoder* computes a representation $s$ for each input sequence. Based on that input representation, the *decoder* generates an output sequence, one unit at a time, and hence decomposes the conditional probability as:

$$
\log p(y|x)=\sum_{j=1}^{m} \log p(y_j|y_{<j}, x, s)
$$
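
To make the decomposition concrete, here is a minimal numeric sketch. It assumes a decoder has already produced the per-step conditional probabilities of the chosen output units; the values are made up and `sequence_log_prob` is our own helper name.

```python
import math


def sequence_log_prob(step_probs):
    """Sum the per-step log-probabilities of the generated output units."""
    return sum(math.log(p) for p in step_probs)


# Hypothetical decoder probabilities for an output sequence of length m = 3,
# one conditional probability per generated unit.
step_probs = [0.5, 0.8, 0.9]

log_p = sequence_log_prob(step_probs)
joint = math.exp(log_p)  # product of the per-step probabilities
print("log p(y|x) = {:.4f}, p(y|x) = {:.4f}".format(log_p, joint))
```

Summing log-probabilities instead of multiplying probabilities avoids numerical underflow for long output sequences.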

## Why GenSen?

The GenSen model achieves strong results on multiple sentence similarity datasets, such as MRPC, SICK-R, SICK-E and STS. Among the models listed below it has the best reported SICK-R, SICK-E and STS scores and is competitive on MRPC. The reported results, compared with other models, are as follows [\[3\]](#References):

| Model | MRPC | SICK-R | SICK-E | STS |
| --- | --- | --- | --- | --- |
| GenSen (Subramanian et al., 2018) | 78.6/84.4 | 0.888 | 87.8 | 78.9/78.6 |
| [InferSent](https://arxiv.org/abs/1705.02364) (Conneau et al., 2017) | 76.2/83.1 | 0.884 | 86.3 | 75.8/75.5 |
| [TF-KLD](https://www.aclweb.org/anthology/D13-1090) (Ji and Eisenstein, 2013) | 80.4/85.9 | - | - | - |

This notebook introduces an end-to-end NLP solution for sentence similarity: building GenSen, one of the more advanced models, on the AzureML platform. We show the advantages of AzureML when training large NLP models on GPUs.

Regarding **AzureML**, please refer to:
* [Quickstart notebook](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python)
* [Hyperdrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters)

# Table of Contents
0. [Global Settings](#0-Global-Settings)
1. [Data Loading and Preprocessing](#1-Data-Loading-and-Preprocessing)    
    * 1.1. [Load SNLI Dataset](#1.1-Load-SNLI-Dataset)  
    * 1.2. [Tokenize](#1.2-Tokenize)  
    * 1.3. [Preprocess for GenSen Model](#1.3-Preprocess-for-GenSen-Model)  
    * 1.4. [Upload to Azure Blob Storage](#1.4-Upload-to-Azure-Blob-Storage)  
2. [Train GenSen Model with Distributed Pytorch with Horovod on AzureML](#2-Train-GenSen-Model-with-Distributed-Pytorch-with-Horovod-on-AzureML)  
    * 2.1. [Initialization](#2.1-Initialization) 
        * 2.1.1 [Initialize Workspace](#2.1.1-Initialize-Workspace)  
        * 2.1.2 [Create or Attach Existing AmlCompute](#2.1.2-Create-or-Attach-Existing-AmlCompute)  
    * 2.2. [Settings for GenSen](#2.2-Settings-for-GenSen)  
        * 2.2.1 [Access to a Project Directory](#2.2.1-Access-to-a-Project-Directory)  
        * 2.2.2 [Access to Datastore](#2.2.2-Access-to-Datastore)  
    * 2.3. [Train Model on the Remote Compute](#2.3-Train-Model-on-the-Remote-Compute)  
        * 2.3.1 [Prepare Training Script](#2.3.1-Prepare-Training-Script)  
        * 2.3.2 [Create an Experiment](#2.3.2-Create-an-Experiment)
        * 2.3.3 [Create a PyTorch Estimator](#2.3.3-Create-a-PyTorch-Estimator)
        * 2.3.4 [Submit or Cancel a job](#2.3.4-Submit-or-Cancel-a-job)
        * 2.3.5 [Monitor your run](#2.3.5-Monitor-your-run)
3. [Tune Model Hyperparameters](#3-Tune-Model-Hyperparameters)
    * 3.1 [Start a Hyperparameter Sweep](#3.1-Start-a-Hyperparameter-Sweep)
    * 3.2 [Monitor HyperDrive runs](#3.2-Monitor-HyperDrive-runs)
- [References](#References)

# 0 Global Settings

* Go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`

In [18]:
# set the environment path to find NLP
import sys
sys.path.append("../../")
import time
import os
# import papermill as pm
import pandas as pd
import shutil

import azureml as aml
import azureml.train.hyperdrive as hd

from azureml.telemetry import set_diagnostics_collection
from utils_nlp.dataset.preprocess import to_lowercase, to_nltk_tokens
from utils_nlp.dataset import snli
from utils_nlp.azureml import azureml_utils
from utils_nlp.model.gensen.gensen_utils import gensen_preprocess

print("System version: {}".format(sys.version))
print("Azure ML SDK Version:", aml.core.VERSION)
print("Pandas version: {}".format(pd.__version__))

System version: 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
Azure ML SDK Version: 1.0.33
Pandas version: 0.24.2


Opt-in diagnostics for better experience, quality, and security of future releases.

In [3]:
set_diagnostics_collection(send_diagnostics=True)

Turning diagnostics collection on. 


# 1 Data Loading and Preprocessing

In this section, we will
1. Download and load the dataset.
2. Tokenize and reshape the dataset for GenSen.
3. Upload the training set to the default blob storage of the workspace.

We use the [SNLI](https://nlp.stanford.edu/projects/snli/) dataset in this example. For a more detailed walkthrough of the data processing, see [SNLI Data Prep](../data-prep/snli.ipynb).

**Set the data folder path.**

In [42]:
BASE_DATA_PATH = '../../data'

## 1.1 Load SNLI Dataset
We provide a function `load_pandas_df` which
* Downloads the SNLI zipfile at the specified directory location
* Extracts the file based on the specified split
* Loads the split as a pandas dataframe

In [7]:
# load the training split as a pandas dataframe
train = snli.load_pandas_df(BASE_DATA_PATH, file_split="train")

# load the development split
dev = snli.load_pandas_df(BASE_DATA_PATH, file_split="dev")

# load the test split
test = snli.load_pandas_df(BASE_DATA_PATH, file_split="test")

train.head()

| | gold_label | sentence1_binary_parse | sentence2_binary_parse | sentence1_parse | sentence2_parse | sentence1 | sentence2 | captionID | pairID | label1 | label2 | label3 | label4 | label5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | neutral | ( ( ( A person ) ( on ( a horse ) ) ) ( ( jump... | ( ( A person ) ( ( is ( ( training ( his horse... | (ROOT (S (NP (NP (DT A) (NN person)) (PP (IN o... | (ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) ... | A person on a horse jumps over a broken down a... | A person is training his horse for a competition. | 3416050480.jpg#4 | 3416050480.jpg#4r1n | neutral | | | | |
| 1 | contradiction | ( ( ( A person ) ( on ( a horse ) ) ) ( ( jump... | ( ( A person ) ( ( ( ( is ( at ( a diner ) ) )... | (ROOT (S (NP (NP (DT A) (NN person)) (PP (IN o... | (ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) ... | A person on a horse jumps over a broken down a... | A person is at a diner, ordering an omelette. | 3416050480.jpg#4 | 3416050480.jpg#4r1c | contradiction | | | | |
| 2 | entailment | ( ( ( A person ) ( on ( a horse ) ) ) ( ( jump... | ( ( A person ) ( ( ( ( is outdoors ) , ) ( on ... | (ROOT (S (NP (NP (DT A) (NN person)) (PP (IN o... | (ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) ... | A person on a horse jumps over a broken down a... | A person is outdoors, on a horse. | 3416050480.jpg#4 | 3416050480.jpg#4r1e | entailment | | | | |
| 3 | neutral | ( Children ( ( ( smiling and ) waving ) ( at c... | ( They ( are ( smiling ( at ( their parents ) ... | (ROOT (NP (S (NP (NNP Children)) (VP (VBG smil... | (ROOT (S (NP (PRP They)) (VP (VBP are) (VP (VB... | Children smiling and waving at camera | They are smiling at their parents | 2267923837.jpg#2 | 2267923837.jpg#2r1n | neutral | | | | |
| 4 | entailment | ( Children ( ( ( smiling and ) waving ) ( at c... | ( There ( ( are children ) present ) ) | (ROOT (NP (S (NP (NNP Children)) (VP (VBG smil... | (ROOT (S (NP (EX There)) (VP (VBP are) (NP (NN... | Children smiling and waving at camera | There are children present | 2267923837.jpg#2 | 2267923837.jpg#2r1e | entailment | | | | |


## 1.2 Tokenize
Now that we've loaded the data into a pandas.DataFrame, we can tokenize the sentences.
We also clean the data before tokenizing. This includes dropping unnecessary columns and renaming the relevant columns as score, sentence_1, and sentence_2.

In [8]:
def clean(df, file_split):
    src_file_path = os.path.join(BASE_DATA_PATH, "raw/snli_1.0/snli_1.0_{}.txt".format(file_split))
    if not os.path.exists(os.path.join(BASE_DATA_PATH, "clean/snli_1.0")):
        os.makedirs(os.path.join(BASE_DATA_PATH, "clean/snli_1.0"))
    dest_file_path = os.path.join(BASE_DATA_PATH, "clean/snli_1.0/snli_1.0_{}.txt".format(file_split))
    clean_df = snli.clean_snli(src_file_path).dropna() # drop rows with any NaN vals
    clean_df.to_csv(dest_file_path)
    return clean_df

train = clean(train, 'train')
dev = clean(dev, 'dev')
test = clean(test, 'test')

train.head()

Once we have the clean pandas dataframes, we do lowercase standardization and tokenization. We use the [NLTK](https://www.nltk.org/) library for tokenization.

In [5]:
train_tok = to_nltk_tokens(to_lowercase(train))
dev_tok = to_nltk_tokens(to_lowercase(dev))
test_tok = to_nltk_tokens(to_lowercase(test))

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\lishao\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\lishao\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\lishao\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
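
To give a feel for what lowercasing and tokenization do to a sentence, here is a dependency-free sketch. It uses a simple regular expression as a stand-in for the NLTK tokenizer, so its output will not always match what `to_nltk_tokens` produces; `simple_tokenize` is a hypothetical helper, not part of `utils_nlp`.

```python
import re


def simple_tokenize(sentence):
    """Lowercase a sentence and split it into word and punctuation tokens.

    This regex is only an approximation of a real tokenizer such as NLTK's.
    """
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


example_tokens = simple_tokenize("A person is at a diner, ordering an omelette.")
print(example_tokens)
```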


## 1.3 Preprocess for GenSen Model
We need to prepare our data in a specific way in order for the GenSen model to be able to ingest it. We do this by
* Saving the tokens for each split in a `snli_1.0_{split}.txt.clean` file, with the sentence pairs and scores tab-separated and the tokens separated by a single space. Since some of the samples have invalid scores ("-"), we filter those out and save them separately in a `snli_1.0_{split}.txt.clean.noblank` file.
* Saving the tokenized sentence and labels separately, in the form `snli_1.0_{split}.txt.s1.tok` or `snli_1.0_{split}.txt.s2.tok` or `snli_1.0_{split}.txt.lab`.

In [6]:
gensen_preprocess(train_tok, dev_tok, test_tok, os.path.abspath(BASE_DATA_PATH))

C:\Users\lishao\Project\Rotation2\NLP\data\clean/snli_1.0/snli_1.0_train.txt
C:\Users\lishao\Project\Rotation2\NLP\data\clean/snli_1.0/snli_1.0_dev.txt
C:\Users\lishao\Project\Rotation2\NLP\data\clean/snli_1.0/snli_1.0_test.txt


'C:\\Users\\lishao\\Project\\Rotation2\\NLP\\data\\clean/snli_1.0'
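
The file layout described in this section can be sketched as follows. This is not the code of `gensen_preprocess`. It assumes that each line holds the two space-joined token sequences followed by the label, separated by tabs, and that the `.noblank` file keeps only the rows whose label is not `"-"`. The column order and the helper name `format_gensen_rows` are assumptions made for illustration.

```python
def format_gensen_rows(rows):
    """Build tab-separated lines from (tokens_1, tokens_2, label) rows.

    Returns two lists: every line, and only the lines with a valid label.
    """
    all_lines, noblank_lines = [], []
    for tokens_1, tokens_2, label in rows:
        line = "\t".join([" ".join(tokens_1), " ".join(tokens_2), label])
        all_lines.append(line)
        if label != "-":  # "-" marks a sample without a valid label
            noblank_lines.append(line)
    return all_lines, noblank_lines


# Two toy rows; the second has an invalid label and is filtered out.
toy_rows = [
    (["a", "man", "sleeps"], ["a", "person", "rests"], "entailment"),
    (["a", "dog", "runs"], ["a", "cat", "sits"], "-"),
]
all_lines, noblank_lines = format_gensen_rows(toy_rows)
print(noblank_lines)
```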

## 1.4 Upload to Azure Blob Storage
We make the data accessible remotely by uploading that data from your local machine into Azure. Then it can be accessed for remote training. The datastore is a convenient construct associated with your workspace for you to upload or download data. You can also interact with it from your remote compute targets. It's backed by an Azure Blob storage account.

**Note: If you already have all the files under `clean/snli_1.0/` in your default datastore, you DO NOT need to redo this section.**

In [47]:
data_folder = os.path.join(BASE_DATA_PATH, "clean/snli_1.0/")
# `ws` is the Workspace object initialized in section 2.1.1; run that cell first.
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name, data_folder)

ds.upload(src_dir=data_folder, target_path='data/preprocessed', overwrite=True, show_progress=True)

# 2 Train GenSen Model with Distributed Pytorch with Horovod on AzureML
In this tutorial, you will train a GenSen model with PyTorch on AzureML using distributed training across a GPU cluster. The same steps can serve as a general guideline for training other models on a GPU cluster.

Once you've created your workspace and set up your development environment, training a model in Azure Machine Learning involves the following steps:
1. Create a remote compute target (note that you can also use your local computer as the compute target)
2. Prepare your training data and upload it to datastore
3. Create your training script
4. Create an Estimator object
5. Submit the estimator to an experiment object under the workspace

## 2.1 Initialization
In this section, we will initialize the workspace and create an AmlCompute cluster for training.

### 2.1.1 Initialize Workspace

Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created as a prerequisite. For instructions on how to do this, see [here](README.md). The cell below uses the `azureml_utils.get_or_create_workspace` helper with your subscription details. Alternatively, `Workspace.from_config()` creates a workspace object from the details stored in `config.json`.

In [19]:
ws = azureml_utils.get_or_create_workspace(
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
    workspace_region="<WORKSPACE_REGION>"
)
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')

Performing interactive authentication. Please follow the instructions on the terminal.




Interactive authentication successfully completed.
Workspace name: MAIDAPTest
Azure region: eastus2
Subscription id: 15ae9cb6-95c1-483d-a0e3-b1a1a3b06324
Resource group: nlprg


### 2.1.2 Create or Attach Existing AmlCompute
You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the code below creates a `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.

**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.

As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

**Use Standard_NC6 for now.**

In [2]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# choose a name for your cluster
cluster_name = "gpucluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# use get_status() to get a detailed status for the current AmlCompute. 
print(compute_target.get_status().serialize())

Found existing compute target.
{'currentNodeCount': 4, 'targetNodeCount': 4, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 4, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-05-31T21:24:32.828000+00:00', 'errors': None, 'creationTime': '2019-05-20T22:09:40.142683+00:00', 'modifiedTime': '2019-05-20T22:10:11.888950+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}


## 2.2 Settings for GenSen
In this section, we set the GenSen code folder and data folder for training.

### 2.2.1 Access to a Project Directory
Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on.

`project_folder` contains all the code you want to submit to AmlCompute to run. The size of the folder cannot exceed 300 MB. Because the GenSen model loads large pre-trained embedding files, we save those large files in the datastore and upload only code to `project_folder`.

In [3]:
import os

# Change the path to where your model code is located.

project_folder = '../../utils_nlp/model/gensen/'
os.makedirs(project_folder, exist_ok=True)
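
Before submitting, it can be useful to check that the project folder stays under the size limit mentioned above. The sketch below walks a directory and sums the file sizes. It treats the limit as 300 × 1024 × 1024 bytes, which is our assumption about how the limit is counted, and the helper names are hypothetical.

```python
import os

# Assumed interpretation of the 300 MB limit, in bytes.
MAX_PROJECT_BYTES = 300 * 1024 * 1024


def directory_size_bytes(path):
    """Return the total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):  # skip symbolic links
                total += os.path.getsize(file_path)
    return total


def fits_project_limit(path, limit=MAX_PROJECT_BYTES):
    """True if the directory is small enough to be submitted as a project folder."""
    return directory_size_bytes(path) <= limit
```

For example, `fits_project_limit(project_folder)` returning `False` is a hint that large files should be moved to the datastore.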

### 2.2.2 Access to Datastore
To download some of the data required to train a GenSen model, run the bash file [here](https://github.com/Maluuba/gensen/blob/master/get_data.sh). Make sure to upload all the large files to the Azure file share. You can access the datastore by using `ds.as_mount()`.

**Note: To download data required to train a GenSen model in the original paper, run code [here](https://github.com/Maluuba/gensen/blob/master/get_data.sh). By training on the original datasets (training time around 20 hours), it will reproduce the results in the [paper](https://arxiv.org/abs/1804.00079). For simplicity, we will train on a smaller dataset, which is SNLI preprocessed in [1 Data Loading and Preprocessing](#1-Data-Loading-and-Preprocessing) for showcasing the example.**

In [4]:
from azureml.core import Datastore
ds = Datastore.register_azure_file_share(workspace=ws,
                                        datastore_name= 'GenSen',
                                        file_share_name='azureml-filestore-792de9d4-7d0a-464c-b40a-58584f23f5ec',
                                        account_name='maidaptest3334372853',
                                        # Replace the placeholder with your own storage account key; do not commit real keys.
                                        account_key='<ACCOUNT_KEY>')
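
Storage account keys are secrets, so a common practice is to keep them out of the notebook and read them from the environment instead. The sketch below shows one way to do that. The environment variable name `GENSEN_STORAGE_KEY` and the helper `read_storage_key` are hypothetical names chosen for this example.

```python
import os


def read_storage_key(env_var="GENSEN_STORAGE_KEY"):
    """Read the storage account key from an environment variable.

    Raises a RuntimeError with a clear message if the variable is not set.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            "Set the {} environment variable before registering the datastore".format(env_var)
        )
    return key
```

The returned value can then be passed as `account_key=read_storage_key()` when registering the datastore.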

**Prerequisites:**

Upload all the files under `data_folder` in [1.4 Upload to Azure Blob Storage](#1.4-Upload-to-Azure-Blob-Storage) to the path `./data/processed/` on the above datastore.

In [5]:
ds.as_mount()

$AZUREML_DATAREFERENCE_gensen

## 2.3 Train Model on the Remote Compute
Now that we have the AmlCompute ready to go, let's run our distributed training job.

### 2.3.1 Prepare Training Script
Now you will need to create your training script. In this tutorial, the script for distributed training of GenSen is already provided for you at `train.py`. In practice, you should be able to take any custom PyTorch training script as is and run it with Azure ML without having to modify your code.

However, if you would like to use Azure ML's [metric logging](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#logging) capabilities, you will have to add a small amount of Azure ML logic inside your training script. In this example, at each logging interval, we will log the loss for that minibatch to our Azure ML run.

To do so, in `train.py`, we will first access the Azure ML `Run` object within the script:
```Python
from azureml.core.run import Run
run = Run.get_context()
```
Later within the script, we log the loss metric to our run:
```Python
run.log('loss', loss.item())
```
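
The "log at each logging interval" pattern can be sketched without the SDK as follows. Here `log_fn` stands in for `run.log`, and the loss values and the `log_interval` parameter are illustrative assumptions rather than the actual contents of `train.py`.

```python
def train_with_logging(minibatch_losses, log_fn, log_interval=2):
    """Report the minibatch loss through `log_fn` every `log_interval` steps."""
    for step, loss in enumerate(minibatch_losses, start=1):
        if step % log_interval == 0:
            log_fn('loss', loss)


# Stand-in for run.log: collect the logged values in a list.
logged = []
train_with_logging([3.4, 3.2, 3.1, 2.9, 2.8],
                   lambda name, value: logged.append((name, value)))
print(logged)
```

Inside a training script, passing `run.log` as `log_fn` would send the same values to the Azure ML run.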

### 2.3.2 Create an Experiment
Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed PyTorch tutorial.

In [5]:
from azureml.core import Experiment, get_run

experiment_name = 'pytorch-gensen'
experiment = Experiment(ws, name=experiment_name)


### 2.3.3 Create a PyTorch Estimator
The Azure ML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch).

`sample_config.json` defines all the hyperparameters and paths used when training the GenSen model. The trained model will be saved under `data/models/example` in Azure Blob Storage.

In [6]:
from azureml.train.dnn import PyTorch
from azureml.train.estimator import Estimator

script_params = {
    '--config': 'sample_config.json',
    '--data_folder': ds.as_mount()}

estimator = PyTorch(source_directory=project_folder,
                    script_params=script_params,
                    compute_target=compute_target,
                    entry_script='train.py',
                    node_count=4,
                    process_count_per_node=1,
                    distributed_backend='mpi',
                    use_gpu=True,
                    conda_packages=['scikit-learn=0.20.3']
                   )


The above code specifies that we will run our training script on `4` nodes, with one worker per node. In order to execute a distributed run using GPU, you must provide the argument `use_gpu=True`. To execute a distributed run using MPI/Horovod, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, PyTorch, Horovod and their dependencies will be installed for you. The first time you run an experiment, it may take longer because the conda environment defined in `.azureml/conda_dependencies.yml` has to be set up. Later runs reuse the existing conda environment and run the code directly. However, if your script also uses other packages, make sure to install them via the `PyTorch` constructor's `pip_packages` or `conda_packages` parameters. The additional required packages are listed in the `.azureml/conda_dependencies.yml` file.

**Requirements:**
- python=3.6.2
- numpy=1.15.1
- numpy-base=1.15.1
- pip=10.0.1
- python=3.6.6
- python-dateutil=2.7.3
- scikit-learn=0.20.3
- azureml-defaults
- h5py
- nltk

### 2.3.4 Submit or Cancel a job
Run your experiment by submitting your estimator object. Note that this call is asynchronous.

In [13]:
run = experiment.submit(estimator)
print(run)

Run(Experiment: pytorch-gensen,
Id: pytorch-gensen_1559577451_8b3c6f42,
Type: azureml.scriptrun,
Status: Queued)


**Cancel the job**

It is better to cancel the job manually once you no longer need it, to make sure you do not waste resources.

In [12]:
# Cancel the job with id.
# job_id = "pytorch-gensen_1555533596_d9cc75fe"
# run = get_run(experiment, job_id)

# Cancel jobs.
run.cancel()

### 2.3.5 Monitor your run
You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes. You can see that the widget automatically plots and visualizes the loss metric that we logged to the Azure ML run.

In [14]:
from azureml.widgets import RunDetails

RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': True, 'log_level': 'INFO', 's…

Alternatively, you can block until the script has completed training before running more code.

In [37]:
run.wait_for_completion(show_output=True) # this provides a verbose log

RunId: pytorch-gensen_1559153095_0e7f4645

Streaming azureml-logs/80_driver_log_rank_0.txt

Building vocabulary ...
Building common source vocab ...
Found existing vocab file. Reloading ...
Building target vocabs ...
Found existing vocab file. Reloading ...
Reloading vocab for snli 
Fetching sentences ...
Processing corpus : 0 task snli 
Reached end of dataset, reseting file pointer ...
Fetching sentences ...
Processing corpus : 0 task snli 
Fetched 1000000 sentences
Fetched 1000000 sentences
2019-05-29 18:05:35,740 - INFO - Finished creating iterator ...
2019-05-29 18:05:35,747 - INFO - Found 19966 words in source : 
2019-05-29 18:05:35,753 - INFO - Found 30004 target words in task snli 
2019-05-29 18:05:35,758 - INFO - Model Parameters : 
2019-05-29 18:05:35,763 - INFO - Task : multi-seq2seq-nli 
2019-05-29 18:05:35,768 - INFO - Source Word Embedding Dim  : 512
2019-05-29 18:05:35,772 - INFO - Target Word Embedding Dim  : 512
2019-05-29 18:05:35,777 - INFO - Source RNN Hidden Dim  : 

2019-05-29 18:27:25,602 - INFO - Seq2Seq Examples Processed : 96000 snli Loss : 3.44516 Num snli minibatches : 180
2019-05-29 18:27:25,708 - INFO - Round: 2000 NLI Epoch : 0 NLI Examples Processed : 9648 NLI Loss : 0.99616
2019-05-29 18:27:25,714 - INFO - Average time per mininbatch : 0.57163
2019-05-29 18:27:25,728 - INFO - ******************************************************
2019-05-29 18:29:17,873 - INFO - Seq2Seq Examples Processed : 105600 snli Loss : 3.59119 Num snli minibatches : 180
2019-05-29 18:29:17,894 - INFO - Round: 2200 NLI Epoch : 0 NLI Examples Processed : 10608 NLI Loss : 0.99196
2019-05-29 18:29:17,899 - INFO - Average time per mininbatch : 0.56068
2019-05-29 18:29:17,951 - INFO - ******************************************************
2019-05-29 18:31:09,519 - INFO - Seq2Seq Examples Processed : 115200 snli Loss : 3.35993 Num snli minibatches : 180
2019-05-29 18:31:09,529 - INFO - Round: 2400 NLI Epoch : 0 NLI Examples Processed : 11568 NLI Loss : 0.98847
2019-05-2

2019-05-29 19:06:32,652 - INFO - Seq2Seq Examples Processed : 307200 snli Loss : 3.07575 Num snli minibatches : 180
2019-05-29 19:06:32,708 - INFO - Round: 6400 NLI Epoch : 0 NLI Examples Processed : 30768 NLI Loss : 0.78086
2019-05-29 19:06:32,715 - INFO - Average time per mininbatch : 0.50191
2019-05-29 19:06:32,722 - INFO - ******************************************************
2019-05-29 19:08:12,195 - INFO - Seq2Seq Examples Processed : 316800 snli Loss : 3.01557 Num snli minibatches : 180
2019-05-29 19:08:12,203 - INFO - Round: 6600 NLI Epoch : 0 NLI Examples Processed : 31728 NLI Loss : 0.85198
2019-05-29 19:08:12,225 - INFO - Average time per mininbatch : 0.49733
2019-05-29 19:08:12,255 - INFO - ******************************************************
2019-05-29 19:09:50,561 - INFO - Seq2Seq Examples Processed : 326400 snli Loss : 3.01893 Num snli minibatches : 180
2019-05-29 19:09:50,631 - INFO - Round: 6800 NLI Epoch : 0 NLI Examples Processed : 32688 NLI Loss : 0.79130
2019-05

2019-05-29 19:42:10,143 - INFO - Seq2Seq Examples Processed : 518400 snli Loss : 2.81220 Num snli minibatches : 180
2019-05-29 19:42:10,181 - INFO - Round: 10800 NLI Epoch : 0 NLI Examples Processed : 51888 NLI Loss : 0.80043
2019-05-29 19:42:10,186 - INFO - Average time per mininbatch : 0.47473
2019-05-29 19:42:10,191 - INFO - ******************************************************
2019-05-29 19:43:45,532 - INFO - Seq2Seq Examples Processed : 528000 snli Loss : 2.79347 Num snli minibatches : 180
2019-05-29 19:43:45,556 - INFO - Round: 11000 NLI Epoch : 0 NLI Examples Processed : 52848 NLI Loss : 0.76758
2019-05-29 19:43:45,561 - INFO - Average time per mininbatch : 0.47667
2019-05-29 19:43:45,566 - INFO - ******************************************************
2019-05-29 19:45:19,931 - INFO - Seq2Seq Examples Processed : 537600 snli Loss : 2.77058 Num snli minibatches : 180
2019-05-29 19:45:19,940 - INFO - Round: 11200 NLI Epoch : 0 NLI Examples Processed : 53808 NLI Loss : 0.75474
2019

{'runId': 'pytorch-gensen_1559153095_0e7f4645',
 'target': 'gpucluster',
 'status': 'CancelRequested',
 'startTimeUtc': '2019-05-29T18:05:02.390551Z',
 'properties': {'azureml.runsource': 'experiment',
  'AzureML.DerivedImageName': 'azureml/azureml_f6cd7804b6a4e89cea33d34d8659fed9',
  'ContentSnapshotId': 'f0eb2538-559b-4051-9d66-5a6a79570c3d',
  'azureml.git.repository_uri': 'https://github.com/Microsoft/NLP.git',
  'azureml.git.branch': 'liqun-first-pull',
  'azureml.git.commit': 'ba716d109a6db89aa94d95255afe7f972a97f0b8',
  'azureml.git.dirty': 'True',
  'azureml.git.build_id': None,
  'azureml.git.build_uri': None,
  'mlflow.source.git.branch': 'liqun-first-pull',
  'mlflow.source.git.commit': 'ba716d109a6db89aa94d95255afe7f972a97f0b8',
  'mlflow.source.git.repoURL': 'https://github.com/Microsoft/NLP.git'},
 'runDefinition': {'script': 'train.py',
  'arguments': ['--config',
   'sample_config.json',
   '--data_folder',
   '$AZUREML_DATAREFERENCE_gensen'],
  'sourceDirectoryDataStor

# 3 Tune Model Hyperparameters
Now that we've seen how to do a simple PyTorch training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities.

## 3.1 Start a Hyperparameter Sweep
First, we will define the hyperparameter space to sweep over. Since our training script uses a learning rate schedule to decay the learning rate every several epochs, let's tune the initial learning rate parameter. In this example we will use random sampling to try different configuration sets of hyperparameters to minimize our primary metric, the best validation loss (`best_val_loss`).
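
Conceptually, random sampling over a uniform range draws each candidate learning rate independently from that range. The sketch below imitates this with Python's standard `random` module. It is a stand-in for illustration, not HyperDrive's sampler, and the bounds, the seed and the helper name are assumptions.

```python
import random


def sample_learning_rates(n_runs, low=0.0001, high=0.001, seed=0):
    """Draw `n_runs` learning rates uniformly at random from [low, high]."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.uniform(low, high) for _ in range(n_runs)]


candidate_rates = sample_learning_rates(8)
for rate in candidate_rates:
    print("{:.6f}".format(rate))
```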

Then, we specify the early termination policy used to stop poorly performing runs early. Here we use the `BanditPolicy`, which will terminate any run that doesn't fall within the slack factor of our primary evaluation metric. In this tutorial, we will apply this policy every epoch (since we report our `best_val_loss` metric every epoch and `evaluation_interval=1`). Notice we will delay the first policy evaluation until after the first `10` epochs (`delay_evaluation=10`).
Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the BanditPolicy and other policies available.

In [None]:
from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveRunConfig, uniform, PrimaryMetricGoal

param_sampling = RandomParameterSampling( {
        'learning_rate': uniform(0.0001, 0.001)
    }
)

early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)

hyperdrive_run_config = HyperDriveRunConfig(estimator=estimator,
                                            hyperparameter_sampling=param_sampling, 
                                            policy=early_termination_policy,
                                            primary_metric_name='best_val_loss',
                                            primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
                                            max_total_runs=8,
                                            max_concurrent_runs=4)
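
The slack-factor rule can be illustrated with a small sketch. For a metric that is maximized, the Azure ML documentation describes terminating runs whose metric falls below `best / (1 + slack_factor)`. For a minimized metric such as `best_val_loss`, the sketch assumes the mirrored rule, terminating runs whose metric exceeds `best * (1 + slack_factor)`. That mirrored rule is our assumption, and the function below is not the service's implementation.

```python
def bandit_should_terminate(run_metric, best_metric, slack_factor, minimize=True):
    """Decide whether a run falls outside the allowed slack of the best run."""
    if minimize:
        # Assumed mirror of the documented maximize rule.
        return run_metric > best_metric * (1 + slack_factor)
    # Documented rule for a metric that is maximized.
    return run_metric < best_metric / (1 + slack_factor)


# Hypothetical validation losses reported by three runs at the same interval.
reported_losses = {"run_a": 2.0, "run_b": 2.2, "run_c": 2.5}
best_loss = min(reported_losses.values())

terminated = [name for name, loss in sorted(reported_losses.items())
              if bandit_should_terminate(loss, best_loss, slack_factor=0.15)]
print(terminated)
```

With a slack factor of 0.15 and a best loss of 2.0, only runs with a loss above roughly 2.3 are stopped under this assumed rule.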

Finally, launch the hyperparameter tuning job.

In [None]:
# start the HyperDrive run
hyperdrive_run = experiment.submit(hyperdrive_run_config)

## 3.2 Monitor HyperDrive runs
You can monitor the progress of the runs with the following Jupyter widget. 

In [None]:
from azureml.widgets import RunDetails

RunDetails(hyperdrive_run).show()

**Cancel the HyperDrive run to save resources**

In [None]:
hyperdrive_run.cancel()

## References

1. S. Subramanian, A. Trischler, Y. Bengio, and C. J. Pal. [*Learning general purpose distributed sentence representations via large scale multi-task learning*](https://arxiv.org/abs/1804.00079), ICLR, 2018.
2. A. Conneau and D. Kiela. [*SentEval: An Evaluation Toolkit for Universal Sentence Representations*](https://arxiv.org/abs/1803.05449), 2018.
3. Semantic textual similarity. URL: http://nlpprogress.com/english/semantic_textual_similarity.html
4. M.-T. Luong, Q. V. Le, I. Sutskever, O. Vinyals, and L. Kaiser. [*Multi-task sequence to sequence learning*](https://arxiv.org/abs/1511.06114), 2015.