<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# Evaluation of the GenSen model with SentEval

SentEval [1] is a library for evaluating the quality of sentence embeddings. It assesses their generalization power by using them as features on a broad and diverse set of "transfer" tasks. SentEval currently includes 17 downstream tasks.

This notebook shows how to run SentEval and evaluate a trained GenSen model locally. We used the [SentEval](https://github.com/facebookresearch/SentEval) toolkit to run most of our transfer learning experiments. To replicate our results, clone the SentEval repository and follow its setup instructions. Then copy this notebook and `gensen.py` into its `examples` folder and run the cells below to reproduce different rows in Table 2 of our paper. Note: set the paths to the pre-trained GloVe embeddings (`glove.840B.300d.h5`) and to the model folder as appropriate.

## 0 Global settings

Most of the functions used in this notebook are defined in `gensen.py`. Set `PATH_SENTEVAL` to the SentEval data path and `PATH_TO_DATA` to the model data path, which is where your models go: pre-trained embeddings under `embedding/` and trained models under `models/`.
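Before running anything, it can help to confirm that the model data folder matches this layout. The sketch below is a hypothetical helper, not part of `gensen.py`; it only checks for the GloVe file and the `models/` folder, since the trained model file names depend on your training run.

```python
import os

# Hypothetical helper: list the parts of the expected model-data layout that are
# missing under `root`. Only the GloVe file and the models/ folder are checked,
# because the trained model file names depend on your training run.
def check_model_layout(root, glove_file='glove.840B.300d.h5'):
    expected = [
        os.path.join(root, 'embedding', glove_file),
        os.path.join(root, 'models'),
    ]
    return [path for path in expected if not os.path.exists(path)]
```

For example, `check_model_layout('./data')` should return an empty list once the GloVe file and the trained models are in place.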

### Initialize workspace

Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`.
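If you do not have a `config.json` yet, the sketch below builds one. It assumes the three keys Azure ML stores in this file (`subscription_id`, `resource_group`, `workspace_name`); `make_workspace_config` and `write_workspace_config` are hypothetical helpers, and the values you pass are placeholders for your own workspace.

```python
import json

# Hypothetical helper: build the contents of config.json for Workspace.from_config().
# The three keys are the ones Azure ML stores in this file; values are placeholders.
def make_workspace_config(subscription_id, resource_group, workspace_name):
    return {
        'subscription_id': subscription_id,
        'resource_group': resource_group,
        'workspace_name': workspace_name,
    }

def write_workspace_config(config, path='config.json'):
    # Write the config into the notebook's working directory,
    # where Workspace.from_config() looks for it.
    with open(path, 'w') as f:
        json.dump(config, f, indent=4)
```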

```python
from azureml.core.workspace import Workspace

ws = Workspace.from_config()
print('Workspace name: ' + ws.name,
      'Azure region: ' + ws.location,
      'Subscription id: ' + ws.subscription_id,
      'Resource group: ' + ws.resource_group, sep='\n')
```

```
Workspace name: MAIDAPNLP
Azure region: eastus2
Subscription id: 15ae9cb6-95c1-483d-a0e3-b1a1a3b06324
Resource group: nlprg
```


### Access to Datastore

Register the Azure file share that holds the SentEval data and models as a datastore, so that the files can be mounted or uploaded.

```python
from azureml.core import Datastore

# Register the Azure file share that holds the SentEval data and GenSen models.
# Replace the placeholder with your own storage account key; never commit real keys.
ds = Datastore.register_azure_file_share(workspace=ws,
                                         datastore_name='GenSen',
                                         file_share_name='azureml-filestore-09b72610-7938-4ed2-86a2-5004896b12d9',
                                         account_name='maidapnlp0056795534',
                                         account_key='<your-storage-account-key>')

# Mounting returns a DataReference, which is resolved only inside a submitted run.
ds.as_mount()
# Upload local files to the datastore.
# ds.upload(src_dir='data', target_path='data', overwrite=True, show_progress=True)
```

```
$AZUREML_DATAREFERENCE_gensen
```

### Set up parameters

```python
from __future__ import absolute_import, division, unicode_literals

import logging
import os
import sys

import torch

# Make gensen.py (copied next to this notebook) importable.
sys.path.append('.')
from gensen import GenSen, GenSenSingle

# SentEval data path: the folder whose downstream/ subfolder holds the transfer-task
# datasets downloaded by get_transfer_data.bash:
# https://github.com/facebookresearch/SentEval/blob/master/data/downstream/get_transfer_data.bash
# SentEval concatenates this path with strings such as '/downstream/MR', so it must be
# a plain string. A DataReference such as ds.path('/senteval/') only resolves inside a
# submitted Azure ML run, so point this at a local copy instead, e.g. SentEval's own
# data folder when this notebook runs from SentEval's examples/ folder.
PATH_SENTEVAL = '../data'
# Path to model data: pre-trained embeddings under embedding/, trained models under models/.
PATH_TO_DATA = './data'
# Inside a submitted Azure ML run, use the mounted path instead (no leading '/' on the
# sub-path, otherwise os.path.join discards the mount root):
# PATH_SENTEVAL = os.path.join(os.environ['AZUREML_DATAREFERENCE_gensen'], 'senteval')
# PATH_TO_DATA = os.path.join(os.environ['AZUREML_DATAREFERENCE_gensen'], 'data')

# Add the SentEval code folder (the parent of examples/) to the import path.
sys.path.insert(0, '../')
import senteval

# Use the first GPU when one is available.
if torch.cuda.is_available():
    torch.cuda.set_device(0)

print("System version: {}".format(sys.version))
print("Torch version: {}".format(torch.__version__))
print("Senteval data path: {}".format(PATH_SENTEVAL))
print("Trained model path: {}".format(PATH_TO_DATA))
```

```
System version: 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
Torch version: 1.0.1
Senteval data path: ../data
Trained model path: ./data
```
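When building paths under a mounted datastore, note that `os.path.join` discards everything before a component that starts with a separator, so a call such as `os.path.join(mount_root, '/senteval/')` returns `'/senteval/'` rather than a path under the mount. A minimal demonstration with `posixpath` (what `os.path` is on Linux and macOS), using a made-up mount point:

```python
import posixpath

# A component that starts with '/' makes join() start over from that component.
MOUNT = '/mnt/datastore'  # made-up mount point for illustration

wrong = posixpath.join(MOUNT, '/senteval/')  # '/senteval/'
right = posixpath.join(MOUNT, 'senteval')    # '/mnt/datastore/senteval'
print(wrong)
print(right)
```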


## 1 Use SentEval
To evaluate your sentence embeddings, SentEval requires that you implement two functions:

1. **prepare** (sees the whole dataset of each task and can thus construct the word vocabulary, the dictionary of word vectors, etc.)
2. **batcher** (transforms a batch of text sentences into sentence embeddings)

### 1.) prepare(params, samples) (optional)

*batcher* only sees one batch at a time while the *samples* argument of *prepare* contains all the sentences of a task.

```
prepare(params, samples)
```
* *params*: senteval parameters.
* *samples*: list of all sentences from the transfer task.
* *output*: No output. Arguments stored in "params" can further be used by *batcher*.

### 2.) batcher(params, batch)
```
batcher(params, batch)
```
* *params*: senteval parameters.
* *batch*: list of tokenized sentences, each a list of words (at most params.batch_size sentences)
* *output*: numpy array of sentence embeddings, one row per sentence in *batch*
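To make the contract concrete, here is a toy `prepare`/`batcher` pair that does not use GenSen at all: `prepare` builds a vocabulary from all samples and stores it in `params`, and `batcher` turns each tokenized sentence into a bag-of-words count vector. `DotDict`, `toy_prepare` and `toy_batcher` are hypothetical names for this sketch; `DotDict` imitates the SentEval params object, which allows both `params.key` and `params['key']` access, as the functions in this notebook use.

```python
import numpy as np

class DotDict(dict):
    """Stand-in for SentEval's params: supports params.key and params['key']."""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__

def toy_prepare(params, samples):
    # Sees every tokenized sentence of the task once; store shared state in params.
    vocab = sorted({word.lower() for sent in samples for word in sent})
    params.word2id = {word: i for i, word in enumerate(vocab)}

def toy_batcher(params, batch):
    # Bag-of-words counts: one row per sentence, one column per vocabulary word.
    embeddings = np.zeros((len(batch), len(params.word2id)), dtype=np.float32)
    for row, sent in enumerate(batch):
        for word in sent:
            col = params.word2id.get(word.lower())
            if col is not None:
                embeddings[row, col] += 1.0
    return embeddings
```

Any real encoder follows the same shape: `prepare` does one-off work per task, and `batcher` must return one embedding row per input sentence.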

### 1.1 Prepare function

```python
def prepare(params, samples):
    """Build the task vocabulary and expand GenSen's vocabulary with it."""
    print('Preparing task : %s ' % (params.current_task))
    vocab = set()
    for sample in samples:
        # Lowercase every task except TREC, which keeps the original casing.
        if params.current_task != 'TREC':
            sample = ' '.join(sample).lower().split()
        else:
            sample = ' '.join(sample).split()
        vocab.update(sample)

    # Add the special tokens.
    vocab.update(['<s>', '<pad>', '<unk>', '</s>'])
    # To turn off vocabulary expansion, comment out the line below.
    params['gensen'].vocab_expansion(vocab)
```
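The vocabulary logic of `prepare` can be isolated so it can be inspected without loading a GenSen model. `build_vocab` is a hypothetical helper that reproduces it: sentences are lowercased for every task except TREC, and the four special tokens are always included.

```python
# Special tokens that prepare() always adds to the vocabulary.
SPECIAL_TOKENS = ('<s>', '<pad>', '<unk>', '</s>')

def build_vocab(samples, task):
    """Hypothetical helper mirroring prepare(): collect the task vocabulary."""
    vocab = set(SPECIAL_TOKENS)
    for sample in samples:
        text = ' '.join(sample)
        if task != 'TREC':
            text = text.lower()  # only TREC keeps its casing
        vocab.update(text.split())
    return vocab
```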

### 1.2 Batcher function

```python
def get_batcher(local_strategy):
    """Return a SentEval batcher that pools GenSen states with `local_strategy`.

    With 'best', the pooling strategy is chosen per task.
    """
    def batcher(params, batch):
        # batch is a list of tokenized sentences (each a list of words).
        max_tasks = ['MR', 'CR', 'SUBJ', 'MPQA', 'ImageCaptionRetrieval']
        if local_strategy == 'best':
            strategy = 'max' if params.current_task in max_tasks else 'last'
        else:
            strategy = local_strategy

        sentences = [' '.join(s).lower() for s in batch]
        # get_representation returns a pair; only the pooled embeddings are needed.
        _, embeddings = params['gensen'].get_representation(
            sentences, pool=strategy, return_numpy=True
        )
        return embeddings

    return batcher
```
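The per-task pooling rule inside `batcher` can be pulled out into a small pure function and checked without a model. `select_pool` is a hypothetical helper mirroring that logic: with `'best'`, MR, CR, SUBJ, MPQA and ImageCaptionRetrieval use `'max'` pooling and every other task uses `'last'`; any other strategy passes through unchanged.

```python
# Tasks for which the 'best' strategy selects max pooling.
BEST_MAX_POOL_TASKS = frozenset(['MR', 'CR', 'SUBJ', 'MPQA', 'ImageCaptionRetrieval'])

def select_pool(local_strategy, task):
    """Hypothetical helper: pooling strategy that batcher() uses for `task`."""
    if local_strategy != 'best':
        return local_strategy
    return 'max' if task in BEST_MAX_POOL_TASKS else 'last'
```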

## 2 Evaluation of the trained GenSen model on transfer tasks (SentEval)

### 2.1 Parameters for SentEval

The current list of available tasks is:
```python
['CR', 'MR', 'MPQA', 'SUBJ', 'SST2', 'SST5', 'TREC', 'MRPC', 'SNLI',
'SICKEntailment', 'SICKRelatedness', 'STSBenchmark', 'ImageCaptionRetrieval',
'STS12', 'STS13', 'STS14', 'STS15', 'STS16',
'Length', 'WordContent', 'Depth', 'TopConstituents','BigramShift', 'Tense',
'SubjNumber', 'ObjNumber', 'OddManOut', 'CoordinationInversion']
```
Users can choose a subset of the above tasks.
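Because `se.eval` works through the tasks one after another, a misspelled task name may only surface after earlier tasks have finished. A small check against the list above catches this up front; `AVAILABLE_TASKS` and `validate_tasks` are hypothetical names for this sketch, not SentEval APIs.

```python
# The task names listed above, copied verbatim.
AVAILABLE_TASKS = frozenset([
    'CR', 'MR', 'MPQA', 'SUBJ', 'SST2', 'SST5', 'TREC', 'MRPC', 'SNLI',
    'SICKEntailment', 'SICKRelatedness', 'STSBenchmark', 'ImageCaptionRetrieval',
    'STS12', 'STS13', 'STS14', 'STS15', 'STS16',
    'Length', 'WordContent', 'Depth', 'TopConstituents', 'BigramShift', 'Tense',
    'SubjNumber', 'ObjNumber', 'OddManOut', 'CoordinationInversion',
])

def validate_tasks(tasks):
    """Return `tasks` as a list, or raise ValueError naming any unknown task."""
    unknown = [task for task in tasks if task not in AVAILABLE_TASKS]
    if unknown:
        raise ValueError('Unknown SentEval task(s): %s' % ', '.join(unknown))
    return list(tasks)
```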

1) To perform the actual evaluation, first import senteval and set its parameters:
```python
import senteval
params = {'task_path': PATH_SENTEVAL, 'usepytorch': True, 'kfold': 10}
```

2) (Optional) Set the parameters of the classifier (when applicable):
```python
params['classifier'] = {'nhid': 0, 'optim': 'adam', 'batch_size': 64,
                        'tenacity': 5, 'epoch_size': 4}
```
```
You can choose **nhid=0** (Logistic Regression) or **nhid>0** (MLP) and define the parameters for training.
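The two classifier choices can be wrapped in a small builder so that switching between logistic regression and an MLP is a single argument. `make_senteval_params` is a hypothetical helper; its other classifier settings are copied from this notebook, and `task_path` should be the SentEval data path.

```python
def make_senteval_params(task_path, nhid=0, kfold=10):
    """Build SentEval params (hypothetical helper).

    nhid=0 selects logistic regression; nhid>0 selects an MLP with `nhid`
    hidden units. The remaining classifier settings match this notebook.
    """
    if nhid < 0:
        raise ValueError('nhid must be >= 0')
    params = {'task_path': task_path, 'usepytorch': True, 'kfold': kfold}
    params['classifier'] = {'nhid': nhid, 'optim': 'adam', 'batch_size': 64,
                            'tenacity': 5, 'epoch_size': 4}
    return params
```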

```python
# Define the transfer tasks to evaluate.
transfer_tasks = ['MR', 'CR', 'SUBJ', 'MPQA', 'SST2', 'SST5', 'TREC', 'SICKRelatedness',
                  'SICKEntailment', 'MRPC', 'STS14', 'STSBenchmark', 'STS12', 'STS13', 'STS15', 'STS16']
# task_path must be the SentEval data folder, given as a plain string path.
params_senteval = {'task_path': PATH_SENTEVAL, 'usepytorch': True, 'kfold': 10}
params_senteval['classifier'] = {'nhid': 0, 'optim': 'adam', 'batch_size': 64,
                                 'tenacity': 5, 'epoch_size': 4}

# Set up the logger.
logging.basicConfig(format='%(asctime)s : %(message)s', level=logging.INFO)
```

### 2.2 Create an instance of the SE class

```python
se = senteval.engine.SE(params, batcher, prepare)
```

```python
# Default parameters: model folder, the two GenSen model prefixes,
# pre-trained GloVe embeddings and the pooling strategy.
folder_path = './data/models'
prefix_1 = 'nli_large_bothskip_parse'
prefix_2 = 'nli_large_bothskip'
pretrain = './data/embedding/glove.840B.300d.h5'
strategy = 'best'
cuda = torch.cuda.is_available()
```

```python
def gensen_eval(folder_path, prefix_1, prefix_2, pretrain, cuda, strategy):
    gensen_1 = GenSenSingle(
        model_folder=folder_path,
        filename_prefix=prefix_1,
        pretrained_emb=pretrain,
        cuda=cuda
    )
    gensen_2 = GenSenSingle(
        model_folder=folder_path,
        filename_prefix=prefix_2,
        pretrained_emb=pretrain,
        cuda=cuda
    )
    gensen = GenSen(gensen_1, gensen_2)
    params_senteval['gensen'] = gensen
    se = senteval.engine.SE(params_senteval, get_batcher(strategy), prepare)
    results_transfer = se.eval(transfer_tasks)

    print('--------------------------------------------')
    print('Table 2 of Our Paper : ')
    print('--------------------------------------------')
    print('MR                [Dev:%.1f/Test:%.1f]' % (results_transfer['MR']['devacc'], results_transfer['MR']['acc']))
    print('CR                [Dev:%.1f/Test:%.1f]' % (results_transfer['CR']['devacc'], results_transfer['CR']['acc']))
    print('SUBJ              [Dev:%.1f/Test:%.1f]' % (results_transfer['SUBJ']['devacc'], results_transfer['SUBJ']['acc']))
    print('MPQA              [Dev:%.1f/Test:%.1f]' % (results_transfer['MPQA']['devacc'], results_transfer['MPQA']['acc']))
    print('SST2              [Dev:%.1f/Test:%.1f]' % (results_transfer['SST2']['devacc'], results_transfer['SST2']['acc']))
    print('SST5              [Dev:%.1f/Test:%.1f]' % (results_transfer['SST5']['devacc'], results_transfer['SST5']['acc']))
    print('TREC              [Dev:%.1f/Test:%.1f]' % (results_transfer['TREC']['devacc'], results_transfer['TREC']['acc']))
    print('MRPC              [Dev:%.1f/TestAcc:%.1f/TestF1:%.1f]' % (results_transfer['MRPC']['devacc'], results_transfer['MRPC']['acc'], results_transfer['MRPC']['f1']))
    print('SICKRelatedness   [Dev:%.3f/Test:%.3f]' % (results_transfer['SICKRelatedness']['devpearson'], results_transfer['SICKRelatedness']['pearson']))
    print('SICKEntailment    [Dev:%.1f/Test:%.1f]' % (results_transfer['SICKEntailment']['devacc'], results_transfer['SICKEntailment']['acc']))
    print('STS12             [Pearson:%.3f/Spearman:%.3f]' % (results_transfer['STS12']['all']['pearson']['mean'], results_transfer['STS12']['all']['spearman']['mean']))
    print('STS13             [Pearson:%.3f/Spearman:%.3f]' % (results_transfer['STS13']['all']['pearson']['mean'], results_transfer['STS13']['all']['spearman']['mean']))
    print('STS14             [Pearson:%.3f/Spearman:%.3f]' % (results_transfer['STS14']['all']['pearson']['mean'], results_transfer['STS14']['all']['spearman']['mean']))
    print('STS15             [Pearson:%.3f/Spearman:%.3f]' % (results_transfer['STS15']['all']['pearson']['mean'], results_transfer['STS15']['all']['spearman']['mean']))
    print('STS16             [Pearson:%.3f/Spearman:%.3f]' % (results_transfer['STS16']['all']['pearson']['mean'], results_transfer['STS16']['all']['spearman']['mean']))
    print('STSBenchmark      [Dev:%.5f/Pearson:%.5f/Spearman:%.5f]' % (results_transfer['STSBenchmark']['devpearson'], results_transfer['STSBenchmark']['pearson'], results_transfer['STSBenchmark']['spearman']))
    print('--------------------------------------------')
```
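The sixteen result `print` calls in `gensen_eval` differ only in the task name, the format string and the keys they read from `results_transfer`. A table-driven sketch of the same report follows; `REPORT_ROWS`, `lookup` and `format_report` are hypothetical names, and the key paths are copied from those print statements. Unlike `gensen_eval`, it skips tasks missing from the results, so it also works when `transfer_tasks` is a subset.

```python
# (format string, key paths) shared by tasks that report the same metrics.
ACC = ('[Dev:%.1f/Test:%.1f]', [('devacc',), ('acc',)])
STS = ('[Pearson:%.3f/Spearman:%.3f]',
       [('all', 'pearson', 'mean'), ('all', 'spearman', 'mean')])

# One row per reported task: (task, format string, key paths into the results).
REPORT_ROWS = (
    [(task,) + ACC for task in ['MR', 'CR', 'SUBJ', 'MPQA', 'SST2', 'SST5', 'TREC']]
    + [('MRPC', '[Dev:%.1f/TestAcc:%.1f/TestF1:%.1f]', [('devacc',), ('acc',), ('f1',)]),
       ('SICKRelatedness', '[Dev:%.3f/Test:%.3f]', [('devpearson',), ('pearson',)]),
       ('SICKEntailment',) + ACC]
    + [(task,) + STS for task in ['STS12', 'STS13', 'STS14', 'STS15', 'STS16']]
    + [('STSBenchmark', '[Dev:%.5f/Pearson:%.5f/Spearman:%.5f]',
        [('devpearson',), ('pearson',), ('spearman',)])]
)

def lookup(results, task, path):
    # Follow a key path such as ('all', 'pearson', 'mean') into nested dicts.
    value = results[task]
    for key in path:
        value = value[key]
    return value

def format_report(results):
    """Return one report line per task present in `results`."""
    lines = []
    for task, template, paths in REPORT_ROWS:
        if task not in results:
            continue  # skip tasks that were not evaluated
        values = tuple(lookup(results, task, path) for path in paths)
        lines.append('%-18s%s' % (task, template % values))
    return lines
```

Inside `gensen_eval`, the result `print` calls could then be replaced by `print('\n'.join(format_report(results_transfer)))`.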


### 2.3 Results from SentEval

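```python
gensen_eval(folder_path, prefix_1, prefix_2, pretrain, cuda, strategy)
```

If `task_path` is set to a `DataReference` (such as the result of `ds.path()`) instead of a string, this cell fails with:

```
TypeError: unsupported operand type(s) for +: 'DataReference' and 'str'
```

SentEval builds each task's data path by string concatenation, so `task_path` must be a plain string pointing to a local copy of the SentEval data.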
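One way to surface this mistake before any task starts is to validate `task_path` up front. `check_task_path` is a hypothetical helper, not part of SentEval; call it on `params_senteval['task_path']` before creating the `SE` instance.

```python
import os

def check_task_path(task_path):
    """Fail fast if task_path is not a usable SentEval data path (hypothetical helper)."""
    # SentEval concatenates task_path with strings such as '/downstream/MR',
    # so it must be a str, not e.g. an Azure ML DataReference.
    if not isinstance(task_path, str):
        raise TypeError('task_path must be a str, got %s' % type(task_path).__name__)
    # The transfer-task datasets live under <task_path>/downstream.
    downstream = os.path.join(task_path, 'downstream')
    if not os.path.isdir(downstream):
        raise FileNotFoundError('No SentEval data found at %s' % downstream)
    return task_path
```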

## References

1. A. Conneau, D. Kiela, [*SentEval: An Evaluation Toolkit for Universal Sentence Representations*](https://arxiv.org/abs/1803.05449).