# SentEval with AzureML
SentEval is a widely used benchmarking tool for evaluating general-purpose sentence embeddings. It provides a simple interface for evaluating your embeddings on up to 17 supported downstream tasks, such as sentiment classification, natural language inference, and semantic similarity.

This notebook shows how to use SentEval with the AzureML SDK. Running SentEval locally is straightforward, but it is not always efficient, depending on the model. For example, benchmarking a model that needs a GPU can quickly become expensive, even when you start from pretrained weights, because loading the embeddings and vocabulary for inference can take a nontrivial amount of time. In this example we show how to run SentEval for GenSen, where:
- the model weights are on AzureML Datastore
- the pretrained embeddings are on AzureML Datastore
- the data for the SentEval transfer tasks are on AzureML Datastore
- evaluation runs on the AzureML Workspace GPU Compute Target (no extra provisioning/config needed)

### Global Settings

In [None]:
import os

AZUREML_VERBOSE = False

src_dir = "./senteval-pytorch-gensen"
os.makedirs(src_dir, exist_ok=True)

PATH_TO_GENSEN = (
    "../../../gensen"
)  # Set this path to where you have cloned the gensen source code
PATH_TO_SENTEVAL = (
    "../../../SentEval"
)  # Set this path to where you have cloned the senteval source code

cluster_name = "eval-gpu"
ds_root = "senteval_pytorch_gensen"  # Root path for the datastore
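
Before anything is uploaded, it can help to confirm that the two local clones referenced above actually exist. The following is a minimal sketch using only the standard library; `find_missing_paths` is a hypothetical helper introduced here for illustration, and the paths are the default values from the settings cell.

```python
import os


def find_missing_paths(paths):
    """Return the entries of `paths` that do not exist on the local filesystem."""
    return [p for p in paths if not os.path.exists(p)]


# Default locations from the settings cell; adjust them to your own clones.
required = ["../../../gensen", "../../../SentEval"]
missing = find_missing_paths(required)
if missing:
    print("Clone or fix these paths before uploading: {}".format(missing))
```

If the list is empty, both dependencies were found and the upload cells later in the notebook have something to read from.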

### Define the AzureML Workspace

In [None]:
import azureml.core
from azureml.core.workspace import Workspace

ws = Workspace.from_config()
if AZUREML_VERBOSE:
    print("Workspace name: {}".format(ws.name))
    print("Resource group: {}".format(ws.resource_group))

Attach the GPU-enabled compute target, or create a new one if it doesn't already exist.

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print("Found compute target: {}".format(cluster_name))
except ComputeTargetException:
    print("Creating new compute target: {}".format(cluster_name))
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_NC6", max_nodes=4
    )
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True)

if AZUREML_VERBOSE:
    print(compute_target.get_status().serialize())

Define the datastore. Here we use the default datastore and upload our external dependencies to it.

If your data is already in the cloud, you can register the Azure storage resource that holds it as a datastore. (Currently, the Azure storage services that can be registered as datastores are Azure Blob Container, Azure File Share, Azure Data Lake, Azure Data Lake Gen2, Azure SQL Database, Azure PostgreSQL, and Databricks File System. Learn more about the Datastore module [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore?view=azure-ml-py).)

In [None]:
from azureml.core import Datastore

ds = ws.get_default_datastore()
if AZUREML_VERBOSE:
    print("Default datastore: {}".format(ds.name))

In [None]:
# Upload the gensen dependency
ds.upload(
    src_dir=PATH_TO_GENSEN,
    target_path=os.path.join(ds_root, "gensen_lib"),
    overwrite=False,
    show_progress=AZUREML_VERBOSE,
)

# Upload the senteval dependency
ds.upload(
    src_dir=PATH_TO_SENTEVAL,
    target_path=os.path.join(ds_root, "senteval_lib"),
    overwrite=False,
    show_progress=AZUREML_VERBOSE,
)

# Upload the utils_nlp/eval/senteval.py dependency (this defines the azureml-compatible wrapper for senteval)
ds.upload_files(
    files=["../../utils_nlp/eval/senteval.py"],
    target_path=os.path.join(ds_root, "utils_nlp/eval"),
    overwrite=False,
    show_progress=AZUREML_VERBOSE,
)

Note that after the upload is complete, you can safely delete the dependencies from your local machine to free up disk space.
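
The upload cells build their target paths with `os.path.join`, which uses the separator of the local operating system (a backslash on Windows). The sketch below shows one way to build the same three target paths with forward slashes on any platform. It assumes that forward-slash paths are the safe choice for datastore targets, which we have not verified against the SDK; `datastore_path` is a hypothetical helper, not part of AzureML.

```python
import posixpath


def datastore_path(root, *parts):
    """Join path components with forward slashes, regardless of the local OS."""
    return posixpath.join(root, *parts)


ds_root = "senteval_pytorch_gensen"  # Same root as in the settings cell
targets = {
    "gensen": datastore_path(ds_root, "gensen_lib"),
    "senteval": datastore_path(ds_root, "senteval_lib"),
    "utils": datastore_path(ds_root, "utils_nlp", "eval"),
}
for name, path in targets.items():
    print("{:10s} -> {}".format(name, path))
```

These are the same locations that the estimator later mounts through `ds.path(...)`, so the two sides have to agree on the layout.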

### Create the evaluation script

In [None]:
%%writefile $src_dir/evaluate.py
import os
import sys
import argparse
import torch
import pandas as pd

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--ds_gensen", type=str, dest="ds_gensen")
    parser.add_argument("--ds_senteval", type=str, dest="ds_senteval")
    parser.add_argument("--ds_utils", type=str, dest="ds_utils")
    args = parser.parse_args()
    
    # Import the dependencies
    sys.path.insert(0, args.ds_gensen)
    from gensen import GenSen, GenSenSingle
    sys.path.insert(0, args.ds_utils)
    from eval.senteval import SentEvalRunner

    # Define the model
    model_params = {}
    model_params["folder_path"] = os.path.join(args.ds_gensen, "data/models")
    model_params["prefix_1"] = "nli_large_bothskip_parse"
    model_params["prefix_2"] = "nli_large_bothskip"
    model_params["pretrain"] = os.path.join(
        args.ds_gensen, "data/embedding/glove.840B.300d.h5"
    )
    model_params["cuda"] = torch.cuda.is_available()

    gensen_1 = GenSenSingle(
        model_folder=model_params["folder_path"],
        filename_prefix=model_params["prefix_1"],
        pretrained_emb=model_params["pretrain"],
        cuda=model_params["cuda"],
    )
    gensen_2 = GenSenSingle(
        model_folder=model_params["folder_path"],
        filename_prefix=model_params["prefix_2"],
        pretrained_emb=model_params["pretrain"],
        cuda=model_params["cuda"],
    )
    gensen = GenSen(gensen_1, gensen_2)

    # Define the SentEval Runner, an AzureML-compatible wrapper class for SentEval
    ser = SentEvalRunner(path_to_senteval=args.ds_senteval, use_azureml=True)
    ser.set_transfer_data_path(relative_path="data")
    ser.set_transfer_tasks(
        ["STSBenchmark", "STS12", "STS13", "STS14", "STS15", "STS16"]
    )
    ser.set_model(gensen)
    ser.set_params_senteval()  # accepts defaults

    # Define the batcher and prepare functions for SentEval
    def prepare(params, samples):
        vocab = set()
        for sample in samples:
            if params.current_task != "TREC":
                sample = " ".join(sample).lower().split()
            else:
                sample = " ".join(sample).split()
            vocab.update(sample)

        vocab.add("<s>")
        vocab.add("<pad>")
        vocab.add("<unk>")
        vocab.add("</s>")
        # Optional vocab expansion
        # params["model"].vocab_expansion(vocab)

    def batcher(params, batch):
        # batch is a list of tokenized sentences (each one a list of words)
        max_tasks = ["MR", "CR", "SUBJ", "MPQA", "ImageCaptionRetrieval"]
        if params.current_task in max_tasks:
            strategy = "max"
        else:
            strategy = "last"

        sentences = [" ".join(s).lower() for s in batch]
        _, embeddings = params["model"].get_representation(
            sentences, pool=strategy, return_numpy=True
        )
        return embeddings

    # Run SentEval
    results = ser.run(batcher, prepare)

    # Print results as table
    eval_metrics = ser.print_mean(
        results,
        selected_metrics=["pearson", "spearman"],
    )
    print(eval_metrics.head(eval_metrics.shape[0]))
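
The `batcher` in the script picks a pooling strategy per task: `"max"` for a handful of classification and retrieval tasks, `"last"` for everything else. The toy sketch below illustrates what the two strategies mean for a single sentence, assuming that `"max"` is an element-wise maximum over the token vectors and `"last"` is the vector of the final token. GenSen itself pools encoder hidden states over padded batches, so this is only an illustration; `max_pool`, `last_pool` and `choose_strategy` are hypothetical names.

```python
def max_pool(token_vectors):
    """Element-wise maximum over all token vectors of one sentence."""
    return [max(column) for column in zip(*token_vectors)]


def last_pool(token_vectors):
    """Vector of the final token of the sentence."""
    return list(token_vectors[-1])


def choose_strategy(task):
    """Mirror the task-based choice made in the batcher of evaluate.py."""
    max_tasks = {"MR", "CR", "SUBJ", "MPQA", "ImageCaptionRetrieval"}
    return "max" if task in max_tasks else "last"


# Three tokens, each with a 2-dimensional toy vector.
sentence = [[0.1, 0.9], [0.7, 0.2], [0.4, 0.5]]
print(max_pool(sentence))
print(last_pool(sentence))
print(choose_strategy("STS14"))
```

Because the script only evaluates STS tasks, every batch in this notebook ends up using the `"last"` strategy.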

### Create a PyTorch Estimator to submit the evaluation script to the compute target

In [None]:
from azureml.train.dnn import PyTorch
from azureml.core.runconfig import MpiConfiguration

est = PyTorch(
    source_directory=src_dir,
    script_params={
        "--ds_gensen": ds.path("{}/gensen_lib".format(ds_root)).as_mount(),
        "--ds_senteval": ds.path("{}/senteval_lib".format(ds_root)).as_mount(),
        "--ds_utils": ds.path("{}/utils_nlp".format(ds_root)).as_mount(),
    },
    compute_target=compute_target,
    entry_script="evaluate.py",
    node_count=4,
    process_count_per_node=1,
    distributed_training=MpiConfiguration(),
    use_gpu=True,
    framework_version="1.0",
    conda_packages=["h5py", "nltk"],
    pip_packages=["pandas"],
)
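
Each key in `script_params` reaches `evaluate.py` as a command-line flag, and each mounted datastore reference is expected to arrive as a path on the compute node. The sketch below reproduces the argument parsing from the script so that it can be tried locally. The `/mnt/example/...` values are made-up placeholders; the real mount points are chosen by AzureML at run time.

```python
import argparse


def parse_datastore_args(argv):
    """Parse the same three arguments that evaluate.py expects."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--ds_gensen", type=str, dest="ds_gensen")
    parser.add_argument("--ds_senteval", type=str, dest="ds_senteval")
    parser.add_argument("--ds_utils", type=str, dest="ds_utils")
    return parser.parse_args(argv)


# Hypothetical mount points, used only to exercise the parser.
argv = [
    "--ds_gensen", "/mnt/example/gensen_lib",
    "--ds_senteval", "/mnt/example/senteval_lib",
    "--ds_utils", "/mnt/example/utils_nlp",
]
args = parse_datastore_args(argv)
print(args.ds_gensen)
```

Running the parser locally like this is a cheap way to catch a misspelled flag before a job is submitted to the cluster.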

### Run Evaluation

In [None]:
from azureml.core import Experiment

experiment = Experiment(ws, name="senteval-pytorch-gensen")
run = experiment.submit(est)

Visualize the run via a Jupyter widget.

In [None]:
from azureml.widgets import RunDetails

RunDetails(run).show()

Alternatively, block until the script has completed.

In [None]:
#run.wait_for_completion(show_output=AZUREML_VERBOSE)
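
If you want to bound how long the notebook waits, one option is to poll the run status yourself. The sketch below assumes that the run object exposes `get_status()` and that `"Completed"`, `"Failed"` and `"Canceled"` are its terminal states; check both against the SDK version you use. `wait_for_terminal_state` and `FakeRun` are hypothetical names, and `FakeRun` exists only so the loop can be tried without a workspace.

```python
import time

# Assumed terminal status strings; verify against your SDK version.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_terminal_state(run, poll_seconds=30.0, timeout_seconds=3600.0):
    """Poll run.get_status() until a terminal state or the timeout is reached."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = run.get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("Run still in state {!r}".format(status))
        time.sleep(poll_seconds)


class FakeRun:
    """Stand-in for an AzureML run so the sketch can be tried offline."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    def get_status(self):
        # Return the next status, repeating the last one once exhausted.
        if len(self._statuses) > 1:
            return self._statuses.pop(0)
        return self._statuses[0]


fake = FakeRun(["Queued", "Running", "Completed"])
print(wait_for_terminal_state(fake, poll_seconds=0.0))
```

With a real run you would pass `run` instead of `fake` and keep the default polling interval.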