# Running SentEval Locally

SentEval is a widely used benchmarking tool for evaluating general-purpose sentence embeddings. It provides a simple interface for evaluating your embeddings on up to 17 supported downstream tasks, such as sentiment classification, natural language inference, and semantic similarity.

Running SentEval locally is straightforward: clone the [repository](https://github.com/facebookresearch/SentEval), follow its setup instructions to download the data for the transfer tasks, and implement two functions, `prepare(params, samples)` and `batcher(params, batch)`, that are specific to your model. The authors provide guidance on how to do this in the [examples](https://github.com/facebookresearch/SentEval/tree/master/examples) directory of their repository. In this notebook, we show how to evaluate the GenSen model on SentEval's STS (semantic textual similarity) downstream tasks.
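
To make the two-function contract concrete, here is a minimal sketch using a toy bag-of-words "model" instead of a real encoder. The `DotDict` class is a hypothetical stand-in modeled on the dict-with-attribute-access object SentEval passes as `params`, and the `word2id` key is our own illustrative name, not part of SentEval.

```python
class DotDict(dict):
    """Hypothetical stand-in for SentEval's params: a dict that also
    supports attribute access such as params.current_task."""

    __getattr__ = dict.get
    __setattr__ = dict.__setitem__


def prepare(params, samples):
    # Called once per task with every tokenized sentence: build a
    # vocabulary and store it on params so batcher can reuse it.
    words = sorted({w.lower() for sentence in samples for w in sentence})
    params.word2id = {w: i for i, w in enumerate(words)}


def batcher(params, batch):
    # Called repeatedly with a batch of tokenized sentences; returns one
    # fixed-size vector per sentence (here: bag-of-words counts).
    # A real batcher would typically return a NumPy array.
    dim = len(params.word2id)
    embeddings = []
    for sentence in batch:
        vec = [0.0] * dim
        for w in sentence:
            idx = params.word2id.get(w.lower())
            if idx is not None:  # ignore words not seen in prepare
                vec[idx] += 1.0
        embeddings.append(vec)
    return embeddings


params = DotDict(current_task="STS12")
samples = [["A", "cat", "sat"], ["a", "dog", "sat"]]
prepare(params, samples)
print(batcher(params, samples))
```

The split matters because `prepare` sees the whole task up front (useful for vocabulary building), while `batcher` only ever sees one batch at a time.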

### 00 Global Settings

In [None]:
import os
import sys
import json
import torch
import pandas as pd

sys.path.append("../../")  # make the repository's utils_nlp package importable
from utils_nlp.eval.senteval import SentEvalRunner

print("System version: {}".format(sys.version))
print("Torch version: {}".format(torch.__version__))

### 01 SentEval Settings

In [None]:
# Set this path to where you have cloned the SentEval source code
PATH_TO_SENTEVAL = "../../../../SentEval"
sys.path.insert(0, PATH_TO_SENTEVAL)
import senteval

# Semantic textual similarity (STS) tasks to evaluate on
transfer_tasks = ["STSBenchmark", "STS12", "STS13", "STS14", "STS15", "STS16"]

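params_senteval = {
    "task_path": os.path.join(PATH_TO_SENTEVAL, "data"),  # transfer-task data
    "usepytorch": True,  # use PyTorch rather than scikit-learn where possible
    "kfold": 10,  # number of folds for cross-validation
}
params_senteval["classifier"] = {
    "nhid": 0,  # hidden units: 0 = logistic regression, >0 = MLP
    "optim": "adam",
    "batch_size": 64,
    "tenacity": 5,  # times dev accuracy may fail to increase before stopping
    "epoch_size": 4,  # passes over the training set per epoch
}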
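
The `tenacity` and `epoch_size` entries control the classifier's early stopping: per the SentEval README, `tenacity` is how many times the dev accuracy may fail to increase before training stops, and `epoch_size` is how many passes over the training set make up one epoch. The loop below is a generic, hypothetical sketch of patience-based early stopping over a made-up list of dev scores. It resets the counter whenever the score improves, which is a common convention but not necessarily how SentEval counts.

```python
def run_with_patience(dev_scores, tenacity):
    """Return (evaluations run, best score) for a patience-based
    early-stopping loop over a precomputed sequence of dev scores."""
    best = float("-inf")
    bad_rounds = 0
    for step, score in enumerate(dev_scores, start=1):
        if score > best:
            best = score
            bad_rounds = 0  # an improvement resets the patience counter
        else:
            bad_rounds += 1
            if bad_rounds >= tenacity:
                return step, best  # patience exhausted: stop training
    return len(dev_scores), best


scores = [0.70, 0.74, 0.73, 0.72, 0.75, 0.71, 0.70]
print(run_with_patience(scores, tenacity=2))
print(run_with_patience(scores, tenacity=5))
```

With a small patience, training stops before the later improvement at the fifth evaluation is ever reached; a larger patience trades extra training time for a chance to find it.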

### 02 GenSen Settings

In [None]:
# Set this path to where you have cloned the GenSen source code
PATH_TO_GENSEN = "../../../../gensen"
sys.path.append(PATH_TO_GENSEN)
from gensen import GenSen, GenSenSingle

model_params = {}
# Folder containing the pretrained GenSen models
model_params["folder_path"] = os.path.join(PATH_TO_GENSEN, "data/models")
# Filename prefixes of the two pretrained models to combine
model_params["prefix_1"] = "nli_large_bothskip_parse"
model_params["prefix_2"] = "nli_large_bothskip"
# Pretrained GloVe word vectors in HDF5 format
model_params["pretrain"] = os.path.join(
    PATH_TO_GENSEN, "data/embedding/glove.840B.300d.h5"
)
# Run on GPU when one is available
model_params["cuda"] = torch.cuda.is_available()

print("model params: {}".format(json.dumps(model_params, indent=4)))

### 03 SentEval Functions

As specified in the SentEval [repo](https://github.com/facebookresearch/SentEval#how-to-use-senteval), we implement two functions:

- `prepare`: sees the whole dataset of each task and can thus construct the word vocabulary, the dictionary of word vectors, etc.
- `batcher`: transforms a batch of text sentences into sentence embeddings.
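
The calling order can be sketched as follows: `prepare` runs once per task over all of the task's sentences, then `batcher` runs on successive slices and the returned embeddings are collected. This is a simplified, hypothetical driver (`run_task` and its `batch_size` default are our own names), not SentEval's engine code; SentEval may also reorder sentences (for example, by length) before batching, which this sketch does not do. The toy callbacks only record what they receive.

```python
def run_task(params, sentences, prepare, batcher, batch_size=2):
    """Hypothetical, simplified version of how SentEval drives the callbacks."""
    prepare(params, sentences)  # sees the whole dataset once
    embeddings = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        embeddings.extend(batcher(params, batch))  # one vector per sentence
    return embeddings


calls = []


def toy_prepare(params, samples):
    calls.append(("prepare", len(samples)))


def toy_batcher(params, batch):
    calls.append(("batcher", len(batch)))
    return [[float(len(s))] for s in batch]  # 1-d "embedding": sentence length


sents = [["a", "b"], ["c"], ["d", "e", "f"]]
print(run_task({}, sents, toy_prepare, toy_batcher))
print(calls)
```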

In [None]:
def prepare(params, samples):
    # Build the task vocabulary from all tokenized sentences,
    # lowercasing every task except TREC
    vocab = set()
    for sample in samples:
        if params.current_task != "TREC":
            sample = " ".join(sample).lower().split()
        else:
            sample = " ".join(sample).split()
        vocab.update(sample)

    # Special tokens used by the GenSen encoder
    vocab.update(["<s>", "<pad>", "<unk>", "</s>"])
    # Optional: expand the model's vocabulary with the task's words
    # params["model"].vocab_expansion(vocab)


def batcher(params, batch):
    # batch is a list of tokenized sentences (each a list of words).
    # Use max pooling for these classification tasks, else the last hidden state
    max_tasks = ["MR", "CR", "SUBJ", "MPQA", "ImageCaptionRetrieval"]
    if params.current_task in max_tasks:
        strategy = "max"
    else:
        strategy = "last"

    sentences = [" ".join(s).lower() for s in batch]
    # The second return value holds one pooled embedding per sentence
    _, embeddings = params["model"].get_representation(
        sentences, pool=strategy, return_numpy=True
    )
    return embeddings
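
`batcher` picks a pooling strategy per task: `"max"` for the listed classification tasks and `"last"` otherwise. The sketch below illustrates the two operations on a made-up sequence of per-token hidden states (plain lists standing in for an encoder's outputs). It shows the general idea of each strategy, not GenSen's internal implementation, and `pool_states` is a hypothetical helper.

```python
def pool_states(hidden_states, strategy):
    """Collapse per-token vectors into one sentence vector.

    hidden_states: list of equal-length lists, one per token (time step).
    """
    if strategy == "max":
        # Element-wise maximum over all time steps
        return [max(column) for column in zip(*hidden_states)]
    if strategy == "last":
        # Hidden state after the final token
        return list(hidden_states[-1])
    raise ValueError(f"unknown pooling strategy: {strategy!r}")


states = [[0.1, 0.9, -0.2], [0.5, 0.3, 0.0], [0.2, 0.4, -0.1]]
print(pool_states(states, "max"))   # [0.5, 0.9, 0.0]
print(pool_states(states, "last"))  # [0.2, 0.4, -0.1]
```

Max pooling keeps the strongest activation of each feature anywhere in the sentence, while the last state relies on the recurrent encoder having summarized the whole sentence by its final step.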

### 04 Run SentEval on GenSen

In [None]:
# Load the two pretrained GenSen encoders and combine them into one GenSen model
gensen_1 = GenSenSingle(
    model_folder=model_params["folder_path"],
    filename_prefix=model_params["prefix_1"],
    pretrained_emb=model_params["pretrain"],
    cuda=model_params["cuda"],
)
gensen_2 = GenSenSingle(
    model_folder=model_params["folder_path"],
    filename_prefix=model_params["prefix_2"],
    pretrained_emb=model_params["pretrain"],
    cuda=model_params["cuda"],
)
gensen = GenSen(gensen_1, gensen_2)

# Configure the SentEval runner and evaluate the model on the transfer tasks
ser = SentEvalRunner(path_to_senteval=PATH_TO_SENTEVAL, use_azureml=False)
ser.set_transfer_data_path(os.path.join(PATH_TO_SENTEVAL, "data"))
ser.set_transfer_tasks(transfer_tasks)
ser.set_model(gensen)
ser.set_params_senteval()
results = ser.run(batcher, prepare)

Print the Pearson and Spearman correlations for each transfer task as a table.

In [None]:
eval_metrics = ser.print_mean(results, selected_metrics=["pearson", "spearman"])
print(eval_metrics)
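
For STS12 to STS16, SentEval scores a model without any training: it takes the cosine similarity of the two sentence embeddings in each pair and correlates those similarities with the human gold scores (STSBenchmark differs, since SentEval first trains a small regression model on its training split). A minimal pure-Python sketch of the two reported correlations follows, using made-up embeddings and gold scores; Spearman is computed as Pearson on ranks, with ties given their average rank. The helper names are our own.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


# Made-up embedding pairs and gold similarity scores (0-5 scale)
pairs = [([1.0, 0.0], [1.0, 0.1]), ([1.0, 1.0], [0.0, 1.0]), ([1.0, 0.0], [0.0, 1.0])]
gold = [4.8, 2.5, 0.4]
sims = [cosine(u, v) for u, v in pairs]
print(pearson(sims, gold), spearman(sims, gold))
```

Pearson rewards a linear relationship between similarities and gold scores, while Spearman only checks that the pairs are ranked in the same order, which is why SentEval reports both.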