# Multiple Negatives Ranking (MNR) Training on NLI with Hard Negatives

In [1]:
import datasets

dataset = datasets.load_dataset('snli', split='train')

dataset

Reusing dataset snli (/home/jupyter/.cache/huggingface/datasets/snli/plain_text/1.0.0/1f60b67533b65ae0275561ff7828aad5ee4282d0e6f844fd148d05d3c6ea251b)


Dataset({
    features: ['premise', 'hypothesis', 'label'],
    num_rows: 550152
})

## Building Triplets

We can start by fine-tuning on $(anchor, positive, negative)$ triplets only. To do this, we must transform the (pair, label) format of the dataset into a triplet format, e.g.:

| Premise (anchor) | Hypothesis (positive or negative) | Label |
| --- | --- | --- |
| ... | ... | 1 *(neutral)* |
| ... | ... | 0 *(entailment)* |
| ... | ... | 2 *(contradiction)* |

Into:

| Anchor | Positive | Negative |
| --- | --- | --- |
| ... | ... | ... |
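
As a minimal sketch of this transformation, the snippet below groups a few made-up rows by premise and keeps one entailment and one contradiction hypothesis per premise. The sentences and the helper name `build_triplets` are illustrative assumptions, not part of SNLI or of any library.

```python
# Toy rows in the same (premise, hypothesis, label) layout as SNLI.
# The sentences are invented for illustration only.
toy_rows = [
    {"premise": "A man plays guitar.", "hypothesis": "A person makes music.", "label": 0},
    {"premise": "A man plays guitar.", "hypothesis": "The man is asleep.", "label": 2},
    {"premise": "A man plays guitar.", "hypothesis": "The man is on stage.", "label": 1},
    {"premise": "A dog runs.", "hypothesis": "An animal moves.", "label": 0},
]


def build_triplets(rows):
    """Group rows by premise into (anchor, positive, negative) tuples."""
    grouped = {}
    for row in rows:
        slot = grouped.setdefault(row["premise"], {"pos": None, "neg": None})
        if row["label"] == 0:    # entailment -> positive
            slot["pos"] = row["hypothesis"]
        elif row["label"] == 2:  # contradiction -> negative
            slot["neg"] = row["hypothesis"]
        # neutral rows (label 1) are ignored
    # keep only anchors that have both a positive and a negative
    return [
        (anchor, s["pos"], s["neg"])
        for anchor, s in grouped.items()
        if s["pos"] is not None and s["neg"] is not None
    ]


toy_triplets = build_triplets(toy_rows)
print(toy_triplets)
```

Only the first premise survives here, because the second one has no contradiction hypothesis to serve as a negative.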

In [2]:
print(f"before: {len(dataset)} rows")
dataset = dataset.filter(
    lambda x: x['label'] != 1  # drop neutral rows (label 1)
)
print(f"after: {len(dataset)} rows")

before: 550152 rows


100%|██████████| 551/551 [00:03<00:00, 151.02ba/s]

after: 367388 rows





Convert the rows into a single dictionary that maps each anchor to its `[positive, negative]` pair:

```json
{"anchor": ["positive", "negative"]}
```

In [6]:
from tqdm.auto import tqdm

triplets = {}

for row in tqdm(dataset):
    anchor = row['premise']
    if anchor not in triplets:
        triplets[anchor] = [None, None]
    if row['label'] == 0:
        # this is positive
        triplets[anchor][0] = row['hypothesis']
    elif row['label'] == 2:
        # this is negative
        triplets[anchor][1] = row['hypothesis']
        
# save space
del dataset

100%|██████████| 367388/367388 [00:30<00:00, 11897.34it/s]


## Training Setup

Now we can start preparing the data for fine-tuning via the sentence-transformers library. We start by collating all training examples using `InputExample` objects.

In [7]:
from sentence_transformers import InputExample
from tqdm.auto import tqdm  # so we see progress bar

train_samples = []
for anchor in tqdm(triplets.keys()):
    # check sample has all data
    positive = triplets[anchor][0]
    negative = triplets[anchor][1]
    if positive is not None and negative is not None:
        train_samples.append(InputExample(
            texts=[anchor, positive, negative]
        ))

# save space
del triplets

100%|██████████| 150734/150734 [00:00<00:00, 278375.16it/s]


In [8]:
len(train_samples)

149145

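Then we use a `NoDuplicatesDataLoader` to feed them to the model in batches during training. It ensures that no text appears twice within a batch, which matters because MNR loss treats the other samples in a batch as negatives.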

In [9]:
# note: this shadows the Hugging Face `datasets` module imported earlier
from sentence_transformers import datasets

batch_size = 32

loader = datasets.NoDuplicatesDataLoader(
    train_samples, batch_size=batch_size
)
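
To illustrate what duplicate-free batching means, here is a rough sketch in plain Python. It is not the library's implementation; the function name `no_duplicate_batches` and the toy samples are assumptions made for this example. A sample is pushed to a later batch whenever one of its texts already appears in the current batch.

```python
def no_duplicate_batches(samples, batch_size):
    """Greedily build batches in which no text appears twice.

    `samples` is a list of tuples of strings. A sample is deferred to a
    later pass if the batch is full or if any of its texts is already
    present in the current batch.
    """
    batches = []
    pending = list(samples)
    while pending:
        batch, seen, deferred = [], set(), []
        for sample in pending:
            if len(batch) < batch_size and not seen.intersection(sample):
                batch.append(sample)
                seen.update(sample)
            else:
                deferred.append(sample)
        batches.append(batch)
        pending = deferred
    return batches


# Hypothetical (anchor, positive, negative) samples with some shared texts.
toy_samples = [
    ("a", "b", "c"),
    ("a", "d", "e"),  # shares "a" with the first sample
    ("f", "g", "h"),
    ("i", "b", "j"),  # shares "b" with the first sample
]
toy_batches = no_duplicate_batches(toy_samples, batch_size=2)
print(toy_batches)
```

If two samples sharing a text ended up in the same batch, one anchor's true positive could be scored as a negative for another anchor, which is the situation this kind of loader is meant to avoid.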

In [10]:
import torch
from sentence_transformers import models, SentenceTransformer

device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

transformer = models.Transformer('microsoft/mpnet-base')
#transformer.max_seq_length = 512
pooler = models.Pooling(
    transformer.get_word_embedding_dimension(),
    pooling_mode_mean_tokens=True
)

model = SentenceTransformer(
    modules=[transformer, pooler],
    device=device
)
print(model)

Using cuda:0 device


Some weights of the model checkpoint at microsoft/mpnet-base were not used when initializing MPNetModel: ['lm_head.dense.bias', 'lm_head.decoder.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing MPNetModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing MPNetModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of MPNetModel were not initialized from the model checkpoint at microsoft/mpnet-base and are newly initialized: ['mpnet.pooler.dense.weight', 'mpnet.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predi

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)
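
The `Pooling` module printed above has `pooling_mode_mean_tokens` enabled, so the sentence embedding is the average of the token embeddings, with padding positions left out. The sketch below shows that idea on plain Python lists to stay dependency-free; the function name `mean_pool` and the toy vectors are assumptions, and the real module operates on batched tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # max(count, 1) guards against dividing by zero for an all-padding input
    return [value / max(count, 1) for value in total]


# Three toy token vectors; the last one is padding and must not contribute.
toy_tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
toy_mask = [1, 1, 0]
sentence_vector = mean_pool(toy_tokens, toy_mask)
print(sentence_vector)
```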


Initialize the multiple negatives ranking (MNR) loss:

In [11]:
from sentence_transformers import losses

loss = losses.MultipleNegativesRankingLoss(model)
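
For intuition, the computation behind this loss can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's code: it uses cosine similarity scaled by 20 and treats every positive and hard negative in the batch as a candidate for each anchor. The names `cos_sim` and `mnr_loss` and the toy 2-d vectors are made up for this example.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i should rank positive i first."""
    # Candidates are all positives followed by all hard negatives.
    candidates = positives + negatives
    per_anchor = []
    for i, anchor in enumerate(anchors):
        scores = [scale * cos_sim(anchor, c) for c in candidates]
        # Cross-entropy with target index i, using a stable log-sum-exp.
        top = max(scores)
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        per_anchor.append(log_sum_exp - scores[i])
    return sum(per_anchor) / len(per_anchor)


# Toy embeddings: positives point the same way as their anchors,
# negatives point the opposite way.
toy_anchors = [[1.0, 0.0], [0.0, 1.0]]
toy_positives = [[0.9, 0.1], [0.1, 0.9]]
toy_negatives = [[-1.0, 0.0], [0.0, -1.0]]

good = mnr_loss(toy_anchors, toy_positives, toy_negatives)
bad = mnr_loss(toy_anchors, toy_negatives, toy_positives)  # roles swapped
print(good, bad)
```

The loss is small when each anchor is closest to its own positive and grows when another candidate in the batch scores higher, which is why both batch size and duplicate-free batches affect training.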

Start training
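
The training cell warms the learning rate up over the first 10% of all training steps. As a quick check of what that means for this run, the sketch below redoes the arithmetic from the sample count reported earlier. It assumes the loader counts only full batches as its length, which is consistent with the 4660 iterations shown in the progress bar.

```python
import math

num_samples = 149_145  # len(train_samples) reported above
batch_size = 32
epochs = 1

# Assumption: only full batches count towards the loader length.
steps_per_epoch = math.floor(num_samples / batch_size)
warmup_steps = int(steps_per_epoch * epochs * 0.1)

print(f"steps per epoch: {steps_per_epoch}, warmup steps: {warmup_steps}")
```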

In [None]:
epochs = 1
warmup_steps = int(len(loader) * epochs * 0.1)

model.fit(
    train_objectives=[(loader, loss)],
    epochs=epochs,
    warmup_steps=warmup_steps,
    output_path='./mpnet-nli-negatives',
    show_progress_bar=True,
    checkpoint_path='./mpnet-nli-negatives-ckpts',
    checkpoint_save_steps=50_000
)

2022-09-20 10:34:59.435520: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
Epoch:   0%|          | 0/1 [00:00<?, ?it/s]
Iteration:   0%|          | 0/4660 [00:00<?, ?it/s]
Iteration:   0%|          | 1/4660 [00:01<1:52:33,  1.45s/it]
Iteration:   0%|          | 2/4660 [00:01<1:00:23,  1.29it/s]
Iteration:   0%|          | 3/4660 [00:01<42:36,  1.82it/s]
Iteration:   0%|          | 4/4660 [00:01<33:42,  2.30it/s]
Iteration:   0%|          | 5/4660 [00:01<28:19,  2.74it/s]
Iteration:   0%|          | 6/4660 [00:02<24:47,  3.13it/s]
Iteration:   0%|          | 7/4660 [00:02<22:14,  3.49it/s]
Iteration:   0%|          | 8/4660 [00:02<20:22,  3.81it/s]
Iteration:   0%|          | 9/4660 [00:02<19:14,  4.03it/s]
Iteration:   0%|          | 10/46