# First BERT Experiments

In this notebook we run some first experiments with BERT: we fine-tune a BERT model with a classification head on each of our datasets separately and compute the accuracy of the resulting classifier on the test data.

For these experiments we use the `pytorch_transformers` package. It provides a variety of neural network architectures and pretrained models for transfer learning, including BERT and XLNet.

Two BERT models are relevant for our experiments:

- BERT-base-uncased: a relatively small model (12 layers, hidden size 768) that should already give reasonable results,
- BERT-large-uncased: a larger model (24 layers, hidden size 1024) that should give results closer to the state of the art.

In [1]:
BERT_MODEL = 'bert-base-uncased'
BATCH_SIZE = 16 if "base" in BERT_MODEL else 2
GRADIENT_ACCUMULATION_STEPS = 1 if "base" in BERT_MODEL else 8
MAX_SEQ_LENGTH = 100
PREFIX = "voting_so"
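
The batch size and the number of gradient accumulation steps in the cell above interact: if gradients from several small batches are accumulated before each optimizer step, the effective batch size is `BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS`. The sketch below only reproduces that arithmetic. The helper `effective_batch_size` is hypothetical, and it assumes that `train` accumulates gradients in the usual way, which the notebook itself does not show.

```python
def effective_batch_size(model_name):
    """Number of examples contributing to one optimizer step (hypothetical helper)."""
    # Same conditions as in the configuration cell.
    batch_size = 16 if "base" in model_name else 2
    accumulation_steps = 1 if "base" in model_name else 8
    return batch_size * accumulation_steps


for name in ("bert-base-uncased", "bert-large-uncased"):
    print(name, effective_batch_size(name))
```

Under that assumption both models take optimizer steps over 16 examples; the large model simply reaches that number in eight smaller batches, presumably to fit into GPU memory.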

## Data

We use the same data as in all our previous experiments. Here we load all the data for a particular prompt (selected by `PREFIX`); it is split into training, development and test sets in the training loop.

In [2]:
import sys
sys.path.append('../')  # make the quillnlp package importable from the notebook directory

import ndjson
import numpy as np

from quillnlp.models.bert.preprocessing import preprocess, create_label_vocabulary

data_file = f"../data/interim/{PREFIX}_withprompt.ndjson"

with open(data_file) as f:
    data = ndjson.load(f)

# Map labels to integer ids and back; target_names is ordered by id.
label2idx = create_label_vocabulary(data)
idx2label = {v: k for k, v in label2idx.items()}
target_names = [idx2label[s] for s in range(len(idx2label))]

# Convert the items to BERT input features; a NumPy array allows index-based splits later.
data_items = preprocess(data, BERT_MODEL, label2idx, MAX_SEQ_LENGTH)
data_items = np.array(data_items)

I1029 10:17:05.816948 140462989842240 file_utils.py:39] PyTorch version 1.1.0 available.
I1029 10:17:05.939224 140462989842240 modeling_xlnet.py:194] Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
I1029 10:17:06.429839 140462989842240 tokenization_utils.py:374] loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/yves/.cache/torch/transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
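
The mappings built in the cell above (`label2idx`, `idx2label`, `target_names`) can be illustrated with a small stand-in. The function `build_label_vocabulary` below is hypothetical and is not the implementation of `create_label_vocabulary`; it assumes each record stores its class under a `"label"` key and assigns ids in sorted order, neither of which is confirmed by the notebook. The toy records are invented for illustration.

```python
def build_label_vocabulary(records, label_key="label"):
    """Map every distinct label to an integer id (hypothetical stand-in)."""
    labels = sorted({record[label_key] for record in records})
    return {label: idx for idx, label in enumerate(labels)}


# Toy records, not taken from the real dataset.
records = [
    {"text": "first answer", "label": "reason_b"},
    {"text": "second answer", "label": "reason_a"},
    {"text": "third answer", "label": "reason_b"},
]

label2idx = build_label_vocabulary(records)
idx2label = {idx: label for label, idx in label2idx.items()}
# Ordered by id, so position i names class i in a classification report.
target_names = [idx2label[i] for i in range(len(idx2label))]
print(label2idx, target_names)
```

Whatever the real ordering is, the important property is that `target_names[i]` is the label with id `i`, because `classification_report` matches names to class ids by position.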


## Training
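
The training cell below runs 5-fold cross-validation: each fold holds out one fifth of the items as test data and splits the rest 3:1 into training and development data. The sketch below reproduces only the size arithmetic of that loop, so it runs without the data. The helper `fold_sizes` is hypothetical; it relies on the documented `KFold` behaviour that the first `n_items % n_splits` folds receive one extra test item.

```python
def fold_sizes(n_items, n_splits=5):
    """Return (train, dev, test) sizes per fold for the split used in the training loop."""
    sizes = []
    for fold in range(n_splits):
        # KFold gives the first n_items % n_splits folds one extra test item.
        test = n_items // n_splits + (1 if fold < n_items % n_splits else 0)
        train_and_dev = n_items - test
        # Same cutoff expression as in the training cell: three quarters for training.
        cutoff = int(train_and_dev / 4 * 3)
        sizes.append((cutoff, train_and_dev - cutoff, test))
    return sizes


print(fold_sizes(1000))
```

For 1000 items every fold uses 600 items for training, 200 for development and 200 for testing, so across the five folds each item appears in the test data exactly once.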

In [3]:
import torch
from sklearn.model_selection import KFold

from quillnlp.models.bert.models import get_bert_classifier
from quillnlp.models.bert.preprocessing import get_data_loader
from quillnlp.models.bert.train import train, evaluate

# 5-fold cross-validation: every item is used as test data exactly once.
kf = KFold(n_splits=5, shuffle=True, random_state=1)
all_correct, all_predicted = [], []
all_test_data = []
for train_idx, test_idx in kf.split(data_items):

    # Split the non-test items 3:1 into training and development data.
    train_and_dev_data = data_items[train_idx]
    cutoff = int(len(train_and_dev_data)/4*3)

    train_data = train_and_dev_data[:cutoff]
    dev_data = train_and_dev_data[cutoff:]
    test_data = data_items[test_idx]

    train_dataloader = get_data_loader(train_data, BATCH_SIZE)
    dev_dataloader = get_data_loader(dev_data, BATCH_SIZE)
    # Keep the test order fixed so predictions line up with test_data.
    test_dataloader = get_data_loader(test_data, BATCH_SIZE, shuffle=False)

    # Train a fresh model for this fold, on the GPU if one is available.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = get_bert_classifier(BERT_MODEL, len(label2idx), device=device)
    output_model_file = train(model, train_dataloader, dev_dataloader, BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, device)

    # Reload the saved model on the CPU and evaluate it on the held-out fold.
    print("Loading model from", output_model_file)
    device = "cpu"

    model = get_bert_classifier(BERT_MODEL, len(label2idx), model_file=output_model_file, device=device)
    model.eval()

    _, test_correct, test_predicted = evaluate(model, test_dataloader, device)
    all_correct.extend(test_correct)
    all_predicted.extend(test_predicted)
    all_test_data.extend(test_data)


I1029 10:17:07.245580 140462989842240 configuration_utils.py:151] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json from cache at /home/yves/.cache/torch/transformers/4dad0251492946e18ac39290fcfe91b89d370fee250efe9521476438fe8ca185.bf3b9ea126d8c0001ee8a1e8b92229871d06d36d8808208cc2449280da87785c
I1029 10:17:07.248544 140462989842240 configuration_utils.py:168] Model config {
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "num_labels": 7,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_past": true,
  "pruned_heads": {},
  "torchscript": false,
  "type_vocab_size": 2,
  "use_bfloat16": false,
  "vocab_size": 30522
}

I1029 10:17:07.693456

ValueError: Expected input batch_size (16) to match target batch_size (112).
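
The numbers in this error are suggestive: 112 = 16 × 7, that is, the batch size times the `num_labels` value from the model config above. One plausible cause, which is an assumption and not confirmed by anything in this notebook, is that the labels reach the loss function as one-hot vectors of length 7 instead of integer class ids, so that flattening them yields 112 targets for 16 inputs. The sketch below shows that shape mismatch with plain Python lists and one way to turn such rows back into class ids. The label batch is invented, and `one_hot_to_index` is a hypothetical helper.

```python
def one_hot_to_index(row):
    """Return the position of the largest entry in a one-hot row (hypothetical helper)."""
    return max(range(len(row)), key=row.__getitem__)


batch_size, num_labels = 16, 7

# Invented label batch: example i belongs to class i % num_labels.
one_hot_labels = [
    [1 if j == i % num_labels else 0 for j in range(num_labels)]
    for i in range(batch_size)
]

# Flattening one-hot labels yields batch_size * num_labels targets ...
flattened = [value for row in one_hot_labels for value in row]
# ... whereas a loss over class ids expects one target per example.
class_indices = [one_hot_to_index(row) for row in one_hot_labels]

print(len(flattened), len(class_indices))
```

If this is indeed the cause, the fix belongs wherever the label tensors are built, so that each example carries a single integer id such as the `label_id` used in the evaluation cells.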

## Evaluation

In [None]:
from sklearn.metrics import precision_recall_fscore_support, classification_report

print("Test performance:", precision_recall_fscore_support(all_correct, all_predicted, average="micro"))
print(classification_report(all_correct, all_predicted, target_names=target_names))
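
When every item has exactly one gold label and one predicted label, micro-averaged precision, recall and F1 over all classes coincide with plain accuracy, so the "Test performance" line above reports the same number three times. The sketch below checks this on invented labels without scikit-learn; `micro_precision_recall_f1` is a hypothetical helper written for illustration, not the library routine.

```python
def micro_precision_recall_f1(correct, predicted):
    """Micro-averaged precision, recall and F1 for single-label classification."""
    tp = fp = fn = 0
    for label in set(correct) | set(predicted):
        for gold, pred in zip(correct, predicted):
            if pred == label and gold == label:
                tp += 1  # predicted this label, and it was right
            elif pred == label:
                fp += 1  # predicted this label, but gold was another one
            elif gold == label:
                fn += 1  # gold was this label, but another one was predicted
    if tp == 0:
        return 0.0, 0.0, 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented labels: four of the six predictions are right.
correct = [0, 1, 2, 2, 1, 0]
predicted = [0, 2, 2, 2, 1, 1]

accuracy = sum(g == p for g, p in zip(correct, predicted)) / len(correct)
print(micro_precision_recall_f1(correct, predicted), accuracy)
```

Each wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the gold class), which is why the three micro-averaged scores collapse to the fraction of correct predictions.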

In [None]:
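num_correct = 0
for item, predicted, correct in zip(all_test_data, all_predicted, all_correct):
    # Sanity check: the gold labels returned by evaluate() must line up with the test items.
    assert item.label_id == correct
    num_correct += int(item.label_id == predicted)
    # One line per test item: text#gold label#predicted label
    print(f"{item.text}#{idx2label[correct]}#{idx2label[predicted]}")

print()
print(num_correct, "/", len(all_test_data), "=", num_correct/len(all_test_data))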