# First BERT Experiments

In this notebook we run some first experiments with BERT: we fine-tune a BERT model with a classification layer on each of our datasets separately and compute the accuracy of the resulting classifier on the test data.

For these experiments we use the `pytorch_transformers` package. It provides a variety of pretrained neural network architectures for transfer learning, including BERT and XLNet.

Two different BERT models are relevant for our experiments: 

- BERT-base-uncased: a relatively small BERT model (12 layers, hidden size 768) that should already give reasonable results,
- BERT-large-uncased: a larger model (24 layers, hidden size 1024) for state-of-the-art results.

In [1]:
BERT_MODEL = 'bert-large-uncased'
BATCH_SIZE = 16 if "base" in BERT_MODEL else 2
GRADIENT_ACCUMULATION_STEPS = 1 if "base" in BERT_MODEL else 8
MAX_SEQ_LENGTH = 100
PREFIX = "eatingmeat_but_xl"
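
The two batch settings above are tied together: the per-step batch size multiplied by the number of accumulation steps gives the number of examples that contribute to each optimizer update. The following is a minimal sketch of that arithmetic. The helper name `effective_batch_size` is hypothetical and not part of `quillnlp`, and the sketch assumes that `train` performs one optimizer step per `GRADIENT_ACCUMULATION_STEPS` batches, which this notebook does not show.

```python
def effective_batch_size(model_name: str) -> int:
    """Examples per optimizer update under this notebook's settings (hypothetical helper)."""
    # Same conditional expressions as in the configuration cell.
    batch_size = 16 if "base" in model_name else 2
    accumulation_steps = 1 if "base" in model_name else 8
    return batch_size * accumulation_steps


for name in ("bert-base-uncased", "bert-large-uncased"):
    print(name, effective_batch_size(name))
```

Under that assumption both models see 16 examples per update; the large model only spreads them over eight smaller forward and backward passes.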

## Data

We use the same data as for all our previous experiments. Here we load the training, development and test data for a particular prompt.
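
The data files are NDJSON, that is, one JSON object per line. The sketch below uses only the standard library to show how such records can be parsed and how a label-to-index mapping can be derived from them. The `text` and `label` fields match how the records are used later in this notebook. The sample records are invented for illustration, and `build_label_vocabulary` is a hypothetical stand-in, not the actual `create_label_vocabulary` from `quillnlp`, whose label ordering may differ.

```python
import io
import json


def read_ndjson(handle):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in handle if line.strip()]


def build_label_vocabulary(items):
    """Map each distinct label to an index, in order of first appearance (an assumption)."""
    vocabulary = {}
    for item in items:
        # setdefault leaves labels that were already seen untouched.
        vocabulary.setdefault(item["label"], len(vocabulary))
    return vocabulary


# Invented sample records, standing in for a real data file.
sample_file = io.StringIO(
    '{"text": "first response", "label": "label A"}\n'
    '{"text": "second response", "label": "label B"}\n'
    '{"text": "third response", "label": "label A"}\n'
)
sample_records = read_ndjson(sample_file)
sample_label2idx = build_label_vocabulary(sample_records)
sample_idx2label = {idx: label for label, idx in sample_label2idx.items()}
print(sample_label2idx)
```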

In [2]:
import sys
sys.path.append('../')

import ndjson
import glob
import json

from quillnlp.models.bert.preprocessing import preprocess, create_label_vocabulary, get_data_loader

train_file = f"../data/interim/{PREFIX}_train_withprompt.ndjson"
synth_files = glob.glob(f"../data/interim/{PREFIX}_train_withprompt_*.ndjson")
dev_file = f"../data/interim/{PREFIX}_dev_withprompt.ndjson"
test_file = f"../data/interim/{PREFIX}_test_withprompt.ndjson"

with open(train_file) as i:
    train_data = ndjson.load(i)

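# Collect the synthetic training examples, skipping the combined "allsynth" file.
# Note: synth_data is loaded here but not passed to the data loaders below.
synth_data = []
for f in synth_files:
    if "allsynth" in f:
        continue
    with open(f) as i:
        synth_data += ndjson.load(i)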
    
with open(dev_file) as i:
    dev_data = ndjson.load(i)
    
with open(test_file) as i:
    test_data = ndjson.load(i)
    
label2idx = create_label_vocabulary(train_data)
idx2label = {v:k for k,v in label2idx.items()}
target_names = [idx2label[s] for s in range(len(idx2label))]

with open("/tmp/labels.json", "w") as o:
    json.dump(label2idx, o)

train_dataloader = get_data_loader(preprocess(train_data, BERT_MODEL, label2idx, MAX_SEQ_LENGTH), BATCH_SIZE)
dev_dataloader = get_data_loader(preprocess(dev_data, BERT_MODEL, label2idx, MAX_SEQ_LENGTH), BATCH_SIZE)
test_dataloader = get_data_loader(preprocess(test_data, BERT_MODEL, label2idx, MAX_SEQ_LENGTH), BATCH_SIZE, shuffle=False)

I1101 17:08:45.477289 140251368802112 file_utils.py:39] PyTorch version 1.1.0 available.
I1101 17:08:45.594974 140251368802112 modeling_xlnet.py:194] Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
I1101 17:08:46.147496 140251368802112 tokenization_utils.py:374] loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-

## Model

In [3]:
import torch
from quillnlp.models.bert.models import get_bert_classifier

device = "cuda" if torch.cuda.is_available() else "cpu"
model = get_bert_classifier(BERT_MODEL, len(label2idx), device=device)

I1101 17:08:48.486875 140251368802112 configuration_utils.py:151] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-config.json from cache at /home/yves/.cache/torch/transformers/6dfaed860471b03ab5b9acb6153bea82b6632fb9bbe514d3fff050fe1319ee6d.4c88e2dec8f8b017f319f6db2b157fee632c0860d9422e4851bd0d6999f9ce38
I1101 17:08:48.489198 140251368802112 configuration_utils.py:168] Model config {
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_labels": 11,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_past": true,
  "pruned_heads": {},
  "torchscript": false,
  "type_vocab_size": 2,
  "use_bfloat16": false,
  "vocab_size": 30522
}

I1101 17:08:49.019

## Training

In [4]:
from quillnlp.models.bert.train import train

output_model_file = train(model, train_dataloader, dev_dataloader, BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, device)

Epoch:   0%|          | 0/20 [00:00<?, ?it/s]

Loss history: []
Dev loss: 1.2212570243411593

Epoch:   5%|▌         | 1/20 [01:27<27:41, 87.47s/it]

Loss history: [1.2212570243411593]
Dev loss: 0.47678150622932997

Epoch:  10%|█         | 2/20 [02:55<26:17, 87.62s/it]

Loss history: [1.2212570243411593, 0.47678150622932997]
Dev loss: 0.3504361499238897

Epoch:  15%|█▌        | 3/20 [04:23<24:52, 87.79s/it]

Loss history: [1.2212570243411593, 0.47678150622932997, 0.3504361499238897]
Dev loss: 0.3736043373743693

Epoch:  20%|██        | 4/20 [05:50<23:19, 87.49s/it]

Loss history: [1.2212570243411593, 0.47678150622932997, 0.3504361499238897, 0.3736043373743693]
Dev loss: 0.3758372289163095

Epoch:  25%|██▌       | 5/20 [07:17<21:49, 87.27s/it]

Loss history: [1.2212570243411593, 0.47678150622932997, 0.3504361499238897, 0.3736043373743693, 0.3758372289163095]
Dev loss: 0.3773738808102078

Epoch:  30%|███       | 6/20 [08:43<20:19, 87.12s/it]

Loss history: [1.2212570243411593, 0.47678150622932997, 0.3504361499238897, 0.3736043373743693, 0.3758372289163095, 0.3773738808102078]
Dev loss: 0.3864011565844218

Epoch:  35%|███▌      | 7/20 [10:10<18:51, 87.01s/it]

Loss history: [1.2212570243411593, 0.47678150622932997, 0.3504361499238897, 0.3736043373743693, 0.3758372289163095, 0.3773738808102078, 0.3864011565844218]
Dev loss: 0.3680140022878294
No improvement on development set. Finish training.
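
The training log is consistent with early stopping on the development loss: the best value (0.3504) is reached in the third epoch, and training ends after five further epochs without a new best, even though the last epoch improves on the one before it. The actual stopping rule lives in `quillnlp.models.bert.train.train` and is not shown here, so the following is only a sketch of one rule that reproduces this behaviour. The function name `stopping_epoch` and the `patience=5` default are assumptions.

```python
def stopping_epoch(dev_losses, patience=5):
    """Return the 1-based epoch at which training stops, or None if it never does.

    Training stops once `patience` consecutive epochs pass without a new best dev loss.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(dev_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Dev losses copied from the training log above.
logged_dev_losses = [
    1.2212570243411593,
    0.47678150622932997,
    0.3504361499238897,
    0.3736043373743693,
    0.3758372289163095,
    0.3773738808102078,
    0.3864011565844218,
    0.3680140022878294,
]
print(stopping_epoch(logged_dev_losses))
```

With a rule like this, the model saved for evaluation would normally be the one from the best epoch rather than the last one; whether `train` does so is not visible in this notebook.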





## Evaluation

In [5]:
from quillnlp.models.bert.train import evaluate
from sklearn.metrics import precision_recall_fscore_support, classification_report

print("Loading model from", output_model_file)
device = "cpu"  # reload the saved model and evaluate it on the CPU

model = get_bert_classifier(BERT_MODEL, len(label2idx), model_file=output_model_file, device=device)
model.eval()

_, _, test_correct, test_predicted = evaluate(model, test_dataloader, device)

print("Test performance:", precision_recall_fscore_support(test_correct, test_predicted, average="micro"))
print(classification_report(test_correct, test_predicted, target_names=target_names))

Loading model from /tmp/model.bin





Test performance: (0.9329268292682927, 0.9329268292682927, 0.9329268292682927, None)
                                                                          precision    recall  f1-score   support

                                   Change without mentioning consumption       0.00      0.00      0.00         1
                   Less meat consumption could harm economy and cut jobs       1.00      0.98      0.99        42
The meat industry is important/thriving and/or exports/demand increasing       0.95      1.00      0.97        18
                             Eating meat is necessary for good nutrition       0.67      0.67      0.67         6
                                Eating meat is part of culture/tradition       0.86      1.00      0.93        19
                                  Meat creates jobs and benefits economy       0.97      0.97      0.97        40
                                              Outside of article's scope       0.92      1.00      0.96        11
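
The three identical numbers in the `Test performance` line are expected. In single-label multiclass classification every wrong prediction counts as one false positive (for the predicted class) and one false negative (for the true class), so micro-averaged precision, recall and F1 all equal the accuracy. The pure-Python sketch below illustrates this with invented toy labels, not the notebook's data; `micro_scores` is a hypothetical helper, not a scikit-learn function.

```python
def micro_scores(correct, predicted):
    """Micro-averaged precision, recall and F1 for single-label classification."""
    labels = set(correct) | set(predicted)
    tp = fp = fn = 0
    # Pool true positives, false positives and false negatives over all classes.
    for label in labels:
        for c, p in zip(correct, predicted):
            if p == label and c == label:
                tp += 1
            elif p == label:
                fp += 1
            elif c == label:
                fn += 1
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented toy labels for illustration.
toy_correct = [0, 1, 2, 2, 1, 0]
toy_predicted = [0, 2, 2, 2, 1, 1]
toy_accuracy = sum(c == p for c, p in zip(toy_correct, toy_predicted)) / len(toy_correct)
print(micro_scores(toy_correct, toy_predicted), toy_accuracy)
```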
 



In [6]:
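# Print every test item with its gold and predicted label, and count the correct predictions.
c = 0
for item, predicted, correct in zip(test_data, test_predicted, test_correct):
    # Sanity check: the test loader was built with shuffle=False, so the order must match.
    assert item["label"] == idx2label[correct]
    c += (item["label"] == idx2label[predicted])
    print("{}#{}#{}".format(item["text"], idx2label[correct], idx2label[predicted]))
    
print(c)  # number of correct predictions
print(c/len(test_data))  # accuracy on the test set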

Large amounts of meat consumption are harming the environment, but decreasing meat consumption could harm meat industry, the economy, and decrease jobs.#Less meat consumption could harm economy and cut jobs#Less meat consumption could harm economy and cut jobs
Large amounts of meat consumption are harming the environment, but still remain a large and growing part of many cultures' diets around the world#Eating meat is part of culture/tradition#Eating meat is part of culture/tradition
Large amounts of meat consumption are harming the environment, but the meat industry provides many jobs.#Meat creates jobs and benefits economy#Meat creates jobs and benefits economy
Large amounts of meat consumption are harming the environment, but livestock production is also a boon to our economy.#Meat creates jobs and benefits economy#Meat creates jobs and benefits economy
Large amounts of meat consumption are harming the environment, but eliminating consumption of meat would hurt the economy and take 