# BERT Fine-Tuning on Stanford Sentiment Treebank (SST-2)

## Setup

Clone the git repo that contains the preprocessed SST-2 data. `git clone` fails with a `fatal` message if the repo has already been downloaded; that error is harmless here, and the next cell pulls the latest changes.

In [1]:
!git clone https://github.com/ronakdm/input-marginalization.git

fatal: destination path 'input-marginalization' already exists and is not an empty directory.
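The error above can be avoided by checking for the directory before cloning. The following is a minimal sketch, not part of the original notebook; `clone_if_missing` is a hypothetical helper name, and it assumes `git` is available on the path.

```python
import os
import subprocess


def clone_if_missing(url, dest):
    """Clone `url` into `dest` unless `dest` already exists.

    Returns True if a clone was run, False if it was skipped.
    """
    if os.path.isdir(dest):
        # The directory is already there, so `git clone` would fail; skip it.
        return False
    # check=True raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(["git", "clone", url, dest], check=True)
    return True
```

In this notebook it would be called as `clone_if_missing("https://github.com/ronakdm/input-marginalization.git", "input-marginalization")`.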


In [2]:
%%bash
cd input-marginalization
git pull
cd ..

Already up to date.


Mount Google Drive so that the model and training stats can be saved. Change `save_dir` to a directory in your own Drive.

In [12]:
from google.colab import drive
drive.mount('/content/gdrive')
save_dir = "/content/gdrive/My Drive/input-marginalization"

Mounted at /content/gdrive
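Whether `train` and `test` create `save_dir` themselves is not shown in this notebook, so it may be safer to create the directory up front. A minimal sketch, where `ensure_save_dir` is a hypothetical helper and not part of the repo:

```python
from pathlib import Path


def ensure_save_dir(path):
    """Create `path` (and any missing parents) if needed and return it as a Path."""
    save_path = Path(path)
    # exist_ok=True makes this a no-op when the directory is already present.
    save_path.mkdir(parents=True, exist_ok=True)
    return save_path
```

After mounting the Drive, `ensure_save_dir(save_dir)` would make sure the folder exists before training starts.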


In [3]:
import pickle
import numpy as np
import time
import datetime
import random
import torch

In [4]:
import sys
sys.path.append("input-marginalization")

from utils import generate_dataloaders, train, test

In [5]:
try:
    from transformers import BertForSequenceClassification, AdamW, BertConfig, get_linear_schedule_with_warmup
except ModuleNotFoundError:
    !pip install transformers
    from transformers import BertForSequenceClassification, AdamW, BertConfig, get_linear_schedule_with_warmup

Collecting transformers

## Model, Data, and Optimizer

Set hyperparameters and construct dataloaders.

In [6]:
LEARNING_RATE = 2e-5
ADAMW_TOLERANCE = 1e-8
BATCH_SIZE = 32
EPOCHS = 2

In [8]:
if torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"
print("Running on '%s'." % device)

train_dataloader, validation_dataloader, test_dataloader = generate_dataloaders(BATCH_SIZE)

Running on 'cuda'.
6,919 training samples.
  876 validation samples.
1,822 test samples.
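The sample counts above determine how many batches each epoch runs. The sketch below works through the arithmetic, assuming the dataloaders keep the final partial batch (the PyTorch `DataLoader` default of `drop_last=False`). `batches_per_epoch` is a hypothetical helper used only for illustration.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a dataloader yields for one pass over the data."""
    if drop_last:
        # The final partial batch is discarded.
        return num_samples // batch_size
    # The final partial batch is kept, so round up.
    return math.ceil(num_samples / batch_size)


train_batches = batches_per_epoch(6919, 32)
print(train_batches)  # 217
```

The result matches the `of 217` shown in the training log further down.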


We use the pretrained uncased BERT model (`bert-base-uncased`) with a two-class classification head. Other models can be swapped in.

In [10]:
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels = 2, 
    output_attentions = False,
    output_hidden_states = False,
).to(device)

save_filename = "bert_sst2"

optimizer = AdamW(model.parameters(), lr = LEARNING_RATE, eps = ADAMW_TOLERANCE)
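# len(train_dataloader) already counts batches, so the total number of steps is
# epochs * batches per epoch (assuming `train` steps the scheduler once per batch).
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps = 0, num_training_steps = EPOCHS * len(train_dataloader))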

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at
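
To make the learning-rate schedule concrete: a linear schedule with warmup scales the base learning rate by a factor that rises linearly from 0 to 1 during warmup, then falls linearly to 0 at the final training step. The following is a plain-Python sketch of that documented behavior, not the library's source code. `linear_decay_factor` is a hypothetical name, and the step count assumes one scheduler step per batch with the 217 batches per epoch reported in the training log.

```python
def linear_decay_factor(step, total_steps, warmup_steps=0):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp from 0 up to 1 over the warmup steps.
        return step / max(1, warmup_steps)
    # Linear decay from 1 down to 0 over the remaining steps, clamped at 0.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


base_lr = 2e-5         # LEARNING_RATE in this notebook
total_steps = 2 * 217  # EPOCHS * batches per epoch (assumed: one step per batch)

for step in (0, 217, 434):
    print(step, base_lr * linear_decay_factor(step, total_steps))
```

With no warmup, the learning rate starts at its full value, reaches half of it at the end of the first epoch, and hits zero at the last step. If the total step count is set too high, the decay is correspondingly slower and the rate never approaches zero during training.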

## Training and Evaluation

To use a different model, change `save_filename` and make sure the model returns both the loss and the logits. If it returns only logits, you may need to add a separate loss module.
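
As a framework-free illustration of the "separate loss module" idea, the sketch below pairs a logits-only model with a cross-entropy loss so that one call returns both values. The `(loss, logits)` return shape is an assumption about what `train` expects, and `WithLoss`, `cross_entropy`, and `toy_logits` are hypothetical names. In PyTorch the same pattern would be an `nn.Module` that holds the base model and an `nn.CrossEntropyLoss`.

```python
import math


def cross_entropy(logits, label):
    """Negative log-probability of `label` under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


class WithLoss:
    """Wrap a logits-only callable so a call returns (mean loss, logits)."""

    def __init__(self, logits_fn, loss_fn=cross_entropy):
        self.logits_fn = logits_fn
        self.loss_fn = loss_fn

    def __call__(self, inputs, labels):
        logits = [self.logits_fn(x) for x in inputs]
        losses = [self.loss_fn(z, y) for z, y in zip(logits, labels)]
        return sum(losses) / len(losses), logits


def toy_logits(x):
    """Stand-in for a real model: two-class logits from a single number."""
    return [-x, x]


model_with_loss = WithLoss(toy_logits)
loss, logits = model_with_loss([2.0, -1.0], [1, 0])
```

Here `loss` is the batch-averaged cross-entropy and `logits` holds one two-element list per input, which is the information an accuracy computation needs.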

In [13]:
try:
    train(model, EPOCHS, train_dataloader, validation_dataloader, optimizer, scheduler, save_dir, save_filename, device)
except KeyboardInterrupt:
    print("Graceful Exit")


Training...
  Batch    40  of    217.    Elapsed: 0:00:16.
  Batch    80  of    217.    Elapsed: 0:00:33.
  Batch   120  of    217.    Elapsed: 0:00:49.
  Batch   160  of    217.    Elapsed: 0:01:06.
  Batch   200  of    217.    Elapsed: 0:01:24.

  Average training loss: 0.38
  Training epcoh took: 0:01:31

Running Validation...
  Accuracy: 0.92
  Validation Loss: 0.22
  Validation took: 0:00:03

Training...
  Batch    40  of    217.    Elapsed: 0:00:18.
  Batch    80  of    217.    Elapsed: 0:00:35.
  Batch   120  of    217.    Elapsed: 0:00:53.
  Batch   160  of    217.    Elapsed: 0:01:12.
  Batch   200  of    217.    Elapsed: 0:01:30.

  Average training loss: 0.16
  Training epcoh took: 0:01:37

Running Validation...
  Accuracy: 0.92
  Validation Loss: 0.22
  Validation took: 0:00:03

Training complete!
Total training took 0:03:15 (h:mm:ss)


In [14]:
test(model, test_dataloader, device, save_dir, save_filename)


Testing...
  Accuracy: 0.92
  Test Loss: 0.19
  Test took: 0:00:08
