BERT Assignment 

The objective of this task is to implement sentiment analysis with BERT.

* Each assignment part is marked with (Assignment).

* Grading criteria: Points are awarded if all the code in this notebook runs and the final accuracy is higher than 0.5.

* No points are awarded if the testing cell at the end of the notebook is modified, or if extra cells (including text cells) are added after the last cell. Do not change **epochs**, so that testing stays efficient.

* Testing your model with the testing cell is recommended.

* Please do not reuse the code from the example code. You have to write the code yourself.


## Assignment List
* (Assignment) 1.2 Upload tokenizer 
* (Assignment) 1.3 Make your Dataset by BertDataset function 
* (Assignment) 2.1 Make BERT Model by pretrained_model
* (Assignment) 3.1 Freeze parameters 
* (Assignment) 3.2 Make finetune function 

# 1. Preparing Data



In [1]:
pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.24.0-py3-none-any.whl (5.5 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.10.1-py3-none-any.whl (163 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.10.1 tokenizers-0.13.2 transformers-4.24.0


In [2]:
import pandas as pd
import numpy as np
import transformers
import torch
import torch.nn as nn

import torch.optim as optim
import torch.nn.functional as F
from torchsummary import summary
from tqdm import tqdm

## 1.1 Make BertDataset

In [3]:
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

class BertDataset(Dataset):
    def __init__(self, tokenizer,max_length):
        super(BertDataset, self).__init__()
        self.train_csv=pd.read_csv('https://github.com/clairett/pytorch-sentiment-classification/raw/master/data/SST2/train.tsv', delimiter='\t', header=None)
        self.tokenizer=tokenizer
        self.target=self.train_csv.iloc[:,1]
        self.max_length=max_length
        
    def __len__(self):
        return len(self.train_csv)
    
    def __getitem__(self, index):
        
        text1 = self.train_csv.iloc[index,0]
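        # Tokenize one sentence and pad or truncate it to exactly max_length
        # tokens, so that every item in a batch has the same shape.
        inputs = self.tokenizer.encode_plus(
            text1,
            None,
            padding='max_length',  # replaces the deprecated pad_to_max_length=True
            truncation=True,
            add_special_tokens=True,
            return_attention_mask=True,
            max_length=self.max_length,
        )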
        ids = inputs["input_ids"]
        token_type_ids = inputs["token_type_ids"]
        mask = inputs["attention_mask"]

        return {
            'ids': torch.tensor(ids, dtype=torch.long),
            'mask': torch.tensor(mask, dtype=torch.long),
            'token_type_ids': torch.tensor(token_type_ids, dtype=torch.long),
            'target': torch.tensor(self.train_csv.iloc[index, 1], dtype=torch.long)
            }
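
To make the padding step concrete, the following is a minimal pure-Python sketch of padding a list of token ids to a fixed `max_length` and building the matching attention mask. The helper name `pad_and_mask`, the example ids, and `pad_id=0` are assumptions for illustration; the real tokenizer handles this internally and also keeps the special tokens in place when it truncates.

```python
def pad_and_mask(ids, max_length, pad_id=0):
    """Pad or truncate a list of token ids to max_length and build its attention mask.

    Hypothetical helper for illustration only; pad_id=0 is an assumption.
    """
    ids = ids[:max_length]                 # naive truncation of over-long sequences
    n_pad = max_length - len(ids)          # number of padding positions needed
    mask = [1] * len(ids) + [0] * n_pad    # 1 = real token, 0 = padding
    return ids + [pad_id] * n_pad, mask


# Made-up token ids standing in for a short tokenized sentence.
padded, mask = pad_and_mask([101, 2023, 3185, 102], max_length=8)
print(padded)  # [101, 2023, 3185, 102, 0, 0, 0, 0]
print(mask)    # [1, 1, 1, 1, 0, 0, 0, 0]
```

The attention mask is what lets the model ignore the padded positions, which is why `__getitem__` returns it alongside the ids.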


## (Assignment) 1.2 Upload tokenizer 
* Fill out `tokenizer` by using BertTokenizer of "prajjwal1/bert-tiny" #(TODO:Assignment).


In [4]:
#(TODO:Assignment)
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('prajjwal1/bert-tiny', do_lower_case=True)


## (Assignment) 1.3 Make your Dataset by `BertDataset`
* Fill out `dataset` by using `BertDataset`
* Fill out `dataloader` by using `DataLoader` of torch.utils.data
* Set max_length=100 
* Set batch_size=32 

In [5]:
#(TODO:Assignment)
dataset= BertDataset(tokenizer, max_length=100)
dataloader= DataLoader(dataset, batch_size=32, shuffle=True)
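
As a rough illustration of what the `DataLoader` yields, the sketch below batches a toy dataset that returns dictionaries shaped like the `BertDataset` items. `ToyDataset` and its random contents are hypothetical stand-ins so that nothing needs to be downloaded; only the batching behaviour is the point.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical stand-in for BertDataset: random ids instead of tokenized text."""

    def __init__(self, n_items=70, max_length=100):
        self.n_items = n_items
        self.max_length = max_length

    def __len__(self):
        return self.n_items

    def __getitem__(self, index):
        return {
            'ids': torch.randint(0, 1000, (self.max_length,)),
            'mask': torch.ones(self.max_length, dtype=torch.long),
            'target': torch.tensor(index % 2, dtype=torch.long),
        }


loader = DataLoader(ToyDataset(), batch_size=32, shuffle=True)
for batch in loader:
    # The default collate function stacks each dictionary field
    # along a new leading batch dimension.
    print(batch['ids'].shape, batch['target'].shape)
```

With 70 items and `batch_size=32`, the loop yields two batches of 32 and a final batch of 6, so code that consumes the batches should not assume a fixed batch size.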

# 2. Construct your Model

## (Assignment) 2.1 Make BERT Model by pretrained_model 
* Use the pretrained_model `prajjwal1/bert-tiny`
* The simplest example code is below, but you may design the BERT model as you see fit.
* Do not stack more than three layers. The grading time should not exceed 3 minutes.

In [6]:
# Example code
import torch.nn as nn
class BERT(nn.Module):
    def __init__(self):
        super(BERT, self).__init__()
        self.bert_model = transformers.BertModel.from_pretrained("prajjwal1/bert-tiny")
        self.out = nn.Linear(128, 1)
        
    def forward(self,ids,mask,token_type_ids):
        _,o2= self.bert_model(ids,attention_mask=mask,token_type_ids=token_type_ids, return_dict=False)
        out= self.out(o2)
        return out
    
model=BERT()


Some weights of the model checkpoint at prajjwal1/bert-tiny were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [13]:
import torch.nn as nn
class BERT(nn.Module):
    def __init__(self):
        super(BERT, self).__init__()
        self.bert_model = transformers.BertModel.from_pretrained("prajjwal1/bert-tiny")
        #(TODO:Assignment) 1) Define your BERT Model 
        self.dropout = nn.Dropout(0.25)
        self.classifier = nn.Sequential(
            nn.Linear(128, 32),
            nn.ReLU(True),
            nn.Linear(32, 1)
        )

    def forward(self,ids,mask,token_type_ids):
        _,o2= self.bert_model(ids,attention_mask=mask,token_type_ids=token_type_ids, return_dict=False)
        #(TODO:Assignment) 2) Forward your BERT Model 
        out = self.dropout(o2)
        out = self.classifier(out)
        return out
    
model=BERT()



# 3. Train your Model

In [14]:
loss_fn = nn.BCEWithLogitsLoss()
optimizer= optim.Adam(model.parameters(),lr= 0.0001)
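
`nn.BCEWithLogitsLoss` applies a sigmoid to the raw model output (the logit) and then computes binary cross-entropy. The following pure-Python sketch works through that formula for a single example; `bce_with_logits` is a hypothetical helper written for illustration, not the PyTorch implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_with_logits(z, y):
    """Binary cross-entropy on a raw logit z with label y in {0, 1} (illustrative helper)."""
    # Numerically stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


z, y = 2.0, 1
naive = -(y * math.log(sigmoid(z)) + (1 - y) * math.log(1 - sigmoid(z)))
print(bce_with_logits(z, y), naive)  # the two forms agree up to rounding

# A logit of 0 corresponds to a probability of 0.5, so thresholding the
# logit at 0 is the same as thresholding the probability at 0.5.
print(sigmoid(0.0))  # 0.5
```

This is why the model's final layer outputs a single unbounded value without a sigmoid, and why a prediction can be read off by checking whether the output is greater than 0.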

## (Assignment) 3.1 Freeze parameters 
* Fill out `#(TODO:Assignment)` to freeze parameters of BERT. 
* Only the non-BERT parameters should be updated.

In [15]:
#(TODO:Assignment) 1) Set requires_grad to be False
for param in model.bert_model.parameters():
    param.requires_grad = False
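
To see the effect of `requires_grad = False`, here is a small sketch on a toy model. The `backbone` and `head` layers are hypothetical stand-ins for the BERT encoder and the classifier; the point is only that frozen weights receive no gradient and are left unchanged by the optimizer step.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-ins: `backbone` plays the role of BERT, `head` the classifier.
backbone = nn.Linear(8, 4)
head = nn.Linear(4, 1)

# Freeze the backbone.
for param in backbone.parameters():
    param.requires_grad = False

before = backbone.weight.clone()
head_before = head.weight.clone()

# The optimizer is given all parameters, frozen ones included.
params = list(backbone.parameters()) + list(head.parameters())
optimizer = optim.Adam(params, lr=0.1)

x = torch.randn(16, 8)
y = torch.randint(0, 2, (16, 1)).float()

optimizer.zero_grad()
loss = nn.BCEWithLogitsLoss()(head(backbone(x)), y)
loss.backward()
optimizer.step()

print(backbone.weight.grad)                   # None: no gradient was computed
print(torch.equal(before, backbone.weight))   # True: frozen weights are unchanged
print(torch.equal(head_before, head.weight))  # False: the head was updated
```

As in the notebook, the optimizer here is built over all parameters; parameters without a gradient are simply skipped during `optimizer.step()`.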

In [16]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 4,161 trainable parameters
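
The reported count can be checked by hand: a fully connected layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. The sketch below applies this to the two `nn.Linear` layers of the classifier defined above (dropout and ReLU have no parameters); `linear_params` is a helper introduced here for illustration.

```python
def linear_params(n_in, n_out):
    """Number of parameters in a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


total = linear_params(128, 32) + linear_params(32, 1)
print(total)  # 4161
```

The match with the printed value indicates that only the classifier layers are still trainable after freezing BERT.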


## (Assignment) 3.2 Make finetune function 
* Fill out `#(TODO:Assignment)` to complete `def finetune`

1) Define the loop over the dataloader

2) Set the gradients of the optimizer to zero

3) Calculate loss_fn(output, label) and backpropagate the loss

4) Estimate the average accuracy of the model for each epoch

5) Estimate the total average accuracy of the model

In [26]:
def finetune(epochs,dataloader,model,loss_fn,optimizer):
    model.train()
    for epoch in range(epochs):
        acc = 0 
        N = 0 
        #(TODO:Assignment) 1) Define loop by dataloader 
        for batch, dl in enumerate(dataloader):
            ids=dl['ids']
            token_type_ids=dl['token_type_ids']
            mask= dl['mask']
            label=dl['target']
            label = label.unsqueeze(1)

            #(TODO:Assignment) 2) Set the grad of optimizer to be zero
            optimizer.zero_grad()
            output = model(
                        ids=ids,
                        mask=mask,
                        token_type_ids=token_type_ids)
            label = label.type_as(output)
            
            #(TODO:Assignment) 3) Calculate the loss_fn(output, label) and the backward of the loss_fn
            loss = loss_fn(output, label)
            loss.backward()
            optimizer.step()
            #(TODO:Assignment) 4) Estimate the average accuracy of the model for each epoch 
            prediction = torch.where(output > 0, 1, 0)
            count_true = torch.sum(prediction == label).item()
            
            acc = acc + count_true
            N = N + prediction.shape[0]
    
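    #(TODO:Assignment) 5) Estimate the total average accuracy of the model
    # acc and N are reset at the start of every epoch, so this is the
    # training accuracy accumulated over the final epoch only.
    acc = float(acc)/float(N)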
    return model, acc
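
The accuracy bookkeeping in `finetune` can be sketched without a model. The example below uses made-up per-batch logits and labels (hypothetical values) to show how correct counts and sample counts are accumulated over batches and divided once at the end, rather than averaging per-batch accuracies, which would over-weight a smaller last batch.

```python
# Made-up logits and labels for three batches; the last batch is smaller.
batches = [
    ([1.2, -0.3, 0.8, -2.0], [1, 0, 0, 0]),
    ([0.1, 0.4, -1.5, 2.2], [1, 1, 0, 1]),
    ([-0.7, 0.9], [1, 1]),
]

correct = 0
total = 0
for logits, labels in batches:
    preds = [1 if z > 0 else 0 for z in logits]  # logit > 0 means predicted class 1
    correct += sum(int(p == t) for p, t in zip(preds, labels))
    total += len(labels)

accuracy = correct / total
print(correct, total, accuracy)  # 8 10 0.8
```

Averaging the three per-batch accuracies (0.75, 1.0, 0.5) would instead give 0.75, because the two-sample batch would count as much as the four-sample ones.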

# DO NOT CHANGE THE CODE AFTER (Assignment) 3.2 Make finetune function 

## 3.3 Train!

In [27]:
epochs=5
model, acc =finetune(epochs, dataloader, model, loss_fn, optimizer)


In [28]:
accuracy = acc
print(accuracy)

0.6111271676300578
