<a href="https://colab.research.google.com/github/rahiakela/transformers-for-natural-language-processing/blob/main/2-fine-tuning-BERT-models/BERT_fine_tuning_for_sentence_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## BERT Fine-Tuning for Sentence Classification

In this notebook, we fine-tune a BERT model on the downstream task of acceptability judgments and evaluate its predictions with the Matthews Correlation Coefficient (MCC).
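
For readers unfamiliar with the metric, here is a minimal sketch of the binary MCC, computed from confusion-matrix counts as (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name `matthews_corrcoef_binary` and the toy `labels` and `preds` lists are illustrative assumptions, not part of the notebook; returning 0 when the denominator is zero is a common convention.

```python
import math

def matthews_corrcoef_binary(y_true, y_pred):
    """MCC for binary labels (0 or 1), computed from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: MCC is reported as 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Hypothetical toy data: 1 = acceptable sentence, 0 = unacceptable sentence.
labels = [1, 1, 1, 0, 0, 1, 0, 1]
preds = [1, 1, 0, 0, 1, 1, 0, 1]
print(round(matthews_corrcoef_binary(labels, preds), 3))  # 0.467
```

The score ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it informative even when the two classes are imbalanced.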


[Reference Article by Chris McCormick and Nick Ryan](https://mccormickml.com/2019/07/22/BERT-fine-tuning/)

## Setup

Training a multi-head attention transformer model, whether pretraining or fine-tuning, relies on the parallel
processing that GPUs provide.

The program starts by checking that a GPU is available:

In [1]:
# Colab magic command that selects TensorFlow 2.x; keep comments off the magic line
%tensorflow_version 2.x
import tensorflow as tf

device_name = tf.test.gpu_device_name()
if device_name != "/device:GPU:0":
  raise SystemError("GPU device not found")
print("Found GPU at: {}".format(device_name))

print(tf.__version__)


TensorFlow 2.x selected.
Found GPU at: /device:GPU:0
2.4.1


Hugging Face provides its models in both TensorFlow and PyTorch. I recommend that
developers become comfortable with both frameworks; excellent AI research teams use one or both.

In [2]:
!pip install -q transformers


In [3]:
import torch
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from transformers import BertTokenizer, BertConfig
from transformers import AdamW, BertForSequenceClassification, get_linear_schedule_with_warmup

from tqdm import tqdm, trange

import pandas as pd
import io
import numpy as np
import matplotlib.pyplot as plt

We now tell PyTorch to use the Compute Unified Device Architecture (CUDA) when it is
available, putting the parallel computing power of the NVIDIA card to work for our
multi-head attention model; otherwise, it falls back to the CPU:

In [4]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

n_gpu = torch.cuda.device_count()
torch.cuda.get_device_name(0)

'Tesla T4'
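
As a minimal sketch of how the `device` object is typically used afterwards, tensors (and later the model) are moved to the selected device with `.to(device)`. The tensor `x` and the `device_label` fallback string are illustrative assumptions, not part of the original notebook; the guard is there because querying the GPU name fails on a CPU-only machine.

```python
import torch

# Same selection logic as the cell above: prefer CUDA, otherwise use the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Query the GPU name only when a GPU exists; the call fails on CPU-only machines.
device_label = torch.cuda.get_device_name(0) if device.type == "cuda" else "CPU"
print("Running on:", device_label)

# Hypothetical tensor, used only to show the move to the selected device.
x = torch.ones(2, 3).to(device)
print(x.device.type)
```

Keeping a single `device` variable lets the rest of the code run unchanged on either a GPU or a CPU runtime.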

## Loading the dataset