### Check Hardware & RAM availability:
Commands to check the GPU and memory allocated to the runtime.

In [1]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

Tue Oct 26 05:25:00 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.74       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
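
The `!nvidia-smi` call above relies on IPython shell syntax and reports only the GPU. Outside a notebook, a rough equivalent can be sketched with the standard library. The helper names `gpu_summary` and `total_ram_gb` are illustrative (not part of the original notebook), and the RAM query assumes a Linux/Unix runtime where `os.sysconf` exposes these keys.

```python
import os
import shutil
import subprocess


def gpu_summary():
    """Return the nvidia-smi report as text, or None if no usable GPU/driver is found."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


def total_ram_gb():
    """Total physical RAM in GB (assumption: Linux/Unix, via os.sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9


print(gpu_summary() or "No GPU detected; enable a GPU accelerator and re-run.")
print(f"Total RAM: {total_ram_gb():.1f} GB")
```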

### References:
* https://huggingface.co/
* https://arxiv.org/abs/1907.11692

### Install Required Libraries for Transformer Models:
* Pre-trained Transformer models are provided by the Hugging Face transformers library.
* Similarly, any dataset hosted on Hugging Face can be loaded through the datasets library.
* Finally, we will use ktrain, a high-level wrapper package, to simplify modelling and prediction.

In [2]:
!pip install ktrain
!pip install transformers
!pip install datasets

Collecting ktrain
  Downloading ktrain-0.28.2.tar.gz (25.3 MB)
     |████████████████████████████████| 25.3 MB 1.5 MB/s 
Collecting scikit-learn==0.23.2
  Downloading scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl (6.8 MB)
     |████████████████████████████████| 6.8 MB 33.2 MB/s 
Collecting langdetect
  Downloading langdetect-1.0.9.tar.gz (981 kB)
     |████████████████████████████████| 981 kB 53.2 MB/s 
Collecting cchardet
  Downloading cchardet-2.1.7-cp37-cp37m-manylinux2010_x86_64.whl (263 kB)
     |████████████████████████████████| 263 kB 61.1 MB/s 
Collecting syntok
  Downloading syntok-1.3.1.tar.gz (23 kB)
Collecting seqeval==0.0.19
  Downloading seqeval-0.0.19.tar.gz (30 kB)
Collecting transformers<=4.10.3,>=4.0.0
  Downloading transformers-4.10.3-py3-none-any.whl (2.8 MB)
     |████████████████████████████████| 2.8 MB 51.2 MB/s 
Collecting sentencepiece
  Downloading sentencepiece-0.1.96-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl 

### Import Libraries:

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import ktrain
from ktrain import text
import tensorflow as tf
from sklearn.model_selection import train_test_split
from datasets import list_datasets
from datasets import load_dataset
import timeit
import warnings

pd.set_option('display.max_columns', None)
warnings.simplefilter(action="ignore")

In [4]:
tf.__version__

'2.6.0'

### Load Dataset:

In [5]:
emotion_train = load_dataset('emotion', split='train')
emotion_val = load_dataset('emotion', split='validation')

Downloading:   0%|          | 0.00/1.66k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.61k [00:00<?, ?B/s]

Using custom data configuration default


Downloading and preparing dataset emotion/default (download: 1.97 MiB, generated: 2.07 MiB, post-processed: Unknown size, total: 4.05 MiB) to /root/.cache/huggingface/datasets/emotion/default/0.0.0/348f63ca8e27b3713b6c04d723efe6d824a56fb3d1449794716c0f0296072705...


Downloading:   0%|          | 0.00/1.66M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/204k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/207k [00:00<?, ?B/s]

0 examples [00:00, ? examples/s]

0 examples [00:00, ? examples/s]

0 examples [00:00, ? examples/s]

Dataset emotion downloaded and prepared to /root/.cache/huggingface/datasets/emotion/default/0.0.0/348f63ca8e27b3713b6c04d723efe6d824a56fb3d1449794716c0f0296072705. Subsequent calls will reuse this data.


Using custom data configuration default
Reusing dataset emotion (/root/.cache/huggingface/datasets/emotion/default/0.0.0/348f63ca8e27b3713b6c04d723efe6d824a56fb3d1449794716c0f0296072705)


In [6]:
print("\nTrain Dataset Features for Emotion: \n", emotion_train.features)
print("\nValidation Dataset Features for Emotion: \n", emotion_val.features)


Train Dataset Features for Emotion: 
 {'text': Value(dtype='string', id=None), 'label': ClassLabel(num_classes=6, names=['sadness', 'joy', 'love', 'anger', 'fear', 'surprise'], names_file=None, id=None)}

Validation Dataset Features for Emotion: 
 {'text': Value(dtype='string', id=None), 'label': ClassLabel(num_classes=6, names=['sadness', 'joy', 'love', 'anger', 'fear', 'surprise'], names_file=None, id=None)}


In [7]:
emotion_train_df = pd.DataFrame(data=emotion_train)
emotion_val_df = pd.DataFrame(data=emotion_val)

In [8]:
class_label_names = ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']
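
The order of `class_label_names` matches the `ClassLabel` names printed in the dataset features, so a list index is the integer label. As a small sketch of how that mapping can be used to inspect class balance, the snippet below counts labels by name; `sample_labels` is a hypothetical stand-in for the `label` column.

```python
from collections import Counter

class_label_names = ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']

# Hypothetical label ids standing in for emotion_train_df["label"]
sample_labels = [0, 1, 1, 3, 5, 1, 0, 2]

# Integer label -> class name, following the list order
id_to_name = dict(enumerate(class_label_names))

named_labels = [id_to_name[i] for i in sample_labels]
counts = Counter(named_labels)
print(counts.most_common())
```

With the real DataFrame, the same idea could be written as `emotion_train_df["label"].map(id_to_name).value_counts()`.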

### Split Train & Validation data:

In [9]:
# The Hugging Face validation split serves as the held-out set here
X_train = emotion_train_df["text"]
y_train = emotion_train_df["label"]
X_test = emotion_val_df["text"]
y_test = emotion_val_df["label"]

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(16000,) (16000,) (2000,) (2000,)


### Instantiating an ALBERT Instance:
Create an ALBERT Transformer instance by specifying the model name, the maximum token length, the class labels, and the batch size.

In [10]:
albert_transformer = text.Transformer('albert-base-v1', maxlen=512, classes=class_label_names, batch_size=6)

Downloading:   0%|          | 0.00/684 [00:00<?, ?B/s]

### Perform Data Preprocessing:

In [11]:
albert_train = albert_transformer.preprocess_train(X_train.to_list(), y_train.to_list())
albert_val = albert_transformer.preprocess_test(X_test.to_list(), y_test.to_list())

preprocessing train...
language: en
train sequence lengths:
	mean : 19
	95percentile : 41
	99percentile : 52


Downloading:   0%|          | 0.00/760k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.31M [00:00<?, ?B/s]

Is Multi-Label? False
preprocessing test...
language: en
test sequence lengths:
	mean : 19
	95percentile : 40
	99percentile : 52
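
The sequence-length statistics above (99th percentile of 52) sit well below the `maxlen=512` used in this notebook. The following is a minimal sketch of how such percentiles could be computed by hand to inform a `maxlen` choice. It counts whitespace-separated words, which only approximates the tokenizer's subword count; `percentile` is an illustrative helper and `texts` is a hypothetical stand-in for `X_train.to_list()`.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list, with q in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical texts standing in for X_train.to_list()
texts = [
    "i feel fine",
    "i am so happy today",
    "this makes me really very angry indeed",
    "i did not expect that at all to be honest",
]

# Whitespace word counts (an approximation of token counts)
lengths = [len(t.split()) for t in texts]

for q in (50, 95, 99):
    print(f"{q}th percentile: {percentile(lengths, q)}")
```

The results reported in this notebook were obtained with `maxlen=512`; a shorter value would be a separate experiment.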


### Wrap ALBERT in a ktrain Learner Object:
Since we are using ktrain as a high-level abstraction package, we need to wrap our model in a ktrain Learner object for training and evaluation.

In [12]:
albert_model = albert_transformer.get_classifier()

Downloading:   0%|          | 0.00/63.0M [00:00<?, ?B/s]

In [13]:
albert_learner_ins = ktrain.get_learner(model=albert_model,
                            train_data=albert_train,
                            val_data=albert_val,
                            batch_size=6)

### ALBERT Model Summary:

In [14]:
albert_learner_ins.model.summary()

Model: "tf_albert_for_sequence_classification"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
albert (TFAlbertMainLayer)   multiple                  11683584  
_________________________________________________________________
dropout_4 (Dropout)          multiple                  0         
_________________________________________________________________
classifier (Dense)           multiple                  4614      
=================================================================
Total params: 11,688,198
Trainable params: 11,688,198
Non-trainable params: 0
_________________________________________________________________
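
The parameter counts in the summary can be checked by hand. The sketch below assumes the classifier is a single dense layer on top of a 768-wide hidden state (the hidden size of albert-base); the other numbers are taken from the summary.

```python
hidden_size = 768      # assumption: hidden width of albert-base
num_classes = 6        # sadness, joy, love, anger, fear, surprise
encoder_params = 11683584  # "albert" row of the summary

# Dense layer: one weight per (hidden unit, class) pair plus one bias per class
classifier_params = hidden_size * num_classes + num_classes
total_params = encoder_params + classifier_params

print(classifier_params)  # 4614
print(total_params)       # 11688198
```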


### ALBERT Optimal Learning Rates:
ALBERT is a lighter variant of BERT that reduces parameters through cross-layer parameter sharing and factorized embeddings, so the batch sizes and learning rates established for fine-tuning BERT are a reasonable starting point:

* Batch Sizes => {16, 32}
* Learning Rates => {1e-5, 2e-5, 3e-5}

In this notebook we fine-tune with a learning rate of 2e-5 and a batch size of 6.

### Fine-Tuning ALBERT on Emotion Dataset:
We take the Emotion dataset along with the ALBERT model we created, define the learning rate and number of epochs, and start fine-tuning.

In [15]:
albert_fine_tuning_start = timeit.default_timer()
albert_learner_ins.fit_onecycle(lr=2e-5, epochs=3)
albert_fine_tuning_stop = timeit.default_timer()



begin training using onecycle policy with max lr of 2e-05...
Epoch 1/3
Epoch 2/3
Epoch 3/3


In [16]:
print("\nFine-Tuning time for AlBERT on Emotion dataset: \n", (albert_fine_tuning_stop - albert_fine_tuning_start)/60, " min")


Fine-Tuning time for AlBERT on Emotion dataset: 
 59.36922997883334  min


### Checking ALBERT performance metrics:

In [17]:
albert_validation_start = timeit.default_timer()
albert_learner_ins.validate()
albert_validation_stop = timeit.default_timer()

              precision    recall  f1-score   support

           0       0.97      0.96      0.96       550
           1       0.97      0.95      0.96       704
           2       0.88      0.92      0.90       178
           3       0.90      0.95      0.93       275
           4       0.86      0.91      0.88       212
           5       0.98      0.74      0.85        81

    accuracy                           0.94      2000
   macro avg       0.93      0.91      0.91      2000
weighted avg       0.94      0.94      0.94      2000



In [18]:
print("\nInference time for AlBERT on Emotion dataset: \n", (albert_validation_stop - albert_validation_start), " sec")


Inference time for AlBERT on Emotion dataset: 
 9.228876909000064  sec
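
To put the measured time in perspective, it can be converted to per-sample latency and throughput over the 2000 validation examples. Note that the timer wraps the whole `validate()` call, so the figure also includes building the classification report.

```python
elapsed_sec = 9.228876909000064  # measured time from the cell above
num_samples = 2000               # size of the validation split

ms_per_sample = elapsed_sec / num_samples * 1000
samples_per_sec = num_samples / elapsed_sec

print(f"{ms_per_sample:.2f} ms per sample")
print(f"{samples_per_sec:.0f} samples per second")
```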


In [19]:
albert_learner_ins.validate(class_names=class_label_names)

              precision    recall  f1-score   support

     sadness       0.97      0.96      0.96       550
         joy       0.97      0.95      0.96       704
        love       0.88      0.92      0.90       178
       anger       0.90      0.95      0.93       275
        fear       0.86      0.91      0.88       212
    surprise       0.98      0.74      0.85        81

    accuracy                           0.94      2000
   macro avg       0.93      0.91      0.91      2000
weighted avg       0.94      0.94      0.94      2000



array([[527,   3,   0,   8,  12,   0],
       [  4, 670,  22,   6,   1,   1],
       [  1,  13, 164,   0,   0,   0],
       [  8,   2,   0, 262,   3,   0],
       [  5,   0,   0,  15, 192,   0],
       [  1,   6,   0,   0,  14,  60]])
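
The per-class figures in the report can be reproduced from the confusion matrix, reading rows as true labels and columns as predicted labels (the row sums match the support column). A minimal sketch in plain Python:

```python
names = ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']
cm = [
    [527,   3,   0,   8,  12,   0],
    [  4, 670,  22,   6,   1,   1],
    [  1,  13, 164,   0,   0,   0],
    [  8,   2,   0, 262,   3,   0],
    [  5,   0,   0,  15, 192,   0],
    [  1,   6,   0,   0,  14,  60],
]
n = len(cm)

# Accuracy: correctly classified (diagonal) over all examples
accuracy = sum(cm[i][i] for i in range(n)) / sum(map(sum, cm))

metrics = {}
for i, name in enumerate(names):
    tp = cm[i][i]
    predicted = sum(cm[r][i] for r in range(n))  # column sum: predicted as this class
    actual = sum(cm[i])                          # row sum: truly this class
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    metrics[name] = (precision, recall, f1)
    print(f"{name:>8}  {precision:.2f}  {recall:.2f}  {f1:.2f}")

print(f"accuracy  {accuracy:.4f}")
```

For example, surprise has high precision but low recall because 14 of its 81 examples were predicted as fear.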

In [20]:
albert_learner_ins.view_top_losses(preproc=albert_transformer)

----------
id:177 | loss:7.49 | true:sadness | pred:joy)

----------
id:1509 | loss:7.2 | true:joy | pred:fear)

----------
id:1500 | loss:7.08 | true:anger | pred:sadness)

----------
id:1801 | loss:6.57 | true:love | pred:sadness)
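
These loss values can be read as probabilities. Assuming the reported loss is the per-example cross-entropy with natural logarithm (an assumption, not confirmed by the output), the probability the model assigned to the true class is `exp(-loss)`:

```python
import math

# Losses copied from the output above, keyed by example id
top_losses = {177: 7.49, 1509: 7.2, 1500: 7.08, 1801: 6.57}

# Assumption: loss = -ln(p_true), so p_true = exp(-loss)
true_class_prob = {idx: math.exp(-loss) for idx, loss in top_losses.items()}

for idx, p in true_class_prob.items():
    print(f"id {idx}: p(true class) ~ {p:.5f}")
```

Under that assumption, each of these examples received well under 1% probability for its labelled class, which makes them good candidates for checking label quality.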



### Saving ALBERT Model:

In [21]:
albert_predictor = ktrain.get_predictor(albert_learner_ins.model, preproc=albert_transformer)
albert_predictor.get_classes()

['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']

In [22]:
albert_predictor.save('/content/albert-predictor-on-emotion')

In [23]:
!zip -r /content/albert-predictor-on-emotion /content/albert-predictor-on-emotion

  adding: content/albert-predictor-on-emotion/ (stored 0%)
  adding: content/albert-predictor-on-emotion/tf_model.preproc (deflated 47%)
  adding: content/albert-predictor-on-emotion/tokenizer_config.json (deflated 46%)
  adding: content/albert-predictor-on-emotion/spiece.model (deflated 49%)
  adding: content/albert-predictor-on-emotion/config.json (deflated 56%)
  adding: content/albert-predictor-on-emotion/special_tokens_map.json (deflated 46%)
  adding: content/albert-predictor-on-emotion/tf_model.h5 (deflated 7%)
  adding: content/albert-predictor-on-emotion/tokenizer.json (deflated 60%)
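
Once saved, the predictor can be reloaded in a later session with `ktrain.load_predictor`. The sketch below wraps this in a hypothetical helper, `load_and_predict`, and only runs the prediction if the saved directory exists; the example sentence is made up for illustration.

```python
import os

PREDICTOR_DIR = '/content/albert-predictor-on-emotion'


def load_and_predict(path, texts):
    """Reload a saved ktrain predictor and classify a list of texts."""
    import ktrain  # imported lazily so the helper can be defined without ktrain installed
    predictor = ktrain.load_predictor(path)
    return predictor.predict(texts)


# Only attempt a prediction when the saved predictor is actually present
if os.path.isdir(PREDICTOR_DIR):
    print(load_and_predict(PREDICTOR_DIR, ['i am thrilled about the trip']))
```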
