### Check Hardware & RAM availability:
Check which GPU is available on the runtime and how much GPU memory it has.

In [1]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

Mon Oct 25 05:28:20 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.74       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
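The cell above only reports on the GPU. For the RAM half of the heading, a minimal sketch in plain Python follows. It assumes a Linux runtime that exposes /proc/meminfo; the helper names `parse_meminfo_gb` and `gpu_available` and the sample text are illustrative and not part of the original notebook.

```python
import shutil
import subprocess
from pathlib import Path


def parse_meminfo_gb(meminfo_text, key="MemTotal"):
    """Return the `key` entry of /proc/meminfo-style text in GB, or None if absent."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == key and rest.split():
            kib = int(rest.split()[0])  # /proc/meminfo reports values in kB (KiB)
            return kib * 1024 / 1e9
    return None


def gpu_available():
    """True if the nvidia-smi binary exists and exits successfully."""
    if shutil.which("nvidia-smi") is None:
        return False
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.returncode == 0


# Hypothetical sample so the parser can be tried without a Linux host.
sample_meminfo = "MemTotal:       13302920 kB\nMemFree:        10000000 kB\n"
print(f"sample total RAM: {parse_meminfo_gb(sample_meminfo):.1f} GB")

meminfo_path = Path("/proc/meminfo")
if meminfo_path.exists():
    print(f"runtime total RAM: {parse_meminfo_gb(meminfo_path.read_text()):.1f} GB")
print("GPU available:", gpu_available())
```

Unlike the `!nvidia-smi` cell, this version does not rely on IPython shell syntax, so it also runs as a plain script.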

### References:
* https://huggingface.co/
* https://arxiv.org/abs/1907.11692

### Install Required Libraries for Transformer Models:
* Pre-trained Transformer models are provided by the Hugging Face transformers library.
* Similarly, any dataset hosted on Hugging Face can be loaded through the datasets library.
* Finally, we use ktrain, a high-level wrapper package, to simplify modelling and prediction.

In [2]:
!pip install ktrain
!pip install transformers
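Results depend on the versions pip resolves, so it can help to record them right after installing. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages this notebook relies on.
for name in ("ktrain", "transformers", "tensorflow"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```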



### Import Libraries:

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import ktrain
from ktrain import text
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import timeit
import warnings

pd.set_option('display.max_columns', None)
warnings.simplefilter(action="ignore")

In [4]:
tf.__version__

'2.6.0'

### Load Dataset:

In [5]:
dbpedia_14_train = pd.read_csv("/content/dbpedia_14_train.csv")
dbpedia_14_test = pd.read_csv("/content/dbpedia_14_test.csv")

### Dataset Information:

In [6]:
dbpedia_14_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 560000 entries, 0 to 559999
Data columns (total 2 columns):
 #   Column   Non-Null Count   Dtype 
---  ------   --------------   ----- 
 0   Labels   560000 non-null  object
 1   Content  560000 non-null  object
dtypes: object(2)
memory usage: 8.5+ MB


In [7]:
dbpedia_14_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 70000 entries, 0 to 69999
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Labels   70000 non-null  object
 1   Content  70000 non-null  object
dtypes: object(2)
memory usage: 1.1+ MB


In [8]:
dbpedia_14_train.head()

   Labels   Content
0  Company  Abbott of Farnham E D Abbott Limited was a Br...
1  Company  Schwan-STABILO is a German maker of pens for ...
2  Company  Q-workshop is a Polish company located in Poz...
3  Company  Marvell Software Solutions Israel known as RA...
4  Company  Bergan Mercy Medical Center is a hospital loc...


In [9]:
dbpedia_14_test.head()

   Labels   Content
0  Company  TY KU /taɪkuː/ is an American alcoholic bever...
1  Company  OddLot Entertainment founded in 2001 by longt...
2  Company  Henkel AG & Company KGaA operates worldwide w...
3  Company  The GOAT Store (Games Of All Type Store) LLC ...
4  Company  RagWing Aircraft Designs (also called the Rag...


### Split Train & Validation data:

In [10]:
X_train = dbpedia_14_train["Content"]
y_train = dbpedia_14_train["Labels"]
X_test = dbpedia_14_test["Content"]
y_test = dbpedia_14_test["Labels"]

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(560000,) (560000,) (70000,) (70000,)
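With 560,000 training rows, a small class-balanced subset can be useful for a quick dry run before committing to the full fine-tuning job. The following is a sketch only and not something the original notebook does; `stratified_sample`, `per_class` and the toy data are illustrative names.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(texts, labels, per_class, seed=42):
    """Pick up to `per_class` examples of every label, keeping texts and labels aligned."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    chosen = []
    for label in sorted(by_label):
        indices = by_label[label]
        chosen.extend(rng.sample(indices, min(per_class, len(indices))))
    chosen.sort()  # preserve the original row order
    return [texts[i] for i in chosen], [labels[i] for i in chosen]


# Toy data standing in for X_train.to_list() / y_train.to_list().
toy_texts = [f"t{i}" for i in range(12)]
toy_labels = ["a"] * 6 + ["b"] * 4 + ["c"] * 2
sub_texts, sub_labels = stratified_sample(toy_texts, toy_labels, per_class=3)
print(Counter(sub_labels))
```

The returned lists could be passed to `preprocess_train` in place of the full data when only a smoke test is needed.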


### Instantiating an ALBERT Instance:
Create an ALBERT instance with the model name, the maximum token length, the class labels and the batch size.

In [11]:
class_names_list = ['Company',
 'EducationalInstitution',
 'Artist',
 'Athlete',
 'OfficeHolder',
 'MeanOfTransportation',
 'Building',
 'NaturalPlace',
 'Village',
 'Animal',
 'Plant',
 'Album',
 'Film',
 'WrittenWork']

In [12]:
albert_transformer = text.Transformer('albert-base-v1', maxlen=512, classes=class_names_list, batch_size=6)

### Perform Data Preprocessing:

In [13]:
dbpedia_ont_train = albert_transformer.preprocess_train(X_train.to_list(), y_train.to_list())
dbpedia_ont_val = albert_transformer.preprocess_test(X_test.to_list(), y_test.to_list())

preprocessing train...
language: en
train sequence lengths:
	mean : 46
	95percentile : 80
	99percentile : 86


Is Multi-Label? False
preprocessing test...
language: en
test sequence lengths:
	mean : 46
	95percentile : 80
	99percentile : 86
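The length statistics above (99th percentile of 86) are far below the maxlen=512 passed to the Transformer. One rough way to pick a tighter value is to take a high percentile of the lengths and add some headroom. The sketch below counts whitespace-separated words, which only approximates the tokenizer's subword count; `suggest_maxlen` and its `headroom` and `cap` parameters are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_maxlen(texts, pct=99, headroom=1.5, cap=512):
    """Percentile of word counts, scaled by `headroom` and limited to `cap`."""
    lengths = [len(text.split()) for text in texts]
    return min(cap, math.ceil(percentile_nearest_rank(lengths, pct) * headroom))


# Toy corpus with documents of 1..100 words.
toy_corpus = [" ".join(["w"] * n) for n in range(1, 101)]
print(suggest_maxlen(toy_corpus))
```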


### Compile ALBERT in a ktrain Learner Object:
Since we are using ktrain as a high-level abstraction package, we need to wrap our model in a ktrain Learner object for further computation.

In [14]:
albert_model = albert_transformer.get_classifier()

In [15]:
albert_learner_ins = ktrain.get_learner(model=albert_model,
                            train_data=dbpedia_ont_train,
                            val_data=dbpedia_ont_val,
                            batch_size=6)

### ALBERT Model Summary:

In [16]:
albert_learner_ins.model.summary()

Model: "tf_albert_for_sequence_classification"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
albert (TFAlbertMainLayer)   multiple                  11683584  
_________________________________________________________________
dropout_4 (Dropout)          multiple                  0         
_________________________________________________________________
classifier (Dense)           multiple                  10766     
Total params: 11,694,350
Trainable params: 11,694,350
Non-trainable params: 0
_________________________________________________________________
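The parameter counts in the summary can be checked by hand. The classifier is a single dense layer from the encoder's hidden vector to the 14 classes; the hidden size of 768 used below is an assumption based on the albert-base configuration.

```python
hidden_size = 768   # assumption: hidden size of albert-base
num_classes = 14    # DBpedia-14 classes

# Dense layer: one weight per (hidden unit, class) pair plus one bias per class.
classifier_params = hidden_size * num_classes + num_classes
encoder_params = 11_683_584  # "albert" row of the summary above
total_params = encoder_params + classifier_params

print(classifier_params)  # 10766
print(total_params)       # 11694350
```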


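### ALBERT Optimal Learning Rates:
ALBERT is a parameter-reduced variant of BERT (it relies on factorized embeddings and cross-layer parameter sharing rather than knowledge distillation), so the batch sizes and learning rates established for BERT-style fine-tuning are a reasonable starting point:

* Batch Sizes => {16, 32}
* Learning Rates => {1e-5, 2e-5, 3e-5}

For this notebook we fine-tune with a learning rate of 2e-5 and keep the batch size of 6 that was set when the Transformer and Learner were created.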

### Fine-Tuning ALBERT on DBpedia Ontology Dataset:
We take the DBpedia Ontology dataset and the ALBERT model created above, set the learning rate and the number of epochs, and start fine-tuning.

In [17]:
albert_fine_tuning_start = timeit.default_timer()
albert_learner_ins.fit_onecycle(lr=2e-5, epochs=1)
albert_fine_tuning_stop = timeit.default_timer()



begin training using onecycle policy with max lr of 2e-05...
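To give a feel for what a one-cycle policy does to the learning rate, here is a generic triangular schedule: a linear warm-up to the maximum rate over the first half of training, then a linear decay. This is a sketch of the general idea, not ktrain's implementation; the `start_div` and `end_div` ratios are assumptions chosen for illustration.

```python
def one_cycle_lr(step, total_steps, max_lr, start_div=10.0, end_div=1000.0):
    """Triangular one-cycle schedule: linear warm-up to max_lr, then linear decay."""
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    peak = total_steps // 2
    start_lr = max_lr / start_div
    end_lr = max_lr / end_div
    if step <= peak:
        frac = step / peak if peak else 1.0
        return start_lr + frac * (max_lr - start_lr)
    frac = (step - peak) / (total_steps - 1 - peak)
    return max_lr + frac * (end_lr - max_lr)


# Sample the schedule at the start, the midpoint and the end of a 101-step run.
for step in (0, 50, 100):
    print(step, f"{one_cycle_lr(step, 101, 2e-5):.2e}")
```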


In [18]:
print("\nFine-Tuning time for AlBERT on Dbpedia Ontology dataset: \n", (albert_fine_tuning_stop - albert_fine_tuning_start)/60, " min")


Fine-Tuning time for AlBERT on Dbpedia Ontology dataset: 
 711.36180129835  min


### Checking ALBERT performance metrics:

In [19]:
albert_validation_start = timeit.default_timer()
albert_learner_ins.validate()
albert_validation_stop = timeit.default_timer()

              precision    recall  f1-score   support

           0       0.99      1.00      1.00      5000
           1       1.00      1.00      1.00      5000
           2       0.99      0.99      0.99      5000
           3       1.00      1.00      1.00      5000
           4       0.98      0.98      0.98      5000
           5       0.98      0.97      0.98      5000
           6       0.99      0.99      0.99      5000
           7       1.00      0.99      0.99      5000
           8       0.99      1.00      0.99      5000
           9       1.00      1.00      1.00      5000
          10       0.99      0.99      0.99      5000
          11       1.00      1.00      1.00      5000
          12       1.00      1.00      1.00      5000
          13       0.99      0.99      0.99      5000

    accuracy                           0.99     70000
   macro avg       0.99      0.99      0.99     70000
weighted avg       0.99      0.99      0.99     70000



In [20]:
print("\nInference time for AlBERT on Dbpedia Ontology dataset: \n", (albert_validation_stop - albert_validation_start), " sec")


Inference time for AlBERT on Dbpedia Ontology dataset: 
 1457.0013864950015  sec
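The two wall-clock figures reported so far are easier to compare as throughput. The arithmetic below uses only numbers printed in this notebook (row counts, batch size and timings). Note that the training time likely also includes the validation pass run at the end of the epoch, so the training rate is a lower bound.

```python
import math

train_rows, val_rows = 560_000, 70_000
batch_size = 6
train_minutes = 711.36180129835    # fine-tuning time reported above
val_seconds = 1457.0013864950015   # validate() time reported above

steps_per_epoch = math.ceil(train_rows / batch_size)
train_seconds = train_minutes * 60

print(f"steps per epoch:     {steps_per_epoch}")
print(f"seconds per step:    {train_seconds / steps_per_epoch:.3f}")
print(f"training rows/sec:   {train_rows / train_seconds:.1f}")
print(f"validation rows/sec: {val_rows / val_seconds:.1f}")
```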


In [21]:
# The labels are strings, so the class indices follow the preprocessor's
# alphabetical class order (see get_classes()), not the order of class_names_list.
albert_learner_ins.validate(class_names=albert_transformer.get_classes())

                        precision    recall  f1-score   support

                 Album       0.99      1.00      1.00      5000
                Animal       1.00      1.00      1.00      5000
                Artist       0.99      0.99      0.99      5000
               Athlete       1.00      1.00      1.00      5000
              Building       0.98      0.98      0.98      5000
               Company       0.98      0.97      0.98      5000
EducationalInstitution       0.99      0.99      0.99      5000
                  Film       1.00      0.99      0.99      5000
  MeanOfTransportation       0.99      1.00      0.99      5000
          NaturalPlace       1.00      1.00      1.00      5000
          OfficeHolder       0.99      0.99      0.99      5000
                 Plant       1.00      1.00      1.00      5000
               Village       1.00      1.00      1.00      5000
           WrittenWork       0.99      0.99      0.99      5000

              accuracy                

array([[4983,    0,    6,    1,    0,    0,    0,    7,    0,    0,    0,
           0,    0,    3],
       [   0, 4990,    0,    0,    0,    1,    0,    0,    0,    0,    0,
           8,    0,    1],
       [   3,    0, 4932,    7,    0,    2,    0,    0,    0,    0,   53,
           0,    0,    3],
       [   1,    0,    2, 4990,    0,    0,    0,    0,    0,    0,    7,
           0,    0,    0],
       [   0,    0,    1,    0, 4918,   29,   23,    0,   10,   13,    0,
           0,    5,    1],
       [   3,    2,    9,    1,   42, 4868,   36,    0,   16,    1,    2,
           0,    1,   19],
       [   0,    0,    0,    0,   17,   29, 4947,    0,    0,    0,    5,
           0,    1,    1],
       [  15,    0,    1,    0,    0,    5,    0, 4959,    1,    0,    0,
           0,    0,   19],
       [   0,    0,    0,    0,    3,   16,    0,    1, 4977,    2,    0,
           0,    0,    1],
       [   0,    0,    0,    0,    9,    1,    0,    0,    0, 4982,    0,
           0,    
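Each row of the confusion matrix holds the 5,000 test samples of one true class, so per-class recall can be read straight off a row. The sketch below copies three of the fully visible rows and keys them by class index; computing precision would need complete columns, which the truncated output above does not provide. The helper names are illustrative.

```python
# Rows copied from the confusion matrix above, keyed by class index.
visible_rows = {
    0: [4983, 0, 6, 1, 0, 0, 0, 7, 0, 0, 0, 0, 0, 3],
    2: [3, 0, 4932, 7, 0, 2, 0, 0, 0, 0, 53, 0, 0, 3],
    5: [3, 2, 9, 1, 42, 4868, 36, 0, 16, 1, 2, 0, 1, 19],
}


def recall_from_row(class_index, row):
    """Recall = correct predictions on the diagonal / all true samples of the class."""
    return row[class_index] / sum(row)


def top_confusion(class_index, row):
    """Index and count of the most frequent wrong prediction for this class."""
    wrong = [(count, idx) for idx, count in enumerate(row) if idx != class_index]
    count, idx = max(wrong)
    return idx, count


for class_index, row in visible_rows.items():
    print(class_index, f"{recall_from_row(class_index, row):.4f}", top_confusion(class_index, row))
```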

In [22]:
albert_learner_ins.view_top_losses(preproc=albert_transformer)

----------
id:10373 | loss:9.57 | true:Artist | pred:WrittenWork)

----------
id:2594 | loss:9.28 | true:Company | pred:Album)

----------
id:21130 | loss:9.25 | true:OfficeHolder | pred:Building)

----------
id:19179 | loss:9.23 | true:Athlete | pred:OfficeHolder)
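The loss values above can be turned into something more intuitive. Assuming the reported loss is the per-sample categorical cross-entropy with natural logarithm (an assumption, not confirmed by the output), loss = -ln(p), where p is the probability the model assigned to the true class:

```python
import math


def true_class_probability(loss):
    """Invert cross-entropy loss = -ln(p) to recover the probability p of the true class."""
    return math.exp(-loss)


# Losses of the four top-loss samples listed above.
for loss in (9.57, 9.28, 9.25, 9.23):
    print(f"loss {loss:.2f} -> p(true class) ~ {true_class_probability(loss):.1e}")
```

Under that assumption, a loss above 9 means the model gave the labelled class a probability below about 0.01%, so these are confident mistakes (or possibly mislabelled samples) rather than borderline cases.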



### Saving ALBERT Model:

In [23]:
albert_predictor = ktrain.get_predictor(albert_learner_ins.model, preproc=albert_transformer)
albert_predictor.get_classes()

['Album',
 'Animal',
 'Artist',
 'Athlete',
 'Building',
 'Company',
 'EducationalInstitution',
 'Film',
 'MeanOfTransportation',
 'NaturalPlace',
 'OfficeHolder',
 'Plant',
 'Village',
 'WrittenWork']
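The list above is alphabetical rather than in the order of class_names_list, which suggests the string labels were encoded by sorted name, as scikit-learn's LabelEncoder does (that ktrain does this internally is an assumption here). A small sketch of the resulting index mapping, using plain `sorted`:

```python
class_names_list = [
    "Company", "EducationalInstitution", "Artist", "Athlete", "OfficeHolder",
    "MeanOfTransportation", "Building", "NaturalPlace", "Village", "Animal",
    "Plant", "Album", "Film", "WrittenWork",
]

# Sorted order reproduces the list returned by get_classes() above.
encoded_order = sorted(class_names_list)
index_of = {name: idx for idx, name in enumerate(encoded_order)}

print(encoded_order[0])     # Album
print(index_of["Company"])  # 5
```

Any report or confusion matrix indexed 0 to 13 should therefore be read against this alphabetical order.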

In [24]:
albert_predictor.save('/content/albert-predictor-on-dbpedia')
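Once saved, the predictor can be reloaded in a later session with `ktrain.load_predictor` and used on raw text. The wrapper below is a hypothetical convenience function; it assumes ktrain is installed and that the directory written by the cell above still exists.

```python
def load_and_classify(texts, predictor_dir="/content/albert-predictor-on-dbpedia"):
    """Reload the saved predictor and return one predicted class name per text."""
    import ktrain  # imported lazily so defining this helper does not require ktrain

    predictor = ktrain.load_predictor(predictor_dir)
    return predictor.predict(list(texts))
```

For example, `load_and_classify(["Q-workshop is a Polish company located in Poznan."])` would return a one-element list with the predicted label. For repeated calls it is cheaper to load the predictor once and reuse it.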

In [25]:
!zip -r /content/albert-predictor-on-dbpedia /content/albert-predictor-on-dbpedia

  adding: content/albert-predictor-on-dbpedia/ (stored 0%)
  adding: content/albert-predictor-on-dbpedia/special_tokens_map.json (deflated 46%)
  adding: content/albert-predictor-on-dbpedia/config.json (deflated 61%)
  adding: content/albert-predictor-on-dbpedia/tokenizer_config.json (deflated 46%)
  adding: content/albert-predictor-on-dbpedia/tf_model.h5 (deflated 7%)
  adding: content/albert-predictor-on-dbpedia/spiece.model (deflated 49%)
  adding: content/albert-predictor-on-dbpedia/tf_model.preproc (deflated 56%)
  adding: content/albert-predictor-on-dbpedia/tokenizer.json (deflated 60%)
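As the listing shows, the zip command stores every entry under content/, mirroring the absolute path. If an archive rooted at the predictor folder is preferred, a sketch with the standard library's `shutil.make_archive` is below; `zip_folder` and the demo directory are illustrative names.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_folder(folder, archive_stem):
    """Zip `folder` so that entries start with the folder's own name, not its full path."""
    folder = Path(folder)
    return shutil.make_archive(
        str(archive_stem), "zip", root_dir=str(folder.parent), base_dir=folder.name
    )


# Self-contained demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo-predictor"
    demo.mkdir()
    (demo / "config.json").write_text("{}")
    archive = zip_folder(demo, Path(tmp) / "demo-predictor")
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    print(names)
```

In the notebook this would be called as `zip_folder("/content/albert-predictor-on-dbpedia", "/content/albert-predictor-on-dbpedia")`.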
