# **TESTING FOR TRANSFORMERS**

The Transformers unit tests are split across two documents because the storage available on the free tier of Google Colab is not enough to run the code for all the pre-trained models.
The pre-trained models are therefore divided as follows:
- **Part 1**: BERT / BERT LARGE / ROBERTA / DISTILBERT
- **Part 2**: ALBERT / ALBERT XXLARGE / DEBERTA

## IMPORT PACKAGES

In [None]:
!pip install transformers
!pip install simpletransformers #specific for ALBERT transformer
import transformers
from transformers import TFAutoModel
import pandas as pd
from nltk.stem.wordnet import WordNetLemmatizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from keras.utils.np_utils import to_categorical
from keras.layers import Input, Dropout, Dense, Bidirectional, LSTM, GRU
from keras import Model, optimizers, callbacks
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import pydotplus
import keras
import re
import nltk
nltk.download('omw-1.4')
nltk.download('stopwords')
nltk.download('wordnet')
from nltk.corpus import stopwords  # stopwords
from nltk import word_tokenize, sent_tokenize  # tokenizing
from nltk.stem import PorterStemmer, LancasterStemmer  # using the Porter Stemmer and Lancaster Stemmer and others
from nltk.stem.snowball import SnowballStemmer
from nltk.stem import WordNetLemmatizer  # lemmatizer from WordNet
from google.colab import files

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.11.1 tokenizers-0.13.2 transformers-4.25.1
Looking in indexes: https://pypi.org/simple, https://us

[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...


## DOWNLOAD PRETRAINED MODELS FROM HUGGINGFACE
For each model:

* Tokenizer
* Pretrained model
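Because storage is the stated constraint, one option is to keep only the checkpoint names in a dictionary and load each tokenizer/model pair on demand. The sketch below is an assumption, not the notebook's approach: `CHECKPOINTS` and `load_pair` are hypothetical names, and `AutoTokenizer` stands in for the model-specific tokenizer classes.

```python
# Hypothetical registry: display name -> Hugging Face checkpoint id
# (the ids are the ones used in this notebook).
CHECKPOINTS = {
    "BERT": "bert-base-uncased",
    "BERT LARGE": "bert-large-cased",
    "ROBERTA": "roberta-large",
    "DISTILBERT": "distilbert-base-uncased",
}


def load_pair(name):
    """Download and return (tokenizer, model) for a single checkpoint."""
    # Imported lazily so the registry can be inspected without the library loaded.
    from transformers import AutoTokenizer, TFAutoModel

    checkpoint = CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = TFAutoModel.from_pretrained(checkpoint, from_pt=True)
    return tokenizer, model
```

Calling `load_pair("DISTILBERT")` inside the training loop would load one pair at a time instead of holding all four in memory.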



In [None]:
pre_models = {
    "BERT":( transformers.BertTokenizer.from_pretrained('bert-base-uncased'), 
            TFAutoModel.from_pretrained('bert-base-uncased', from_pt=True)
            ),
    "BERT LARGE": ( transformers.BertTokenizer.from_pretrained('bert-large-cased'),
                   TFAutoModel.from_pretrained('bert-large-cased', from_pt=True)
                   ),
    "ROBERTA": ( transformers.RobertaTokenizer.from_pretrained('roberta-large'),
                TFAutoModel.from_pretrained('roberta-large', from_pt=True)
                ),
    "DISTILBERT": ( transformers.DistilBertTokenizer.from_pretrained('distilbert-base-uncased'),
                   TFAutoModel.from_pretrained('distilbert-base-uncased', from_pt=True)
                   )
}

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFBertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing TFBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFBertModel: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing TFBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFRobertaModel: ['lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.bias', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing TFRobertaModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFRobertaModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFRobertaModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaModel for predictions without further training.


Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFDistilBertModel: ['vocab_layer_norm.bias', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_projector.weight', 'vocab_layer_norm.weight', 'vocab_projector.bias']
- This IS expected if you are initializing TFDistilBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFDistilBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertModel for predictions without further training.


In [None]:
def preprocessing(text):
    # Lowercase, then replace every character outside [0-9a-z_+\-*] with a space.
    text = text.lower()
    text = re.sub(r'[^0-9a-z_+\-*]', ' ', text).strip()
    # Splitting and re-joining collapses runs of whitespace into single spaces.
    return ' '.join(text.split())
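To make the effect of the cleaning step concrete, here is a small standalone sketch that applies the same regular expression to an invented headline. The function name `clean` and the sample text are illustrative only.

```python
import re


def clean(text):
    # Same pattern as in preprocessing(): keep digits, lowercase letters, _ + - *
    text = text.lower()
    text = re.sub(r'[^0-9a-z_+\-*]', ' ', text).strip()
    return ' '.join(text.split())


headline = "Apple's Q3 profit: +12% (record-high)!"
cleaned = clean(headline)
print(cleaned)  # apple s q3 profit +12 record-high
```

Apostrophes and other punctuation become word breaks, which is why `Apple's` ends up as two tokens.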

In [None]:
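def encode(text, tokenizer, max_len):
    # Tokenize a batch of strings; each one is padded or truncated to exactly max_len token IDs.
    code = tokenizer.batch_encode_plus(batch_text_or_text_pairs=list(text),
                                       max_length=max_len,
                                       return_tensors='np',  # NumPy arrays, so PyTorch is not needed here
                                       return_token_type_ids=False,
                                       truncation=True,
                                       padding='max_length')
    return np.array(code['input_ids'])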
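The combination of `truncation=True` and `padding='max_length'` gives every sequence the same length. The sketch below imitates that behaviour on plain lists of invented token IDs. It is a simplification: real tokenizers also keep the closing special token when truncating, which `pad_or_truncate` (a hypothetical helper) does not do.

```python
def pad_or_truncate(ids, max_len, pad_id=0):
    """Return exactly max_len ids: cut long sequences, right-pad short ones."""
    ids = list(ids)[:max_len]
    return ids + [pad_id] * (max_len - len(ids))


# Invented token ids, for illustration only.
batch = [[101, 7592, 102], [101, 1, 2, 3, 4, 5, 102]]
encoded = [pad_or_truncate(seq, max_len=5) for seq in batch]
print(encoded)  # [[101, 7592, 102, 0, 0], [101, 1, 2, 3, 4]]
```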

## IMPORT DATA FROM FILES
- The provided sample is split into train and test sets, with 25% of the rows held out for testing.

In [None]:
uploaded = files.upload()
data = pd.read_excel('/content/sample_400_Data&Headers.xlsx')
train_data, test_data, train_label, test_label = train_test_split(data['headline'],data['labels'] ,random_state=104,test_size=0.25, shuffle=True)

Saving sample_400_Data&Headers.xlsx to sample_400_Data&Headers.xlsx
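The split sizes can be worked out by hand. The sketch below assumes the file holds 400 rows (inferred from its name, not confirmed in the notebook) and mirrors how scikit-learn rounds a fractional `test_size` up to a whole number of rows. `split_sizes` is a hypothetical helper.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_train, n_test = split_sizes(400, 0.25)  # 400 rows is an assumption
print(n_train, n_test)  # 300 100
```

Under that assumption the test set has 100 rows, which matches the length checked by the unit tests further down.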


## DATA PREPARATION

In [None]:
train_data = train_data.apply(lambda x: preprocessing(str(x)))  # preprocessing() already lowercases
test_data = test_data.apply(lambda x: preprocessing(str(x)))

In [None]:
max_len = 20

In [None]:
label_encoder = LabelEncoder()
num_label = train_label.nunique()
# Fit the encoder on the training labels only, then reuse the same mapping for the test labels.
train_label_transformed = label_encoder.fit_transform(train_label)
test_label_transformed = label_encoder.transform(test_label)
train_label_encoded = to_categorical(train_label_transformed, num_classes=num_label, dtype='int32')
test_label_encoded = to_categorical(test_label_transformed, num_classes=num_label, dtype='int32')
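One thing worth checking in a cell like this is that train and test labels go through the same label-to-index mapping. The pure-Python sketch below, with invented label names and hypothetical helpers, shows how a mapping fitted separately on the test labels can assign different indices to the same class.

```python
def fit_label_mapping(labels):
    """Map each distinct label to an index, in sorted order (as LabelEncoder does)."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


def one_hot(indices, num_classes):
    return [[1 if i == idx else 0 for i in range(num_classes)] for idx in indices]


train_labels = ["sports", "politics", "tech", "sports"]  # invented labels
test_labels = ["tech", "sports"]

train_mapping = fit_label_mapping(train_labels)
shared_ids = [train_mapping[label] for label in test_labels]   # [2, 1]

refit_mapping = fit_label_mapping(test_labels)                 # no "politics" here
refit_ids = [refit_mapping[label] for label in test_labels]    # [1, 0]

print(one_hot(shared_ids, num_classes=len(train_mapping)))
# [[0, 0, 1], [0, 1, 0]]
```

With the refitted mapping, `tech` moves from index 2 to index 1, so the one-hot targets no longer line up with what the model was trained on.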

In [None]:
def build_model(transformer, loss='categorical_crossentropy', max_len=max_len):
    input_word_ids = Input(shape=(max_len,), dtype='int32', name="input_word_ids")
    print(input_word_ids)
    sequence_output = transformer(input_word_ids)[0]
    print(sequence_output)
    drop = Dropout(0.5, name='Dropout_1')(sequence_output)
    lstm = Bidirectional(LSTM(100, name='LSTM'))(drop)
    layer = Dropout(0.5, name='Dropout_2')(lstm)
    out = Dense(num_label, activation='softmax', name='Dense')(layer)  # softmax: the default categorical_crossentropy expects probabilities

    model = Model(inputs=input_word_ids, outputs=out)
    model.compile(optimizer=tf.optimizers.Adam(), loss=loss, metrics=['accuracy'])
    return model
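The tensor shapes through `build_model` can be traced without building anything. The sketch below is a rough illustration: `trace_shapes` is a hypothetical helper, the hidden size of 768 is the one printed for the base checkpoints in the training output, and the number of labels (4) is an invented placeholder.

```python
def trace_shapes(max_len, hidden_size, lstm_units, num_label):
    """List (layer name, output shape) pairs; None stands for the batch dimension."""
    return [
        ("input_word_ids", (None, max_len)),
        ("transformer", (None, max_len, hidden_size)),   # one vector per token
        ("Dropout_1", (None, max_len, hidden_size)),
        ("Bidirectional(LSTM)", (None, 2 * lstm_units)),  # forward + backward states concatenated
        ("Dropout_2", (None, 2 * lstm_units)),
        ("Dense", (None, num_label)),                     # one score per class
    ]


shapes = trace_shapes(max_len=20, hidden_size=768, lstm_units=100, num_label=4)
for name, shape in shapes:
    print(f"{name:22s}{shape}")
```

For the large checkpoints the only change is `hidden_size=1024`; the bidirectional LSTM still reduces each sequence to a single 200-dimensional vector.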

In [None]:
BATCH_SIZE = 32
num_epochs = 4

In [None]:
def create_datasets(train_data_encoded, train_label_encoded, test_data_encoded):
  AUTO = tf.data.experimental.AUTOTUNE
  train_dataset = (
      tf.data.Dataset
      .from_tensor_slices((train_data_encoded, tf.convert_to_tensor(train_label_encoded)))
      .repeat()
      .shuffle(2048)
      .batch(BATCH_SIZE)
      .prefetch(AUTO)
  )
  test_dataset = (
      tf.data.Dataset
      .from_tensor_slices(test_data_encoded)
      .batch(BATCH_SIZE)
  )
  n_steps = train_data_encoded.shape[0] // BATCH_SIZE  # batches per epoch; required because the training dataset repeats indefinitely
  return train_dataset, test_dataset, n_steps
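Because the training dataset calls `.repeat()`, it never ends on its own, so `fit` needs an explicit number of steps per epoch. The sketch below imitates this with `itertools`, assuming 300 training rows (75% of an assumed 400-row sample) and leaving shuffling out for brevity.

```python
from itertools import cycle, islice

BATCH_SIZE = 32
n_train = 300  # assumption: 75% of a 400-row sample

n_steps = n_train // BATCH_SIZE          # full batches per epoch
seen_per_epoch = n_steps * BATCH_SIZE    # samples consumed in one epoch

# cycle() stands in for Dataset.repeat(): an endless stream of sample indices.
stream = cycle(range(n_train))
epoch = [list(islice(stream, BATCH_SIZE)) for _ in range(n_steps)]

print(n_steps, seen_per_epoch)  # 9 288
```

Each epoch therefore covers 288 samples. The remaining 12 are not dropped but roll over into the next epoch because the stream keeps going.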

## SETTING TEST PARAMETERS & EXECUTING TEST

In [None]:
import unittest

class Test(unittest.TestCase):
  def __init__(self, testName, output):
    super().__init__(testName)
    self.output = output  # value under test: a type, a length, or a Keras model

  def test_type_prediction(self):
    # Here self.output is type(model.predict(...)).
    error_message = "The model output is not a NumPy array"
    self.assertEqual(self.output, np.ndarray, error_message)

  def test_length_predictions(self):
    # Here self.output is len(model.predict(...)); 100 is the size of the test split.
    error_message = "The number of predictions is not the expected one"
    self.assertEqual(self.output, 100, error_message)

  def test_ANN_structure(self):
    # Here self.output is the compiled Keras model.
    layer_structure = []
    for layer in self.output.layers:
      # Transformer layers differ per checkpoint, so they are matched by module path only.
      if 'transformers.models' in str(type(layer)):
        layer_structure.append('transformers.models')
      else:
        layer_structure.append(type(layer))

    expected_layers = [
        keras.layers.InputLayer,
        'transformers.models',
        keras.layers.Dropout,
        keras.layers.Bidirectional,
        keras.layers.Dropout,
        keras.layers.Dense
    ]
    error_message = "The ANN structure is not the expected one"
    self.assertEqual(layer_structure, expected_layers, error_message)

In [None]:
output = []
models = []
for tokenizer, transformer in pre_models.values():
  model = build_model(transformer, max_len=max_len)
  train_data_encoded = encode(train_data.astype('str'), tokenizer, max_len=max_len)
  test_data_encoded = encode(test_data.astype('str'), tokenizer, max_len=max_len)
  train_dataset, test_dataset, n_steps = create_datasets(train_data_encoded, train_label_encoded, test_data_encoded)

  model.fit(train_dataset, steps_per_epoch=n_steps, epochs=num_epochs)
  predictions = model.predict(test_data_encoded)  # predict once and reuse the result
  output.append((type(predictions), len(predictions)))
  models.append(model)

KerasTensor(type_spec=TensorSpec(shape=(None, 20), dtype=tf.int32, name='input_word_ids'), name='input_word_ids', description="created by layer 'input_word_ids'")
KerasTensor(type_spec=TensorSpec(shape=(None, 20, 768), dtype=tf.float32, name=None), name='tf_bert_model/bert/encoder/layer_._11/output/LayerNorm/batchnorm/add_1:0', description="created by layer 'tf_bert_model'")
Epoch 1/4




Epoch 2/4
Epoch 3/4
Epoch 4/4
KerasTensor(type_spec=TensorSpec(shape=(None, 20), dtype=tf.int32, name='input_word_ids'), name='input_word_ids', description="created by layer 'input_word_ids'")
KerasTensor(type_spec=TensorSpec(shape=(None, 20, 1024), dtype=tf.float32, name=None), name='tf_bert_model_1/bert/encoder/layer_._23/output/LayerNorm/batchnorm/add_1:0', description="created by layer 'tf_bert_model_1'")
Epoch 1/4




Epoch 2/4
Epoch 3/4
Epoch 4/4
KerasTensor(type_spec=TensorSpec(shape=(None, 20), dtype=tf.int32, name='input_word_ids'), name='input_word_ids', description="created by layer 'input_word_ids'")
KerasTensor(type_spec=TensorSpec(shape=(None, 20, 1024), dtype=tf.float32, name=None), name='tf_roberta_model/roberta/encoder/layer_._23/output/LayerNorm/batchnorm/add_1:0', description="created by layer 'tf_roberta_model'")
Epoch 1/4




Epoch 2/4
Epoch 3/4
Epoch 4/4
KerasTensor(type_spec=TensorSpec(shape=(None, 20), dtype=tf.int32, name='input_word_ids'), name='input_word_ids', description="created by layer 'input_word_ids'")
KerasTensor(type_spec=TensorSpec(shape=(None, 20, 768), dtype=tf.float32, name=None), name='tf_distil_bert_model/distilbert/transformer/layer_._5/output_layer_norm/batchnorm/add_1:0', description="created by layer 'tf_distil_bert_model'")
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


In [None]:
suite = unittest.TestSuite()
# One (type, length) pair and one trained model per pre-trained checkpoint.
for (output_type, output_len), model in zip(output, models):
  suite.addTest(Test('test_type_prediction', output_type))
  suite.addTest(Test('test_length_predictions', output_len))
  suite.addTest(Test('test_ANN_structure', model))
unittest.TextTestRunner(verbosity=3).run(suite)

test_type_prediction (__main__.Test) ... ok
test_length_predictions (__main__.Test) ... ok
test_ANN_structure (__main__.Test) ... ok
test_type_prediction (__main__.Test) ... ok
test_length_predictions (__main__.Test) ... ok
test_ANN_structure (__main__.Test) ... ok
test_type_prediction (__main__.Test) ... ok
test_length_predictions (__main__.Test) ... ok
test_ANN_structure (__main__.Test) ... ok
test_type_prediction (__main__.Test) ... ok
test_length_predictions (__main__.Test) ... ok
test_ANN_structure (__main__.Test) ... ok

----------------------------------------------------------------------
Ran 12 tests in 0.059s

OK


<unittest.runner.TextTestResult run=12 errors=0 failures=0>