## BERT baselines

Here, we compute baseline performance for the pretrained DistilBert and BERT models.
For now, we do this through the ```test_only``` mode of the ```Trainer``` class, but it would be more elegant to introduce a dedicated ```Tester``` class.
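
To make the idea of a dedicated ```Tester``` concrete, here is a minimal, framework-agnostic sketch. It is hypothetical and not part of the `reddit` package: the class name, the `step_fn` argument, and the returned dictionary layout are all assumptions made for illustration. It only shows the shape of an evaluation-only loop that averages each tracked variable over the test steps.

```python
from statistics import fmean
from typing import Callable, Dict, Iterable, List, Optional


class Tester:
    """Hypothetical evaluation-only loop (not part of the reddit package)."""

    def __init__(self, step_fn: Callable[[object], Dict[str, float]],
                 test_steps: Optional[int] = None):
        # step_fn maps one batch to a dict of scalars,
        # e.g. {'test_losses': ..., 'test_metrics': ...}
        self.step_fn = step_fn
        self.test_steps = test_steps

    def test(self, dataset: Iterable) -> Dict[str, float]:
        history: Dict[str, List[float]] = {}
        for step, batch in enumerate(dataset):
            # Stop early if a step budget was given
            if self.test_steps is not None and step >= self.test_steps:
                break
            for name, value in self.step_fn(batch).items():
                history.setdefault(name, []).append(float(value))
        # Report the mean of every tracked variable over all steps
        return {name: fmean(values) for name, values in history.items()}


# Toy usage: each "batch" is just a (loss, metric) pair
def toy_step(batch):
    loss, metric = batch
    return {'test_losses': loss, 'test_metrics': metric}


results = Tester(toy_step).test([(2.0, 1.0), (1.0, 0.0)])
print(results)
```

In a real implementation, `step_fn` would wrap the model's forward pass and the loss computation, and no optimizer or `steps_per_epoch` argument would be needed.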

In [1]:
from reddit.utils import (load_tfrecord, pad_and_stack,
                          split_dataset)
from reddit.models import BatchTransformer
from reddit.losses import TripletLossBase
from reddit.training import Trainer
from transformers import TFDistilBertModel, TFBertModel
import glob
from pathlib import Path
import tensorflow as tf
from official.nlp.optimization import create_optimizer

In [2]:
METRICS_PATH = Path('..') / 'logs' / 'baselines'
METRICS_PATH.mkdir(parents=True, exist_ok=True)

### Strategy

In [3]:
gpus = tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(gpus))

Num GPUs Available:  4


In [4]:
try:
    tf.config.experimental.set_visible_devices(gpus[:2], 'GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
except RuntimeError as e:
    print(e)

4 Physical GPUs, 2 Logical GPU


In [5]:
strategy = tf.distribute.MirroredStrategy(devices=logical_gpus)

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')


### Dataset
Load dataset, pad to desired length, batch and distribute

In [6]:
ds_params = {'n_anchor': 10,
             'n_pos': 1,
             'n_neg': 1,
             'batch_size': 2}

In [7]:
fs = glob.glob('../reddit/data/datasets/triplet/1pos_1neg_random/*')
fs = fs[10:60]
ds = load_tfrecord(fs)
ds = pad_and_stack(ds, pad_to=[ds_params['n_anchor'], 
                               ds_params['n_pos'],
                               ds_params['n_neg']]).batch(ds_params['batch_size'], drop_remainder=True)

In [8]:
# Count the batches by iterating once over the batched dataset (slow)
n_ex = 0
for _ in ds:
    n_ex += 1

In [8]:
# Hard-coded batch count, so the dataset need not be iterated again
n_ex = 248802
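
Note that the dataset is batched before it is counted, so `n_ex` is a number of batches (test steps), not a number of individual examples. The sketch below shows the arithmetic that relates the two when `drop_remainder=True` discards the last incomplete batch. The helper name `num_batches` and the example sizes are hypothetical, chosen only for illustration.

```python
def num_batches(n_examples: int, batch_size: int,
                drop_remainder: bool = True) -> int:
    """Number of batches produced when grouping n_examples into batches."""
    full, rest = divmod(n_examples, batch_size)
    # With drop_remainder=True, a trailing incomplete batch is discarded
    if drop_remainder or rest == 0:
        return full
    return full + 1


# Five examples in batches of two: the fifth example is dropped
steps_dropped = num_batches(5, 2, drop_remainder=True)
steps_kept = num_batches(5, 2, drop_remainder=False)
print(steps_dropped, steps_kept)
```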

### Initialize parameters for Trainer

In [10]:
train_params = {'weights': 'distilbert-base-uncased',
                'model': TFDistilBertModel,
                'loss_margin': 1,
                'test_steps': n_ex,
                'test_vars': ['test_losses', 'test_metrics',
                              'test_dist_pos', 'test_dist_neg',
                              'test_dist_anchor']}

### Initialize model, loss, and trainer object

This is a hacky workaround to keep TensorFlow from raising an out-of-memory (OOM) error.

In [11]:
%%capture
from transformers import DistilBertModel
DistilBertModel.from_pretrained('distilbert-base-uncased')

This is the actual initialization of the model and loss used for evaluation.

In [12]:
%%capture
with strategy.scope():
    model = BatchTransformer(train_params['model'], 
                             train_params['weights'])
    loss = TripletLossBase(train_params['loss_margin'],
                           n_pos=ds_params['n_pos'],
                           n_neg=ds_params['n_neg'])

Some layers from the model checkpoint at distilbert-base-uncased were not used when initializing TFDistilBertModel: ['vocab_projector', 'vocab_transform', 'activation_13', 'vocab_layer_norm']
- This IS expected if you are initializing TFDistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFDistilBertModel were initialized from the model checkpoint at distilbert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertModel for predictions without further training.
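
The implementation of `TripletLossBase` is not shown in this notebook. As a rough guide to what a triplet loss with a margin computes, here is a minimal sketch of the standard formulation, max(0, d(a, p) - d(a, n) + margin). The use of Euclidean distance, the function names, and the toy vectors are assumptions; the actual class may aggregate the encodings or measure distance differently.

```python
import math
from typing import Sequence, Tuple


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two vectors (assumed distance function)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 1.0) -> Tuple[float, float, float]:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    dist_pos = euclidean(anchor, positive)
    dist_neg = euclidean(anchor, negative)
    return max(0.0, dist_pos - dist_neg + margin), dist_pos, dist_neg


anchor = [0.0, 0.0]
positive = [3.0, 4.0]   # distance 5 from the anchor
negative = [6.0, 8.0]   # distance 10 from the anchor

# Easy triplet: the positive is closer than the negative by more than the margin
loss, d_pos, d_neg = triplet_loss(anchor, positive, negative, margin=1.0)

# Hard triplet: swap the roles, so the "positive" is farther away
hard_loss, _, _ = triplet_loss(anchor, negative, positive, margin=1.0)
print(loss, hard_loss)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, and grows linearly otherwise.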


In [16]:
trainer = Trainer(model=model,
                  loss_object=loss,
                  strategy=strategy, 
                  steps_per_epoch=train_params['test_steps'],
                  test_steps=train_params['test_steps'],
                  test_vars=train_params['test_vars'], 
                  log_path=str(METRICS_PATH),
                  distributed=True)

## Pretrained DistilBert baseline

In [17]:
trainer.train(dataset_test=ds, 
              test_only=True)

Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
Mean test loss: 1.7364810705184937; Mean test metric: 0.7353317141532898


## Pretrained BERT baseline

In [18]:
train_params['model'] = TFBertModel
train_params['weights'] = 'bert-base-uncased'
with strategy.scope():
    model = BatchTransformer(train_params['model'], 
                             train_params['weights'])

Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


In [19]:
trainer_bert = Trainer(model=model,
                       loss_object=loss,
                       strategy=strategy, 
                       steps_per_epoch=train_params['test_steps'],
                       test_steps=train_params['test_steps'],
                       test_vars=train_params['test_vars'], 
                       log_path=str(METRICS_PATH),
                       distributed=True)

In [20]:
trainer_bert.train(dataset_test=ds, 
                   test_only=True)

Mean test loss: 5.481191158294678; Mean test metric: 0.7073476314544678
