# Test an encoder-only model built for a classification task

__Objective:__ test the implementation of an encoder-only model with a classification head (a dense layer).

In [None]:
import sys
from transformers import AutoConfig, AutoTokenizer
import tensorflow as tf
from tensorflow.keras import Input, Model

sys.path.append('../modules/')

from encoder_text_classifier import TransformerForSequenceClassification

%load_ext autoreload
%autoreload 2

## Config

Load the config for the model. We point to a pretrained checkpoint only to get the values of the hyperparameters.

In [None]:
model_ckpt = 'distilbert-base-uncased'

config = AutoConfig.from_pretrained(model_ckpt)

## Tokenizer

Instantiate a tokenizer.

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)

In [None]:
text = [
    "Can't hear what he's saying",
    "when he's talking in his sleep",
    "He finally found the sound",
    "but he's in too deep"
]

token_ids = tokenizer(
    text,
    padding=True,
    return_tensors='tf'
)['input_ids']

token_ids

Check converting IDs back to tokens.

In [None]:
tokenizer.convert_ids_to_tokens(token_ids[0])

## Test the forward pass of the classifier

Instantiate the classifier.

In [None]:
# Add missing parameters (or modify some) in the config.
config.hidden_dropout_prob = 0.1
config.num_labels = 3

encoder_classifier = TransformerForSequenceClassification(config=config)

Test the forward pass of the classifier.

__Note:__ because of how it's implemented, the model returns, for each sample in the batch, the unnormalized logits for each possible class (we should apply softmax to get probabilities over the classes).

In [None]:
encoder_classifier(token_ids)
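
As noted above, the classifier outputs unnormalized logits. The snippet below is a minimal sketch of how such logits map to class probabilities. It uses NumPy and made-up logit values (an assumption, not the model's real output) so that it runs without the model. On the actual output tensor, `tf.nn.softmax(logits, axis=-1)` should do the same thing.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along `axis`."""
    # Subtracting the row-wise max does not change the result
    # but avoids overflow in exp.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


# Hypothetical logits for a batch of 4 sentences and 3 classes
# (same shape as the classifier output, values invented for illustration).
logits = np.array([
    [0.2, -1.3, 0.7],
    [1.5, 0.1, -0.4],
    [0.0, 0.0, 0.0],
    [-2.0, 3.0, 0.5],
])

probs = softmax(logits)                  # each row sums to 1
predicted_class = probs.argmax(axis=-1)  # most likely class per sample
```

Softmax preserves the ordering of the logits, so `argmax` over the logits gives the same predicted class as `argmax` over the probabilities.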

## Build a model from the classifier layer

`TransformerForSequenceClassification` is a subclass of Keras' `Layer`, so it's a layer object, not a model. To fit it we have to build a Keras model from it, which can be done with the functional API by specifying the inputs (a placeholder with the right shape) and the outputs.

__Notes:__
- Keras' `Input` object needs to be passed a shape that __does not include the batch shape__.
- The loss function (`mse`) is chosen only to test that the code runs. For the training to make sense we should map the logits to probabilities (with `softmax`) and then use categorical cross-entropy. The targets are fake as well.
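
To make the second note concrete, here is a minimal sketch of categorical cross-entropy computed directly from logits with integer class labels. It is written in NumPy, and the logits and labels are invented for illustration; it is not the loss used in this notebook.

```python
import numpy as np


def sparse_categorical_crossentropy_from_logits(logits, labels):
    """Mean cross-entropy between integer labels and softmax(logits)."""
    # Log-softmax computed in a numerically stable way.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the true class for each sample.
    per_sample = -log_probs[np.arange(len(labels)), labels]
    return per_sample.mean()


# Hypothetical batch: 4 samples, 3 classes.
labels = np.array([0, 2, 1, 1])

# Uniform logits: every class gets probability 1/3, so the loss is log(3).
uniform_loss = sparse_categorical_crossentropy_from_logits(
    np.zeros((4, 3)), labels
)

# Logits that strongly favour the true class: the loss is close to 0.
confident_logits = np.full((4, 3), -10.0)
confident_logits[np.arange(4), labels] = 10.0
confident_loss = sparse_categorical_crossentropy_from_logits(
    confident_logits, labels
)
```

In Keras, the corresponding choice would be to pass `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)` as `loss` in `model.compile` and to use integer class IDs of shape `(batch_size,)` as targets. With `from_logits=True` the softmax is applied inside the loss, so the layer can keep returning logits.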

In [None]:
inputs = Input(
    shape=(token_ids.shape[-1],)
)

outputs = encoder_classifier(inputs)

model = Model(
    inputs=inputs,
    outputs=outputs
)

In [None]:
model.compile(
    optimizer="rmsprop",
    loss='mse',
)

In [None]:
print('Number of parameters in the model:', model.count_params())

Generate fake targets and fit the model.

In [None]:
fake_targets = tf.ones_like(model(token_ids))

model.fit(
    x=token_ids,
    y=fake_targets,
    epochs=1
)

In [None]:
model(token_ids)