In [1]:
import numpy as np
import pandas as pd

import tensorflow as tf

import coral_ordinal as coral
NUM_CLASSES = 3

# Optional environment setup (uncomment to run):
# !conda install emoji==0.6.0
# !conda update transformers
# !conda update datasets

In [2]:
tf.keras.backend.clear_session()

physical_devices = tf.config.list_physical_devices('GPU') 
for device in physical_devices:
    tf.config.experimental.set_memory_growth(device, True)

The Hugging Face transformers library...

Let's start by loading DistilBERT fine-tuned on SST-2, the model that the Hugging Face pipeline function uses by default for
sentiment analysis. This binary classification model evaluates a line of text and returns the predicted sentiment label along
with a score reflecting the model's confidence. We can try it out without any fine-tuning by passing a few famous movie
lines to the pipeline.

In [3]:
from transformers import pipeline

cf_default = pipeline("sentiment-analysis")
cf_default("Today, I consider myself the luckiest man on the face of the earth.")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9997339844703674}]

In [4]:
cf_default("I'm as mad as hell, and I'm not going to take this anymore!")

[{'label': 'NEGATIVE', 'score': 0.9993259906768799}]
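
To make the label and score in these outputs more concrete, here is a minimal sketch of how a two-class classifier can turn raw logits into a prediction, assuming the score is the softmax probability of the winning class. The logits and the helper names (softmax, to_prediction) are made up for illustration and are not taken from the pipeline itself.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def to_prediction(logits, labels=("NEGATIVE", "POSITIVE")):
    """Pick the most probable label and report its probability as the score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}

# Hypothetical logits for illustration only; not produced by the model.
example_logits = [-4.0, 4.2]
prediction = to_prediction(example_logits)
print(prediction)
```

A large gap between the two logits pushes the score close to 1, which is why confident predictions like the ones above report scores such as 0.9997.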

The tweet_eval dataset available on Hugging Face consists of several labeled datasets of English-language
tweets for various classification tasks. The 'sentiment' sub-dataset contains tweets with one of three labels
(negative, neutral, or positive sentiment) and is divided into training, validation, and test splits.

The Hugging Face datasets library's load_dataset() function provides a straightforward way to load
the labeled tweet_eval sentiment data. Each split is an instance of the simply named Dataset
class, while the full sentiment data is a DatasetDict containing all of the splits.

In [5]:
from datasets import load_dataset

dataset = load_dataset("tweet_eval", "sentiment")

Reusing dataset tweet_eval (C:\Users\Kaya\.cache\huggingface\datasets\tweet_eval\sentiment\1.1.0\12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343)



In [6]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 45615
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 12284
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
})

In [7]:
dataset['train'].features

{'text': Value(dtype='string', id=None),
 'label': ClassLabel(num_classes=3, names=['negative', 'neutral', 'positive'], id=None)}

In [8]:
dataset['train'][50]

{'text': 'Thanks manager for putting me on the schedule for Sunday"',
 'label': 0}
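
The integer label in the record above can be mapped back to its name using the order shown in the ClassLabel feature. The datasets library's ClassLabel provides an int2str method for this; the sketch below mimics the idea in plain Python, with label_name as a hypothetical helper.

```python
# Label names copied from the ClassLabel feature shown above.
LABEL_NAMES = ["negative", "neutral", "positive"]

def label_name(label_id):
    """Map an integer class id to its name (hypothetical helper)."""
    return LABEL_NAMES[label_id]

# The record printed above for index 50 of the training split.
example = {
    "text": 'Thanks manager for putting me on the schedule for Sunday"',
    "label": 0,
}
decoded = label_name(example["label"])
print(decoded)
```

Here label 0 decodes to 'negative', so this tweet was annotated as sarcastic rather than grateful.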

Because the tweet_eval sentiment data has three label classes, the default binary classifier isn't a
meaningful baseline without retraining. Instead we'll build a three-way classifier starting from BERTweet,
a BERT-style model pre-trained on a large corpus of English tweets. We can then fine-tune
and evaluate BERTweet using the tweet_eval sentiment data that we loaded above.

We'll start by using AutoTokenizer to load the tokenizer that accompanies the BERTweet checkpoint.

In [9]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [10]:
print(tokenizer("The quick brown fox jumps over the lazy dog."))

{'input_ids': [0, 47, 1600, 3345, 9646, 13545, 141, 6, 2307, 10638, 4, 2], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}


Next we'll tokenize the dataset, padding each example to the tokenizer's maximum length and truncating anything longer.
Each tokenized split will be a Dataset object.

In [11]:
def tokenize_fn(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_dataset = dataset.map(tokenize_fn, batched=True)




In [12]:
tokenized_dataset['train']

Dataset({
    features: ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 45615
})

In [13]:
tokenized_dataset['validation']

Dataset({
    features: ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 2000
})
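
To make the effect of padding="max_length" and truncation=True more concrete, here is a minimal plain-Python sketch of what happens to input_ids and attention_mask. It reuses the ids from the tokenizer output shown earlier. The pad id of 1 and the small max_length values are assumptions chosen for illustration, and pad_or_truncate is a hypothetical helper, not part of the transformers API.

```python
def pad_or_truncate(input_ids, max_length, pad_id):
    """Return fixed-length ids plus an attention mask (1 = real token, 0 = padding)."""
    ids = list(input_ids)
    if len(ids) > max_length:
        # Truncate the content but keep the closing special token at the end.
        ids = ids[: max_length - 1] + [ids[-1]]
    mask = [1] * len(ids)
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad

# input_ids printed earlier for "The quick brown fox jumps over the lazy dog."
fox_ids = [0, 47, 1600, 3345, 9646, 13545, 141, 6, 2307, 10638, 4, 2]
PAD_ID = 1  # assumption: padding token id used for illustration

padded_ids, padded_mask = pad_or_truncate(fox_ids, max_length=16, pad_id=PAD_ID)
short_ids, short_mask = pad_or_truncate(fox_ids, max_length=8, pad_id=PAD_ID)
print(padded_ids, padded_mask)
print(short_ids, short_mask)
```

Padding every example to the same length makes batching simple, at the cost of extra computation on padding positions that the attention mask tells the model to ignore.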

Now we convert the tokenized data to a TensorFlow-compatible format: batched tf.data.Dataset objects built with to_tf_dataset().

In [14]:
from transformers.data.data_collator import tf_default_data_collator

data_collator = tf_default_data_collator

In [15]:
# Shuffle only the training split; validation keeps a fixed order for evaluation.

tf_train_dataset = tokenized_dataset['train'].to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    label_cols=["label"],
    shuffle=True,
    collate_fn=data_collator,
    batch_size=8,
)

tf_validation_dataset = tokenized_dataset['validation'].to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    label_cols=["label"],
    shuffle=False,
    collate_fn=data_collator,
    batch_size=8,
)

Here we set parameters and create the optimizer.

In [16]:
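from transformers import create_optimizer

batch_size = 8  # must match the batch_size passed to to_tf_dataset() above
num_epochs = 3
batches_per_epoch = len(tokenized_dataset["train"]) // batch_size
total_train_steps = batches_per_epoch * num_epochs
# create_optimizer returns the optimizer together with its learning-rate schedule,
# which decays from init_lr over total_train_steps.
optimizer, schedule = create_optimizer(init_lr=5e-6, num_warmup_steps=0, num_train_steps=total_train_steps)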
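
To see what these parameters imply, here is a minimal sketch of the learning-rate schedule, assuming create_optimizer with these arguments produces a linear decay from init_lr to zero over num_train_steps and holds at zero afterwards (its default behavior as far as we're aware; check the transformers documentation for your version). The function linear_decay_lr is a hypothetical stand-in, and the row count is taken from the training split shown earlier.

```python
def linear_decay_lr(step, init_lr, total_steps, end_lr=0.0):
    """Linearly interpolate from init_lr at step 0 to end_lr at total_steps, then hold."""
    step = min(step, total_steps)
    fraction_left = 1.0 - step / total_steps
    return (init_lr - end_lr) * fraction_left + end_lr

num_train_rows = 45615  # size of the training split shown earlier
batch_size = 8
num_epochs = 3
batches_per_epoch = num_train_rows // batch_size
total_train_steps = batches_per_epoch * num_epochs

# Inspect the rate at the start, midway, at the end, and past the end of the schedule.
for step in (0, total_train_steps // 2, total_train_steps, total_train_steps + 1000):
    print(step, linear_decay_lr(step, 5e-6, total_train_steps))
```

Under this assumption, any training steps taken after total_train_steps would use a learning rate of zero and leave the weights unchanged.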

Now we'll use the transformers library's auto model functionality to create a TensorFlow model object appropriate
for three-way classification.

In [17]:
from transformers import TFAutoModelForSequenceClassification

bertweet = TFAutoModelForSequenceClassification.from_pretrained("vinai/bertweet-base", num_labels=3)

All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

Some layers of TFRobertaForSequenceClassification were not initialized from the model checkpoint at vinai/bertweet-base and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Finally, it's time to compile and train!

In [18]:
bertweet.compile(
    optimizer = optimizer,
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics = [tf.metrics.SparseCategoricalAccuracy(), coral.MeanAbsoluteErrorLabels()],
)

bertweet.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=num_epochs)

Epoch 1/3
Instructions for updating:
The `validate_indices` argument has no effect. Indices are always validated on CPU and never validated on GPU.
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x166a812e9a0>
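
For intuition about the loss and accuracy used in compile above, here is a minimal plain-Python sketch of sparse categorical cross-entropy computed from logits, together with accuracy from the argmax prediction. The logits and labels are invented for illustration, and the helper names are hypothetical; the Keras implementations operate on batched tensors and may differ in detail.

```python
import math

def sparse_categorical_crossentropy(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum_exp - logits[label]

def predicted_class(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=logits.__getitem__)

# Hypothetical logits for three tweets over (negative, neutral, positive).
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.0], [-1.5, 0.0, 3.0]]
batch_labels = [0, 2, 2]

losses = [sparse_categorical_crossentropy(l, y) for l, y in zip(batch_logits, batch_labels)]
mean_loss = sum(losses) / len(losses)
accuracy = sum(predicted_class(l) == y for l, y in zip(batch_logits, batch_labels)) / len(batch_labels)
print(mean_loss, accuracy)
```

Because the labels are integer class ids rather than one-hot vectors, the "sparse" variant of the loss indexes the correct logit directly.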

Judging from the validation loss at each epoch, three epochs is a reasonable stopping point before the model starts to overfit.
Now let's convert the tokenized test split to a TensorFlow-compatible dataset and then evaluate the model.

In [20]:
tf_test_dataset = tokenized_dataset['test'].to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    label_cols=["label"],
    shuffle=False,
    collate_fn=data_collator,
    batch_size=8,
)

In [21]:
bertweet.evaluate(tf_test_dataset)



[0.6388965249061584, 0.718984067440033, 0.92236328125]
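
The list returned by evaluate is ordered as the loss followed by the metrics passed to compile. A small sketch for pairing the values with readable names follows; the display names are our own labels chosen for illustration, not names read from the model. Keras also accepts return_dict=True in evaluate to get a dictionary directly.

```python
# Values copied from the evaluate() output above.
results = [0.6388965249061584, 0.718984067440033, 0.92236328125]

# Display names are illustrative assumptions, listed in compile order:
# loss first, then SparseCategoricalAccuracy, then MeanAbsoluteErrorLabels.
names = ["loss", "accuracy", "label_mae"]

report = dict(zip(names, results))
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```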

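The bertweet model achieves 0.7190 accuracy on the test split (the second value above; the first is the loss), a modest
step down from its performance on the training and validation splits and comparable to other BERT-based sentiment
classification models. The third value is the label mean absolute error from coral_ordinal's MeanAbsoluteErrorLabels;
that metric is designed for CORAL-style ordinal outputs, so it should be read with caution for this softmax classifier.
We can try training for another epoch and see what effect that has on performance.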

In [22]:
bertweet.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=1)



<tensorflow.python.keras.callbacks.History at 0x166eb7ceb20>

In [23]:
bertweet.evaluate(tf_test_dataset)



[0.6388965249061584, 0.718984067440033, 0.92236328125]

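There was no improvement in performance after the additional training epoch: the test loss and metrics are identical to
the earlier run. This is most likely because the learning-rate schedule from create_optimizer had already decayed to zero
after the planned three epochs (total_train_steps), so the extra epoch made no weight updates.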

In [24]:
bertweet.save_pretrained("./models/bertweet_simple/")

In [26]:
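from transformers import AutoModel

# Sanity check: reload the saved TensorFlow weights into a PyTorch model.
# AutoModel restores only the base encoder, without the classification head.
bertweet_load_test = AutoModel.from_pretrained("./models/bertweet_simple/", from_tf=True)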

All TF 2.0 model weights were used when initializing RobertaModel.

All the weights of RobertaModel were initialized from the TF 2.0 model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use RobertaModel for predictions without further training.
