# CS 39AA - Notebook J: Transformers for Airline Tweets

We'll now revisit the Airline Tweet dataset and try one of the large pre-trained models available on huggingface.co (here, `bert-base-cased`).

Note that this is roughly the same as [Assign 5 Starter on Kaggle](https://www.kaggle.com/code/steve5438/assign-5-starter), except that we now use [Weights and Biases](https://wandb.ai) to track training and model performance, and to run an automated sweep over hyperparameter values.

In [1]:
import torch
import torch.nn.functional as F
import numpy as np
import pandas as pd
import os

from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
from datasets import Dataset, load_metric

NOTE: Redirects are currently not supported in Windows or MacOs.


In [2]:
import wandb
wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33mgeinitz[0m ([33mmsudenver[0m). Use [1m`wandb login --relogin`[0m to force relogin


True

In [3]:
df = pd.read_csv("./data/trainA.csv")
df.head()

Unnamed: 0,sentiment,text
0,positive,@JetBlue @JayVig I like the inflight snacks! I...
1,positive,@VirginAmerica thanks guys! Sweet route over t...
2,negative,@USAirways Your exchange/credit policies are w...
3,negative,@USAirways but in the meantime I'll be sleepin...
4,negative,@VirginAmerica hold times at call center are a...


In [None]:
#device = 'mps' if torch.backends.mps.is_available() else 'cpu'
#print(f"device: {device}")

In [4]:
MODEL_NAME = "bert-base-cased"
MAX_LENGTH=50

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
#model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3, max_length=MAX_LENGTH, output_attentions=False, output_hidden_states=False)

In [5]:
classes = df.sentiment.unique().tolist()
class_tok2idx = dict((v, k) for k, v in enumerate(classes))
class_idx2tok = dict((k, v) for k, v in enumerate(classes))
print(class_tok2idx)
print(class_idx2tok)

{'positive': 0, 'negative': 1, 'neutral': 2}
{0: 'positive', 1: 'negative', 2: 'neutral'}
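The mapping above follows the order in which `unique()` first encounters each label, so it depends on row order in the CSV. As a minimal sketch (the short `sentiments` list is a hypothetical stand-in for `df.sentiment`, and the `*_sorted` names are not from the notebook), sorting the labels first would make the encoding independent of row order:

```python
# Hypothetical stand-in for df.sentiment; only the three label names come from the notebook.
sentiments = ["positive", "positive", "negative", "negative", "neutral"]

# Order of first appearance, which is the order pandas' unique() returns.
by_appearance = list(dict.fromkeys(sentiments))

# Sorting the distinct labels gives a mapping that does not depend on row order.
classes_sorted = sorted(set(sentiments))
tok2idx_sorted = {label: idx for idx, label in enumerate(classes_sorted)}
idx2tok_sorted = {idx: label for label, idx in tok2idx_sorted.items()}

print(by_appearance)
print(tok2idx_sorted)
```

The notebook keeps the first-appearance order, so the label ids shown in later cells refer to that mapping, not to the sorted one.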


In [6]:
def model_init(model_name=MODEL_NAME):
    bert_model = AutoModelForSequenceClassification.from_pretrained(
        pretrained_model_name_or_path=model_name,
        num_labels=3,
        max_length=MAX_LENGTH, 
        output_attentions=False, 
        output_hidden_states=False
    )
    return bert_model


In [7]:
df['label'] = df['sentiment'].apply(lambda x: class_tok2idx[x])
df.head()

Unnamed: 0,sentiment,text,label
0,positive,@JetBlue @JayVig I like the inflight snacks! I...,0
1,positive,@VirginAmerica thanks guys! Sweet route over t...,0
2,negative,@USAirways Your exchange/credit policies are w...,1
3,negative,@USAirways but in the meantime I'll be sleepin...,1
4,negative,@VirginAmerica hold times at call center are a...,1


In [8]:
model = model_init()

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at b

In [9]:
sequence_0 = "@united I will never fly with you again. Period."
seq0_tokens = tokenizer(sequence_0, return_tensors="pt")
print(f"number of tokens in seq0 is {len(seq0_tokens['input_ids'].flatten())}")
print(seq0_tokens)
torch.round(F.softmax(model(**seq0_tokens).logits, -1), decimals=3)

number of tokens in seq0 is 14
{'input_ids': tensor([[  101,   137, 10280,   146,  1209,  1309,  4689,  1114,  1128,  1254,
           119, 16477,   119,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}


tensor([[0.4130, 0.2620, 0.3260]], grad_fn=<RoundBackward1>)

In [10]:
sequence_1 = "@united Delayed flight, but you did the best you could. Thank you united crew."
seq1_tokens = tokenizer(sequence_1, return_tensors="pt")
print(f"number of tokens in seq1 is {len(seq1_tokens['input_ids'].flatten())}")
torch.round(F.softmax(model(**seq1_tokens).logits, -1), decimals=3)

number of tokens in seq1 is 22


tensor([[0.4060, 0.2690, 0.3250]], grad_fn=<RoundBackward1>)
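Both tweets receive almost the same, nearly uniform probabilities, which is what one would expect from a classification head that has not been trained yet. The probabilities come from applying softmax to the three logits. A small pure-Python sketch of that step follows; the logit values are hypothetical, chosen only for illustration, and the `softmax` helper is not part of the notebook:

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for (positive, negative, neutral); not taken from the model.
logits = [0.12, -0.33, -0.11]
probs = softmax(logits)
print([round(p, 3) for p in probs])
```

Small differences between logits translate into small differences between probabilities, which is why an untrained head hovers around one third per class.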

In [11]:
ds_raw = Dataset.from_pandas(df[['label','text']])
ds_raw[0]

{'label': 0,
 'text': "@JetBlue @JayVig I like the inflight snacks! I'm flying with you guys on 2/28! #JVMChat"}

In [12]:
os.environ["TOKENIZERS_PARALLELISM"] = "true"

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=MAX_LENGTH)

ds = ds_raw.map(tokenize_function, batched=True)

  0%|          | 0/10 [00:00<?, ?ba/s]
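With `padding="max_length"` and `truncation=True`, every tweet ends up with exactly `MAX_LENGTH` token ids plus an attention mask that marks the real tokens. The sketch below imitates that on the token ids printed earlier for `sequence_0`. The helper `pad_or_truncate` is hypothetical and simplified: the real tokenizer keeps the closing `[SEP]` token when it truncates, which this version does not.

```python
MAX_LENGTH = 50
PAD_ID = 0  # pad_token_id as it appears in the logged BertConfig

def pad_or_truncate(input_ids, max_length=MAX_LENGTH, pad_id=PAD_ID):
    """Simplified padding/truncation; unlike the tokenizer, it may cut off [SEP]."""
    ids = list(input_ids[:max_length])
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,
        # 1 marks a real token, 0 marks padding that attention should ignore
        "attention_mask": [1] * len(ids) + [0] * n_pad,
    }

# Token ids printed earlier in the notebook for sequence_0 (14 tokens).
seq0_ids = [101, 137, 10280, 146, 1209, 1309, 4689, 1114, 1128, 1254, 119, 16477, 119, 102]
padded = pad_or_truncate(seq0_ids)
print(len(padded["input_ids"]), sum(padded["attention_mask"]))
```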

In [13]:
ds[0]['text']

"@JetBlue @JayVig I like the inflight snacks! I'm flying with you guys on 2/28! #JVMChat"

In [14]:
ds = ds.shuffle(seed=42)
ds[0]['text']

'@AmericanAir 11 out of 11 delayed flights, you suck and getting worse'

In [15]:
train_prop = 0.85
ds_train = ds.select(range(int(len(ds)*train_prop)))
ds_eval = ds.select(range(int(len(ds)*train_prop), len(ds)))
print(f"len(ds_train) = {len(ds_train)}")
print(f"len(ds_eval) = {len(ds_eval)}")

len(ds_train) = 8500
len(ds_eval) = 1500


In [18]:
os.environ["WANDB_DISABLED"] = "false"
os.environ["WANDB_LOG_MODEL"] = "true"
os.environ["WANDB_PROJECT"] = "bert_airline_tweets_local"
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

In [19]:
# define sweep config: 

# method
sweep_config = {
    'method': 'random'
}


# hyperparameters
parameters_dict = {
    'epochs': {
        'value': 3
        },
    'batch_size': {
        'values': [8, 16, 32, 64]
        },
    'learning_rate': {
        'distribution': 'log_uniform_values',
        'min': 1e-5,
        'max': 1e-3
    },
    'weight_decay': {
        'values': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
    },
}


sweep_config['parameters'] = parameters_dict
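The `log_uniform_values` distribution draws the learning rate so that its logarithm is uniform between `log(min)` and `log(max)`, which spreads samples evenly across orders of magnitude. A rough standard-library sketch of that idea is below; `sample_log_uniform` is a hypothetical helper for illustration and is not how wandb implements it internally:

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw x such that log(x) is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_log_uniform(1e-5, 1e-3, rng) for _ in range(1000)]

# 1e-4 is the geometric midpoint of [1e-5, 1e-3], so about half the draws fall below it.
below_midpoint = sum(s < 1e-4 for s in samples)
print(min(samples), max(samples), below_midpoint)
```

A plain uniform draw on the same interval would put roughly 90% of the samples above 1e-4, leaving the small learning rates that usually suit fine-tuning mostly unexplored.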

In [20]:
sweep_id = wandb.sweep(sweep_config, project=os.environ["WANDB_PROJECT"])

Create sweep with ID: bm1pzv7x
Sweep URL: https://wandb.ai/msudenver/bert_airline_tweets_local/sweeps/bm1pzv7x


In [None]:
#model.to(device)

In [21]:
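def compute_metrics(eval_pred):
    """Return accuracy and support-weighted precision, recall, and F1 for one evaluation pass."""
    metrics = dict()

    # the metric objects are loaded on every call to compute_metrics
    accuracy_metric = load_metric('accuracy')
    precision_metric = load_metric('precision')
    recall_metric = load_metric('recall')
    f1_metric = load_metric('f1')

    # eval_pred unpacks into the model logits and the integer labels
    logits, labels = eval_pred
    # predicted class = index of the largest logit
    preds = np.argmax(logits, axis=-1)

    # average='weighted' averages the per-class scores, weighted by class support
    metrics.update(accuracy_metric.compute(predictions=preds, references=labels))
    metrics.update(precision_metric.compute(predictions=preds, references=labels, average='weighted'))
    metrics.update(recall_metric.compute(predictions=preds, references=labels, average='weighted'))
    metrics.update(f1_metric.compute(predictions=preds, references=labels, average='weighted'))

    return metrics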
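To make `average='weighted'` concrete, here is a small pure-Python sketch that computes the same four quantities by hand. The function `weighted_scores` and the toy labels are hypothetical, and classes that are never predicted get precision 0 here. One consequence worth noticing: support-weighted recall always equals accuracy, so those two columns will match in the training logs.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_predicted = sum(p == label for p in y_pred)
        p_c = tp / n_predicted if n_predicted else 0.0  # undefined precision treated as 0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n  # share of true examples belonging to this class
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels, invented for illustration only.
y_true = [0, 1, 1, 1, 2, 2]
y_pred = [0, 1, 1, 2, 2, 1]
scores = weighted_scores(y_true, y_pred)
print(scores)
```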


In [22]:
def train(config=None):
    with wandb.init(config=config):
        # set sweep configuration
        config = wandb.config

        # set training arguments
        training_args = TrainingArguments(
            output_dir='/Users/steve/models/bert4airlinetweets/sweeps',
            report_to='wandb',  # Turn on Weights & Biases logging
            num_train_epochs=config.epochs,
            learning_rate=config.learning_rate,
            weight_decay=config.weight_decay,
            per_device_train_batch_size=config.batch_size,
            per_device_eval_batch_size=16,
            save_strategy='epoch', #'steps',
            evaluation_strategy='epoch', #'steps',
            #eval_steps=500,
            logging_strategy='epoch',
            load_best_model_at_end=True,
            remove_unused_columns=True,
            use_mps_device=True
        )


        # define training loop
        trainer = Trainer(
            model_init=model_init,
            args=training_args,
            train_dataset=ds_train,
            eval_dataset=ds_eval,
            compute_metrics=compute_metrics
        )


        # start training loop
        trainer.train()

In [23]:
wandb.agent(sweep_id, train, count=4)

[34m[1mwandb[0m: Agent Starting Run: q7fo1i4b with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 3.6370797403092096e-05
[34m[1mwandb[0m: 	weight_decay: 0.1


loading configuration file config.json from cache at /Users/steve/.cache/huggingface/hub/models--bert-base-cased/snapshots/a8d257ba9925ef39f3036bfc338acf5283c512d9/config.json
Model config BertConfig {
  "_name_or_path": "bert-base-cased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2
  },
  "layer_norm_eps": 1e-12,
  "max_length": 50,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.24.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 

Epoch,Training Loss,Validation Loss



[34m[1mwandb[0m: Agent Starting Run: 1blulqy1 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 1.6265744363427824e-05
[34m[1mwandb[0m: 	weight_decay: 0.2


PyTorch: setting up devices

Epoch,Training Loss,Validation Loss



[34m[1mwandb[0m: Agent Starting Run: zfesg5al with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 0.0006092068459267166
[34m[1mwandb[0m: 	weight_decay: 0.4



Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9183,0.870028,0.665333,0.442668,0.665333,0.531627


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1500
  Batch size = 16
  accuracy_metric = load_metric('accuracy')


  _warn_prf(average, modifier, msg_start, len(result))
Saving model checkpoint to /Users/steve/models/bert4airlinetweets/sweeps/checkpoint-266
Configuration saved in /Users/steve/models/bert4airlinetweets/sweeps/checkpoint-266/config.json
Model weights saved in /Users/steve/models/bert4airlinetweets/sweeps/checkpoint-266/pytorch_model.bin



Metric,History
eval/accuracy,▁
eval/f1,▁
eval/loss,▁
eval/precision,▁
eval/recall,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁
train/global_step,▁▁

Metric,Final value
eval/accuracy,0.66533
eval/f1,0.53163
eval/loss,0.87003
eval/precision,0.44267
eval/recall,0.66533
eval/runtime,8.5179
eval/samples_per_second,176.101
eval/steps_per_second,11.036
train/epoch,1.0
train/global_step,266.0
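The metrics of this run (learning rate about 6e-4) have a recognizable pattern: weighted precision is almost exactly the square of accuracy. That is what happens when a model predicts one class for every example, and the `_warn_prf` fragment in the log is consistent with some classes never being predicted. This reading is an inference from the numbers, not something the log states outright. The arithmetic, assuming a single predicted class that makes up a fraction `p` of the 1500 evaluation tweets:

```python
n_eval = 1500            # "Num examples" in the evaluation log
logged_accuracy = 0.665333

# Number of evaluation tweets in the (assumed) single predicted class.
n_majority = round(logged_accuracy * n_eval)
p = n_majority / n_eval

# Predicted class: precision p, recall 1, weight p. Other classes contribute 0.
weighted_precision = p * p
weighted_recall = p * 1.0
weighted_f1 = p * (2 * p / (1 + p))

print(n_majority, round(weighted_precision, 6), round(weighted_recall, 6), round(weighted_f1, 6))
```

These values agree with the logged precision (0.442668), recall (0.665333), and F1 (0.531627) to the printed precision, which suggests the learning rate drawn for this run was too large for fine-tuning to get anywhere.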


[34m[1mwandb[0m: Agent Starting Run: xebq6hov with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 3
[34m[1mwandb[0m: 	learning_rate: 1.8898133501112415e-05
[34m[1mwandb[0m: 	weight_decay: 0.3



Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5437,0.414128,0.852,0.865215,0.852,0.85625
2,0.3118,0.371611,0.864667,0.866212,0.864667,0.865385
3,0.2272,0.383846,0.868,0.864807,0.868,0.865603


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1500
  Batch size = 16
Saving model checkpoint to /Users/steve/models/bert4airlinetweets/sweeps/checkpoint-133
Configuration saved in /Users/steve/models/bert4airlinetweets/sweeps/checkpoint-133/config.json
Model weights saved in /Users/steve/models/bert4airlinetweets/sweeps/checkpoint-133/pytorch_model.bin
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1500
  Batch size = 16
Saving model checkpoint to /Users/steve/models/bert4airl

Metric,History
eval/accuracy,▁▇█
eval/f1,▁██
eval/loss,█▁▃
eval/precision,▃█▁
eval/recall,▁▇█
eval/runtime,▁█▂
eval/samples_per_second,█▁▇
eval/steps_per_second,█▁▇
train/epoch,▁▁▅▅███
train/global_step,▁▁▅▅███

Metric,Final value
eval/accuracy,0.868
eval/f1,0.8656
eval/loss,0.38385
eval/precision,0.86481
eval/recall,0.868
eval/runtime,7.6893
eval/samples_per_second,195.077
eval/steps_per_second,12.225
train/epoch,3.0
train/global_step,399.0
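The `global_step` values and checkpoint names in the logs follow from the training-set size and the batch size. A quick check, assuming a single device, no gradient accumulation, and that the last partial batch is kept (the `steps_per_epoch` helper is hypothetical):

```python
import math

N_TRAIN = 8500  # len(ds_train) printed earlier in the notebook
N_EPOCHS = 3    # 'epochs' value in the sweep configuration

def steps_per_epoch(n_examples, batch_size):
    """Optimizer steps in one epoch when the final partial batch is kept."""
    return math.ceil(n_examples / batch_size)

for batch_size in (8, 16, 32, 64):
    per_epoch = steps_per_epoch(N_TRAIN, batch_size)
    print(f"batch_size={batch_size}: {per_epoch} steps/epoch, {N_EPOCHS * per_epoch} steps in total")
```

With batch size 64 this gives 133 steps per epoch and 399 in total, matching `checkpoint-133` and `train/global_step` above. With batch size 32 it gives 266 steps per epoch, matching the `checkpoint-266` saved after the first epoch of the previous run.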


In [None]:
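# load the best model we saw
# NOTE: nothing is loaded here yet, so `model` in the cells below is still the
# model created by model_init() in cell 8, not one fine-tuned during the sweep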
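One way to fill in this step is to pick the epoch with the lowest validation loss from the last run's table and derive the checkpoint directory name from it. The sketch below does only that bookkeeping. It assumes 133 steps per epoch (8500 training tweets at batch size 64), and it assumes the checkpoint directory still exists on disk, which the truncated log does not confirm. All runs in the sweep share one `output_dir`, so a checkpoint with the same step count from an earlier run would have been overwritten.

```python
# Validation loss per epoch, copied from the table of the last run (batch_size 64).
eval_loss_by_epoch = {1: 0.414128, 2: 0.371611, 3: 0.383846}
STEPS_PER_EPOCH = 133  # assumption: ceil(8500 / 64)

# Pick the epoch with the lowest validation loss.
best_epoch = min(eval_loss_by_epoch, key=eval_loss_by_epoch.get)
best_step = best_epoch * STEPS_PER_EPOCH
checkpoint_name = f"checkpoint-{best_step}"
print(best_epoch, checkpoint_name)

# Loading would then look roughly like this (left commented out because it needs
# the saved files; `output_dir` stands for the directory given to TrainingArguments):
# model = AutoModelForSequenceClassification.from_pretrained(
#     os.path.join(output_dir, checkpoint_name)
# )
```

Accuracy is highest at epoch 3, while validation loss is lowest at epoch 2, so which checkpoint counts as "best" depends on the metric chosen.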

In [None]:
model.to('cpu')

In [None]:
# recall the label encodings:
class_tok2idx

In [None]:
print(f"going to classify the (negative) tweet: '{sequence_0}'")
torch.round(F.softmax(model(**seq0_tokens).logits, -1), decimals=3)

In [None]:
print(f"going to classify the (positive) tweet: '{sequence_1}'")
torch.round(F.softmax(model(**seq1_tokens).logits, -1), decimals=3)