# GPT2 model training code

This code is inspired by the Hugging Face tutorial on [sequence classification](https://huggingface.co/docs/transformers/tasks/sequence_classification).

The idea of using a GPT-2 instance specialized for sequence classification tasks comes from the [Hugging Face documentation](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2ForSequenceClassification).

We used PyTorch instead of TensorFlow here because the base model, [microsoft/DialogRPT-updown](https://huggingface.co/microsoft/DialogRPT-updown), is a PyTorch model, and converting it to a TensorFlow model caused unexpected issues.

In [1]:
# import data set
from datasets import load_dataset

training_data = load_dataset("financial_phrasebank", "sentences_allagree")
valitdation_data = load_dataset("financial_phrasebank", "sentences_75agree")



In [2]:
# import tokenizer and model from Hugging Face
from transformers import AutoTokenizer, GPT2ForSequenceClassification

# label ids as used by financial_phrasebank
id2label = {0: "negative", 1: "neutral", 2: "positive"}
label2id = {"negative": 0, "neutral": 1, "positive": 2}

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialogRPT-updown")

# replace the 1-output ranking head with a freshly initialized 3-class head
# and store the label names in the model config
model = GPT2ForSequenceClassification.from_pretrained(
    "microsoft/DialogRPT-updown",
    num_labels=3,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at microsoft/DialogRPT-updown and are newly initialized because the shapes did not match:
- score.weight: found shape torch.Size([1, 1024]) in the checkpoint and torch.Size([3, 1024]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
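
The two label dictionaries in the cell above are meant to be inverses of each other, and their ids should cover `0..num_labels-1` so that they line up with the three rows of the new `score.weight`. A minimal sketch of such a sanity check follows; the helper names `mappings_are_consistent` and `ids_are_contiguous` are hypothetical and not part of the original notebook or of any library.

```python
# Label mappings as defined in the notebook.
id2label = {0: "negative", 1: "neutral", 2: "positive"}
label2id = {"negative": 0, "neutral": 1, "positive": 2}


def mappings_are_consistent(id2label, label2id):
    """Return True if label2id is the exact inverse of id2label."""
    if len(id2label) != len(label2id):
        return False
    return all(label2id.get(name) == idx for idx, name in id2label.items())


def ids_are_contiguous(id2label):
    """Return True if the ids are exactly 0, 1, ..., len(id2label) - 1."""
    return sorted(id2label) == list(range(len(id2label)))


print(mappings_are_consistent(id2label, label2id))
print(ids_are_contiguous(id2label))
```

Running a check like this before training is cheap and catches a swapped or missing label early, before it silently shows up as mislabeled predictions.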


In [3]:
# tokenize the training and validation sets
def preprocess_function(examples):
    # pad every sentence to max_length so that all rows have the same length,
    # no matter which map batch they were tokenized in
    return tokenizer(examples["sentence"], padding="max_length", truncation=True, max_length=64)

train_tokens = training_data.map(preprocess_function, batched=True)

val_tokens = valitdation_data.map(preprocess_function, batched=True)

val_tokens["train"]

Dataset({
    features: ['sentence', 'label', 'input_ids', 'attention_mask'],
    num_rows: 3453
})

In [5]:
# set hyperparameters
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./classification_model",
    per_device_train_batch_size=16,
    save_total_limit=3,
    num_train_epochs=3,
    evaluation_strategy='epoch',
    save_strategy='epoch',
    logging_dir="./logs",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_tokens["train"],
    eval_dataset=val_tokens["train"],
)
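
The `Trainer` above is created without a metric function, so evaluation reports the loss only. A minimal sketch of an accuracy metric follows, assuming the predictions arrive as one row of class scores per example together with the integer labels. The helper name `accuracy_from_logits` is hypothetical; the code is plain Python so that it works on lists as well as arrays.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    # index of the largest score in each row (a plain-Python argmax)
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(pred == int(label)) for pred, label in zip(predictions, labels))
    return correct / len(predictions)


def compute_metrics(eval_pred):
    """Unpack (predictions, labels) and return a dict of named metrics."""
    logits, labels = eval_pred
    return {"accuracy": accuracy_from_logits(logits, labels)}


# toy example: first row is classified correctly, second row is not
print(compute_metrics(([[0.1, 0.7, 0.2], [2.0, 0.1, 0.3]], [1, 2])))
```

To use it, the function would be passed as `compute_metrics=compute_metrics` when constructing the `Trainer`; the accuracy should then appear next to the evaluation loss.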




In [6]:
# train the model and save model every epoch
trainer.train()

# Evaluate the model
results = trainer.evaluate()
print(results)


#tokenizer.save_pretrained("./classification_model")

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | No log        | 0.255276        |
| 2     | No log        | 0.295096        |
| 3     | No log        | 0.323329        |


{'eval_loss': 0.3233290910720825, 'eval_runtime': 460.2914, 'eval_samples_per_second': 7.502, 'eval_steps_per_second': 0.939, 'epoch': 3.0}
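
The "No log" entries in the training-loss column are most likely a consequence of the logging interval being longer than the whole run. The arithmetic can be sketched as follows. Two values here are assumptions that do not appear in the notebook: the size of the `sentences_allagree` split (taken as 2264 rows) and the default `logging_steps` of `TrainingArguments` (taken as 500). The sketch also assumes a single device and no gradient accumulation.

```python
import math

# ASSUMPTION: number of rows in the "sentences_allagree" training split
TRAIN_ROWS = 2264
# ASSUMPTION: default value of TrainingArguments.logging_steps
DEFAULT_LOGGING_STEPS = 500

# values taken from the TrainingArguments in the notebook
BATCH_SIZE = 16
EPOCHS = 3


def total_training_steps(num_rows, batch_size, epochs):
    """Number of optimizer steps on one device without gradient accumulation."""
    steps_per_epoch = math.ceil(num_rows / batch_size)
    return steps_per_epoch * epochs


steps = total_training_steps(TRAIN_ROWS, BATCH_SIZE, EPOCHS)
print(steps)                          # 426 under the assumptions above
print(steps < DEFAULT_LOGGING_STEPS)  # True: the first logging step is never reached
```

Under these assumptions, setting a smaller `logging_steps` or `logging_strategy='epoch'` in `TrainingArguments` should make the training loss appear in the table.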


The model was uploaded manually to [Hugging Face]().