---
### 1. Loading the dataset
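The code loads the `mteb/tweet_sentiment_extraction` dataset from the Hugging Face Hub with the `load_dataset` function from the `datasets` library, then converts its `train` split to a pandas DataFrame for inspection. Each tweet carries an integer `label` and the matching sentiment name in `label_text` (negative, neutral, or positive).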

```python
from datasets import load_dataset
import pandas as pd
import numpy as np

dataset = load_dataset("mteb/tweet_sentiment_extraction")
df = pd.DataFrame(dataset['train'])
```

```python
df.head()
```

|   | id | text | label | label_text |
|---|---|---|---|---|
| 0 | cb774db0d1 | I`d have responded, if I were going | 1 | neutral |
| 1 | 549e992a42 | Sooo SAD I will miss you here in San Diego!!! | 0 | negative |
| 2 | 088c60f138 | my boss is bullying me... | 0 | negative |
| 3 | 9642c003ef | what interview! leave me alone | 0 | negative |
| 4 | 358bd9e861 | Sons of ****, why couldn`t they put them on t... | 0 | negative |
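Because `label` and `label_text` encode the same information, it is worth checking that each integer maps to exactly one name and seeing how the classes are distributed. The sketch below does this in plain Python on the five (label, label_text) pairs from the `df.head()` output; `label_mapping` is a hypothetical helper, not part of `datasets` or pandas.

```python
from collections import Counter

def label_mapping(rows):
    """Build an id -> name mapping from (label, label_text) pairs,
    failing loudly if one id is paired with two different names."""
    id2label = {}
    for label, text in rows:
        if id2label.setdefault(label, text) != text:
            raise ValueError(f"label {label} maps to both {id2label[label]!r} and {text!r}")
    return dict(sorted(id2label.items()))

# (label, label_text) pairs copied from the df.head() output
sample = [(1, "neutral"), (0, "negative"), (0, "negative"), (0, "negative"), (0, "negative")]

id2label = label_mapping(sample)
counts = Counter(text for _, text in sample)
print(id2label)   # {0: 'negative', 1: 'neutral'}
print(counts)     # Counter({'negative': 4, 'neutral': 1})
```

On the full data, `label_mapping(zip(df["label"], df["label_text"]))` would give the complete mapping, which could be passed to `from_pretrained` as `id2label` so that predictions carry readable names.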


---
### 2. Tokenizing the dataset
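Next, the text is tokenized with `GPT2Tokenizer` from the `transformers` library. Tokenization splits text into smaller units (for GPT-2, byte-level BPE subwords) and maps them to integer IDs that a neural network can consume. The tokenizer is loaded with the vocabulary of the pretrained "gpt2" checkpoint. GPT-2 defines no padding token, so the end-of-sequence token is reused for padding. With `padding="max_length"` and no explicit `max_length`, every tweet is padded up to the tokenizer's `model_max_length` (1024 tokens for GPT-2), and longer inputs are truncated. Alongside `input_ids`, the tokenizer returns an `attention_mask` that marks which positions hold real tokens.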

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# GPT-2 has no padding token, so reuse the end-of-sequence token for padding
tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(examples):
    # Pad every tweet to the tokenizer's maximum length and truncate longer ones
    return tokenizer(examples["text"], padding="max_length", truncation=True)

# batched=True passes lists of examples to tokenize_function at once, which is faster
tokenized_datasets = dataset.map(tokenize_function, batched=True)
```
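To make the padding and truncation behaviour concrete, here is a toy version that works on already-tokenized IDs. It is not GPT-2's tokenizer: the token IDs are made up, and only the pad ID 50256 (GPT-2's `<|endoftext|>`, which the code above reuses as the pad token) comes from the real vocabulary. It assumes right-side padding, the GPT-2 tokenizer's default.

```python
def pad_and_truncate(token_ids, max_length, pad_id):
    """Toy version of padding="max_length", truncation=True for one sequence."""
    ids = list(token_ids[:max_length])        # truncation: keep the first max_length IDs
    attention_mask = [1] * len(ids)           # 1 marks a real token
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,               # right padding
        "attention_mask": attention_mask + [0] * n_pad,    # 0 marks padding to ignore
    }

GPT2_EOS_ID = 50256  # GPT-2's <|endoftext|>, reused as the pad token above

# Made-up token IDs standing in for a short tweet
print(pad_and_truncate([40, 588, 428], max_length=6, pad_id=GPT2_EOS_ID))
# {'input_ids': [40, 588, 428, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 0, 0, 0]}
```

With the real tokenizer, `max_length` defaults to 1024, so a short tweet becomes mostly padding. Passing an explicit value (for example `max_length=128`) to the tokenizer call would shrink every sequence and likely cut training time considerably; its effect on accuracy has not been measured here.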

---
### 3. Creating a small training and evaluation dataset
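The code then builds a small training set (`small_train_dataset`) and a small evaluation set (`small_eval_dataset`) of 1,000 examples each, by shuffling the tokenized train and test splits with a fixed seed (42) and keeping the first 1,000 rows. Working with a subset speeds up training and evaluation, at the cost of less training data and a noisier accuracy estimate.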

```python
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
```
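Conceptually, `shuffle(seed=42).select(range(1000))` applies a seeded random permutation to the rows and keeps the first 1,000. The NumPy sketch below mirrors that idea; `shuffled_subset_indices` is a hypothetical helper, not the `datasets` implementation, and it is not claimed to select the same rows.

```python
import numpy as np

def shuffled_subset_indices(n_rows, k, seed):
    """Indices of k rows drawn without replacement from a seeded shuffle."""
    rng = np.random.default_rng(seed)      # seeded generator -> reproducible order
    permutation = rng.permutation(n_rows)  # every row index exactly once, shuffled
    return permutation[:k]                 # like .select(range(k)) on the shuffled rows

# Hypothetical split size, for illustration only
idx = shuffled_subset_indices(n_rows=5000, k=1000, seed=42)
print(len(idx), len(set(idx.tolist())))   # 1000 1000
```

Fixing the seed makes the subset reproducible across runs. The draw is not stratified, so the class proportions in a 1,000-row subset can drift slightly from those of the full split.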

---
### 4. Defining the model
The code defines a `GPT2ForSequenceClassification` model: the GPT-2 transformer with a linear classification head (`score`) on top, which predicts one label for a whole sequence of text from the hidden state of a single token, normally the last non-padding one. The model is initialized from the pretrained "gpt2" weights and configured with 3 output labels (positive, negative, and neutral). Only the head is new, which is why loading prints a warning that `score.weight` was newly initialized and that the model should be trained before use.
Training itself is configured later with the `TrainingArguments` class from `transformers`, which controls settings such as the output directory, batch size, and gradient accumulation steps.

```python
from transformers import GPT2ForSequenceClassification

model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=3)
```

```text
Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```
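GPT-2 reads left to right, so `GPT2ForSequenceClassification` classifies a sequence from the hidden state at one position, normally the last non-padding token, which it locates by searching for `config.pad_token_id`. The NumPy sketch below illustrates that pooling with a bias-free linear head; `last_token_logits` is a hypothetical, simplified stand-in, not the library's code. One caveat for this notebook: it sets `tokenizer.pad_token` but not `model.config.pad_token_id`. In the transformers versions we are familiar with, the model then cannot locate the padding, raises an error for batch sizes above 1, and with a batch size of 1 pools the very last position, which after `max_length` padding is a padding token. Adding `model.config.pad_token_id = tokenizer.pad_token_id` after loading the model is the usual remedy; the results shown in this notebook were produced without it.

```python
import numpy as np

def last_token_logits(hidden_states, input_ids, pad_token_id, score_weight):
    """Pool each sequence at its last non-padding token, then apply a linear head.

    hidden_states: (batch, seq_len, hidden); input_ids: (batch, seq_len);
    score_weight: (hidden, num_labels), a bias-free head like GPT-2's `score`.
    Assumes right padding, so the last real token sits just before the first pad.
    """
    batch, seq_len = input_ids.shape
    is_pad = input_ids == pad_token_id
    # Index of the first pad token; rows without any padding use seq_len
    first_pad = np.where(is_pad.any(axis=1), is_pad.argmax(axis=1), seq_len)
    last_real = first_pad - 1
    pooled = hidden_states[np.arange(batch), last_real]   # (batch, hidden)
    return pooled @ score_weight                           # (batch, num_labels)

# Toy example: batch of 2, sequence length 4, hidden size 3, pad ID 9 (all made up)
hidden = np.arange(24, dtype=float).reshape(2, 4, 3)
ids = np.array([[5, 6, 9, 9],    # two real tokens, then padding
                [7, 8, 1, 2]])   # no padding
logits = last_token_logits(hidden, ids, pad_token_id=9, score_weight=np.eye(3))
print(logits)   # rows are hidden[0, 1] and hidden[1, 3]
```

Using the identity matrix as the head makes the pooled hidden states visible directly in the output; a real head maps the 768-dimensional GPT-2 hidden state to 3 label scores.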


---
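### 5. Defining the evaluation metric
Before training, define the function the `Trainer` uses to score predictions during evaluation. `compute_metrics` receives the model's logits and the true labels, takes the argmax over the label dimension to obtain predicted classes, and computes accuracy with the `evaluate` library. Precision, recall, or F1-score could be reported the same way.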

```python
import evaluate

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # The predicted class is the label with the highest logit
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
```
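Besides accuracy, the section mentions precision, recall, and F1. Below is a NumPy-only variant of `compute_metrics` that also reports macro-averaged F1 (the unweighted mean of per-class F1 scores, each computed as 2·TP / (2·TP + FP + FN)). It swaps the `evaluate` library for hand-written counts and is a sketch rather than a validated drop-in replacement; `compute_metrics_np` is a hypothetical name, and a class absent from both predictions and labels contributes an F1 of 0.

```python
import numpy as np

def compute_metrics_np(eval_pred, num_labels=3):
    """Accuracy and macro F1 from (logits, labels), without the evaluate library."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    labels = np.asarray(labels)
    accuracy = float(np.mean(preds == labels))
    f1_scores = []
    for c in range(num_labels):
        tp = np.sum((preds == c) & (labels == c))   # predicted c, truly c
        fp = np.sum((preds == c) & (labels != c))   # predicted c, truly other
        fn = np.sum((preds != c) & (labels == c))   # missed a true c
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return {"accuracy": accuracy, "macro_f1": float(np.mean(f1_scores))}

# Toy logits for 4 examples over 3 classes
toy_logits = np.array([[2, 1, 0], [0, 3, 1], [0, 1, 4], [5, 0, 0]], dtype=float)
toy_labels = np.array([0, 1, 1, 2])
print(compute_metrics_np((toy_logits, toy_labels)))
```

It could be passed to the `Trainer` as `compute_metrics=compute_metrics_np`. Macro F1 weights the three sentiment classes equally regardless of how often each occurs, which accuracy does not.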

---
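### 6. Training the model
This step configures training with `TrainingArguments` (output directory, per-device batch sizes, gradient accumulation) and passes the model, the two small datasets, and `compute_metrics` to a `Trainer`, whose `train()` method runs the fine-tuning loop. A per-device batch size of 1 with 4 gradient accumulation steps gives an effective batch size of 4 per optimizer step.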

```python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="test_trainer",
    # evaluation_strategy="epoch",  # uncomment to evaluate after every epoch (newer versions call it eval_strategy)
    per_device_train_batch_size=1,  # reduced batch size
    per_device_eval_batch_size=1,   # optionally reduced for evaluation as well
    gradient_accumulation_steps=4,  # accumulate 4 batches per optimizer step
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)

trainer.train()
```



```text
{'loss': 1.017, 'grad_norm': 14.172191619873047, 'learning_rate': 1.6666666666666667e-05, 'epoch': 2.0}
{'train_runtime': 2255.123, 'train_samples_per_second': 1.33, 'train_steps_per_second': 0.333, 'train_loss': 0.8688200073242187, 'epoch': 3.0}

TrainOutput(global_step=750, training_loss=0.8688200073242187, metrics={'train_runtime': 2255.123, 'train_samples_per_second': 1.33, 'train_steps_per_second': 0.333, 'total_flos': 1567794659328000.0, 'train_loss': 0.8688200073242187, 'epoch': 3.0})
```
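The 750 steps and the learning rate logged at epoch 2.0 follow from the arguments above plus `Trainer` defaults that the notebook does not set explicitly, assumed here to be: 3 epochs, a base learning rate of 5e-5, linear decay to zero without warmup, and a single device. The hypothetical helper below re-derives both numbers; it approximates how `Trainer` counts optimizer steps, and the rounding may differ across versions when the numbers do not divide evenly.

```python
import math

def trainer_schedule(num_examples, per_device_batch, grad_accum, epochs, base_lr, n_devices=1):
    """Rough re-derivation of the optimizer step count and linear (no-warmup) LR decay."""
    # Each optimizer step consumes per_device_batch * grad_accum * n_devices examples
    effective_batch = per_device_batch * grad_accum * n_devices
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * n_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = steps_per_epoch * epochs

    def lr_at(step):
        # Linear decay from base_lr at step 0 to 0 at total_steps
        return base_lr * max(0.0, (total_steps - step) / total_steps)

    return effective_batch, steps_per_epoch, total_steps, lr_at

eff, per_epoch, total, lr_at = trainer_schedule(1000, 1, 4, epochs=3, base_lr=5e-5)
print(eff, per_epoch, total)   # 4 250 750
print(f"{lr_at(500):.4e}")     # 1.6667e-05, the value logged at epoch 2.0
```

Step 500 is also where the single loss line comes from: `Trainer` logs every 500 steps by default, and 500 steps at 250 steps per epoch is exactly epoch 2.0.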

---
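### 7. Evaluating the model on the test set
The `Trainer` class provides an `evaluate()` method that runs the model over `eval_dataset` (here, 1,000 tweets sampled from the test split) and returns the evaluation loss together with the metrics from `compute_metrics`.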

```python
trainer.evaluate()
```

```text
{'eval_loss': 1.1757895946502686,
 'eval_accuracy': 0.537,
 'eval_runtime': 224.3747,
 'eval_samples_per_second': 4.457,
 'eval_steps_per_second': 4.457,
 'epoch': 3.0}
```
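An accuracy of 0.537 is easier to judge next to simple baselines: guessing uniformly at random over three classes scores about 0.33, and always predicting the most frequent class scores that class's share of the evaluation set. The hypothetical helper below computes the latter in plain Python; it has not been run on `small_eval_dataset`, so no baseline figure is claimed here.

```python
from collections import Counter

def majority_baseline(labels):
    """Return (most frequent label, accuracy of always predicting it)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

# Toy labels for illustration only
print(majority_baseline([1, 0, 1, 2, 1]))   # (1, 0.6)
```

Calling `majority_baseline(small_eval_dataset["label"])` would give the figure for the evaluation subset used above.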

### Reference

DataCamp: https://www.datacamp.com/tutorial/fine-tuning-large-language-models
