# Exercise 1 

In this exercise, we will keep practicing fine-tuning with the `transformers` library from `Hugging Face`. We will use a few models to predict the sentiment of Twitter data.

### Exercise 1(a) (6 points)

Read `Tweets.csv` and do the following:

- Encode `sentiment` as follows: `negative -> 0`, `neutral -> 1`, and `positive -> 2`.
- Drop `textID` and `selected_text`.
- Rename the remaining columns to `text` and `label`.
- Split the data into `train` (80%) and `test` (20%).

In [1]:
```python
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from transformers import pipeline

df = pd.read_csv('Tweets.csv')

# encode sentiment: negative -> 0, neutral -> 1, positive -> 2
df['sentiment'] = df['sentiment'].map({'negative': 0, 'neutral': 1, 'positive': 2})

# keep only the tweet text and its label
df = df.drop(['textID', 'selected_text'], axis=1)
df = df.rename(columns={'sentiment': 'label'})

# 80/20 split, stratified so both sets keep the same class proportions
X_train, X_test = train_test_split(df, test_size=0.2, random_state=42, stratify=df['label'])
```


In [7]:
```python
X_train.head()
```

|       | text | label |
|------:|------|------:|
| 7025  | in 8th grade yearbook: for narnia movie releas... | 1 |
| 15039 | ah remember the days when you\`d sleep in until... | 0 |
| 1804  | my momma is comin 2night ! 2morrow tennis day... | 2 |
| 25874 | my picture wont come up it keeps saying thats... | 1 |
| 27217 | We don\`t feel too comfortable using it. It\`s... | 0 |

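Because `train_test_split` is called with `stratify=df['label']`, the class proportions in `train` and `test` should match. The sketch below checks this on a small hypothetical DataFrame (invented values, not `Tweets.csv`) that goes through the same label encoding; the names `toy`, `train_df`, and `test_df` are illustrative only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for Tweets.csv (not the real dataset)
toy = pd.DataFrame({
    "text": [f"tweet {i}" for i in range(100)],
    "sentiment": ["negative"] * 30 + ["neutral"] * 40 + ["positive"] * 30,
})

# Same encoding as the exercise, then keep only text and label
toy["label"] = toy["sentiment"].map({"negative": 0, "neutral": 1, "positive": 2})
toy = toy[["text", "label"]]

train_df, test_df = train_test_split(toy, test_size=0.2, random_state=42, stratify=toy["label"])

# Class proportions per split; stratification keeps them (nearly) identical
proportions = pd.DataFrame({
    "train": train_df["label"].value_counts(normalize=True).sort_index(),
    "test": test_df["label"].value_counts(normalize=True).sort_index(),
})
print(proportions)
```

On the real data the proportions may differ slightly in the last decimal places, since class counts rarely divide evenly by the split sizes.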

### Exercise 1(b) (6 points)

Using `pipeline`, load `cardiffnlp/twitter-roberta-base-sentiment` and predict the sentiment of each `text` in the `test` dataset. Report the accuracy of the model.

In [13]:
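```python
# load the pre-trained model as a sentiment-analysis pipeline
cardiff_md = pipeline('sentiment-analysis', model='cardiffnlp/twitter-roberta-base-sentiment')

# predict on the test set; the model outputs LABEL_0 (negative), LABEL_1 (neutral), LABEL_2 (positive)
y_pred = X_test['text'].apply(lambda x: cardiff_md(x)[0]['label'])
y_pred = y_pred.map({'LABEL_0': 0, 'LABEL_1': 1, 'LABEL_2': 2})

# calculate accuracy (true labels first, then predictions)
accuracy = accuracy_score(X_test['label'], y_pred)
print(f'Accuracy: {accuracy:.2f}')
```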

Device set to use cpu


Accuracy: 0.73
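The label mapping works because this model's outputs `LABEL_0`, `LABEL_1`, and `LABEL_2` stand for negative, neutral, and positive, the same order as the encoding from Exercise 1(a). Below is a minimal sketch that turns `LABEL_k` strings into integers and scores them. It uses invented pipeline outputs rather than real predictions, and the helper `label_to_id` is hypothetical. Unlike a dictionary `map`, which silently yields `NaN` for an unexpected label, this parser raises an error.

```python
from sklearn.metrics import accuracy_score

# Hypothetical pipeline outputs: one {"label", "score"} dict per text (values invented)
raw_preds = [
    {"label": "LABEL_0", "score": 0.91},
    {"label": "LABEL_1", "score": 0.64},
    {"label": "LABEL_2", "score": 0.88},
    {"label": "LABEL_1", "score": 0.55},
]
true_labels = [0, 1, 2, 2]  # hypothetical gold labels

def label_to_id(label: str) -> int:
    """Turn 'LABEL_k' into the integer k (raises ValueError on other formats)."""
    return int(label.rsplit("_", 1)[1])

pred_ids = [label_to_id(p["label"]) for p in raw_preds]
print(pred_ids)                               # [0, 1, 2, 1]
print(accuracy_score(true_labels, pred_ids))  # 0.75
```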


### Exercise 1(c) (20 points)

Fine-tune the `cardiffnlp/twitter-roberta-base-sentiment` model on the `train` dataset and report the accuracy of the tuned model on the `test` dataset. Use the following configuration as a starting point:

```python
from transformers import RobertaForSequenceClassification, Trainer, TrainingArguments, AutoTokenizer
from datasets import Dataset
import torch

# Loading the pre-trained model
model = RobertaForSequenceClassification.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment", num_labels=3)
tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment")
```

For training, consider these arguments:

```python
args = TrainingArguments(
    output_dir="./hwk_results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=0.0001,
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    num_train_epochs=3,
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
)
```

In [2]:
```python
from transformers import RobertaForSequenceClassification, Trainer, TrainingArguments, AutoTokenizer
from datasets import Dataset
from sklearn.metrics import accuracy_score
import numpy as np
import torch

# Loading the pre-trained model
model = RobertaForSequenceClassification.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment", num_labels=3)
tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment")

# prepare data
train_dataset = Dataset.from_pandas(X_train)
test_dataset = Dataset.from_pandas(X_test)

# drop rows with missing or empty text
train_dataset = train_dataset.filter(lambda x: x['text'] is not None and x['text'] != "")

# tokenize data: pad/truncate every tweet to 128 tokens
def tokenize(batch):
    return tokenizer(batch['text'], padding='max_length', truncation=True, max_length=128)

train_dataset = train_dataset.map(tokenize, batched=True)
test_dataset = test_dataset.map(tokenize, batched=True)

# convert to torch tensors, keeping only the columns the model needs
train_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])
test_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])

def compute_metrics(eval_pred):
    # logits -> predicted class id, then compare against the true labels
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {'accuracy': accuracy_score(labels, predictions)}

args = TrainingArguments(
    output_dir="./hwk_results",
    evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
    save_strategy="epoch",
    learning_rate=0.0001,
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    num_train_epochs=1,           # the suggested configuration uses 3 epochs
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
)

# define trainer (the test set doubles as the evaluation set)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
)

# train model
trainer.train()
```

Filter: 100%|██████████| 21984/21984 [00:00<00:00, 160466.22 examples/s]
Map: 100%|██████████| 21983/21983 [00:02<00:00, 8494.21 examples/s]
Map: 100%|██████████| 5497/5497 [00:00<00:00, 10994.03 examples/s]


| Epoch | Training Loss | Validation Loss | Accuracy |
|------:|---------------|----------------:|---------:|
| 1     | No log        | 0.485334        | 0.802074 |

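During evaluation, `Trainer` passes `compute_metrics` an `EvalPrediction` that unpacks into the model's logits and the true label ids, and the predicted class is the argmax over the three logits. The sketch below runs the same function on hypothetical logits (invented numbers, not model output) to show the computation. The returned key `accuracy` is what `metric_for_best_model="accuracy"` refers to; `Trainer` reports it with an `eval_` prefix, as `eval_accuracy`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def compute_metrics(eval_pred):
    # eval_pred unpacks to (logits, label_ids), as in the notebook's function
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=1)
    return {"accuracy": accuracy_score(labels, predictions)}

# Hypothetical logits for 4 examples x 3 classes (negative, neutral, positive)
logits = np.array([
    [2.1, 0.3, -1.0],   # argmax 0
    [0.2, 1.5, 0.1],    # argmax 1
    [-0.5, 0.4, 2.2],   # argmax 2
    [1.0, 0.9, 0.8],    # argmax 0 (true label is 1, so this one is wrong)
])
labels = np.array([0, 1, 2, 1])
print(compute_metrics((logits, labels)))
```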

TrainOutput(global_step=86, training_loss=0.5489309887553371, metrics={'train_runtime': 12045.6004, 'train_samples_per_second': 1.825, 'train_steps_per_second': 0.007, 'total_flos': 1446005565478656.0, 'train_loss': 0.5489309887553371, 'epoch': 1.0})
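The `global_step=86` in the output follows from the data size and batch size. There is one optimizer step per batch, so the number of steps per epoch is the number of training rows divided by the batch size, rounded up. The quick check below assumes a single device and the default `gradient_accumulation_steps=1`. With only 86 steps, the default `logging_steps=500` is never reached, which likely explains the "No log" training-loss entry in the epoch table; the average loss still appears as `training_loss` in `TrainOutput`.

```python
import math

n_train = 21983   # rows left after the empty-text filter (from the Filter/Map logs)
batch_size = 256  # per_device_train_batch_size, single device assumed

steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)      # 86, matching global_step=86 for 1 epoch
print(3 * steps_per_epoch)  # 258 for the suggested 3 epochs
```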

### Exercise 1(d) (3 points)

What model would you use to predict the sentiment? Be specific.

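I would use the fine-tuned `cardiffnlp/twitter-roberta-base-sentiment` model from Exercise 1(c), because it has the highest accuracy on the `test` set: about 0.80 after one epoch of fine-tuning, compared with 0.73 for the pre-trained model used directly through `pipeline`.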