***Imports***

In [30]:
import pandas as pd
from datasets import load_dataset
from transformers import TrainingArguments, Trainer
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer
from transformers import pipeline
import evaluate 
import numpy as np
from sklearn.metrics import f1_score
from huggingface_hub import notebook_login

***Setting up Login***

In [2]:
notebook_login()

**Note:** This model follows the same approach as the previous notebook. The only change is to some of the parameters in the training arguments.

***Loading Data***

In [3]:
ds = load_dataset('imdb')

***Tokenizer***

In [4]:
tokenizer = AutoTokenizer.from_pretrained('distilbert/distilbert-base-uncased')

# function to tokenize text
def tokenize(batch):
    token_data = tokenizer(batch['text'], truncation=True, padding=True)
    return token_data

# applying the tokenize function to every split of the dataset
tokenized_data = ds.map(tokenize, batched=True)

***Model***

In [5]:
model = AutoModelForSequenceClassification.from_pretrained('distilbert/distilbert-base-uncased',
                                                            num_labels=2)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


***Evaluation Metric Setup***

In [6]:
# choosing f1 as metric
metric = evaluate.load('f1')

# creating metric functions
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # index of the highest logit = predicted class
    return metric.compute(predictions=predictions, references=labels)
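
To make the metric concrete, here is a minimal pure-Python sketch of what `compute_metrics` works out for a binary task: pick the index of the larger logit in each row, then compute F1 for the positive class (label 1), which is the default behaviour of the `f1` metric. The helper names `argmax_rows` and `binary_f1` and the toy logits are made up for illustration; they are not part of `evaluate` or `numpy`.

```python
def argmax_rows(logits):
    """Return the index of the largest value in each row (the predicted class)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def binary_f1(predictions, references, pos_label=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(predictions, references))
    tp = sum(p == pos_label and r == pos_label for p, r in pairs)
    fp = sum(p == pos_label and r != pos_label for p, r in pairs)
    fn = sum(p != pos_label and r == pos_label for p, r in pairs)
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical logits for four reviews, one [negative, positive] pair per row.
toy_logits = [[0.2, 1.5], [2.0, -1.0], [0.1, 0.3], [1.2, 0.4]]
toy_labels = [1, 0, 0, 1]

toy_preds = argmax_rows(toy_logits)
toy_f1 = binary_f1(toy_preds, toy_labels)
print(toy_preds, toy_f1)
```

Because F1 here only looks at the positive class, a model that is strong on negative reviews but weak on positive ones would still score poorly.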

***Training***

In [7]:
repo_name = 'sentiment-analysis-model'

training_args = TrainingArguments(
    output_dir=repo_name,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    weight_decay=0.001, # added weight decay
    warmup_steps=500, # added warmup steps
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_data['train'],
    eval_dataset=tokenized_data['test'],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()

{'loss': 0.4295, 'grad_norm': 14.283549308776855, 'learning_rate': 2e-05, 'epoch': 0.32}
{'loss': 0.2608, 'grad_norm': 11.268003463745117, 'learning_rate': 1.059266227657573e-05, 'epoch': 0.64}
{'loss': 0.2192, 'grad_norm': 10.393467903137207, 'learning_rate': 1.185324553151458e-06, 'epoch': 0.96}
{'train_runtime': 5867.4666, 'train_samples_per_second': 4.261, 'train_steps_per_second': 0.266, 'train_loss': 0.2992425114393082, 'epoch': 1.0}


TrainOutput(global_step=1563, training_loss=0.2992425114393082, metrics={'train_runtime': 5867.4666, 'train_samples_per_second': 4.261, 'train_steps_per_second': 0.266, 'total_flos': 3311684966400000.0, 'train_loss': 0.2992425114393082, 'epoch': 1.0})
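
The `learning_rate` values in the log above show the effect of `warmup_steps=500`. The sketch below assumes the `Trainer` is using its default linear schedule (no `lr_scheduler_type` was set): the rate climbs linearly to `2e-5` over the first 500 steps, then decays linearly towards zero at step 1563. The function name `linear_warmup_lr` is made up for illustration, and this is a re-derivation of the schedule rather than the library's own code.

```python
def linear_warmup_lr(step, base_lr=2e-5, warmup_steps=500, total_steps=1563):
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay."""
    if step < warmup_steps:
        # warmup phase: scale up from 0 to base_lr
        return base_lr * step / warmup_steps
    # decay phase: scale down from base_lr to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# The Trainer logs every 500 steps by default, which matches the three log lines.
for step in (500, 1000, 1500):
    print(step, linear_warmup_lr(step))
```

Under these assumptions the three printed values agree with the logged learning rates up to floating-point rounding. It also means that with only one epoch, roughly a third of the training steps are spent in warmup.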

***Evaluating Model***

In [8]:
trainer.evaluate()

{'eval_loss': 0.19744840264320374,
 'eval_f1': 0.9264379862146802,
 'eval_runtime': 1335.9506,
 'eval_samples_per_second': 18.713,
 'eval_steps_per_second': 1.17,
 'epoch': 1.0}

**Insights:** With an `f1` score of `0.9264`, this model is doing pretty well. It is ever so slightly worse than our previous model, but the difference is not significant. Because each training run takes between one and two hours on my machine, I have decided to keep the optimization to this one attempt. If I had a computer that could run this in less time, I would test several different optimizations. For the next steps, I will use this model on unseen data.

***Pushing Model to Hub***

In [9]:
trainer.push_to_hub()

CommitInfo(commit_url='https://huggingface.co/KiranWood/sentiment-analysis-model/commit/17177c732c85487c39cd5b0f0b266bba42d04235', commit_message='End of training', commit_description='', oid='17177c732c85487c39cd5b0f0b266bba42d04235', pr_url=None, pr_revision=None, pr_num=None)

***Testing Model on Unseen Data***

In [29]:
# getting data and model
texts = ds['unsupervised']['text']
my_model = pipeline(model="KiranWood/sentiment-analysis-model")

# truncating data to fit within the model's max length
# note: this slices by characters rather than tokens, so most inputs
# end up well under the token limit and the model only sees the start of each review
max_length = my_model.tokenizer.model_max_length
data = []
for text in texts:
    truncated_text = text[:max_length]
    data.append(truncated_text)

preds = my_model(data)

In [48]:
# turning into dataframe
preds = pd.DataFrame(preds)

preds.head()

     label     score
0  LABEL_1  0.987855
1  LABEL_1  0.992684
2  LABEL_1  0.981885
3  LABEL_0  0.976919
4  LABEL_0  0.991997


**Insights:** As we can see, each review is labeled as either negative (`LABEL_0`) or positive (`LABEL_1`). This unseen section of the dataset has no labels, so we cannot fully evaluate the final model on it. We can see, though, that it labels new data quite confidently: the first five predictions all have scores above `0.97`.
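
Since the pipeline reports the generic names `LABEL_0` and `LABEL_1`, a small helper can make the predictions easier to read. The sketch below assumes the IMDB convention that 0 means negative and 1 means positive, and uses only the five rows displayed above, written in the list-of-dicts shape that the pipeline returns. The names `LABEL_NAMES` and `summarize` are made up for illustration.

```python
from collections import Counter

# Assumption: IMDB encodes negative reviews as 0 and positive reviews as 1.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}

# The five predictions shown above, as a list of dicts.
sample_preds = [
    {"label": "LABEL_1", "score": 0.987855},
    {"label": "LABEL_1", "score": 0.992684},
    {"label": "LABEL_1", "score": 0.981885},
    {"label": "LABEL_0", "score": 0.976919},
    {"label": "LABEL_0", "score": 0.991997},
]


def summarize(preds, names=LABEL_NAMES):
    """Count predictions per readable label and average the confidence scores."""
    counts = Counter(names[p["label"]] for p in preds)
    mean_score = sum(p["score"] for p in preds) / len(preds)
    return counts, mean_score


counts, mean_score = summarize(sample_preds)
print(counts, round(mean_score, 4))
```

The same function could be applied to the full list of predictions before it is turned into a dataframe. Note that the score is the confidence in whichever label was predicted, so a high average says the model is confident, not that it is correct.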