<a href="https://colab.research.google.com/github/larajakl/Computational-Linguistics/blob/main/exercises/HomeExercise3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Home Exercise 3: Hyperparameters and Evaluation
In this third home exercise, you will use the knowledge from Tutorial 4 to experiment with hyperparameters, create a test set, and evaluate your final model on the created test set.

In this notebook, please complete all instructions marked with 👋 ⚒, either in the code cell that follows the sign or by writing your analysis in the text cell that follows it.

## **DistilBERT: Hyperparameters and Evaluation**

Use the code of Tutorial 4 to load and fine-tune the `distilbert-base-cased` model on a small subset of the `imdb` Movie Review Dataset. For convenience, the code from Tutorial 4 that this exercise requires is already provided in the code cells below.

👋 ⚒ When creating the dataset splits in the code cell below, additionally create a test set to be used after training. Make sure that your test set does not contain any of the reviews in the training or validation set and that it is approximately the same size as the validation set.

In [26]:
!pip install transformers
!pip install datasets
!pip install evaluate
!pip install accelerate --upgrade
!pip install optuna


Collecting optuna
  Downloading optuna-4.1.0-py3-none-any.whl.metadata (16 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.14.0-py3-none-any.whl.metadata (7.4 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Collecting Mako (from alembic>=1.5.0->optuna)
  Downloading Mako-1.3.6-py3-none-any.whl.metadata (2.9 kB)
Downloading optuna-4.1.0-py3-none-any.whl (364 kB)
   364.4/364.4 kB 18.5 MB/s eta 0:00:00
Downloading alembic-1.14.0-py3-none-any.whl (233 kB)
   233.5/233.5 kB 15.9 MB/s eta 0:00:00
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Downloading Mako-1.3.6-py3-none-any.whl (78 kB)
   78.6/78.6 kB 5.7 MB/s eta 0:00:00
Installing collected packages: M

In [22]:
from datasets import load_dataset, DatasetDict
from transformers import AutoTokenizer, DataCollatorWithPadding

# Load the IMDb dataset (skip this line if it has already been loaded)
imdb_dataset = load_dataset("imdb")
# Make sure you use the tokenizer that matches the model
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-cased")


# Keep only the first 100 whitespace-separated words of each review for speed on CPU
def truncate(example):
    return {
        'text': " ".join(example['text'].split()[:100]),
        'label': example['label']
    }

# Shuffle the training split once, then take disjoint index ranges:
# 128 examples for training, 32 for validation and 32 for testing.
# Because the index ranges do not overlap, no row is shared between splits.
shuffled_train = imdb_dataset['train'].shuffle(seed=24)
small_imdb_dataset = DatasetDict(
    train=shuffled_train.select(range(128)).map(truncate),
    val=shuffled_train.select(range(128, 160)).map(truncate),
    test=shuffled_train.select(range(160, 192)).map(truncate),
)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding=True, truncation=True)

small_tokenized_dataset = small_imdb_dataset.map(tokenize_function, batched=True, batch_size=16)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
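Taking disjoint index ranges from one shuffled copy guarantees that no row is shared between splits, but it does not by itself rule out identical review texts stored in different rows if the corpus contains duplicates. Below is a minimal, library-free sketch of a stricter variant, assuming the reviews are available as a plain Python list of strings: it de-duplicates the texts, shuffles once with a fixed seed, cuts consecutive slices and asserts that no text occurs in two splits. The helpers `make_disjoint_splits` and `assert_disjoint` and the toy corpus are hypothetical illustrations, not part of the tutorial code.

```python
import random

def make_disjoint_splits(texts, sizes, seed=24):
    """Shuffle once and cut consecutive, non-overlapping slices (hypothetical helper).

    Duplicate texts are dropped first (keeping the first occurrence), so the
    same review text can never end up in more than one split.
    """
    unique_texts = list(dict.fromkeys(texts))  # order-preserving de-duplication
    if sum(sizes) > len(unique_texts):
        raise ValueError("not enough unique examples for the requested split sizes")
    order = list(range(len(unique_texts)))
    random.Random(seed).shuffle(order)
    splits, start = [], 0
    for size in sizes:
        splits.append([unique_texts[i] for i in order[start:start + size]])
        start += size
    return splits

def assert_disjoint(*splits):
    """Raise AssertionError if any text occurs in two different splits."""
    seen = set()
    for split in splits:
        current = set(split)
        assert not (seen & current), "overlap between splits"
        seen |= current

# Toy corpus with one duplicated review (hypothetical data)
corpus = [f"review {i}" for i in range(200)] + ["review 3"]
train, val, test = make_disjoint_splits(corpus, sizes=(128, 32, 32))
assert_disjoint(train, val, test)
print(len(train), len(val), len(test))  # 128 32 32
```

With the `datasets` objects in this notebook, the same `assert_disjoint` check could be applied to the text columns of the splits, for example `small_imdb_dataset['train']['text']` and `small_imdb_dataset['val']['text']`, plus the test split once it exists.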

👋 ⚒ For this exercise, we will use the Hugging Face Trainer class to experiment with hyperparameters. Try to find a set of hyperparameter settings that achieves the highest possible accuracy on the **validation set** with the small dataset and model in this setup.

**Optional:** If you want to follow a more systematic route, feel free to use available frameworks for hyperparameter optimization, such as [Optuna](https://optuna.org/).
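Optuna's default sampler (TPE) uses earlier trials to propose new ones; the simplest systematic baseline is plain random search. The sketch below, in plain Python without Optuna, samples the same search space as the Optuna objective further down (log-uniform learning rate between 1e-5 and 5e-5, uniform weight decay in [0, 0.3], batch size from {8, 16, 32}, 3 to 10 epochs) and keeps the best-scoring configuration. `fake_train_and_evaluate` is a hypothetical stand-in so the example runs; in the notebook it would be replaced by a function that builds a `Trainer` with the sampled arguments and returns the validation accuracy.

```python
import math
import random

def sample_config(rng):
    """Draw one configuration from the search space used in the Optuna objective."""
    log_lr = rng.uniform(math.log(1e-5), math.log(5e-5))  # log-uniform learning rate
    return {
        "learning_rate": math.exp(log_lr),
        "weight_decay": rng.uniform(0.0, 0.3),
        "per_device_train_batch_size": rng.choice([8, 16, 32]),
        "num_train_epochs": rng.randint(3, 10),  # inclusive on both ends
    }

def random_search(train_and_evaluate, n_trials=20, seed=0):
    """Return (best_config, best_score, history), maximising the score."""
    rng = random.Random(seed)
    best_config, best_score, history = None, -math.inf, []
    for _ in range(n_trials):
        config = sample_config(rng)
        score = train_and_evaluate(config)
        history.append((config, score))
        if score > best_score:  # strict: the first of several equal scores is kept
            best_config, best_score = config, score
    return best_config, best_score, history

# Hypothetical stand-in for "fine-tune, then return validation accuracy" (NOT a real model)
def fake_train_and_evaluate(config):
    return 1.0 - abs(math.log10(config["learning_rate"]) - math.log10(3e-5))

best_config, best_score, history = random_search(fake_train_and_evaluate, n_trials=20, seed=0)
print(best_config, round(best_score, 3))
```

Each real trial here fine-tunes a model from scratch, so the number of trials is the main cost driver; on such a small validation set (32 examples, so each example is worth 1/32 = 0.03125 accuracy), differences between the top configurations may be within noise.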

In [27]:
import numpy as np
import evaluate
from transformers import TrainingArguments, Trainer
from transformers import AutoModelForSequenceClassification
#new imports:
import optuna

model = AutoModelForSequenceClassification.from_pretrained('distilbert/distilbert-base-cased', num_labels=2)
accuracy = evaluate.load("accuracy")

arguments = TrainingArguments(
    output_dir="sample_cl_trainer",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    logging_steps=8,  # log once per epoch: 128 training examples / 16 per batch = 8 steps
    num_train_epochs=5,
    eval_strategy="epoch", # run validation at the end of each epoch
    save_strategy="epoch",
    learning_rate=2e-5,
    weight_decay=0.01,
    load_best_model_at_end=True,
    report_to='none',
    seed=224
)

def compute_metrics(eval_pred):
    """Called at the end of validation. Gives accuracy"""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # calculates the accuracy
    return accuracy.compute(predictions=predictions, references=labels)


trainer = Trainer(
    model=model,
    args=arguments,
    train_dataset=small_tokenized_dataset['train'],
    eval_dataset=small_tokenized_dataset['val'], # change to test when you do your final evaluation!
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [16]:
trainer.train()
trainer.save_model("sample_cl_trainer")  # Explicitly save the final model

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6868 | 0.694317 | 0.46875 |
| 2 | 0.6729 | 0.692271 | 0.46875 |
| 3 | 0.6628 | 0.681649 | 0.5 |
| 4 | 0.6409 | 0.662725 | 0.8125 |
| 5 | 0.612 | 0.658475 | 0.8125 |


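Without hyperparameter tuning, the best validation accuracy was 0.8125, first reached at epoch 4 and kept at epoch 5. Because `load_best_model_at_end=True` selects the checkpoint with the lowest validation loss by default, the model kept at the end is the one from epoch 5 (`checkpoint-40`, i.e. 5 epochs × 8 steps). I saved this checkpoint to my Google Drive to be able to come back to it.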
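The link between epochs and checkpoint folders is simple arithmetic: with 128 training examples, a batch size of 16 and no gradient accumulation, one epoch is 128 / 16 = 8 optimizer steps, so the end of epoch 5 is step 40 and `save_strategy="epoch"` writes `checkpoint-40`. The sketch below is a simplified re-implementation of the selection rule for illustration, not the `Trainer`'s actual code: it maps each logged epoch to its checkpoint and picks the best one either by lowest validation loss (the default when `metric_for_best_model` is not set) or by highest accuracy. The numbers are the validation results logged above; the helper name `best_checkpoint` is hypothetical.

```python
import math

# (epoch, validation loss, validation accuracy) from the untuned run above
epoch_log = [
    (1, 0.694317, 0.46875),
    (2, 0.692271, 0.46875),
    (3, 0.681649, 0.5),
    (4, 0.662725, 0.8125),
    (5, 0.658475, 0.8125),
]

def best_checkpoint(log, n_train=128, batch_size=16, metric="loss"):
    """Return the checkpoint folder name of the best epoch (hypothetical helper).

    Assumes one optimizer step per batch (no gradient accumulation) and one
    checkpoint per epoch, named after the global step at which it was saved.
    """
    steps_per_epoch = math.ceil(n_train / batch_size)
    best_epoch, best_value = None, None
    for epoch, loss, acc in log:
        value = loss if metric == "loss" else acc
        if best_value is None:
            better = True
        elif metric == "loss":
            better = value < best_value  # lower loss is better
        else:
            better = value > best_value  # higher accuracy is better; ties keep the earlier epoch
        if better:
            best_epoch, best_value = epoch, value
    return f"checkpoint-{best_epoch * steps_per_epoch}"

print(best_checkpoint(epoch_log, metric="loss"))      # checkpoint-40
print(best_checkpoint(epoch_log, metric="accuracy"))  # checkpoint-32
```

If the goal is the highest validation accuracy rather than the lowest loss, `TrainingArguments` also accepts `metric_for_best_model="accuracy"`, which makes `load_best_model_at_end` reload the highest-accuracy checkpoint instead.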

Then I used Optuna to search for better hyperparameters:

In [28]:
# In this code cell, I use Optuna to test hyperparameters
import numpy as np
import evaluate
from transformers import TrainingArguments, Trainer
from transformers import AutoModelForSequenceClassification
#new imports:
import optuna

accuracy = evaluate.load("accuracy")

# tokenized dataset and data collator are already preloaded as "small_tokenized_dataset" and "data_collator" above

# Define the objective function
def objective(trial):
    # Define hyperparameters to tune
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True)
    weight_decay = trial.suggest_float("weight_decay", 0.0, 0.3)
    per_device_train_batch_size = trial.suggest_categorical("per_device_train_batch_size", [8, 16, 32])
    num_train_epochs = trial.suggest_int("num_train_epochs", 3, 10)

    # Initialize model
    model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-cased", num_labels=2)

    # Define training arguments
    arguments = TrainingArguments(
        output_dir="optuna_cl_trainer",
        per_device_train_batch_size=per_device_train_batch_size,
        per_device_eval_batch_size=16,
        logging_steps=8,
        num_train_epochs=num_train_epochs,
        eval_strategy="epoch",
        save_strategy="epoch",
        learning_rate=learning_rate,
        weight_decay=weight_decay,
        load_best_model_at_end=True,
        report_to='none',
        seed=224
    )

    # Define metrics
    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        return accuracy.compute(predictions=predictions, references=labels)

    # Initialize Trainer
    trainer = Trainer(
        model=model,
        args=arguments,
        train_dataset=small_tokenized_dataset['train'],
        eval_dataset=small_tokenized_dataset['val'],
        processing_class=tokenizer,  # replaces the deprecated `tokenizer=` argument
        data_collator=data_collator,
        compute_metrics=compute_metrics
    )

    # Train the model
    trainer.train()

    # Evaluate on the validation set; with load_best_model_at_end=True this is the
    # checkpoint with the lowest validation loss, not necessarily the highest accuracy
    eval_result = trainer.evaluate()

    # The study direction is "maximize", so the accuracy is returned as-is
    return eval_result['eval_accuracy']

# Create a study to maximize accuracy
study = optuna.create_study(direction="maximize")

# Optimize the study with n_trials
study.optimize(objective, n_trials=20)

# Get the best hyperparameters
print("Best hyperparameters:", study.best_params)

# Train the final model with the best hyperparameters (uncomment to run):
# best_params = study.best_params
# final_arguments = TrainingArguments(
#     output_dir="final_cl_trainer",
#     per_device_train_batch_size=best_params["per_device_train_batch_size"],
#     per_device_eval_batch_size=16,
#     logging_steps=8,
#     num_train_epochs=best_params["num_train_epochs"],
#     eval_strategy="epoch",
#     save_strategy="epoch",
#     learning_rate=best_params["learning_rate"],
#     weight_decay=best_params["weight_decay"],
#     load_best_model_at_end=True,
#     report_to='none',
#     seed=224
# )
#
# final_model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-cased", num_labels=2)
# final_trainer = Trainer(
#     model=final_model,
#     args=final_arguments,
#     train_dataset=small_tokenized_dataset['train'],
#     eval_dataset=small_tokenized_dataset['val'],
#     processing_class=tokenizer,
#     data_collator=data_collator,
#     compute_metrics=compute_metrics
# )
#
# # Train the final model
# final_trainer.train()
#
# # Evaluate on the test set
# test_results = final_trainer.evaluate(eval_dataset=small_tokenized_dataset['test'])
# print("Test results:", test_results)



[I 2024-11-27 17:45:50,542] A new study created in memory with name: no-name-6144ebbd-86dc-4433-9246-36c853a61b29
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6769 | 0.693824 | 0.46875 |
| 2 | 0.6813 | 0.690343 | 0.46875 |
| 3 | 0.6712 | 0.68736 | 0.46875 |


[I 2024-11-27 17:54:15,683] Trial 0 finished with value: 0.46875 and parameters: {'learning_rate': 1.0799487171985386e-05, 'weight_decay': 0.16559908150051014, 'per_device_train_batch_size': 8, 'num_train_epochs': 3}. Best is trial 0 with value: 0.46875.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6714 | 0.683679 | 0.46875 |
| 2 | 0.6455 | 0.64306 | 0.75 |
| 3 | 0.5006 | 0.541094 | 0.78125 |
| 4 | 0.2952 | 0.526457 | 0.78125 |
| 5 | 0.1292 | 0.463079 | 0.8125 |
| 6 | 0.0741 | 0.478704 | 0.84375 |
| 7 | 0.0304 | 0.528988 | 0.78125 |
| 8 | 0.0182 | 0.569449 | 0.78125 |
| 9 | 0.0155 | 0.582053 | 0.78125 |


[I 2024-11-27 18:18:27,998] Trial 1 finished with value: 0.8125 and parameters: {'learning_rate': 1.779298147067955e-05, 'weight_decay': 0.10886095850307957, 'per_device_train_batch_size': 8, 'num_train_epochs': 9}. Best is trial 1 with value: 0.8125.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6777 | 0.692771 | 0.46875 |
| 2 | 0.6507 | 0.67576 | 0.5 |
| 3 | 0.5535 | 0.621951 | 0.71875 |
| 4 | 0.3622 | 0.595913 | 0.71875 |
| 5 | 0.2821 | 0.571613 | 0.75 |


[I 2024-11-27 18:32:08,650] Trial 2 finished with value: 0.75 and parameters: {'learning_rate': 2.4971768958894923e-05, 'weight_decay': 0.08733810770852725, 'per_device_train_batch_size': 8, 'num_train_epochs': 5}. Best is trial 1 with value: 0.8125.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6725 | 0.696353 | 0.46875 |
| 2 | 0.6698 | 0.677214 | 0.5 |
| 3 | 0.6233 | 0.626757 | 0.78125 |
| 4 | 0.4805 | 0.640684 | 0.65625 |
| 5 | 0.3362 | 0.506323 | 0.71875 |
| 6 | 0.2312 | 0.481112 | 0.78125 |
| 7 | 0.1016 | 0.472576 | 0.78125 |
| 8 | 0.0659 | 0.48641 | 0.78125 |
| 9 | 0.0382 | 0.51394 | 0.78125 |
| 10 | 0.0379 | 0.521264 | 0.8125 |


[I 2024-11-27 18:58:50,100] Trial 3 finished with value: 0.78125 and parameters: {'learning_rate': 1.5503734021029926e-05, 'weight_decay': 0.059648153445032756, 'per_device_train_batch_size': 8, 'num_train_epochs': 10}. Best is trial 1 with value: 0.8125.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(



| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6941 | 0.685069 | 0.46875 |
| 2 | 0.6476 | 0.64674 | 0.71875 |
| 3 | 0.5151 | 0.625765 | 0.6875 |
| 4 | 0.2728 | 0.765055 | 0.6875 |
| 5 | 0.1935 | 0.679796 | 0.65625 |
| 6 | 0.0521 | 0.775602 | 0.6875 |
| 7 | 0.0244 | 0.826884 | 0.78125 |
| 8 | 0.0138 | 0.938029 | 0.6875 |
| 9 | 0.0112 | 0.968675 | 0.6875 |


[I 2024-11-27 19:22:31,378] Trial 4 finished with value: 0.6875 and parameters: {'learning_rate': 4.706745312023782e-05, 'weight_decay': 0.13317977349755078, 'per_device_train_batch_size': 16, 'num_train_epochs': 9}. Best is trial 1 with value: 0.8125.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.690265 | 0.5625 |
| 2 | 0.682700 | 0.691604 | 0.46875 |
| 3 | 0.682700 | 0.689752 | 0.46875 |
| 4 | 0.657700 | 0.677772 | 0.53125 |
| 5 | 0.657700 | 0.669463 | 0.6875 |
| 6 | 0.620800 | 0.665882 | 0.6875 |


[I 2024-11-27 19:39:15,695] Trial 5 finished with value: 0.6875 and parameters: {'learning_rate': 2.376475877865306e-05, 'weight_decay': 0.20000821140165462, 'per_device_train_batch_size': 32, 'num_train_epochs': 6}. Best is trial 1 with value: 0.8125.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6939 | 0.69969 | 0.46875 |
| 2 | 0.6859 | 0.695864 | 0.46875 |
| 3 | 0.6774 | 0.693355 | 0.46875 |
| 4 | 0.6768 | 0.691252 | 0.46875 |


[I 2024-11-27 19:49:48,848] Trial 6 finished with value: 0.46875 and parameters: {'learning_rate': 1.1469437004232727e-05, 'weight_decay': 0.27772839305593144, 'per_device_train_batch_size': 16, 'num_train_epochs': 4}. Best is trial 1 with value: 0.8125.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6733 | 0.688528 | 0.46875 |
| 2 | 0.5569 | 0.571155 | 0.75 |
| 3 | 0.2624 | 0.435227 | 0.84375 |
| 4 | 0.1282 | 0.38886 | 0.84375 |


[I 2024-11-27 20:00:47,247] Trial 7 finished with value: 0.84375 and parameters: {'learning_rate': 3.848169677905598e-05, 'weight_decay': 0.22438235331891607, 'per_device_train_batch_size': 8, 'num_train_epochs': 4}. Best is trial 7 with value: 0.84375.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.668 | 0.689977 | 0.46875 |
| 2 | 0.6484 | 0.663746 | 0.6875 |
| 3 | 0.6021 | 0.639386 | 0.71875 |


[I 2024-11-27 20:08:59,665] Trial 8 finished with value: 0.71875 and parameters: {'learning_rate': 3.244139767856603e-05, 'weight_decay': 0.2448101967710896, 'per_device_train_batch_size': 8, 'num_train_epochs': 3}. Best is trial 7 with value: 0.84375.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6709 | 0.679304 | 0.46875 |
| 2 | 0.6225 | 0.615596 | 0.78125 |
| 3 | 0.4285 | 0.502894 | 0.78125 |
| 4 | 0.2519 | 0.468311 | 0.75 |
| 5 | 0.17 | 0.429397 | 0.8125 |


[I 2024-11-27 20:22:15,184] Trial 9 finished with value: 0.8125 and parameters: {'learning_rate': 2.420220647506377e-05, 'weight_decay': 0.08696586589911566, 'per_device_train_batch_size': 8, 'num_train_epochs': 5}. Best is trial 7 with value: 0.84375.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.693328 | 0.46875 |
| 2 | 0.677900 | 0.695501 | 0.46875 |
| 3 | 0.677900 | 0.643279 | 0.75 |
| 4 | 0.610400 | 0.602282 | 0.6875 |
| 5 | 0.610400 | 0.629529 | 0.65625 |
| 6 | 0.419600 | 0.584133 | 0.75 |
| 7 | 0.419600 | 0.55076 | 0.75 |


[I 2024-11-27 20:40:54,346] Trial 10 finished with value: 0.75 and parameters: {'learning_rate': 4.5083947381601806e-05, 'weight_decay': 0.21613402596976583, 'per_device_train_batch_size': 32, 'num_train_epochs': 7}. Best is trial 7 with value: 0.84375.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6811 | 0.699503 | 0.46875 |
| 2 | 0.6701 | 0.687669 | 0.46875 |
| 3 | 0.612 | 0.65664 | 0.625 |
| 4 | 0.5092 | 0.674306 | 0.625 |
| 5 | 0.3758 | 0.604953 | 0.65625 |
| 6 | 0.3074 | 0.641118 | 0.6875 |
| 7 | 0.1932 | 0.609127 | 0.71875 |
| 8 | 0.1881 | 0.604895 | 0.71875 |


[I 2024-11-27 21:02:07,323] Trial 11 finished with value: 0.71875 and parameters: {'learning_rate': 1.587606246267303e-05, 'weight_decay': 0.0009650090973031406, 'per_device_train_batch_size': 8, 'num_train_epochs': 8}. Best is trial 7 with value: 0.84375.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6741 | 0.680954 | 0.46875 |
| 2 | 0.501 | 0.481825 | 0.8125 |
| 3 | 0.1471 | 0.499256 | 0.78125 |
| 4 | 0.0616 | 0.636168 | 0.71875 |
| 5 | 0.0045 | 0.697499 | 0.78125 |
| 6 | 0.0057 | 0.780649 | 0.75 |
| 7 | 0.0024 | 0.967995 | 0.78125 |
| 8 | 0.0018 | 0.767629 | 0.8125 |
| 9 | 0.0017 | 0.782687 | 0.75 |
| 10 | 0.0017 | 0.790969 | 0.75 |


[I 2024-11-27 21:28:23,333] Trial 12 finished with value: 0.8125 and parameters: {'learning_rate': 3.436052315434758e-05, 'weight_decay': 0.14934362747896282, 'per_device_train_batch_size': 8, 'num_train_epochs': 10}. Best is trial 7 with value: 0.84375.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6796 | 0.690296 | 0.46875 |
| 2 | 0.6623 | 0.66864 | 0.8125 |
| 3 | 0.5706 | 0.623644 | 0.71875 |
| 4 | 0.4141 | 0.617953 | 0.71875 |
| 5 | 0.2607 | 0.608326 | 0.71875 |
| 6 | 0.1977 | 0.606381 | 0.71875 |
| 7 | 0.1107 | 0.627518 | 0.71875 |
| 8 | 0.0879 | 0.627852 | 0.6875 |


[I 2024-11-27 21:49:31,786] Trial 13 finished with value: 0.71875 and parameters: {'learning_rate': 1.8003028691786272e-05, 'weight_decay': 0.2931907984698821, 'per_device_train_batch_size': 8, 'num_train_epochs': 8}. Best is trial 7 with value: 0.84375.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.693739 | 0.46875 |
| 2 | 0.680600 | 0.694537 | 0.46875 |
| 3 | 0.680600 | 0.658016 | 0.625 |
| 4 | 0.630500 | 0.614031 | 0.8125 |
| 5 | 0.630500 | 0.593668 | 0.875 |
| 6 | 0.525900 | 0.585093 | 0.875 |


[I 2024-11-27 22:05:26,803] Trial 14 finished with value: 0.875 and parameters: {'learning_rate': 3.29432412995289e-05, 'weight_decay': 0.18368779096914803, 'per_device_train_batch_size': 32, 'num_train_epochs': 6}. Best is trial 14 with value: 0.875.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
  trainer = Trainer(


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.688346 | 0.59375 |
| 2 | 0.686800 | 0.693013 | 0.46875 |


[W 2024-11-27 22:11:22,303] Trial 15 failed with parameters: {'learning_rate': 3.444437881338669e-05, 'weight_decay': 0.1970261801558752, 'per_device_train_batch_size': 32, 'num_train_epochs': 5} because of the following error: KeyboardInterrupt().
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/optuna/study/_optimize.py", line 197, in _run_trial
    value_or_values = func(trial)
  File "<ipython-input-28-80b5b0c093d2>", line 58, in objective
    trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2123, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2481, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3579, in training_step
    loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
  Fil

KeyboardInterrupt: 

In [17]:
# saving models in my Google Drive:
from google.colab import drive
drive.mount('/content/drive')

# Specify model checkpoint directory:
model_checkpoint = "sample_cl_trainer/checkpoint-40"
# Load the model and tokenizer:
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
# Define the Google Drive directory where you want to save the model:
save_directory = "/content/drive/MyDrive/saved_model_checkpoint"
# Save the model and tokenizer to Google Drive:
model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)




Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


('/content/drive/MyDrive/saved_model_checkpoint/tokenizer_config.json',
 '/content/drive/MyDrive/saved_model_checkpoint/special_tokens_map.json',
 '/content/drive/MyDrive/saved_model_checkpoint/vocab.txt',
 '/content/drive/MyDrive/saved_model_checkpoint/added_tokens.json',
 '/content/drive/MyDrive/saved_model_checkpoint/tokenizer.json')

In [12]:
# I used this to check what is in my checkpoint-40 folder:
import os
print(os.listdir("sample_cl_trainer/checkpoint-40"))

['training_args.bin', 'model.safetensors', 'scheduler.pt', 'rng_state.pth', 'vocab.txt', 'tokenizer.json', 'trainer_state.json', 'special_tokens_map.json', 'config.json', 'tokenizer_config.json', 'optimizer.pt']


👋 ⚒ Change the following code cell so that, instead of evaluating a single sentence, your trained model (make sure to use the correct checkpoint!) is evaluated on the entire newly created test set.

This might also be a good occasion to get familiar with the [Hugging Face documentation and tutorials](https://huggingface.co/docs/transformers/index).

In [7]:
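import torch
test_str = "I love this movie!"

# Load the epoch-5 checkpoint of the first (untuned) training run
fine_tuned_model = AutoModelForSequenceClassification.from_pretrained("sample_cl_trainer/checkpoint-40")
model_inputs = tokenizer(test_str, return_tensors="pt")
# Inference only, so no gradients are needed
with torch.no_grad():
    prediction = torch.argmax(fine_tuned_model(**model_inputs).logits).item()
print(["NEGATIVE", "POSITIVE"][prediction])  # IMDb labels: 0 = negative, 1 = positive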

POSITIVE
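One way to approach the task above is to run the model over the whole test set in mini-batches, take the argmax of each row of logits and compare it with the gold labels. The sketch below shows only that loop in plain Python, as an illustration under stated assumptions: `predict_logits` is a hypothetical callable that maps a list of texts to one `[negative, positive]` logit pair per text, and the toy keyword predictor exists only so the example runs without a model. With the fine-tuned model, `predict_logits` could wrap `fine_tuned_model(**tokenizer(batch, padding=True, truncation=True, return_tensors="pt")).logits.tolist()` inside `torch.no_grad()`. Alternatively, the commented-out code in the Optuna cell already points to the `Trainer` route, `trainer.evaluate(eval_dataset=small_tokenized_dataset['test'])`, which reuses `compute_metrics`.

```python
def argmax(values):
    """Index of the largest value (the first one on ties), like numpy.argmax."""
    return max(range(len(values)), key=lambda i: values[i])

def evaluate_in_batches(predict_logits, texts, labels, batch_size=16):
    """Return the accuracy of predict_logits on (texts, labels), processed in batches."""
    if not texts or len(texts) != len(labels):
        raise ValueError("texts and labels must be non-empty and of equal length")
    correct = 0
    for start in range(0, len(texts), batch_size):
        batch_texts = texts[start:start + batch_size]
        batch_labels = labels[start:start + batch_size]
        batch_logits = predict_logits(batch_texts)  # one logit pair per text
        predictions = [argmax(row) for row in batch_logits]
        correct += sum(p == y for p, y in zip(predictions, batch_labels))
    return correct / len(texts)

# Hypothetical toy predictor standing in for the fine-tuned model (label 1 = POSITIVE)
def keyword_logits(batch_texts):
    return [[0.0, 1.0] if "love" in text.lower() else [1.0, 0.0] for text in batch_texts]

test_texts = ["I love this movie!", "Terrible plot.", "Love it", "Boring and slow."]
test_labels = [1, 0, 1, 1]
print(evaluate_in_batches(keyword_logits, test_texts, test_labels, batch_size=3))  # 0.75
```

Because accuracy is a count of correct predictions divided by the number of examples, the result does not depend on the batch size; batching only bounds how many reviews are passed through the model at once.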
