In [6]:
!pip install optuna




This cell installs the Python libraries needed for the full fine-tuning and evaluation workflow. The Transformers library provides pre-trained language models such as BERT, along with the tools to download them and fine-tune them for text classification. The Datasets library makes it easier to load, prepare, and manage large datasets. Scikit-learn provides evaluation metrics such as accuracy and F1-score. Pandas is used to load and manipulate tabular data, such as CSV or Excel files, and OpenPyXL is the engine pandas uses to read and write Excel (.xlsx) files. The -q flag suppresses most of pip's output so that the installation runs quietly.

In [7]:
# Install required libraries
!pip install transformers datasets scikit-learn pandas openpyxl -q

This cell imports the Transformers library, which provides pre-trained models such as BERT, DistilBERT, and RoBERTa together with their tokenizers, model architectures, and training utilities for natural language processing tasks. The second line, print(transformers.__version__), prints the installed version. Because function names and parameters sometimes change between releases, checking the version before fine-tuning or evaluation helps confirm that the environment is compatible with the code.
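Comparing version strings directly can mislead: as plain strings, "4.9.0" sorts after "4.10.0". Below is a minimal sketch of a numeric comparison. The minimum version "4.40.0" is a hypothetical placeholder, not a requirement stated in this notebook. In practice, packaging.version.Version handles the full version syntax (including pre-releases) more robustly.

```python
import re


def version_tuple(version: str) -> tuple:
    """Convert '4.57.1' or '4.57.0.dev0' into a tuple of ints such as (4, 57, 1).

    Pre-release tags such as 'dev0' are ignored, so this is only a rough check.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)  # leading digits of this segment
        if match is None:  # stop at purely non-numeric segments such as 'dev0'
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum`, comparing numerically."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.57' compares equal to '4.57.0'
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Hypothetical minimum version, for illustration only
MINIMUM_TRANSFORMERS = "4.40.0"
print(meets_minimum("4.57.1", MINIMUM_TRANSFORMERS))         # True
print("4.9.0" > "4.10.0", meets_minimum("4.9.0", "4.10.0"))  # True False
```

In the notebook itself, the installed value would come from transformers.__version__.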

In [8]:
import transformers
print(transformers.__version__)

4.57.1


This cell imports the libraries needed to train and test the model. PyTorch is used to build and train deep learning models. From the transformers module, DistilBertTokenizerFast tokenizes the input text and DistilBertForSequenceClassification performs text classification, here predicting review sentiment. The TrainingArguments and Trainer classes streamline training by handling the training loop, evaluation, and saving of results.

The cell also imports train_test_split from sklearn.model_selection, which is used later to split the dataset into training and validation sets, and accuracy_score and f1_score from sklearn.metrics, which measure the model's performance. The pandas and numpy libraries handle data processing and numerical calculations.

Finally, the code calls torch.cuda.is_available() to check whether a GPU is available. If one is, computation runs on the GPU; otherwise it falls back to the CPU. The output, Using device: cuda, confirms that a GPU is available here; training on the CPU would be considerably slower.

In [9]:
import torch
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification, TrainingArguments, Trainer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
import pandas as pd
import numpy as np

# Check GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


# Load and Inspect Dataset

This cell loads the dataset used to fine-tune the DistilBERT model. The pandas library is imported to handle data operations such as reading, cleaning, and exploring the dataset. The script reads the Excel file "AMAZON REVIEW RATING.xlsx" with pd.read_excel() and stores it in a variable called df_excel.

A try-except block handles the case where the file is missing. If the file is found and loaded, the message "Successfully loaded data from AMAZON REVIEW RATING.xlsx" is printed and the first five rows are shown with df_excel.head(). If the file is not in the working directory, the FileNotFoundError is caught and an error message asks the user to upload the file.

The output previews the first five records, with the columns "Rating", "Title", and "Review", and confirms that the dataset loaded correctly. These fields hold the user's star rating, the review title, and the full review text. This dataset is used later for text preprocessing and model training.
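The try-except block only catches a missing file. A renamed or missing column would surface later as a KeyError during preprocessing. A small sketch of an up-front check follows. The helper name missing_columns is hypothetical, and the required set assumes the Rating and Review columns that the preprocessing cell uses.

```python
def missing_columns(columns, required=("Rating", "Review")):
    """Return the required column names that are absent from `columns`, sorted."""
    present = set(columns)
    return sorted(set(required) - present)


# Example with the column names from the preview, and with a file lacking 'Review'
print(missing_columns(["Rating", "Title", "Review"]))  # []
print(missing_columns(["Rating", "Text"]))             # ['Review']

# Hypothetical use once df_excel is loaded:
# missing = missing_columns(df_excel.columns)
# if missing:
#     raise ValueError(f"Missing columns: {missing}")
```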

In [10]:
import pandas as pd

try:
    df_excel = pd.read_excel('AMAZON REVIEW RATING.xlsx')
    print("Successfully loaded data from AMAZON REVIEW RATING.xlsx")
    print(df_excel.head())
except FileNotFoundError:
    print("Error: AMAZON REVIEW RATING.xlsx not found. Please upload the file.")

Successfully loaded data from AMAZON REVIEW RATING.xlsx
   Rating                                 Title   \
0       3                     more like funchuck   
1       5                              Inspiring   
2       5  The best soundtrack ever to anything.   
3       4                       Chrono Cross OST   
4       5                    Too good to be true   

                                              Review  
0  Gave this to my dad for a gag gift after direc...  
1  I hope a lot of people hear this cd. We need m...  
2  I'm reading a lot of reviews saying that this ...  
3  The music of Yasunori Misuda is without questi...  
4  Probably the greatest soundtrack in history! U...  


# Preprocess Data (Convert Ratings → Labels)

This cell converts the Amazon Review Rating dataset into a binary sentiment classification format to prepare it for model training. The 1–5 star ratings are simplified into two sentiment categories, positive and negative, so that a BERT-based classifier can be fine-tuned on them.

The code first removes all reviews with a rating of 3, since these are treated as neutral and could confuse the model. The resulting DataFrame (df_model) therefore contains only ratings of 1–2 (negative) and 4–5 (positive). The Rating column is then cast to an integer type to avoid type inconsistencies during processing. A lambda function creates a new column named label, where ratings of 4 or 5 become 1 (positive) and ratings of 1 or 2 become 0 (negative).

The script then keeps only the two required columns, the review text and its sentiment label, and renames the Review column to text for consistency with the rest of the pipeline. To ensure clean training data, it also removes rows whose review is empty or contains only whitespace, and then resets the DataFrame's index.
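The filtering and labeling rules can be illustrated on a few hand-made rows without pandas. This is a plain-Python sketch of the same logic, not the notebook's code. The rows and the helper name to_binary_examples are made up for illustration, and None stands in for a missing review.

```python
def to_binary_examples(rows):
    """Apply the notebook's rules to (rating, review_text) pairs.

    - rating 3 (neutral) is dropped
    - ratings 4-5 become label 1 (positive), ratings 1-2 become label 0 (negative)
    - missing, empty, or whitespace-only reviews are dropped
    """
    examples = []
    for rating, text in rows:
        rating = int(rating)
        if rating == 3:
            continue
        if text is None or not str(text).strip():
            continue
        examples.append((str(text), 1 if rating >= 4 else 0))
    return examples


rows = [
    (5, "Great soundtrack, listened to it all week."),
    (3, "It was fine."),
    (1, "Stopped working after a week."),
    (4, "   "),
    (2, None),
]
print(to_binary_examples(rows))
# [('Great soundtrack, listened to it all week.', 1), ('Stopped working after a week.', 0)]
```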

Finally, train_test_split divides the dataset into training (80%) and validation (20%) subsets. The stratify parameter ensures that both subsets keep the same proportion of positive and negative labels as the full dataset. The printed output shows 4,012 samples in total, with 2,102 negative and 1,910 positive reviews, so the classes are roughly balanced (about 52% versus 48%). After splitting, 3,209 training samples and 803 validation samples are available for tokenization and model training.
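The split sizes and the effect of stratify can be checked by hand. For a float test_size, scikit-learn rounds the validation size up, so 20% of 4,012 gives ceil(802.4) = 803 validation and 3,209 training samples, matching the output. Stratification aims to keep each class's share of the validation set close to its share of the full dataset. The sketch below computes those expected (fractional) counts; scikit-learn's exact per-class rounding may differ slightly, so treat them as approximate.

```python
from math import ceil


def split_sizes(n_samples: int, test_size: float) -> tuple:
    """Return (n_train, n_test) for a float test_size, rounding the test size up."""
    n_test = ceil(test_size * n_samples)
    return n_samples - n_test, n_test


def expected_class_counts(class_counts: dict, n_subset: int) -> dict:
    """Expected per-class counts in a stratified subset of size n_subset."""
    n_total = sum(class_counts.values())
    return {label: count * n_subset / n_total for label, count in class_counts.items()}


n_train, n_val = split_sizes(4012, 0.2)
print(n_train, n_val)  # 3209 803

# Label counts from the printed distribution: 0 = negative, 1 = positive
expected_val = expected_class_counts({0: 2102, 1: 1910}, n_val)
print({label: round(count, 1) for label, count in expected_val.items()})  # {0: 420.7, 1: 382.3}
```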

In [11]:
# The Rating column holds star ratings from 1 to 5.
# Drop neutral reviews (rating 3) so that only clearly negative or positive reviews remain.
df_model = df_excel[df_excel['Rating'] != 3].copy()

# Ensure Rating is an integer before mapping it to a label
df_model['Rating'] = df_model['Rating'].astype(int)

# Map ratings to binary sentiment: 1–2 -> 0 (negative), 4–5 -> 1 (positive)
df_model['label'] = df_model['Rating'].apply(lambda x: 1 if x >= 4 else 0)

# Keep only the review text and the label, and rename 'Review' to 'text'
# (adjust 'Review' if your file uses a different column name)
df_model = df_model[['Review', 'label']].copy()
df_model = df_model.rename(columns={'Review': 'text'})

# Remove rows whose review is missing, empty, or only whitespace
df_model = df_model.dropna(subset=['text'])
df_model = df_model.assign(text=df_model['text'].astype(str))
df_model = df_model[df_model['text'].str.strip().astype(bool)]

# Reset index after filtering
df_model = df_model.reset_index(drop=True)

print(f"DataFrame shape after processing: {df_model.shape}")
print(f"Label distribution:\n{df_model['label'].value_counts()}")


# Split into training (80%) and validation (20%) sets
train_texts, val_texts, train_labels, val_labels = train_test_split(
    df_model['text'].tolist(),
    df_model['label'].tolist(),
    test_size=0.2,
    random_state=42,
    stratify=df_model['label'],  # keep the same label proportions in both splits
)

print(f"Training samples: {len(train_texts)}")
print(f"Validation samples: {len(val_texts)}")

DataFrame shape after processing: (4012, 2)
Label distribution:
label
0    2102
1    1910
Name: count, dtype: int64
Training samples: 3209
Validation samples: 803


# Tokenization

This cell prepares the dataset for fine-tuning by tokenizing the raw review text, that is, converting it into the numerical format that the DistilBERT model expects.

It first defines the pre-trained model name "distilbert-base-uncased". DistilBERT is a smaller, faster, distilled version of BERT that still performs well on text classification, and "uncased" means the text is lowercased before tokenization. DistilBertTokenizerFast is then loaded for this model name. The tokenizer splits text into WordPiece subword tokens from BERT's vocabulary and maps each token to an integer ID.
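To make the tokenization step concrete, here is a simplified sketch of WordPiece's greedy longest-match-first rule, which BERT-style tokenizers use to split a word into subword pieces (word-internal pieces carry a ## prefix). The tiny vocabulary and its IDs are invented. The real tokenizer also splits punctuation, adds the special tokens [CLS] and [SEP], and uses a vocabulary of about 30,000 entries.

```python
# Toy vocabulary: token -> ID. Entries and IDs are invented for illustration.
TOY_VOCAB = {"[UNK]": 0, "the": 1, "great": 2, "##est": 3, "sound": 4, "##track": 5}


def wordpiece(word: str, vocab: dict, unk: str = "[UNK]") -> list:
    """Split one word into subword pieces using greedy longest-match-first."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:  # try the longest remaining substring first
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark word-internal pieces
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:  # no piece matches: the whole word becomes [UNK]
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def encode(text: str, vocab: dict) -> list:
    """Lowercase (as the 'uncased' model does), split on whitespace, map pieces to IDs."""
    tokens = [piece for word in text.lower().split() for piece in wordpiece(word, vocab)]
    return [vocab[token] for token in tokens]


print(wordpiece("soundtrack", TOY_VOCAB))            # ['sound', '##track']
print(encode("The Greatest soundtrack", TOY_VOCAB))  # [1, 2, 3, 4, 5]
```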

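The tokenizer is then applied to the training and validation texts. With truncation=True, reviews longer than the model's maximum input length (512 tokens for DistilBERT) are cut off, and with padding=True, shorter sequences are padded to the length of the longest sequence in the list, so that all inputs have the same length for batch processing. The tokenizer also returns an attention_mask that marks real tokens with 1 and padding with 0.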
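The effect of truncation=True and padding=True on a batch can be sketched with plain lists. The token IDs below are made up and a pad ID of 0 is assumed. Unlike the real tokenizer, this simplified version does not preserve the final [SEP] token when truncating.

```python
def pad_and_truncate(batch: list, max_length: int = 512, pad_id: int = 0) -> dict:
    """Truncate each ID list to max_length, then pad all to the longest remaining length."""
    truncated = [ids[:max_length] for ids in batch]
    target = max(len(ids) for ids in truncated)
    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = target - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = [[101, 2307, 102], [101, 2190, 3185, 999, 102]]  # made-up token IDs
encoded = pad_and_truncate(batch)
print(encoded["input_ids"])       # [[101, 2307, 102, 0, 0], [101, 2190, 3185, 999, 102]]
print(encoded["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

Because the notebook tokenizes all training texts in a single call, every training example is padded to the length of the longest training review (capped at 512 tokens). Padding per batch instead, for example with Transformers' DataCollatorWithPadding, would usually make training faster.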

ReviewDataset is a custom class that makes the data compatible with PyTorch and converts the tokenized data into a format that the Hugging Face Trainer can use. It implements the two methods that PyTorch requires of a dataset:

__getitem__ retrieves one sample at a time and converts its input tokens and label into PyTorch tensors.

__len__ returns the total number of samples in the dataset.

The tokenized encodings and their matching labels are then used to build two dataset objects, train_dataset and val_dataset, which are later passed to the Trainer for training and evaluation.
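The indexing contract that ReviewDataset relies on can be checked without PyTorch. The hypothetical PlainReviewDataset below mirrors its __getitem__ and __len__ but returns plain lists instead of tensors. It also adds a length check, which is not in the notebook, so that encodings and labels that are out of step fail immediately instead of producing misaligned samples.

```python
class PlainReviewDataset:
    """Torch-free stand-in for ReviewDataset: same indexing, plain Python values."""

    def __init__(self, encodings: dict, labels: list):
        # Every encoding field must have exactly one row per label
        for key, rows in encodings.items():
            if len(rows) != len(labels):
                raise ValueError(f"{key} has {len(rows)} rows but there are {len(labels)} labels")
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx: int) -> dict:
        item = {key: rows[idx] for key, rows in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item

    def __len__(self) -> int:
        return len(self.labels)


encodings = {  # made-up tokenizer output for two reviews
    "input_ids": [[101, 2307, 102, 0], [101, 2190, 999, 102]],
    "attention_mask": [[1, 1, 1, 0], [1, 1, 1, 1]],
}
dataset = PlainReviewDataset(encodings, [1, 0])
print(len(dataset))  # 2
print(dataset[0])    # {'input_ids': [101, 2307, 102, 0], 'attention_mask': [1, 1, 1, 0], 'labels': 1}
```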

The tokenizer files are publicly available, so they downloaded successfully and the cell ran without errors. The message in the output is expected: it only indicates that no Hugging Face Hub token (HF_TOKEN) is set in Colab, and authentication is optional for public models.

In [12]:
MODEL_NAME = "distilbert-base-uncased"
tokenizer = DistilBertTokenizerFast.from_pretrained(MODEL_NAME)

# Tokenize text data
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)

# Create PyTorch Dataset class
class ReviewDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item
    def __len__(self):
        return len(self.labels)

train_dataset = ReviewDataset(train_encodings, train_labels)
val_dataset = ReviewDataset(val_encodings, val_labels)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


# Define Model and Metrics

This cell defines how the DistilBERT model is created for fine-tuning on the Amazon Review Rating dataset and how its performance is measured.

First, model_init() calls DistilBertForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2), which loads the pre-trained DistilBERT model through Hugging Face's Transformers library and adapts it for binary classification (positive vs. negative sentiment). The parameter num_labels=2 tells the model that there are only two output classes. Wrapping the loading step in a function is required by the Trainer's hyperparameter_search method, which calls model_init at the start of every trial so that each trial starts from fresh pre-trained weights. No explicit .to(device) call is needed, because the Trainer moves the model to the GPU automatically when one is available.

Whenever the model is loaded, Transformers prints a warning that some weights were newly initialized rather than loaded from the checkpoint. This is expected: the classifier and pre_classifier layers are not part of the original pre-trained DistilBERT model, so they start with random weights and are trained from scratch on the review dataset to distinguish positive from negative sentiment.

Next, the function compute_metrics is defined to measure performance during evaluation. It receives the model's raw outputs (p.predictions, one logit per class for each sample) and the true labels (p.label_ids), uses np.argmax to pick the highest-scoring class as the prediction, and calculates two metrics:

Accuracy, the proportion of predictions that are correct.

F1-score, the harmonic mean of precision and recall, which balances false positives against false negatives and is more informative than accuracy on imbalanced datasets. Here it is computed with average='weighted', so the F1-scores of the two classes are averaged in proportion to the number of true samples in each class.

Because compute_metrics is passed to the Trainer, these metrics are computed and reported automatically after each evaluation, which happens once per epoch with the training arguments used below.
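compute_metrics can be traced by hand on a toy batch. The sketch below reimplements the same steps in plain Python (argmax over logits, accuracy, and support-weighted F1) to show what average='weighted' computes. The logits and labels are made up and the helper names are hypothetical; in the notebook, scikit-learn's accuracy_score and f1_score do this work.

```python
def argmax_rows(logits: list) -> list:
    """Index of the largest logit in each row, i.e. the predicted class."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def f1_for_class(y_true: list, y_pred: list, cls: int) -> float:
    """F1 for one class, treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def weighted_f1(y_true: list, y_pred: list) -> float:
    """Average the per-class F1 scores, weighting each class by its number of true samples."""
    n = len(y_true)
    return sum(y_true.count(c) / n * f1_for_class(y_true, y_pred, c) for c in set(y_true))


logits = [[2.0, -1.0], [0.3, 0.9], [1.2, 0.1], [-0.5, 1.5]]  # made-up model outputs
labels = [0, 1, 1, 1]
preds = argmax_rows(logits)  # [0, 1, 0, 1]
accuracy = sum(p == t for p, t in zip(preds, labels)) / len(labels)
print(accuracy)                              # 0.75
print(round(weighted_f1(labels, preds), 4))  # 0.7667
```

Here class 0 has F1 = 2/3 and class 1 has F1 = 0.8; weighting them by their supports (1 and 3 samples) gives 23/30 ≈ 0.7667.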

In [13]:
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from transformers import DistilBertForSequenceClassification, TrainingArguments, Trainer

# Define a function to initialize the model for the hyperparameter search
# This is required by the Trainer's hyperparameter_search method
def model_init():
    return DistilBertForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Define the function to compute metrics during evaluation
def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    return {
        'accuracy': accuracy_score(p.label_ids, preds),
        'f1': f1_score(p.label_ids, preds, average='weighted'),
    }

# Perform Grid Search

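Grid search is a hyperparameter tuning technique that evaluates a model on every combination of the hyperparameter values provided. It builds a "grid" of all possible configurations and trains the model once for each combination. Because it is exhaustive, it is guaranteed to find the best combination within the specified grid, but it can be computationally expensive, especially as the number of hyperparameters or candidate values grows.

Note that the cell below defines the grid with Optuna's suggest_categorical but runs hyperparameter_search with Optuna's default sampler (TPE), which draws configurations from the grid rather than enumerating it. As the log shows, some configurations were tried more than once (trials 4 and 6 used identical settings) and several trials were pruned early, so not all 12 combinations were fully evaluated.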
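To see what "every combination" means for the search space used below, the grid can be enumerated directly with itertools.product. This sketch only lists the configurations and totals the training epochs that an exhaustive search would need; it does not train anything.

```python
from itertools import product

# Candidate values copied from the grid-search cell's search space
grid = {
    "learning_rate": [2e-5, 3e-5, 5e-5],
    "num_train_epochs": [2, 3],
    "per_device_train_batch_size": [16, 32],
}

# One dict per combination, in a fixed order
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
total_epochs = sum(cfg["num_train_epochs"] for cfg in configs)

print(len(configs))   # 12
print(total_epochs)   # 30
print(configs[0])     # {'learning_rate': 2e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16}
```

For a truly exhaustive search in Optuna, the same dictionary could be passed to optuna.samplers.GridSampler. Trainer.hyperparameter_search forwards extra keyword arguments such as sampler (and pruner, for example optuna.pruners.NopPruner() to disable pruning) to optuna.create_study, but this variant has not been run in this notebook.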

In [14]:
!pip install optuna



In [15]:
!pip install rich



In [16]:
!pip install evaluate



In [17]:
# Upgrading packages that are already imported only takes effect after restarting the runtime
!pip install --upgrade transformers datasets evaluate



In [18]:
import optuna
from transformers import TrainingArguments, Trainer

# Training arguments shared by all trials; values from the search space override them per trial
training_args = TrainingArguments(
    output_dir='./results',
    eval_strategy="epoch",      # evaluate on the validation set after every epoch
    logging_steps=500,
    disable_tqdm=False,
    push_to_hub=False,          # do not push checkpoints to the Hugging Face Hub
    report_to="none",           # disable external loggers such as Weights & Biases
)

# Initialize the Trainer
trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
)

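# Grid of candidate values (3 x 2 x 2 = 12 combinations).
# Note: Optuna's default TPE sampler draws from these choices; it does not enumerate every combination.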
def optuna_hp_space(trial):
    return {
        "learning_rate": trial.suggest_categorical("learning_rate", [2e-5, 3e-5, 5e-5]),
        "num_train_epochs": trial.suggest_categorical("num_train_epochs", [2, 3]),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32]),
    }

# Perform the hyperparameter search
best_trial = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=optuna_hp_space,
    n_trials=12,
    compute_objective=lambda metrics: metrics["eval_accuracy"],
)

print("Best trial found for Grid Search:")
print(best_trial)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[I 2025-11-13 09:08:20,443] A new study created in memory with name: no-name-111e4b4e-356a-4310-b286-1f2ef69a6104
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.248707,0.90411,0.904163
2,No log,0.22261,0.920299,0.920299


[I 2025-11-13 09:11:10,230] Trial 0 finished with value: 0.9202988792029888 and parameters: {'learning_rate': 3e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32}. Best is trial 0 with value: 0.9202988792029888.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.260264,0.901619,0.901666
2,No log,0.228225,0.914072,0.914102
3,No log,0.234543,0.914072,0.914095


[I 2025-11-13 09:15:24,997] Trial 1 finished with value: 0.9140722291407223 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 3, 'per_device_train_batch_size': 32}. Best is trial 0 with value: 0.9202988792029888.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.25605,0.905355,0.905315
2,No log,0.351383,0.861768,0.861037
3,No log,0.254065,0.907846,0.907887


[I 2025-11-13 09:19:37,313] Trial 2 finished with value: 0.9078455790784558 and parameters: {'learning_rate': 5e-05, 'num_train_epochs': 3, 'per_device_train_batch_size': 32}. Best is trial 0 with value: 0.9202988792029888.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.249937,0.901619,0.901674
2,No log,0.232029,0.912827,0.912877
3,No log,0.23623,0.92279,0.922806


[I 2025-11-13 09:23:41,962] Trial 3 finished with value: 0.9227895392278954 and parameters: {'learning_rate': 3e-05, 'num_train_epochs': 3, 'per_device_train_batch_size': 32}. Best is trial 3 with value: 0.9227895392278954.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.24654,0.914072,0.913999
2,No log,0.223095,0.915318,0.915318


[I 2025-11-13 09:26:43,099] Trial 4 finished with value: 0.9153175591531756 and parameters: {'learning_rate': 5e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32}. Best is trial 3 with value: 0.9227895392278954.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.261112,0.900374,0.900412


[I 2025-11-13 09:28:03,174] Trial 5 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.24654,0.914072,0.913999
2,No log,0.223095,0.915318,0.915318


[I 2025-11-13 09:30:55,121] Trial 6 finished with value: 0.9153175591531756 and parameters: {'learning_rate': 5e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32}. Best is trial 3 with value: 0.9227895392278954.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.25605,0.905355,0.905315
2,No log,0.351383,0.861768,0.861037


[I 2025-11-13 09:33:34,235] Trial 7 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.249937,0.901619,0.901674


[I 2025-11-13 09:34:53,931] Trial 8 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.259815,0.902864,0.902913
2,No log,0.28481,0.897883,0.897877
3,0.280900,0.262894,0.914072,0.914043


[I 2025-11-13 09:39:25,841] Trial 9 finished with value: 0.9140722291407223 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 3, 'per_device_train_batch_size': 16}. Best is trial 3 with value: 0.9227895392278954.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.274012,0.899128,0.899187


[I 2025-11-13 09:40:49,325] Trial 10 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.272715,0.900374,0.900429


[I 2025-11-13 09:42:12,269] Trial 11 pruned. 


Best trial found for Grid Search:
BestRun(run_id='3', objective=0.9227895392278954, hyperparameters={'learning_rate': 3e-05, 'num_train_epochs': 3, 'per_device_train_batch_size': 32}, run_summary=None)
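The log above can be checked against the full grid. The sketch below takes the settings of the trials that finished (trials 0, 1, 2, 3, 4, 6, and 9; pruned trials do not print their settings) and lists the grid combinations that were never fully evaluated. The settings are copied from the log; the analysis itself is an addition, not part of the original run.

```python
from itertools import product

# Full grid: (learning_rate, num_train_epochs, per_device_train_batch_size)
all_combos = set(product([2e-5, 3e-5, 5e-5], [2, 3], [16, 32]))

# Settings of the trials that finished, copied from the log
completed = [
    (3e-5, 2, 32),  # trial 0
    (2e-5, 3, 32),  # trial 1
    (5e-5, 3, 32),  # trial 2
    (3e-5, 3, 32),  # trial 3 (best)
    (5e-5, 2, 32),  # trial 4
    (5e-5, 2, 32),  # trial 6, same settings as trial 4
    (2e-5, 3, 16),  # trial 9
]

not_evaluated = sorted(all_combos - set(completed))
print(f"{len(set(completed))} of {len(all_combos)} combinations fully evaluated")  # 6 of 12 combinations fully evaluated
for combo in not_evaluated:
    print(combo)
```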


# Perform Random Search
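Random search draws each hyperparameter independently from a range or list instead of walking a fixed grid. Below is a minimal plain-Python sketch of uniform random sampling over the same ranges as optuna_hp_space_random; the seed and the helper name sample_config are illustrative. The log=True option in suggest_float corresponds to sampling the learning rate uniformly on a log scale, so values between 1e-5 and 2e-5 are as likely as values between 2.5e-5 and 5e-5. Note that the cell below does not pass a sampler, so Optuna falls back to its default TPE sampler, which adapts to earlier results rather than sampling purely at random. Passing sampler=optuna.samplers.RandomSampler(seed=42) to hyperparameter_search would match this sketch more closely.

```python
import math
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration uniformly at random from the random-search space."""
    log_lr = rng.uniform(math.log(1e-5), math.log(5e-5))  # log-uniform learning rate
    return {
        "learning_rate": math.exp(log_lr),
        "num_train_epochs": rng.randint(1, 2),  # 1 or 2, inclusive
        "per_device_train_batch_size": rng.choice([32, 64]),
        "weight_decay": rng.uniform(0.0, 0.1),
    }


rng = random.Random(42)  # fixed seed so the draws are reproducible
configs = [sample_config(rng) for _ in range(10)]  # 10 trials, as in the search cell
for cfg in configs[:3]:
    print(cfg)
```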

In [19]:
import optuna
from transformers import TrainingArguments, Trainer

# It's good practice to define new TrainingArguments for a new search
training_args_random = TrainingArguments(
    output_dir='./results_random',
    eval_strategy="epoch",
    logging_steps=500,
    disable_tqdm=False,
    push_to_hub=False,
    report_to="none",
)

# Initialize a new Trainer for the random search
trainer_random = Trainer(
    model_init=model_init,
    args=training_args_random,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
)

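# Search space for random search: ranges instead of a fixed grid
# (the learning rate is sampled on a log scale). Without an explicit sampler,
# Optuna uses its default TPE sampler rather than pure random sampling;
# passing sampler=optuna.samplers.RandomSampler() to hyperparameter_search gives a true random search.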
def optuna_hp_space_random(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 1, 2),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [32, 64]),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1), # Adding another hyperparameter
    }

# Perform the hyperparameter search
# n_trials determines how many random combinations to test
best_trial_random = trainer_random.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=optuna_hp_space_random,
    n_trials=10,  # You can increase this number for a more thorough search
    compute_objective=lambda metrics: metrics["eval_accuracy"],
)

print("Best trial found for Random Search:")
print(best_trial_random)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[I 2025-11-13 09:57:47,090] A new study created in memory with name: no-name-8982b6dc-4d67-4d7d-81bf-8709bd3185ca
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.406784,0.870486,0.869691
2,No log,0.306462,0.892902,0.89282


[I 2025-11-13 10:00:30,931] Trial 0 finished with value: 0.8929016189290162 and parameters: {'learning_rate': 1.3930826606969197e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 64, 'weight_decay': 0.003875241015027764}. Best is trial 0 with value: 0.8929016189290162.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.465636,0.855542,0.85404
2,No log,0.336003,0.88543,0.88532


[I 2025-11-13 10:03:20,692] Trial 1 finished with value: 0.8854296388542964 and parameters: {'learning_rate': 1.1881756758326936e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 64, 'weight_decay': 0.09400750420506544}. Best is trial 0 with value: 0.8929016189290162.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.476262,0.851806,0.851349


[I 2025-11-13 10:04:48,117] Trial 2 finished with value: 0.8518057285180572 and parameters: {'learning_rate': 1.844470458353026e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 64, 'weight_decay': 0.035437693230322156}. Best is trial 0 with value: 0.8929016189290162.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.282205,0.896638,0.896434
2,No log,0.260325,0.901619,0.901553


[I 2025-11-13 10:07:52,872] Trial 3 finished with value: 0.9016189290161893 and parameters: {'learning_rate': 2.337049868052799e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 64, 'weight_decay': 0.030250374224475997}. Best is trial 3 with value: 0.9016189290161893.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.309122,0.88543,0.885246


[I 2025-11-13 10:09:24,869] Trial 4 finished with value: 0.8854296388542964 and parameters: {'learning_rate': 1.912767392474809e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.011309305093670742}. Best is trial 3 with value: 0.9016189290161893.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.267149,0.910336,0.910251
2,No log,0.245598,0.909091,0.908995


[I 2025-11-13 10:12:13,368] Trial 5 finished with value: 0.9090909090909091 and parameters: {'learning_rate': 3.30193816480245e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 64, 'weight_decay': 0.020550767372667558}. Best is trial 5 with value: 0.9090909090909091.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.269826,0.901619,0.901345
2,No log,0.241751,0.912827,0.91276


[I 2025-11-13 10:15:12,625] Trial 6 finished with value: 0.912826899128269 and parameters: {'learning_rate': 3.6620378411950055e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 64, 'weight_decay': 0.01736149547701833}. Best is trial 6 with value: 0.912826899128269.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.456896,0.867995,0.867842


[I 2025-11-13 10:16:44,463] Trial 7 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.455746,0.867995,0.867842


[I 2025-11-13 10:18:22,996] Trial 8 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.389242,0.871731,0.871739


[I 2025-11-13 10:19:49,560] Trial 9 pruned. 


Best trial found for Random Search:
BestRun(run_id='6', objective=0.912826899128269, hyperparameters={'learning_rate': 3.6620378411950055e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 64, 'weight_decay': 0.01736149547701833}, run_summary=None)
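In both searches the Training Loss column shows "No log" in every trial except grid-search trial 9. A likely explanation is that logging_steps=500 is larger than the total number of optimizer steps in most trials, so no training loss is ever logged. The arithmetic below assumes the Trainer defaults of one GPU, gradient_accumulation_steps=1, and dataloader_drop_last=False, so each epoch takes ceil(3209 / batch_size) steps. Setting logging_strategy="epoch" in TrainingArguments would log the training loss once per epoch instead.

```python
from math import ceil

N_TRAIN = 3209        # training samples, from the split output
LOGGING_STEPS = 500   # logging_steps in both TrainingArguments


def total_steps(batch_size: int, epochs: int, n_train: int = N_TRAIN) -> int:
    """Optimizer steps for a run, assuming one device and no gradient accumulation."""
    return ceil(n_train / batch_size) * epochs


for batch_size, epochs in [(32, 3), (16, 3), (64, 2)]:
    steps = total_steps(batch_size, epochs)
    logged = steps >= LOGGING_STEPS
    print(f"batch {batch_size}, {epochs} epochs: {steps} steps, training loss logged: {logged}")
# batch 32, 3 epochs: 303 steps, training loss logged: False
# batch 16, 3 epochs: 603 steps, training loss logged: True
# batch 64, 2 epochs: 102 steps, training loss logged: False
```

With batch size 16 and 3 epochs, each epoch is 201 steps, so step 500 falls in the third epoch (steps 403 to 603), consistent with the single logged value of 0.280900 in grid-search trial 9.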
