# Model Training

## 1. Importing and downloading the necessary libraries

Before running the analysis, it's necessary to install specific Python libraries that our scripts depend on. Below are the commands used to install these libraries:

- **transformers[torch]**: Installs the transformers library along with PyTorch. This library from Hugging Face provides state-of-the-art machine learning models primarily focused on Natural Language Processing (NLP), including pre-trained models that can be easily adapted to various text-based tasks.

- **accelerate**: A library from Hugging Face designed to simplify and accelerate training deep learning models with PyTorch. It abstracts away the complexity involved in coding distributed machine learning models and helps in leveraging hardware acceleration.

- **krippendorff**: Installs the krippendorff package, which is used to calculate Krippendorff's alpha, a statistical measure of inter-rater reliability or agreement between raters in qualitative studies.

- **datasets**: Provided by Hugging Face, this library is dedicated to easy access, sharing, and manipulation of datasets and evaluation metrics for machine learning. It is especially tailored to NLP tasks but extends to other domains as well.

In [3]:
# Quote the extra so that zsh does not treat the square brackets as a glob pattern
!pip install "transformers[torch]"
!pip install accelerate -U
!pip install krippendorff
!pip install datasets


The next cell imports the Python libraries and modules used for data manipulation, model training, and evaluation. The role of each import is summarized below:

### Core Libraries:
- **pandas**: Provides powerful data structures such as DataFrames, enabling easy data manipulation and analysis.
- **numpy**: Essential for scientific computing, numpy supports large, multi-dimensional arrays and matrices along with a broad collection of high-level mathematical functions to operate on these arrays.
- **random**: Useful for generating random numbers and performing random selections, this module is often utilized in data sampling or when shuffling data.
- **re**: Facilitates regular expression operations for string searching and manipulation, which is useful in text preprocessing.
- **os**: Provides a way of using operating system dependent functionality to interact with the file system.

### Machine Learning and Data Splitting:
- **sklearn (Scikit-learn)**: A versatile machine learning library that features classification, regression, and clustering algorithms, plus utilities for model evaluation such as train/test splits and metrics (precision, recall, F1 score, ROC AUC score).

### Inter-rater Reliability:
- **krippendorff**: Used to calculate Krippendorff's alpha, which measures the reliability of raters in qualitative research, ensuring consistent data labeling quality.

### Deep Learning Framework and Transformers:
- **torch (PyTorch)**: A widely used library in machine learning and deep learning for its dynamic computation graph paradigm and efficient memory usage.
- **transformers**: Provides state-of-the-art general-purpose architectures for natural language understanding and generation tasks, supporting thousands of pre-trained models optimized for a wide range of tasks.
- **datasets**: Simplifies the use of public datasets for training and evaluating machine learning models, providing easy-to-use data structures and data loading utilities from Hugging Face.

### Date and Time:
- **datetime**: Allows for manipulation of dates and times in both simple and complex ways, useful for timestamping model training sessions or handling data that includes date/time information.

### Practical Applications:
This setup is particularly useful for projects involving natural language processing where transformer models are applied. It supports tasks such as text classification, sentiment analysis, and entity recognition, with tools to split data, evaluate models, and ensure robust model training through reproducibility and reliability measures.

In [4]:
import pandas as pd
import numpy as np
import random
import re
import os
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score
import krippendorff
import torch
from transformers import TrainingArguments, Trainer, EvalPrediction, AutoConfig, AutoTokenizer, AutoModelForSequenceClassification, IntervalStrategy
from datasets import Dataset, DatasetDict
from datetime import datetime
import ast  # To safely evaluate strings containing Python literals


## 2. Configure

For better readability and transparency, we set paths, controls, model and hyperparameters here.

***Environment Setup***
- **Paths and Filenames**: Establish directories for training data and outputs. If the training directory doesn't exist, it is created.
- **Controls**: Define parameters such as the percentile for maximum tokens (`P`), the proportion of the dataset to be used for testing (`T`), and a seed for reproducibility.

***Model Configuration***
- **Model Selection**: The pre-trained model `distilroberta-base` from Hugging Face is chosen for fine-tuning because it offers a good balance between performance and computational efficiency.
- **Hyperparameters**: Set learning rate, batch size, weight decay, number of training epochs, and warmup steps. An earlier experiment with three training epochs reached an accuracy of approximately 94.9%; the configuration below trains for four epochs.

In [10]:
# Paths and Filenames
IN_TrainPath = "Data"
IN_TrainSample = "validated_labeled_data_cleaned.csv"
Training_Path = "Training"
os.makedirs(Training_Path, exist_ok=True)  # create the output directory if it is missing

# Set Controls
P = 95   # percentile for max tokens
T = 0.2  # size of test split for training
seed = 42 # seed used everywhere

# Pre-Trained LLM to fine-tune (Hugging Face Hub ID)
pretrained = 'distilroberta-base'

# Set basic hyperparameters for training (classifier performance can vary with different parameter settings)
# Note: an earlier run with num_train_epochs = 3 reached an accuracy of 0.949; this configuration uses 4
hyperparameters = {
    'learning_rate': 6.7e-06,
    'per_device_train_batch_size': 16,
    'weight_decay': 1.1e-05,
    'num_train_epochs': 4,
    'warmup_steps': 500
}
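
As a rough check on what these settings imply, the sketch below works out the number of optimizer steps and the learning rate under linear warmup. It assumes 4,615 training examples (the size reported by the `Map` progress output in Section 6), a single device, no gradient accumulation, and the linear schedule that `Trainer` uses by default. The helper name `linear_schedule_lr` is illustrative and not part of the notebook.

```python
import math

# Values from the configuration above
learning_rate = 6.7e-06
batch_size = 16
num_train_epochs = 4
warmup_steps = 500

# Assumption: training-set size taken from the Map output in Section 6
num_train_examples = 4615

# The last, partially filled batch still counts as a step
steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_train_epochs


def linear_schedule_lr(step: int) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps)  # 289 1156
print(linear_schedule_lr(10))
```

Under these assumptions, warmup covers 500 of the 1,156 steps, so the learning rate only reaches its peak during the second epoch. The value at step 10 agrees with the `1.34e-07` shown in the training log in Section 8.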

## 3. Helper Functions

### Tokenization and Data Preparation
- **Tokenization**: A function `get_tokens` is defined to tokenize text and return the number of tokens using the model's tokenizer.
- **Percentile Computation**: Compute the specified percentile of token lengths to determine the max length of input sequences.
- **Preprocessing**: Function `preprocess` encodes texts and their labels for training, padding and truncating them to a fixed maximum length (512 tokens in the code below).

### Training and Evaluation
- **Metrics Calculation**: Define functions to compute various metrics for multi-label classification tasks, including precision, recall, F1 score, ROC AUC score, and Krippendorff's alpha. These metrics help assess the model's performance comprehensively.
- **Compute Metrics Function**: A wrapper function that uses the `multi_label_metrics` to evaluate predictions made by the model during validation.
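
To make the micro-averaged metrics concrete, here is a minimal plain-Python sketch of the same idea: apply a sigmoid to each logit, threshold at 0.5, and pool true positives, false positives, and false negatives over all (text, label) pairs. The function `micro_metrics` and the example logits are hypothetical; the notebook itself relies on scikit-learn for these numbers.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def micro_metrics(logits, labels, threshold=0.5):
    """Micro-averaged precision, recall and F1 over all (example, label) pairs."""
    tp = fp = fn = 0
    for row_logits, row_labels in zip(logits, labels):
        for logit, truth in zip(row_logits, row_labels):
            pred = 1 if sigmoid(logit) >= threshold else 0
            if pred == 1 and truth == 1:
                tp += 1
            elif pred == 1 and truth == 0:
                fp += 1
            elif pred == 0 and truth == 1:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical logits for two texts and three labels
logits = [[2.0, -1.0, 0.5], [-0.5, 1.5, -2.0]]
labels = [[1, 0, 0], [0, 1, 1]]
print(micro_metrics(logits, labels))
```

In this toy example there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all come out at two thirds.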

### Additional Utilities
- **Seeding**: A utility function `seed_everything` is provided to ensure reproducibility of results. This function sets random seeds for the Python built-in `random` module, NumPy, and PyTorch.
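
The effect of seeding can be illustrated with the standard library alone. This is only a sketch of the principle for Python's `random` module; NumPy and PyTorch keep their own generators, which is why each needs its own seed. The helper `sample_after_seeding` is hypothetical.

```python
import random


def sample_after_seeding(seed: int, n: int = 5):
    """Re-seed the global generator, then draw n pseudo-random numbers."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


first = sample_after_seeding(42)
second = sample_after_seeding(42)
print(first == second)  # True: the same seed reproduces the same sequence
```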

### Execution and Workflow
The script is structured to be executed sequentially, where data loading, preprocessing, model initialization, training, and evaluation are conducted in a logical and systematic manner. Adjustments to hyperparameters and paths can be made based on the specific requirements and data characteristics of the project.

### Note
This script assumes that all necessary Python packages are installed and that the dataset is pre-processed to include necessary labels for training. Make sure to validate the paths and file names as per your local or server setup.

In [68]:
def get_tokens(text):
    """Tokenize text (provided tokenizer is instantiated) """
    return len(tokenizer(text)['input_ids'])

def compute_percentile(split, P):
    """Compute Pth percentile of number of tokens in texts of a given split"""
    num_tokens = [get_tokens(dataset[split][i]["Text"]) for i in range(len(dataset[split]))]
    return np.percentile(num_tokens, P)

def preprocess(examples, tokenizer, labels):
    """Encode texts with labels for training, handling tokenization and label encoding."""
    text = examples["Text"]
    # Perform tokenization
    encoding = tokenizer(text, padding="max_length", truncation=True, max_length=512)  # Assume max_tokens=512 for this example
    
    # Initialize a matrix for labels
    labels_matrix = np.zeros((len(text), len(labels)))

    # Fill the labels matrix with values where appropriate
    for idx, label in enumerate(labels):
        if label in examples:
            labels_matrix[:, idx] = examples[label]
    
    # Update encoding to include labels
    encoding["labels"] = torch.tensor(labels_matrix, dtype=torch.float32)
    return encoding

def multi_label_metrics(predictions: np.ndarray, labels: np.ndarray, threshold: float = 0.5) -> dict:
    """
    Calculate classification metrics for multi-label classification.
    :param predictions: The raw output predictions from the model.
    :param labels: The ground truth labels.
    :param threshold: The threshold for converting probabilities to binary predictions.
    :return: A dictionary containing precision, recall, F1 score, ROC AUC score, and Krippendorff's alpha.
    """
    sigmoid = torch.nn.Sigmoid()
    probs = sigmoid(torch.Tensor(predictions))
    y_pred = (probs >= threshold).numpy().astype(int)
    av = "micro"
    metrics = {
        'precision': precision_score(y_true=labels, y_pred=y_pred, average=av),
        'recall': recall_score(y_true=labels, y_pred=y_pred, average=av),
        'f1': f1_score(y_true=labels, y_pred=y_pred, average=av),
        'roc_auc': roc_auc_score(y_true=labels, y_score=probs, average=av),
        'krippendorff_alpha': krippendorff.alpha(reliability_data=np.vstack((labels.ravel(), y_pred.ravel())))
    }
    return metrics

def compute_metrics(eval_prediction: EvalPrediction) -> dict:
    """
    Wrapper function for computing multi-label metrics using EvalPrediction object.
    """
    preds = eval_prediction.predictions[0] if isinstance(eval_prediction.predictions, tuple) else eval_prediction.predictions
    return multi_label_metrics(predictions=preds, labels=eval_prediction.label_ids)

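def seed_everything(seed = 42):
    """Seed everything for replicability. Largely works (especially on CUDA, less reliably on Apple silicon (MPS))."""
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    os.environ['TOKENIZERS_PARALLELISM'] = 'false'  # avoid issues with parallel tokenization
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)  # also seed all CUDA devices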

## 4. Load and Prepare Data

**Data Loading**: The dataset is loaded from a CSV file. Ensure the path matches where your CSV file is stored.

**String to List Conversion**: The brand_label and emotion_label columns, stored as string representations of lists, are converted back into Python lists. This is necessary because the categorization step checks list membership, which does not work on the raw strings.

**Label Categorization**: Each emotion term is mapped to one of three sentiment groups (positive, negative, neutral), and each brand category gets its own column. The result is one boolean column per label, which is the multi-label format used for training.

**Later Steps**: Splitting the data into training and test sets and converting it into a DatasetDict (compatible with the Hugging Face `Trainer`) follow in Section 5. Tokenization with the tokenizer of the pre-trained model (here, distilroberta-base) and formatting for PyTorch (`input_ids`, `attention_mask`, `labels`) follow in Section 6.
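
Stringified lists can be parsed in more than one way. The sketch below uses a made-up cell value to contrast `ast.literal_eval` with the approach of swapping single quotes for double quotes and calling `json.loads`. The value of `raw` is hypothetical and only chosen to contain an apostrophe.

```python
import ast
import json

# Hypothetical cell value; note the apostrophe inside the second item
raw = "['love', \"can't wait\"]"

# literal_eval parses the text as a Python literal, so mixed quoting is fine
parsed = ast.literal_eval(raw)
print(parsed)  # ['love', "can't wait"]

# Replacing every single quote also rewrites the apostrophe and breaks the JSON
try:
    json.loads(raw.replace("'", '"'))
    json_ok = True
except json.JSONDecodeError:
    json_ok = False
print(json_ok)  # False
```

Quote replacement works as long as no label contains an apostrophe; `ast.literal_eval` does not depend on that condition.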

In [34]:
import ast
import pandas as pd

# Load Data
file_path = "./Data/labeled_data_cleaned.csv"
df = pd.read_csv(file_path)

# Convert string representations of lists (e.g. "['love', 'happy']") into Python lists
df['emotion_label'] = df['emotion_label'].apply(ast.literal_eval)
df['brand_label'] = df['brand_label'].apply(ast.literal_eval)

# Define emotion and brand categories
positive_emotions = {'admiration', 'amazed', 'amazing', 'amused', 'amusement', 'appreciate', 'awesome', 
                     'better', 'breathtaking', 'calm', 'confident', 'cool', 'curious', 'enjoyed', 
                     'excellent', 'excited', 'fantastic', 'fine', 'glad', 'good', 'grateful', 'great', 
                     'happy', 'hope', 'impressed', 'incredible', 'inspired', 'interested', 'laugh', 
                     'like', 'love', 'nice', 'obsessed', 'pride', 'proud', 'relaxed', 'relief', 
                     'respect', 'satisfied', 'satisfying', 'support', 'thankful', 'thrilled', 
                     'trust', 'wonderful', 'wow'}

negative_emotions = {'angry', 'annoyed', 'anxious', 'bad', 'bored', 'cold', 'concerned', 'confused', 
                     'confusion', 'despise', 'disappointed', 'dislike', 'doubt', 'embarrassed', 
                     'expensive', 'frustrating', 'guilty', 'hard', 'hate', 'horrible', 'irritated', 
                     'jealous', 'perplexed', 'question', 'sad', 'shocked', 'upset', 'worried', 
                     'worse', 'worst'}

neutral_emotions = {'neutral', 'okay'}

# A list (not a set) keeps the order of the label columns deterministic across runs
brand_categories = ['social impact', 'product quality', 'customer service', 'sustainability',
                    'reputation & heritage', 'ethical practices']

# Function to categorize emotions and brands
def categorize_labels(emotions, brands):
    positive = any(emotion in positive_emotions for emotion in emotions)
    negative = any(emotion in negative_emotions for emotion in emotions)
    neutral = any(emotion in neutral_emotions for emotion in emotions)
    brand_dict = {brand: (brand in brands) for brand in brand_categories}
    return pd.Series([positive, negative, neutral] + list(brand_dict.values()))

# Apply function and assign new columns
new_columns = ['positive', 'negative', 'neutral'] + list(brand_categories)
df[new_columns] = df.apply(lambda row: categorize_labels(row['emotion_label'], row['brand_label']), axis=1)

# Set index name
df.index.name = 'ID'

# Drop the original brand_label and emotion_label columns
df.drop(['brand_label', 'emotion_label'], axis=1, inplace=True)

# Display the dataframe
df.head()

| ID | Text | positive | negative | neutral | social impact | product quality | customer service | sustainability | reputation & heritage | ethical practices |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | maybe simpsons real cartoon, | True | True | False | True | True | True | True | True | True |
| 1 | collab balenciaga yall looks thing simpson put... | False | True | False | False | True | False | False | False | False |
| 2 | videos created, | True | False | False | False | False | False | False | True | False |
| 3 | actually plan create like plan us believe simp... | False | True | False | False | False | False | False | True | False |
| 4 | ill never understand fashion, | False | True | False | False | False | False | False | True | False |


## 5. Prepare Data (Step 2) and Set Up for Model Training

**Stratified Splitting**
First, we identify the label with the fewest positive samples (the minority label). Stratifying on this label keeps its share of positive examples the same in the training and testing datasets, so the rarest label is represented in both. The other labels are not explicitly balanced by this step.

**Train-Test Split**
Using scikit-learn's train_test_split, we split the DataFrame into training and testing sets. The split is stratified by the minority label identified in the previous step, so both splits contain that label in the same proportion.
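
What stratifying on a single binary label guarantees can be shown with a small plain-Python sketch. This is not scikit-learn's implementation; `stratified_split` and the toy label column are hypothetical and only illustrate the idea of splitting each class separately.

```python
import random


def stratified_split(labels, test_size, seed):
    """Split indices so each class of `labels` keeps its share in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for value in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


# Hypothetical minority label: 10 positives among 100 rows
labels = [True] * 10 + [False] * 90
train_idx, test_idx = stratified_split(labels, test_size=0.2, seed=42)
print(sum(labels[i] for i in test_idx), len(test_idx))    # 2 20
print(sum(labels[i] for i in train_idx), len(train_idx))  # 8 80
```

Both parts keep the 10% positive rate of the stratification label. Nothing in this procedure controls the distribution of any other label column.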

**Conversion to Hugging Face Datasets**
The training and testing DataFrames are then converted into Dataset objects from the Hugging Face datasets library. This conversion facilitates the integration of our data with the Hugging Face ecosystem, enabling more efficient data manipulation and model training.

**Label Dictionary Construction**
For effective model training and evaluation, it is essential to map labels to indices and vice versa. We extract the labels from the training dataset (excluding the 'Text' column) and create dictionaries to map label names to indices (label2id) and indices to label names (id2label).

In [61]:
# Identify the minority label for stratification to ensure training and test sets are balanced
# Assuming the labels are in one-hot encoded format from column index 1 onwards
minority_label = df.iloc[:, 1:].sum().idxmin()

# Split the DataFrame into train and test sets, stratified by the minority label
train_df, test_df = train_test_split(df, test_size=T, random_state=seed, stratify=df[minority_label])

# Convert the pandas dataframes into Hugging Face datasets for easy use with the transformers library
train_dataset = Dataset.from_pandas(train_df)
test_dataset = Dataset.from_pandas(test_df)
dataset = DatasetDict({
    'train': train_dataset,
    'test': test_dataset
})

# Extract labels from the training dataset; assumes that the first column is 'Text' and should be excluded
labels = [label for label in train_df.columns if label not in ['Text']]
id2label = {idx: label for idx, label in enumerate(labels)}
label2id = {label: idx for idx, label in enumerate(labels)}

# Print label dictionaries to confirm their correct setup
print("ID to Label:", id2label)
print("Label to ID:", label2id)

ID to Label: {0: 'positive', 1: 'negative', 2: 'neutral', 3: 'social impact', 4: 'product quality', 5: 'customer service', 6: 'sustainability', 7: 'reputation & heritage', 8: 'ethical practices'}
Label to ID: {'positive': 0, 'negative': 1, 'neutral': 2, 'social impact': 3, 'product quality': 4, 'customer service': 5, 'sustainability': 6, 'reputation & heritage': 7, 'ethical practices': 8}


## 6. Tokenization and Data Encoding

***Seed Setting for Reproducibility***

**Purpose:** To ensure that the results of the model training are reproducible. This involves setting the seed for random number generation in various libraries.

**Details:**
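- `PYTHONHASHSEED` is set to make hash-based operations predictable. The variable is read at interpreter start-up, so it only affects Python processes launched after it is set.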
- `TOKENIZERS_PARALLELISM` is set to 'false' to avoid potential issues with parallel processing in tokenization.
- Seeds for `numpy`, `random`, and `torch` are set to ensure that operations involving randomness yield the same result each time they are run.
- If a CUDA-enabled GPU is available, its random seed is also set.

***Tokenizer Loading***

**Purpose:** To load a pretrained tokenizer that is used to convert text into a format suitable for model training.

**Details:** 
- The `AutoTokenizer` class from the Hugging Face Transformers library is used to load a tokenizer that corresponds to the pretrained model specified by the `pretrained` variable.

***Token Count and Percentile Calculation***

**Purpose:** To determine the maximum sequence length for tokenization based on the actual length distribution of the dataset.

**Details:**
- The helper `compute_percentile` (Section 3) calculates token counts for the texts in a split and returns the specified percentile of these counts, which can serve as a uniform sequence length.
- Taking the maximum of the percentile values for the training and testing splits gives a single length for both. Note that the cell below does not apply this value: `preprocess` pads and truncates to a fixed 512 tokens.

***Tokenization and Dataset Preparation***

**Purpose:** To prepare the dataset for training by tokenizing the text and setting it up in a format compatible with PyTorch.

**Details:**
- The `preprocess` function (Section 3) is applied to the dataset in batches with `map`. It tokenizes the texts, truncates and pads them to 512 tokens, and attaches the multi-hot label matrix.
- The dataset is then formatted to include only the fields necessary for model training (`input_ids`, `attention_mask`, and `labels`).
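
The percentile-based choice of a maximum sequence length can be sketched without any external library. In this illustration, whitespace splitting stands in for the real tokenizer, the texts are invented, and the 512-token cap is an assumed model limit. `percentile` uses linear interpolation between the closest ranks.

```python
import math


def percentile(values, p):
    """p-th percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


def count_tokens(text):
    # Stand-in for len(tokenizer(text)["input_ids"]); whitespace split is an assumption
    return len(text.split())


# Hypothetical texts for the two splits
train_texts = ["great bag", "i love this brand so much", "ok", "never buying again from them"]
test_texts = ["nice", "what a strange collab this is honestly"]

P = 95
model_limit = 512  # maximum input length assumed for the model
per_split = [percentile([count_tokens(t) for t in texts], P)
             for texts in (train_texts, test_texts)]
max_length = min(model_limit, math.ceil(max(per_split)))
print(per_split, max_length)
```

A length chosen this way truncates only the longest few percent of texts while keeping the padded tensors much shorter than a fixed 512 tokens, which can reduce training time noticeably for short texts.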


In [69]:
from transformers import AutoTokenizer
import numpy as np
import torch

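# Seed all random number generators for reproducibility (see seed_everything in Section 3)
seed_everything(seed)

# Load pretrained tokenizer
tokenizer = AutoTokenizer.from_pretrained(pretrained)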

# Labels as defined
labels = ['positive', 'negative', 'neutral', 'social impact', 'product quality', 'customer service', 'sustainability', 'reputation & heritage', 'ethical practices']

# Assuming dataset is your DatasetDict that includes 'train' and 'test'
# Apply preprocessing, tokenize text, and prepare labels
encoded_dataset = dataset.map(lambda batch: preprocess(batch, tokenizer, labels), batched=True)

# Set dataset format for PyTorch
encoded_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])

# Check the first example from the train set
print("Sample data:", encoded_dataset['train'][0])

Map:   0%|          | 0/4615 [00:00<?, ? examples/s]

Map:   0%|          | 0/1154 [00:00<?, ? examples/s]

Sample data: {'input_ids': tensor([    0, 27333, 46999,  1790,     6,     2,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,  

## 7. Set-up Fine-Tuning of LLM

In [70]:
# Instantiate Classifier
# Set "ignore_mismatched_sizes=True" when fine-tuning a pre-trained classification model with a different number of classes.
# Expect a warning that some weights were newly initialized rather than loaded from the checkpoint.
# This is expected, since the classification head is trained on the downstream task.
model = AutoModelForSequenceClassification.from_pretrained(pretrained,
                                                           problem_type="multi_label_classification", # vs. multi-class or binary!
                                                           num_labels=len(labels),
                                                           id2label=id2label,
                                                           label2id=label2id)
                                                           #ignore_mismatched_sizes=True) 

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
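
With `problem_type="multi_label_classification"`, each label is scored independently and the model is trained with binary cross-entropy on the raw logits (`BCEWithLogitsLoss` in PyTorch). The sketch below is a plain-Python illustration of that loss, not the library code; `bce_with_logits` and the example batch are hypothetical.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over every (example, label) pair, from raw logits."""
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, targets):
        for x, y in zip(row_logits, row_targets):
            # Numerically stable form of -[y*log(s(x)) + (1-y)*log(1-s(x))]
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count


# Hypothetical batch: two texts, three labels; several labels can be 1 at once
targets = [[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
undecided = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
confident = [[4.0, -4.0, 4.0], [-4.0, -4.0, 4.0]]
print(bce_with_logits(undecided, targets))  # ln 2, about 0.693
print(bce_with_logits(confident, targets))
```

Because every label has its own sigmoid, a text can be assigned several labels at once, unlike in multi-class classification, where a softmax forces the labels to compete.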


In [71]:
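# Detect the available device; `device` is needed for the `use_mps_device` flag below
device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")

# Set Training Arguments
training_args = TrainingArguments(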
    output_dir=f"{Training_Path}",
    evaluation_strategy="epoch",
    logging_dir=f"{Training_Path}/Logs",
    logging_strategy="steps",
    logging_steps=10,
    per_device_train_batch_size=hyperparameters['per_device_train_batch_size'],
    per_device_eval_batch_size= hyperparameters['per_device_train_batch_size'],
    num_train_epochs=hyperparameters['num_train_epochs'],
    learning_rate=hyperparameters['learning_rate'],
    weight_decay=hyperparameters['weight_decay'],
    warmup_steps=hyperparameters['warmup_steps'],
    save_strategy="epoch",
    load_best_model_at_end=True,
    save_total_limit=2,
    use_mps_device=(device == "mps"),
    optim='adamw_torch',
    seed=seed
    # ---> You can also do a more granular evaluation than epochs at every 100 (or so) steps
    #evaluation_strategy=IntervalStrategy.STEPS,  # Evaluate every 'eval_steps'
    #eval_steps=100,                              # Evaluate every 100 steps
    #do_train=True,
    #do_eval=True,
    #save_strategy=IntervalStrategy.STEPS,        # Save every 'save_steps'
    #save_steps=100,                              # Save every 100 steps
)

# Instantiate Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["test"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
print("Ready to Create Synthetic Expert")

Ready to Create Synthetic Expert


## 8. Fine-Tune and Evaluate

In [73]:
# Fine-tune the model with trainer to create Synthetic Expert
print(f"Started training with seed {seed} at {datetime.now()}\nFine-tuning {pretrained}")
trainer.train()
print(f"Completed training at {datetime.now()}")

Started training with seed 42 at 2024-05-04 01:33:57.431203
Fine-tuning distilroberta-base


  0%|          | 0/1156 [00:00<?, ?it/s]

{'loss': 0.4909, 'grad_norm': 1.4333361387252808, 'learning_rate': 1.34e-07, 'epoch': 0.03}
{'loss': 0.469, 'grad_norm': 0.8542483448982239, 'learning_rate': 2.68e-07, 'epoch': 0.07}
{'loss': 0.4656, 'grad_norm': 1.0602748394012451, 'learning_rate': 4.0199999999999997e-07, 'epoch': 0.1}
{'loss': 0.4636, 'grad_norm': 0.9330105781555176, 'learning_rate': 5.36e-07, 'epoch': 0.14}
{'loss': 0.443, 'grad_norm': 0.9150786399841309, 'learning_rate': 6.7e-07, 'epoch': 0.17}
{'loss': 0.4824, 'grad_norm': 1.0391064882278442, 'learning_rate': 8.039999999999999e-07, 'epoch': 0.21}
{'loss': 0.4677, 'grad_norm': 1.0519721508026123, 'learning_rate': 9.380000000000002e-07, 'epoch': 0.24}
{'loss': 0.4519, 'grad_norm': 0.9426830410957336, 'learning_rate': 1.072e-06, 'epoch': 0.28}
{'loss': 0.4569, 'grad_norm': 1.7580785751342773, 'learning_rate': 1.206e-06, 'epoch': 0.31}
{'loss': 0.4543, 'grad_norm': 1.0924509763717651, 'learning_rate': 1.34e-06, 'epoch': 0.35}
{'loss': 0.4429, 'grad_norm': 0.9584240317

  0%|          | 0/73 [00:00<?, ?it/s]

{'eval_loss': 0.4146001935005188, 'eval_precision': 0.7239462714219546, 'eval_recall': 0.533993850358729, 'eval_f1': 0.6146283916633897, 'eval_roc_auc': 0.8623828941568881, 'eval_krippendorff_alpha': 0.4897007728700923, 'eval_runtime': 330.2512, 'eval_samples_per_second': 3.494, 'eval_steps_per_second': 0.221, 'epoch': 1.0}
{'loss': 0.4525, 'grad_norm': 1.783381462097168, 'learning_rate': 3.886e-06, 'epoch': 1.0}
{'loss': 0.4335, 'grad_norm': 1.6590379476547241, 'learning_rate': 4.02e-06, 'epoch': 1.04}
{'loss': 0.4052, 'grad_norm': 1.534212350845337, 'learning_rate': 4.154e-06, 'epoch': 1.07}
{'loss': 0.4413, 'grad_norm': 0.9963368773460388, 'learning_rate': 4.288e-06, 'epoch': 1.11}
{'loss': 0.4114, 'grad_norm': 2.767707586288452, 'learning_rate': 4.422e-06, 'epoch': 1.14}
{'loss': 0.4193, 'grad_norm': 2.838695764541626, 'learning_rate': 4.556e-06, 'epoch': 1.18}
{'loss': 0.4339, 'grad_norm': 1.2511861324310303, 'learning_rate': 4.69e-06, 'epoch': 1.21}
{'loss': 0.3931, 'grad_norm': 

KeyboardInterrupt: 

In [None]:
# Evaluate Synthetic Expert on test data
print("Model performance on Test")
trainer.evaluate()

In [None]:
# Evaluate Synthetic Expert on train data (passing the dataset explicitly leaves the trainer's test set unchanged)
print("Model performance on Train")
trainer.evaluate(eval_dataset=encoded_dataset["train"])

In [None]:
# Save fine-tuned model
trainer.save_model(f"{Training_Path}/brand_perception_expert")
print("Your Synthetic Expert was saved!")