# ESA Project: Fake or Real: The Impostor Hunt in Texts

This notebook is dedicated to **model training**.  
It covers:

- Load the pre-tokenized PyTorch datasets (`tokenized_train.pt` and `tokenized_val.pt`) saved during the data creation step.
- Instantiate the pre-trained **DistilBertForSequenceClassification** model, configured for binary classification (real or fake).
- Define the **TrainingArguments** for the Hugging Face Trainer, including hyperparameters such as batch sizes, number of epochs, learning-rate warmup, weight decay and logging settings.
- Implement a function that computes key validation metrics: **accuracy, precision, recall and F1-score**.
- Weight the cross-entropy loss to compensate for the class imbalance introduced by chunking.
- Run the training loop with the Hugging Face **Trainer** class, which manages checkpointing and model saving, first with 4-fold cross-validation and then on the full training set.
- Save the best-performing model and its tokenizer to disk for later use in inference and deployment.

# Import libraries

In [1]:
import pandas as pd
import numpy as np
from pathlib import Path
import matplotlib.pyplot as plt
import seaborn as sns
import re

import torch
from torch.nn import CrossEntropyLoss
from transformers import (
    DistilBertForSequenceClassification, 
    Trainer, 
    TrainingArguments, 
    AutoTokenizer,
    TrainerCallback, 
    TrainerState, 
    TrainerControl,
    EarlyStoppingCallback
)
from transformers import AutoConfig, AutoModelForSequenceClassification

from sklearn.model_selection import KFold
from sklearn.metrics import (
    accuracy_score, 
    precision_recall_fscore_support, 
    confusion_matrix
)

import os
import sys

# Add the src folder to Python path
sys.path.append(os.path.abspath(os.path.join('..', 'src')))
import config
from preprocessing import TextPreprocessor, get_text_statistics

import warnings
warnings.filterwarnings("ignore")

sns.set_theme()

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/photoli93/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


# Create Custom Dataset Class

In [2]:
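# The saved .pt files presumably hold pickled instances of this class, which
# torch.load(..., weights_only=False) can only rebuild if it is defined under the same name
class TextPairDataset(torch.utils.data.Dataset):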
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx], dtype=torch.long)
        return item

    def __len__(self):
        return len(self.labels)

# Load tokenized datasets

In [3]:
TOKENIZED_TRAIN_PATH = config.PROCESSED_DATA_DIR / "tokenized_train.pt"
TOKENIZED_VAL_PATH = config.PROCESSED_DATA_DIR / "tokenized_val.pt"

OUTPUT_DIR = config.OUTPUT_DIR / "distilbert_fake_or_real"
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Load tokenized datasets
print(f"Loading tokenized training data from: {TOKENIZED_TRAIN_PATH}")
try:
    train_dataset = torch.load(TOKENIZED_TRAIN_PATH, weights_only=False)
    val_dataset = torch.load(TOKENIZED_VAL_PATH, weights_only=False)
except FileNotFoundError:
    print("Error: Tokenized data not found. Please ensure 03_dataset_creation.py (or your data creation notebook) was run successfully")
    sys.exit(1)

print(f"Training dataset size: {len(train_dataset)}")
print(f"Validation dataset size: {len(val_dataset)}")

Loading tokenized training data from: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/data/processed/tokenized_train.pt
Training dataset size: 212
Validation dataset size: 41


# Load model and tokenizer

In [4]:
# Load the model for sequence classification with 2 labels (Real/Fake)
print(f"Loading model: {config.TOKENIZER_NAME} for Sequence Classification")
model = DistilBertForSequenceClassification.from_pretrained(
    config.TOKENIZER_NAME, 
    num_labels=2
)

# Load the matching tokenizer; it is saved next to the final model for inference
tokenizer = AutoTokenizer.from_pretrained(config.TOKENIZER_NAME)

Loading model: distilbert-base-uncased for Sequence Classification


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


# Define parameters and metrics

In [5]:
# Detect device
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")

# Training configuration
training_args = TrainingArguments(
    output_dir=str(OUTPUT_DIR),          
    num_train_epochs=20,                 # 20 epochs but with early stopping to avoid overfitting
    per_device_train_batch_size=16,      
    per_device_eval_batch_size=64,       
    warmup_steps=500,                    # Steps over which the learning rate ramps up linearly from 0 to its peak
                                         # (each fold runs only ~200 steps, so the peak is never reached)
    weight_decay=0.01,                   # Regularization term to reduce overfitting by penalizing large weights
    logging_dir='./logs',                
    logging_steps=50,                    # Log metrics every 50 training steps
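    evaluation_strategy="epoch",         # Evaluate at the end of each epoch (renamed eval_strategy in newer Transformers releases)
    save_strategy="epoch",               # Save model checkpoint at the end of each epoch
    save_total_limit=2,                  # Keep at most 2 checkpoints: the best one (load_best_model_at_end) plus the most recent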
    load_best_model_at_end=True,         # Load the best model found during training
    metric_for_best_model="eval_f1",     # Metric to monitor for best model
    greater_is_better=True,              # Looking to maximize the f1 metric
    fp16=False,                          # Disable for Mac, Transformers will automatically use MPS
    report_to="none"                     # Disables wandb, tensorboard, etc. (disabling sending logs online)
)

# Define evaluation metric (scikit-learn is already imported at the top of the notebook)
# p is the EvalPrediction passed by the Hugging Face Trainer: it holds "predictions" (logits) and "label_ids"
def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    # average='binary' reports precision/recall/F1 for the positive class (label 1 = fake);
    # zero_division=0 returns 0 instead of warning when no sample is predicted as fake
    precision, recall, f1, _ = precision_recall_fscore_support(
        p.label_ids, preds, average='binary', zero_division=0
    )
    acc = accuracy_score(p.label_ids, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

Using device: mps
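The metric definitions can be checked by hand. The NumPy-only sketch below mirrors what `compute_metrics` does (argmax over the logits, then binary precision, recall and F1 for label 1, fake) without scikit-learn; `binary_metrics_from_logits` is a hypothetical helper for illustration. Feeding it the fold 1, epoch 8 confusion matrix from the training logs (28/8 on real chunks, 0/17 on fake chunks) should reproduce the logged F1 of 0.8095.

```python
import numpy as np


def binary_metrics_from_logits(logits, labels):
    """Accuracy plus precision/recall/F1 of the positive class (label 1 = fake).

    Mirrors compute_metrics: argmax over the class axis, then the standard
    binary formulas, returning 0.0 when a denominator is zero.
    """
    preds = np.argmax(logits, axis=1)
    labels = np.asarray(labels)
    tp = int(np.sum((preds == 1) & (labels == 1)))
    fp = int(np.sum((preds == 1) & (labels == 0)))
    fn = int(np.sum((preds == 0) & (labels == 1)))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    accuracy = float(np.mean(preds == labels))
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}


# Fold 1, epoch 8 from the logs: 36 real chunks (28 predicted real, 8 predicted fake)
# and 17 fake chunks (all predicted fake)
y_true = np.array([0] * 36 + [1] * 17)
y_pred_classes = np.array([0] * 28 + [1] * 8 + [1] * 17)
example_logits = np.eye(2)[y_pred_classes]  # one-hot rows stand in for logits

example_metrics = binary_metrics_from_logits(example_logits, y_true)
print(example_metrics)
```

Accuracy comes out as 45/53 (about 0.849), precision as 17/25 = 0.68 and recall as 1.0, matching the logged values for that epoch.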

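The logged learning rates (`5e-06` at epoch 5 and `1e-05` at epoch 10) follow from `warmup_steps=500` being larger than the number of optimization steps per fold (the progress bar shows 200). The sketch below recomputes this, assuming the Trainer's default linear schedule with warmup and its default peak learning rate of `5e-5` (the notebook does not set `learning_rate` explicitly); `linear_warmup_lr` is a hypothetical helper written for illustration.

```python
import math


def linear_warmup_lr(step, peak_lr, warmup_steps, total_steps):
    """Learning rate at `step` for linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Settings from the notebook; peak_lr is the assumed Trainer default
n_train, n_folds, batch_size, epochs, warmup_steps = 212, 4, 16, 20, 500
peak_lr = 5e-5

fold_train_size = n_train - n_train // n_folds             # 159 chunks (212 % 4 == 0, so folds are equal)
steps_per_epoch = math.ceil(fold_train_size / batch_size)  # 10
total_steps = steps_per_epoch * epochs                     # 200, as in the progress bar

for step in (50, 100, total_steps):
    lr = linear_warmup_lr(step, peak_lr, warmup_steps, total_steps)
    print(f"step {step:>3}: lr = {lr:.1e}")
```

Ten steps per epoch is also consistent with the best fold 1 checkpoint being `checkpoint-80` (epoch 8). Even at the last step the rate only reaches 40% of the assumed peak; a smaller warmup (for example `warmup_ratio=0.1`) would let each fold train at the intended rate, though whether that helps on this small dataset would need to be checked empirically.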

# Custom Loss function

Splitting the texts into chunks ("exploding" them) introduced a class imbalance in the training set (138 chunks of class 0, real, versus 74 of class 1, fake), so a custom loss function is defined to weight the classes.

In [6]:
train_df = pd.read_csv(config.PROCESSED_DATA_DIR / "train_exploded.csv")

# Count of each class
counts = train_df['label'].value_counts().sort_index()
print(counts)

# Dynamic class weights (inverse frequency)
weights = 1.0 / counts
weights = weights / weights.sum()  # Normalize
weights = torch.tensor(weights.values, dtype=torch.float)
print(weights)

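# Standalone weighted loss. The Trainer constructor does not take it as an argument in the
# version used here, so the WeightedTrainer subclass below implements the same logic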
def compute_loss(model, inputs, return_outputs=False):
    labels = inputs.pop("labels")
    outputs = model(**inputs)
    logits = outputs.logits
    loss_fct = CrossEntropyLoss(weight=weights.to(logits.device))
    loss = loss_fct(logits, labels)
    return (loss, outputs) if return_outputs else loss

label
0    138
1     74
Name: count, dtype: int64
tensor([0.3491, 0.6509])
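For two classes, normalized inverse-frequency weights have a simple closed form: each class receives the share of the *other* class, here 74/212 and 138/212. The short NumPy check below (an illustration, not part of the pipeline) reproduces the printed tensor and shows that the ratio between the two weights is 138/74, the same ratio used for the weighted loss in `WeightedTrainer`.

```python
import numpy as np

# Class counts printed above: label 0 (real) and label 1 (fake)
example_counts = np.array([138, 74], dtype=float)

inverse_freq = 1.0 / example_counts                  # rarer class -> larger raw weight
example_weights = inverse_freq / inverse_freq.sum()  # normalize so the weights sum to 1

print(np.round(example_weights, 4))
print(example_weights[1] / example_weights[0])       # fake/real weight ratio
```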


# Custom Trainer subclass

In the Transformers version used here, the **Trainer** constructor does not accept a custom loss function, so I created a subclass called **WeightedTrainer** that overrides the `compute_loss` method to apply the class weights.

In [7]:
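class WeightedTrainer(Trainer):
    # **kwargs absorbs extra arguments (e.g. num_items_in_batch) that newer
    # Transformers releases pass to compute_loss
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        
        # Inverse-frequency class weights from the previous cell ([0.3491, 0.6509],
        # i.e. the ratio 1 : 138/74)
        loss_fct = CrossEntropyLoss(weight=weights.to(logits.device))
        
        loss = loss_fct(logits, labels)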
        return (loss, outputs) if return_outputs else loss
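Cell 6 prints the weights `[0.3491, 0.6509]`, while a weighting of `[1.0, 138/74]` expresses the same ratio. With PyTorch's default `reduction="mean"`, weighted cross-entropy is documented as dividing the weighted sum of per-sample losses by the sum of the target weights, so any common scale factor cancels. The NumPy sketch below re-implements that formula (it is not PyTorch's own code; `weighted_cross_entropy` is a hypothetical helper) on random logits to show that both weightings give the same loss.

```python
import numpy as np


def weighted_cross_entropy(logits, labels, class_weights):
    """Mean-reduced weighted cross-entropy: sum_i w[y_i] * nll_i / sum_i w[y_i]."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    w = np.asarray(class_weights, dtype=float)
    # Numerically stable log-softmax over the class axis
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    sample_w = w[labels]
    return float((sample_w * nll).sum() / sample_w.sum())


rng = np.random.default_rng(0)
example_logits = rng.normal(size=(8, 2))
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1])

loss_normalized = weighted_cross_entropy(example_logits, y_true, [74 / 212, 138 / 212])
loss_ratio_form = weighted_cross_entropy(example_logits, y_true, [1.0, 138 / 74])
print(loss_normalized, loss_ratio_form)  # equal up to floating-point error
```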

# Custom Confusion matrix callback

In [None]:
class ConfusionMatrixCallback(TrainerCallback):
    def __init__(self):
        self.trainer = None
        self.best_epoch = 0
        self.best_f1 = -float('inf')
    
    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        """Called after evaluation"""
        if not hasattr(self, 'trainer') or self.trainer is None:
            print("Trainer not set in ConfusionMatrixCallback")
            return
        
        try:
            # Get predictions
            preds_output = self.trainer.predict(self.trainer.eval_dataset)
            preds = np.argmax(preds_output.predictions, axis=1)
            labels = preds_output.label_ids
            
            # Compute confusion matrix (labels=[0, 1] keeps it 2x2 even if a class is missing)
            cm = confusion_matrix(labels, preds, labels=[0, 1])
            
            # Get current metrics
            current_epoch = int(state.epoch) if state.epoch else 0
            current_f1 = metrics.get('eval_f1', 0) if metrics else 0
            
            # Track best
            is_best = False
            if current_f1 > self.best_f1:
                self.best_f1 = current_f1
                self.best_epoch = current_epoch
                is_best = True
            
            # Print header
            print(f"\n{'='*80}")
            if is_best:
                print(f"[Epoch {current_epoch}] Confusion Matrix (New best)")
            else:
                print(f"[Epoch {current_epoch}] Confusion Matrix")
            print(f"{'='*80}")
            
            # Print confusion matrix with labels
            print("Confusion Matrix:")
            print("              Predicted")
            print("           Real    Fake")
            print(f"Real   [[{cm[0][0]:4d}    {cm[0][1]:4d}]]")
            print(f"Fake   [[{cm[1][0]:4d}    {cm[1][1]:4d}]]")
            
            # Calculate per-class metrics
            tn, fp, fn, tp = cm.ravel()
            
            # Class metrics
            real_precision = tn / (tn + fn) if (tn + fn) > 0 else 0
            real_recall = tn / (tn + fp) if (tn + fp) > 0 else 0
            fake_precision = tp / (tp + fp) if (tp + fp) > 0 else 0
            fake_recall = tp / (tp + fn) if (tp + fn) > 0 else 0
            
            print(f"\nPer-Class Metrics:")
            print(f"  Real: Precision={real_precision:.3f}, Recall={real_recall:.3f}")
            print(f"  Fake: Precision={fake_precision:.3f}, Recall={fake_recall:.3f}")
            
            # Overall metrics
            if metrics:
                print(f"\nOverall Metrics:")
                print(f"  Accuracy:  {metrics.get('eval_accuracy', 0):.4f}")
                print(f"  F1:        {metrics.get('eval_f1', 0):.4f}")
                print(f"  Precision: {metrics.get('eval_precision', 0):.4f}")
                print(f"  Recall:    {metrics.get('eval_recall', 0):.4f}")
                print(f"  Loss:      {metrics.get('eval_loss', 0):.4f}")
            
            print(f"\nBest so far: Epoch {self.best_epoch}, F1={self.best_f1:.4f}")
            print(f"{'='*80}\n")
            
        except Exception as e:
            print(f"Error in ConfusionMatrixCallback: {e}")
    
    def on_train_end(self, args, state, control, **kwargs):
        """Verify best model was loaded"""
        print(f"\n{'='*80}")
        print(f"TRAINING COMPLETE")
        print(f"{'='*80}")
        print(f"Best Epoch: {self.best_epoch}")
        print(f"Best F1: {self.best_f1:.4f}")
        
        if state.best_model_checkpoint:
            print(f"Best checkpoint: {state.best_model_checkpoint}")
            if Path(state.best_model_checkpoint).exists():
                print(f"Checkpoint verified on disk")
            else:
                print(f"Checkpoint NOT found")
        else:
            print(f"No best checkpoint saved")
        
        # Final verification
        if hasattr(self, 'trainer') and self.trainer is not None:
            print(f"\nVerifying loaded model")
            try:
                preds_output = self.trainer.predict(self.trainer.eval_dataset)
                preds = np.argmax(preds_output.predictions, axis=1)
                labels = preds_output.label_ids
                
                from sklearn.metrics import f1_score
                final_f1 = f1_score(labels, preds, average='binary')
                
                print(f"Final Model F1: {final_f1:.4f}")
                print(f"Expected Best F1: {self.best_f1:.4f}")
                
                if abs(final_f1 - self.best_f1) < 0.001:
                    print(f"Correct: Best model was loaded")
                else:
                    print(f"Error: Wrong model! Expected {self.best_f1:.4f}, got {final_f1:.4f}")
            except Exception as e:
                print(f"Could not verify: {e}")
        
        print(f"{'='*80}\n")
    
    def set_trainer(self, trainer):
        self.trainer = trainer
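The callback unpacks `cm.ravel()` into four values, which only works for a 2x2 matrix. Without an explicit `labels=[0, 1]`, scikit-learn's `confusion_matrix` infers the classes from the data and returns a 1x1 matrix when only one class appears in both the true and predicted labels. The NumPy-only sketch below (hypothetical helpers `binary_confusion_matrix` and `per_class_metrics`) always builds a 2x2 matrix and recomputes the per-class metrics the callback prints, using fold 1, epoch 1 from the logs, where every chunk was predicted real.

```python
import numpy as np


def binary_confusion_matrix(labels, preds):
    """Return [[tn, fp], [fn, tp]] for labels in {0, 1}; always shape (2, 2)."""
    labels = np.asarray(labels, dtype=int)
    preds = np.asarray(preds, dtype=int)
    # Encode each (true, predicted) pair as 2 * true + pred in {0, 1, 2, 3}
    return np.bincount(2 * labels + preds, minlength=4).reshape(2, 2)


def _safe_div(num, den):
    return num / den if den > 0 else 0.0


def per_class_metrics(cm):
    """Per-class precision and recall, with 0 = real and 1 = fake as in the callback."""
    tn, fp, fn, tp = cm.ravel()
    return {
        "real_precision": _safe_div(tn, tn + fn),
        "real_recall": _safe_div(tn, tn + fp),
        "fake_precision": _safe_div(tp, tp + fp),
        "fake_recall": _safe_div(tp, tp + fn),
    }


# Fold 1, epoch 1 from the logs: 36 real and 17 fake chunks, all predicted real
y_true = np.array([0] * 36 + [1] * 17)
y_pred = np.zeros_like(y_true)
example_cm = binary_confusion_matrix(y_true, y_pred)
print(example_cm)
print(per_class_metrics(example_cm))

# Edge case: a batch containing only real chunks still yields a 2x2 matrix
print(binary_confusion_matrix([0, 0, 0], [0, 0, 0]))
```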

# Training process
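The cross-validation below uses `KFold(shuffle=True)` on chunks. Since the training chunks come from "exploding" longer texts, chunks of the same source text may end up in both the training and validation side of a fold, which could inflate the cross-validation scores. A group-aware alternative is scikit-learn's `GroupKFold` (or `StratifiedGroupKFold` in recent scikit-learn versions, which also balances the labels). The sketch below assumes a hypothetical per-chunk `doc_ids` array identifying the source text; `train_exploded.csv` would need such a column for this to apply.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical example: 12 chunks produced by exploding 5 source documents.
# doc_ids is an assumed per-chunk column recording the source document.
doc_ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 4])
y_chunks = np.array([0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0])
X = np.zeros((len(doc_ids), 1))  # placeholder features; only the indices matter

gkf = GroupKFold(n_splits=4)
for fold_i, (train_idx, val_idx) in enumerate(gkf.split(X, y_chunks, groups=doc_ids)):
    train_docs = set(doc_ids[train_idx].tolist())
    val_docs = set(doc_ids[val_idx].tolist())
    # No document contributes chunks to both sides of the split
    assert train_docs.isdisjoint(val_docs)
    print(f"Fold {fold_i + 1}: val docs={sorted(val_docs)}, train size={len(train_idx)}")
```

In the notebook, the same idea would mean replacing `kf.split(train_dataset)` with a group-aware splitter that receives the per-chunk document ids through `groups=`.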

In [9]:
# Cross-validation settings
k = 4
kf = KFold(n_splits=k, shuffle=True, random_state=42)

all_fold_metrics = []

for fold, (train_idx, val_idx) in enumerate(kf.split(train_dataset)):
    print("-" * 80)
    print(f"\nFold {fold+1}/{k}")
    print("-" * 80)
    
    # Subsets for this fold
    train_fold = torch.utils.data.Subset(train_dataset, train_idx)
    val_fold = torch.utils.data.Subset(train_dataset, val_idx)
    
    # Load a fresh pre-trained model for every fold
    model_fold = AutoModelForSequenceClassification.from_pretrained(
        config.TOKENIZER_NAME,  # "distilbert-base-uncased", same checkpoint as the tokenizer
        num_labels=2
    )
    
    # Callback for the confusion matrix
    cm_callback_fold = ConfusionMatrixCallback()
    
    # Init trainer for this fold
    trainer_fold = WeightedTrainer(
        model=model_fold,
        args=training_args,
        train_dataset=train_fold,
        eval_dataset=val_fold,
        compute_metrics=compute_metrics,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3),
                   cm_callback_fold]
    )
    
    cm_callback_fold.set_trainer(trainer_fold)
    
    # Training
    trainer_fold.train()

    # Print fold state for clarity
    print("-" * 80)
    print("\nFold trainer state")
    print(f"Best checkpoint: {trainer_fold.state.best_model_checkpoint}")
    print(f"Best metric value ({training_args.metric_for_best_model}): {trainer_fold.state.best_metric}")
    print("-" * 80)

    # Save best model of this fold (will save the reloaded best model)
    fold_model_path = OUTPUT_DIR / f"fold_{fold+1}"
    trainer_fold.save_model(str(fold_model_path))
    
    # Evaluation (will use the reloaded best model weights)
    metrics = trainer_fold.evaluate()
    all_fold_metrics.append(metrics)
    print(f"Fold {fold+1} evaluation (Best model metrics)")
    print(metrics)

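    # Memory cleaning (empty the cache of whichever accelerator is in use)
    del model_fold, trainer_fold
    if device == "mps":
        torch.mps.empty_cache()
    elif device == "cuda":
        torch.cuda.empty_cache()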

# Average metrics over all folds
avg_metrics = {key: np.mean([m[key] for m in all_fold_metrics if key in m]) for key in all_fold_metrics[0]}
print("\nCross-validation summary")
print(avg_metrics)

# Final training on full dataset
print("\nFinal training on full dataset")

final_model = AutoModelForSequenceClassification.from_pretrained(
    config.TOKENIZER_NAME,  # "distilbert-base-uncased"
    num_labels=2
)

cm_callback_final = ConfusionMatrixCallback()
final_trainer = WeightedTrainer(
    model=final_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3),
               cm_callback_final]
)
cm_callback_final.set_trainer(final_trainer)

print("\nStarting final training on full training dataset")
final_trainer.train()

# The best model is automatically loaded into the trainer's model attribute
# because load_best_model_at_end=True was set in TrainingArguments.
# We save the model that is currently loaded in the trainer.

# Print final state for clarity
print("-" * 80)
print("\nFinal trainer state")
print(f"Best checkpoint: {final_trainer.state.best_model_checkpoint}")
print(f"Best metric value ({training_args.metric_for_best_model}): {final_trainer.state.best_metric}")
print("-" * 80)

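# Final evaluation (uses the reloaded best model weights). val_dataset also drove early
# stopping and best-checkpoint selection, so this score is optimistically biased
final_metrics = final_trainer.evaluate()
print(final_metrics)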

# Final save (will save the reloaded best model)
final_model_path = OUTPUT_DIR / "final_model"
final_trainer.save_model(str(final_model_path))
tokenizer.save_pretrained(str(final_model_path))
print(f"Final model and tokenizer saved to: {final_model_path}")
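`EarlyStoppingCallback(early_stopping_patience=3)` stops training once `eval_f1` has failed to improve for three consecutive evaluations (here one evaluation per epoch). The pure-Python replay below is a simplified model of that logic, assuming the callback's default threshold of 0 so that only a strict improvement resets the counter; `replay_early_stopping` is a hypothetical helper. Applied to fold 1's per-epoch F1 values from the logs that follow, it reproduces what the output shows: best epoch 8, training stopped after epoch 11.

```python
def replay_early_stopping(scores, patience):
    """Replay patience-based early stopping on per-epoch scores (higher is better).

    Returns (stop_epoch, best_epoch), both 1-indexed. The counter resets only
    on a strict improvement over the best score seen so far.
    """
    best_score, best_epoch, counter = None, 0, 0
    for epoch, score in enumerate(scores, start=1):
        if best_score is None or score > best_score:
            best_score, best_epoch, counter = score, epoch, 0
        else:
            counter += 1
            if counter >= patience:
                return epoch, best_epoch
    return len(scores), best_epoch


# eval_f1 per epoch for fold 1, as printed in the logs
fold1_f1 = [0.0, 0.0, 0.0, 0.625, 0.7907, 0.7907, 0.7907, 0.8095, 0.8095, 0.7692, 0.8]
print(replay_early_stopping(fold1_f1, patience=3))  # (11, 8)
```

The replay also shows how close fold 1 came to stopping early: the three zero-F1 epochs at the start raised the counter to 2, so one more epoch without improvement at epoch 4 would have ended training before the model predicted any fake chunk. With this few evaluations, a patience of 3 may be tight.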


--------------------------------------------------------------------------------

Fold 1/4
--------------------------------------------------------------------------------


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


{'eval_loss': 0.691260814666748, 'eval_accuracy': 0.6792452830188679, 'eval_f1': 0.0, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_runtime': 2.2064, 'eval_samples_per_second': 24.021, 'eval_steps_per_second': 0.453, 'epoch': 1.0}

[Epoch 1] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  36       0]]
Fake   [[  17       0]]

Per-Class Metrics:
  Real: Precision=0.679, Recall=1.000
  Fake: Precision=0.000, Recall=0.000

Overall Metrics:
  Accuracy:  0.6792
  F1:        0.0000
  Precision: 0.0000
  Recall:    0.0000
  Loss:      0.6913

Best So Far: Epoch 1, F1=0.0000



{'eval_loss': 0.6792891025543213, 'eval_accuracy': 0.6792452830188679, 'eval_f1': 0.0, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_runtime': 2.3006, 'eval_samples_per_second': 23.037, 'eval_steps_per_second': 0.435, 'epoch': 2.0}

[Epoch 2] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  36       0]]
Fake   [[  17       0]]

Per-Class Metrics:
  Real: Precision=0.679, Recall=1.000
  Fake: Precision=0.000, Recall=0.000

Overall Metrics:
  Accuracy:  0.6792
  F1:        0.0000
  Precision: 0.0000
  Recall:    0.0000
  Loss:      0.6793

Best So Far: Epoch 1, F1=0.0000



{'eval_loss': 0.6589536666870117, 'eval_accuracy': 0.6792452830188679, 'eval_f1': 0.0, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_runtime': 2.2885, 'eval_samples_per_second': 23.16, 'eval_steps_per_second': 0.437, 'epoch': 3.0}

[Epoch 3] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  36       0]]
Fake   [[  17       0]]

Per-Class Metrics:
  Real: Precision=0.679, Recall=1.000
  Fake: Precision=0.000, Recall=0.000

Overall Metrics:
  Accuracy:  0.6792
  F1:        0.0000
  Precision: 0.0000
  Recall:    0.0000
  Loss:      0.6590

Best So Far: Epoch 1, F1=0.0000



{'eval_loss': 0.623315155506134, 'eval_accuracy': 0.7735849056603774, 'eval_f1': 0.625, 'eval_precision': 0.6666666666666666, 'eval_recall': 0.5882352941176471, 'eval_runtime': 2.308, 'eval_samples_per_second': 22.964, 'eval_steps_per_second': 0.433, 'epoch': 4.0}

[Epoch 4] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  31       5]]
Fake   [[   7      10]]

Per-Class Metrics:
  Real: Precision=0.816, Recall=0.861
  Fake: Precision=0.667, Recall=0.588

Overall Metrics:
  Accuracy:  0.7736
  F1:        0.6250
  Precision: 0.6667
  Recall:    0.5882
  Loss:      0.6233

Best So Far: Epoch 4, F1=0.6250

{'loss': 0.6698, 'grad_norm': 2.531451940536499, 'learning_rate': 5e-06, 'epoch': 5.0}


{'eval_loss': 0.5558594465255737, 'eval_accuracy': 0.8301886792452831, 'eval_f1': 0.7906976744186046, 'eval_precision': 0.6538461538461539, 'eval_recall': 1.0, 'eval_runtime': 2.2727, 'eval_samples_per_second': 23.32, 'eval_steps_per_second': 0.44, 'epoch': 5.0}

[Epoch 5] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  27       9]]
Fake   [[   0      17]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.750
  Fake: Precision=0.654, Recall=1.000

Overall Metrics:
  Accuracy:  0.8302
  F1:        0.7907
  Precision: 0.6538
  Recall:    1.0000
  Loss:      0.5559

Best So Far: Epoch 5, F1=0.7907



{'eval_loss': 0.4649994373321533, 'eval_accuracy': 0.8301886792452831, 'eval_f1': 0.7906976744186046, 'eval_precision': 0.6538461538461539, 'eval_recall': 1.0, 'eval_runtime': 2.2785, 'eval_samples_per_second': 23.261, 'eval_steps_per_second': 0.439, 'epoch': 6.0}

[Epoch 6] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  27       9]]
Fake   [[   0      17]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.750
  Fake: Precision=0.654, Recall=1.000

Overall Metrics:
  Accuracy:  0.8302
  F1:        0.7907
  Precision: 0.6538
  Recall:    1.0000
  Loss:      0.4650

Best So Far: Epoch 5, F1=0.7907



{'eval_loss': 0.385510116815567, 'eval_accuracy': 0.8301886792452831, 'eval_f1': 0.7906976744186046, 'eval_precision': 0.6538461538461539, 'eval_recall': 1.0, 'eval_runtime': 2.3083, 'eval_samples_per_second': 22.961, 'eval_steps_per_second': 0.433, 'epoch': 7.0}

[Epoch 7] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  27       9]]
Fake   [[   0      17]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.750
  Fake: Precision=0.654, Recall=1.000

Overall Metrics:
  Accuracy:  0.8302
  F1:        0.7907
  Precision: 0.6538
  Recall:    1.0000
  Loss:      0.3855

Best So Far: Epoch 5, F1=0.7907



{'eval_loss': 0.33860349655151367, 'eval_accuracy': 0.8490566037735849, 'eval_f1': 0.8095238095238095, 'eval_precision': 0.68, 'eval_recall': 1.0, 'eval_runtime': 2.2848, 'eval_samples_per_second': 23.197, 'eval_steps_per_second': 0.438, 'epoch': 8.0}

[Epoch 8] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  28       8]]
Fake   [[   0      17]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.778
  Fake: Precision=0.680, Recall=1.000

Overall Metrics:
  Accuracy:  0.8491
  F1:        0.8095
  Precision: 0.6800
  Recall:    1.0000
  Loss:      0.3386

Best So Far: Epoch 8, F1=0.8095



{'eval_loss': 0.3244887888431549, 'eval_accuracy': 0.8490566037735849, 'eval_f1': 0.8095238095238095, 'eval_precision': 0.68, 'eval_recall': 1.0, 'eval_runtime': 2.2779, 'eval_samples_per_second': 23.267, 'eval_steps_per_second': 0.439, 'epoch': 9.0}

[Epoch 9] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  28       8]]
Fake   [[   0      17]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.778
  Fake: Precision=0.680, Recall=1.000

Overall Metrics:
  Accuracy:  0.8491
  F1:        0.8095
  Precision: 0.6800
  Recall:    1.0000
  Loss:      0.3245

Best So Far: Epoch 8, F1=0.8095

{'loss': 0.3832, 'grad_norm': 8.513765335083008, 'learning_rate': 1e-05, 'epoch': 10.0}


{'eval_loss': 0.32862576842308044, 'eval_accuracy': 0.8301886792452831, 'eval_f1': 0.7692307692307693, 'eval_precision': 0.6818181818181818, 'eval_recall': 0.8823529411764706, 'eval_runtime': 2.2591, 'eval_samples_per_second': 23.461, 'eval_steps_per_second': 0.443, 'epoch': 10.0}

[Epoch 10] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  29       7]]
Fake   [[   2      15]]

Per-Class Metrics:
  Real: Precision=0.935, Recall=0.806
  Fake: Precision=0.682, Recall=0.882

Overall Metrics:
  Accuracy:  0.8302
  F1:        0.7692
  Precision: 0.6818
  Recall:    0.8824
  Loss:      0.3286

Best So Far: Epoch 8, F1=0.8095



{'eval_loss': 0.31711074709892273, 'eval_accuracy': 0.8490566037735849, 'eval_f1': 0.8, 'eval_precision': 0.6956521739130435, 'eval_recall': 0.9411764705882353, 'eval_runtime': 2.3101, 'eval_samples_per_second': 22.942, 'eval_steps_per_second': 0.433, 'epoch': 11.0}

[Epoch 11] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  29       7]]
Fake   [[   1      16]]

Per-Class Metrics:
  Real: Precision=0.967, Recall=0.806
  Fake: Precision=0.696, Recall=0.941

Overall Metrics:
  Accuracy:  0.8491
  F1:        0.8000
  Precision: 0.6957
  Recall:    0.9412
  Loss:      0.3171

Best So Far: Epoch 8, F1=0.8095

{'train_runtime': 231.4845, 'train_samples_per_second': 13.737, 'train_steps_per_second': 0.864, 'train_loss': 0.49789215434681283, 'epoch': 11.0}

TRAINING COMPLETE
Best Epoch: 8
Best F1: 0.8095
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-80
Checkpoint verified 

Final Model F1: 0.8095
Expected Best F1: 0.8095
Correct: Best model was loaded

--------------------------------------------------------------------------------

Fold trainer state
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-80
Best metric value (eval_f1): 0.8095238095238095
--------------------------------------------------------------------------------



[Epoch 11] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  28       8]]
Fake   [[   0      17]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.778
  Fake: Precision=0.680, Recall=1.000

Overall Metrics:
  Accuracy:  0.8491
  F1:        0.8095
  Precision: 0.6800
  Recall:    1.0000
  Loss:      0.3386

Best So Far: Epoch 8, F1=0.8095

Fold 1 evaluation (Best model metrics)
{'eval_loss': 0.33860349655151367, 'eval_accuracy': 0.8490566037735849, 'eval_f1': 0.8095238095238095, 'eval_precision': 0.68, 'eval_recall': 1.0, 'eval_runtime': 2.3221, 'eval_samples_per_second': 22.825, 'eval_steps_per_second': 0.431, 'epoch': 11.0}
--------------------------------------------------------------------------------

Fold 2/4
--------------------------------------------------------------------------------


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


{'eval_loss': 0.7031173706054688, 'eval_accuracy': 0.5849056603773585, 'eval_f1': 0.08333333333333333, 'eval_precision': 0.25, 'eval_recall': 0.05, 'eval_runtime': 2.4022, 'eval_samples_per_second': 22.063, 'eval_steps_per_second': 0.416, 'epoch': 1.0}

[Epoch 1] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  30       3]]
Fake   [[  19       1]]

Per-Class Metrics:
  Real: Precision=0.612, Recall=0.909
  Fake: Precision=0.250, Recall=0.050

Overall Metrics:
  Accuracy:  0.5849
  F1:        0.0833
  Precision: 0.2500
  Recall:    0.0500
  Loss:      0.7031

Best So Far: Epoch 1, F1=0.0833



{'eval_loss': 0.6899948716163635, 'eval_accuracy': 0.660377358490566, 'eval_f1': 0.18181818181818182, 'eval_precision': 1.0, 'eval_recall': 0.1, 'eval_runtime': 2.3599, 'eval_samples_per_second': 22.459, 'eval_steps_per_second': 0.424, 'epoch': 2.0}

[Epoch 2] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  33       0]]
Fake   [[  18       2]]

Per-Class Metrics:
  Real: Precision=0.647, Recall=1.000
  Fake: Precision=1.000, Recall=0.100

Overall Metrics:
  Accuracy:  0.6604
  F1:        0.1818
  Precision: 1.0000
  Recall:    0.1000
  Loss:      0.6900

Best So Far: Epoch 2, F1=0.1818



{'eval_loss': 0.6691250205039978, 'eval_accuracy': 0.7547169811320755, 'eval_f1': 0.5806451612903226, 'eval_precision': 0.8181818181818182, 'eval_recall': 0.45, 'eval_runtime': 2.3205, 'eval_samples_per_second': 22.84, 'eval_steps_per_second': 0.431, 'epoch': 3.0}

[Epoch 3] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  31       2]]
Fake   [[  11       9]]

Per-Class Metrics:
  Real: Precision=0.738, Recall=0.939
  Fake: Precision=0.818, Recall=0.450

Overall Metrics:
  Accuracy:  0.7547
  F1:        0.5806
  Precision: 0.8182
  Recall:    0.4500
  Loss:      0.6691

Best So Far: Epoch 3, F1=0.5806



{'eval_loss': 0.6372031569480896, 'eval_accuracy': 0.7924528301886793, 'eval_f1': 0.7755102040816326, 'eval_precision': 0.6551724137931034, 'eval_recall': 0.95, 'eval_runtime': 2.339, 'eval_samples_per_second': 22.659, 'eval_steps_per_second': 0.428, 'epoch': 4.0}

[Epoch 4] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  23      10]]
Fake   [[   1      19]]

Per-Class Metrics:
  Real: Precision=0.958, Recall=0.697
  Fake: Precision=0.655, Recall=0.950

Overall Metrics:
  Accuracy:  0.7925
  F1:        0.7755
  Precision: 0.6552
  Recall:    0.9500
  Loss:      0.6372

Best So Far: Epoch 4, F1=0.7755

{'loss': 0.6676, 'grad_norm': 1.9820423126220703, 'learning_rate': 5e-06, 'epoch': 5.0}


{'eval_loss': 0.5768534541130066, 'eval_accuracy': 0.8301886792452831, 'eval_f1': 0.8085106382978723, 'eval_precision': 0.7037037037037037, 'eval_recall': 0.95, 'eval_runtime': 2.3038, 'eval_samples_per_second': 23.005, 'eval_steps_per_second': 0.434, 'epoch': 5.0}

[Epoch 5] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  25       8]]
Fake   [[   1      19]]

Per-Class Metrics:
  Real: Precision=0.962, Recall=0.758
  Fake: Precision=0.704, Recall=0.950

Overall Metrics:
  Accuracy:  0.8302
  F1:        0.8085
  Precision: 0.7037
  Recall:    0.9500
  Loss:      0.5769

Best So Far: Epoch 5, F1=0.8085



{'eval_loss': 0.4866400957107544, 'eval_accuracy': 0.8113207547169812, 'eval_f1': 0.7916666666666666, 'eval_precision': 0.6785714285714286, 'eval_recall': 0.95, 'eval_runtime': 2.3557, 'eval_samples_per_second': 22.499, 'eval_steps_per_second': 0.425, 'epoch': 6.0}

[Epoch 6] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  24       9]]
Fake   [[   1      19]]

Per-Class Metrics:
  Real: Precision=0.960, Recall=0.727
  Fake: Precision=0.679, Recall=0.950

Overall Metrics:
  Accuracy:  0.8113
  F1:        0.7917
  Precision: 0.6786
  Recall:    0.9500
  Loss:      0.4866

Best So Far: Epoch 5, F1=0.8085



{'eval_loss': 0.41160812973976135, 'eval_accuracy': 0.7924528301886793, 'eval_f1': 0.7755102040816326, 'eval_precision': 0.6551724137931034, 'eval_recall': 0.95, 'eval_runtime': 2.3446, 'eval_samples_per_second': 22.605, 'eval_steps_per_second': 0.427, 'epoch': 7.0}

[Epoch 7] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  23      10]]
Fake   [[   1      19]]

Per-Class Metrics:
  Real: Precision=0.958, Recall=0.697
  Fake: Precision=0.655, Recall=0.950

Overall Metrics:
  Accuracy:  0.7925
  F1:        0.7755
  Precision: 0.6552
  Recall:    0.9500
  Loss:      0.4116

Best So Far: Epoch 5, F1=0.8085




{'eval_loss': 0.3542517423629761, 'eval_accuracy': 0.8679245283018868, 'eval_f1': 0.8444444444444444, 'eval_precision': 0.76, 'eval_recall': 0.95, 'eval_runtime': 2.4084, 'eval_samples_per_second': 22.006, 'eval_steps_per_second': 0.415, 'epoch': 8.0}

[Epoch 8] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  27       6]]
Fake   [[   1      19]]

Per-Class Metrics:
  Real: Precision=0.964, Recall=0.818
  Fake: Precision=0.760, Recall=0.950

Overall Metrics:
  Accuracy:  0.8679
  F1:        0.8444
  Precision: 0.7600
  Recall:    0.9500
  Loss:      0.3543

Best So Far: Epoch 8, F1=0.8444




{'eval_loss': 0.3541967570781708, 'eval_accuracy': 0.8113207547169812, 'eval_f1': 0.7916666666666666, 'eval_precision': 0.6785714285714286, 'eval_recall': 0.95, 'eval_runtime': 2.2954, 'eval_samples_per_second': 23.09, 'eval_steps_per_second': 0.436, 'epoch': 9.0}

[Epoch 9] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  24       9]]
Fake   [[   1      19]]

Per-Class Metrics:
  Real: Precision=0.960, Recall=0.727
  Fake: Precision=0.679, Recall=0.950

Overall Metrics:
  Accuracy:  0.8113
  F1:        0.7917
  Precision: 0.6786
  Recall:    0.9500
  Loss:      0.3542

Best So Far: Epoch 8, F1=0.8444

{'loss': 0.3908, 'grad_norm': 8.959654808044434, 'learning_rate': 1e-05, 'epoch': 10.0}



{'eval_loss': 0.3684075176715851, 'eval_accuracy': 0.8113207547169812, 'eval_f1': 0.7619047619047619, 'eval_precision': 0.7272727272727273, 'eval_recall': 0.8, 'eval_runtime': 2.3407, 'eval_samples_per_second': 22.643, 'eval_steps_per_second': 0.427, 'epoch': 10.0}

[Epoch 10] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  27       6]]
Fake   [[   4      16]]

Per-Class Metrics:
  Real: Precision=0.871, Recall=0.818
  Fake: Precision=0.727, Recall=0.800

Overall Metrics:
  Accuracy:  0.8113
  F1:        0.7619
  Precision: 0.7273
  Recall:    0.8000
  Loss:      0.3684

Best So Far: Epoch 8, F1=0.8444




{'eval_loss': 0.4221496284008026, 'eval_accuracy': 0.8301886792452831, 'eval_f1': 0.7692307692307693, 'eval_precision': 0.7894736842105263, 'eval_recall': 0.75, 'eval_runtime': 2.3225, 'eval_samples_per_second': 22.82, 'eval_steps_per_second': 0.431, 'epoch': 11.0}

[Epoch 11] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  29       4]]
Fake   [[   5      15]]

Per-Class Metrics:
  Real: Precision=0.853, Recall=0.879
  Fake: Precision=0.789, Recall=0.750

Overall Metrics:
  Accuracy:  0.8302
  F1:        0.7692
  Precision: 0.7895
  Recall:    0.7500
  Loss:      0.4221

Best So Far: Epoch 8, F1=0.8444

{'train_runtime': 230.4294, 'train_samples_per_second': 13.8, 'train_steps_per_second': 0.868, 'train_loss': 0.5015023751692338, 'epoch': 11.0}

TRAINING COMPLETE
Best Epoch: 8
Best F1: 0.8444
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-80
Checkpoint verified on d


Final Model F1: 0.8444
Expected Best F1: 0.8444
Correct: Best model was loaded

--------------------------------------------------------------------------------

Fold trainer state
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-80
Best metric value (eval_f1): 0.8444444444444444
--------------------------------------------------------------------------------



Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



[Epoch 11] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  27       6]]
Fake   [[   1      19]]

Per-Class Metrics:
  Real: Precision=0.964, Recall=0.818
  Fake: Precision=0.760, Recall=0.950

Overall Metrics:
  Accuracy:  0.8679
  F1:        0.8444
  Precision: 0.7600
  Recall:    0.9500
  Loss:      0.3543

Best So Far: Epoch 8, F1=0.8444

Fold 2 evaluation (Best model metrics)
{'eval_loss': 0.3542517423629761, 'eval_accuracy': 0.8679245283018868, 'eval_f1': 0.8444444444444444, 'eval_precision': 0.76, 'eval_recall': 0.95, 'eval_runtime': 2.1316, 'eval_samples_per_second': 24.864, 'eval_steps_per_second': 0.469, 'epoch': 11.0}
--------------------------------------------------------------------------------

Fold 3/4
--------------------------------------------------------------------------------


  0%|          | 0/200 [00:00<?, ?it/s]


{'eval_loss': 0.703178346157074, 'eval_accuracy': 0.5660377358490566, 'eval_f1': 0.08, 'eval_precision': 0.25, 'eval_recall': 0.047619047619047616, 'eval_runtime': 2.3812, 'eval_samples_per_second': 22.258, 'eval_steps_per_second': 0.42, 'epoch': 1.0}

[Epoch 1] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  29       3]]
Fake   [[  20       1]]

Per-Class Metrics:
  Real: Precision=0.592, Recall=0.906
  Fake: Precision=0.250, Recall=0.048

Overall Metrics:
  Accuracy:  0.5660
  F1:        0.0800
  Precision: 0.2500
  Recall:    0.0476
  Loss:      0.7032

Best So Far: Epoch 1, F1=0.0800




{'eval_loss': 0.6904932856559753, 'eval_accuracy': 0.6226415094339622, 'eval_f1': 0.23076923076923078, 'eval_precision': 0.6, 'eval_recall': 0.14285714285714285, 'eval_runtime': 2.439, 'eval_samples_per_second': 21.73, 'eval_steps_per_second': 0.41, 'epoch': 2.0}

[Epoch 2] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  30       2]]
Fake   [[  18       3]]

Per-Class Metrics:
  Real: Precision=0.625, Recall=0.938
  Fake: Precision=0.600, Recall=0.143

Overall Metrics:
  Accuracy:  0.6226
  F1:        0.2308
  Precision: 0.6000
  Recall:    0.1429
  Loss:      0.6905

Best So Far: Epoch 2, F1=0.2308




{'eval_loss': 0.6716821193695068, 'eval_accuracy': 0.660377358490566, 'eval_f1': 0.35714285714285715, 'eval_precision': 0.7142857142857143, 'eval_recall': 0.23809523809523808, 'eval_runtime': 2.4294, 'eval_samples_per_second': 21.816, 'eval_steps_per_second': 0.412, 'epoch': 3.0}

[Epoch 3] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  30       2]]
Fake   [[  16       5]]

Per-Class Metrics:
  Real: Precision=0.652, Recall=0.938
  Fake: Precision=0.714, Recall=0.238

Overall Metrics:
  Accuracy:  0.6604
  F1:        0.3571
  Precision: 0.7143
  Recall:    0.2381
  Loss:      0.6717

Best So Far: Epoch 3, F1=0.3571




{'eval_loss': 0.6428462862968445, 'eval_accuracy': 0.8867924528301887, 'eval_f1': 0.8636363636363636, 'eval_precision': 0.8260869565217391, 'eval_recall': 0.9047619047619048, 'eval_runtime': 2.466, 'eval_samples_per_second': 21.492, 'eval_steps_per_second': 0.406, 'epoch': 4.0}

[Epoch 4] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  28       4]]
Fake   [[   2      19]]

Per-Class Metrics:
  Real: Precision=0.933, Recall=0.875
  Fake: Precision=0.826, Recall=0.905

Overall Metrics:
  Accuracy:  0.8868
  F1:        0.8636
  Precision: 0.8261
  Recall:    0.9048
  Loss:      0.6428

Best So Far: Epoch 4, F1=0.8636

{'loss': 0.671, 'grad_norm': 1.8017971515655518, 'learning_rate': 5e-06, 'epoch': 5.0}



{'eval_loss': 0.587121307849884, 'eval_accuracy': 0.8301886792452831, 'eval_f1': 0.8085106382978723, 'eval_precision': 0.7307692307692307, 'eval_recall': 0.9047619047619048, 'eval_runtime': 2.416, 'eval_samples_per_second': 21.938, 'eval_steps_per_second': 0.414, 'epoch': 5.0}

[Epoch 5] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  25       7]]
Fake   [[   2      19]]

Per-Class Metrics:
  Real: Precision=0.926, Recall=0.781
  Fake: Precision=0.731, Recall=0.905

Overall Metrics:
  Accuracy:  0.8302
  F1:        0.8085
  Precision: 0.7308
  Recall:    0.9048
  Loss:      0.5871

Best So Far: Epoch 4, F1=0.8636




{'eval_loss': 0.49164143204689026, 'eval_accuracy': 0.8301886792452831, 'eval_f1': 0.8085106382978723, 'eval_precision': 0.7307692307692307, 'eval_recall': 0.9047619047619048, 'eval_runtime': 2.4475, 'eval_samples_per_second': 21.655, 'eval_steps_per_second': 0.409, 'epoch': 6.0}

[Epoch 6] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  25       7]]
Fake   [[   2      19]]

Per-Class Metrics:
  Real: Precision=0.926, Recall=0.781
  Fake: Precision=0.731, Recall=0.905

Overall Metrics:
  Accuracy:  0.8302
  F1:        0.8085
  Precision: 0.7308
  Recall:    0.9048
  Loss:      0.4916

Best So Far: Epoch 4, F1=0.8636




{'eval_loss': 0.40291401743888855, 'eval_accuracy': 0.8490566037735849, 'eval_f1': 0.8333333333333334, 'eval_precision': 0.7407407407407407, 'eval_recall': 0.9523809523809523, 'eval_runtime': 2.4463, 'eval_samples_per_second': 21.666, 'eval_steps_per_second': 0.409, 'epoch': 7.0}

[Epoch 7] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  25       7]]
Fake   [[   1      20]]

Per-Class Metrics:
  Real: Precision=0.962, Recall=0.781
  Fake: Precision=0.741, Recall=0.952

Overall Metrics:
  Accuracy:  0.8491
  F1:        0.8333
  Precision: 0.7407
  Recall:    0.9524
  Loss:      0.4029

Best So Far: Epoch 4, F1=0.8636

{'train_runtime': 146.6356, 'train_samples_per_second': 21.686, 'train_steps_per_second': 1.364, 'train_loss': 0.6253912244524275, 'epoch': 7.0}

TRAINING COMPLETE
Best Epoch: 4
Best F1: 0.8636
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-40
Checkpoin


Final Model F1: 0.8636
Expected Best F1: 0.8636
Correct: Best model was loaded

--------------------------------------------------------------------------------

Fold trainer state
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-40
Best metric value (eval_f1): 0.8636363636363636
--------------------------------------------------------------------------------



Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



[Epoch 7] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  28       4]]
Fake   [[   2      19]]

Per-Class Metrics:
  Real: Precision=0.933, Recall=0.875
  Fake: Precision=0.826, Recall=0.905

Overall Metrics:
  Accuracy:  0.8868
  F1:        0.8636
  Precision: 0.8261
  Recall:    0.9048
  Loss:      0.6428

Best So Far: Epoch 4, F1=0.8636

Fold 3 evaluation (Best model metrics)
{'eval_loss': 0.6428462862968445, 'eval_accuracy': 0.8867924528301887, 'eval_f1': 0.8636363636363636, 'eval_precision': 0.8260869565217391, 'eval_recall': 0.9047619047619048, 'eval_runtime': 2.2857, 'eval_samples_per_second': 23.188, 'eval_steps_per_second': 0.438, 'epoch': 7.0}
--------------------------------------------------------------------------------

Fold 4/4
--------------------------------------------------------------------------------


  0%|          | 0/200 [00:00<?, ?it/s]


{'eval_loss': 0.6819256544113159, 'eval_accuracy': 0.6981132075471698, 'eval_f1': 0.0, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_runtime': 2.4227, 'eval_samples_per_second': 21.876, 'eval_steps_per_second': 0.413, 'epoch': 1.0}

[Epoch 1] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  37       0]]
Fake   [[  16       0]]

Per-Class Metrics:
  Real: Precision=0.698, Recall=1.000
  Fake: Precision=0.000, Recall=0.000

Overall Metrics:
  Accuracy:  0.6981
  F1:        0.0000
  Precision: 0.0000
  Recall:    0.0000
  Loss:      0.6819

Best So Far: Epoch 1, F1=0.0000




{'eval_loss': 0.6702266931533813, 'eval_accuracy': 0.7547169811320755, 'eval_f1': 0.3157894736842105, 'eval_precision': 1.0, 'eval_recall': 0.1875, 'eval_runtime': 2.4433, 'eval_samples_per_second': 21.692, 'eval_steps_per_second': 0.409, 'epoch': 2.0}

[Epoch 2] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  37       0]]
Fake   [[  13       3]]

Per-Class Metrics:
  Real: Precision=0.740, Recall=1.000
  Fake: Precision=1.000, Recall=0.188

Overall Metrics:
  Accuracy:  0.7547
  F1:        0.3158
  Precision: 1.0000
  Recall:    0.1875
  Loss:      0.6702

Best So Far: Epoch 2, F1=0.3158




{'eval_loss': 0.6498641967773438, 'eval_accuracy': 0.7358490566037735, 'eval_f1': 0.6111111111111112, 'eval_precision': 0.55, 'eval_recall': 0.6875, 'eval_runtime': 2.4342, 'eval_samples_per_second': 21.773, 'eval_steps_per_second': 0.411, 'epoch': 3.0}

[Epoch 3] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  28       9]]
Fake   [[   5      11]]

Per-Class Metrics:
  Real: Precision=0.848, Recall=0.757
  Fake: Precision=0.550, Recall=0.688

Overall Metrics:
  Accuracy:  0.7358
  F1:        0.6111
  Precision: 0.5500
  Recall:    0.6875
  Loss:      0.6499

Best So Far: Epoch 3, F1=0.6111




{'eval_loss': 0.6114247441291809, 'eval_accuracy': 0.7169811320754716, 'eval_f1': 0.6808510638297872, 'eval_precision': 0.5161290322580645, 'eval_recall': 1.0, 'eval_runtime': 2.4423, 'eval_samples_per_second': 21.701, 'eval_steps_per_second': 0.409, 'epoch': 4.0}

[Epoch 4] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  22      15]]
Fake   [[   0      16]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.595
  Fake: Precision=0.516, Recall=1.000

Overall Metrics:
  Accuracy:  0.7170
  F1:        0.6809
  Precision: 0.5161
  Recall:    1.0000
  Loss:      0.6114

Best So Far: Epoch 4, F1=0.6809

{'loss': 0.6524, 'grad_norm': 3.021094560623169, 'learning_rate': 5e-06, 'epoch': 5.0}



{'eval_loss': 0.5562828183174133, 'eval_accuracy': 0.7169811320754716, 'eval_f1': 0.6808510638297872, 'eval_precision': 0.5161290322580645, 'eval_recall': 1.0, 'eval_runtime': 2.4258, 'eval_samples_per_second': 21.849, 'eval_steps_per_second': 0.412, 'epoch': 5.0}

[Epoch 5] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  22      15]]
Fake   [[   0      16]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.595
  Fake: Precision=0.516, Recall=1.000

Overall Metrics:
  Accuracy:  0.7170
  F1:        0.6809
  Precision: 0.5161
  Recall:    1.0000
  Loss:      0.5563

Best So Far: Epoch 4, F1=0.6809




{'eval_loss': 0.4883077144622803, 'eval_accuracy': 0.7169811320754716, 'eval_f1': 0.6666666666666666, 'eval_precision': 0.5172413793103449, 'eval_recall': 0.9375, 'eval_runtime': 2.4112, 'eval_samples_per_second': 21.98, 'eval_steps_per_second': 0.415, 'epoch': 6.0}

[Epoch 6] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  23      14]]
Fake   [[   1      15]]

Per-Class Metrics:
  Real: Precision=0.958, Recall=0.622
  Fake: Precision=0.517, Recall=0.938

Overall Metrics:
  Accuracy:  0.7170
  F1:        0.6667
  Precision: 0.5172
  Recall:    0.9375
  Loss:      0.4883

Best So Far: Epoch 4, F1=0.6809




{'eval_loss': 0.46777552366256714, 'eval_accuracy': 0.7169811320754716, 'eval_f1': 0.6808510638297872, 'eval_precision': 0.5161290322580645, 'eval_recall': 1.0, 'eval_runtime': 2.4758, 'eval_samples_per_second': 21.407, 'eval_steps_per_second': 0.404, 'epoch': 7.0}

[Epoch 7] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  22      15]]
Fake   [[   0      16]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.595
  Fake: Precision=0.516, Recall=1.000

Overall Metrics:
  Accuracy:  0.7170
  F1:        0.6809
  Precision: 0.5161
  Recall:    1.0000
  Loss:      0.4678

Best So Far: Epoch 4, F1=0.6809

{'train_runtime': 146.1072, 'train_samples_per_second': 21.765, 'train_steps_per_second': 1.369, 'train_loss': 0.5984152930123465, 'epoch': 7.0}

TRAINING COMPLETE
Best Epoch: 4
Best F1: 0.6809
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-40
Checkpoint verified on d


Final Model F1: 0.6809
Expected Best F1: 0.6809
Correct: Best model was loaded

--------------------------------------------------------------------------------

Fold trainer state
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-40
Best metric value (eval_f1): 0.6808510638297872
--------------------------------------------------------------------------------



Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



[Epoch 7] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  22      15]]
Fake   [[   0      16]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.595
  Fake: Precision=0.516, Recall=1.000

Overall Metrics:
  Accuracy:  0.7170
  F1:        0.6809
  Precision: 0.5161
  Recall:    1.0000
  Loss:      0.6114

Best So Far: Epoch 4, F1=0.6809

Fold 4 evaluation (Best model metrics)
{'eval_loss': 0.6114247441291809, 'eval_accuracy': 0.7169811320754716, 'eval_f1': 0.6808510638297872, 'eval_precision': 0.5161290322580645, 'eval_recall': 1.0, 'eval_runtime': 2.2109, 'eval_samples_per_second': 23.972, 'eval_steps_per_second': 0.452, 'epoch': 7.0}

Cross-validation summary
{'eval_loss': 0.4867815673351288, 'eval_accuracy': 0.830188679245283, 'eval_f1': 0.7996139203586012, 'eval_precision': 0.6955539971949509, 'eval_recall': 0.9636904761904762, 'eval_runtime': 2.237575, 'eval_samples_per_second': 23.712249999999997, 'eval_steps_per_second': 0.447499

  0%|          | 0/280 [00:00<?, ?it/s]


{'eval_loss': 0.7050134539604187, 'eval_accuracy': 0.5365853658536586, 'eval_f1': 0.0, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_runtime': 2.5352, 'eval_samples_per_second': 16.172, 'eval_steps_per_second': 0.394, 'epoch': 1.0}

[Epoch 1] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  22       0]]
Fake   [[  19       0]]

Per-Class Metrics:
  Real: Precision=0.537, Recall=1.000
  Fake: Precision=0.000, Recall=0.000

Overall Metrics:
  Accuracy:  0.5366
  F1:        0.0000
  Precision: 0.0000
  Recall:    0.0000
  Loss:      0.7050

Best So Far: Epoch 1, F1=0.0000




{'eval_loss': 0.6804280281066895, 'eval_accuracy': 0.6585365853658537, 'eval_f1': 0.65, 'eval_precision': 0.6190476190476191, 'eval_recall': 0.6842105263157895, 'eval_runtime': 2.4568, 'eval_samples_per_second': 16.688, 'eval_steps_per_second': 0.407, 'epoch': 2.0}

[Epoch 2] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  14       8]]
Fake   [[   6      13]]

Per-Class Metrics:
  Real: Precision=0.700, Recall=0.636
  Fake: Precision=0.619, Recall=0.684

Overall Metrics:
  Accuracy:  0.6585
  F1:        0.6500
  Precision: 0.6190
  Recall:    0.6842
  Loss:      0.6804

Best So Far: Epoch 2, F1=0.6500




{'eval_loss': 0.6258090734481812, 'eval_accuracy': 0.6829268292682927, 'eval_f1': 0.7450980392156863, 'eval_precision': 0.59375, 'eval_recall': 1.0, 'eval_runtime': 4.9783, 'eval_samples_per_second': 8.236, 'eval_steps_per_second': 0.201, 'epoch': 3.0}

[Epoch 3] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[   9      13]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.409
  Fake: Precision=0.594, Recall=1.000

Overall Metrics:
  Accuracy:  0.6829
  F1:        0.7451
  Precision: 0.5938
  Recall:    1.0000
  Loss:      0.6258

Best So Far: Epoch 3, F1=0.7451

{'loss': 0.652, 'grad_norm': 2.1886379718780518, 'learning_rate': 5e-06, 'epoch': 3.57}



{'eval_loss': 0.5325445532798767, 'eval_accuracy': 0.7073170731707317, 'eval_f1': 0.76, 'eval_precision': 0.6129032258064516, 'eval_recall': 1.0, 'eval_runtime': 5.2054, 'eval_samples_per_second': 7.876, 'eval_steps_per_second': 0.192, 'epoch': 4.0}

[Epoch 4] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  10      12]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.455
  Fake: Precision=0.613, Recall=1.000

Overall Metrics:
  Accuracy:  0.7073
  F1:        0.7600
  Precision: 0.6129
  Recall:    1.0000
  Loss:      0.5325

Best So Far: Epoch 4, F1=0.7600




{'eval_loss': 0.43350228667259216, 'eval_accuracy': 0.7804878048780488, 'eval_f1': 0.8085106382978723, 'eval_precision': 0.6785714285714286, 'eval_recall': 1.0, 'eval_runtime': 4.3695, 'eval_samples_per_second': 9.383, 'eval_steps_per_second': 0.229, 'epoch': 5.0}

[Epoch 5] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  13       9]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.591
  Fake: Precision=0.679, Recall=1.000

Overall Metrics:
  Accuracy:  0.7805
  F1:        0.8085
  Precision: 0.6786
  Recall:    1.0000
  Loss:      0.4335

Best So Far: Epoch 5, F1=0.8085




{'eval_loss': 0.3556208312511444, 'eval_accuracy': 0.8536585365853658, 'eval_f1': 0.85, 'eval_precision': 0.8095238095238095, 'eval_recall': 0.8947368421052632, 'eval_runtime': 4.4398, 'eval_samples_per_second': 9.235, 'eval_steps_per_second': 0.225, 'epoch': 6.0}

[Epoch 6] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  18       4]]
Fake   [[   2      17]]

Per-Class Metrics:
  Real: Precision=0.900, Recall=0.818
  Fake: Precision=0.810, Recall=0.895

Overall Metrics:
  Accuracy:  0.8537
  F1:        0.8500
  Precision: 0.8095
  Recall:    0.8947
  Loss:      0.3556

Best So Far: Epoch 6, F1=0.8500




{'eval_loss': 0.32451996207237244, 'eval_accuracy': 0.8780487804878049, 'eval_f1': 0.8717948717948718, 'eval_precision': 0.85, 'eval_recall': 0.8947368421052632, 'eval_runtime': 4.0219, 'eval_samples_per_second': 10.194, 'eval_steps_per_second': 0.249, 'epoch': 7.0}

[Epoch 7] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  19       3]]
Fake   [[   2      17]]

Per-Class Metrics:
  Real: Precision=0.905, Recall=0.864
  Fake: Precision=0.850, Recall=0.895

Overall Metrics:
  Accuracy:  0.8780
  F1:        0.8718
  Precision: 0.8500
  Recall:    0.8947
  Loss:      0.3245

Best So Far: Epoch 7, F1=0.8718

{'loss': 0.3846, 'grad_norm': 3.981729030609131, 'learning_rate': 1e-05, 'epoch': 7.14}



{'eval_loss': 0.30064627528190613, 'eval_accuracy': 0.8536585365853658, 'eval_f1': 0.8636363636363636, 'eval_precision': 0.76, 'eval_recall': 1.0, 'eval_runtime': 4.7301, 'eval_samples_per_second': 8.668, 'eval_steps_per_second': 0.211, 'epoch': 8.0}

[Epoch 8] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  16       6]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.727
  Fake: Precision=0.760, Recall=1.000

Overall Metrics:
  Accuracy:  0.8537
  F1:        0.8636
  Precision: 0.7600
  Recall:    1.0000
  Loss:      0.3006

Best So Far: Epoch 7, F1=0.8718




{'eval_loss': 0.3250519037246704, 'eval_accuracy': 0.8780487804878049, 'eval_f1': 0.8717948717948718, 'eval_precision': 0.85, 'eval_recall': 0.8947368421052632, 'eval_runtime': 4.4562, 'eval_samples_per_second': 9.201, 'eval_steps_per_second': 0.224, 'epoch': 9.0}

[Epoch 9] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  19       3]]
Fake   [[   2      17]]

Per-Class Metrics:
  Real: Precision=0.905, Recall=0.864
  Fake: Precision=0.850, Recall=0.895

Overall Metrics:
  Accuracy:  0.8780
  F1:        0.8718
  Precision: 0.8500
  Recall:    0.8947
  Loss:      0.3251

Best So Far: Epoch 7, F1=0.8718




{'eval_loss': 0.5848348140716553, 'eval_accuracy': 0.8536585365853658, 'eval_f1': 0.8235294117647058, 'eval_precision': 0.9333333333333333, 'eval_recall': 0.7368421052631579, 'eval_runtime': 3.8268, 'eval_samples_per_second': 10.714, 'eval_steps_per_second': 0.261, 'epoch': 10.0}

[Epoch 10] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  21       1]]
Fake   [[   5      14]]

Per-Class Metrics:
  Real: Precision=0.808, Recall=0.955
  Fake: Precision=0.933, Recall=0.737

Overall Metrics:
  Accuracy:  0.8537
  F1:        0.8235
  Precision: 0.9333
  Recall:    0.7368
  Loss:      0.5848

Best So Far: Epoch 7, F1=0.8718

{'train_runtime': 319.056, 'train_samples_per_second': 13.289, 'train_steps_per_second': 0.878, 'train_loss': 0.44127853257315497, 'epoch': 10.0}

TRAINING COMPLETE
Best Epoch: 7
Best F1: 0.8718
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-98
Checkpo


Final Model F1: 0.8718
Expected Best F1: 0.8718
Correct: Best model was loaded

--------------------------------------------------------------------------------

Final trainer state
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-98
Best metric value (eval_f1): 0.8717948717948718
--------------------------------------------------------------------------------




[Epoch 10] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  19       3]]
Fake   [[   2      17]]

Per-Class Metrics:
  Real: Precision=0.905, Recall=0.864
  Fake: Precision=0.850, Recall=0.895

Overall Metrics:
  Accuracy:  0.8780
  F1:        0.8718
  Precision: 0.8500
  Recall:    0.8947
  Loss:      0.3245

Best So Far: Epoch 7, F1=0.8718

Final model and tokenizer saved to: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/final_model


## Observations
### 1. Cross-Validation Summary
The cross-validation results are fairly consistent across the first three folds (best F1 between 0.8095 and 0.8636), but fold 4 drops to 0.6809. This suggests that the model's performance is sensitive to the specific data split.

| Fold   | Best Epoch | Best eval_f1 | Best Checkpoint |
| ------ | ---------- | ------------ | --------------- |
| Fold 1 | 8          | 0.8095       | checkpoint-80   |
| Fold 2 | 8          | 0.8444       | checkpoint-80   |
| Fold 3 | 4          | 0.8636       | checkpoint-40   |
| Fold 4 | 4          | 0.6809       | checkpoint-40   |

<ins>NB:</ins> The average best F1-score across the four folds is approximately 0.7996, which matches `eval_f1` in the cross-validation summary.
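The averaged figure can be reproduced from the per-fold table. The sketch below is a minimal illustration using only the standard library. The sample standard deviation and the "mean without fold 4" are added here to quantify the split sensitivity noted above; the notebook itself does not report them.

```python
import statistics

# Best eval_f1 per fold, copied from the cross-validation table
fold_f1 = {"Fold 1": 0.8095, "Fold 2": 0.8444, "Fold 3": 0.8636, "Fold 4": 0.6809}

scores = list(fold_f1.values())
mean_f1 = statistics.mean(scores)
std_f1 = statistics.stdev(scores)   # sample standard deviation (n - 1 denominator)
spread = max(scores) - min(scores)  # best fold minus worst fold

# Mean over the three consistent folds, to show how much fold 4 pulls the average down
mean_without_fold4 = statistics.mean(v for k, v in fold_f1.items() if k != "Fold 4")

print(f"Mean F1:            {mean_f1:.4f}")
print(f"Std F1 (sample):    {std_f1:.4f}")
print(f"Spread (max - min): {spread:.4f}")
print(f"Mean F1 w/o fold 4: {mean_without_fold4:.4f}")
```

A standard deviation of roughly 0.08 on four folds is large relative to the gaps between the first three folds. Almost all of it comes from fold 4.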

### 2. Final Training on Full Dataset
The final training run on the full dataset reaches a best F1-score of 0.8718. This is in line with, and slightly above, the best-performing folds (folds 2 and 3).

| Epoch | `eval_f1` (Best metric) | `eval_accuracy` | `eval_loss`  | Best so far |
| ----- | --------------------- | ------------- | ---------- | ----------- |
| 1     | 0.0000                | 0.5366        | 0.7050     | Yes       |
| 2     | 0.6500                | 0.6585        | 0.6804     | Yes       |
| 3     | 0.7451                | 0.6829        | 0.6258     | Yes       |
| 4     | 0.7600                | 0.7073        | 0.5325     | Yes       |
| 5     | 0.8085                | 0.7805        | 0.4335     | Yes       |
| 6     | 0.8500                | 0.8537        | 0.3556     | Yes       |
| 7     | **0.8718**            | **0.8780**    | **0.3245** | Yes       |
| 8     | 0.8636                | 0.8537        | 0.3006     | No        |
| 9     | 0.8718                | 0.8780        | 0.3251     | No        |
| 10    | 0.8235                | 0.8537        | 0.5848     | No        |

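<ins>Best Performance:</ins> The model achieved its peak F1-score of 0.8718 at Epoch 7. Epoch 9 matched this score but did not reset the early-stopping counter, because an improvement must be strictly greater than the best score so far.
Early stopping therefore terminated training after the Epoch 10 evaluation, following three consecutive epochs without improvement (`early_stopping_patience=3`).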
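The stopping point follows from the patience rule. The sketch below replays the final run's per-epoch F1-scores through a simplified patience counter. It is modelled on the behaviour of `transformers`' `EarlyStoppingCallback` with `greater_is_better=True`: a score counts as an improvement only if it beats the best so far by more than `early_stopping_threshold`, which defaults to 0. Note that `simulate_early_stopping` is a hypothetical helper, not the library's code.

```python
def simulate_early_stopping(scores, patience=3, threshold=0.0):
    """Replay per-epoch eval scores through a patience counter.

    A score is an improvement only if it exceeds the best score so far by
    more than `threshold` (ties do not count). Training stops once
    `patience` consecutive epochs fail to improve.
    Returns (best_epoch, best_score, stop_epoch), with 1-indexed epochs.
    """
    best_score, best_epoch = None, None
    epochs_without_improvement = 0
    for epoch, score in enumerate(scores, start=1):
        if best_score is None or score - best_score > threshold:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, best_score, epoch
    # Ran out of epochs before patience was exhausted
    return best_epoch, best_score, len(scores)


# eval_f1 per epoch for the final run, copied from the table above
final_run_f1 = [0.0000, 0.6500, 0.7451, 0.7600, 0.8085,
                0.8500, 0.8718, 0.8636, 0.8718, 0.8235]
best_epoch, best_f1, stop_epoch = simulate_early_stopping(final_run_f1, patience=3)
print(best_epoch, best_f1, stop_epoch)  # 7 0.8718 10
```

The same rule reproduces the CV folds. For example, fold 4 peaks at Epoch 4, ties it at Epochs 5 and 7, and stops after Epoch 7.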

<ins>Final saved model:</ins> The model saved to `final_model` is confirmed to be the one from Epoch 7 (**checkpoint-98**), with an F1-score of 0.8718.
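Hugging Face `Trainer` names checkpoint folders after the global optimizer step (`checkpoint-<global_step>`). If a checkpoint is saved at the end of each epoch, the step count divided by the epoch gives the number of steps per epoch. This also explains the fractional epochs in the training-loss logs (`'epoch': 3.57` and `7.14`). The sketch below works through that arithmetic. The helper names are hypothetical, and the 50-step logging interval is inferred from those logs rather than read from the notebook's `TrainingArguments`.

```python
def steps_per_epoch_from_checkpoint(checkpoint_step: int, epoch: int) -> int:
    """Infer optimizer steps per epoch from a checkpoint saved at the end of `epoch`."""
    if checkpoint_step % epoch != 0:
        raise ValueError("checkpoint step is not a whole multiple of the epoch")
    return checkpoint_step // epoch


def checkpoint_epoch(checkpoint_name: str, steps_per_epoch: int) -> float:
    """Map a 'checkpoint-<global_step>' folder name to its (possibly fractional) epoch."""
    step = int(checkpoint_name.rsplit("-", 1)[1])
    return step / steps_per_epoch


# Final run: checkpoint-98 was saved after Epoch 7
final_steps = steps_per_epoch_from_checkpoint(98, 7)
print(final_steps)                                     # 14
print(checkpoint_epoch("checkpoint-98", final_steps))  # 7.0

# An assumed 50-step logging interval reproduces the logged epochs 3.57 and 7.14
print([round(step / final_steps, 2) for step in (50, 100)])  # [3.57, 7.14]

# CV folds: checkpoint-40 after Epoch 4 gives 10 steps per epoch
fold_steps = steps_per_epoch_from_checkpoint(40, 4)
print(checkpoint_epoch("checkpoint-80", fold_steps))   # 8.0
```

With 10 steps per epoch in the folds, the same 50-step interval lands exactly on epochs 5.0 and 10.0, which is what the fold logs show.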

### 3. Class-Specific Performance (Best Model - Epoch 7)
The confusion matrix for the best model (Epoch 7), with Fake as the positive class, shows balanced performance across both classes:

|                 | **Predicted Real** | **Predicted Fake** | **Per-class recall**              |
| --------------- | ------------------ | ------------------ | --------------------------------- |
| **Actual Real** | 19 (True Negative) | 3 (False Positive) | 19 / 22 = **86.4%** (specificity) |
| **Actual Fake** | 2 (False Negative) | 17 (True Positive) | 17 / 19 = **89.5%** (TPR)         |

The model correctly identifies 17 out of 19 actual Fake texts (recall: 89.5%).

When the model predicts that a text is Real, it is correct 19 out of 21 times (precision: 90.5%). When it predicts Fake, it is correct 17 out of 20 times (precision: 85.0%).
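The per-class figures, and the reported `eval_f1`, follow directly from the confusion matrix. The notebook itself relies on `sklearn.metrics`, but the sketch below uses plain Python to make the arithmetic explicit. Here `per_class_metrics` is a hypothetical helper that assumes rows are actual classes and columns are predicted classes, as in the table above. The Fake-class F1 works out to 2·17 / (2·17 + 3 + 2) = 34/39 ≈ 0.8718. This equals the reported `eval_f1`, which indicates that the notebook scores F1 with Fake as the positive class.

```python
def per_class_metrics(cm, labels):
    """Per-class precision, recall and F1 from a square confusion matrix.

    cm[i][j] counts samples whose actual class is labels[i] and whose
    predicted class is labels[j] (rows = actual, columns = predicted).
    """
    n = len(labels)
    results = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: all predicted as k
        actual_k = sum(cm[k])                          # row sum: all actually k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


# Best model (Epoch 7) confusion matrix, copied from the table above
cm_epoch7 = [[19, 3],
             [2, 17]]
for label, m in per_class_metrics(cm_epoch7, ["Real", "Fake"]).items():
    print(f"{label}: precision={m['precision']:.3f}, recall={m['recall']:.3f}, f1={m['f1']:.4f}")
```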

## Conclusion
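The training run was successful. The final DistilBERT model reaches an F1-score of 0.8718 (accuracy 0.8780) on the validation set, above the cross-validation mean of 0.7996. The weaker fold 4 (F1 0.6809) is a reminder that this score depends partly on the data split.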

# End of model training notebook