# ESA Project: Fake or Real: The Impostor Hunt in Texts

This notebook is dedicated to **model training**.  
It covers:

- Loading the pre-tokenized PyTorch datasets (`tokenized_train.pt` and `tokenized_val.pt`) saved in the data creation step.
- Instantiating the pre-trained **DistilBertForSequenceClassification** model, configured for binary classification (real or fake).
- Defining the **TrainingArguments** for the Hugging Face Trainer, including hyperparameters such as batch size, number of epochs, warmup, and logging settings.
- Implementing a function that computes and tracks key performance metrics on the validation set: **accuracy, precision, recall, and F1-score**.
- Running the training loop with the Hugging Face **Trainer** class, which manages the entire process, including checkpointing and model saving.
- Saving the best-performing model and its tokenizer to disk for later use in inference and deployment.

# Import libraries

In [1]:
import pandas as pd
import numpy as np
from pathlib import Path
import matplotlib.pyplot as plt
import seaborn as sns
import re

import torch
from torch.nn import CrossEntropyLoss
from transformers import (
    DistilBertForSequenceClassification, 
    Trainer, 
    TrainingArguments, 
    AutoTokenizer,
    TrainerCallback, 
    TrainerState, 
    TrainerControl,
    EarlyStoppingCallback
)
from transformers import AutoConfig, AutoModelForSequenceClassification

from sklearn.model_selection import GroupKFold
from sklearn.metrics import (
    accuracy_score, 
    precision_recall_fscore_support, 
    confusion_matrix
)

import os
import sys

# Add the src folder to Python path
sys.path.append(os.path.abspath(os.path.join('..', 'src')))
import config
from preprocessing import TextPreprocessor, get_text_statistics

import warnings
warnings.filterwarnings("ignore")

sns.set_theme()

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/photoli93/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


# Create Custom Dataset Class

In [2]:
class TextPairDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx], dtype=torch.long)
        return item

    def __len__(self):
        return len(self.labels)

# Load tokenized datasets

In [3]:
TOKENIZED_TRAIN_PATH = config.PROCESSED_DATA_DIR / "tokenized_train.pt"
TOKENIZED_VAL_PATH = config.PROCESSED_DATA_DIR / "tokenized_val.pt"

OUTPUT_DIR = config.OUTPUT_DIR / "distilbert_fake_or_real"
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Load tokenized datasets
print(f"Loading tokenized training data from: {TOKENIZED_TRAIN_PATH}")
try:
    train_dataset = torch.load(TOKENIZED_TRAIN_PATH, weights_only=False)
    val_dataset = torch.load(TOKENIZED_VAL_PATH, weights_only=False)
except FileNotFoundError:
    print("Error: Tokenized data not found. Please ensure 03_dataset_creation.py (or your data creation notebook) was run successfully")
    sys.exit(1)

print(f"Training dataset size: {len(train_dataset)}")
print(f"Validation dataset size: {len(val_dataset)}")

Loading tokenized training data from: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/data/processed/tokenized_train.pt
Training dataset size: 212
Validation dataset size: 41


# Load model and tokenizer

In [4]:
# Load the model for sequence classification with 2 labels (Real/Fake)
print(f"Loading model: {config.TOKENIZER_NAME} for Sequence Classification")
model = DistilBertForSequenceClassification.from_pretrained(
    config.TOKENIZER_NAME, 
    num_labels=2
)

# Might need the tokenizer later for prediction/evaluation
tokenizer = AutoTokenizer.from_pretrained(config.TOKENIZER_NAME)

Loading model: distilbert-base-uncased for Sequence Classification


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


# Define parameters and metrics

In [5]:
# Detect device
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")

# Training configuration
training_args = TrainingArguments(
    output_dir=str(OUTPUT_DIR),          
    num_train_epochs=20,                 # 20 epochs but with early stopping to avoid overfitting
    per_device_train_batch_size=16,      
    per_device_eval_batch_size=64,       
    warmup_steps=500,                    # Number of steps over which the learning rate ramps up from 0 to its peak
    weight_decay=0.01,                   # Regularization term to reduce overfitting by penalizing large weights
    logging_dir='./logs',                
    logging_steps=50,                    # Log metrics every 50 training steps
    evaluation_strategy="epoch",         # Evaluate at the end of each epoch
    save_strategy="epoch",               # Save model checkpoint at the end of each epoch
    save_total_limit=2,                  # Keep at most 2 checkpoints on disk; the best one is retained because load_best_model_at_end=True
    load_best_model_at_end=True,         # Load the best model found during training
    metric_for_best_model="eval_f1",     # Metric to monitor for best model
    greater_is_better=True,              # Looking to maximize the f1 metric
    fp16=False,                          # Disable for Mac, Transformers will automatically use MPS
    report_to="none"                     # Disables wandb, tensorboard, etc. (disabling sending logs online)
)

# Define Evaluation Metric
try:
    from sklearn.metrics import accuracy_score, precision_recall_fscore_support
    
    # p is an object passed by the Hugging Face Trainer. It contains "predictions" and "label_ids"
    def compute_metrics(p):
        preds = np.argmax(p.predictions, axis=1)
        precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='binary')
        acc = accuracy_score(p.label_ids, preds)
        return {
            'accuracy': acc,
            'f1': f1,
            'precision': precision,
            'recall': recall
        }
except ImportError:
    print("Warning: scikit-learn not found. Install with 'pip install scikit-learn' to use advanced metrics")
    def compute_metrics(p):
        return {}

Using device: mps
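
With `warmup_steps=500` and only 200 optimizer steps per run (the progress bar in the training output shows `0/200` for 20 epochs), training ends before warmup does, so the learning rate never reaches its peak. The sketch below illustrates this. It assumes the `TrainingArguments` default `learning_rate=5e-5` (no learning rate is set in the cell above) and a linear ramp from 0; `warmup_lr` is a hypothetical helper written for this note, not a Transformers function.

```python
def warmup_lr(step, base_lr=5e-5, warmup_steps=500):
    """Learning rate during linear warmup (hypothetical helper, not a Transformers API).

    Assumes the TrainingArguments default learning_rate=5e-5 and a linear ramp
    from 0 to base_lr. The decay phase after warmup is not modeled.
    """
    return base_lr * min(step, warmup_steps) / warmup_steps


# Steps 50 and 100 are where logging_steps=50 writes a training log entry;
# step 200 is the last step of a full 20-epoch run.
for step in (50, 100, 200):
    print(f"step {step:3d}: lr = {warmup_lr(step):.1e}")
```

Under these assumptions the values at steps 50 and 100 are 5e-06 and 1e-05, which agree with the `learning_rate` entries logged at epochs 5 and 10 in the training output. A smaller `warmup_steps` (or `warmup_ratio`) would let the model train at the full learning rate for part of the run.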


# Custom Loss function

Exploding the texts into chunks introduced class imbalance in the training set (138 samples for class 0 versus 74 for class 1), so a custom loss function is defined to weight the classes.

In [6]:
train_df = pd.read_csv(config.PROCESSED_DATA_DIR / "train_exploded.csv")

# Count of each class
counts = train_df['label'].value_counts().sort_index()
print(counts)

# Dynamic class weights (inverse frequency)
weights = 1.0 / counts
weights = weights / weights.sum()  # Normalize
weights = torch.tensor(weights.values, dtype=torch.float)
print(weights)

# Override the default loss
def compute_loss(model, inputs, return_outputs=False):
    labels = inputs.pop("labels")
    outputs = model(**inputs)
    logits = outputs.logits
    loss_fct = CrossEntropyLoss(weight=weights.to(logits.device))
    loss = loss_fct(logits, labels)
    return (loss, outputs) if return_outputs else loss

label
0    138
1     74
Name: count, dtype: int64
tensor([0.3491, 0.6509])
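
The normalized inverse-frequency weights can be checked by hand. The short sketch below reproduces the tensor printed above using only the two class counts, and compares the resulting class ratio with `138/74`, the ratio hard-coded in `WeightedTrainer`. It uses plain Python rather than pandas or PyTorch, purely for illustration.

```python
# Class counts printed by the cell above
counts = {0: 138, 1: 74}

# Inverse frequency, then normalize so the weights sum to 1
inverse = {label: 1.0 / n for label, n in counts.items()}
total = sum(inverse.values())
normalized = {label: w / total for label, w in inverse.items()}

print({label: round(w, 4) for label, w in normalized.items()})

# Ratio between the two class weights, next to the hard-coded 138/74
print(round(normalized[1] / normalized[0], 4), round(138 / 74, 4))
```

After normalization each weight equals the other class's share of the data (74/212 and 138/212), so the minority class is weighted about 1.86 times more than the majority class in both formulations.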


# Custom subclass Trainer

The Hugging Face **Trainer** constructor has no `compute_loss` argument, so I created a subclass, **WeightedTrainer**, that overrides the `compute_loss` method. Note that the subclass hard-codes its class weights instead of reusing the `weights` tensor computed above.

In [7]:
class WeightedTrainer(Trainer):
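    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # **kwargs absorbs extra arguments (e.g. num_items_in_batch) passed by newer Transformers versions
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        
        # Class weights: 1.0 for class 0 and 138/74 for class 1 (ratio of the class counts)
        weights = torch.tensor([1.0, 138/74]).to(logits.device)
        loss_fct = CrossEntropyLoss(weight=weights)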
        
        loss = loss_fct(logits, labels)
        return (loss, outputs) if return_outputs else loss
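
Only the ratio between the class weights matters here, because `CrossEntropyLoss` with its default `reduction="mean"` divides the weighted sum of per-sample losses by the sum of the weights of the targets. The plain-Python sketch below re-implements that weighted mean to illustrate the point. It is not the library implementation, and the logits and labels are made up.

```python
import math


def weighted_cross_entropy(logits, labels, weights):
    """Weighted cross-entropy averaged by the total weight of the targets.

    Plain-Python sketch of the weighted mean that torch.nn.CrossEntropyLoss
    applies with reduction="mean"; written for illustration only.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for row, label in zip(logits, labels):
        # Numerically stable negative log-softmax of the true class
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        nll = log_norm - row[label]
        weighted_sum += weights[label] * nll
        weight_total += weights[label]
    return weighted_sum / weight_total


# Made-up logits and labels for three samples
logits = [[2.0, 0.5], [0.2, 1.5], [1.0, 1.0]]
labels = [0, 1, 1]

loss_ratio = weighted_cross_entropy(logits, labels, [1.0, 138 / 74])
loss_normalized = weighted_cross_entropy(logits, labels, [74 / 212, 138 / 212])
loss_unweighted = weighted_cross_entropy(logits, labels, [1.0, 1.0])
print(loss_ratio, loss_normalized, loss_unweighted)
```

The first two losses coincide, since `[1.0, 138/74]` and the normalized `[74/212, 138/212]` differ only by a constant factor, while the unweighted loss differs from both.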

# Custom Confusion matrix callback

In [8]:
class ConfusionMatrixCallback(TrainerCallback):
    def __init__(self):
        self.trainer = None
        self.best_epoch = 0
        self.best_f1 = -float('inf')
    
    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        """Called after evaluation"""
        if not hasattr(self, 'trainer') or self.trainer is None:
            print("Trainer not set in ConfusionMatrixCallback")
            return
        
        try:
            # Get predictions
            preds_output = self.trainer.predict(self.trainer.eval_dataset)
            preds = np.argmax(preds_output.predictions, axis=1)
            labels = preds_output.label_ids
            
            # Compute confusion matrix
            cm = confusion_matrix(labels, preds)
            
            # Get current metrics
            current_epoch = int(state.epoch) if state.epoch else 0
            current_f1 = metrics.get('eval_f1', 0) if metrics else 0
            
            # Track best
            is_best = False
            if current_f1 > self.best_f1:
                self.best_f1 = current_f1
                self.best_epoch = current_epoch
                is_best = True
            
            # Print header
            print(f"\n{'='*80}")
            if is_best:
                print(f"[Epoch {current_epoch}] Confusion Matrix (New best)")
            else:
                print(f"[Epoch {current_epoch}] Confusion Matrix")
            print(f"{'='*80}")
            
            # Print confusion matrix with labels
            print("Confusion Matrix:")
            print("              Predicted")
            print("           Real    Fake")
            print(f"Real   [[{cm[0][0]:4d}    {cm[0][1]:4d}]]")
            print(f"Fake   [[{cm[1][0]:4d}    {cm[1][1]:4d}]]")
            
            # Calculate per-class metrics
            tn, fp, fn, tp = cm.ravel()
            
            # Class metrics
            real_precision = tn / (tn + fn) if (tn + fn) > 0 else 0
            real_recall = tn / (tn + fp) if (tn + fp) > 0 else 0
            fake_precision = tp / (tp + fp) if (tp + fp) > 0 else 0
            fake_recall = tp / (tp + fn) if (tp + fn) > 0 else 0
            
            print(f"\nPer-Class Metrics:")
            print(f"  Real: Precision={real_precision:.3f}, Recall={real_recall:.3f}")
            print(f"  Fake: Precision={fake_precision:.3f}, Recall={fake_recall:.3f}")
            
            # Overall metrics
            if metrics:
                print(f"\nOverall Metrics:")
                print(f"  Accuracy:  {metrics.get('eval_accuracy', 0):.4f}")
                print(f"  F1:        {metrics.get('eval_f1', 0):.4f}")
                print(f"  Precision: {metrics.get('eval_precision', 0):.4f}")
                print(f"  Recall:    {metrics.get('eval_recall', 0):.4f}")
                print(f"  Loss:      {metrics.get('eval_loss', 0):.4f}")
            
            print(f"\nBest so far: Epoch {self.best_epoch}, F1={self.best_f1:.4f}")
            print(f"{'='*80}\n")
            
        except Exception as e:
            print(f"Error in ConfusionMatrixCallback: {e}")
    
    def on_train_end(self, args, state, control, **kwargs):
        """Verify best model was loaded"""
        print(f"\n{'='*80}")
        print(f"TRAINING COMPLETE")
        print(f"{'='*80}")
        print(f"Best Epoch: {self.best_epoch}")
        print(f"Best F1: {self.best_f1:.4f}")
        
        if state.best_model_checkpoint:
            print(f"Best checkpoint: {state.best_model_checkpoint}")
            from pathlib import Path
            if Path(state.best_model_checkpoint).exists():
                print(f"Checkpoint verified on disk")
            else:
                print(f"Checkpoint NOT found")
        else:
            print(f"No best checkpoint saved")
        
        # Final verification
        if hasattr(self, 'trainer') and self.trainer is not None:
            print(f"\nVerifying loaded model")
            try:
                preds_output = self.trainer.predict(self.trainer.eval_dataset)
                preds = np.argmax(preds_output.predictions, axis=1)
                labels = preds_output.label_ids
                
                from sklearn.metrics import f1_score
                final_f1 = f1_score(labels, preds, average='binary')
                
                print(f"Final Model F1: {final_f1:.4f}")
                print(f"Expected Best F1: {self.best_f1:.4f}")
                
                if abs(final_f1 - self.best_f1) < 0.001:
                    print(f"Correct: Best model was loaded")
                else:
                    print(f"Error: Wrong model! Expected {self.best_f1:.4f}, got {final_f1:.4f}")
            except Exception as e:
                print(f"Could not verify: {e}")
        
        print(f"{'='*80}\n")
    
    def set_trainer(self, trainer):
        self.trainer = trainer
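
The per-class figures printed by the callback follow directly from the four cells of the confusion matrix. As a worked example, the sketch below takes the Fold 1, epoch 10 matrix from the training output and recomputes the metrics by hand. "Fake" (label 1) is the positive class, which is also what `average='binary'` uses in `compute_metrics`.

```python
# Fold 1, epoch 10 confusion matrix from the training output
# (rows = true class, columns = predicted class; order is Real, Fake)
cm = [[30, 5],
      [1, 17]]
(tn, fp), (fn, tp) = cm

fake_precision = tp / (tp + fp)
fake_recall = tp / (tp + fn)
fake_f1 = 2 * fake_precision * fake_recall / (fake_precision + fake_recall)

real_precision = tn / (tn + fn)
real_recall = tn / (tn + fp)

accuracy = (tn + tp) / (tn + fp + fn + tp)

print(f"Fake: precision={fake_precision:.4f}, recall={fake_recall:.4f}, F1={fake_f1:.4f}")
print(f"Real: precision={real_precision:.4f}, recall={real_recall:.4f}")
print(f"Accuracy={accuracy:.4f}")
```

These reproduce the logged values for that epoch: precision 0.7727, recall 0.9444, F1 0.8500, and accuracy 0.8868.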

# Training process
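
The cross-validation below uses `GroupKFold` so that all chunks of one original text pair land on the same side of each split, which avoids leaking near-duplicate text from training into validation. The toy sketch here shows that property on made-up group IDs; the array values are illustrative and not taken from the project data.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data: 8 chunks that came from 4 original text pairs (group IDs are made up)
groups = np.array([0, 0, 1, 1, 2, 2, 3, 3])
indices = np.arange(len(groups))

gkf = GroupKFold(n_splits=4)
for fold, (train_idx, val_idx) in enumerate(gkf.split(indices, groups=groups), start=1):
    train_groups = set(groups[train_idx].tolist())
    val_groups = set(groups[val_idx].tolist())
    # No group may appear on both sides of the split
    assert train_groups.isdisjoint(val_groups)
    print(f"Fold {fold}: validation groups = {sorted(val_groups)}")
```

In the notebook, the same guarantee only holds if the rows of `train_exploded.csv` are in the same order as the samples of `train_dataset`, because the group IDs are matched to dataset samples by position.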

In [9]:
# Cross-validation settings
k = 4
# Load the raw training data to extract the original_index for GroupKFold
raw_train_df = pd.read_csv(config.PROCESSED_DATA_DIR / "train_exploded.csv")

# Ensure 'original_index' exists and use it as the grouping variable
if 'original_index' not in raw_train_df.columns:
    print("Creating 'original_index' for GroupKFold")
    raw_train_df['original_index'] = raw_train_df.groupby(['file1_text', 'file2_text']).ngroup()

# Extract the groups array from the raw data
# This array contains the original text pair ID for every chunk
groups = raw_train_df['original_index'].values

# Use GroupKFold to ensure all chunks from the same original text pair are in the same fold
gkf = GroupKFold(n_splits=k)

all_fold_metrics = []

# The GroupKFold split is performed on the indices of the full dataset
# The 'groups' argument tells GroupKFold how to split the data
for fold, (train_idx, val_idx) in enumerate(gkf.split(np.arange(len(train_dataset)), groups=groups)):
    print("-" * 80)
    print(f"\nFold {fold+1}/{k}")
    print("-" * 80)
    
    # Subsets for this fold
    train_fold = torch.utils.data.Subset(train_dataset, train_idx)
    val_fold = torch.utils.data.Subset(train_dataset, val_idx)
    
    # Load pre-trained model
    model_fold = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased",
        num_labels=2
    )
    
    # Callback for the confusion matrix
    cm_callback_fold = ConfusionMatrixCallback()
    
    # Init trainer for this fold
    trainer_fold = WeightedTrainer(
        model=model_fold,
        args=training_args,
        train_dataset=train_fold,
        eval_dataset=val_fold,
        compute_metrics=compute_metrics,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3),
                   cm_callback_fold]
    )
    
    cm_callback_fold.set_trainer(trainer_fold)
    
    # Training
    trainer_fold.train()

    # Print fold state for clarity
    print("-" * 80)
    print("\nFold trainer state")
    print(f"Best checkpoint: {trainer_fold.state.best_model_checkpoint}")
    print(f"Best metric value ({training_args.metric_for_best_model}): {trainer_fold.state.best_metric}")
    print("-" * 80)

    # Save best model of this fold (will save the reloaded best model)
    fold_model_path = OUTPUT_DIR / f"fold_{fold+1}"
    trainer_fold.save_model(str(fold_model_path))
    
    # Evaluation (will use the reloaded best model weights)
    metrics = trainer_fold.evaluate()
    all_fold_metrics.append(metrics)
    print(f"Fold {fold+1} evaluation (Best model metrics)")
    print(metrics)

    # Free memory on whichever accelerator is in use
    del model_fold, trainer_fold
    if device == "mps":
        torch.mps.empty_cache()
    elif device == "cuda":
        torch.cuda.empty_cache()

# Average each metric across the folds (the loop variable is named `name` to avoid confusion with the fold count `k`)
avg_metrics = {name: np.mean([m[name] for m in all_fold_metrics if name in m]) for name in all_fold_metrics[0]}
print("\nCross-validation summary")
print(avg_metrics)

# Final training on full dataset
print("\nFinal training on full dataset")

final_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2
)

cm_callback_final = ConfusionMatrixCallback()
final_trainer = WeightedTrainer(
    model=final_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3),
               cm_callback_final]
)
cm_callback_final.set_trainer(final_trainer)

print("\nStarting final training on full training dataset")
final_trainer.train()

# The best model is automatically loaded into the trainer's model attribute
# because load_best_model_at_end=True was set in TrainingArguments.
# We save the model that is currently loaded in the trainer.

# Print final state for clarity
print("-" * 80)
print("\nFinal trainer state")
print(f"Best checkpoint: {final_trainer.state.best_model_checkpoint}")
print(f"Best metric value ({training_args.metric_for_best_model}): {final_trainer.state.best_metric}")
print("-" * 80)

# Final evaluation (will use the reloaded best model weights)
final_metrics = final_trainer.evaluate()

# Final save (will save the reloaded best model)
final_model_path = OUTPUT_DIR / "final_model"
final_trainer.save_model(str(final_model_path))
tokenizer.save_pretrained(str(final_model_path))
print(f"Final model and tokenizer saved to: {final_model_path}")


Creating 'original_index' for GroupKFold...
--------------------------------------------------------------------------------

Fold 1/4
--------------------------------------------------------------------------------


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  0%|          | 0/200 [00:00<?, ?it/s]

  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.7022829055786133, 'eval_accuracy': 0.33962264150943394, 'eval_f1': 0.5070422535211268, 'eval_precision': 0.33962264150943394, 'eval_recall': 1.0, 'eval_runtime': 2.3538, 'eval_samples_per_second': 22.517, 'eval_steps_per_second': 0.425, 'epoch': 1.0}

[Epoch 1] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[   0      35]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=0.000, Recall=0.000
  Fake: Precision=0.340, Recall=1.000

Overall Metrics:
  Accuracy:  0.3396
  F1:        0.5070
  Precision: 0.3396
  Recall:    1.0000
  Loss:      0.7023

Best so far: Epoch 1, F1=0.5070



  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.6861300468444824, 'eval_accuracy': 0.33962264150943394, 'eval_f1': 0.5070422535211268, 'eval_precision': 0.33962264150943394, 'eval_recall': 1.0, 'eval_runtime': 2.3964, 'eval_samples_per_second': 22.116, 'eval_steps_per_second': 0.417, 'epoch': 2.0}

[Epoch 2] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[   0      35]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=0.000, Recall=0.000
  Fake: Precision=0.340, Recall=1.000

Overall Metrics:
  Accuracy:  0.3396
  F1:        0.5070
  Precision: 0.3396
  Recall:    1.0000
  Loss:      0.6861

Best so far: Epoch 1, F1=0.5070



  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.6608251929283142, 'eval_accuracy': 0.4716981132075472, 'eval_f1': 0.5625, 'eval_precision': 0.391304347826087, 'eval_recall': 1.0, 'eval_runtime': 2.4121, 'eval_samples_per_second': 21.973, 'eval_steps_per_second': 0.415, 'epoch': 3.0}

[Epoch 3] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[   7      28]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.200
  Fake: Precision=0.391, Recall=1.000

Overall Metrics:
  Accuracy:  0.4717
  F1:        0.5625
  Precision: 0.3913
  Recall:    1.0000
  Loss:      0.6608

Best so far: Epoch 3, F1=0.5625



  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.6246447563171387, 'eval_accuracy': 0.6981132075471698, 'eval_f1': 0.6923076923076923, 'eval_precision': 0.5294117647058824, 'eval_recall': 1.0, 'eval_runtime': 2.4305, 'eval_samples_per_second': 21.806, 'eval_steps_per_second': 0.411, 'epoch': 4.0}

[Epoch 4] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  19      16]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.543
  Fake: Precision=0.529, Recall=1.000

Overall Metrics:
  Accuracy:  0.6981
  F1:        0.6923
  Precision: 0.5294
  Recall:    1.0000
  Loss:      0.6246

Best so far: Epoch 4, F1=0.6923

{'loss': 0.6695, 'grad_norm': 2.062835216522217, 'learning_rate': 5e-06, 'epoch': 5.0}


  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.5652679204940796, 'eval_accuracy': 0.7358490566037735, 'eval_f1': 0.72, 'eval_precision': 0.5625, 'eval_recall': 1.0, 'eval_runtime': 2.3153, 'eval_samples_per_second': 22.891, 'eval_steps_per_second': 0.432, 'epoch': 5.0}

[Epoch 5] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  21      14]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.600
  Fake: Precision=0.562, Recall=1.000

Overall Metrics:
  Accuracy:  0.7358
  F1:        0.7200
  Precision: 0.5625
  Recall:    1.0000
  Loss:      0.5653

Best so far: Epoch 5, F1=0.7200



  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.4767593443393707, 'eval_accuracy': 0.7547169811320755, 'eval_f1': 0.7346938775510204, 'eval_precision': 0.5806451612903226, 'eval_recall': 1.0, 'eval_runtime': 2.3243, 'eval_samples_per_second': 22.803, 'eval_steps_per_second': 0.43, 'epoch': 6.0}

[Epoch 6] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  22      13]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.629
  Fake: Precision=0.581, Recall=1.000

Overall Metrics:
  Accuracy:  0.7547
  F1:        0.7347
  Precision: 0.5806
  Recall:    1.0000
  Loss:      0.4768

Best so far: Epoch 6, F1=0.7347



  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.3949638903141022, 'eval_accuracy': 0.7924528301886793, 'eval_f1': 0.7659574468085106, 'eval_precision': 0.6206896551724138, 'eval_recall': 1.0, 'eval_runtime': 2.3637, 'eval_samples_per_second': 22.423, 'eval_steps_per_second': 0.423, 'epoch': 7.0}

[Epoch 7] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  24      11]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.686
  Fake: Precision=0.621, Recall=1.000

Overall Metrics:
  Accuracy:  0.7925
  F1:        0.7660
  Precision: 0.6207
  Recall:    1.0000
  Loss:      0.3950

Best so far: Epoch 7, F1=0.7660



  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.32774782180786133, 'eval_accuracy': 0.8301886792452831, 'eval_f1': 0.8, 'eval_precision': 0.6666666666666666, 'eval_recall': 1.0, 'eval_runtime': 2.326, 'eval_samples_per_second': 22.786, 'eval_steps_per_second': 0.43, 'epoch': 8.0}

[Epoch 8] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  26       9]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.743
  Fake: Precision=0.667, Recall=1.000

Overall Metrics:
  Accuracy:  0.8302
  F1:        0.8000
  Precision: 0.6667
  Recall:    1.0000
  Loss:      0.3277

Best so far: Epoch 8, F1=0.8000



  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.29468342661857605, 'eval_accuracy': 0.8490566037735849, 'eval_f1': 0.8181818181818182, 'eval_precision': 0.6923076923076923, 'eval_recall': 1.0, 'eval_runtime': 2.3083, 'eval_samples_per_second': 22.961, 'eval_steps_per_second': 0.433, 'epoch': 9.0}

[Epoch 9] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  27       8]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.771
  Fake: Precision=0.692, Recall=1.000

Overall Metrics:
  Accuracy:  0.8491
  F1:        0.8182
  Precision: 0.6923
  Recall:    1.0000
  Loss:      0.2947

Best so far: Epoch 9, F1=0.8182

{'loss': 0.392, 'grad_norm': 4.422815799713135, 'learning_rate': 1e-05, 'epoch': 10.0}


  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.27904608845710754, 'eval_accuracy': 0.8867924528301887, 'eval_f1': 0.85, 'eval_precision': 0.7727272727272727, 'eval_recall': 0.9444444444444444, 'eval_runtime': 2.2982, 'eval_samples_per_second': 23.062, 'eval_steps_per_second': 0.435, 'epoch': 10.0}

[Epoch 10] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  30       5]]
Fake   [[   1      17]]

Per-Class Metrics:
  Real: Precision=0.968, Recall=0.857
  Fake: Precision=0.773, Recall=0.944

Overall Metrics:
  Accuracy:  0.8868
  F1:        0.8500
  Precision: 0.7727
  Recall:    0.9444
  Loss:      0.2790

Best so far: Epoch 10, F1=0.8500



  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.26381686329841614, 'eval_accuracy': 0.8867924528301887, 'eval_f1': 0.85, 'eval_precision': 0.7727272727272727, 'eval_recall': 0.9444444444444444, 'eval_runtime': 2.3032, 'eval_samples_per_second': 23.012, 'eval_steps_per_second': 0.434, 'epoch': 11.0}

[Epoch 11] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  30       5]]
Fake   [[   1      17]]

Per-Class Metrics:
  Real: Precision=0.968, Recall=0.857
  Fake: Precision=0.773, Recall=0.944

Overall Metrics:
  Accuracy:  0.8868
  F1:        0.8500
  Precision: 0.7727
  Recall:    0.9444
  Loss:      0.2638

Best so far: Epoch 10, F1=0.8500



  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.28742751479148865, 'eval_accuracy': 0.8679245283018868, 'eval_f1': 0.8372093023255814, 'eval_precision': 0.72, 'eval_recall': 1.0, 'eval_runtime': 2.2745, 'eval_samples_per_second': 23.302, 'eval_steps_per_second': 0.44, 'epoch': 12.0}

[Epoch 12] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  28       7]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.800
  Fake: Precision=0.720, Recall=1.000

Overall Metrics:
  Accuracy:  0.8679
  F1:        0.8372
  Precision: 0.7200
  Recall:    1.0000
  Loss:      0.2874

Best so far: Epoch 10, F1=0.8500



  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.39316853880882263, 'eval_accuracy': 0.8301886792452831, 'eval_f1': 0.7428571428571429, 'eval_precision': 0.7647058823529411, 'eval_recall': 0.7222222222222222, 'eval_runtime': 2.273, 'eval_samples_per_second': 23.317, 'eval_steps_per_second': 0.44, 'epoch': 13.0}

[Epoch 13] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  31       4]]
Fake   [[   5      13]]

Per-Class Metrics:
  Real: Precision=0.861, Recall=0.886
  Fake: Precision=0.765, Recall=0.722

Overall Metrics:
  Accuracy:  0.8302
  F1:        0.7429
  Precision: 0.7647
  Recall:    0.7222
  Loss:      0.3932

Best so far: Epoch 10, F1=0.8500

{'train_runtime': 270.5447, 'train_samples_per_second': 11.754, 'train_steps_per_second': 0.739, 'train_loss': 0.4577501700474666, 'epoch': 13.0}

TRAINING COMPLETE
Best Epoch: 10
Best F1: 0.8500
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-100
Check

  0%|          | 0/1 [00:00<?, ?it/s]

Final Model F1: 0.8500
Expected Best F1: 0.8500
Correct: Best model was loaded

--------------------------------------------------------------------------------

Fold trainer state
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-100
Best metric value (eval_f1): 0.85
--------------------------------------------------------------------------------


  0%|          | 0/1 [00:00<?, ?it/s]


[Epoch 13] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  30       5]]
Fake   [[   1      17]]

Per-Class Metrics:
  Real: Precision=0.968, Recall=0.857
  Fake: Precision=0.773, Recall=0.944

Overall Metrics:
  Accuracy:  0.8868
  F1:        0.8500
  Precision: 0.7727
  Recall:    0.9444
  Loss:      0.2790

Best so far: Epoch 10, F1=0.8500

Fold 1 evaluation (Best model metrics)
{'eval_loss': 0.27904608845710754, 'eval_accuracy': 0.8867924528301887, 'eval_f1': 0.85, 'eval_precision': 0.7727272727272727, 'eval_recall': 0.9444444444444444, 'eval_runtime': 2.3223, 'eval_samples_per_second': 22.822, 'eval_steps_per_second': 0.431, 'epoch': 13.0}
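
Fold 1 stopped after epoch 13 instead of running all 20 epochs because of `EarlyStoppingCallback(early_stopping_patience=3)`. The sketch below replays the logged `eval_f1` values through a simplified patience rule. `early_stop_epoch` is a hypothetical helper that approximates the callback's behavior (with its default threshold of 0), not the Transformers implementation.

```python
def early_stop_epoch(scores, patience=3):
    """Return (stop_epoch, best_epoch) for a metric that should be maximized.

    Hypothetical helper approximating the patience rule: an epoch that does
    not strictly beat the best score so far uses up one unit of patience.
    """
    best_score = float("-inf")
    best_epoch = 0
    stale = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch, best_epoch
    return len(scores), best_epoch


# eval_f1 per epoch for Fold 1, rounded from the log above
fold1_f1 = [0.5070, 0.5070, 0.5625, 0.6923, 0.7200, 0.7347, 0.7660,
            0.8000, 0.8182, 0.8500, 0.8500, 0.8372, 0.7429]

stop_epoch, best_epoch = early_stop_epoch(fold1_f1)
print(f"stopped after epoch {stop_epoch}, best epoch {best_epoch}")
```

Epoch 11 ties the best F1 of 0.85 without beating it, so epochs 11, 12, and 13 exhaust the patience of 3. The run stops at epoch 13 and the epoch 10 checkpoint is reloaded, as the log shows.
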
--------------------------------------------------------------------------------

Fold 2/4
--------------------------------------------------------------------------------


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  0%|          | 0/200 [00:00<?, ?it/s]

  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.689300537109375, 'eval_accuracy': 0.5925925925925926, 'eval_f1': 0.5217391304347826, 'eval_precision': 0.4444444444444444, 'eval_recall': 0.631578947368421, 'eval_runtime': 2.309, 'eval_samples_per_second': 23.387, 'eval_steps_per_second': 0.433, 'epoch': 1.0}

[Epoch 1] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  20      15]]
Fake   [[   7      12]]

Per-Class Metrics:
  Real: Precision=0.741, Recall=0.571
  Fake: Precision=0.444, Recall=0.632

Overall Metrics:
  Accuracy:  0.5926
  F1:        0.5217
  Precision: 0.4444
  Recall:    0.6316
  Loss:      0.6893

Best so far: Epoch 1, F1=0.5217



  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.6744490265846252, 'eval_accuracy': 0.7777777777777778, 'eval_f1': 0.7391304347826086, 'eval_precision': 0.6296296296296297, 'eval_recall': 0.8947368421052632, 'eval_runtime': 2.5013, 'eval_samples_per_second': 21.589, 'eval_steps_per_second': 0.4, 'epoch': 2.0}

[Epoch 2] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  25      10]]
Fake   [[   2      17]]

Per-Class Metrics:
  Real: Precision=0.926, Recall=0.714
  Fake: Precision=0.630, Recall=0.895

Overall Metrics:
  Accuracy:  0.7778
  F1:        0.7391
  Precision: 0.6296
  Recall:    0.8947
  Loss:      0.6744

Best so far: Epoch 2, F1=0.7391



  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': 0.6511367559432983, 'eval_accuracy': 0.7407407407407407, 'eval_f1': 0.72, 'eval_precision': 0.5806451612903226, 'eval_recall': 0.9473684210526315, 'eval_runtime': 2.5136, 'eval_samples_per_second': 21.483, 'eval_steps_per_second': 0.398, 'epoch': 3.0}

[Epoch 3] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  22      13]]
Fake   [[   1      18]]

Per-Class Metrics:
  Real: Precision=0.957, Recall=0.629
  Fake: Precision=0.581, Recall=0.947

Overall Metrics:
  Accuracy:  0.7407
  F1:        0.7200
  Precision: 0.5806
  Recall:    0.9474
  Loss:      0.6511

Best so far: Epoch 2, F1=0.7391



{'eval_loss': 0.6122527718544006, 'eval_accuracy': 0.7777777777777778, 'eval_f1': 0.75, 'eval_precision': 0.6206896551724138, 'eval_recall': 0.9473684210526315, 'eval_runtime': 2.4583, 'eval_samples_per_second': 21.967, 'eval_steps_per_second': 0.407, 'epoch': 4.0}

[Epoch 4] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  24      11]]
Fake   [[   1      18]]

Per-Class Metrics:
  Real: Precision=0.960, Recall=0.686
  Fake: Precision=0.621, Recall=0.947

Overall Metrics:
  Accuracy:  0.7778
  F1:        0.7500
  Precision: 0.6207
  Recall:    0.9474
  Loss:      0.6123

Best so far: Epoch 4, F1=0.7500

{'loss': 0.6569, 'grad_norm': 2.201120615005493, 'learning_rate': 5e-06, 'epoch': 5.0}


{'eval_loss': 0.5441964268684387, 'eval_accuracy': 0.7962962962962963, 'eval_f1': 0.7659574468085106, 'eval_precision': 0.6428571428571429, 'eval_recall': 0.9473684210526315, 'eval_runtime': 2.5499, 'eval_samples_per_second': 21.177, 'eval_steps_per_second': 0.392, 'epoch': 5.0}

[Epoch 5] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  25      10]]
Fake   [[   1      18]]

Per-Class Metrics:
  Real: Precision=0.962, Recall=0.714
  Fake: Precision=0.643, Recall=0.947

Overall Metrics:
  Accuracy:  0.7963
  F1:        0.7660
  Precision: 0.6429
  Recall:    0.9474
  Loss:      0.5442

Best so far: Epoch 5, F1=0.7660



{'eval_loss': 0.45954400300979614, 'eval_accuracy': 0.7962962962962963, 'eval_f1': 0.7659574468085106, 'eval_precision': 0.6428571428571429, 'eval_recall': 0.9473684210526315, 'eval_runtime': 2.5256, 'eval_samples_per_second': 21.381, 'eval_steps_per_second': 0.396, 'epoch': 6.0}

[Epoch 6] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  25      10]]
Fake   [[   1      18]]

Per-Class Metrics:
  Real: Precision=0.962, Recall=0.714
  Fake: Precision=0.643, Recall=0.947

Overall Metrics:
  Accuracy:  0.7963
  F1:        0.7660
  Precision: 0.6429
  Recall:    0.9474
  Loss:      0.4595

Best so far: Epoch 5, F1=0.7660



{'eval_loss': 0.3989691138267517, 'eval_accuracy': 0.7962962962962963, 'eval_f1': 0.7659574468085106, 'eval_precision': 0.6428571428571429, 'eval_recall': 0.9473684210526315, 'eval_runtime': 2.8269, 'eval_samples_per_second': 19.102, 'eval_steps_per_second': 0.354, 'epoch': 7.0}

[Epoch 7] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  25      10]]
Fake   [[   1      18]]

Per-Class Metrics:
  Real: Precision=0.962, Recall=0.714
  Fake: Precision=0.643, Recall=0.947

Overall Metrics:
  Accuracy:  0.7963
  F1:        0.7660
  Precision: 0.6429
  Recall:    0.9474
  Loss:      0.3990

Best so far: Epoch 5, F1=0.7660



{'eval_loss': 0.37441790103912354, 'eval_accuracy': 0.8148148148148148, 'eval_f1': 0.7619047619047619, 'eval_precision': 0.6956521739130435, 'eval_recall': 0.8421052631578947, 'eval_runtime': 2.5675, 'eval_samples_per_second': 21.032, 'eval_steps_per_second': 0.389, 'epoch': 8.0}

[Epoch 8] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  28       7]]
Fake   [[   3      16]]

Per-Class Metrics:
  Real: Precision=0.903, Recall=0.800
  Fake: Precision=0.696, Recall=0.842

Overall Metrics:
  Accuracy:  0.8148
  F1:        0.7619
  Precision: 0.6957
  Recall:    0.8421
  Loss:      0.3744

Best so far: Epoch 5, F1=0.7660

{'train_runtime': 168.3618, 'train_samples_per_second': 18.769, 'train_steps_per_second': 1.188, 'train_loss': 0.5727805495262146, 'epoch': 8.0}

TRAINING COMPLETE
Best Epoch: 5
Best F1: 0.7660
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-50
Checkpoin

Final Model F1: 0.7660
Expected Best F1: 0.7660
Correct: Best model was loaded

--------------------------------------------------------------------------------

Fold trainer state
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-50
Best metric value (eval_f1): 0.7659574468085106
--------------------------------------------------------------------------------


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



[Epoch 8] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  25      10]]
Fake   [[   1      18]]

Per-Class Metrics:
  Real: Precision=0.962, Recall=0.714
  Fake: Precision=0.643, Recall=0.947

Overall Metrics:
  Accuracy:  0.7963
  F1:        0.7660
  Precision: 0.6429
  Recall:    0.9474
  Loss:      0.5442

Best so far: Epoch 5, F1=0.7660

Fold 2 evaluation (Best model metrics)
{'eval_loss': 0.5441964268684387, 'eval_accuracy': 0.7962962962962963, 'eval_f1': 0.7659574468085106, 'eval_precision': 0.6428571428571429, 'eval_recall': 0.9473684210526315, 'eval_runtime': 2.2046, 'eval_samples_per_second': 24.495, 'eval_steps_per_second': 0.454, 'epoch': 8.0}
--------------------------------------------------------------------------------
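The per-fold figures above follow directly from the confusion matrix. As a minimal sketch (the helper name `binary_metrics` is hypothetical and not part of the notebook), treating Fake as the positive class, which is what the reported precision and recall values imply, the Fold 2 best-model matrix can be turned back into the overall metrics:

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 for the positive class.

    Ratios with an empty denominator are returned as 0.0, which is
    consistent with the 0.0000 values logged when no Fake sample is predicted.
    """
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Fold 2 best model: rows are true labels, columns are predictions.
# Real -> [25, 10], Fake -> [1, 18]
metrics = binary_metrics(tn=25, fp=10, fn=1, tp=18)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The precision of 0.6429 against a recall of 0.9474 reflects the 10 Real texts flagged as Fake: the model rarely misses a fake but over-predicts the class.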

Fold 3/4
--------------------------------------------------------------------------------


  0%|          | 0/200 [00:00<?, ?it/s]

{'eval_loss': 0.6947055459022522, 'eval_accuracy': 0.6346153846153846, 'eval_f1': 0.0, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_runtime': 2.2663, 'eval_samples_per_second': 22.945, 'eval_steps_per_second': 0.441, 'epoch': 1.0}

[Epoch 1] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  33       1]]
Fake   [[  18       0]]

Per-Class Metrics:
  Real: Precision=0.647, Recall=0.971
  Fake: Precision=0.000, Recall=0.000

Overall Metrics:
  Accuracy:  0.6346
  F1:        0.0000
  Precision: 0.0000
  Recall:    0.0000
  Loss:      0.6947

Best so far: Epoch 1, F1=0.0000



{'eval_loss': 0.6767831444740295, 'eval_accuracy': 0.6538461538461539, 'eval_f1': 0.1, 'eval_precision': 0.5, 'eval_recall': 0.05555555555555555, 'eval_runtime': 2.1757, 'eval_samples_per_second': 23.9, 'eval_steps_per_second': 0.46, 'epoch': 2.0}

[Epoch 2] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  33       1]]
Fake   [[  17       1]]

Per-Class Metrics:
  Real: Precision=0.660, Recall=0.971
  Fake: Precision=0.500, Recall=0.056

Overall Metrics:
  Accuracy:  0.6538
  F1:        0.1000
  Precision: 0.5000
  Recall:    0.0556
  Loss:      0.6768

Best so far: Epoch 2, F1=0.1000



{'eval_loss': 0.6478733420372009, 'eval_accuracy': 0.7307692307692307, 'eval_f1': 0.631578947368421, 'eval_precision': 0.6, 'eval_recall': 0.6666666666666666, 'eval_runtime': 2.1891, 'eval_samples_per_second': 23.754, 'eval_steps_per_second': 0.457, 'epoch': 3.0}

[Epoch 3] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  26       8]]
Fake   [[   6      12]]

Per-Class Metrics:
  Real: Precision=0.812, Recall=0.765
  Fake: Precision=0.600, Recall=0.667

Overall Metrics:
  Accuracy:  0.7308
  F1:        0.6316
  Precision: 0.6000
  Recall:    0.6667
  Loss:      0.6479

Best so far: Epoch 3, F1=0.6316



{'eval_loss': 0.6046055555343628, 'eval_accuracy': 0.75, 'eval_f1': 0.7346938775510204, 'eval_precision': 0.5806451612903226, 'eval_recall': 1.0, 'eval_runtime': 2.2401, 'eval_samples_per_second': 23.213, 'eval_steps_per_second': 0.446, 'epoch': 4.0}

[Epoch 4] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  21      13]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.618
  Fake: Precision=0.581, Recall=1.000

Overall Metrics:
  Accuracy:  0.7500
  F1:        0.7347
  Precision: 0.5806
  Recall:    1.0000
  Loss:      0.6046

Best so far: Epoch 4, F1=0.7347

{'loss': 0.6496, 'grad_norm': 3.5500271320343018, 'learning_rate': 5e-06, 'epoch': 5.0}


{'eval_loss': 0.5440356731414795, 'eval_accuracy': 0.7884615384615384, 'eval_f1': 0.7659574468085106, 'eval_precision': 0.6206896551724138, 'eval_recall': 1.0, 'eval_runtime': 2.1868, 'eval_samples_per_second': 23.779, 'eval_steps_per_second': 0.457, 'epoch': 5.0}

[Epoch 5] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  23      11]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.676
  Fake: Precision=0.621, Recall=1.000

Overall Metrics:
  Accuracy:  0.7885
  F1:        0.7660
  Precision: 0.6207
  Recall:    1.0000
  Loss:      0.5440

Best so far: Epoch 5, F1=0.7660



{'eval_loss': 0.4667207598686218, 'eval_accuracy': 0.7692307692307693, 'eval_f1': 0.75, 'eval_precision': 0.6, 'eval_recall': 1.0, 'eval_runtime': 2.2446, 'eval_samples_per_second': 23.166, 'eval_steps_per_second': 0.446, 'epoch': 6.0}

[Epoch 6] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  22      12]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.647
  Fake: Precision=0.600, Recall=1.000

Overall Metrics:
  Accuracy:  0.7692
  F1:        0.7500
  Precision: 0.6000
  Recall:    1.0000
  Loss:      0.4667

Best so far: Epoch 5, F1=0.7660



{'eval_loss': 0.4185212552547455, 'eval_accuracy': 0.7884615384615384, 'eval_f1': 0.7555555555555555, 'eval_precision': 0.6296296296296297, 'eval_recall': 0.9444444444444444, 'eval_runtime': 2.2247, 'eval_samples_per_second': 23.374, 'eval_steps_per_second': 0.449, 'epoch': 7.0}

[Epoch 7] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  24      10]]
Fake   [[   1      17]]

Per-Class Metrics:
  Real: Precision=0.960, Recall=0.706
  Fake: Precision=0.630, Recall=0.944

Overall Metrics:
  Accuracy:  0.7885
  F1:        0.7556
  Precision: 0.6296
  Recall:    0.9444
  Loss:      0.4185

Best so far: Epoch 5, F1=0.7660



{'eval_loss': 0.3805749714374542, 'eval_accuracy': 0.7884615384615384, 'eval_f1': 0.7659574468085106, 'eval_precision': 0.6206896551724138, 'eval_recall': 1.0, 'eval_runtime': 2.2505, 'eval_samples_per_second': 23.106, 'eval_steps_per_second': 0.444, 'epoch': 8.0}

[Epoch 8] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  23      11]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.676
  Fake: Precision=0.621, Recall=1.000

Overall Metrics:
  Accuracy:  0.7885
  F1:        0.7660
  Precision: 0.6207
  Recall:    1.0000
  Loss:      0.3806

Best so far: Epoch 5, F1=0.7660

{'train_runtime': 167.8237, 'train_samples_per_second': 19.068, 'train_steps_per_second': 1.192, 'train_loss': 0.5668884515762329, 'epoch': 8.0}

TRAINING COMPLETE
Best Epoch: 5
Best F1: 0.7660
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-50
Checkpoint verified on di

Final Model F1: 0.7660
Expected Best F1: 0.7660
Correct: Best model was loaded

--------------------------------------------------------------------------------

Fold trainer state
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-50
Best metric value (eval_f1): 0.7659574468085106
--------------------------------------------------------------------------------


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



[Epoch 8] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  23      11]]
Fake   [[   0      18]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.676
  Fake: Precision=0.621, Recall=1.000

Overall Metrics:
  Accuracy:  0.7885
  F1:        0.7660
  Precision: 0.6207
  Recall:    1.0000
  Loss:      0.5440

Best so far: Epoch 5, F1=0.7660

Fold 3 evaluation (Best model metrics)
{'eval_loss': 0.5440356731414795, 'eval_accuracy': 0.7884615384615384, 'eval_f1': 0.7659574468085106, 'eval_precision': 0.6206896551724138, 'eval_recall': 1.0, 'eval_runtime': 2.1879, 'eval_samples_per_second': 23.767, 'eval_steps_per_second': 0.457, 'epoch': 8.0}
--------------------------------------------------------------------------------
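Folds 2 and 3 both stop at epoch 8 with the best epoch at 5, which is consistent with early stopping on `eval_f1` with a patience of 3. The patience value and the strict "greater than" comparison are assumptions inferred from the log, not read from the notebook's configuration, and `simulate_early_stopping` is a hypothetical helper. A minimal sketch replaying the Fold 3 F1 history:

```python
def simulate_early_stopping(f1_per_epoch, patience=3):
    """Return (best_epoch, last_epoch_run) for a metric where higher is better.

    An epoch only counts as an improvement if it strictly beats the best
    value so far; ties increase the patience counter.
    """
    best_f1 = float("-inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, f1 in enumerate(f1_per_epoch, start=1):
        if f1 > best_f1:
            best_f1, best_epoch = f1, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # training would stop here
    return best_epoch, len(f1_per_epoch)


# eval_f1 values logged for Fold 3, epochs 1 to 8.
fold3_f1 = [
    0.0, 0.1, 0.631578947368421, 0.7346938775510204,
    0.7659574468085106, 0.75, 0.7555555555555555, 0.7659574468085106,
]
best_epoch, last_epoch = simulate_early_stopping(fold3_f1, patience=3)
print(f"best epoch: {best_epoch}, stopped after epoch: {last_epoch}")
```

Under these assumptions, epoch 8 ties the epoch 5 score without beating it, so it counts as a third epoch without improvement and the epoch 5 checkpoint is kept.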

Fold 4/4
--------------------------------------------------------------------------------


  0%|          | 0/200 [00:00<?, ?it/s]

{'eval_loss': 0.6898497939109802, 'eval_accuracy': 0.6037735849056604, 'eval_f1': 0.0, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_runtime': 2.3869, 'eval_samples_per_second': 22.204, 'eval_steps_per_second': 0.419, 'epoch': 1.0}

[Epoch 1] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  32       2]]
Fake   [[  19       0]]

Per-Class Metrics:
  Real: Precision=0.627, Recall=0.941
  Fake: Precision=0.000, Recall=0.000

Overall Metrics:
  Accuracy:  0.6038
  F1:        0.0000
  Precision: 0.0000
  Recall:    0.0000
  Loss:      0.6898

Best so far: Epoch 1, F1=0.0000



{'eval_loss': 0.6726938486099243, 'eval_accuracy': 0.6981132075471698, 'eval_f1': 0.38461538461538464, 'eval_precision': 0.7142857142857143, 'eval_recall': 0.2631578947368421, 'eval_runtime': 2.499, 'eval_samples_per_second': 21.208, 'eval_steps_per_second': 0.4, 'epoch': 2.0}

[Epoch 2] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  32       2]]
Fake   [[  14       5]]

Per-Class Metrics:
  Real: Precision=0.696, Recall=0.941
  Fake: Precision=0.714, Recall=0.263

Overall Metrics:
  Accuracy:  0.6981
  F1:        0.3846
  Precision: 0.7143
  Recall:    0.2632
  Loss:      0.6727

Best so far: Epoch 2, F1=0.3846



{'eval_loss': 0.6428987979888916, 'eval_accuracy': 0.7735849056603774, 'eval_f1': 0.76, 'eval_precision': 0.6129032258064516, 'eval_recall': 1.0, 'eval_runtime': 2.4712, 'eval_samples_per_second': 21.447, 'eval_steps_per_second': 0.405, 'epoch': 3.0}

[Epoch 3] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  22      12]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.647
  Fake: Precision=0.613, Recall=1.000

Overall Metrics:
  Accuracy:  0.7736
  F1:        0.7600
  Precision: 0.6129
  Recall:    1.0000
  Loss:      0.6429

Best so far: Epoch 3, F1=0.7600



{'eval_loss': 0.6004484295845032, 'eval_accuracy': 0.7735849056603774, 'eval_f1': 0.76, 'eval_precision': 0.6129032258064516, 'eval_recall': 1.0, 'eval_runtime': 2.5411, 'eval_samples_per_second': 20.857, 'eval_steps_per_second': 0.394, 'epoch': 4.0}

[Epoch 4] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  22      12]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.647
  Fake: Precision=0.613, Recall=1.000

Overall Metrics:
  Accuracy:  0.7736
  F1:        0.7600
  Precision: 0.6129
  Recall:    1.0000
  Loss:      0.6004

Best so far: Epoch 3, F1=0.7600

{'loss': 0.6516, 'grad_norm': 2.9854331016540527, 'learning_rate': 5e-06, 'epoch': 5.0}


{'eval_loss': 0.5282425880432129, 'eval_accuracy': 0.8113207547169812, 'eval_f1': 0.7916666666666666, 'eval_precision': 0.6551724137931034, 'eval_recall': 1.0, 'eval_runtime': 2.5003, 'eval_samples_per_second': 21.197, 'eval_steps_per_second': 0.4, 'epoch': 5.0}

[Epoch 5] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  24      10]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.706
  Fake: Precision=0.655, Recall=1.000

Overall Metrics:
  Accuracy:  0.8113
  F1:        0.7917
  Precision: 0.6552
  Recall:    1.0000
  Loss:      0.5282

Best so far: Epoch 5, F1=0.7917



{'eval_loss': 0.44580304622650146, 'eval_accuracy': 0.8113207547169812, 'eval_f1': 0.7916666666666666, 'eval_precision': 0.6551724137931034, 'eval_recall': 1.0, 'eval_runtime': 2.5542, 'eval_samples_per_second': 20.75, 'eval_steps_per_second': 0.392, 'epoch': 6.0}

[Epoch 6] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  24      10]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.706
  Fake: Precision=0.655, Recall=1.000

Overall Metrics:
  Accuracy:  0.8113
  F1:        0.7917
  Precision: 0.6552
  Recall:    1.0000
  Loss:      0.4458

Best so far: Epoch 5, F1=0.7917



{'eval_loss': 0.378466933965683, 'eval_accuracy': 0.8301886792452831, 'eval_f1': 0.8, 'eval_precision': 0.6923076923076923, 'eval_recall': 0.9473684210526315, 'eval_runtime': 2.5808, 'eval_samples_per_second': 20.536, 'eval_steps_per_second': 0.387, 'epoch': 7.0}

[Epoch 7] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  26       8]]
Fake   [[   1      18]]

Per-Class Metrics:
  Real: Precision=0.963, Recall=0.765
  Fake: Precision=0.692, Recall=0.947

Overall Metrics:
  Accuracy:  0.8302
  F1:        0.8000
  Precision: 0.6923
  Recall:    0.9474
  Loss:      0.3785

Best so far: Epoch 7, F1=0.8000



{'eval_loss': 0.343048095703125, 'eval_accuracy': 0.8490566037735849, 'eval_f1': 0.8260869565217391, 'eval_precision': 0.7037037037037037, 'eval_recall': 1.0, 'eval_runtime': 2.518, 'eval_samples_per_second': 21.049, 'eval_steps_per_second': 0.397, 'epoch': 8.0}

[Epoch 8] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  26       8]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.765
  Fake: Precision=0.704, Recall=1.000

Overall Metrics:
  Accuracy:  0.8491
  F1:        0.8261
  Precision: 0.7037
  Recall:    1.0000
  Loss:      0.3430

Best so far: Epoch 8, F1=0.8261



{'eval_loss': 0.3031170070171356, 'eval_accuracy': 0.8490566037735849, 'eval_f1': 0.8181818181818182, 'eval_precision': 0.72, 'eval_recall': 0.9473684210526315, 'eval_runtime': 2.5132, 'eval_samples_per_second': 21.088, 'eval_steps_per_second': 0.398, 'epoch': 9.0}

[Epoch 9] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  27       7]]
Fake   [[   1      18]]

Per-Class Metrics:
  Real: Precision=0.964, Recall=0.794
  Fake: Precision=0.720, Recall=0.947

Overall Metrics:
  Accuracy:  0.8491
  F1:        0.8182
  Precision: 0.7200
  Recall:    0.9474
  Loss:      0.3031

Best so far: Epoch 8, F1=0.8261

{'loss': 0.3749, 'grad_norm': 1.8437893390655518, 'learning_rate': 1e-05, 'epoch': 10.0}


{'eval_loss': 0.280023455619812, 'eval_accuracy': 0.8679245283018868, 'eval_f1': 0.8372093023255814, 'eval_precision': 0.75, 'eval_recall': 0.9473684210526315, 'eval_runtime': 2.4522, 'eval_samples_per_second': 21.613, 'eval_steps_per_second': 0.408, 'epoch': 10.0}

[Epoch 10] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  28       6]]
Fake   [[   1      18]]

Per-Class Metrics:
  Real: Precision=0.966, Recall=0.824
  Fake: Precision=0.750, Recall=0.947

Overall Metrics:
  Accuracy:  0.8679
  F1:        0.8372
  Precision: 0.7500
  Recall:    0.9474
  Loss:      0.2800

Best so far: Epoch 10, F1=0.8372



{'eval_loss': 0.2687629163265228, 'eval_accuracy': 0.8867924528301887, 'eval_f1': 0.8636363636363636, 'eval_precision': 0.76, 'eval_recall': 1.0, 'eval_runtime': 2.5571, 'eval_samples_per_second': 20.727, 'eval_steps_per_second': 0.391, 'epoch': 11.0}

[Epoch 11] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  28       6]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.824
  Fake: Precision=0.760, Recall=1.000

Overall Metrics:
  Accuracy:  0.8868
  F1:        0.8636
  Precision: 0.7600
  Recall:    1.0000
  Loss:      0.2688

Best so far: Epoch 11, F1=0.8636



{'eval_loss': 0.3343336284160614, 'eval_accuracy': 0.8679245283018868, 'eval_f1': 0.8108108108108109, 'eval_precision': 0.8333333333333334, 'eval_recall': 0.7894736842105263, 'eval_runtime': 2.5571, 'eval_samples_per_second': 20.727, 'eval_steps_per_second': 0.391, 'epoch': 12.0}

[Epoch 12] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  31       3]]
Fake   [[   4      15]]

Per-Class Metrics:
  Real: Precision=0.886, Recall=0.912
  Fake: Precision=0.833, Recall=0.789

Overall Metrics:
  Accuracy:  0.8679
  F1:        0.8108
  Precision: 0.8333
  Recall:    0.7895
  Loss:      0.3343

Best so far: Epoch 11, F1=0.8636



{'eval_loss': 0.267289936542511, 'eval_accuracy': 0.8867924528301887, 'eval_f1': 0.8636363636363636, 'eval_precision': 0.76, 'eval_recall': 1.0, 'eval_runtime': 2.5364, 'eval_samples_per_second': 20.896, 'eval_steps_per_second': 0.394, 'epoch': 13.0}

[Epoch 13] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  28       6]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.824
  Fake: Precision=0.760, Recall=1.000

Overall Metrics:
  Accuracy:  0.8868
  F1:        0.8636
  Precision: 0.7600
  Recall:    1.0000
  Loss:      0.2673

Best so far: Epoch 11, F1=0.8636



{'eval_loss': 0.31105104088783264, 'eval_accuracy': 0.8867924528301887, 'eval_f1': 0.8421052631578947, 'eval_precision': 0.8421052631578947, 'eval_recall': 0.8421052631578947, 'eval_runtime': 2.5314, 'eval_samples_per_second': 20.937, 'eval_steps_per_second': 0.395, 'epoch': 14.0}

[Epoch 14] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  31       3]]
Fake   [[   3      16]]

Per-Class Metrics:
  Real: Precision=0.912, Recall=0.912
  Fake: Precision=0.842, Recall=0.842

Overall Metrics:
  Accuracy:  0.8868
  F1:        0.8421
  Precision: 0.8421
  Recall:    0.8421
  Loss:      0.3111

Best so far: Epoch 11, F1=0.8636

{'train_runtime': 295.44, 'train_samples_per_second': 10.764, 'train_steps_per_second': 0.677, 'train_loss': 0.4283861296517508, 'epoch': 14.0}

TRAINING COMPLETE
Best Epoch: 11
Best F1: 0.8636
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-110
Check

Final Model F1: 0.8636
Expected Best F1: 0.8636
Correct: Best model was loaded

--------------------------------------------------------------------------------

Fold trainer state
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-110
Best metric value (eval_f1): 0.8636363636363636
--------------------------------------------------------------------------------



[Epoch 14] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  28       6]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.824
  Fake: Precision=0.760, Recall=1.000

Overall Metrics:
  Accuracy:  0.8868
  F1:        0.8636
  Precision: 0.7600
  Recall:    1.0000
  Loss:      0.2688

Best so far: Epoch 11, F1=0.8636

Fold 4 evaluation (Best model metrics)
{'eval_loss': 0.2687629163265228, 'eval_accuracy': 0.8867924528301887, 'eval_f1': 0.8636363636363636, 'eval_precision': 0.76, 'eval_recall': 1.0, 'eval_runtime': 2.4976, 'eval_samples_per_second': 21.22, 'eval_steps_per_second': 0.4, 'epoch': 14.0}

Cross-validation summary
{'eval_loss': 0.40901027619838715, 'eval_accuracy': 0.839585685104553, 'eval_f1': 0.8113878143133463, 'eval_precision': 0.6990685176892073, 'eval_recall': 0.972953216374269, 'eval_runtime': 2.3030999999999997, 'eval_samples_per_second': 23.076, 'eval_steps_per_second': 0.4355, 'epoch': 10.75}
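The summary dictionary looks like a plain arithmetic mean of the per-fold evaluation dictionaries, key by key (note the averaged `epoch` of 10.75). That is an inference from the numbers, not something the log states, and `average_fold_metrics` is a hypothetical helper. A minimal sketch using only the three folds visible in this output (Fold 1 is not shown here, so the result will not match the four-fold summary):

```python
from statistics import mean


def average_fold_metrics(fold_metrics):
    """Average each metric across a list of per-fold metric dictionaries."""
    keys = fold_metrics[0].keys()
    return {key: mean(fold[key] for fold in fold_metrics) for key in keys}


# Best-model evaluation results copied from the Fold 2, 3 and 4 log lines.
visible_folds = [
    {"eval_loss": 0.5441964268684387, "eval_accuracy": 0.7962962962962963,
     "eval_f1": 0.7659574468085106, "eval_precision": 0.6428571428571429,
     "eval_recall": 0.9473684210526315},
    {"eval_loss": 0.5440356731414795, "eval_accuracy": 0.7884615384615384,
     "eval_f1": 0.7659574468085106, "eval_precision": 0.6206896551724138,
     "eval_recall": 1.0},
    {"eval_loss": 0.2687629163265228, "eval_accuracy": 0.8867924528301887,
     "eval_f1": 0.8636363636363636, "eval_precision": 0.76,
     "eval_recall": 1.0},
]

partial_summary = average_fold_metrics(visible_folds)
for name, value in partial_summary.items():
    print(f"{name}: {value:.4f}")
```

Averaging every key is convenient but also averages bookkeeping fields such as `epoch` and `eval_runtime`; reporting the per-fold standard deviation alongside the mean would show how much a single fold, here Fold 4, moves the summary.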


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Starting final training on full training dataset


  0%|          | 0/280 [00:00<?, ?it/s]

{'eval_loss': 0.7277801036834717, 'eval_accuracy': 0.5365853658536586, 'eval_f1': 0.0, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_runtime': 2.3211, 'eval_samples_per_second': 17.664, 'eval_steps_per_second': 0.431, 'epoch': 1.0}

[Epoch 1] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  22       0]]
Fake   [[  19       0]]

Per-Class Metrics:
  Real: Precision=0.537, Recall=1.000
  Fake: Precision=0.000, Recall=0.000

Overall Metrics:
  Accuracy:  0.5366
  F1:        0.0000
  Precision: 0.0000
  Recall:    0.0000
  Loss:      0.7278

Best so far: Epoch 1, F1=0.0000



{'eval_loss': 0.6991944909095764, 'eval_accuracy': 0.5365853658536586, 'eval_f1': 0.0, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_runtime': 2.2938, 'eval_samples_per_second': 17.874, 'eval_steps_per_second': 0.436, 'epoch': 2.0}

[Epoch 2] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  22       0]]
Fake   [[  19       0]]

Per-Class Metrics:
  Real: Precision=0.537, Recall=1.000
  Fake: Precision=0.000, Recall=0.000

Overall Metrics:
  Accuracy:  0.5366
  F1:        0.0000
  Precision: 0.0000
  Recall:    0.0000
  Loss:      0.6992

Best so far: Epoch 1, F1=0.0000



{'eval_loss': 0.653232216835022, 'eval_accuracy': 0.8048780487804879, 'eval_f1': 0.75, 'eval_precision': 0.9230769230769231, 'eval_recall': 0.631578947368421, 'eval_runtime': 4.8701, 'eval_samples_per_second': 8.419, 'eval_steps_per_second': 0.205, 'epoch': 3.0}

[Epoch 3] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  21       1]]
Fake   [[   7      12]]

Per-Class Metrics:
  Real: Precision=0.750, Recall=0.955
  Fake: Precision=0.923, Recall=0.632

Overall Metrics:
  Accuracy:  0.8049
  F1:        0.7500
  Precision: 0.9231
  Recall:    0.6316
  Loss:      0.6532

Best so far: Epoch 3, F1=0.7500

{'loss': 0.6633, 'grad_norm': 1.7586039304733276, 'learning_rate': 5e-06, 'epoch': 3.57}


{'eval_loss': 0.5721936225891113, 'eval_accuracy': 0.8048780487804879, 'eval_f1': 0.8260869565217391, 'eval_precision': 0.7037037037037037, 'eval_recall': 1.0, 'eval_runtime': 4.9388, 'eval_samples_per_second': 8.302, 'eval_steps_per_second': 0.202, 'epoch': 4.0}

[Epoch 4] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  14       8]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.636
  Fake: Precision=0.704, Recall=1.000

Overall Metrics:
  Accuracy:  0.8049
  F1:        0.8261
  Precision: 0.7037
  Recall:    1.0000
  Loss:      0.5722

Best so far: Epoch 4, F1=0.8261



{'eval_loss': 0.4825575649738312, 'eval_accuracy': 0.8048780487804879, 'eval_f1': 0.8260869565217391, 'eval_precision': 0.7037037037037037, 'eval_recall': 1.0, 'eval_runtime': 4.6104, 'eval_samples_per_second': 8.893, 'eval_steps_per_second': 0.217, 'epoch': 5.0}

[Epoch 5] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  14       8]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.636
  Fake: Precision=0.704, Recall=1.000

Overall Metrics:
  Accuracy:  0.8049
  F1:        0.8261
  Precision: 0.7037
  Recall:    1.0000
  Loss:      0.4826

Best so far: Epoch 4, F1=0.8261



{'eval_loss': 0.3731321096420288, 'eval_accuracy': 0.8536585365853658, 'eval_f1': 0.85, 'eval_precision': 0.8095238095238095, 'eval_recall': 0.8947368421052632, 'eval_runtime': 4.271, 'eval_samples_per_second': 9.6, 'eval_steps_per_second': 0.234, 'epoch': 6.0}

[Epoch 6] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  18       4]]
Fake   [[   2      17]]

Per-Class Metrics:
  Real: Precision=0.900, Recall=0.818
  Fake: Precision=0.810, Recall=0.895

Overall Metrics:
  Accuracy:  0.8537
  F1:        0.8500
  Precision: 0.8095
  Recall:    0.8947
  Loss:      0.3731

Best so far: Epoch 6, F1=0.8500



{'eval_loss': 0.3241245448589325, 'eval_accuracy': 0.8780487804878049, 'eval_f1': 0.8717948717948718, 'eval_precision': 0.85, 'eval_recall': 0.8947368421052632, 'eval_runtime': 4.6368, 'eval_samples_per_second': 8.842, 'eval_steps_per_second': 0.216, 'epoch': 7.0}

[Epoch 7] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  19       3]]
Fake   [[   2      17]]

Per-Class Metrics:
  Real: Precision=0.905, Recall=0.864
  Fake: Precision=0.850, Recall=0.895

Overall Metrics:
  Accuracy:  0.8780
  F1:        0.8718
  Precision: 0.8500
  Recall:    0.8947
  Loss:      0.3241

Best so far: Epoch 7, F1=0.8718

{'loss': 0.4052, 'grad_norm': 4.483792304992676, 'learning_rate': 1e-05, 'epoch': 7.14}


{'eval_loss': 0.29327571392059326, 'eval_accuracy': 0.8780487804878049, 'eval_f1': 0.8837209302325582, 'eval_precision': 0.7916666666666666, 'eval_recall': 1.0, 'eval_runtime': 4.5572, 'eval_samples_per_second': 8.997, 'eval_steps_per_second': 0.219, 'epoch': 8.0}

[Epoch 8] Confusion Matrix (New best)
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  17       5]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.773
  Fake: Precision=0.792, Recall=1.000

Overall Metrics:
  Accuracy:  0.8780
  F1:        0.8837
  Precision: 0.7917
  Recall:    1.0000
  Loss:      0.2933

Best so far: Epoch 8, F1=0.8837



{'eval_loss': 0.34198325872421265, 'eval_accuracy': 0.8780487804878049, 'eval_f1': 0.8717948717948718, 'eval_precision': 0.85, 'eval_recall': 0.8947368421052632, 'eval_runtime': 4.6238, 'eval_samples_per_second': 8.867, 'eval_steps_per_second': 0.216, 'epoch': 9.0}

[Epoch 9] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  19       3]]
Fake   [[   2      17]]

Per-Class Metrics:
  Real: Precision=0.905, Recall=0.864
  Fake: Precision=0.850, Recall=0.895

Overall Metrics:
  Accuracy:  0.8780
  F1:        0.8718
  Precision: 0.8500
  Recall:    0.8947
  Loss:      0.3420

Best so far: Epoch 8, F1=0.8837



{'eval_loss': 0.4958080053329468, 'eval_accuracy': 0.8292682926829268, 'eval_f1': 0.8, 'eval_precision': 0.875, 'eval_recall': 0.7368421052631579, 'eval_runtime': 4.7439, 'eval_samples_per_second': 8.643, 'eval_steps_per_second': 0.211, 'epoch': 10.0}

[Epoch 10] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  20       2]]
Fake   [[   5      14]]

Per-Class Metrics:
  Real: Precision=0.800, Recall=0.909
  Fake: Precision=0.875, Recall=0.737

Overall Metrics:
  Accuracy:  0.8293
  F1:        0.8000
  Precision: 0.8750
  Recall:    0.7368
  Loss:      0.4958

Best so far: Epoch 8, F1=0.8837

{'loss': 0.2398, 'grad_norm': 1.7651311159133911, 'learning_rate': 1.5e-05, 'epoch': 10.71}


{'eval_loss': 0.27769654989242554, 'eval_accuracy': 0.8780487804878049, 'eval_f1': 0.8837209302325582, 'eval_precision': 0.7916666666666666, 'eval_recall': 1.0, 'eval_runtime': 4.2262, 'eval_samples_per_second': 9.701, 'eval_steps_per_second': 0.237, 'epoch': 11.0}

[Epoch 11] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  17       5]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.773
  Fake: Precision=0.792, Recall=1.000

Overall Metrics:
  Accuracy:  0.8780
  F1:        0.8837
  Precision: 0.7917
  Recall:    1.0000
  Loss:      0.2777

Best so far: Epoch 8, F1=0.8837

{'train_runtime': 339.7678, 'train_samples_per_second': 12.479, 'train_steps_per_second': 0.824, 'train_loss': 0.42893810086436085, 'epoch': 11.0}

TRAINING COMPLETE
Best Epoch: 8
Best F1: 0.8837
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-112
Checkpoint verified 

Final Model F1: 0.8837
Expected Best F1: 0.8837
Correct: Best model was loaded

--------------------------------------------------------------------------------

Final trainer state
Best checkpoint: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/checkpoint-112
Best metric value (eval_f1): 0.8837209302325582
--------------------------------------------------------------------------------



[Epoch 11] Confusion Matrix
Confusion Matrix:
              Predicted
           Real    Fake
Real   [[  17       5]]
Fake   [[   0      19]]

Per-Class Metrics:
  Real: Precision=1.000, Recall=0.773
  Fake: Precision=0.792, Recall=1.000

Overall Metrics:
  Accuracy:  0.8780
  F1:        0.8837
  Precision: 0.7917
  Recall:    1.0000
  Loss:      0.2933

Best so far: Epoch 8, F1=0.8837

Final model and tokenizer saved to: /Users/photoli93/Desktop/Projets perso Python/esa_fake_or_real/results/distilbert_fake_or_real/final_model


## Observations
### 1. Cross-Validation Summary (GroupKFold)
The best F1-score per fold ranges from 0.7660 (Fold 2) to 0.8636 (Folds 3 and 4). Folds 1, 3 and 4 land within about 0.014 of each other, while Fold 2 is roughly 0.08 to 0.10 lower. This suggests that the model's performance is sensitive to the specific data split.

| Fold   | Best Epoch | Best eval_f1 | Best Checkpoint |
| ------ | ---------- | ------------ | --------------- |
| Fold 1 | 10         | 0.8500       | checkpoint-100  |
| Fold 2 | 5          | 0.7660       | checkpoint-50   |
| Fold 3 | 11         | 0.8636       | checkpoint-110  |
| Fold 4 | 11         | 0.8636       | checkpoint-110  |

<ins>NB:</ins> The average F1-score across the four folds is approximately 0.836
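
The fold summary can be re-derived from the table with a few lines of standard-library Python. This is a minimal sketch: the scores are copied by hand from the table above, and the dictionary and variable names are illustrative rather than taken from the notebook's code.

```python
from statistics import mean, stdev

# Best eval_f1 per fold, copied from the cross-validation table
fold_f1 = {
    "Fold 1": 0.8500,
    "Fold 2": 0.7660,
    "Fold 3": 0.8636,
    "Fold 4": 0.8636,
}

scores = list(fold_f1.values())
mean_f1 = mean(scores)
std_f1 = stdev(scores)  # sample standard deviation (n - 1 in the denominator)
worst_fold = min(fold_f1, key=fold_f1.get)  # fold with the lowest F1

print(f"Mean F1: {mean_f1:.4f} (std {std_f1:.4f})")
print(f"Lowest fold: {worst_fold} ({fold_f1[worst_fold]:.4f})")
```

Reporting the spread next to the mean makes the split sensitivity visible at a glance, since a single weak fold pulls the average down noticeably when there are only four folds.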

### 2. Final Training on Full Dataset
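The final training run on the full training dataset (with a separate validation dataset of 41 texts for evaluation) reached a best F1-score of 0.8837, above the cross-validation average, which indicates that the model learns the classification task effectively. With a validation set this small, each misclassified text shifts accuracy by about 2.4 percentage points, so the metrics should be read with that granularity in mind.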

| Epoch | `eval_f1` (Best metric) | `eval_accuracy` | `eval_loss` | Best checkpoint |
| ----- | ----------------------- | --------------- | ----------- | --------------- |
| 8     | 0.8837                  | 0.8780          | 0.2933      | checkpoint-112  |

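<ins>Best Performance:</ins> The model achieved its peak F1-score of 0.8837 at Epoch 8.
Early stopping terminated training after the Epoch 11 evaluation: with `early_stopping_patience=3`, three consecutive evaluations (Epochs 9, 10 and 11) failed to improve on that score. Epoch 11 matched the best F1 (0.8837) but did not exceed it, so it did not reset the patience counter.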
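
The stopping point and the checkpoint number can be reproduced with a small simulation. This is a simplified sketch of patience-based early stopping, not the `EarlyStoppingCallback` implementation itself: the function name is hypothetical, the F1 values are copied from the training log, and the value of 14 steps per epoch is an assumption inferred from the log (step 100 is reported at epoch 7.14).

```python
# eval_f1 per epoch, copied from the training log (Epochs 6 to 11)
eval_f1 = {6: 0.8500, 7: 0.8718, 8: 0.8837, 9: 0.8718, 10: 0.8000, 11: 0.8837}

PATIENCE = 3          # early_stopping_patience used in the notebook
STEPS_PER_EPOCH = 14  # assumption: inferred from the log, not read from the config


def simulate_early_stopping(scores, patience):
    """Return (best_epoch, stop_epoch); stop_epoch is None if patience never runs out."""
    best_epoch, best_score, waited = None, float("-inf"), 0
    for epoch in sorted(scores):
        if scores[epoch] > best_score:  # only a strict improvement resets the counter
            best_epoch, best_score, waited = epoch, scores[epoch], 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, None


best_epoch, stop_epoch = simulate_early_stopping(eval_f1, PATIENCE)
best_checkpoint = f"checkpoint-{best_epoch * STEPS_PER_EPOCH}"

print(f"Best epoch: {best_epoch}, stopped after epoch: {stop_epoch}")
print(f"Expected best checkpoint: {best_checkpoint}")
```

Because the comparison is strict, the tie at Epoch 11 counts as a non-improving evaluation, which is why training ends there instead of continuing.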

<ins>Final saved model:</ins> The final model saved to `/final_model` is confirmed to be the one from Epoch 8 (**checkpoint-112**), with an F1-score of 0.8837

### 3. Class-Specific Performance (Best model - Epoch 8)
The confusion matrix for the best model (Epoch 8) shows a highly asymmetric classification pattern, reflecting a bias toward predicting "Fake": all 5 errors are Real texts classified as Fake, and no Fake text is missed

| **True Label** | **Predicted Real** | **Predicted Fake** | **Recall**           |
| -------------- | ------------------ | ------------------ | -------------------- |
| **Real (0)**   | 17 (True Negative) | 5 (False Positive) | 17 / 22 = **77.3%**  |
| **Fake (1)**   | 0 (False Negative) | 19 (True Positive) | 19 / 19 = **100.0%** |

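The model identifies every actual Fake text (Recall: 19 / 19 = 100%), but recovers only 17 of the 22 actual Real texts (Recall: 77.3%)

When the model predicts a text is Fake, it is correct 19 out of 24 times (Precision: 79.2%). When it predicts Real, it is correct 17 out of 17 times (Precision: 100%)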
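
The per-class figures can be re-derived directly from the confusion matrix. The sketch below uses plain Python arithmetic with the counts from the Epoch 8 matrix, treating Fake as the positive class as in the table; the variable names are illustrative.

```python
# Confusion matrix for the best model (Epoch 8): rows = true label, columns = prediction
#                 pred Real  pred Fake
cm = [[17, 5],   # true Real
      [0, 19]]   # true Fake

tn, fp = cm[0]  # Real texts: correctly kept as Real / wrongly flagged as Fake
fn, tp = cm[1]  # Fake texts: missed / correctly flagged

precision_fake = tp / (tp + fp)  # of all "Fake" predictions, how many are Fake
recall_fake = tp / (tp + fn)     # of all Fake texts, how many are caught
precision_real = tn / (tn + fn)  # of all "Real" predictions, how many are Real
recall_real = tn / (tn + fp)     # of all Real texts, how many are kept as Real

accuracy = (tp + tn) / (tp + tn + fp + fn)
f1_fake = 2 * precision_fake * recall_fake / (precision_fake + recall_fake)

print(f"Fake: precision={precision_fake:.3f}, recall={recall_fake:.3f}")
print(f"Real: precision={precision_real:.3f}, recall={recall_real:.3f}")
print(f"Accuracy={accuracy:.4f}, F1 (Fake)={f1_fake:.4f}")
```

These values agree with the `Per-Class Metrics` and `Overall Metrics` blocks printed in the training log for Epoch 8.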

## Conclusion
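The training run was successful: the final DistilBERT model reaches an F1-score of 0.8837 and an accuracy of 0.8780 on the validation set, with perfect recall on Fake texts at the cost of flagging 5 of the 22 Real texts as Fake. Because the validation set contains only 41 texts and the cross-validation folds vary between 0.7660 and 0.8636, these results should be treated as an encouraging estimate and not as a guarantee of performance on unseen data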

# End of model training notebook