# 🔬 Deepfake Detection Research - Google Colab

## GPU Configuration
**Runtime → Change runtime type → GPU**

| GPU | VRAM | Recommendation |
|-----|------|----------------|
| H100 | 80 GB | 🏆 Best (Pro+) |
| A100 | 40 GB | ⭐ Excellent (Pro) |
| L4 | 24 GB | ✅ Good (Pro) |
| **T4** | 16 GB | ✅ **Free tier: pick this one!** |

**Also enable: High-RAM** ✅

## 1️⃣ Check the GPU

In [None]:
# Check the GPU
!nvidia-smi

import torch
print(f"\n{'='*50}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    vram = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"GPU: {gpu_name}")
    print(f"VRAM: {vram:.1f} GB")
    
    # Recommended batch size
    if vram >= 40:
        print("\n🎯 Recommended batch size: 64-128")
    elif vram >= 20:
        print("\n🎯 Recommended batch size: 32-64")
    elif vram >= 15:
        print("\n🎯 Recommended batch size: 16-32")
    else:
        print("\n🎯 Recommended batch size: 8-16")
else:
    print("\n⚠️ No GPU detected: set Runtime → Change runtime type → GPU")
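The thresholds above can also live in one pure function, so the same rule can be reused when setting `BATCH_SIZE` in step 5. The following is a minimal sketch. The name `recommend_batch_size` is hypothetical, and the ranges simply mirror the cell above: they are rules of thumb, not measured memory limits.

```python
def recommend_batch_size(vram_gb: float) -> tuple[int, int]:
    """Return a (min, max) batch-size range for a GPU with `vram_gb` GB of VRAM.

    Hypothetical helper; thresholds copied from the GPU-check cell (rules of thumb).
    """
    if vram_gb >= 40:
        return (64, 128)
    if vram_gb >= 20:
        return (32, 64)
    if vram_gb >= 15:
        return (16, 32)
    return (8, 16)


# e.g. a 16 GB-class card such as the T4
print(recommend_batch_size(15.8))
```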

## 2️⃣ Clone the Repository

In [None]:
# GitHub repository
GITHUB_REPO = "kordin33/DeepFake"

# Clone the repo
!git clone https://github.com/{GITHUB_REPO}.git

# Enter the folder
%cd DeepFake

# Show the folder structure
!ls -la
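Re-running the clone cell fails once `DeepFake/` already exists, because `git clone` refuses to clone into an existing non-empty directory. One way to make the step safe to re-run is sketched below with the standard library only. `clone_command` is a hypothetical helper name.

```python
from __future__ import annotations

from pathlib import Path


def clone_command(repo: str, dest: str) -> list[str] | None:
    """Return the `git clone` argv for `repo`, or None if `dest` already exists."""
    if Path(dest).exists():
        return None  # already cloned: skip instead of failing on re-run
    return ["git", "clone", f"https://github.com/{repo}.git", dest]


cmd = clone_command("kordin33/DeepFake", "DeepFake")
print(cmd)
# In the notebook: if cmd is not None, run subprocess.run(cmd, check=True)
```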

## 3️⃣ Install Dependencies

In [None]:
# Install additional packages
!pip install -q scikit-learn tqdm matplotlib seaborn opencv-python-headless
!pip install -q wandb  # Weights & Biases
!pip install -q datasets huggingface_hub  # For downloading the data

print("\n✅ Dependencies installed!")
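`pip install -q` suppresses most output, so a failed install is easy to miss. A quick import check confirms that the packages are actually importable. Note that pip names and import names differ for `scikit-learn` (`sklearn`) and `opencv-python-headless` (`cv2`). The following is a minimal sketch using only `importlib`; the mapping and the `missing_packages` helper are illustrative, not part of the repository.

```python
import importlib.util

# pip distribution name -> import name (they differ for some packages)
IMPORT_NAMES = {
    "scikit-learn": "sklearn",
    "opencv-python-headless": "cv2",
    "tqdm": "tqdm",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "wandb": "wandb",
    "datasets": "datasets",
    "huggingface_hub": "huggingface_hub",
}


def missing_packages(import_names: dict[str, str]) -> list[str]:
    """Return the pip names whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]


print("Missing:", missing_packages(IMPORT_NAMES) or "none")
```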

## 4Ô∏è‚É£ üîê Zaloguj siƒô do Weights & Biases

In [None]:
import wandb

# Log in to W&B (a link appears; paste your API key when prompted)
wandb.login()

print("\n✅ Logged in to Weights & Biases!")
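`wandb.login()` prompts interactively unless a key is already available. W&B also reads the `WANDB_API_KEY` and `WANDB_MODE` environment variables. A non-interactive run can therefore either supply a key or fall back to offline logging; offline runs are stored locally and can be uploaded later with `wandb sync`. The following sketches that decision. `choose_wandb_mode` is a hypothetical helper that only edits an environment mapping, and it would be called before `wandb.login()`.

```python
import os
from collections.abc import MutableMapping


def choose_wandb_mode(env: MutableMapping[str, str]) -> str:
    """Set WANDB_MODE to 'offline' when no API key is present; return the effective mode."""
    if env.get("WANDB_API_KEY"):
        return env.get("WANDB_MODE", "online")
    env["WANDB_MODE"] = "offline"  # runs are kept locally for a later `wandb sync`
    return "offline"


mode = choose_wandb_mode(os.environ)
print("W&B mode:", mode)
```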

## 5️⃣ Experiment Configuration

In [None]:
# =============================================================================
# CONFIGURATION - EDIT HERE
# =============================================================================

# W&B Project
WANDB_PROJECT = "msc-deepfake-detection"

# Experiments to run
EXPERIMENT = "all"  # "all", "baseline", "advanced", "ultimate"

# Training
EPOCHS = 20
BATCH_SIZE = 32  # Adjust to your GPU (T4: 16-32, A100: 64-128)
LEARNING_RATE = 1e-4
IMG_SIZE = 224

# Data
MAX_PER_CLASS_A = 5000  # Max samples per class for dataset A
MAX_PER_CLASS_B = 2000  # Max samples per class for dataset B

# Augmentation
USE_SBI = True  # Self-Blended Images (recommended!)

# CUDA
USE_COMPILE = True  # torch.compile(): roughly 20-40% speedup, varies by model and GPU; the first epoch is slower while compiling

# Seed
SEED = 42

print("📊 Configuration:")
print(f"   Experiment: {EXPERIMENT}")
print(f"   Epochs: {EPOCHS}")
print(f"   Batch size: {BATCH_SIZE}")
print(f"   Use SBI: {USE_SBI}")
print(f"   Use compile: {USE_COMPILE}")
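The values above are plain globals, so a typo such as `EXPERIMENT = "baselne"` only surfaces once the training script starts. One option is to collect them in a frozen dataclass that validates itself on creation. This sketch assumes that the allowed `EXPERIMENT` values are exactly the four listed in the comment above. `ExperimentConfig` is a hypothetical name, not a class from the repository.

```python
from dataclasses import dataclass

VALID_EXPERIMENTS = {"all", "baseline", "advanced", "ultimate"}


@dataclass(frozen=True)
class ExperimentConfig:
    experiment: str = "all"
    epochs: int = 20
    batch_size: int = 32
    learning_rate: float = 1e-4
    img_size: int = 224
    seed: int = 42

    def __post_init__(self) -> None:
        # Fail fast on typos before a long Colab run starts
        if self.experiment not in VALID_EXPERIMENTS:
            raise ValueError(
                f"experiment must be one of {sorted(VALID_EXPERIMENTS)}, got {self.experiment!r}"
            )
        if self.epochs <= 0 or self.batch_size <= 0:
            raise ValueError("epochs and batch_size must be positive")
        if not 0 < self.learning_rate < 1:
            raise ValueError("learning_rate should be in (0, 1)")


cfg = ExperimentConfig(experiment="baseline", batch_size=16)
print(cfg)
```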

## 6️⃣ Download Data from Hugging Face

In [None]:
# Download and prepare the data using the repository's built-in script
!python efficientnet_b0_deepfake.py --prepare --data-root ./data \
    --max-per-class-a {MAX_PER_CLASS_A} \
    --max-per-class-b {MAX_PER_CLASS_B}

# Check the data
print("\n📁 Data structure:")
!find ./data -type d | head -20
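`--max-per-class-a` and `--max-per-class-b` cap how many samples per class are kept for datasets A and B. This notebook does not show how `efficientnet_b0_deepfake.py` chooses those samples. A common approach is a seeded random subset per class, so that the cap is reproducible across runs. The sketch below illustrates that approach as an assumption, not as the script's method. `cap_per_class` is hypothetical.

```python
import random


def cap_per_class(files_by_class: dict[str, list[str]], max_per_class: int,
                  seed: int = 42) -> dict[str, list[str]]:
    """Keep at most `max_per_class` items per class, chosen reproducibly with `seed`."""
    rng = random.Random(seed)
    capped = {}
    for label, files in files_by_class.items():
        files = sorted(files)  # sort first so the result does not depend on listing order
        if len(files) > max_per_class:
            files = sorted(rng.sample(files, max_per_class))
        capped[label] = files
    return capped


data = {"fake": [f"f{i}.jpg" for i in range(10)], "real": ["r0.jpg", "r1.jpg"]}
print({k: len(v) for k, v in cap_per_class(data, max_per_class=5).items()})
```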

In [None]:
# Count the files in each folder
import os
from pathlib import Path

data_root = Path("./data")
for split_dir in data_root.rglob("*"):
    if split_dir.is_dir() and (split_dir / "fake").exists():
        fake_count = len(list((split_dir / "fake").glob("*")))
        real_count = len(list((split_dir / "real").glob("*")))
        print(f"{split_dir.relative_to(data_root)}: fake={fake_count}, real={real_count}")
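The cell above counts every entry in `fake/` and `real/`, including stray non-image files. The stricter variant sketched below counts only common image extensions and reports the fake/real ratio, which makes class imbalance visible at a glance. The extension set is an assumption about the data format.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}  # assumed image formats


def count_images(folder: Path) -> int:
    """Count image files directly inside `folder` (0 if it does not exist)."""
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_EXTS)


def split_summary(data_root: Path) -> dict[str, tuple[int, int]]:
    """Map each split directory that contains fake/ to (fake_count, real_count)."""
    if not data_root.is_dir():
        return {}
    summary = {}
    for split_dir in sorted(data_root.rglob("*")):
        if split_dir.is_dir() and (split_dir / "fake").is_dir():
            key = split_dir.relative_to(data_root).as_posix()
            summary[key] = (count_images(split_dir / "fake"), count_images(split_dir / "real"))
    return summary


for split, (fake, real) in split_summary(Path("./data")).items():
    ratio = fake / real if real else float("inf")
    print(f"{split}: fake={fake}, real={real}, fake/real={ratio:.2f}")
```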

## 7️⃣ Quick Test

In [None]:
# Check that all modules work
!python quick_test.py

## 8️⃣ Set Up CUDA Optimizations

In [None]:
import sys
sys.path.insert(0, '.')

from deepfake_research.utils.cuda_utils import (
    setup_cuda_optimizations,
    print_cuda_memory_stats,
)

# Setup optimizations
cuda_config = setup_cuda_optimizations(
    use_compile=USE_COMPILE,
    use_cudnn_benchmark=True,
    use_tf32=True,
    verbose=True,
)

print("\n" + "="*50)
print_cuda_memory_stats()

## 9Ô∏è‚É£ üöÄ Uruchom Eksperymenty z W&B Logging

In [None]:
# Run the experiments with W&B logging
!python run_experiments.py \
    --experiment {EXPERIMENT} \
    --epochs {EPOCHS} \
    --batch-size {BATCH_SIZE} \
    --lr {LEARNING_RATE} \
    --data-root ./data \
    --output-dir ./experiments \
    --wandb \
    --wandb-project {WANDB_PROJECT} \
    {'--use-sbi' if USE_SBI else ''} \
    {'--compile' if USE_COMPILE else ''} \
    --seed {SEED}
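An alternative that avoids shell line continuations and the empty `{''}` placeholders is to build the argument list in Python and pass it to `subprocess`. The flag names below are exactly those used in the cell above. `build_run_args` is a hypothetical helper, and argument order does not matter to a standard `argparse` CLI.

```python
import sys


def build_run_args(experiment: str, epochs: int, batch_size: int, lr: float,
                   wandb_project: str, use_sbi: bool, use_compile: bool, seed: int) -> list[str]:
    """Assemble the run_experiments.py command line; optional flags appear only when enabled."""
    args = [
        sys.executable, "run_experiments.py",
        "--experiment", experiment,
        "--epochs", str(epochs),
        "--batch-size", str(batch_size),
        "--lr", str(lr),
        "--data-root", "./data",
        "--output-dir", "./experiments",
        "--wandb", "--wandb-project", wandb_project,
        "--seed", str(seed),
    ]
    if use_sbi:
        args.append("--use-sbi")
    if use_compile:
        args.append("--compile")
    return args


args = build_run_args("all", 20, 32, 1e-4, "msc-deepfake-detection", True, False, 42)
print(" ".join(args))
# In the notebook: subprocess.run(args, check=True)
```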

## 🔟 Alternative: Manual Training with Full Control

In [None]:
import sys
sys.path.insert(0, '.')

import torch
import wandb
from deepfake_research.models.factory import create_model
from deepfake_research.data.datasets import create_dataloaders
from deepfake_research.training.trainer import Trainer
from deepfake_research.training.optimizers import get_optimizer, get_scheduler
from deepfake_research.training.losses import DeepfakeLoss
from deepfake_research.utils.cuda_utils import compile_model

# List of experiments to run
MODELS_TO_TRAIN = [
    "baseline_efficientnet",
    "baseline_vit", 
    "freq_efficientnet",
    "attention_efficientnet",
    "hybrid",
    "ultimate",
]

# Or a single model:
# MODELS_TO_TRAIN = ["ultimate"]

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the data
print("Loading data...")
loaders = create_dataloaders(
    data_root="./data",
    batch_size=BATCH_SIZE,
    num_workers=2,
    use_sbi=USE_SBI,
    sbi_probability=0.3,
)

print(f"Train: {len(loaders['train'].dataset)} samples")
print(f"Val: {len(loaders['val'].dataset)} samples")
print(f"Test A: {len(loaders['test_A'].dataset)} samples")
print(f"Test B: {len(loaders['test_B'].dataset)} samples")

In [None]:
# Train each model with W&B logging
results = {}

for model_name in MODELS_TO_TRAIN:
    print(f"\n{'='*60}")
    print(f"🚀 Training: {model_name}")
    print(f"{'='*60}")
    
    # Initialize a W&B run
    run = wandb.init(
        project=WANDB_PROJECT,
        name=model_name,
        config={
            "model": model_name,
            "stage": "training",
            "img_size": IMG_SIZE,
            "batch_size": BATCH_SIZE,
            "lr": LEARNING_RATE,
            "epochs": EPOCHS,
            "use_sbi": USE_SBI,
            "use_compile": USE_COMPILE,
            "seed": SEED,
        },
        reinit=True,  # Allows multiple runs in one session
    )
    
    try:
        # Create the model
        model = create_model(model_name)
        
        # Compile for a speedup
        if USE_COMPILE:
            model = compile_model(model, mode="default")
        
        # Optimizer
        optimizer = get_optimizer(model, lr=LEARNING_RATE)
        scheduler = get_scheduler(optimizer, epochs=EPOCHS)
        criterion = DeepfakeLoss(loss_type='smooth', label_smoothing=0.1)
        
        # Trainer
        trainer = Trainer(
            model=model,
            train_loader=loaders['train'],
            val_loader=loaders['val'],
            optimizer=optimizer,
            scheduler=scheduler,
            criterion=criterion,
            device=device,
            epochs=EPOCHS,
            use_amp=True,
            early_stopping=True,
            patience=5,
            save_dir=f"./experiments/{model_name}",
            experiment_name=model_name,
            use_wandb=True,
        )
        
        # Train
        result = trainer.train()
        results[model_name] = result
        
        print(f"\n✅ {model_name}: Best Val Acc = {result['best_val_acc']:.4f}")
        
    except Exception as e:
        print(f"\n❌ Error training {model_name}: {e}")
        
    finally:
        wandb.finish()

print("\n" + "="*60)
print("🏁 All training completed!")
print("="*60)
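`Trainer` above is configured with `early_stopping=True` and `patience=5`, but this notebook does not show its exact stopping rule. The usual interpretation is to stop once the monitored validation metric has not improved for `patience` consecutive epochs. The sketch below implements that interpretation as an assumption; the `EarlyStopping` class is hypothetical and not the repository's code.

```python
class EarlyStopping:
    """Stop when a metric to maximise (e.g. val accuracy) stalls for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric: float) -> bool:
        """Record one epoch's metric; return True when training should stop."""
        if metric > self.best + self.min_delta:
            self.best = metric
            self.bad_epochs = 0  # improvement: reset the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
history = [0.80, 0.85, 0.84, 0.85, 0.83]  # toy validation accuracies
for epoch, acc in enumerate(history, 1):
    if stopper.step(acc):
        print(f"Stopping after epoch {epoch}; best = {stopper.best:.2f}")
        break
```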

## 1️⃣1️⃣ Benchmark and Comparison

In [None]:
from deepfake_research.evaluation.benchmark import Benchmark
from deepfake_research.evaluation.metrics import MetricsComputer

# Load the trained models and benchmark them
print("Loading trained models for benchmarking...")

# Initialize W&B for the benchmark
benchmark_run = wandb.init(
    project=WANDB_PROJECT,
    name="benchmark_comparison",
    config={"stage": "benchmark"},
    reinit=True,
)

test_loaders = {
    'test_A': loaders['test_A'],
    'test_B': loaders['test_B'],
}

benchmark = Benchmark(
    dataloaders=test_loaders,
    device=device,
    output_dir="./experiments/benchmark",
)

# Add each trained model to the benchmark
from pathlib import Path

for model_name in MODELS_TO_TRAIN:
    checkpoint_path = Path(f"./experiments/{model_name}/{model_name}_best.pth")
    if checkpoint_path.exists():
        print(f"\nBenchmarking {model_name}...")
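        model = create_model(model_name)
        checkpoint = torch.load(checkpoint_path, map_location=device)
        state_dict = checkpoint['model_state_dict']
        # Weights saved from a torch.compile'd model can carry an "_orig_mod." key prefix
        state_dict = {k.removeprefix('_orig_mod.'): v for k, v in state_dict.items()}
        model.load_state_dict(state_dict)
        model = model.to(device).eval()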
        
        benchmark.add_model(model, model_name)

# Print and log comparison
benchmark.print_comparison()

# Log to W&B
comparison = benchmark.compare()
for model_name, data in comparison['models'].items():
    wandb.log({
        f"{model_name}/mean_accuracy": data['mean_accuracy'],
        f"{model_name}/mean_auc": data['mean_auc'],
        f"{model_name}/params_millions": data['params_millions'],
    })

wandb.finish()

# Save results
benchmark.save_results("full_benchmark.json")
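The benchmark reports `mean_accuracy` and `mean_auc` per model. The source does not say how "mean" is computed; assuming it is the unweighted average over `test_A` and `test_B`, both values can be sanity-checked from raw predictions. ROC AUC equals the probability that a randomly chosen fake scores higher than a randomly chosen real image, with ties counting one half. Below is a standard-library sketch of that definition, using toy scores. It is O(n·m), which is fine for spot checks but not for full test sets, and it is not `MetricsComputer`'s implementation.

```python
from statistics import mean


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC via pairwise comparison: P(score_fake > score_real), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # assumes label 1 = fake
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


per_dataset_auc = {  # toy scores, for illustration only
    "test_A": roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    "test_B": roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),
}
print(per_dataset_auc, "mean:", mean(per_dataset_auc.values()))
```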

## 1️⃣2️⃣ Show the Results

In [None]:
# Show the report
!cat experiments/benchmark/BENCHMARK_REPORT.md

In [None]:
# Display the plots
from IPython.display import Image, display
import os

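for img_file in ['cross_dataset_heatmap.png', 'model_comparison.png']:
    # Plots may sit next to the report in experiments/benchmark/ or directly in experiments/
    for folder in ['experiments/benchmark', 'experiments']:
        path = os.path.join(folder, img_file)
        if os.path.exists(path):
            print(f"\n{img_file}:")
            display(Image(path))
            break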

In [None]:
# Detailed results
import json

with open('experiments/benchmark/full_benchmark.json', 'r') as f:
    results = json.load(f)

print("=" * 60)
print("🏆 MODEL RANKING")
print("=" * 60)

# Sort by mean accuracy
sorted_models = sorted(
    results['detailed_results'].items(),
    key=lambda x: x[1]['summary']['mean_accuracy'],
    reverse=True
)

for i, (model_name, data) in enumerate(sorted_models, 1):
    summary = data['summary']
    medal = "🥇" if i == 1 else "🥈" if i == 2 else "🥉" if i == 3 else f"{i}."
    print(f"\n{medal} {model_name}")
    print(f"   Mean Accuracy: {summary['mean_accuracy']:.4f}")
    print(f"   Mean AUC: {summary['mean_auc']:.4f}")
    print(f"   Parameters: {summary['params_millions']:.2f}M")
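Sorting by `mean_accuracy` alone leaves tied models in JSON insertion order. The following sketch breaks ties on `mean_auc` first and then prefers fewer parameters. It assumes the same `detailed_results → summary` layout that the cell above reads. `rank_models` is hypothetical and the example values are made up.

```python
def rank_models(detailed_results: dict) -> list[str]:
    """Order model names by accuracy, then AUC (both descending), then parameter count."""
    def key(item):
        summary = item[1]["summary"]
        return (-summary["mean_accuracy"], -summary["mean_auc"], summary["params_millions"])
    return [name for name, _ in sorted(detailed_results.items(), key=key)]


example = {  # made-up values, for illustration only
    "m1": {"summary": {"mean_accuracy": 0.90, "mean_auc": 0.95, "params_millions": 5.0}},
    "m2": {"summary": {"mean_accuracy": 0.90, "mean_auc": 0.97, "params_millions": 80.0}},
    "m3": {"summary": {"mean_accuracy": 0.85, "mean_auc": 0.99, "params_millions": 4.0}},
}
print(rank_models(example))
```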

## 1️⃣3️⃣ Save to Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

import shutil
from datetime import datetime

# Folder name with a timestamp
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
dest_folder = f"/content/drive/MyDrive/deepfake_results_{timestamp}"

# Copy the results
shutil.copytree("./experiments", dest_folder)
print(f"\n✅ Results saved to: {dest_folder}")

# Show what was saved
!ls -la {dest_folder}
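Copying many small files to a mounted Google Drive with `copytree` can be slow. An alternative is to pack `./experiments` into a single archive with `shutil.make_archive` and store that one file. `archive_experiments` is a hypothetical helper, and the destination folder is whatever you choose on Drive.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_experiments(src: str, dest_dir: str) -> Path:
    """Zip `src` into a timestamped archive inside `dest_dir` and return its path."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    base_name = dest / f"deepfake_results_{timestamp}"
    # make_archive appends ".zip" and returns the archive path as a string
    return Path(shutil.make_archive(str(base_name), "zip", root_dir=src))


# In Colab, after drive.mount('/content/drive'):
# archive_experiments("./experiments", "/content/drive/MyDrive")
```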

---

## 📊 Summary

### Links:
- **W&B Dashboard**: https://wandb.ai/YOUR_USERNAME/msc-deepfake-detection
- **GitHub Repo**: https://github.com/kordin33/DeepFake

### Next Steps:
1. Review the results in W&B
2. Download the best model checkpoint
3. Add more data or experiments
4. Run an ablation study