
# Deepfake vs. Real Detection Pipeline
This notebook implements the requested architecture:
- Frame extraction from videos
- (Optional) GAN inversion → latent vectors `z` (placeholder / hook)
- Per-image latent vector extraction (fallback: CNN features)
- Latent-space analysis: PCA / t-SNE / UMAP, mean/variance, distances, clustering
- Classifier training: SVM, MLP, Random Forest
- Evaluation and model export

**Expected data layout** (adapt as needed):
```
data/
  videos/
    real_001.mp4
    fake_001.mp4
  labels.csv   # columns: filename,label  (label in {real,fake})
```
> NOTE: *GAN inversion* is the delicate part (it depends on the GAN, its checkpoints, and GPU availability). The notebook provides a *hook* where you can plug in your own inversion method (e.g. e4e, pSp, ReStyle). If you do not have one, the notebook falls back to a CNN extractor (ResNet) as an approximation of the latents.
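
When the video files follow the `real_*` / `fake_*` naming shown in the layout above, `labels.csv` can be derived from the file names. The following is a minimal sketch under that assumption; the helper name `build_labels_table` is hypothetical and not part of the notebook.

```python
from pathlib import Path
import pandas as pd

def build_labels_table(filenames):
    """Derive labels from a 'real_' / 'fake_' file-name prefix (assumed convention)."""
    rows = []
    for name in filenames:
        prefix = Path(name).stem.split("_")[0].lower()
        if prefix not in {"real", "fake"}:
            continue  # skip files that do not follow the assumed convention
        rows.append({"filename": Path(name).name, "label": prefix})
    return pd.DataFrame(rows, columns=["filename", "label"])

labels_df = build_labels_table(["real_001.mp4", "fake_001.mp4", "notes.txt"])
# To write the file expected by the notebook:
# labels_df.to_csv("data/labels.csv", index=False)
```

Files that do not match the convention are skipped rather than guessed, so the resulting table should be reviewed before training.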


In [1]:

# Install the required dependencies (run once)
# umap-learn is only needed for the UMAP projection; moviepy is optional.
import sys
import subprocess
def pip_install(packages):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "--quiet"] + packages)

packages = [
    "tqdm", "opencv-python", "scikit-learn", "matplotlib", "pandas", "joblib",
    "torch", "torchvision", "moviepy", "umap-learn"
]
pip_install(packages)
print("Packages installed. Restart the kernel if necessary.")

Packages installed. Restart the kernel if necessary.


In [2]:

# Imports and configuration
import os, math, io, json, warnings
from pathlib import Path
import numpy as np
import pandas as pd
from tqdm import tqdm
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, accuracy_score
import joblib
import cv2
import torch
import torchvision.transforms as T
import torchvision.models as models

# Fix plotting defaults (one plot per cell later)
%matplotlib inline

print('Imports OK')



Imports OK



## 1) Extracting frames from the videos
Functions that extract frames, save them in one directory per video, and build a DataFrame indexing the images.
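
Frame sampling keeps one frame every `step` frames, where `step` is the video frame rate divided by the target rate, rounded, and never below 1. The sketch below reproduces that arithmetic without opening a video, to show which frame indices would be kept; `sampled_frame_indices` is a hypothetical helper introduced for illustration.

```python
def sampled_frame_indices(n_frames, video_fps, target_fps=1.0, max_frames=None):
    """Return the sampling step and the indices of the frames that would be kept."""
    step = max(1, int(round(video_fps / target_fps)))
    indices = list(range(0, n_frames, step))
    if max_frames is not None:
        indices = indices[:max_frames]
    return step, indices

# A 4-second clip at 25 fps, sampled at 1 frame per second.
step, kept = sampled_frame_indices(n_frames=100, video_fps=25.0, target_fps=1.0)
```

Because the step is rounded to an integer, the effective rate only approximates the requested one (for example, 29.97 fps sampled at 1 fps keeps every 30th frame).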


In [3]:

DATA_DIR = Path('data')
VIDEOS_DIR = DATA_DIR / 'videos'
FRAMES_DIR = DATA_DIR / 'frames'  # output
FRAMES_DIR.mkdir(parents=True, exist_ok=True)

def extract_frames_from_video(video_path, out_dir, fps=1, max_frames=None, resize=(256,256)):
    """Sample roughly `fps` frames per second and save them as JPEGs directly in `out_dir`.
    Returns the number of frames saved."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        raise RuntimeError(f'Cannot open {video_path}')
    video_fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(1, int(round(video_fps / fps)))
    frame_idx = 0
    saved = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if frame_idx % step == 0:
            if resize is not None:
                frame = cv2.resize(frame, resize)
            out_path = out_dir / f"frame_{saved:05d}.jpg"
            cv2.imwrite(str(out_path), frame)
            saved += 1
            if max_frames and saved >= max_frames:
                break
        frame_idx += 1
    cap.release()
    return saved

def extract_all_frames(labels_csv='data/labels.csv', fps=1, max_frames_per_video=None, resize=(256,256)):
    labels = pd.read_csv(labels_csv)
    rows = []
    for _, row in tqdm(labels.iterrows(), total=len(labels)):
        fname = row['filename']
        label = row['label']
        video_path = VIDEOS_DIR / fname
        if not video_path.exists():
            print('Missing', video_path); continue
        out_dir = FRAMES_DIR / Path(fname).stem
        n = extract_frames_from_video(video_path, out_dir, fps=fps, max_frames=max_frames_per_video, resize=resize)
        for i in range(n):
            rows.append({'video': fname, 'frame_path': str(out_dir / f'frame_{i:05d}.jpg'), 'label': label})
    df = pd.DataFrame(rows)
    return df

# Example usage (uncomment to run):
# df_frames = extract_all_frames('data/labels.csv', fps=1, max_frames_per_video=30, resize=(256,256))
# df_frames.to_csv('data/frames_index.csv', index=False)



## 2) Latent vector extraction (hook)
- **Option A (GAN inversion)**: plug in your GAN inversion function here. It takes an image (PIL / np.array) and returns a latent vector `z` (1D numpy array).
- **Option B (CNN feature fallback)**: if you do not have an inversion pipeline, a CNN extractor (pretrained ResNet50) is used, and the output of the `avgpool` layer (2048 values) serves as the embedding vector.
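
For Option A, encoder-based inversion methods such as e4e or pSp usually return a W+ code, that is, one 512-dimensional style vector per generator layer (18 of them for a 1024×1024 StyleGAN2 generator), whereas the rest of the pipeline expects a flat 1D vector of constant length. The sketch below shows one way to adapt such an output. Both `make_latent_hook` and `dummy_encoder` are hypothetical names, and the dummy encoder only stands in for a real inversion model.

```python
import numpy as np

def make_latent_hook(encoder, expected_dim=None):
    """Wrap `encoder` (hypothetical callable: image array -> latent array) so that
    it always returns a flat float32 vector of a fixed length."""
    def hook(image_bgr):
        z = np.asarray(encoder(image_bgr), dtype=np.float32).reshape(-1)
        if expected_dim is not None and z.size != expected_dim:
            raise ValueError(f"latent has {z.size} values, expected {expected_dim}")
        return z
    return hook

# Stand-in encoder for illustration only: returns a random W+ code of shape (18, 512).
def dummy_encoder(image_bgr):
    rng = np.random.default_rng(0)
    return rng.normal(size=(18, 512))

gan_hook = make_latent_hook(dummy_encoder, expected_dim=18 * 512)
z = gan_hook(np.zeros((256, 256, 3), dtype=np.uint8))  # flat vector of 9216 values
```

Checking the length at extraction time makes a mismatched checkpoint fail early instead of breaking `np.stack` later.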


In [4]:

from PIL import Image
import torchvision.transforms as transforms

# --- Prepare CNN fallback extractor (ResNet50) ---
device = 'cuda' if torch.cuda.is_available() else 'cpu'
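# `pretrained=True` is deprecated since torchvision 0.13; IMAGENET1K_V1 is the same checkpoint.
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)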
resnet = resnet.to(device)
resnet.eval()
# Remove final classifier to get features
feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).to(device)

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224,224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225])
])

def cnn_get_latent(image_bgr):
    """Input: image as BGR numpy array (OpenCV). Output: 1D numpy latent vector."""
    img = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    x = transform(img).unsqueeze(0).to(device)
    with torch.no_grad():
        feat = feature_extractor(x)  # shape (1,2048,1,1)
        feat = feat.reshape(feat.shape[0], -1).cpu().numpy().squeeze()
    return feat

# --- GAN inversion placeholder ---
def gan_inversion_get_latent(image_bgr):
    """PLACEHOLDER: Replace with your GAN inversion method.
    Should return a 1D numpy array latent vector z.
    Example signature kept identical to cnn_get_latent for compatibility.
    """
    raise NotImplementedError("Provide your GAN inversion function here.")

# Wrapper to choose method
def get_latent(image_path, method='cnn'):
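    img = cv2.imread(str(image_path))
    if img is None:  # cv2.imread returns None instead of raising on failure
        raise FileNotFoundError(f'Cannot read image: {image_path}')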
    if method == 'cnn':
        return cnn_get_latent(img)
    elif method == 'gan':
        return gan_inversion_get_latent(img)
    else:
        raise ValueError('Unknown method')

Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to C:\Users\EliteLaptop/.cache\torch\hub\checkpoints\resnet50-0676ba61.pth
100%|██████████| 97.8M/97.8M [00:09<00:00, 10.9MB/s]


In [5]:

# Latent extraction with caching.
# Note: `batch_size` is accepted but not used yet; frames are processed one at a time.
def build_latents(df_frames, method='cnn', cache_path='data/latents.npz', batch_size=256, overwrite=False):
    cache_path = Path(cache_path)
    if cache_path.exists() and not overwrite:
        print('Loading cached latents...')
        data = np.load(cache_path, allow_pickle=True)
        latents = data['latents']
        paths = data['paths'].tolist()
        labels = data['labels'].tolist()
        return pd.DataFrame({'frame_path': paths, 'latent_idx': list(range(len(paths))), 'label': labels}), latents
    latents = []
    paths = []
    labels = []
    for i, row in tqdm(df_frames.iterrows(), total=len(df_frames)):
        p = row['frame_path']
        try:
            z = get_latent(p, method=method)
        except Exception as e:
            print('Error extracting', p, e); continue
        latents.append(z.astype(np.float32))
        paths.append(p)
        labels.append(row['label'])
    if not latents:
        raise RuntimeError('No latent vector could be extracted; check the frame paths.')
    latents = np.stack(latents, axis=0)
    np.savez_compressed(cache_path, latents=latents, paths=np.array(paths), labels=np.array(labels))
    df_idx = pd.DataFrame({'frame_path': paths, 'latent_idx': list(range(len(paths))), 'label': labels})
    return df_idx, latents

# Example:
# df_frames = pd.read_csv('data/frames_index.csv')
# df_idx, latents = build_latents(df_frames, method='cnn', cache_path='data/latents.npz')



## 3) Latent-space analysis
PCA / t-SNE / UMAP projections, per-class statistics (mean / variance), intra- and inter-class distances, simple clustering (KMeans), and visualization.
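
The intra- and inter-class distances can be summarized as a single separation ratio. The sketch below uses synthetic vectors in place of real latents and excludes self-distances from the intra-class mean, since the zero diagonal would otherwise pull it down. The helper `mean_pairwise_distance` and the ratio are illustrative choices, not a prescribed metric.

```python
import numpy as np

def mean_pairwise_distance(A, B, same_set=False):
    """Mean Euclidean distance between rows of A and rows of B.
    When same_set is True, self-distances (the zero diagonal) are excluded."""
    diff = A[:, None, :] - B[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    if same_set:
        n = D.shape[0]
        return D[~np.eye(n, dtype=bool)].mean()
    return D.mean()

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, size=(50, 16))  # synthetic stand-in for "real" latents
fake = rng.normal(loc=3.0, size=(50, 16))  # synthetic stand-in for "fake" latents

intra_real = mean_pairwise_distance(real, real, same_set=True)
intra_fake = mean_pairwise_distance(fake, fake, same_set=True)
inter = mean_pairwise_distance(real, fake)
# Ratio above 1: classes are farther from each other than from themselves.
separation = inter / (0.5 * (intra_real + intra_fake))
```

A ratio close to 1 suggests the two classes overlap in latent space, in which case a classifier trained on these vectors is unlikely to do well.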


In [6]:

from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist

def analyze_latents(latents, labels, n_components_pca=50, tsne_perplexity=30, umap_n_neighbors=15):
    results = {}
    labels = np.asarray(labels)  # a plain list would make `labels == ul` a scalar comparison
    scaler = StandardScaler()
    Zs = scaler.fit_transform(latents)
    # PCA
    pca = PCA(n_components=min(n_components_pca, *Zs.shape))  # cannot exceed n_samples or n_features
    Z_pca = pca.fit_transform(Zs)
    results['pca'] = {'model': pca, 'embedding': Z_pca}
    # t-SNE (on PCA-reduced for speed)
    tsne = TSNE(n_components=2, verbose=1, perplexity=tsne_perplexity, init='random', random_state=42)
    Z_tsne = tsne.fit_transform(Z_pca[:, :min(50, Z_pca.shape[1])])
    results['tsne'] = {'model': tsne, 'embedding': Z_tsne}
    # UMAP
    um = umap.UMAP(n_components=2, n_neighbors=umap_n_neighbors, random_state=42)
    Z_umap = um.fit_transform(Z_pca[:, :min(50, Z_pca.shape[1])])
    results['umap'] = {'model': um, 'embedding': Z_umap}
    # Stats
    unique_labels = np.unique(labels)
    stats = {}
    for ul in unique_labels:
        idx = np.where(labels == ul)[0]
        stats[ul] = {'mean': latents[idx].mean(axis=0), 'var': latents[idx].var(axis=0), 'count': len(idx)}
    results['stats'] = stats
    # Mean pairwise distances within and between classes
    # (the a_to_a entries include the zero self-distances)
    dists = {}
    for a in unique_labels:
        for b in unique_labels:
            idxa = np.where(labels == a)[0]
            idxb = np.where(labels == b)[0]
            if len(idxa)==0 or len(idxb)==0:
                d = np.nan
            else:
                d = cdist(latents[idxa], latents[idxb]).mean()
            dists[f'{a}_to_{b}'] = d
    results['distances'] = dists
    # Clustering
    kmeans = KMeans(n_clusters=min(8, len(unique_labels)*2), n_init=10, random_state=42)
    clusters = kmeans.fit_predict(Z_pca[:, :min(50, Z_pca.shape[1])])
    results['kmeans'] = {'model': kmeans, 'clusters': clusters}
    return results

def plot_embedding(embedding, labels, title='Embedding'):
    plt.figure(figsize=(8,6))
    labels = np.asarray(labels)  # a plain list would make `labels == lab` a scalar comparison
    labels_unique = np.unique(labels)
    for lab in labels_unique:
        idx = np.where(labels == lab)[0]
        plt.scatter(embedding[idx,0], embedding[idx,1], label=str(lab), s=8)
    plt.legend()
    plt.title(title)
    plt.xlabel('dim1'); plt.ylabel('dim2')
    plt.show()

# Example usage after obtaining latents:
# results = analyze_latents(latents, df_idx['label'].values)
# plot_embedding(results['umap']['embedding'], df_idx['label'].values, title='UMAP')


## 4) Training the classifiers (SVM, MLP, RandomForest)
Functions to train the models, validate them with cross-validation, and save them.


In [7]:

from sklearn.base import clone

def train_and_evaluate(latents, labels, method_name='SVM', cv=5, random_state=42):
    X = np.asarray(latents)
    y = np.array([1 if lab == 'fake' else 0 for lab in labels])
    if method_name == 'SVM':
        clf = SVC(kernel='rbf', probability=True, random_state=random_state)
    elif method_name == 'RF':
        clf = RandomForestClassifier(n_estimators=200, random_state=random_state)
    elif method_name == 'MLP':
        clf = MLPClassifier(hidden_layer_sizes=(512,256), max_iter=200, random_state=random_state)
    else:
        raise ValueError(f'Unknown method: {method_name}')
    # Simple frame-level cross-validation. The scaler is fitted on the training
    # fold only, so test-fold statistics do not leak into the preprocessing.
    # Caveat: frames of one video can still land in both train and test folds.
    skf = StratifiedKFold(n_splits=cv, shuffle=True, random_state=random_state)
    y_true_all = []; y_pred_all = []; y_prob_all = []
    for train_idx, test_idx in skf.split(X, y):
        fold_scaler = StandardScaler().fit(X[train_idx])
        fold_clf = clone(clf).fit(fold_scaler.transform(X[train_idx]), y[train_idx])
        X_test = fold_scaler.transform(X[test_idx])
        y_true_all.extend(y[test_idx].tolist())
        y_pred_all.extend(fold_clf.predict(X_test).tolist())
        y_prob_all.extend(fold_clf.predict_proba(X_test)[:, 1].tolist())
    # Refit the scaler and the classifier on all data, so the returned model
    # is not merely the one trained on the last fold.
    scaler = StandardScaler().fit(X)
    clf.fit(scaler.transform(X), y)
    return {
        'clf': clf,
        'scaler': scaler,
        'report': classification_report(y_true_all, y_pred_all, output_dict=True),
        'accuracy': accuracy_score(y_true_all, y_pred_all),
        'roc_auc': roc_auc_score(y_true_all, y_prob_all),
    }

# Train all three and compare
def train_compare_all(latents, labels):
    outcomes = {}
    for name in ['SVM', 'RF', 'MLP']:
        print('Training', name)
        m = train_and_evaluate(latents, labels, method_name=name)
        outcomes[name] = m
        print(name, 'acc=', m['accuracy'], 'roc_auc=', m['roc_auc'])
    return outcomes

# Save & load utility
def save_model(obj, path='models/model.joblib'):
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(obj, path)
def load_model(path='models/model.joblib'):
    return joblib.load(path)



### Per-video aggregation
The latents are computed per frame. To classify a video, aggregate the predictions of its frames (mean, median, or majority vote).
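
The three aggregation rules can disagree on the same frames. The toy example below, with made-up frame probabilities, shows how each rule turns per-frame scores into one score per video; the `aggregate` helper is hypothetical.

```python
import pandas as pd

# Made-up per-frame probabilities of the "fake" class for two videos.
frame_preds = pd.DataFrame({
    "video": ["a", "a", "a", "b", "b", "b"],
    "prob_fake": [0.9, 0.8, 0.1, 0.2, 0.3, 0.6],
})

def aggregate(frame_preds, how="mean"):
    """Return one score per video from per-frame probabilities."""
    grouped = frame_preds.groupby("video")["prob_fake"]
    if how == "mean":
        return grouped.mean()
    if how == "median":
        return grouped.median()
    if how == "vote":
        # Fraction of frames whose probability exceeds 0.5.
        return grouped.apply(lambda s: (s > 0.5).mean())
    raise ValueError(f"unknown aggregation: {how}")

scores = {how: aggregate(frame_preds, how) for how in ("mean", "median", "vote")}
```

The median and the vote are less sensitive to a few badly scored frames than the mean, which matters when some frames contain no face or are heavily blurred.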


In [8]:

def predict_video_from_frames(df_idx, latents, trained_model, agg='mean'):
    # df_idx: DataFrame with frame_path, latent_idx, label, video (optional)
    # trained_model: dict with 'clf' and 'scaler'
    clf = trained_model['clf']
    scaler = trained_model['scaler']
    df_idx = df_idx.copy()
    X = latents
    Xs = scaler.transform(X)
    probs = clf.predict_proba(Xs)[:,1] if hasattr(clf, 'predict_proba') else clf.decision_function(Xs)
    df_idx['prob_fake'] = probs
    # Need a video column: try to parse from frame_path
    def video_from_path(p): 
        parts = Path(p).parent.name
        return parts
    df_idx['video'] = df_idx['frame_path'].apply(video_from_path)
    agg_rows = []
    for vid, g in df_idx.groupby('video'):
        if agg == 'mean':
            score = g['prob_fake'].mean()
        elif agg == 'median':
            score = g['prob_fake'].median()
        elif agg == 'vote':
            score = (g['prob_fake']>0.5).mean()
        else:
            raise ValueError(f"Unknown aggregation: {agg!r} (expected 'mean', 'median' or 'vote')")
        label = g['label'].iloc[0]
        pred = 'fake' if score>0.5 else 'real'
        agg_rows.append({'video': vid, 'score': score, 'label': label, 'pred': pred})
    return pd.DataFrame(agg_rows)

# Example usage:
# outcomes = train_compare_all(latents, df_idx['label'].values)
# df_video_preds = predict_video_from_frames(df_idx, latents, outcomes['RF'])



---
## Final notes and recommendations
- **GAN inversion**: replace `gan_inversion_get_latent` with your own pipeline (e4e, pSp, ReStyle, ...). The outputs must be 1D vectors of the same size for every image so that they are comparable.
- **GPU**: CNN extraction uses the GPU when available, and a GAN inversion model should do the same. For large datasets, process frames in batches and cache the latents.
- **Quality control**: check the latents for consistency (norms, outliers) and inspect the projections (UMAP/t-SNE) for anomalies.
- **Improvements**: test-time augmentation (flip/crop), ensemble architectures, and time-series models (LSTM) over sequences of latents to capture temporal information.
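
The norm check mentioned under quality control can be sketched as follows, assuming the latents are stacked in a 2D array with one row per frame. The rule used here (distance from the median norm, measured in median absolute deviations) and the threshold of 5 are illustrative choices, and `flag_norm_outliers` is a hypothetical helper.

```python
import numpy as np

def flag_norm_outliers(latents, n_mads=5.0):
    """Flag rows whose L2 norm lies more than `n_mads` median absolute
    deviations away from the median norm."""
    norms = np.linalg.norm(latents, axis=1)
    med = np.median(norms)
    mad = np.median(np.abs(norms - med))
    if mad == 0:
        # All norms (nearly) identical: nothing can be flagged reliably.
        return norms, np.zeros(len(norms), dtype=bool)
    return norms, np.abs(norms - med) > n_mads * mad

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 64))  # synthetic stand-in for extracted latents
Z[3] *= 10.0                    # inject one obviously abnormal vector
norms, is_outlier = flag_norm_outliers(Z)
```

Flagged frames are worth viewing before they are dropped, since they often correspond to corrupted images or frames without a face.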
