# 🚀 Google Colab Setup

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ogautier1980/sandbox-ml/blob/main/cours/07_deep_learning_cnn/07_demo_resnet_transfer.ipynb)

**If you are running this notebook on Google Colab**, run the following cell to install the dependencies.

In [None]:
# Install dependencies (Google Colab only)
import sys
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print('📦 Installing packages...')

    # Core ML packages
    !pip install -q numpy pandas matplotlib seaborn scikit-learn

    # Detect the chapter and install chapter-specific dependencies
    notebook_name = '07_demo_resnet_transfer.ipynb'  # Will be replaced automatically

    # Ch 06-08: Deep Learning
    if any(x in notebook_name for x in ['06_', '07_', '08_']):
        !pip install -q torch torchvision torchaudio

    # Ch 08: NLP
    if '08_' in notebook_name:
        !pip install -q transformers datasets tokenizers
        if 'rag' in notebook_name:
            !pip install -q sentence-transformers faiss-cpu rank-bm25

    # Ch 09: Reinforcement Learning
    if '09_' in notebook_name:
        !pip install -q gymnasium[classic-control]

    # Ch 04: Boosting
    if '04_' in notebook_name and 'boosting' in notebook_name:
        !pip install -q xgboost lightgbm catboost

    # Ch 05: Advanced clustering
    if '05_' in notebook_name:
        !pip install -q umap-learn

    # Ch 11: Time series
    if '11_' in notebook_name:
        !pip install -q statsmodels prophet

    # Ch 12: Advanced vision
    if '12_' in notebook_name:
        !pip install -q ultralytics timm segmentation-models-pytorch

    # Ch 13: Recommendation
    if '13_' in notebook_name:
        !pip install -q scikit-surprise implicit

    # Ch 14: MLOps
    if '14_' in notebook_name:
        !pip install -q mlflow fastapi pydantic

    print('✅ Installation complete!')
else:
    print('ℹ️  Local environment detected, packages are assumed to be installed already.')

# Chapter 07 - Demo: ResNet and Transfer Learning

**Objective**: Use a ResNet model pre-trained on ImageNet to classify CIFAR-10.

**Contents**:
1. ResNet architecture (skip connections)
2. Transfer learning: feature extraction vs fine-tuning
3. Comparison with training from scratch
4. Grad-CAM for visualization
5. Transfer learning best practices

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torchvision.models as models
from torch.utils.data import DataLoader
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
from tqdm import tqdm
import cv2

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")

torch.manual_seed(42)
np.random.seed(42)

## 1. Loading CIFAR-10 with adapted transforms

In [None]:
# Transforms for ImageNet-pretrained models
# ImageNet normalization: mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
transform_train = transforms.Compose([
    transforms.Resize(224),  # Upsample 32x32 -> 224x224, the resolution ResNet was pre-trained at
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(224, padding=4),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

transform_test = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load datasets
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, 
                                              download=True, transform=transform_train)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, 
                                             download=True, transform=transform_test)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

print(f"Train samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print("Input size: 224x224 (ImageNet pre-training resolution)")
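`transforms.Normalize` computes `(x - mean) / std` independently for each channel. To display a normalized tensor as an image (for instance in the Grad-CAM section), that operation can be inverted. A minimal sketch with the ImageNet statistics used above; the helper names `normalize` and `denormalize` are hypothetical, introduced only for illustration:

```python
import torch

# ImageNet statistics, shaped (C, 1, 1) so they broadcast over (C, H, W) images
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalize(x):
    """Same per-channel operation as transforms.Normalize: (x - mean) / std."""
    return (x - IMAGENET_MEAN) / IMAGENET_STD

def denormalize(x):
    """Invert the normalization and clamp to [0, 1] so the tensor can be displayed."""
    return (x * IMAGENET_STD + IMAGENET_MEAN).clamp(0, 1)

img = torch.rand(3, 224, 224)          # fake image with values in [0, 1)
restored = denormalize(normalize(img))
print(torch.allclose(img, restored, atol=1e-6))  # True: round trip recovers the image
```

For display with Matplotlib, the result still needs `.permute(1, 2, 0).numpy()` to move channels last.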

## 2. ResNet-18 architecture (overview)

In [None]:
# Load a ResNet-18 pre-trained on ImageNet
# (torchvision >= 0.13 uses `weights=`; the older `pretrained=True` flag is deprecated)
resnet_pretrained = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

print("ResNet-18 architecture:")
print(resnet_pretrained)

print(f"\nNumber of parameters: {sum(p.numel() for p in resnet_pretrained.parameters()):,}")

# ResNet-18 outputs 1000 classes (ImageNet); CIFAR-10 needs 10
print(f"\nOriginal final layer: {resnet_pretrained.fc}")
print("We will replace fc so that it has 10 outputs instead of 1000.")
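The key idea of ResNet is the skip connection: each block learns a residual F(x) and outputs ReLU(F(x) + x), so the signal and its gradient can also flow through the identity path. The sketch below is a simplified re-implementation written for illustration, modeled on torchvision's `BasicBlock` (two 3x3 convolutions with BatchNorm, plus a 1x1 projection on the shortcut when the shape changes); it is not the torchvision source itself, and the class name here is only illustrative.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Simplified ResNet basic block: out = ReLU(F(x) + shortcut(x))."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Residual branch F(x): conv3x3 -> BN -> ReLU -> conv3x3 -> BN
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # Shortcut: identity when shapes match, 1x1 projection otherwise
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Skip connection: add the (possibly projected) input before the final ReLU
        return self.relu(out + self.shortcut(x))

block = BasicBlock(64, 128, stride=2)
y = block(torch.randn(2, 64, 56, 56))
print(y.shape)  # torch.Size([2, 128, 28, 28])
```

With `stride=2` and a change in channel count, the shortcut becomes a 1x1 projection, which is what happens at the entrance of ResNet-18's `layer2`, `layer3` and `layer4`.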

## 3. Strategy 1: Feature Extraction (freeze the pre-trained layers)

In [None]:
# Feature extraction: freeze every layer except the last one
model_feature_extract = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze all parameters
for param in model_feature_extract.parameters():
    param.requires_grad = False

# Replace the final fc layer (1000 -> 10); a newly created layer is trainable by default
num_features = model_feature_extract.fc.in_features
model_feature_extract.fc = nn.Linear(num_features, 10)

model_feature_extract = model_feature_extract.to(device)

# Count trainable parameters
trainable_params = sum(p.numel() for p in model_feature_extract.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model_feature_extract.parameters())

print("\n--- Feature Extraction ---")
print(f"Trainable parameters: {trainable_params:,} / {total_params:,}")
print(f"Trainable percentage: {trainable_params/total_params*100:.2f}%")
print("Only the final fc layer will be trained.")
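The trainable count printed above comes entirely from the new head: ResNet-18's `fc.in_features` is 512, so a 512 -> 10 linear layer has 512 x 10 weights plus 10 biases. A quick standalone check of that arithmetic:

```python
import torch.nn as nn

# Same shape as the new ResNet-18 head: 512 input features -> 10 CIFAR-10 classes
head = nn.Linear(512, 10)
n_params = sum(p.numel() for p in head.parameters())
print(n_params)  # 5130 = 512 * 10 weights + 10 biases
```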

In [None]:
# Feature extraction training (the helpers below are reused by every strategy)
def train_epoch(model, loader, criterion, optimizer, device):
    model.train()
    total_loss = 0
    correct = 0
    total = 0
    
    for images, labels in tqdm(loader, desc="Training", leave=False):
        images, labels = images.to(device), labels.to(device)
        
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
    
    return total_loss / len(loader), correct / total

def validate_epoch(model, loader, criterion, device):
    model.eval()
    total_loss = 0
    correct = 0
    total = 0
    
    with torch.no_grad():
        for images, labels in tqdm(loader, desc="Validation", leave=False):
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            total_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    
    return total_loss / len(loader), correct / total

criterion = nn.CrossEntropyLoss()
optimizer_fe = optim.Adam(model_feature_extract.fc.parameters(), lr=0.001)  # Optimize only fc

epochs = 10
history_fe = {'train_acc': [], 'val_acc': []}

print("\nTraining: Feature Extraction (10 epochs)...")
for epoch in range(epochs):
    train_loss, train_acc = train_epoch(model_feature_extract, train_loader, criterion, optimizer_fe, device)
    val_loss, val_acc = validate_epoch(model_feature_extract, test_loader, criterion, device)
    
    history_fe['train_acc'].append(train_acc)
    history_fe['val_acc'].append(val_acc)
    
    print(f"Epoch {epoch+1}/{epochs} | Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} | "
          f"Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f}")

print(f"\nFinal accuracy (Feature Extraction): {history_fe['val_acc'][-1]:.4f}")
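One subtlety of the feature-extraction run: `train_epoch` calls `model.train()`, and in training mode every BatchNorm layer keeps updating its running mean and variance from the CIFAR-10 batches, even though its parameters have `requires_grad=False`. A common variant keeps the frozen BatchNorm layers in eval mode. The sketch below uses a hypothetical helper `train_frozen_backbone` and a tiny stand-in model (not ResNet-18) so that it runs on its own; it assumes every BatchNorm layer belongs to the frozen backbone.

```python
import torch
import torch.nn as nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def train_frozen_backbone(model):
    """Training mode for the model, but eval mode for every BatchNorm layer,
    so the frozen backbone's running statistics are not overwritten."""
    model.train()
    for module in model.modules():
        if isinstance(module, BN_TYPES):
            module.eval()
    return model

# Tiny stand-in: frozen "backbone" (conv + BN) followed by a trainable head
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
for p in model[:-1].parameters():
    p.requires_grad = False

bn = model[1]
before = bn.running_mean.clone()
train_frozen_backbone(model)
model(torch.randn(4, 3, 16, 16))
print(torch.equal(before, bn.running_mean))  # True: running stats untouched
```

In this notebook, one option would be to call such a helper instead of `model.train()` for the feature-extraction run only; during fine-tuning the BatchNorm layers are trainable, so the plain `model.train()` remains appropriate there.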

## 4. Strategy 2: Fine-Tuning (train all layers)

In [None]:
# Fine-tuning: train ALL layers (with a smaller lr for the pre-trained ones)
model_finetune = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Replace fc
num_features = model_finetune.fc.in_features
model_finetune.fc = nn.Linear(num_features, 10)
model_finetune = model_finetune.to(device)

# All parameters are trainable (default)
trainable_params = sum(p.numel() for p in model_finetune.parameters() if p.requires_grad)

print("\n--- Fine-Tuning ---")
print(f"Trainable parameters: {trainable_params:,}")
print("All layers will be trained (with a small lr).")

# Optimizer with different learning rates
# - Pre-trained layers: low lr (0.0001)
# - New fc layer: normal lr (0.001)
# Split by parameter name so the two groups are disjoint regardless of module order
backbone_params = [p for name, p in model_finetune.named_parameters() if not name.startswith('fc.')]
optimizer_ft = optim.Adam([
    {'params': model_finetune.fc.parameters(), 'lr': 0.001},  # New layer
    {'params': backbone_params, 'lr': 0.0001}  # Pre-trained layers
])

epochs = 10
history_ft = {'train_acc': [], 'val_acc': []}

print("\nTraining: Fine-Tuning (10 epochs)...")
for epoch in range(epochs):
    train_loss, train_acc = train_epoch(model_finetune, train_loader, criterion, optimizer_ft, device)
    val_loss, val_acc = validate_epoch(model_finetune, test_loader, criterion, device)
    
    history_ft['train_acc'].append(train_acc)
    history_ft['val_acc'].append(val_acc)
    
    print(f"Epoch {epoch+1}/{epochs} | Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} | "
          f"Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f}")

print(f"\nFinal accuracy (Fine-Tuning): {history_ft['val_acc'][-1]:.4f}")
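The two parameter groups used for `optimizer_ft` can also be built by a small reusable helper that splits trainable parameters by name prefix (here `fc.`, the attribute name of ResNet's head). PyTorch raises an error if the same parameter appears in two groups, so a name-based split that puts each parameter in exactly one list is a safe default. The helper `split_head_backbone` and the tiny stand-in model are hypothetical, for illustration only:

```python
import torch
import torch.nn as nn
import torch.optim as optim

def split_head_backbone(model, head_prefix="fc."):
    """Split trainable parameters into (head, backbone) lists by parameter name."""
    head, backbone = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue  # frozen parameters need no optimizer group
        (head if name.startswith(head_prefix) else backbone).append(p)
    return head, backbone

class Tiny(nn.Module):
    """Stand-in with a `features` body and an `fc` head, like ResNet's naming."""
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(16, 8)
        self.fc = nn.Linear(8, 10)

    def forward(self, x):
        return self.fc(torch.relu(self.features(x)))

model = Tiny()
head, backbone = split_head_backbone(model)
optimizer = optim.Adam([
    {"params": head, "lr": 1e-3},      # new head: normal lr
    {"params": backbone, "lr": 1e-4},  # pre-trained body: smaller lr
])
print([g["lr"] for g in optimizer.param_groups])  # [0.001, 0.0001]
```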

## 5. Strategy 3: From Scratch (for comparison)

In [None]:
# ResNet-18 from scratch (random weights)
model_scratch = models.resnet18(weights=None)  # No pre-training
num_features = model_scratch.fc.in_features
model_scratch.fc = nn.Linear(num_features, 10)
model_scratch = model_scratch.to(device)

optimizer_scratch = optim.Adam(model_scratch.parameters(), lr=0.001)

epochs = 10
history_scratch = {'train_acc': [], 'val_acc': []}

print("\nTraining: From Scratch (10 epochs)...")
for epoch in range(epochs):
    train_loss, train_acc = train_epoch(model_scratch, train_loader, criterion, optimizer_scratch, device)
    val_loss, val_acc = validate_epoch(model_scratch, test_loader, criterion, device)
    
    history_scratch['train_acc'].append(train_acc)
    history_scratch['val_acc'].append(val_acc)
    
    print(f"Epoch {epoch+1}/{epochs} | Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} | "
          f"Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f}")

print(f"\nFinal accuracy (From Scratch): {history_scratch['val_acc'][-1]:.4f}")

## 6. Comparing the 3 strategies

In [None]:
# Visual comparison
plt.figure(figsize=(14, 6))

plt.subplot(1, 2, 1)
plt.plot(history_fe['train_acc'], label='Feature Extraction', marker='o')
plt.plot(history_ft['train_acc'], label='Fine-Tuning', marker='s')
plt.plot(history_scratch['train_acc'], label='From Scratch', marker='^')
plt.xlabel('Epoch')
plt.ylabel('Train Accuracy')
plt.title('Train Accuracy')
plt.legend()
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(history_fe['val_acc'], label='Feature Extraction', marker='o')
plt.plot(history_ft['val_acc'], label='Fine-Tuning', marker='s')
plt.plot(history_scratch['val_acc'], label='From Scratch', marker='^')
plt.xlabel('Epoch')
plt.ylabel('Val Accuracy')
plt.title('Validation Accuracy')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

# Summary
print("\n" + "="*60)
print("COMPARISON OF THE 3 STRATEGIES (after 10 epochs)")
print("="*60)
print(f"Feature Extraction: {history_fe['val_acc'][-1]:.4f}")
print(f"Fine-Tuning:        {history_ft['val_acc'][-1]:.4f}")
print(f"From Scratch:       {history_scratch['val_acc'][-1]:.4f}")
print("="*60)

# Analysis
print("\nANALYSIS:")
print("- Feature Extraction: fast, few parameters to train")
print("- Fine-Tuning: typically the best performance, adapts the ImageNet features")
print("- From Scratch: slower, needs more data and more epochs")
print("\nRECOMMENDATION: Fine-Tuning for medium-sized datasets (10k-100k samples)")

## 7. Grad-CAM: visualizing the important regions

In [None]:
class GradCAM:
    """Gradient-weighted Class Activation Mapping."""
    
    def __init__(self, model, target_layer):
        self.model = model
        self.target_layer = target_layer
        self.gradients = None
        self.activations = None
        
        # Hooks capturing the target layer's activations (forward) and gradients (backward)
        # register_full_backward_hook replaces the deprecated register_backward_hook
        target_layer.register_forward_hook(self.save_activation)
        target_layer.register_full_backward_hook(self.save_gradient)
    
    def save_activation(self, module, input, output):
        self.activations = output.detach()
    
    def save_gradient(self, module, grad_input, grad_output):
        self.gradients = grad_output[0].detach()
    
    def __call__(self, x, class_idx=None):
        # Forward pass
        output = self.model(x)
        
        if class_idx is None:
            class_idx = output.argmax(dim=1)[0].item()  # predicted class of the first image
        
        # Backward pass from the target class score
        self.model.zero_grad()
        output[0, class_idx].backward()
        
        # Grad-CAM computation
        gradients = self.gradients[0]  # (C, H, W)
        activations = self.activations[0]  # (C, H, W)
        
        # Weights: spatial average of the gradients
        weights = gradients.mean(dim=(1, 2))  # (C,)
        
        # Weighted combination of the activation maps
        cam = (weights[:, None, None] * activations).sum(dim=0)  # (H, W)
        
        # ReLU + normalization to [0, 1] (epsilon avoids division by zero)
        cam = torch.relu(cam)
        cam = cam / (cam.max() + 1e-8)
        
        return cam.cpu().numpy()

# Apply Grad-CAM to the fine-tuned model (last conv layer of the last residual block)
grad_cam = GradCAM(model_finetune, model_finetune.layer4[-1].conv2)

# Pick one test image
dataiter = iter(test_loader)
images, labels = next(dataiter)
sample_image = images[0:1].to(device)
sample_label = labels[0].item()

# Generate the Grad-CAM map
model_finetune.eval()
cam = grad_cam(sample_image)

# Visualization
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Original image
img = sample_image[0].cpu().permute(1, 2, 0).numpy()
img = (img - img.min()) / (img.max() - img.min())  # Min-max rescaling for display (approximate inverse of Normalize)
axes[0].imshow(img)
axes[0].set_title(f"Original image\n{classes[sample_label]}")
axes[0].axis('off')

# Grad-CAM heatmap
axes[1].imshow(cam, cmap='jet')
axes[1].set_title('Grad-CAM Heatmap')
axes[1].axis('off')

# Overlay (upsample the 7x7 map to the 224x224 input resolution)
cam_resized = cv2.resize(cam, (224, 224))
heatmap = plt.cm.jet(cam_resized)[:, :, :3]  # RGB
superimposed = 0.6 * img + 0.4 * heatmap
superimposed = superimposed / superimposed.max()
axes[2].imshow(superimposed)
axes[2].set_title('Overlay')
axes[2].axis('off')

plt.tight_layout()
plt.show()

print("Grad-CAM highlights the image regions that contributed to the prediction.")
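The core of `GradCAM.__call__` is two lines of arithmetic: each channel's weight is the spatial mean of the gradient of the class score with respect to that channel's activation map, and the heatmap is the ReLU of the weighted sum of the activation maps. The framework-free sketch below (hypothetical function `grad_cam_map`, NumPy only, toy 2-channel 2x2 inputs) reproduces just that step so it can be checked by hand: the channel weights are [1, -1], the weighted sum is [[1, -2], [-2, 1]], and the ReLU keeps only the diagonal.

```python
import numpy as np

def grad_cam_map(activations, gradients, eps=1e-8):
    """Grad-CAM from one layer's activations A and gradients dy_c/dA, both shaped (C, H, W)."""
    # One weight per channel: global average of that channel's gradient
    alphas = gradients.mean(axis=(1, 2))                                   # (C,)
    # Weighted sum of activation maps; ReLU keeps only positive evidence
    cam = np.maximum((alphas[:, None, None] * activations).sum(axis=0), 0)  # (H, W)
    # Rescale to [0, 1] for display; eps avoids division by zero
    return cam / (cam.max() + eps)

A = np.array([[[1.0, 0.0], [0.0, 1.0]],     # channel 0 activations
              [[0.0, 2.0], [2.0, 0.0]]])    # channel 1 activations
G = np.array([[[1.0, 1.0], [1.0, 1.0]],     # channel 0 gradients (mean 1)
              [[-1.0, -1.0], [-1.0, -1.0]]])  # channel 1 gradients (mean -1)
print(grad_cam_map(A, G))
```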

## 8. Detailed evaluation (fine-tuned model)

In [None]:
# Predictions on the test set
model_finetune.eval()
all_preds = []
all_labels = []

with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        outputs = model_finetune(images)
        _, predicted = torch.max(outputs, 1)
        
        all_preds.extend(predicted.cpu().numpy())
        all_labels.extend(labels.numpy())

# Classification report
print("\nClassification report (fine-tuned ResNet-18):")
print(classification_report(all_labels, all_preds, target_names=classes, digits=4))

# Confusion matrix (rows = true class, columns = predicted class)
cm = confusion_matrix(all_labels, all_preds)
plt.figure(figsize=(12, 10))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=classes, yticklabels=classes)
plt.title('Confusion Matrix - Fine-Tuned ResNet-18')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()
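Per-class accuracy can be read directly off the confusion matrix: with scikit-learn's convention (rows = true labels, columns = predictions), it is the diagonal divided by the row sums, which equals the per-class recall in the classification report. A minimal sketch with a hypothetical helper and a made-up 3-class matrix:

```python
import numpy as np

def per_class_accuracy(cm):
    """Per-class accuracy (= recall) from a confusion matrix whose rows are true labels."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

toy_cm = [[8, 2, 0],
          [1, 7, 2],
          [0, 3, 7]]
print(per_class_accuracy(toy_cm))  # [0.8 0.7 0.7]
```

On the notebook's matrix, `zip(classes, per_class_accuracy(cm))` pairs each CIFAR-10 class with its accuracy. A class with no test samples would cause a division by zero; CIFAR-10's test set has 1,000 images per class, so this does not arise here.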

## Conclusion

**Transfer learning: 3 strategies**:

| Strategy | When to use it | Advantages | Drawbacks |
|----------|----------------|------------|-----------|
| **Feature Extraction** | Small dataset, similar to ImageNet | Fast, needs little data | Limited performance |
| **Fine-Tuning** | Medium-sized dataset, less similar to ImageNet | Better performance | Slower to train |
| **From Scratch** | Very large dataset, very different from ImageNet | Full flexibility | Requires a lot of data |

**CIFAR-10 results (approximate)**:
- Feature Extraction: ~80-85% accuracy (10 epochs)
- Fine-Tuning: ~85-90% accuracy (10 epochs)
- From Scratch: ~70-75% accuracy (10 epochs; would improve with more epochs)

**Grad-CAM**:
- Visualizes the regions that matter most for the prediction
- Helps understand what the model "sees"
- Useful for debugging and explaining predictions

**Transfer learning best practices**:
1. **Always** try transfer learning before training from scratch
2. **Adapt** the final layer to the number of target classes
3. **Use** different learning rates (smaller for the pre-trained layers)
4. **Data augmentation** is crucial for generalization
5. **Early stopping** to avoid overfitting
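Early stopping can be sketched as a small counter on the validation metric: training stops once the metric has not improved for `patience` consecutive epochs. The class below is a hypothetical helper written for illustration (it is not part of PyTorch); `patience` and `min_delta` are assumed parameter names.

```python
class EarlyStopping:
    """Signal a stop when validation accuracy has not improved for `patience` epochs.

    Hypothetical helper: `min_delta` is the minimum increase counted as an improvement.
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_epochs = 0

    def step(self, val_acc):
        """Record one epoch's validation accuracy; return True if training should stop."""
        if self.best is None or val_acc > self.best + self.min_delta:
            self.best = val_acc      # new best: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1     # no improvement this epoch
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
for epoch, acc in enumerate([0.70, 0.75, 0.74, 0.76, 0.75, 0.75]):
    if stopper.step(acc):
        print(f"Stopping at epoch {epoch + 1}, best val acc = {stopper.best}")
        break
```

In the training loops of this notebook, one would call `stopper.step(val_acc)` right after `validate_epoch` and `break` when it returns `True`; saving a copy of the best weights (for example with `copy.deepcopy(model.state_dict())`) whenever `best` improves is a common complement.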

**Next steps**:
- Try other architectures: **EfficientNet**, **Vision Transformer**
- **Ensemble methods**: combine several models
- **Advanced augmentation**: CutMix, MixUp, AutoAugment