<a href="https://colab.research.google.com/gist/maclandrol/aa48a4bdba09ddae1f5158d715bc5671/10_TorchXRayVision_Dataset_Personnalise.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Instructor:** Emmanuel Noutahi, PhD

# Tutorial 6: Custom Dataset Integration

---

## 📚 Medical Context

### For Medical Students
**Integrating custom data** is essential for:
- **Medical research**: analyze your own clinical data
- **Specialized studies**: focus on specific pathologies
- **Local validation**: adapt the AI to your patient population
- **Student projects**: conduct original research

### For Practitioners
- **Surgeons**: validation on specific surgical cases
- **General practitioners**: adaptation to local specifics
- **Educators**: creation of custom teaching cases

---

## 🎯 Learning Objectives

By the end of this tutorial, you will be able to:
1. **Load** your own radiology images easily
2. **Organize** a custom dataset for analysis
3. **Apply** TorchXRayVision models to your data
4. **Analyze** several images in a batch
5. **Generate** comparative reports for your dataset
6. **Export** the results for clinical or research use

---

## 💡 Easy Colab Features

This tutorial is designed for **Google Colab** and provides:
- **Drag-and-drop** upload of multiple images
- **Automatic organization** of the data
- **Simplified batch processing**
- **Interactive visualizations**
- **Automatic export** of the results

---

## 📋 Prerequisites

This tutorial follows **Tutorials 1-4**. You should be comfortable with:
- Basic use of TorchXRayVision
- Pathology classification and detection
- Interpreting the results

---

## 🔧 Setup

In [None]:
# Install required libraries
!pip install torchxrayvision
!pip install torch torchvision
!pip install matplotlib seaborn
!pip install numpy pandas
!pip install scikit-image opencv-python
!pip install tqdm  # progress bars
# zipfile ships with the Python standard library, so no extra package is needed

print("✅ Installation completed")

In [None]:
# Import libraries
import torch
import torch.nn as nn
import torchxrayvision as xrv
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from PIL import Image
import cv2
import os
import glob
import zipfile
import io
import json
from google.colab import files
from tqdm import tqdm
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Display configuration
plt.rcParams['figure.figsize'] = (15, 10)
plt.rcParams['font.size'] = 12
sns.set_style("whitegrid")

print("📚 Libraries imported successfully")
print(f"🔥 PyTorch version: {torch.__version__}")
print(f"🏥 TorchXRayVision version: {xrv.__version__}")

# GPU check
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"💻 Device used: {device}")

# Create working directories for the data and the results
os.makedirs('custom_dataset', exist_ok=True)
os.makedirs('results', exist_ok=True)
print("📁 Working directories created")

## 🤖 Loading Models

In [None]:
# Load TorchXRayVision models
print("🔄 Loading models for custom dataset analysis...")

models = {}
model_info = {}

# 1. Main classification model
try:
    models['densenet'] = xrv.models.DenseNet(weights="densenet121-res224-all")
    models['densenet'].to(device)
    models['densenet'].eval()
    model_info['densenet'] = {
        'name': 'DenseNet121-All',
        'pathologies': models['densenet'].pathologies,
        'description': 'General model covering all pathologies'
    }
    print("✅ DenseNet121-All loaded")
except Exception as e:
    print(f"❌ DenseNet error: {e}")

# 2. CheXpert model
try:
    models['chexpert'] = xrv.models.DenseNet(weights="densenet121-res224-chexpert")
    models['chexpert'].to(device)
    models['chexpert'].eval()
    model_info['chexpert'] = {
        'name': 'CheXpert',
        'pathologies': models['chexpert'].pathologies,
        'description': 'Specialized for CheXpert data'
    }
    print("✅ CheXpert loaded")
except Exception as e:
    print(f"❌ CheXpert error: {e}")

# 3. NIH model
try:
    models['nih'] = xrv.models.DenseNet(weights="densenet121-res224-nih")
    models['nih'].to(device)
    models['nih'].eval()
    model_info['nih'] = {
        'name': 'NIH',
        'pathologies': models['nih'].pathologies,
        'description': 'Trained on the NIH dataset'
    }
    print("✅ NIH loaded")
except Exception as e:
    print(f"❌ NIH error: {e}")

if not models:
    raise RuntimeError("No model could be loaded")

# Select the main model (the first one that loaded successfully)
main_model_key = list(models.keys())[0]
main_model = models[main_model_key]

print(f"\n🎯 Available models: {list(models.keys())}")
print(f"🏥 Main model: {model_info[main_model_key]['name']}")
print(f"📊 Detectable pathologies: {len(main_model.pathologies)}")

## 📤 Upload Your Dataset

### 🎯 Easy Upload Methods

Choose your preferred method:

### Method 1: Individual Images
Ideal for 1-10 images
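
Before decoding anything, it can help to separate uploads that look like images from everything else. The sketch below is a minimal illustration, assuming the upload arrives as a dict of filename to bytes (as `files.upload()` returns in Colab); `split_by_support` and `fake_upload` are hypothetical names used only for this example.

```python
from pathlib import Path

# Extensions accepted by this tutorial's upload cell
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}

def split_by_support(uploaded):
    """Partition an upload dict (filename -> bytes) into accepted and rejected names."""
    accepted, rejected = [], []
    for name in uploaded:
        suffix = Path(name).suffix.lower()  # case-insensitive comparison
        (accepted if suffix in SUPPORTED_EXTENSIONS else rejected).append(name)
    return accepted, rejected

# Hypothetical stand-in for the dict returned by files.upload()
fake_upload = {"chest_01.PNG": b"", "notes.txt": b"", "scan.tiff": b"", "report.pdf": b""}
accepted, rejected = split_by_support(fake_upload)
print("accepted:", accepted)
print("rejected:", rejected)
```

Comparing the lower-cased suffix against a set keeps the check independent of letter case and makes the list of formats easy to extend.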

In [None]:
# Upload individual images
print("📤 METHOD 1: INDIVIDUAL IMAGE UPLOAD")
print("-" * 60)
print("💡 Tip: This method is ideal for 1-10 images")
print("")
print("🔸 Supported formats: .jpg, .jpeg, .png, .tif, .tiff")
print("🔸 You can select multiple images at once")
print("")
print("👆 Click 'Choose Files' below:")

# Upload widget
uploaded_files = files.upload()

individual_images = []
individual_filenames = []

if uploaded_files:
    print(f"\n📁 {len(uploaded_files)} file(s) uploaded")

    for filename, file_content in uploaded_files.items():
        try:
            # Check the file extension
            if not filename.lower().endswith(('.jpg', '.jpeg', '.png', '.tiff', '.tif')):
                print(f"⚠️ {filename}: unsupported format")
                continue

            # Load the image
            image = Image.open(io.BytesIO(file_content))

            # Convert to grayscale if needed
            if image.mode != 'L':
                image = image.convert('L')

            # Convert to a NumPy array
            img_array = np.array(image)

            individual_images.append(img_array)
            individual_filenames.append(filename)

            print(f"✅ {filename}: {img_array.shape} - loaded successfully")

        except Exception as e:
            print(f"❌ Error with {filename}: {e}")

    print(f"\n🎉 {len(individual_images)} image(s) ready for analysis!")

    # Quick preview of the uploaded images
    if len(individual_images) > 0:
        n_preview = min(4, len(individual_images))
        fig, axes = plt.subplots(1, n_preview, figsize=(4*n_preview, 4))
        
        if n_preview == 1:
            axes = [axes]
        
        for i in range(n_preview):
            axes[i].imshow(individual_images[i], cmap='gray')
            axes[i].set_title(f'{individual_filenames[i][:20]}...\n{individual_images[i].shape}')
            axes[i].axis('off')
        
        plt.suptitle('Preview of Uploaded Images')
        plt.tight_layout()
        plt.show()

else:
    print("ℹ️ No images uploaded")

### Method 2: ZIP Archive
Ideal for larger sets (10+ images)
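
An archive can also be inspected in memory before anything is written to disk. The following is a minimal sketch using only the standard library; `list_image_members` is a hypothetical helper, and the small archive built here merely stands in for an uploaded file.

```python
import io
import zipfile
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}

def list_image_members(zip_bytes):
    """Return archive members whose suffix looks like an image, in any letter case."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower() in IMAGE_SUFFIXES
        ]

# Build a small archive in memory (hypothetical file names, placeholder contents)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("patients/a/front.PNG", b"placeholder")
    archive.writestr("patients/b/side.Jpg", b"placeholder")
    archive.writestr("README.txt", b"not an image")

members = list_image_members(buffer.getvalue())
print(members)
```

Lower-casing the suffix also catches mixed-case names such as `side.Jpg`, which a pair of all-lower-case and all-upper-case glob patterns would miss on a case-sensitive file system.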

In [None]:
# Upload a ZIP archive
print("📦 METHOD 2: ZIP ARCHIVE UPLOAD")
print("-" * 60)
print("💡 Tip: This method is ideal for 10+ images")
print("")
print("🔸 Create a .zip file containing your X-ray images")
print("🔸 Images can be in subdirectories")
print("")
print("👆 Click 'Choose Files' to select your .zip file:")

# ZIP upload widget
uploaded_zip = files.upload()

zip_images = []
zip_filenames = []

if uploaded_zip:
    zip_filename = list(uploaded_zip.keys())[0]
    
    if zip_filename.lower().endswith('.zip'):
        print(f"📦 Extracting {zip_filename}...")

        try:
            # Write the upload to a temporary ZIP file
            with open('temp_dataset.zip', 'wb') as f:
                f.write(uploaded_zip[zip_filename])

            # Extract the ZIP
            with zipfile.ZipFile('temp_dataset.zip', 'r') as zip_ref:
                zip_ref.extractall('custom_dataset')

            # Find every image among the extracted files
            image_extensions = ['*.jpg', '*.jpeg', '*.png', '*.tiff', '*.tif']
            image_files = []

            for ext in image_extensions:
                # Search recursively, for lower-case and upper-case extensions
                image_files.extend(glob.glob(os.path.join('custom_dataset', '**', ext), recursive=True))
                image_files.extend(glob.glob(os.path.join('custom_dataset', '**', ext.upper()), recursive=True))

            print(f"🔍 {len(image_files)} image(s) found in the archive")
            
            # Load all images
            for img_path in tqdm(image_files, desc="Loading images"):
                try:
                    image = Image.open(img_path)

                    # Convert to grayscale
                    if image.mode != 'L':
                        image = image.convert('L')

                    img_array = np.array(image)

                    zip_images.append(img_array)
                    zip_filenames.append(os.path.basename(img_path))

                except Exception as e:
                    print(f"⚠️ Error with {img_path}: {e}")

            print(f"\n🎉 {len(zip_images)} image(s) loaded from the archive!")
            
            # Dataset statistics
            if len(zip_images) > 0:
                sizes = [img.shape for img in zip_images]
                print(f"📊 Image sizes found: {set(sizes)}")

                # Preview a random sample
                n_sample = min(6, len(zip_images))
                sample_indices = np.random.choice(len(zip_images), n_sample, replace=False)

                fig, axes = plt.subplots(2, 3, figsize=(15, 10))
                axes = axes.flatten()

                # Hide every panel's axes so unused panels stay blank
                # when the archive holds fewer than 6 images
                for ax in axes:
                    ax.axis('off')

                for i, idx in enumerate(sample_indices):
                    axes[i].imshow(zip_images[idx], cmap='gray')
                    axes[i].set_title(f'{zip_filenames[idx][:20]}...\n{zip_images[idx].shape}')

                plt.suptitle(f'Dataset Sample ({n_sample}/{len(zip_images)} images)')
                plt.tight_layout()
                plt.show()

            # Remove the temporary file
            os.remove('temp_dataset.zip')

        except Exception as e:
            print(f"❌ Error during extraction: {e}")

    else:
        print("❌ The uploaded file is not a ZIP archive")

else:
    print("ℹ️ No archive uploaded")

### Method 3: Sample Dataset
For testing without your own data
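
Synthetic cases of this kind are built by brightening or darkening regions of a noisy background. As a minimal sketch of one such region, the code below draws a round "nodule" with a boolean disc mask; `circular_mask` is a hypothetical helper, and the image is random noise, not a radiograph.

```python
import numpy as np

def circular_mask(shape, center, radius):
    """Boolean mask that is True inside a disc; center is given as (row, col)."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    return (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2

rng = np.random.default_rng(42)
image = rng.random((224, 224)) * 0.3 + 0.4   # noisy background in [0.4, 0.7)
mask = circular_mask(image.shape, center=(100, 70), radius=8)
image[mask] *= 2.0                           # brighten the disc
image = np.clip(image, 0.0, 1.0)             # keep intensities in [0, 1]

print(mask.sum(), image.min(), image.max())
```

`np.ogrid` yields a column of row indices and a row of column indices, so the distance test broadcasts to the full image without building two dense coordinate grids.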

In [None]:
# Create a sample dataset for testing
print("🧪 METHOD 3: SAMPLE DATASET FOR TESTING")
print("-" * 60)
print("💡 Use this option if you don't have your own images yet")
print("")

def create_sample_dataset(n_images=8):
    """
    Create a sample dataset with different simulated pathology types
    """
    sample_images = []
    sample_filenames = []
    sample_descriptions = []

    # Simulated case types
    case_types = [
        ("normal", "Normal radiograph"),
        ("cardiomegaly", "Simulated cardiomegaly"),
        ("pneumonia", "Simulated pneumonia"),
        ("nodule", "Simulated pulmonary nodule"),
        ("pneumothorax", "Simulated pneumothorax"),
        ("infiltration", "Simulated pulmonary infiltration"),
        ("atelectasis", "Simulated atelectasis"),
        ("edema", "Simulated pulmonary edema")
    ]

    np.random.seed(42)  # for reproducibility
    
    for i in range(n_images):
        case_type, description = case_types[i % len(case_types)]
        
        # Base image (normal chest)
        img = np.random.rand(224, 224) * 0.3 + 0.4

        # Basic anatomical structures
        # Lungs
        img[50:180, 30:100] *= 0.7   # Left lung
        img[50:180, 124:194] *= 0.7  # Right lung

        # Heart
        img[120:180, 90:134] *= 1.2

        # Add case-specific pathologies
        if case_type == "cardiomegaly":
            # Enlarged heart
            img[110:190, 80:144] *= 1.4

        elif case_type == "pneumonia":
            # Consolidation in the right lung
            img[80:140, 140:180] *= 1.6

        elif case_type == "nodule":
            # Round nodule in the left lung
            center_y, center_x = 100, 70
            y, x = np.ogrid[:224, :224]
            mask = (x - center_x)**2 + (y - center_y)**2 <= 8**2
            img[mask] *= 2.0

        elif case_type == "pneumothorax":
            # Hyperlucent (air) region in the right lung
            img[60:120, 150:190] *= 0.3

        elif case_type == "infiltration":
            # Diffuse infiltrates
            img[70:150, 40:90] *= 1.3    # Left lung
            img[90:160, 130:170] *= 1.2  # Right lung

        elif case_type == "atelectasis":
            # Partial collapse of the left lung
            img[80:130, 35:85] *= 1.7

        elif case_type == "edema":
            # Diffuse bilateral edema
            img[60:170, 35:95] *= 1.4    # Left
            img[60:170, 129:189] *= 1.4  # Right
            img[100:170, 85:139] *= 1.3  # Cardiac base

        # Clip intensities to [0, 1]
        img = np.clip(img, 0, 1)

        # Apply a light blur for a more realistic look
        img = cv2.GaussianBlur(img, (3, 3), 0)
        
        sample_images.append(img)
        sample_filenames.append(f"sample_{i+1:02d}_{case_type}.png")
        sample_descriptions.append(f"Case {i+1}: {description}")
    
    return sample_images, sample_filenames, sample_descriptions

# Create the sample dataset
choice = input("Do you want to create a sample dataset? (y/n): ").strip().lower()

sample_images = []
sample_filenames = []
sample_descriptions = []

if choice in ['y', 'yes', 'oui', 'o']:
    print("\n🔄 Creating the sample dataset...")
    sample_images, sample_filenames, sample_descriptions = create_sample_dataset(8)

    print(f"✅ {len(sample_images)} sample images created!")

    # Display the sample dataset
    fig, axes = plt.subplots(2, 4, figsize=(16, 8))
    axes = axes.flatten()
    
    for i, (img, filename, desc) in enumerate(zip(sample_images, sample_filenames, sample_descriptions)):
        axes[i].imshow(img, cmap='gray')
        axes[i].set_title(f'{filename}\n{desc}', fontsize=10)
        axes[i].axis('off')
    
    plt.suptitle('Sample Dataset Created', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()
    
else:
    print("ℹ️ Sample dataset not created")

## 📋 Dataset Consolidation

Let's gather all the loaded images into a single dataset:
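
One way to think about consolidation is as flattening several named sources into a single list of records, each carrying its own provenance. The sketch below is illustrative only: `consolidate` is a hypothetical helper, and tiny nested lists stand in for image arrays.

```python
def consolidate(sources):
    """Flatten {source_name: [(filename, image), ...]} into a list of records."""
    records = []
    for source_name, items in sources.items():
        for filename, image in items:
            records.append({
                "id": len(records),      # running index across all sources
                "filename": filename,
                "source": source_name,
                "image": image,
            })
    return records

# Placeholder nested lists stand in for image arrays
sources = {
    "Individual": [("a.png", [[0, 1], [1, 0]])],
    "ZIP Archive": [("b.png", [[1, 1], [0, 0]]), ("c.png", [[0, 0], [1, 1]])],
    "Sample": [],
}
records = consolidate(sources)
for record in records:
    print(record["id"], record["source"], record["filename"])
```

Keeping the filename, source, and image together in one record is a design choice: separate parallel lists can drift out of alignment if a later step skips an item.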

In [None]:
# Consolidate all images into a unified dataset
print("📋 DATASET CONSOLIDATION")
print("=" * 50)

# Combine all image sources
all_images = []
all_filenames = []
all_sources = []

# Add individually uploaded images
if 'individual_images' in locals() and individual_images:
    all_images.extend(individual_images)
    all_filenames.extend(individual_filenames)
    all_sources.extend(['Individual'] * len(individual_images))
    print(f"➕ {len(individual_images)} individual image(s) added")

# Add images from the ZIP archive
if 'zip_images' in locals() and zip_images:
    all_images.extend(zip_images)
    all_filenames.extend(zip_filenames)
    all_sources.extend(['ZIP Archive'] * len(zip_images))
    print(f"➕ {len(zip_images)} image(s) from the ZIP added")

# Add sample images
if 'sample_images' in locals() and sample_images:
    all_images.extend(sample_images)
    all_filenames.extend(sample_filenames)
    all_sources.extend(['Sample'] * len(sample_images))
    print(f"➕ {len(sample_images)} sample image(s) added")

# Check that we have at least one image
if not all_images:
    print("❌ No images available for analysis!")
    print("💡 Please load images using one of the methods above")
else:
    print(f"\n🎉 CONSOLIDATED DATASET: {len(all_images)} image(s) in total")

    # Build a DataFrame to organize the metadata
    dataset_info = pd.DataFrame({
        'ID': range(len(all_images)),
        'Filename': all_filenames,
        'Source': all_sources,
        'Shape': [img.shape for img in all_images],
        'Size_MB': [img.nbytes / 1024 / 1024 for img in all_images],
        'Min_Pixel': [img.min() for img in all_images],
        'Max_Pixel': [img.max() for img in all_images]
    })
    
    print("\n📊 DATASET INFORMATION:")
    print(dataset_info.to_string(index=False))

    # General statistics
    print(f"\n📈 STATISTICS:")
    print(f"   • Total number of images: {len(all_images)}")
    print(f"   • Unique image sizes: {len(set([img.shape for img in all_images]))}")
    print(f"   • Total dataset size: {sum(dataset_info['Size_MB']):.2f} MB")
    print(f"   • Sources: {', '.join(set(all_sources))}")

    # Visual summary
    if len(all_images) > 0:
        # Counts per source
        source_counts = pd.Series(all_sources).value_counts()

        fig, axes = plt.subplots(1, 2, figsize=(15, 5))

        # Pie chart of the sources
        axes[0].pie(source_counts.values, labels=source_counts.index, autopct='%1.1f%%')
        axes[0].set_title('Source Distribution')

        # Histogram of image sizes
        image_sizes = [img.shape[0] * img.shape[1] for img in all_images]
        axes[1].hist(image_sizes, bins=min(10, len(set(image_sizes))), alpha=0.7, edgecolor='black')
        axes[1].set_xlabel('Number of Pixels')
        axes[1].set_ylabel('Number of Images')
        axes[1].set_title('Size Distribution')
        axes[1].grid(True, alpha=0.3)

        plt.tight_layout()
        plt.show()

    print("\n✅ Dataset consolidated and ready for analysis!")

## 🔧 Batch Preprocessing

Let's prepare all the images for analysis with TorchXRayVision:
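
TorchXRayVision's pretrained models are documented as expecting pixel values in roughly [-1024, 1024], and the library provides `xrv.datasets.normalize` for that purpose. The sketch below reimplements the same linear mapping in plain NumPy so it runs without the library; `scale_to_xrv_range` is a hypothetical name, and the assumption is that `maxval` is the largest value the input format can hold (255 for 8-bit images).

```python
import numpy as np

def scale_to_xrv_range(image, maxval):
    """Map intensities from [0, maxval] to [-1024, 1024] as float32."""
    image = np.asarray(image, dtype=np.float32)
    return (2.0 * (image / maxval) - 1.0) * 1024.0

eight_bit = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = scale_to_xrv_range(eight_bit, maxval=255)

# Add batch and channel axes so the array has the [1, 1, H, W] layout
batch = scaled[np.newaxis, np.newaxis, ...]

print(scaled)
print(batch.shape)
```

The mapping is fixed rather than computed per image, so two images with different contrast keep their relative intensities after scaling.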

In [None]:
def preprocess_dataset_batch(images, target_size=(224, 224)):
    """
    Batch preprocessing of all images in the dataset
    """
    print("🔧 BATCH PREPROCESSING OF DATASET")
    print("=" * 50)

    processed_images = []
    processed_tensors = []
    preprocessing_info = []

    print(f"📏 Resizing to: {target_size}")
    print(f"🎯 Number of images to process: {len(images)}")
    print("")

    for i, img in enumerate(tqdm(images, desc="Preprocessing images")):
        try:
            # Work on a copy of the original image
            img_work = img.copy()

            # Original properties
            original_shape = img_work.shape
            original_dtype = img_work.dtype
            original_range = (img_work.min(), img_work.max())

            # Resize if needed
            if img_work.shape != target_size:
                img_work = cv2.resize(img_work, target_size)
            
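            # Scale pixel values to [0, 1]; assumes 8-bit input when the maximum exceeds 1
            if img_work.max() > 1:
                img_work = img_work.astype(np.float32) / 255.0
            else:
                img_work = img_work.astype(np.float32)

            # Per-image statistics on the [0, 1] image, reported in the summary below
            mean_val = np.mean(img_work)
            std_val = np.std(img_work)

            # Rescale to [-1024, 1024], the input range the pretrained TorchXRayVision
            # models expect (the same mapping as xrv.datasets.normalize(img, 1));
            # a per-image z-score does not match that range
            img_normalized = (2.0 * img_work - 1.0) * 1024.0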
            
            # Convert to a PyTorch tensor
            img_tensor = torch.FloatTensor(img_normalized)
            img_tensor = img_tensor.unsqueeze(0).unsqueeze(0)  # [1, 1, H, W]
            img_tensor = img_tensor.to(device)

            # Store the results
            processed_images.append(img_work)
            processed_tensors.append(img_tensor)

            # Preprocessing details
            prep_info = {
                'original_shape': original_shape,
                'original_dtype': str(original_dtype),
                'original_range': original_range,
                'processed_shape': img_work.shape,
                'processed_range': (img_work.min(), img_work.max()),
                'normalized_range': (img_normalized.min(), img_normalized.max()),
                'mean': mean_val,
                'std': std_val,
                'tensor_shape': tuple(img_tensor.shape)
            }
            preprocessing_info.append(prep_info)
            
        except Exception as e:
            print(f"❌ Error with image {i}: {e}")
            continue
    
    print(f"\n✅ {len(processed_images)}/{len(images)} images preprocessed successfully")

    # Preprocessing statistics
    if preprocessing_info:
        prep_df = pd.DataFrame(preprocessing_info)

        print(f"\n📊 PREPROCESSING STATISTICS:")
        print(f"   • Final shape: {target_size}")
        print(f"   • Mean normalized range: [{prep_df['normalized_range'].apply(lambda x: x[0]).mean():.3f}, {prep_df['normalized_range'].apply(lambda x: x[1]).mean():.3f}]")
        print(f"   • Mean of the per-image means: {prep_df['mean'].mean():.3f}")
        print(f"   • Mean of the per-image standard deviations: {prep_df['std'].mean():.3f}")

        # Plots of the normalization statistics
        fig, axes = plt.subplots(1, 3, figsize=(18, 5))

        # Distribution of the means
        axes[0].hist(prep_df['mean'], bins=15, alpha=0.7, color='skyblue', edgecolor='black')
        axes[0].set_xlabel('Pixel Mean')
        axes[0].set_ylabel('Number of Images')
        axes[0].set_title('Distribution of Means\nper Image')
        axes[0].grid(True, alpha=0.3)

        # Distribution of the standard deviations
        axes[1].hist(prep_df['std'], bins=15, alpha=0.7, color='lightcoral', edgecolor='black')
        axes[1].set_xlabel('Pixel Standard Deviation')
        axes[1].set_ylabel('Number of Images')
        axes[1].set_title('Distribution of Standard Deviations\nper Image')
        axes[1].grid(True, alpha=0.3)

        # Mean vs. standard deviation
        axes[2].scatter(prep_df['mean'], prep_df['std'], alpha=0.7, s=60)
        axes[2].set_xlabel('Pixel Mean')
        axes[2].set_ylabel('Pixel Standard Deviation')
        axes[2].set_title('Mean vs.\nStandard Deviation')
        axes[2].grid(True, alpha=0.3)

        plt.suptitle('Dataset Normalization Statistics', fontsize=14, fontweight='bold')
        plt.tight_layout()
        plt.show()
    
    return processed_images, processed_tensors, preprocessing_info

# Run the preprocessing if we have images
if 'all_images' in locals() and all_images:
    processed_images, processed_tensors, prep_info = preprocess_dataset_batch(all_images)

    print("\n🎯 Dataset ready for TorchXRayVision analysis!")
else:
    print("❌ No images to preprocess")

## 🔍 Batch Analysis

Now let's analyze all the images with the TorchXRayVision models:
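
For each image, the analysis boils down to comparing every pathology score with a threshold and summarizing the outcome. Here is a minimal sketch of that per-image step in plain Python; `summarize_predictions` is a hypothetical helper and the scores are made up for illustration, not model output.

```python
def summarize_predictions(probabilities, threshold=0.3):
    """Turn {pathology: score} into a small per-image summary."""
    detected = sorted(
        (name for name, p in probabilities.items() if p > threshold),
        key=lambda name: probabilities[name],
        reverse=True,  # highest score first
    )
    return {
        "detected": detected,
        "total_detections": len(detected),
        "max_probability": max(probabilities.values()),
        "avg_probability": sum(probabilities.values()) / len(probabilities),
    }

# Made-up scores for illustration only
scores = {"Cardiomegaly": 0.72, "Effusion": 0.55, "Pneumonia": 0.12, "Nodule": 0.31}
summary = summarize_predictions(scores, threshold=0.3)
print(summary)
```

The threshold trades sensitivity against false alarms, so a value suited to one dataset may not carry over to another.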

In [None]:
def analyze_dataset_batch(processed_tensors, filenames, models_dict, threshold=0.3):
    """
    Batch analysis of the entire dataset with all available models
    """
    print("🔍 BATCH DATASET ANALYSIS")
    print("=" * 60)

    print(f"📊 Images to analyze: {len(processed_tensors)}")
    print(f"🤖 Available models: {list(models_dict.keys())}")
    print(f"🎯 Detection threshold: {threshold}")
    print("")

    # Store all results
    all_results = {}
    analysis_summary = []

    # Analyze with each model
    for model_name, model in models_dict.items():
        print(f"🔄 Analyzing with model {model_name}...")

        model_results = []
        pathologies = model.pathologies

        # Analyze each image
        for i, (tensor, filename) in enumerate(tqdm(zip(processed_tensors, filenames),
                                                   desc=f"Analysis {model_name}",
                                                   total=len(processed_tensors))):
            try:
                with torch.no_grad():
                    outputs = model(tensor)
                    # The pretrained weights already return calibrated scores in [0, 1],
                    # so no additional sigmoid is applied here
                    probabilities = outputs.cpu().numpy().flatten()
                
                # Build the result record for this image
                image_result = {
                    'image_id': i,
                    'filename': filename,
                    'model': model_name
                }

                # Add the score for each pathology
                for pathology, prob in zip(pathologies, probabilities):
                    image_result[pathology] = prob
                    image_result[f'{pathology}_detected'] = prob > threshold

                # Count the total detections
                image_result['total_detections'] = sum(prob > threshold for prob in probabilities)
                image_result['max_probability'] = max(probabilities)
                image_result['avg_probability'] = np.mean(probabilities)
                
                model_results.append(image_result)
                
            except Exception as e:
                print(f"❌ Error with {filename} on {model_name}: {e}")
                continue
        
        all_results[model_name] = model_results
        
        # Summary for this model
        if model_results:
            total_detections = sum(result['total_detections'] for result in model_results)
            avg_detections_per_image = total_detections / len(model_results)
            
            summary = {
                'model': model_name,
                'images_analyzed': len(model_results),
                'total_detections': total_detections,
                'avg_detections_per_image': avg_detections_per_image,
                'max_detections_single_image': max(result['total_detections'] for result in model_results),
                'images_with_detections': sum(1 for result in model_results if result['total_detections'] > 0)
            }
            analysis_summary.append(summary)
    
    print(f"\n✅ Analysis completed for {len(processed_tensors)} images")

    # Display the summary
    if analysis_summary:
        summary_df = pd.DataFrame(analysis_summary)

        print("\n📊 ANALYSIS SUMMARY:")
        print(summary_df.to_string(index=False))

        # Summary plots
        fig, axes = plt.subplots(2, 2, figsize=(16, 10))

        # 1. Total detections per model
        axes[0, 0].bar(summary_df['model'], summary_df['total_detections'],
                      color=['skyblue', 'lightcoral', 'lightgreen'][:len(summary_df)])
        axes[0, 0].set_title('Total Detections per Model')
        axes[0, 0].set_ylabel('Number of Detections')
        axes[0, 0].tick_params(axis='x', rotation=45)
        axes[0, 0].grid(True, alpha=0.3)

        # 2. Average detections per image
        axes[0, 1].bar(summary_df['model'], summary_df['avg_detections_per_image'],
                      color=['orange', 'purple', 'brown'][:len(summary_df)])
        axes[0, 1].set_title('Average Detections per Image')
        axes[0, 1].set_ylabel('Detections per Image')
        axes[0, 1].tick_params(axis='x', rotation=45)
        axes[0, 1].grid(True, alpha=0.3)

        # 3. Images with vs. without detections
        models_names = summary_df['model'].tolist()
        with_detections = summary_df['images_with_detections'].tolist()
        without_detections = [summary_df['images_analyzed'].iloc[i] - with_detections[i]
                            for i in range(len(with_detections))]

        x = np.arange(len(models_names))
        width = 0.35

        axes[1, 0].bar(x, with_detections, width, label='With Detections', color='tomato')
        axes[1, 0].bar(x, without_detections, width, bottom=with_detections,
                      label='Without Detections', color='lightblue')
        axes[1, 0].set_title('Images with/without Detections')
        axes[1, 0].set_ylabel('Number of Images')
        axes[1, 0].set_xticks(x)
        axes[1, 0].set_xticklabels(models_names, rotation=45)
        axes[1, 0].legend()
        
        # 4. Maximum detections on a single image
        # ('bronze' is not a named Matplotlib color, so a hex value is used instead)
        axes[1, 1].bar(summary_df['model'], summary_df['max_detections_single_image'],
                      color=['gold', 'silver', '#cd7f32'][:len(summary_df)])
        axes[1, 1].set_title('Maximum Detections\n(Single Image)')
        axes[1, 1].set_ylabel('Nombre de D√©tections')
        axes[1, 1].tick_params(axis='x', rotation=45)
        axes[1, 1].grid(True, alpha=0.3)
        
        plt.suptitle('DATASET ANALYSIS SUMMARY', fontsize=16, fontweight='bold')
        plt.tight_layout()
        plt.show()
    
    return all_results, analysis_summary

# Run the batch analysis if prepared tensors are available
if 'processed_tensors' in locals() and processed_tensors:
    batch_results, batch_summary = analyze_dataset_batch(processed_tensors, all_filenames, models)
    
    print("\n🎉 Batch analysis completed successfully!")
else:
    print("❌ No prepared tensors available for analysis")
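
The summary columns plotted above (`images_analyzed`, `images_with_detections`, `avg_detections_per_image`, `max_detections_single_image`) can be derived from the per-image records. The sketch below is a minimal illustration, assuming each record carries a `total_detections` count as the later cells do. `summarize_model_results` and the toy records are hypothetical and are not part of the notebook.

```python
def summarize_model_results(model_name, results):
    """Collapse per-image result dicts into one summary row (hypothetical helper)."""
    counts = [r["total_detections"] for r in results]
    return {
        "model": model_name,
        "images_analyzed": len(counts),
        "images_with_detections": sum(c > 0 for c in counts),
        "avg_detections_per_image": sum(counts) / len(counts) if counts else 0.0,
        "max_detections_single_image": max(counts, default=0),
    }

# Toy records standing in for real model output (assumed layout)
toy_results = [
    {"filename": "a.png", "total_detections": 0},
    {"filename": "b.png", "total_detections": 3},
    {"filename": "c.png", "total_detections": 1},
]

row = summarize_model_results("toy_model", toy_results)
# Images without detections are the complement used for the stacked bar chart
row_without = row["images_analyzed"] - row["images_with_detections"]
print(row, row_without)
```

A list of such rows, one per model, can be passed to `pd.DataFrame` to obtain a table shaped like `summary_df`.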

## 📊 Detailed Results Analysis
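
The analysis cell that follows reads one dictionary per image and recovers the pathology names by excluding metadata keys and `*_detected` flags. The sketch below shows a record layout consistent with the keys that cell accesses. It is an assumption inferred from those accesses, not the notebook's actual construction code: `build_record` is hypothetical, and whether detection uses `>=` or `>` against the threshold is not stated in this section.

```python
# Keys that describe the image or aggregate scores rather than a single pathology
META_KEYS = {"image_id", "filename", "model", "total_detections",
             "max_probability", "avg_probability"}

def build_record(filename, probabilities, threshold=0.3):
    """Build one per-image record in the assumed layout (hypothetical helper)."""
    record = {"filename": filename}
    for name, prob in probabilities.items():
        record[name] = prob
        # Assumption: a pathology counts as detected at or above the threshold
        record[f"{name}_detected"] = prob >= threshold
    values = list(probabilities.values())
    record["total_detections"] = sum(record[f"{n}_detected"] for n in probabilities)
    record["max_probability"] = max(values)
    record["avg_probability"] = sum(values) / len(values)
    return record

def pathology_names(record):
    """Recover pathology names the same way the analysis cell filters keys."""
    return [k for k in record if k not in META_KEYS and not k.endswith("_detected")]

rec = build_record("chest_001.png", {"Pneumonia": 0.62, "Effusion": 0.28, "Mass": 0.05})
print(pathology_names(rec))
print(rec["total_detections"])
```

Lowering `threshold` raises `total_detections` for the same probabilities, so detection counts should always be reported together with the threshold used.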

In [None]:
def create_detailed_analysis(batch_results, filenames, threshold=0.3):
    """
    Detailed analysis with visualizations for the entire dataset
    """
    print("📊 DETAILED RESULTS ANALYSIS")
    print("=" * 60)
    
    if not batch_results:
        print("❌ No results available for analysis")
        # Return two empty lists so callers can still unpack the result
        return [], []
    
    # Use the first model for the main analysis
    main_model = list(batch_results.keys())[0]
    main_results = batch_results[main_model]
    
    print(f"🎯 Main analysis based on model: {main_model}")
    print(f"📈 Number of images analyzed: {len(main_results)}")
    
    # 1. Most frequently detected pathologies
    pathologies = [col for col in main_results[0].keys() 
                  if col not in ['image_id', 'filename', 'model', 'total_detections', 
                               'max_probability', 'avg_probability'] 
                  and not col.endswith('_detected')]
    
    # Count detections per pathology
    pathology_counts = {}
    pathology_avg_probs = {}
    
    for pathology in pathologies:
        detections = sum(1 for result in main_results if result.get(f'{pathology}_detected', False))
        avg_prob = np.mean([result.get(pathology, 0) for result in main_results])
        
        pathology_counts[pathology] = detections
        pathology_avg_probs[pathology] = avg_prob
    
    # Sort by frequency
    sorted_pathologies = sorted(pathology_counts.items(), key=lambda x: x[1], reverse=True)
    
    print(f"\n🏥 TOP 10 DETECTED PATHOLOGIES:")
    for i, (pathology, count) in enumerate(sorted_pathologies[:10]):
        percentage = (count / len(main_results)) * 100
        avg_prob = pathology_avg_probs[pathology]
        print(f"   {i+1:2d}. {pathology:25s}: {count:3d}/{len(main_results)} ({percentage:5.1f}%) - Avg prob: {avg_prob:.3f}")
    
    # 2. Images with the most detections
    print(f"\n🚨 IMAGES WITH THE MOST DETECTED PATHOLOGIES:")
    sorted_images = sorted(main_results, key=lambda x: x['total_detections'], reverse=True)
    
    for i, result in enumerate(sorted_images[:5]):
        detected_paths = [path for path in pathologies 
                         if result.get(f'{path}_detected', False)]
        print(f"   {i+1}. {result['filename']:30s}: {result['total_detections']} detections")
        if detected_paths:
            print(f"      → {', '.join(detected_paths[:3])}{'...' if len(detected_paths) > 3 else ''}")
    
    # 3. Comprehensive visualizations
    create_comprehensive_visualizations(batch_results, pathologies, filenames, threshold)
    
    # 4. Cross-model comparison, when more than one model was run
    if len(batch_results) > 1:
        create_model_comparison_analysis(batch_results, pathologies)
    
    # 5. Pathology correlation matrix
    create_pathology_correlation_matrix(main_results, pathologies)
    
    return sorted_pathologies, sorted_images

def create_comprehensive_visualizations(batch_results, pathologies, filenames, threshold):
    """
    Create comprehensive visualizations for dataset analysis
    """
    main_model = list(batch_results.keys())[0]
    results = batch_results[main_model]
    
    fig = plt.figure(figsize=(20, 16))
    gs = fig.add_gridspec(4, 3, height_ratios=[1, 1, 1, 1], hspace=0.3, wspace=0.3)
    
    # 1. Heatmap of detections per image and pathology
    ax1 = fig.add_subplot(gs[0, :])
    
    # Build the detection matrix
    top_pathologies = sorted(pathologies, 
                           key=lambda p: sum(1 for r in results if r.get(f'{p}_detected', False)), 
                           reverse=True)[:15]  # Top 15 for readability
    
    detection_matrix = []
    image_labels = []
    
    for result in results:
        row = [1 if result.get(f'{path}_detected', False) else 0 for path in top_pathologies]
        detection_matrix.append(row)
        image_labels.append(result['filename'][:15] + '...' if len(result['filename']) > 15 else result['filename'])
    
    detection_matrix = np.array(detection_matrix).T  # Transpose so pathologies are rows
    
    sns.heatmap(detection_matrix, 
                xticklabels=image_labels,
                yticklabels=[p[:15] for p in top_pathologies],
                cmap='Reds', cbar_kws={'label': 'Detection (0=No, 1=Yes)'},
                ax=ax1)
    ax1.set_title(f'Detection Matrix - {len(results)} Images vs Top 15 Pathologies', 
                 fontsize=14, fontweight='bold')
    ax1.set_xlabel('Dataset Images')
    ax1.set_ylabel('Pathologies')
    
    # Rotate the labels for readability
    plt.setp(ax1.get_xticklabels(), rotation=45, ha='right')
    
    # 2. Distribution of detections per image
    ax2 = fig.add_subplot(gs[1, 0])
    detections_per_image = [result['total_detections'] for result in results]
    ax2.hist(detections_per_image, bins=range(max(detections_per_image)+2), 
            alpha=0.7, color='skyblue', edgecolor='black')
    ax2.set_xlabel('Number of Detections')
    ax2.set_ylabel('Number of Images')
    ax2.set_title('Distribution of\nDetections per Image')
    ax2.grid(True, alpha=0.3)
    
    # 3. Top pathologies (bars)
    ax3 = fig.add_subplot(gs[1, 1])
    top_10_paths = top_pathologies[:10]
    counts = [sum(1 for r in results if r.get(f'{path}_detected', False)) for path in top_10_paths]
    
    bars = ax3.barh(range(len(top_10_paths)), counts, color='lightcoral', alpha=0.7)
    ax3.set_yticks(range(len(top_10_paths)))
    ax3.set_yticklabels([p[:12] for p in top_10_paths])
    ax3.set_xlabel('Number of Detections')
    ax3.set_title('Top 10 Most\nFrequent Pathologies')
    ax3.grid(True, alpha=0.3)
    
    # Add the values next to the bars
    for i, (bar, count) in enumerate(zip(bars, counts)):
        width = bar.get_width()
        ax3.text(width + 0.1, bar.get_y() + bar.get_height()/2, 
                f'{count}', ha='left', va='center', fontweight='bold')
    
    # 4. Distribution of average probabilities
    ax4 = fig.add_subplot(gs[1, 2])
    avg_probs = [result['avg_probability'] for result in results]
    ax4.hist(avg_probs, bins=20, alpha=0.7, color='lightgreen', edgecolor='black')
    ax4.axvline(x=threshold, color='red', linestyle='--', linewidth=2, label=f'Threshold ({threshold})')
    ax4.set_xlabel('Average Probability')
    ax4.set_ylabel('Number of Images')
    ax4.set_title('Distribution of\nAverage Probabilities')
    ax4.legend()
    ax4.grid(True, alpha=0.3)
    
    # 5. Number of detections vs. maximum probability
    ax5 = fig.add_subplot(gs[2, 0])
    max_probs = [result['max_probability'] for result in results]
    ax5.scatter(detections_per_image, max_probs, alpha=0.6, s=50)
    ax5.set_xlabel('Number of Detections')
    ax5.set_ylabel('Maximum Probability')
    ax5.set_title('Detections vs\nMax Probability')
    ax5.grid(True, alpha=0.3)
    
    # 6. Images per severity category
    ax6 = fig.add_subplot(gs[2, 1])
    
    # Bucket images by number of detections
    # (a rough proxy based on counts, not a clinical severity grade)
    normal_images = sum(1 for r in results if r['total_detections'] == 0)
    mild_images = sum(1 for r in results if 1 <= r['total_detections'] <= 2)
    moderate_images = sum(1 for r in results if 3 <= r['total_detections'] <= 5)
    severe_images = sum(1 for r in results if r['total_detections'] > 5)
    
    categories = ['Normal\n(0)', 'Mild\n(1-2)', 'Moderate\n(3-5)', 'Severe\n(6+)']
    counts_sev = [normal_images, mild_images, moderate_images, severe_images]
    colors_sev = ['green', 'yellow', 'orange', 'red']
    
    wedges, texts, autotexts = ax6.pie(counts_sev, labels=categories, colors=colors_sev,
                                      autopct='%1.1f%%', startangle=90)
    ax6.set_title('Distribution by\nSeverity Level')
    
    # 7. Detections in image order
    ax7 = fig.add_subplot(gs[2, 2])
    image_indices = range(len(results))
    ax7.plot(image_indices, detections_per_image, 'bo-', alpha=0.7, markersize=4)
    ax7.set_xlabel('Image Index')
    ax7.set_ylabel('Number of Detections')
    ax7.set_title('Detections by\nImage Order')
    ax7.grid(True, alpha=0.3)
    
    # 8. Average probability vs. detection frequency (wide scatter plot)
    ax8 = fig.add_subplot(gs[3, :])
    
    # For each pathology, plot average probability against detection frequency
    path_freq = []
    path_avg_prob = []
    path_names = []
    
    for pathology in top_pathologies:  # already limited to the top 15 above
        freq = sum(1 for r in results if r.get(f'{pathology}_detected', False))
        avg_prob = np.mean([r.get(pathology, 0) for r in results])
        
        path_freq.append(freq)
        path_avg_prob.append(avg_prob)
        path_names.append(pathology)
    
    scatter = ax8.scatter(path_freq, path_avg_prob, s=100, alpha=0.7, c=range(len(path_freq)), cmap='viridis')
    
    # Annotate each point with the pathology name
    for i, (freq, prob, name) in enumerate(zip(path_freq, path_avg_prob, path_names)):
        ax8.annotate(name[:10], (freq, prob), xytext=(5, 5), textcoords='offset points', 
                    fontsize=8, alpha=0.8)
    
    ax8.set_xlabel('Detection Frequency (number of images)')
    ax8.set_ylabel('Average Probability')
    ax8.set_title('Pathology Analysis: Frequency vs Average Probability', fontweight='bold')
    ax8.grid(True, alpha=0.3)
    
    plt.suptitle(f'COMPLETE DATASET ANALYSIS - {len(results)} Images', 
                fontsize=18, fontweight='bold')
    plt.show()

def create_pathology_correlation_matrix(results, pathologies):
    """
    Create a correlation matrix between pathologies
    """
    print("\n🔗 PATHOLOGY CORRELATION ANALYSIS")
    
    # Build the probability matrix (one row per pathology, one column per image)
    prob_matrix = []
    for pathology in pathologies:
        probs = [result.get(pathology, 0) for result in results]
        prob_matrix.append(probs)
    
    prob_matrix = np.array(prob_matrix)
    correlation_matrix = np.corrcoef(prob_matrix)
    
    # Show only the stronger correlations
    plt.figure(figsize=(12, 10))
    
    # Mask the upper triangle (including the diagonal) and weak correlations
    mask = np.zeros_like(correlation_matrix, dtype=bool)
    mask[np.triu_indices_from(mask)] = True
    mask[np.abs(correlation_matrix) < 0.3] = True
    
    sns.heatmap(correlation_matrix, 
                mask=mask,
                annot=True, 
                fmt='.2f', 
                cmap='RdBu_r',
                vmin=-1, vmax=1,
                xticklabels=[p[:12] for p in pathologies],
                yticklabels=[p[:12] for p in pathologies],
                cbar_kws={'label': 'Correlation Coefficient'})
    
    plt.title('Pathology Correlation Matrix\n(Only correlations with |r| >= 0.3)', 
             fontsize=14, fontweight='bold')
    plt.xticks(rotation=45, ha='right')
    plt.yticks(rotation=0)
    plt.tight_layout()
    plt.show()
    
    # Find the strongest correlations
    strong_correlations = []
    for i in range(len(pathologies)):
        for j in range(i+1, len(pathologies)):
            corr = correlation_matrix[i, j]
            if abs(corr) > 0.5:  # Strong correlations
                strong_correlations.append((pathologies[i], pathologies[j], corr))
    
    if strong_correlations:
        print("\n🔗 STRONG CORRELATIONS DETECTED (|r| > 0.5):")
        strong_correlations.sort(key=lambda x: abs(x[2]), reverse=True)
        for path1, path2, corr in strong_correlations[:10]:
            direction = "positive" if corr > 0 else "negative"
            print(f"   • {path1} ↔ {path2}: r = {corr:.3f} ({direction})")

# Run the detailed analysis if results are available
if 'batch_results' in locals() and batch_results:
    top_pathologies, problematic_images = create_detailed_analysis(batch_results, all_filenames)
else:
    print("❌ No results available for detailed analysis")
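
One caveat with the correlation step: `np.corrcoef` returns NaN for any row with zero variance, which happens when a pathology receives the same probability on every image or when only one image was analyzed. The sketch below shows one possible way to handle this, assuming it is acceptable to drop such rows before correlating. `pathology_correlations` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def pathology_correlations(prob_matrix, names):
    """Correlate pathology rows after dropping constant ones (hypothetical helper).

    prob_matrix has one row per pathology and one column per image.
    """
    prob_matrix = np.asarray(prob_matrix, dtype=float)
    # A row is constant exactly when its maximum equals its minimum
    keep = prob_matrix.max(axis=1) > prob_matrix.min(axis=1)
    kept_names = [name for name, flag in zip(names, keep) if flag]
    if len(kept_names) < 2:
        # Not enough varying rows to form a correlation matrix
        return kept_names, np.empty((0, 0))
    return kept_names, np.corrcoef(prob_matrix[keep])

probs = [
    [0.1, 0.4, 0.8, 0.6],  # "A"
    [0.2, 0.5, 0.9, 0.7],  # "B": rises and falls with A
    [0.3, 0.3, 0.3, 0.3],  # "C": identical on every image
]
kept, corr = pathology_correlations(probs, ["A", "B", "C"])
print(kept)
print(corr.round(3))
```

With very few images, even the remaining coefficients are unstable, so strong values on a small dataset are best read as prompts for inspection rather than findings.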

## 💾 Export Results

Let's save and export all the results for later use:

In [None]:
def export_dataset_results(batch_results, filenames, processed_images, analysis_summary):
    """
    Export all results in different formats
    """
    print("💾 EXPORTING RESULTS")
    print("=" * 50)
    
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    
    # Make sure the output directory exists before writing to it
    os.makedirs('results', exist_ok=True)
    
    # 1. CSV export (for Excel, statistical analysis)
    print("📊 CSV export...")
    
    for model_name, results in batch_results.items():
        df = pd.DataFrame(results)
        csv_filename = f'results/dataset_analysis_{model_name}_{timestamp}.csv'
        df.to_csv(csv_filename, index=False)
        print(f"   ✅ {csv_filename}")
    
    # 2. Export the executive summary
    print("📋 Exporting executive summary...")
    
    summary_df = pd.DataFrame(analysis_summary)
    summary_filename = f'results/dataset_summary_{timestamp}.csv'
    summary_df.to_csv(summary_filename, index=False)
    print(f"   ✅ {summary_filename}")
    
    # 3. Full JSON export (for integration with other tools)
    print("🗂️ JSON export...")
    
    json_data = {
        'metadata': {
            'timestamp': timestamp,
            'total_images': len(filenames),
            'models_used': list(batch_results.keys()),
            'analysis_date': datetime.now().isoformat()
        },
        'results': batch_results,
        'summary': analysis_summary,
        'filenames': filenames
    }
    
    json_filename = f'results/complete_analysis_{timestamp}.json'
    with open(json_filename, 'w', encoding='utf-8') as f:
        json.dump(json_data, f, indent=2, ensure_ascii=False, default=str)
    print(f"   ✅ {json_filename}")
    
    # 4. Detailed text report
    print("📝 Generating text report...")
    
    report_filename = f'results/detailed_report_{timestamp}.txt'
    generate_detailed_text_report(batch_results, analysis_summary, report_filename, timestamp)
    print(f"   ✅ {report_filename}")
    
    # 5. Download package
    print("📦 Creating download package...")
    
    # Zip all the results
    zip_filename = f'dataset_analysis_complete_{timestamp}.zip'
    with zipfile.ZipFile(zip_filename, 'w') as zipf:
        # Add every file from this run (matched by timestamp) in the results folder
        for filename in glob.glob('results/*'):
            if timestamp in filename:
                zipf.write(filename, os.path.basename(filename))
    
    print(f"   ✅ {zip_filename}")
    
    # 6. Colab download interface
    print("\n📤 DOWNLOAD RESULTS:")
    print("Choose which files to download:")
    
    download_options = {
        '1': ('Full package (ZIP)', zip_filename),
        '2': ('Executive summary (CSV)', summary_filename),
        '3': ('Detailed report (TXT)', report_filename),
        '4': ('Complete data (JSON)', json_filename)
    }
    
    for key, (desc, filename) in download_options.items():
        print(f"   {key}. {desc}")
    
    # Selection prompt
    choice = input("\nEnter the number of your choice (or 'all' to download everything): ")
    
    if choice.lower() == 'all':
        for desc, filename in download_options.values():
            try:
                files.download(filename)
                print(f"✅ Downloaded: {filename}")
            except Exception as e:
                print(f"❌ Download error for {filename}: {e}")
    elif choice in download_options:
        desc, filename = download_options[choice]
        try:
            files.download(filename)
            print(f"✅ Downloaded: {filename}")
        except Exception as e:
            print(f"❌ Download error: {e}")
    else:
        print("ℹ️ No download selected")
    
    return {
        'zip_file': zip_filename,
        'summary_file': summary_filename,
        'report_file': report_filename,
        'json_file': json_filename
    }

def generate_detailed_text_report(batch_results, analysis_summary, filename, timestamp):
    """
    Generate a detailed text report
    """
    with open(filename, 'w', encoding='utf-8') as f:
        f.write("=" * 80 + "\n")
        f.write("CUSTOM DATASET ANALYSIS REPORT - TORCHXRAYVISION\n")
        f.write("=" * 80 + "\n")
        f.write(f"Analysis date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write(f"Timestamp: {timestamp}\n")
        f.write("\n")
        
        # Executive summary
        f.write("EXECUTIVE SUMMARY\n")
        f.write("-" * 50 + "\n")
        
        total_images = len(batch_results[list(batch_results.keys())[0]])
        f.write(f"Total number of images analyzed: {total_images}\n")
        f.write(f"Models used: {', '.join(batch_results.keys())}\n")
        
        for summary in analysis_summary:
            f.write(f"\nModel {summary['model']}:\n")
            f.write(f"  - Images with detections: {summary['images_with_detections']}/{summary['images_analyzed']}\n")
            f.write(f"  - Average detections per image: {summary['avg_detections_per_image']:.2f}\n")
            f.write(f"  - Maximum detections (single image): {summary['max_detections_single_image']}\n")
        
        # Per-model details
        for model_name, results in batch_results.items():
            f.write(f"\n\n{'='*60}\n")
            f.write(f"DETAILED ANALYSIS - MODEL {model_name}\n")
            f.write(f"{'='*60}\n")
            
            # Top pathologies
            pathologies = [col for col in results[0].keys() 
                          if col not in ['image_id', 'filename', 'model', 'total_detections', 
                                       'max_probability', 'avg_probability'] 
                          and not col.endswith('_detected')]
            
            pathology_counts = {}
            for pathology in pathologies:
                count = sum(1 for result in results if result.get(f'{pathology}_detected', False))
                pathology_counts[pathology] = count
            
            sorted_paths = sorted(pathology_counts.items(), key=lambda x: x[1], reverse=True)
            
            f.write("\nTOP 15 DETECTED PATHOLOGIES:\n")
            for i, (pathology, count) in enumerate(sorted_paths[:15]):
                percentage = (count / len(results)) * 100
                f.write(f"  {i+1:2d}. {pathology:30s}: {count:3d} ({percentage:5.1f}%)\n")
            
            # Images with the most detections
            f.write("\nIMAGES WITH THE MOST DETECTIONS:\n")
            sorted_images = sorted(results, key=lambda x: x['total_detections'], reverse=True)
            
            for i, result in enumerate(sorted_images[:10]):
                f.write(f"  {i+1:2d}. {result['filename']:40s}: {result['total_detections']} detections\n")
        
        # Recommendations
        f.write(f"\n\n{'='*60}\n")
        f.write("RECOMMENDATIONS AND SUGGESTED ACTIONS\n")
        f.write(f"{'='*60}\n")
        
        main_results = batch_results[list(batch_results.keys())[0]]
        high_detection_images = [r for r in main_results if r['total_detections'] > 5]
        
        if high_detection_images:
            f.write(f"\nWARNING: {len(high_detection_images)} image(s) with 6+ detected pathologies.\n")
            f.write("Recommendation: prioritize these images for clinical review.\n")
        
        # "No detections" means no pathology crossed the threshold; it is not a diagnosis of normality
        normal_images = [r for r in main_results if r['total_detections'] == 0]
        f.write(f"\nImages with no detections above threshold: {len(normal_images)}/{len(main_results)} ({len(normal_images)/len(main_results)*100:.1f}%)\n")
        
        f.write("\n" + "=" * 80 + "\n")
        f.write("End of report\n")
        f.write("=" * 80 + "\n")

# Run the export if results are available
if 'batch_results' in locals() and batch_results:
    export_files = export_dataset_results(batch_results, all_filenames, processed_images, batch_summary)
    
    print("\n🎉 EXPORT COMPLETED SUCCESSFULLY!")
    print("\n📋 Generated files:")
    for desc, filename in export_files.items():
        print(f"   • {desc}: {filename}")
else:
    print("❌ No results to export")
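
The JSON export above passes `default=str` to `json.dump`, so any value the encoder cannot serialize is written as a string. If the per-image records hold NumPy scalars such as `np.float32` (an assumption, since the code that builds them is outside this section), probabilities would be stored as text like `"0.5"` rather than as numbers. The sketch below shows one possible conversion step; `to_builtin` is a hypothetical helper and not part of the notebook.

```python
import json
import numpy as np

def to_builtin(obj):
    """Recursively convert NumPy scalars and arrays to plain Python types (hypothetical helper)."""
    if isinstance(obj, dict):
        return {str(k): to_builtin(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_builtin(v) for v in obj]
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):
        return obj.item()
    return obj

# Toy record with NumPy scalar values (assumed, for illustration only)
record = {
    "filename": "a.png",
    "Pneumonia": np.float32(0.5),
    "Pneumonia_detected": np.bool_(True),
}

as_text = json.dumps(record, default=str)      # NumPy values fall back to strings
as_numbers = json.dumps(to_builtin(record))    # values stay numeric / boolean

print(as_text)
print(as_numbers)
```

Applying such a conversion to `batch_results` before `json.dump` would keep the exported probabilities numeric for downstream tools.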

## 🎉 Tutorial Summary

### What you accomplished:

In [None]:
# Generate the final tutorial summary
print("🎉" * 50)
print("TUTORIAL 6 COMPLETED SUCCESSFULLY!")
print("🎉" * 50)

print("\n📊 SUMMARY OF YOUR ACHIEVEMENTS:")
print("=" * 70)

if 'all_images' in locals() and all_images:
    print(f"\n✅ DATA LOADING:")
    print(f"   • {len(all_images)} image(s) loaded successfully")
    
    # Breakdown by source
    source_summary = {}
    if 'all_sources' in locals():
        for source in set(all_sources):
            count = all_sources.count(source)
            source_summary[source] = count
        
        for source, count in source_summary.items():
            print(f"   • {source}: {count} image(s)")

if 'processed_images' in locals() and processed_images:
    print(f"\n✅ DATA PREPARATION:")
    print(f"   • {len(processed_images)} image(s) prepared for analysis")
    print(f"   • Resized to 224x224 pixels")
    print(f"   • Z-score normalization applied")
    print(f"   • Conversion to PyTorch tensors succeeded")

# model_info is looked up below, so check that it exists as well
if 'models' in locals() and models and 'model_info' in locals():
    print(f"\n✅ MODELS USED:")
    for model_name, model_data in model_info.items():
        if model_name in models:
            print(f"   • {model_data['name']}: {len(model_data['pathologies'])} pathologies")

if 'batch_results' in locals() and batch_results:
    print(f"\n✅ ANALYSIS PERFORMED:")
    
    main_model_key = list(batch_results.keys())[0]
    main_results = batch_results[main_model_key]
    
    total_detections = sum(result['total_detections'] for result in main_results)
    images_with_findings = sum(1 for result in main_results if result['total_detections'] > 0)
    max_detections = max(result['total_detections'] for result in main_results)
    
    print(f"   • {len(main_results)} image(s) analyzed with AI")
    print(f"   • {total_detections} pathologies detected in total")
    print(f"   • {images_with_findings}/{len(main_results)} image(s) with findings")
    print(f"   • Maximum of {max_detections} pathologies on a single image")

if 'export_files' in locals() and export_files:
    print(f"\n✅ RESULTS EXPORTED:")
    print(f"   • CSV report for statistical analysis")
    print(f"   • JSON file for software integration")
    print(f"   • Detailed text report")
    print(f"   • Complete ZIP package")

print(f"\n🎓 SKILLS ACQUIRED:")
print(f"   🔹 Easy loading of custom datasets")
print(f"   🔹 Automatic preparation of medical images")
print(f"   🔹 Batch analysis with TorchXRayVision")
print(f"   🔹 Multi-model comparison")
print(f"   🔹 Visualization and interpretation of results")
print(f"   🔹 Export for clinical and research use")

print(f"\n🚀 PRACTICAL APPLICATIONS:")
print(f"   🏥 Analysis of your own clinical cases")
print(f"   📊 Medical research studies")
print(f"   📚 Medical student projects")
print(f"   🔬 Validation of AI on local populations")
print(f"   📈 Radiology quality audits")

print(f"\n💡 SUGGESTED NEXT STEPS:")
print(f"   📖 Review tutorials 1-5 to deepen your understanding")
print(f"   🔄 Test with your own medical data")
print(f"   📊 Advanced statistical analysis of the results")
print(f"   🤝 Collaborate with radiologists for validation")
print(f"   📝 Write research articles")

print(f"\n⚠️ IMPORTANT REMINDERS:")
print(f"   🔐 Respect patient data confidentiality")
print(f"   🏥 Always validate with medical expertise")
print(f"   📋 AI assists but does not replace the physician")
print(f"   📊 Document methods and limitations")

print(f"\n" + "="*70)
print(f"🌟 CONGRATULATIONS! You have now mastered custom dataset")
print(f"   integration with TorchXRayVision!")
print(f"="*70)

# Suggestions for continuing
print(f"\n🔄 TO CONTINUE LEARNING:")
print(f"   • Test with different types of images")
print(f"   • Compare the models on your data")
print(f"   • Integrate them into your research workflows")
print(f"   • Share your findings with the community")

print(f"\n🎯 You are now ready for concrete applications")
print(f"   in medical research and clinical practice!")