# TP2: Perceptron and Multi-Layer Perceptron (MLP)

This notebook implements a single-layer perceptron and a multi-layer perceptron (MLP) for image classification on MNIST and Fashion-MNIST.

## Objectives
- Implement a simple perceptron for MNIST classification
- Build an MLP with several hidden layers
- Compare optimizers, activation functions, and regularization techniques
- Analyze performance with confusion matrices and visualizations
- Apply the best model to Fashion-MNIST

---
# Part 1: Single-Layer Perceptron

## 1.1 Importing libraries

In [None]:
# Import the required libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from sklearn.metrics import confusion_matrix, classification_report

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist, fashion_mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import SGD, Adam

# Plot styling for clear visualizations
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")

## 1.2 Loading and preparing the MNIST data

In [None]:
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(f"Training data shape: {x_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Test data shape: {x_test.shape}")
print(f"Test labels shape: {y_test.shape}")

In [None]:
# Display a few examples
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(x_train[i], cmap='gray')
    ax.set_title(f'Label: {y_train[i]}')
    ax.axis('off')
plt.suptitle('Examples from the MNIST dataset', fontsize=16)
plt.tight_layout()
plt.show()

## 1.3 Data preprocessing
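The preprocessing in this section has three steps: flatten each image, rescale the pixels, and one-hot encode the labels. As a minimal sketch, the snippet below reproduces those steps with NumPy only, on a tiny synthetic batch. The arrays `images` and `labels` are made-up placeholders rather than MNIST data, and `np.eye(10)[labels]` is used here as a stand-in for `to_categorical`.

```python
import numpy as np

# Hypothetical stand-in for a batch of MNIST images: 4 images of 28x28 uint8 pixels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
labels = np.array([5, 0, 4, 1])

# Flatten each 28x28 image into a 784-dimensional vector.
flat = images.reshape(-1, 28 * 28)

# Scale pixel values from [0, 255] to [0, 1].
normalized = flat.astype("float32") / 255.0

# One-hot encode the labels: row i has a single 1 at column labels[i].
one_hot = np.eye(10, dtype="float32")[labels]

print(flat.shape)
print(one_hot[0])
```

Converting to `float32` before dividing matters: the raw pixels are unsigned 8-bit integers, and the network expects floating-point inputs.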

In [None]:
# Flatten the 28x28 images into 784-dimensional vectors
x_train_flat = x_train.reshape(-1, 784)
x_test_flat = x_test.reshape(-1, 784)

# Scale pixel values to the range [0, 1]
x_train_normalized = x_train_flat.astype('float32') / 255.0
x_test_normalized = x_test_flat.astype('float32') / 255.0

# One-hot encode the labels
y_train_categorical = to_categorical(y_train, 10)
y_test_categorical = to_categorical(y_test, 10)

print(f"Shape after flattening: {x_train_flat.shape}")
print(f"Shape after one-hot encoding: {y_train_categorical.shape}")
print(f"Example one-hot label for {y_train[0]}: {y_train_categorical[0]}")

## 1.4 Implementing the single-layer perceptron

A single-layer perceptron is a neural network with no hidden layer. It connects the input (784 pixel values) directly to the output (10 classes) through a single dense layer with a softmax activation.
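To make the computation concrete, here is a minimal NumPy sketch of the forward pass such a model performs: `softmax(x @ W + b)`. The weights `W`, bias `b`, and inputs `x` are random placeholders chosen for illustration, not the values Keras would learn.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(784, 10))  # one weight per (pixel, class) pair
b = np.zeros(10)                            # one bias per class

x = rng.random((3, 784))                    # 3 made-up flattened images in [0, 1)
probs = softmax(x @ W + b)                  # shape (3, 10); each row sums to 1
predicted = probs.argmax(axis=1)            # most probable class per image

n_params = W.size + b.size                  # 784 * 10 + 10 = 7850
print(probs.shape, n_params)
```

The parameter count of 7,850 is what a single `Dense(10)` layer on 784 inputs holds, so it can be checked against the model summary.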

In [None]:
# Build the perceptron model
def create_perceptron():
    model = models.Sequential([
        layers.Dense(10, activation='softmax', input_shape=(784,))
    ])
    return model

perceptron = create_perceptron()
perceptron.summary()

## 1.5 Compiling and training with SGD
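For intuition about what one training step involves, the following NumPy sketch performs a single mini-batch SGD update for softmax regression with categorical cross-entropy. It is an illustration under stated assumptions, not the Keras internals: the batch is random placeholder data, the weights start at zero, and the learning rate of 0.01 matches the default of the Keras `SGD` optimizer.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y_one_hot):
    """Mean categorical cross-entropy over the batch."""
    eps = 1e-12  # avoids log(0)
    return -np.mean(np.sum(y_one_hot * np.log(probs + eps), axis=1))

rng = np.random.default_rng(0)
x = rng.random((32, 784))                      # placeholder mini-batch of 32 inputs
y = np.eye(10)[rng.integers(0, 10, size=32)]   # placeholder one-hot labels
W = np.zeros((784, 10))
b = np.zeros(10)
lr = 0.01

probs = softmax(x @ W + b)
loss_before = cross_entropy(probs, y)          # equals log(10) with zero weights

# The gradient of the mean loss with respect to the logits is (probs - y) / batch_size.
grad_logits = (probs - y) / x.shape[0]
W -= lr * (x.T @ grad_logits)
b -= lr * grad_logits.sum(axis=0)

loss_after = cross_entropy(softmax(x @ W + b), y)
print(loss_before, loss_after)
```

With zero weights every class gets probability 0.1, so the initial loss is log(10), about 2.30. That is a useful reference value when reading the first-epoch loss of an untrained 10-class classifier.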

In [None]:
# Compile with the SGD optimizer
perceptron.compile(
    optimizer='sgd',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Train the model
history_sgd = perceptron.fit(
    x_train_normalized, y_train_categorical,
    epochs=10,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)

## 1.6 Evaluating the SGD model

In [None]:
# Evaluate on the test set
test_loss_sgd, test_accuracy_sgd = perceptron.evaluate(x_test_normalized, y_test_categorical, verbose=0)
print("\n=== Perceptron with SGD ===")
print(f"Test loss: {test_loss_sgd:.4f}")
print(f"Test accuracy: {test_accuracy_sgd:.4f}")

In [None]:
# Plot the learning curves (accuracy and loss, train vs validation)
def plot_history(history, title):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    
    # Accuracy
    ax1.plot(history.history['accuracy'], label='Train')
    ax1.plot(history.history['val_accuracy'], label='Validation')
    ax1.set_title(f'{title} - Accuracy')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Accuracy')
    ax1.legend()
    ax1.grid(True)
    
    # Loss
    ax2.plot(history.history['loss'], label='Train')
    ax2.plot(history.history['val_loss'], label='Validation')
    ax2.set_title(f'{title} - Loss')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Loss')
    ax2.legend()
    ax2.grid(True)
    
    plt.tight_layout()
    plt.show()

plot_history(history_sgd, 'Perceptron with SGD')

## 1.7 Testing the Adam optimizer

In [None]:
# Create a fresh perceptron for Adam
perceptron_adam = create_perceptron()

# Compile with the Adam optimizer
perceptron_adam.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Train
history_adam = perceptron_adam.fit(
    x_train_normalized, y_train_categorical,
    epochs=10,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)

In [None]:
# Evaluate the Adam model
test_loss_adam, test_accuracy_adam = perceptron_adam.evaluate(x_test_normalized, y_test_categorical, verbose=0)
print("\n=== Perceptron with Adam ===")
print(f"Test loss: {test_loss_adam:.4f}")
print(f"Test accuracy: {test_accuracy_adam:.4f}")

# Comparison (the accuracy difference is expressed in percentage points)
print("\n=== SGD vs Adam ===")
print(f"SGD - Accuracy: {test_accuracy_sgd:.4f}, Loss: {test_loss_sgd:.4f}")
print(f"Adam - Accuracy: {test_accuracy_adam:.4f}, Loss: {test_loss_adam:.4f}")
print(f"Improvement: {(test_accuracy_adam - test_accuracy_sgd)*100:.2f} percentage points")

In [None]:
plot_history(history_adam, 'Perceptron with Adam')

## 1.8 Confusion matrix
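In a confusion matrix, entry (i, j) counts the samples whose true class is i and whose predicted class is j, so the diagonal holds the correct predictions. The sketch below builds one by hand with NumPy on made-up labels for a 3-class problem (these are not MNIST results). The helper `confusion` is a hypothetical name and follows the same row = true, column = predicted convention as scikit-learn's `confusion_matrix`.

```python
import numpy as np

def confusion(y_true, y_pred, num_classes):
    """Count matrix with true classes as rows and predicted classes as columns."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered add, so repeated pairs accumulate
    return cm

# Made-up labels for a 3-class problem.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion(y_true, y_pred, num_classes=3)
accuracy = np.trace(cm) / cm.sum()       # diagonal = correct predictions
recall = np.diag(cm) / cm.sum(axis=1)    # per class: correct / number of true samples
print(cm)
```

Reading along a row shows where the samples of one true class end up, which is how pairs of frequently confused digits can be spotted in the heatmap.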

In [None]:
# Predictions on the test set (using the Adam model)
y_pred_proba = perceptron_adam.predict(x_test_normalized, verbose=0)
y_pred = np.argmax(y_pred_proba, axis=1)

# Compute the confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', square=True)
plt.title('Confusion Matrix - Perceptron (Adam)', fontsize=16)
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()

# Classification report
print("\nClassification report:")
print(classification_report(y_test, y_pred))

## 1.9 Displaying misclassified examples

In [None]:
# Find the misclassified examples
misclassified_indices = np.where(y_pred != y_test)[0]
print(f"Misclassified examples: {len(misclassified_indices)} out of {len(y_test)}")

# Display up to 20 misclassified examples
num_display = min(20, len(misclassified_indices))
fig, axes = plt.subplots(4, 5, figsize=(15, 12))
for i, ax in enumerate(axes.flat):
    if i < num_display:
        idx = misclassified_indices[i]
        ax.imshow(x_test[idx], cmap='gray')
        ax.set_title(f'True: {y_test[idx]}\nPredicted: {y_pred[idx]}', color='red')
        ax.axis('off')
    else:
        ax.axis('off')
plt.suptitle('Misclassified examples', fontsize=16)
plt.tight_layout()
plt.show()

---
# Part 2: Multi-Layer Perceptron (MLP)

## 2.1 Implementing the base MLP

We build an MLP with two hidden layers (256 and 128 ReLU units) to improve on the simple perceptron.
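As a rough sketch of what a two-hidden-layer MLP computes, the NumPy snippet below chains the layers by hand, assuming the 784-256-128-10 layout with ReLU hidden activations and a softmax output. The weights are random placeholders rather than trained values, and `forward` is a hypothetical helper name.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

layer_sizes = [784, 256, 128, 10]
rng = np.random.default_rng(0)

# One (weights, bias) pair per dense layer; random placeholders, not trained values.
params = [
    (rng.normal(scale=0.05, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def forward(x, params):
    """ReLU on the hidden layers, softmax on the output layer."""
    h = x
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W_out, b_out = params[-1]
    return softmax(h @ W_out + b_out)

x = rng.random((5, 784))                             # 5 made-up flattened images
probs = forward(x, params)                           # shape (5, 10)
n_params = sum(W.size + b.size for W, b in params)   # 200960 + 32896 + 1290 = 235146
print(probs.shape, n_params)
```

Compared with the 7,850 parameters of the single-layer perceptron, this layout has roughly 30 times more, almost all of them in the first hidden layer.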

In [None]:
# Build the MLP model
def create_mlp():
    model = models.Sequential([
        layers.Dense(256, activation='relu', input_shape=(784,)),
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
    return model

mlp = create_mlp()
mlp.summary()

In [None]:
# Compile with Adam
mlp.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Train
history_mlp = mlp.fit(
    x_train_normalized, y_train_categorical,
    epochs=10,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)

## 2.2 Evaluating the base MLP

In [None]:
# Evaluate on the test set
test_loss_mlp, test_accuracy_mlp = mlp.evaluate(x_test_normalized, y_test_categorical, verbose=0)
print("\n=== Base MLP ===")
print(f"Test loss: {test_loss_mlp:.4f}")
print(f"Test accuracy: {test_accuracy_mlp:.4f}")

# Comparison with the perceptron (difference in percentage points)
print("\n=== Perceptron vs MLP ===")
print(f"Perceptron (Adam) - Accuracy: {test_accuracy_adam:.4f}")
print(f"MLP - Accuracy: {test_accuracy_mlp:.4f}")
print(f"Improvement: {(test_accuracy_mlp - test_accuracy_adam)*100:.2f} percentage points")

In [None]:
plot_history(history_mlp, 'Base MLP')

---
## 2.3 Experiments and optimizations

We now test different configurations to optimize the MLP.

### 2.3.1 Testing different activation functions
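The three activations compared in this section differ only in how they treat negative inputs. The NumPy sketch below spells that out. The function names are illustrative, the slope of 0.1 mirrors the `LeakyReLU` setting used in this section, and the PReLU slope of 0.25 is an arbitrary example value, since in the network that slope is learned during training.

```python
import numpy as np

def relu(z):
    """Zero for negative inputs, identity for positive inputs."""
    return np.maximum(z, 0.0)

def leaky_relu(z, alpha=0.1):
    """Fixed small slope alpha for negative inputs."""
    return np.where(z > 0, z, alpha * z)

def prelu(z, alpha):
    """Same shape as leaky ReLU, but alpha is a trainable parameter in the network."""
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))
print(leaky_relu(z))
print(prelu(z, alpha=0.25))
```

Because ReLU outputs zero for every negative input, its gradient there is also zero. The leaky variants keep a small gradient, which is the motivation for trying them.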

In [None]:
# Store the results
results = []

# ReLU baseline (already trained above, kept for reference)
results.append({
    'Mod√®le': 'MLP - ReLU',
    'Test Loss': test_loss_mlp,
    'Test Accuracy': test_accuracy_mlp
})

In [None]:
# Test with PReLU
print("\n=== Test with PReLU ===")
mlp_prelu = models.Sequential([
    layers.Dense(256, input_shape=(784,)),
    layers.PReLU(),
    layers.Dense(128),
    layers.PReLU(),
    layers.Dense(10, activation='softmax')
])

mlp_prelu.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history_prelu = mlp_prelu.fit(
    x_train_normalized, y_train_categorical,
    epochs=10, batch_size=32, validation_split=0.2, verbose=1
)

test_loss_prelu, test_accuracy_prelu = mlp_prelu.evaluate(x_test_normalized, y_test_categorical, verbose=0)
print(f"PReLU - Loss: {test_loss_prelu:.4f}, Accuracy: {test_accuracy_prelu:.4f}")
results.append({'Mod√®le': 'MLP - PReLU', 'Test Loss': test_loss_prelu, 'Test Accuracy': test_accuracy_prelu})

In [None]:
# Test with LeakyReLU
print("\n=== Test with LeakyReLU ===")
mlp_leaky = models.Sequential([
    layers.Dense(256, input_shape=(784,)),
    layers.LeakyReLU(alpha=0.1),
    layers.Dense(128),
    layers.LeakyReLU(alpha=0.1),
    layers.Dense(10, activation='softmax')
])

mlp_leaky.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history_leaky = mlp_leaky.fit(
    x_train_normalized, y_train_categorical,
    epochs=10, batch_size=32, validation_split=0.2, verbose=1
)

test_loss_leaky, test_accuracy_leaky = mlp_leaky.evaluate(x_test_normalized, y_test_categorical, verbose=0)
print(f"LeakyReLU - Loss: {test_loss_leaky:.4f}, Accuracy: {test_accuracy_leaky:.4f}")
results.append({'Mod√®le': 'MLP - LeakyReLU', 'Test Loss': test_loss_leaky, 'Test Accuracy': test_accuracy_leaky})

### 2.3.2 Adding Dropout

In [None]:
# Test with Dropout (0.25)
print("\n=== Test with Dropout (0.25) ===")
mlp_dropout = models.Sequential([
    layers.Dense(256, activation='relu', input_shape=(784,)),
    layers.Dropout(0.25),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.25),
    layers.Dense(10, activation='softmax')
])

mlp_dropout.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history_dropout = mlp_dropout.fit(
    x_train_normalized, y_train_categorical,
    epochs=10, batch_size=32, validation_split=0.2, verbose=1
)

test_loss_dropout, test_accuracy_dropout = mlp_dropout.evaluate(x_test_normalized, y_test_categorical, verbose=0)
print(f"Dropout (0.25) - Loss: {test_loss_dropout:.4f}, Accuracy: {test_accuracy_dropout:.4f}")
results.append({'Mod√®le': 'MLP - Dropout 0.25', 'Test Loss': test_loss_dropout, 'Test Accuracy': test_accuracy_dropout})

### 2.3.3 Adding Batch Normalization

In [None]:
# Test with Batch Normalization
print("\n=== Test with Batch Normalization ===")
mlp_bn = models.Sequential([
    layers.Dense(256, activation='relu', input_shape=(784,)),
    layers.BatchNormalization(),
    layers.Dense(128, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(10, activation='softmax')
])

mlp_bn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history_bn = mlp_bn.fit(
    x_train_normalized, y_train_categorical,
    epochs=10, batch_size=32, validation_split=0.2, verbose=1
)

test_loss_bn, test_accuracy_bn = mlp_bn.evaluate(x_test_normalized, y_test_categorical, verbose=0)
print(f"Batch Normalization - Loss: {test_loss_bn:.4f}, Accuracy: {test_accuracy_bn:.4f}")
results.append({'Mod√®le': 'MLP - Batch Norm', 'Test Loss': test_loss_bn, 'Test Accuracy': test_accuracy_bn})

### 2.3.4 Testing different Dropout rates

In [None]:
# Test with Dropout (0.2)
print("\n=== Test with Dropout (0.2) ===")
mlp_dropout_02 = models.Sequential([
    layers.Dense(256, activation='relu', input_shape=(784,)),
    layers.Dropout(0.2),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')
])

mlp_dropout_02.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history_dropout_02 = mlp_dropout_02.fit(
    x_train_normalized, y_train_categorical,
    epochs=10, batch_size=32, validation_split=0.2, verbose=1
)

test_loss_dropout_02, test_accuracy_dropout_02 = mlp_dropout_02.evaluate(x_test_normalized, y_test_categorical, verbose=0)
print(f"Dropout (0.2) - Loss: {test_loss_dropout_02:.4f}, Accuracy: {test_accuracy_dropout_02:.4f}")
results.append({'Mod√®le': 'MLP - Dropout 0.2', 'Test Loss': test_loss_dropout_02, 'Test Accuracy': test_accuracy_dropout_02})

In [None]:
# Test with Dropout (0.3)
print("\n=== Test with Dropout (0.3) ===")
mlp_dropout_03 = models.Sequential([
    layers.Dense(256, activation='relu', input_shape=(784,)),
    layers.Dropout(0.3),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(10, activation='softmax')
])

mlp_dropout_03.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history_dropout_03 = mlp_dropout_03.fit(
    x_train_normalized, y_train_categorical,
    epochs=10, batch_size=32, validation_split=0.2, verbose=1
)

test_loss_dropout_03, test_accuracy_dropout_03 = mlp_dropout_03.evaluate(x_test_normalized, y_test_categorical, verbose=0)
print(f"Dropout (0.3) - Loss: {test_loss_dropout_03:.4f}, Accuracy: {test_accuracy_dropout_03:.4f}")
results.append({'Mod√®le': 'MLP - Dropout 0.3', 'Test Loss': test_loss_dropout_03, 'Test Accuracy': test_accuracy_dropout_03})

### 2.3.5 Varying the learning rate and the optimizer

In [None]:
# Test with Adam and learning rate = 0.001 (the Keras default)
print("\n=== Test with Adam (lr=0.001) ===")
mlp_adam_001 = create_mlp()
mlp_adam_001.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
history_adam_001 = mlp_adam_001.fit(
    x_train_normalized, y_train_categorical,
    epochs=10, batch_size=32, validation_split=0.2, verbose=1
)

test_loss_adam_001, test_accuracy_adam_001 = mlp_adam_001.evaluate(x_test_normalized, y_test_categorical, verbose=0)
print(f"Adam (lr=0.001) - Loss: {test_loss_adam_001:.4f}, Accuracy: {test_accuracy_adam_001:.4f}")
results.append({'Mod√®le': 'MLP - Adam lr=0.001', 'Test Loss': test_loss_adam_001, 'Test Accuracy': test_accuracy_adam_001})

In [None]:
# Test with Adam and learning rate = 0.0001
print("\n=== Test with Adam (lr=0.0001) ===")
mlp_adam_0001 = create_mlp()
mlp_adam_0001.compile(optimizer=Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
history_adam_0001 = mlp_adam_0001.fit(
    x_train_normalized, y_train_categorical,
    epochs=10, batch_size=32, validation_split=0.2, verbose=1
)

test_loss_adam_0001, test_accuracy_adam_0001 = mlp_adam_0001.evaluate(x_test_normalized, y_test_categorical, verbose=0)
print(f"Adam (lr=0.0001) - Loss: {test_loss_adam_0001:.4f}, Accuracy: {test_accuracy_adam_0001:.4f}")
results.append({'Mod√®le': 'MLP - Adam lr=0.0001', 'Test Loss': test_loss_adam_0001, 'Test Accuracy': test_accuracy_adam_0001})

In [None]:
# Test with SGD and learning rate = 0.01
print("\n=== Test with SGD (lr=0.01) ===")
mlp_sgd_001 = create_mlp()
mlp_sgd_001.compile(optimizer=SGD(learning_rate=0.01), loss='categorical_crossentropy', metrics=['accuracy'])
history_sgd_001 = mlp_sgd_001.fit(
    x_train_normalized, y_train_categorical,
    epochs=10, batch_size=32, validation_split=0.2, verbose=1
)

test_loss_sgd_001, test_accuracy_sgd_001 = mlp_sgd_001.evaluate(x_test_normalized, y_test_categorical, verbose=0)
print(f"SGD (lr=0.01) - Loss: {test_loss_sgd_001:.4f}, Accuracy: {test_accuracy_sgd_001:.4f}")
results.append({'Mod√®le': 'MLP - SGD lr=0.01', 'Test Loss': test_loss_sgd_001, 'Test Accuracy': test_accuracy_sgd_001})

In [None]:
# Test with SGD and learning rate = 0.1
print("\n=== Test with SGD (lr=0.1) ===")
mlp_sgd_01 = create_mlp()
mlp_sgd_01.compile(optimizer=SGD(learning_rate=0.1), loss='categorical_crossentropy', metrics=['accuracy'])
history_sgd_01 = mlp_sgd_01.fit(
    x_train_normalized, y_train_categorical,
    epochs=10, batch_size=32, validation_split=0.2, verbose=1
)

test_loss_sgd_01, test_accuracy_sgd_01 = mlp_sgd_01.evaluate(x_test_normalized, y_test_categorical, verbose=0)
print(f"SGD (lr=0.1) - Loss: {test_loss_sgd_01:.4f}, Accuracy: {test_accuracy_sgd_01:.4f}")
results.append({'Mod√®le': 'MLP - SGD lr=0.1', 'Test Loss': test_loss_sgd_01, 'Test Accuracy': test_accuracy_sgd_01})

### 2.3.6 Summary table of all experiments

In [None]:
# Build the results DataFrame, sorted by test accuracy
results_df = pd.DataFrame(results)
results_df = results_df.sort_values('Test Accuracy', ascending=False)
results_df['Test Accuracy %'] = (results_df['Test Accuracy'] * 100).round(2)

print("\n" + "="*80)
print("PERFORMANCE SUMMARY TABLE")
print("="*80)
print(results_df.to_string(index=False))
print("="*80)

# Identify the best model (first row after sorting by test accuracy)
best_model_row = results_df.iloc[0]
print(f"\n🏆 Best model: {best_model_row['Mod√®le']}")
print(f"   Accuracy: {best_model_row['Test Accuracy %']:.2f}%")
print(f"   Loss: {best_model_row['Test Loss']:.4f}")

In [None]:
# Plot the performance comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Accuracy
ax1.barh(results_df['Mod√®le'], results_df['Test Accuracy %'], color='skyblue')
ax1.set_xlabel('Test Accuracy (%)', fontsize=12)
ax1.set_title('Accuracy comparison', fontsize=14)
ax1.grid(axis='x', alpha=0.3)

# Loss
ax2.barh(results_df['Mod√®le'], results_df['Test Loss'], color='salmon')
ax2.set_xlabel('Test Loss', fontsize=12)
ax2.set_title('Loss comparison', fontsize=14)
ax2.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.show()

---
## 2.4 Interpreting the results

### 2.4.1 Which model works best, and why?

Our experiments suggest the following:

1. **The MLP outperforms the simple perceptron**: hidden layers with non-linear activations let the network learn more complex, non-linear representations of the data, whereas the perceptron is limited to linear decision boundaries. This significantly improves accuracy.

2. **The Adam optimizer generally performs better than SGD**: Adam automatically adapts the effective step size of each parameter using running estimates of the gradient's first and second moments, which usually speeds up convergence. With a fixed budget of 10 epochs, that often translates into better final accuracy.

3. **Activation functions**: ReLU is generally a good default choice. PReLU and LeakyReLU can bring slight improvements by keeping a small, non-zero gradient for negative inputs.
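To illustrate the per-parameter adaptation mentioned in point 2, here is a minimal NumPy sketch comparing one SGD step with one Adam step on a toy two-parameter loss whose gradients differ in scale by a factor of 100. The toy loss, the function names, and the starting point are assumptions made for illustration. The hyperparameters `beta1=0.9`, `beta2=0.999`, and `eps=1e-7` are commonly used defaults, not values taken from this notebook.

```python
import numpy as np

def grad(w):
    """Gradient of the toy loss f(w) = 0.5 * (100 * w[0]**2 + w[1]**2)."""
    return np.array([100.0, 1.0]) * w

def sgd_step(w, g, lr):
    return w - lr * g

def adam_step(w, g, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-7):
    m = beta1 * m + (1 - beta1) * g              # running mean of gradients
    v = beta2 * v + (1 - beta2) * g**2           # running mean of squared gradients
    m_hat = m / (1 - beta1**t)                   # bias correction for the zero init
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # step is rescaled per parameter
    return w, m, v

w0 = np.array([1.0, 1.0])
g0 = grad(w0)                                    # gradient components: 100 and 1

sgd_delta = sgd_step(w0, g0, lr=0.001) - w0      # proportional to the gradient
w1, m, v = adam_step(w0, g0, np.zeros(2), np.zeros(2), t=1, lr=0.001)
adam_delta = w1 - w0                             # close to -lr in both coordinates
print(sgd_delta, adam_delta)
```

SGD moves 100 times further along the steep coordinate than along the flat one, while Adam's first step has nearly the same size in both. That rescaling is what makes Adam less sensitive to the scale of individual gradients.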

### 2.4.2 Does Dropout improve performance?

Dropout is a regularization technique that helps prevent overfitting by randomly deactivating neurons during training:

- **Advantages**: reduces overfitting and improves generalization
- **Drawbacks**: can slightly reduce accuracy on the training set
- **Typical setting**: a dropout rate between 0.2 and 0.3 is generally a good compromise
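The mechanism can be sketched in a few lines of NumPy. This is a simplified version of "inverted" dropout, in which the surviving activations are rescaled by `1 / (1 - rate)` during training so that their expected value is unchanged and nothing needs to be done at inference time. The function name `dropout` and the all-ones input are placeholders for illustration.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest by 1/(1-rate)."""
    if not training or rate == 0.0:
        return x  # at inference time the layer is the identity
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 256))            # placeholder activations

dropped = dropout(activations, rate=0.25, rng=rng)
zero_fraction = np.mean(dropped == 0.0)       # close to 0.25
mean_activation = dropped.mean()              # close to 1.0 thanks to the rescaling
unchanged = dropout(activations, rate=0.25, rng=rng, training=False)
print(zero_fraction, mean_activation)
```

Because a different random mask is drawn at every training step, the network cannot rely on any single neuron, which is the source of the regularizing effect.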

### 2.4.3 Impact of adding more layers

- **More layers** means more learning capacity
- However, too many layers can lead to overfitting when data is limited
- Normalization layers (Batch Normalization) help stabilize the training of deeper networks
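As a sketch of what a Batch Normalization layer computes in training mode, the NumPy function below normalizes each feature over the batch and then applies a learned scale `gamma` and shift `beta`. It is a simplification: at inference time the layer uses moving averages of the mean and variance instead of batch statistics, which is omitted here. The input is random placeholder data and `batch_norm_train` is a hypothetical helper name.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature over the batch, then apply a learned scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))   # placeholder activations
out = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))

print(out.mean(axis=0))   # each feature is centered near 0
print(out.std(axis=0))    # each feature has a spread near 1
```

Keeping the inputs of each layer on a similar scale from batch to batch is what tends to make deeper networks easier to train.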

### 2.4.4 Conclusions

- An MLP with 2-3 hidden layers, ReLU activation, the Adam optimizer, and light Dropout generally works well on MNIST
- Normalizing the input data is crucial for good performance
- The choice of learning rate has a significant impact on convergence
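The effect of the learning rate on convergence can be seen even on a one-dimensional toy problem. The sketch below runs plain gradient descent on f(w) = w² with three learning rates. The function and the three values are chosen purely for illustration and are not the rates tested in this notebook.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) from w = 1.0 with a fixed learning rate."""
    for _ in range(steps):
        w = w - lr * 2.0 * w
    return w

too_small = gradient_descent(lr=0.001)   # barely moves: w stays close to 1
reasonable = gradient_descent(lr=0.1)    # shrinks steadily toward the minimum at 0
too_large = gradient_descent(lr=1.1)     # overshoots at every step and diverges

print(too_small, reasonable, too_large)
```

Each step multiplies `w` by `1 - 2 * lr`, so the iterate converges only when that factor has magnitude below 1. Real loss surfaces are more complicated, but the same trade-off between slow progress and instability applies.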

---
## 2.5 Applying the best model to Fashion-MNIST

Now let's apply our best architecture to the Fashion-MNIST dataset to see how well it carries over to a different problem.

In [None]:
# Load Fashion-MNIST
(x_train_fashion, y_train_fashion), (x_test_fashion, y_test_fashion) = fashion_mnist.load_data()

print(f"Fashion-MNIST - Training data shape: {x_train_fashion.shape}")
print(f"Fashion-MNIST - Test data shape: {x_test_fashion.shape}")

# Fashion-MNIST class names
fashion_class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                       'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

In [None]:
# Display a few Fashion-MNIST examples
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(x_train_fashion[i], cmap='gray')
    ax.set_title(f'{fashion_class_names[y_train_fashion[i]]}')
    ax.axis('off')
plt.suptitle('Examples from the Fashion-MNIST dataset', fontsize=16)
plt.tight_layout()
plt.show()

In [None]:
# Preprocess the Fashion-MNIST data (flatten and scale to [0, 1])
x_train_fashion_flat = x_train_fashion.reshape(-1, 784).astype('float32') / 255.0
x_test_fashion_flat = x_test_fashion.reshape(-1, 784).astype('float32') / 255.0

y_train_fashion_categorical = to_categorical(y_train_fashion, 10)
y_test_fashion_categorical = to_categorical(y_test_fashion, 10)

print(f"Preprocessed data - Train: {x_train_fashion_flat.shape}, Test: {x_test_fashion_flat.shape}")

In [None]:
# Build the best model (based on the experiments above)
# Here we use an MLP with Dropout 0.2 and Adam
best_model_fashion = models.Sequential([
    layers.Dense(256, activation='relu', input_shape=(784,)),
    layers.Dropout(0.2),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')
])

best_model_fashion.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("\n=== Training on Fashion-MNIST ===")
history_fashion = best_model_fashion.fit(
    x_train_fashion_flat, y_train_fashion_categorical,
    epochs=15,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)

In [None]:
# Evaluate on Fashion-MNIST
test_loss_fashion, test_accuracy_fashion = best_model_fashion.evaluate(
    x_test_fashion_flat, y_test_fashion_categorical, verbose=0
)

print("\n=== Results on Fashion-MNIST ===")
print(f"Test Loss: {test_loss_fashion:.4f}")
print(f"Test Accuracy: {test_accuracy_fashion:.4f}")

# Note: the MNIST figure is the base MLP (no Dropout, 10 epochs), so the gap
# also reflects the differences in architecture and training length.
print("\n=== MNIST vs Fashion-MNIST ===")
print(f"MNIST (base MLP) - Accuracy: {test_accuracy_mlp:.4f}")
print(f"Fashion-MNIST - Accuracy: {test_accuracy_fashion:.4f}")
print(f"Difference: {(test_accuracy_mlp - test_accuracy_fashion)*100:.2f} percentage points")

In [None]:
# Learning curves for Fashion-MNIST
plot_history(history_fashion, 'MLP on Fashion-MNIST')

In [None]:
# Confusion matrix for Fashion-MNIST
y_pred_fashion_proba = best_model_fashion.predict(x_test_fashion_flat, verbose=0)
y_pred_fashion = np.argmax(y_pred_fashion_proba, axis=1)

cm_fashion = confusion_matrix(y_test_fashion, y_pred_fashion)

plt.figure(figsize=(12, 10))
sns.heatmap(cm_fashion, annot=True, fmt='d', cmap='Blues', square=True,
            xticklabels=fashion_class_names, yticklabels=fashion_class_names)
plt.title('Confusion Matrix - Fashion-MNIST', fontsize=16)
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()

print("\nFashion-MNIST classification report:")
print(classification_report(y_test_fashion, y_pred_fashion, target_names=fashion_class_names))

In [None]:
# Examples of correct predictions
correct_indices = np.where(y_pred_fashion == y_test_fashion)[0]
print(f"Correct predictions: {len(correct_indices)} out of {len(y_test_fashion)}")

fig, axes = plt.subplots(2, 5, figsize=(15, 6))
for i, ax in enumerate(axes.flat):
    idx = correct_indices[i]
    ax.imshow(x_test_fashion[idx], cmap='gray')
    ax.set_title(f'Predicted: {fashion_class_names[y_pred_fashion[idx]]}\nTrue: {fashion_class_names[y_test_fashion[idx]]}',
                 color='green', fontsize=10)
    ax.axis('off')
plt.suptitle('Examples of correct predictions - Fashion-MNIST', fontsize=16)
plt.tight_layout()
plt.show()

In [None]:
# Misclassified examples on Fashion-MNIST
misclassified_fashion = np.where(y_pred_fashion != y_test_fashion)[0]
print(f"Misclassified examples: {len(misclassified_fashion)} out of {len(y_test_fashion)}")

num_display = min(20, len(misclassified_fashion))
fig, axes = plt.subplots(4, 5, figsize=(15, 12))
for i, ax in enumerate(axes.flat):
    if i < num_display:
        idx = misclassified_fashion[i]
        ax.imshow(x_test_fashion[idx], cmap='gray')
        ax.set_title(f'True: {fashion_class_names[y_test_fashion[idx]]}\nPredicted: {fashion_class_names[y_pred_fashion[idx]]}',
                     color='red', fontsize=9)
        ax.axis('off')
    else:
        ax.axis('off')
plt.suptitle('Misclassified examples - Fashion-MNIST', fontsize=16)
plt.tight_layout()
plt.show()

---
## Final conclusions

### Summary of what we learned

In this lab, we explored:

1. **Single-layer perceptron**: a simple model that reaches ~92% accuracy on MNIST
2. **Multi-Layer Perceptron**: adding hidden layers significantly improves performance (~97-98%)
3. **Optimization**: Adam converges faster than SGD on this type of problem
4. **Regularization**: Dropout and Batch Normalization help improve generalization
5. **Activation functions**: ReLU, PReLU, and LeakyReLU give similar results
6. **Fashion-MNIST**: harder than MNIST (accuracy ~88-90%) because the images are more complex

### Recommendations

For similar image classification problems:
- Use an MLP with 2-3 hidden layers
- Use the Adam optimizer with a learning rate of ~0.001
- Add light Dropout (0.2-0.25) to limit overfitting
- Normalize the input data
- Use ReLU as the default activation function

To improve performance further, one could explore:
- Convolutional neural networks (CNNs), which are better suited to images
- Data augmentation
- Deeper architectures
- Hyperparameter optimization via grid search or random search