Ilyas Ustun

# Neural Network Multiclass Classification Exercise Solutions

## Objective
In this exercise, you will build and train a neural network to classify wine types using the Wine dataset. You'll learn how to:

1. Load and explore classification data
2. Prepare data for neural network classification
3. Build a neural network for multiclass classification
4. Train and evaluate the classification model
5. Visualize classification results and performance metrics

## Dataset
We'll use the Wine dataset from sklearn, which contains chemical analysis of wines from three different cultivars. Our goal is to classify wines into one of three classes based on 13 chemical features.

## Task 1: Import Required Libraries

**Solution Explanation:**
We import all necessary libraries for data handling, model building, and visualization. Each library serves a specific purpose:
- `tensorflow`: For building and training neural networks
- `numpy`: For numerical operations
- `matplotlib` & `seaborn`: For creating visualizations
- `sklearn`: For dataset loading, preprocessing, and evaluation metrics
- `pandas`: For data manipulation and analysis

In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.utils.class_weight import compute_class_weight

# Set style for better plots
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Libraries imported successfully!")
print(f"TensorFlow version: {tf.__version__}")
print(f"NumPy version: {np.__version__}")

## Task 2: Load and Explore the Dataset

**Solution Explanation:**
We load the Wine dataset and explore its structure to understand what we're working with. This includes checking the shape of data, feature names, target classes, and their distributions.

In [None]:
# Load the Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

print("Dataset Information:")
print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")
print(f"Feature names: {wine.feature_names}")
print(f"Target names: {wine.target_names}")
print(f"Number of classes: {len(wine.target_names)}")
print(f"Dataset description: {wine.DESCR[:300]}...")

In [None]:
# Create DataFrame for easier exploration
df = pd.DataFrame(X, columns=wine.feature_names)
df['target'] = y
df['target_name'] = [wine.target_names[i] for i in y]

print("\nDataset Statistics:")
print(df.describe())

# Check class distribution
print("\nClass Distribution:")
class_counts = pd.Series(y).value_counts().sort_index()
for class_idx, count in class_counts.items():
    print(f"Class {class_idx} ({wine.target_names[class_idx]}): {count} samples ({count/len(y)*100:.1f}%)")

In [None]:
# Visualize class distribution and feature relationships
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Class distribution
axes[0, 0].bar(range(len(wine.target_names)), class_counts.values, 
               color=['skyblue', 'lightcoral', 'lightgreen'])
axes[0, 0].set_xlabel('Wine Class')
axes[0, 0].set_ylabel('Number of Samples')
axes[0, 0].set_title('Distribution of Wine Classes')
axes[0, 0].set_xticks(range(len(wine.target_names)))
axes[0, 0].set_xticklabels(wine.target_names, rotation=45)

# Feature correlation heatmap (first 8 features, chosen for readability, not ranked)
selected_features = ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 
                     'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols']
corr_matrix = df[selected_features].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0, 
            ax=axes[0, 1], fmt='.2f')
axes[0, 1].set_title('Feature Correlation Matrix (First 8 Features)')

# Scatter plot of two important features
colors = ['red', 'blue', 'green']
for i, target_name in enumerate(wine.target_names):
    mask = y == i
    axes[1, 0].scatter(X[mask, 0], X[mask, 6], 
                      c=colors[i], label=target_name, alpha=0.7)
axes[1, 0].set_xlabel('Alcohol')
axes[1, 0].set_ylabel('Flavanoids')
axes[1, 0].set_title('Wine Classes by Alcohol vs Flavanoids')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Box plot of alcohol content by class
df.boxplot(column='alcohol', by='target_name', ax=axes[1, 1])
fig.suptitle('')  # Clear the "Boxplot grouped by ..." title that pandas adds to the figure
axes[1, 1].set_title('Alcohol Content Distribution by Wine Class')
axes[1, 1].set_xlabel('Wine Class')
axes[1, 1].set_ylabel('Alcohol Content')

plt.tight_layout()
plt.show()

print("\nKey Observations:")
print("- Classes are only mildly imbalanced")
print("- Features show different scales (need normalization)")
print("- Clear separation between classes in alcohol vs flavanoids plot")
print("- Some features are correlated (multicollinearity present)")

## Task 3: Prepare the Data

**Solution Explanation:**
Data preparation for classification includes:
1. Split data into training (70%), validation (15%), and testing (15%) sets
2. Normalize features using StandardScaler
3. Convert target labels to categorical format (one-hot encoding) for neural networks
4. Calculate class weights to handle potential imbalance
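
The 70/15/15 split is done in two stages, so the second `test_size` has to be expressed as a share of what remains after the test set is removed. The short calculation below is only an illustration of that arithmetic; the variable names are made up for this sketch.

```python
# Fractions of the full dataset we want in the test and validation sets.
test_fraction = 0.15
val_fraction_of_total = 0.15

# After the test set is removed, 85% of the data remains, so the validation
# set must take a larger share of that remainder.
remaining = 1.0 - test_fraction
val_fraction_of_remaining = val_fraction_of_total / remaining
train_fraction_of_total = remaining * (1.0 - val_fraction_of_remaining)

print(f"Second-stage test_size: {val_fraction_of_remaining:.3f}")
print(f"Training share of the full dataset: {train_fraction_of_total:.2f}")
```

Because `train_test_split` works with whole samples, the realized percentages on a small dataset will differ slightly from these targets.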

In [None]:
# Split data into train, validation, and test sets
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42, stratify=y
)

X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.176, random_state=42, stratify=y_temp  # 0.176 * 0.85 ≈ 0.15
)

print("Data split completed:")
print(f"Training samples: {X_train.shape[0]} ({X_train.shape[0]/len(X)*100:.1f}%)")
print(f"Validation samples: {X_val.shape[0]} ({X_val.shape[0]/len(X)*100:.1f}%)")
print(f"Testing samples: {X_test.shape[0]} ({X_test.shape[0]/len(X)*100:.1f}%)")

# Check class distribution in each set
print("\nClass distribution in each set:")
for name, y_subset in [('Train', y_train), ('Validation', y_val), ('Test', y_test)]:
    counts = np.bincount(y_subset)
    print(f"{name}: {counts} -> {np.round(counts / len(y_subset) * 100, 1)} %")

In [None]:
# Normalize features using StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

print("Data normalization completed:")
print(f"Training set shape: {X_train_scaled.shape}")
print(f"Validation set shape: {X_val_scaled.shape}")
print(f"Test set shape: {X_test_scaled.shape}")

# Verify normalization
print(f"\nAfter normalization - Training data mean: {np.mean(X_train_scaled):.6f}")
print(f"After normalization - Training data std: {np.std(X_train_scaled):.6f}")
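
To make the fit-on-train, transform-everything-else pattern concrete, here is a minimal NumPy sketch of standardization on hypothetical toy data (not the Wine features). It assumes the population standard deviation (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np

# Hypothetical toy data: two features on very different scales.
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
val = np.array([[2.0, 400.0]])

# Statistics are computed from the training rows only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0)

train_scaled = (train - mean) / std
val_scaled = (val - mean) / std  # reuse the training statistics, do not refit

print("Train column means:", train_scaled.mean(axis=0))
print("Train column stds: ", train_scaled.std(axis=0))
print("Scaled validation row:", val_scaled)
```

The validation row is not centered at zero, and that is expected: it is measured against the training distribution, which keeps information from the validation and test sets out of preprocessing.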

In [None]:
# Convert labels to categorical (one-hot encoding)
num_classes = len(wine.target_names)
y_train_categorical = tf.keras.utils.to_categorical(y_train, num_classes)
y_val_categorical = tf.keras.utils.to_categorical(y_val, num_classes)
y_test_categorical = tf.keras.utils.to_categorical(y_test, num_classes)

print("Label encoding completed:")
print(f"Original labels shape: {y_train.shape}")
print(f"Categorical labels shape: {y_train_categorical.shape}")
print(f"Number of classes: {num_classes}")

# Show example of one-hot encoding
print("\nExample of one-hot encoding:")
print(f"Original label: {y_train[0]} ({wine.target_names[y_train[0]]})")
print(f"One-hot encoded: {y_train_categorical[0]}")

# Calculate class weights for handling imbalance
class_weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
class_weight_dict = dict(enumerate(class_weights))
print(f"\nClass weights: {class_weight_dict}")
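
As a rough illustration of what the two steps above compute, the sketch below builds one-hot vectors with `np.eye` and reproduces the `'balanced'` heuristic, `n_samples / (n_classes * count_per_class)`, on a hypothetical label vector (not the real Wine labels).

```python
import numpy as np

# Hypothetical labels with a mild imbalance: 4, 6, and 2 samples per class.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2])

counts = np.bincount(labels)
n_classes = len(counts)

# One-hot encoding: row i of the identity matrix encodes class i.
one_hot = np.eye(n_classes)[labels]

# "balanced" weights: rarer classes receive larger weights.
weights = len(labels) / (n_classes * counts)
class_weight_sketch = dict(enumerate(weights))

print(one_hot[4])           # encoding of a class-1 sample
print(class_weight_sketch)  # class 2 is rarest, so its weight is largest
```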

## Task 4: Build a Neural Network for Classification

**Solution Explanation:**
We create a Sequential model with:
- Input layer: 13 features (wine chemical properties)
- Hidden layer 1: 64 neurons with ReLU activation
- Hidden layer 2: 32 neurons with ReLU activation
- Hidden layer 3: 16 neurons with ReLU activation
- Output layer: 3 neurons with softmax activation for multiclass classification
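
A quick way to sanity-check the architecture is to count its trainable parameters by hand: each dense layer has `inputs * units` weights plus `units` biases. The helper below is a hypothetical name used only for this sketch; its total should agree with what `model.summary()` reports.

```python
# Layer widths: 13 input features, three hidden layers, 3 output classes.
layer_sizes = [13, 64, 32, 16, 3]

def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

per_layer = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(per_layer)

print(per_layer)      # [896, 2080, 528, 51]
print(total_params)   # 3555
```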

In [None]:
# Create a Sequential model for classification
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(num_classes, activation='softmax')  # Softmax for multiclass
])

print("Model created successfully!")
print("\nModel Architecture:")
print("- Input layer: 13 features (wine chemical properties)")
print("- Hidden layer 1: 64 neurons with ReLU activation")
print("- Hidden layer 2: 32 neurons with ReLU activation")
print("- Hidden layer 3: 16 neurons with ReLU activation")
print("- Output layer: 3 neurons with softmax activation (multiclass)")

# Display model architecture
model.summary()

## Task 5: Compile the Model

**Solution Explanation:**
Model compilation configures the training process:
- **Optimizer**: Adam (adaptive learning rate, works well for most problems)
- **Loss function**: Categorical crossentropy (standard for multiclass classification)
- **Metrics**: Accuracy (percentage of correct predictions)
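
To see what the loss and metric measure, the sketch below computes categorical crossentropy and accuracy by hand for three hypothetical predictions. It is a simplified illustration and leaves out implementation details such as numerical safeguards.

```python
import numpy as np

# Hypothetical softmax outputs for three samples (each row sums to 1).
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
])
# One-hot true labels: classes 0, 1, and 2.
one_hot = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

# Crossentropy per sample: minus the log-probability of the true class.
per_sample_loss = -np.sum(one_hot * np.log(probs), axis=1)
mean_loss = per_sample_loss.mean()

# Accuracy: fraction of samples whose most probable class is the true class.
accuracy = np.mean(np.argmax(probs, axis=1) == np.argmax(one_hot, axis=1))

print(f"Per-sample loss: {np.round(per_sample_loss, 3)}")
print(f"Mean loss: {mean_loss:.4f}, accuracy: {accuracy:.3f}")
```

The third sample is misclassified and also carries the largest loss, which shows how the two quantities relate without being the same thing.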

In [None]:
# Configure the model for training
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',    # For multiclass classification
    metrics=['accuracy']                # Classification accuracy
)

print("Model compiled successfully!")
print("\nCompilation settings:")
print("- Optimizer: Adam (adaptive learning rate)")
print("- Loss function: Categorical Crossentropy")
print("- Metrics: Accuracy")
print("\nKey differences from regression:")
print("- Loss: MSE → Categorical Crossentropy")
print("- Metrics: MAE → Accuracy")
print("- Output activation: None → Softmax")

## Task 6: Train the Model

**Solution Explanation:**
Training parameters:
- **Epochs**: 150 (number of complete passes through the training data)
- **Batch size**: 16 (smaller batch size for small dataset)
- **Validation data**: Separate validation set for monitoring
- **Class weights**: Handle class imbalance
- **Verbose**: 1 (show training progress)
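
Epochs and batch size together determine how many weight updates the model performs. The calculation below assumes a hypothetical training set of 124 samples; substitute `X_train.shape[0]` for the real value.

```python
import math

n_train = 124     # hypothetical training-set size (assumption for this sketch)
batch_size = 16
epochs = 150

# The last batch of each epoch is smaller when n_train is not a multiple of batch_size.
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch_size = n_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(f"Steps per epoch: {steps_per_epoch}")
print(f"Size of final batch: {last_batch_size}")
print(f"Total gradient updates: {total_updates}")
```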

In [None]:
# Train the model
print("Starting training...")
history = model.fit(
    X_train_scaled, y_train_categorical,
    epochs=150,
    batch_size=16,
    validation_data=(X_val_scaled, y_val_categorical),
    class_weight=class_weight_dict,
    verbose=1
)

print("\nTraining completed!")
print(f"Final training accuracy: {history.history['accuracy'][-1]:.4f}")
print(f"Final validation accuracy: {history.history['val_accuracy'][-1]:.4f}")

## Task 7: Visualize Training Progress

**Solution Explanation:**
Training visualization helps us understand:
- **Loss curves**: How well the model is learning (decreasing loss is good)
- **Accuracy curves**: Classification performance over time
- **Overfitting detection**: Gap between training and validation metrics
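
One simple way to read the curves numerically is to find the epoch with the lowest validation loss and compare the final training and validation losses. The function and the toy history below are hypothetical and only mimic the structure of `history.history`.

```python
def best_epoch_and_gap(history_dict):
    """Return the 0-indexed epoch with the lowest validation loss and the
    final gap between validation loss and training loss."""
    train_loss = history_dict["loss"]
    val_loss = history_dict["val_loss"]
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    final_gap = val_loss[-1] - train_loss[-1]
    return best_epoch, final_gap

# Made-up curves: training loss keeps falling while validation loss turns upward.
toy_history = {
    "loss": [1.0, 0.6, 0.3, 0.1, 0.05],
    "val_loss": [1.1, 0.7, 0.5, 0.6, 0.8],
}

best_epoch, final_gap = best_epoch_and_gap(toy_history)
print(f"Best validation epoch: {best_epoch}, final loss gap: {final_gap:.2f}")
```

A best epoch well before the end of training, combined with a growing gap, is the typical signature of overfitting.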

In [None]:
# Plot training history
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Plot loss
axes[0].plot(history.history['loss'], label='Training Loss', linewidth=2)
axes[0].plot(history.history['val_loss'], label='Validation Loss', linewidth=2)
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss (Categorical Crossentropy)')
axes[0].set_title('Model Loss During Training')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot accuracy
axes[1].plot(history.history['accuracy'], label='Training Accuracy', linewidth=2)
axes[1].plot(history.history['val_accuracy'], label='Validation Accuracy', linewidth=2)
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')
axes[1].set_title('Model Accuracy During Training')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Training analysis
final_train_loss = history.history['loss'][-1]
final_val_loss = history.history['val_loss'][-1]
final_train_acc = history.history['accuracy'][-1]
final_val_acc = history.history['val_accuracy'][-1]

print("Training Analysis:")
print(f"Final training loss: {final_train_loss:.4f}")
print(f"Final validation loss: {final_val_loss:.4f}")
print(f"Final training accuracy: {final_train_acc:.4f}")
print(f"Final validation accuracy: {final_val_acc:.4f}")
print(f"\nOverfitting indicators:")
print(f"Loss gap: {((final_val_loss - final_train_loss) / final_train_loss * 100):.1f}% higher validation loss")
print(f"Accuracy gap: {((final_train_acc - final_val_acc) / final_train_acc * 100):.1f}% lower validation accuracy")

## Task 8: Test the Model

**Solution Explanation:**
Model evaluation on unseen test data gives us the true performance:
- **Test Accuracy**: Percentage of correct predictions
- **Classification Report**: Precision, recall, and F1-score for each class
- **Confusion Matrix**: Shows prediction vs actual class distribution
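
For intuition, a confusion matrix can be built with a few lines of NumPy. The sketch below uses hypothetical labels and follows the same convention as `sklearn.metrics.confusion_matrix`: rows are actual classes, columns are predicted classes.

```python
import numpy as np

def confusion_matrix_sketch(y_true, y_pred, n_classes):
    """Count how often each actual class (row) is predicted as each class (column)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[actual, predicted] += 1
    return cm

# Hypothetical labels for seven test samples.
y_true_toy = np.array([0, 0, 1, 1, 1, 2, 2])
y_pred_toy = np.array([0, 1, 1, 1, 2, 2, 2])

cm_toy = confusion_matrix_sketch(y_true_toy, y_pred_toy, n_classes=3)
accuracy_toy = np.trace(cm_toy) / cm_toy.sum()  # correct predictions lie on the diagonal

print(cm_toy)
print(f"Accuracy: {accuracy_toy:.3f}")
```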

In [None]:
# Evaluate model on test set
test_loss, test_accuracy = model.evaluate(X_test_scaled, y_test_categorical, verbose=0)
print(f"Test Results:")
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.1f}%)")

# Make predictions
y_pred_proba = model.predict(X_test_scaled)
y_pred = np.argmax(y_pred_proba, axis=1)

print(f"\nPrediction Analysis:")
print(f"Correctly classified: {np.sum(y_pred == y_test)} out of {len(y_test)}")
print(f"Misclassified: {np.sum(y_pred != y_test)} out of {len(y_test)}")

In [None]:
# Detailed classification report
print("\nDetailed Classification Report:")
print(classification_report(y_test, y_pred, target_names=wine.target_names))

# Show first 15 predictions with confidence scores
print("\nFirst 15 Predictions with Confidence:")
print(f"{'Predicted':>9} | {'Actual':>9} | {'Confidence':>10} | {'Correct':>7}")
print("-" * 44)
for i in range(min(15, len(y_test))):
    pred_class = y_pred[i]
    actual_class = y_test[i]
    confidence = np.max(y_pred_proba[i])
    correct = "✓" if pred_class == actual_class else "✗"

    pred_name = wine.target_names[pred_class][:9]
    actual_name = wine.target_names[actual_class][:9]

    print(f"{pred_name:>9} | {actual_name:>9} | {confidence:>10.3f} | {correct:>7}")

## Task 9: Visualize Classification Results

**Solution Explanation:**
Result visualization helps us understand model performance:
- **Confusion Matrix**: Shows which classes are confused with each other
- **Classification Metrics**: Precision, recall, F1-score per class
- **Prediction Confidence**: Distribution of prediction probabilities
- **Confidence by Correctness**: Whether the model is less confident on the samples it misclassifies

In [None]:
# Create comprehensive visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# 1. Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=wine.target_names, yticklabels=wine.target_names,
            ax=axes[0, 0])
axes[0, 0].set_xlabel('Predicted')
axes[0, 0].set_ylabel('Actual')
axes[0, 0].set_title('Confusion Matrix')

# 2. Classification metrics comparison
report = classification_report(y_test, y_pred, target_names=wine.target_names, output_dict=True)
metrics = ['precision', 'recall', 'f1-score']
x_pos = np.arange(len(wine.target_names))
width = 0.25

for i, metric in enumerate(metrics):
    values = [report[class_name][metric] for class_name in wine.target_names]
    axes[0, 1].bar(x_pos + i * width, values, width, label=metric.capitalize())

axes[0, 1].set_xlabel('Wine Class')
axes[0, 1].set_ylabel('Score')
axes[0, 1].set_title('Classification Metrics by Class')
axes[0, 1].set_xticks(x_pos + width)
axes[0, 1].set_xticklabels(wine.target_names)
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# 3. Prediction confidence distribution
confidence_scores = np.max(y_pred_proba, axis=1)
axes[1, 0].hist(confidence_scores, bins=20, alpha=0.7, color='skyblue', edgecolor='black')
axes[1, 0].axvline(np.mean(confidence_scores), color='red', linestyle='--', 
                   label=f'Mean: {np.mean(confidence_scores):.3f}')
axes[1, 0].set_xlabel('Prediction Confidence')
axes[1, 0].set_ylabel('Frequency')
axes[1, 0].set_title('Distribution of Prediction Confidence')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# 4. Correct vs Incorrect predictions confidence
correct_mask = y_pred == y_test
correct_conf = confidence_scores[correct_mask]
incorrect_conf = confidence_scores[~correct_mask]

axes[1, 1].hist(correct_conf, bins=15, alpha=0.7, label='Correct', color='green')
axes[1, 1].hist(incorrect_conf, bins=15, alpha=0.7, label='Incorrect', color='red')
axes[1, 1].set_xlabel('Prediction Confidence')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].set_title('Confidence: Correct vs Incorrect Predictions')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Performance summary
print(f"\nModel Performance Summary:")
print(f"Test Accuracy: {test_accuracy:.3f} ({test_accuracy*100:.1f}%)")
print(f"Average Confidence: {np.mean(confidence_scores):.3f}")
print(f"Correct Predictions Confidence: {np.mean(correct_conf):.3f}")
if len(incorrect_conf) > 0:
    print(f"Incorrect Predictions Confidence: {np.mean(incorrect_conf):.3f}")
else:
    print("No incorrect predictions!")

## Task 10: Analyze Model Predictions

**Solution Explanation:**
Let's analyze which samples the model finds most difficult to classify (lowest prediction confidence) and compare accuracy and average confidence for each class.

In [None]:
# Analyze difficult predictions (lowest confidence)
confidence_indices = np.argsort(confidence_scores)
print("Most Uncertain Predictions (Lowest Confidence):")
print("Index | Predicted | Actual | Confidence | Features")
print("-" * 70)

for i in range(min(10, len(confidence_indices))):
    idx = confidence_indices[i]
    pred_class = y_pred[idx]
    actual_class = y_test[idx]
    confidence = confidence_scores[idx]
    
    # Show the 3 features whose standardized values are furthest from the training mean
    sample_features = X_test_scaled[idx]
    top_features_idx = np.argsort(np.abs(sample_features))[-3:]
    
    print(f"{idx:>5} | {wine.target_names[pred_class]:>9} | {wine.target_names[actual_class]:>6} | {confidence:>8.3f} | ", end="")
    for j, feat_idx in enumerate(top_features_idx):
        if j > 0:
            print(", ", end="")
        print(f"{wine.feature_names[feat_idx][:8]}:{sample_features[feat_idx]:.2f}", end="")
    print()

# Analyze class-wise performance
print("\nClass-wise Performance Analysis:")
for i, class_name in enumerate(wine.target_names):
    class_mask = y_test == i
    class_accuracy = np.mean(y_pred[class_mask] == y_test[class_mask])
    class_confidence = np.mean(confidence_scores[class_mask])
    print(f"{class_name}: Accuracy={class_accuracy:.3f}, Avg Confidence={class_confidence:.3f}")

## Bonus Challenge: Improved Model with Regularization

**Solution Explanation:**
Let's try to improve our model with:
- **Dropout layers**: Prevent overfitting by randomly setting some neurons to zero
- **Batch normalization**: Normalize each layer's inputs over the mini-batch to stabilize training
- **Early stopping**: Stop training when validation loss stops improving and restore the best weights
- **Learning rate scheduling**: Reduce the learning rate when validation loss plateaus
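
The patience mechanism behind early stopping can be sketched in plain Python. This is a simplified illustration of the idea, not the Keras implementation; the function name and the toy loss values are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: the minimum is at epoch 2.
toy_val_losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.7]
stop_epoch, best_epoch = early_stopping_epoch(toy_val_losses, patience=3)
print(f"Stopped at epoch {stop_epoch}, best weights from epoch {best_epoch}")
```

With `restore_best_weights=True`, the model returned after stopping corresponds to the best epoch rather than the last one trained.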

In [None]:
print("Building improved model with regularization...")

# Build improved model with regularization
improved_model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    
    tf.keras.layers.Dense(num_classes, activation='softmax')
])

# Compile with an explicit learning rate (0.001 is also Adam's default)
improved_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

improved_model.summary()

In [None]:
# Set up callbacks for improved training
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=20,
    restore_best_weights=True,
    verbose=1
)

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=10,
    min_lr=0.00001,
    verbose=1
)

print("Training improved model with callbacks...")
improved_history = improved_model.fit(
    X_train_scaled, y_train_categorical,
    epochs=200,
    batch_size=16,
    validation_data=(X_val_scaled, y_val_categorical),
    class_weight=class_weight_dict,
    callbacks=[early_stopping, reduce_lr],
    verbose=1
)

print(f"\nTraining stopped after {len(improved_history.history['loss'])} epochs")

In [None]:
# Evaluate improved model
improved_test_loss, improved_test_accuracy = improved_model.evaluate(X_test_scaled, y_test_categorical, verbose=0)
improved_y_pred_proba = improved_model.predict(X_test_scaled)
improved_y_pred = np.argmax(improved_y_pred_proba, axis=1)

print(f"Model Comparison:")
print(f"Original Model  - Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.1f}%)")
print(f"Improved Model  - Test Accuracy: {improved_test_accuracy:.4f} ({improved_test_accuracy*100:.1f}%)")
print(f"\nRelative change in test accuracy: {((improved_test_accuracy - test_accuracy) / test_accuracy * 100):+.1f}%")

# Detailed comparison
print("\nDetailed Comparison:")
print("\nOriginal Model Classification Report:")
print(classification_report(y_test, y_pred, target_names=wine.target_names))

print("\nImproved Model Classification Report:")
print(classification_report(y_test, improved_y_pred, target_names=wine.target_names))

In [None]:
# Compare training histories
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Training loss comparison
axes[0, 0].plot(history.history['loss'], label='Original Model', linewidth=2)
axes[0, 0].plot(improved_history.history['loss'], label='Improved Model', linewidth=2)
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Training Loss')
axes[0, 0].set_title('Training Loss Comparison')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Validation loss comparison
axes[0, 1].plot(history.history['val_loss'], label='Original Model', linewidth=2)
axes[0, 1].plot(improved_history.history['val_loss'], label='Improved Model', linewidth=2)
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('Validation Loss')
axes[0, 1].set_title('Validation Loss Comparison')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Training accuracy comparison
axes[1, 0].plot(history.history['accuracy'], label='Original Model', linewidth=2)
axes[1, 0].plot(improved_history.history['accuracy'], label='Improved Model', linewidth=2)
axes[1, 0].set_xlabel('Epoch')
axes[1, 0].set_ylabel('Training Accuracy')
axes[1, 0].set_title('Training Accuracy Comparison')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Validation accuracy comparison
axes[1, 1].plot(history.history['val_accuracy'], label='Original Model', linewidth=2)
axes[1, 1].plot(improved_history.history['val_accuracy'], label='Improved Model', linewidth=2)
axes[1, 1].set_xlabel('Epoch')
axes[1, 1].set_ylabel('Validation Accuracy')
axes[1, 1].set_title('Validation Accuracy Comparison')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Summary Questions - Test Your Understanding

Try to answer these questions to check your understanding:

### 1. What does the softmax activation function do in the output layer?

**Answer:** The softmax function converts the raw output scores (logits) into non-negative values that sum to 1.0, so each output can be read as the predicted probability of one class. This suits multiclass problems where every sample belongs to exactly one class.
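
A minimal NumPy version of softmax is shown below as an illustration. Subtracting the maximum logit first is a common stability trick that does not change the result; the logits are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw scores for three classes
probs = softmax(logits)

print(np.round(probs, 3))  # [0.659 0.242 0.099]
print(probs.sum())
```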

### 2. Why do we use categorical crossentropy instead of mean squared error for classification?

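**Answer:** Categorical crossentropy compares the predicted probability distribution with the true one and depends only on the probability assigned to the correct class. Its penalty grows without bound as that probability approaches zero, so confident wrong predictions are punished heavily. MSE penalties on probabilities are bounded, and combined with softmax they produce small gradients when the model is confidently wrong, which slows learning.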
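
The difference is easy to see numerically. The sketch below compares both losses for a three-class sample as the probability assigned to the correct class shrinks; the helper name and the even split of the remaining probability are assumptions made for illustration.

```python
import numpy as np

def losses_for(p_correct):
    """Crossentropy and MSE for a 3-class sample whose true class is 0."""
    target = np.array([1.0, 0.0, 0.0])
    rest = (1.0 - p_correct) / 2.0          # spread the remainder evenly
    probs = np.array([p_correct, rest, rest])
    crossentropy = -np.sum(target * np.log(probs))
    mse = np.mean((target - probs) ** 2)
    return crossentropy, mse

for p in (0.9, 0.5, 0.1, 0.01):
    ce, mse = losses_for(p)
    print(f"p(correct)={p:<4}  crossentropy={ce:.3f}  mse={mse:.3f}")
```

Crossentropy keeps growing as the correct-class probability approaches zero, while the squared error stays below 1.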

### 3. What does a confusion matrix tell us about our model's performance?

**Answer:** A confusion matrix shows exactly which classes are being confused with each other. The diagonal elements represent correct predictions, while off-diagonal elements show misclassifications. It helps identify which classes are hardest to distinguish.

### 4. What is the difference between precision and recall?

**Answer:** 
- **Precision**: Of all positive predictions, how many were actually correct? (TP / (TP + FP))
- **Recall**: Of all actual positives, how many did we correctly identify? (TP / (TP + FN))
- High precision = few false positives; High recall = few false negatives
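
In the multiclass case these formulas are applied per class, treating that class as "positive" and all others as "negative". The sketch below derives both from a hypothetical confusion matrix whose rows are actual classes and whose columns are predicted classes.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
cm = np.array([
    [1, 1, 0],
    [0, 2, 1],
    [0, 0, 2],
])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # column sums = TP + FP
recall = true_positives / cm.sum(axis=1)     # row sums = TP + FN

for cls, (p, r) in enumerate(zip(precision, recall)):
    print(f"class {cls}: precision={p:.3f}, recall={r:.3f}")
```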

### 5. How do dropout and batch normalization help improve model performance?

**Answer:** 
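- **Dropout**: Randomly sets a fraction of activations to zero during training (it is disabled at inference), which prevents the network from relying on specific neurons
- **Batch Normalization**: Normalizes the inputs to each layer over the mini-batch, stabilizing training and allowing higher learning rates
- Dropout is an explicit regularizer; batch normalization mainly stabilizes optimization and adds a mild regularizing effect through batch-to-batch noise. Both tend to improve generalization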
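
The training-time behavior of dropout can be sketched in NumPy as "inverted dropout": surviving activations are scaled by `1 / (1 - rate)` so their expected value is unchanged. This is an illustrative sketch with a hypothetical function name, not the Keras layer itself.

```python
import numpy as np

def dropout_train(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the survivors."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((4, 5))  # hypothetical layer output
dropped = dropout_train(activations, rate=0.3, rng=rng)

print(dropped)
print(f"Fraction zeroed: {np.mean(dropped == 0):.2f}")
```

At inference time no units are dropped and no rescaling is needed, which is why the scaling is applied during training.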

## Bonus: Feature Importance Analysis

**Solution Explanation:**
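Let's estimate which features contribute most to the model's predictions using a simple technique: the sum of absolute first-layer weights for each input feature. This is only a rough heuristic. It ignores everything that happens in later layers, and the weights are comparable across features only because the inputs were standardized.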

In [None]:
# Simple feature importance analysis
# We'll use the weights from the first layer as a proxy for feature importance
first_layer_weights = improved_model.layers[0].get_weights()[0]  # Shape: (13, 128)

# Calculate feature importance as the sum of absolute weights
feature_importance = np.sum(np.abs(first_layer_weights), axis=1)

# Create DataFrame for easier visualization
importance_df = pd.DataFrame({
    'feature': wine.feature_names,
    'importance': feature_importance
}).sort_values('importance', ascending=False)

# Plot feature importance
plt.figure(figsize=(12, 8))
plt.barh(range(len(importance_df)), importance_df['importance'])
plt.yticks(range(len(importance_df)), importance_df['feature'])
plt.xlabel('Feature Importance (Sum of Absolute Weights)')
plt.title('Feature Importance Analysis')
plt.gca().invert_yaxis()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("Top 5 Most Important Features:")
for i, (_, row) in enumerate(importance_df.head().iterrows()):
    print(f"{i+1}. {row['feature']}: {row['importance']:.3f}")

print("\nLeast Important Features:")
for _, row in importance_df.tail(3).iterrows():
    print(f"{row['feature']}: {row['importance']:.3f}")

## Key Takeaways

🎯 **Main Learning Points:**

1. **Classification vs Regression**: Different output layers (softmax vs linear), loss functions (categorical crossentropy vs MSE), and metrics (accuracy vs MAE)

2. **Data preparation is crucial**: Stratified splits maintain class distribution, normalization helps convergence, one-hot encoding enables multiclass classification

3. **Evaluation is multifaceted**: Accuracy, precision, recall, F1-score, and confusion matrices provide different insights

4. **Regularization reduces overfitting**: Dropout and early stopping improve generalization, and batch normalization stabilizes training

5. **Visualization aids understanding**: Training curves, confusion matrices, and confidence distributions reveal model behavior

6. **Class imbalance matters**: Class weights help handle uneven distributions

7. **Model comparison guides improvement**: Systematic comparison helps identify better architectures and hyperparameters

**🎉 Classification exercise completed successfully!** You now have a solid foundation in neural network classification using TensorFlow and Keras.

**Next Steps:**
- Try different datasets (Iris, Digits, etc.)
- Experiment with different architectures
- Explore advanced techniques (ensemble methods, transfer learning)
- Practice with real-world datasets