# 🏋️ Day 4.2 - Full CNN Training (10-15 Epochs)

## 🎯 Learning Objectives

In this notebook, you'll:
1. **Train the complete CNN model** for 10-15 epochs
2. **Reuse the Day 4.1 callbacks** (early stopping, checkpointing, learning-rate scheduling, logging)
3. **Monitor training progress** via the Keras progress bar, a CSV log, and a learning-rate tracker
4. **Analyze training curves** to understand model behavior
5. **Save the final trained model** for evaluation

## 🎯 Training Configuration

### Hyperparameters:

```python
EPOCHS = 15           # maximum
BATCH_SIZE = 32
LEARNING_RATE = 1e-4  # initial
PATIENCE = 3          # early stopping
```

### Expected Outcomes:

- **Training accuracy**: 85-90%
- **Validation accuracy**: 78-85%
- **Training time**: 5-10 minutes on a GTX 1650 GPU (20-30 minutes on CPU)
- **Early stopping**: Around epoch 10-12

---

In [None]:
## 🔧 Setup

import os
import sys
import time
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow.keras.callbacks import (
    EarlyStopping,
    ModelCheckpoint,
    ReduceLROnPlateau,
    CSVLogger,
    Callback
)
from datetime import datetime

# Make the project root (two levels up) importable so `src.modeling` resolves
sys.path.insert(0, '../..')
from src.modeling.data_generator import create_train_generator, create_val_test_generator
from src.modeling.model_cnn import build_cnn_model, enable_gpu_memory_growth, print_model_info

# Check TensorFlow GPU support
print(f"TensorFlow version: {tf.__version__}")
print(f"GPU Available: {tf.config.list_physical_devices('GPU')}")
print(f"CUDA Built: {tf.test.is_built_with_cuda()}")

# Enable GPU memory growth
enable_gpu_memory_growth()

# Set style
sns.set_style('white')
plt.rcParams['figure.figsize'] = (14, 5)

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print("\n✅ Libraries imported successfully")
print(f"⏰ Start time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

2025-10-22 14:38:42.838738: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


TensorFlow version: 2.20.0
GPU Available: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
CUDA Built: True
⚠️  Training on CPU (GPU has insufficient memory for class-weighted training)
⏱️  Expected time: 20-30 minutes (slower but will complete successfully)

✅ Libraries imported successfully
⏰ Start time: 2025-10-22 14:38:44
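The setup output announces CPU training, yet the model-building and training logs below show TensorFlow still creating `GPU:0` with only 154 MB free and the first epoch failing with a cuDNN out-of-memory error. The body of `enable_gpu_memory_growth` isn't shown here, so this is only a sketch of one common way to guarantee CPU-only training, assuming that is the intent: hide the GPUs with the `CUDA_VISIBLE_DEVICES` environment variable before TensorFlow is imported (restart the kernel first). `force_cpu_only` is a hypothetical helper, not part of `src.modeling`. An in-process alternative is calling `tf.config.set_visible_devices([], 'GPU')` right after importing TensorFlow and before any tensors or models are created.

```python
import os


def force_cpu_only() -> None:
    """Hide every CUDA GPU from frameworks started after this call.

    Hypothetical helper. It only takes effect if it runs before TensorFlow
    is imported in a fresh kernel; once CUDA has been initialized,
    changing the variable does nothing.
    """
    # "-1" is not a valid device index, so CUDA exposes no GPUs at all
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


force_cpu_only()
# import tensorflow as tf  # import TensorFlow only after hiding the GPUs
```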


In [2]:
## 📂 Setup Directories

# Define directories
MODELS_DIR = '../../outputs/models'
LOGS_DIR = '../../outputs/logs'
VIZ_DIR = '../../outputs/visualizations'
METRICS_DIR = '../../outputs/metrics'
CONFIGS_DIR = '../../outputs/configs'

# Create directories
for directory in [MODELS_DIR, LOGS_DIR, VIZ_DIR, METRICS_DIR, CONFIGS_DIR]:
    os.makedirs(directory, exist_ok=True)

print("✅ All output directories ready")

✅ All output directories ready


In [3]:
## ⚙️ Training Configuration

# Training hyperparameters
EPOCHS = 15
BATCH_SIZE = 32  # Back to optimal size for CPU training
LEARNING_RATE = 1e-4
INPUT_SHAPE = (128, 128, 1)
NUM_CLASSES = 3
PATIENCE = 3

# Data paths
TRAIN_CSV = '../../outputs/data_splits/train_split.csv'
VAL_CSV = '../../outputs/data_splits/val_split.csv'

# Output paths
CHECKPOINT_PATH = os.path.join(MODELS_DIR, 'model_cnn_best.h5')
FINAL_MODEL_PATH = os.path.join(MODELS_DIR, 'model_cnn_final.h5')
CSV_LOG_PATH = os.path.join(LOGS_DIR, 'training_log_full.csv')
HISTORY_PATH = os.path.join(LOGS_DIR, 'training_history_full.json')
MODEL_SUMMARY_PATH = os.path.join(CONFIGS_DIR, 'model_summary_final.txt')

print("✅ Training configuration set")
print(f"\n📋 Hyperparameters:")
print(f"   Epochs (max): {EPOCHS}")
print(f"   Batch size: {BATCH_SIZE}")
print(f"   Learning rate: {LEARNING_RATE}")
print(f"   Early stopping patience: {PATIENCE}")
print(f"   Input shape: {INPUT_SHAPE}")
print(f"   Output classes: {NUM_CLASSES}")

✅ Training configuration set

📋 Hyperparameters:
   Epochs (max): 15
   Batch size: 32
   Learning rate: 0.0001
   Early stopping patience: 3
   Input shape: (128, 128, 1)
   Output classes: 3


In [4]:
## 📊 Load Data Generators

print("📂 Loading data generators...\n")

# Training generator (with augmentation)
train_generator = create_train_generator(
    csv_path=TRAIN_CSV,
    batch_size=BATCH_SIZE,
    target_size=(128, 128)
)

print()  # Add spacing

# Validation generator (no augmentation)
val_generator = create_val_test_generator(
    csv_path=VAL_CSV,
    batch_size=BATCH_SIZE,
    shuffle=False
)

# Calculate steps per epoch
steps_per_epoch = len(train_generator)
validation_steps = len(val_generator)

print(f"\n📊 Training Configuration:")
print(f"   Steps per epoch: {steps_per_epoch}")
print(f"   Validation steps: {validation_steps}")
print(f"   Total training samples: {train_generator.n}")
print(f"   Total validation samples: {val_generator.n}")

📂 Loading data generators...

Found 2059 validated image filenames belonging to 3 classes.
✅ Training generator created
   Images: 2059
   Batches: 65
   Classes: {'1': 0, '2': 1, '3': 2}

Found 325 validated image filenames belonging to 3 classes.
✅ Generator created
   Images: 325
   Batches: 11
   Shuffle: False

📊 Training Configuration:
   Steps per epoch: 65
   Validation steps: 11
   Total training samples: 2059
   Total validation samples: 325
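The reported batch counts follow from ceiling division, since the last batch of each epoch may be partial: 2059 / 32 = 64.3 rounds up to 65 training steps, and 325 / 32 = 10.2 rounds up to 11 validation steps. A quick check (the helper name is ours, for illustration):

```python
import math


def steps_for(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to cover every sample once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


train_steps = steps_for(2059, 32)  # 64 full batches + 1 partial batch of 11 images
val_steps = steps_for(325, 32)     # 10 full batches + 1 partial batch of 5 images
print(train_steps, val_steps)      # 65 11
```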


In [5]:
## 🏗️ Build Model

print("🏗️ Building CNN model...\n")

model = build_cnn_model(
    input_shape=INPUT_SHAPE,
    num_classes=NUM_CLASSES,
    learning_rate=LEARNING_RATE
)

# Print detailed model info
print_model_info(model)

# Save model summary to file
with open(MODEL_SUMMARY_PATH, 'w') as f:
    model.summary(print_fn=lambda x: f.write(x + '\n'))
print(f"\n✅ Model summary saved: {MODEL_SUMMARY_PATH}")

🏗️ Building CNN model...



I0000 00:00:1761124124.876746   98017 gpu_device.cc:2020] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 154 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5
I0000 00:00:1761124124.892025   98017 cuda_executor.cc:508] failed to allocate 154.12MiB (161611776 bytes) from device: RESOURCE_EXHAUSTED: : CUDA_ERROR_OUT_OF_MEMORY: out of memory


✅ Model built and compiled successfully
   Input shape: (128, 128, 1)
   Output classes: 3
   Learning rate: 0.0001

MODEL ARCHITECTURE SUMMARY



PARAMETER DETAILS
Total parameters: 4,287,491
Estimated size: ~16.4 MB (float32)

LAYER-BY-LAYER BREAKDOWN
1. conv1           | Output: (None, 128, 128, 32)      | Params: 320
2. pool1           | Output: (None, 64, 64, 32)        | Params: 0
3. conv2           | Output: (None, 64, 64, 64)        | Params: 18,496
4. pool2           | Output: (None, 32, 32, 64)        | Params: 0
5. conv3           | Output: (None, 32, 32, 128)       | Params: 73,856
6. pool3           | Output: (None, 16, 16, 128)       | Params: 0
7. flatten         | Output: (None, 32768)             | Params: 0
8. dense1          | Output: (None, 128)               | Params: 4,194,432
9. dropout         | Output: (None, 128)               | Params: 0
10. output          | Output: (None, 3)                 | Params: 387



✅ Model summary saved: ../../outputs/configs/model_summary_final.txt
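The counts in the layer breakdown can be reproduced by hand: a 3×3 convolution has `3·3·C_in·C_out + C_out` parameters and a dense layer has `in·out + out`. The plain-Python sketch below (layer sizes copied from the table) shows that `dense1` alone holds about 98% of the 4,287,491 parameters, because it connects all 16·16·128 = 32,768 flattened features to 128 units.

```python
def conv2d_params(k: int, c_in: int, c_out: int) -> int:
    """Weights (k*k*c_in*c_out) plus one bias per output channel."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out


layers = {
    "conv1": conv2d_params(3, 1, 32),
    "conv2": conv2d_params(3, 32, 64),
    "conv3": conv2d_params(3, 64, 128),
    "dense1": dense_params(16 * 16 * 128, 128),  # flatten output is 32,768 features
    "output": dense_params(128, 3),
}
total = sum(layers.values())
share_dense1 = layers["dense1"] / total

print(layers)
print(f"total={total:,}  dense1 share={share_dense1:.1%}  float32 size={total * 4 / 2**20:.1f} MiB")
```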


## 🎛️ Setup Callbacks

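Using the same callbacks as Day 4.1:
1. **EarlyStopping** - Stop when `val_loss` has not improved for 3 epochs and restore the best weights
2. **ModelCheckpoint** - Save the model with the highest `val_accuracy`
3. **ReduceLROnPlateau** - Halve the learning rate after 2 epochs without `val_accuracy` improvement (floor: 1e-7)
4. **CSVLogger** - Log per-epoch metrics to CSV
5. **LearningRateTracker** - Custom callback that records the learning rate after each epoch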
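To see how `ReduceLROnPlateau(monitor='val_accuracy', factor=0.5, patience=2, min_lr=1e-7, mode='max')` reacts, here is a simplified plain-Python re-implementation of its plateau rule. It assumes Keras's default `min_delta=1e-4` and `cooldown=0`; the `val_accuracy` sequence is made up for illustration and is not a result from this notebook.

```python
def simulate_reduce_lr(val_acc, lr=1e-4, factor=0.5, patience=2,
                       min_lr=1e-7, min_delta=1e-4):
    """Return the learning rate in effect after each epoch (mode='max')."""
    best = float("-inf")
    wait = 0
    history = []
    for acc in val_acc:
        if acc > best + min_delta:          # counts as an improvement
            best, wait = acc, 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)  # halve, but never below the floor
                wait = 0
        history.append(lr)
    return history


# Hypothetical validation accuracies, for illustration only
demo_val_acc = [0.60, 0.70, 0.72, 0.72, 0.71, 0.75, 0.75, 0.74]
lrs = simulate_reduce_lr(demo_val_acc)
for epoch, lr in enumerate(lrs, start=1):
    print(f"epoch {epoch}: lr={lr:.1e}")
```

The sketch records the already-reduced rate for the epoch in which a reduction happens, which matches what `LearningRateTracker` sees, because `reduce_lr` runs before `lr_tracker` in `callbacks_list`.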

In [6]:
# Custom LR Tracker
class LearningRateTracker(Callback):
    """Track learning rate changes during training."""
    
    def __init__(self):
        super().__init__()
        self.lr_history = []
        self.epochs = []
    
    def on_epoch_end(self, epoch, logs=None):
        lr = float(tf.keras.backend.get_value(self.model.optimizer.learning_rate))
        self.lr_history.append(lr)
        self.epochs.append(epoch + 1)
        logs['lr'] = lr

# Initialize callbacks
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=PATIENCE,
    verbose=1,
    restore_best_weights=True,
    mode='min'
)

model_checkpoint = ModelCheckpoint(
    filepath=CHECKPOINT_PATH,
    monitor='val_accuracy',
    save_best_only=True,
    save_weights_only=False,
    mode='max',
    verbose=1
)

reduce_lr = ReduceLROnPlateau(
    monitor='val_accuracy',
    factor=0.5,
    patience=2,
    min_lr=1e-7,
    verbose=1,
    mode='max'
)

csv_logger = CSVLogger(
    filename=CSV_LOG_PATH,
    separator=',',
    append=False
)

lr_tracker = LearningRateTracker()

# Combine all callbacks
callbacks_list = [
    early_stopping,
    model_checkpoint,
    reduce_lr,
    csv_logger,
    lr_tracker
]

print("✅ All callbacks configured")
print(f"\nCallbacks active: {len(callbacks_list)}")
for i, cb in enumerate(callbacks_list, 1):
    print(f"   {i}. {type(cb).__name__}")

✅ All callbacks configured

Callbacks active: 5
   1. EarlyStopping
   2. ModelCheckpoint
   3. ReduceLROnPlateau
   4. CSVLogger
   5. LearningRateTracker


## ⚖️ Calculate Class Weights

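Balanced class weights counter the imbalance in the training set (Glioma is more than twice as common as Meningioma), so that misclassified Meningioma images cost more in the loss and the model is pushed to detect them better.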
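With `class_weight='balanced'`, scikit-learn sets each weight to `n_samples / (n_classes * n_c)`, where `n_c` is the number of training images in class c. Plugging in the training counts printed by the next cell (446 Meningioma, 966 Glioma, 647 Pituitary) reproduces its weights without scikit-learn:

```python
counts = {"Meningioma": 446, "Glioma": 966, "Pituitary": 647}  # training-split counts
n_samples = sum(counts.values())   # 2059
n_classes = len(counts)            # 3

# 'balanced' heuristic: rarer classes get proportionally larger weights
weights = {name: n_samples / (n_classes * n) for name, n in counts.items()}
for name, w in weights.items():
    print(f"{name:11s} {w:.4f}x")
```

By construction, every class then carries the same total weight: 446 × 1.5389 ≈ 966 × 0.7105 ≈ 647 × 1.0608 ≈ 2059 / 3 ≈ 686.3.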

In [7]:
from sklearn.utils.class_weight import compute_class_weight

# Load training data to compute class distribution
train_df = pd.read_csv(TRAIN_CSV)
class_counts = train_df['label'].value_counts().sort_index()

print("📊 Training Set Class Distribution:")
for label, count in class_counts.items():
    tumor_type = {1: 'Meningioma', 2: 'Glioma', 3: 'Pituitary'}[label]
    percentage = (count / len(train_df)) * 100
    print(f"   Class {label} ({tumor_type:11s}): {count:4d} samples ({percentage:.1f}%)")

# Compute class weights (inverse of frequency)
class_weights_array = compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_df['label']),
    y=train_df['label']
)

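# Key the weights by the generator's internal class index (e.g. label 1 -> index 0)
# using class_indices, so they stay aligned with the model's outputs
class_weights = {
    train_generator.class_indices[str(label)]: weight
    for label, weight in zip(np.unique(train_df['label']), class_weights_array)
}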

print(f"\n⚖️ Calculated Class Weights:")
for class_idx, weight in class_weights.items():
    tumor_type = {0: 'Meningioma', 1: 'Glioma', 2: 'Pituitary'}[class_idx]
    print(f"   Class {class_idx} ({tumor_type:11s}): {weight:.4f}x")

print(f"\n💡 This will make the model focus more on underrepresented classes!")

📊 Training Set Class Distribution:
   Class 1 (Meningioma ):  446 samples (21.7%)
   Class 2 (Glioma     ):  966 samples (46.9%)
   Class 3 (Pituitary  ):  647 samples (31.4%)

⚖️ Calculated Class Weights:
   Class 0 (Meningioma ): 1.5389x
   Class 1 (Glioma     ): 0.7105x
   Class 2 (Pituitary  ): 1.0608x

💡 This will make the model focus more on underrepresented classes!


## 🚀 Start Full Training

**Expect roughly 5-10 minutes on a GTX 1650 GPU, or 20-30 minutes on CPU**

Watch for:
- Training accuracy increasing
- Validation accuracy improving (especially Meningioma!)
- Early stopping triggering around epoch 10-12
- Learning rate reductions
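Whether early stopping fires around epoch 10-12 depends entirely on `val_loss`. The plain-Python sketch below mimics the rule configured here (`monitor='val_loss'`, `patience=3`, Keras's default `min_delta=0`): every strict improvement resets the counter, training stops once 3 epochs pass without one, and when stopping triggers, `restore_best_weights=True` rolls back to the best epoch. The loss values are invented for illustration.

```python
def simulate_early_stopping(val_loss, patience=3):
    """Return (stop epoch or None, best epoch), both 1-based."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_loss, start=1):
        wait += 1
        if loss < best:                  # strict improvement resets the counter
            best, best_epoch, wait = loss, epoch, 0
            continue
        if wait >= patience:             # patience exhausted: stop here
            return epoch, best_epoch
    return None, best_epoch              # patience never ran out within the epoch budget


# Hypothetical validation losses, for illustration only
demo_val_loss = [0.90, 0.75, 0.66, 0.61, 0.58, 0.57, 0.59, 0.58, 0.60, 0.55]
stopped, best = simulate_early_stopping(demo_val_loss)
print(f"stopped after epoch {stopped}, best weights from epoch {best}")
```

Here training stops after epoch 9 and rolls back to epoch 6, so the lower loss at epoch 10 is never seen; a larger `patience` trades longer runs for that chance.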

In [None]:
print("="*70)
print("🏋️ STARTING FULL TRAINING WITH CLASS WEIGHTS")
print("="*70)
print(f"⏰ Start time: {datetime.now().strftime('%H:%M:%S')}")
print(f"📊 Max epochs: {EPOCHS}")
print(f"⚡ Early stopping patience: {PATIENCE} epochs")
print(f"💾 Best model will be saved to: {CHECKPOINT_PATH}")
print(f"⚖️ Using balanced class weights")
print("="*70 + "\n")

# Start timer
start_time = time.time()

# Train the model WITH CLASS WEIGHTS
history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=EPOCHS,
    callbacks=callbacks_list,
    class_weight=class_weights,  # up-weights under-represented classes (Meningioma most)
    verbose=1
)

# Calculate training time
training_time = time.time() - start_time
minutes = int(training_time // 60)
seconds = int(training_time % 60)

print("\n" + "="*70)
print("✅ TRAINING COMPLETED")
print("="*70)
print(f"⏰ End time: {datetime.now().strftime('%H:%M:%S')}")
print(f"⌛ Total time: {minutes}m {seconds}s")
print(f"📈 Epochs completed: {len(history.history['loss'])}")
print("="*70)

🏋️ STARTING FULL TRAINING WITH CLASS WEIGHTS
⏰ Start time: 14:38:45
📊 Max epochs: 15
⚡ Early stopping patience: 3 epochs
💾 Best model will be saved to: ../../outputs/models/model_cnn_best.h5
⚖️ Using balanced class weights

Epoch 1/15


2025-10-22 14:38:46.659619: I external/local_xla/xla/service/service.cc:163] XLA service 0x72e5fc003500 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2025-10-22 14:38:46.659636: I external/local_xla/xla/service/service.cc:171]   StreamExecutor device (0): NVIDIA GeForce GTX 1650, Compute Capability 7.5
2025-10-22 14:38:46.689618: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.

UnknownError: Graph execution error:

Detected at node StatefulPartitionedCall defined at (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main

  File "<frozen runpy>", line 88, in _run_code

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/ipykernel_launcher.py", line 18, in <module>

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/traitlets/config/application.py", line 1075, in launch_instance

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/ipykernel/kernelapp.py", line 758, in start

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/tornado/platform/asyncio.py", line 211, in start

  File "/home/linuxbrew/.linuxbrew/opt/python@3.11/lib/python3.11/asyncio/base_events.py", line 608, in run_forever

  File "/home/linuxbrew/.linuxbrew/opt/python@3.11/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once

  File "/home/linuxbrew/.linuxbrew/opt/python@3.11/lib/python3.11/asyncio/events.py", line 84, in _run

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/ipykernel/kernelbase.py", line 701, in shell_main

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/ipykernel/kernelbase.py", line 469, in dispatch_shell

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/ipykernel/ipkernel.py", line 379, in execute_request

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/ipykernel/kernelbase.py", line 899, in execute_request

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/ipykernel/ipkernel.py", line 471, in do_execute

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/ipykernel/zmqshell.py", line 632, in run_cell

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3116, in run_cell

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3171, in _run_cell

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/IPython/core/async_helpers.py", line 128, in _pseudo_sync_runner

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3394, in run_cell_async

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3639, in run_ast_nodes

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3699, in run_code

  File "/tmp/ipykernel_98017/1978929459.py", line 15, in <module>

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 117, in error_handler

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/keras/src/backend/tensorflow/trainer.py", line 377, in fit

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/keras/src/backend/tensorflow/trainer.py", line 220, in function

  File "/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/keras/src/backend/tensorflow/trainer.py", line 133, in multi_step_on_iterator

Failed to determine best cudnn convolution algorithm for:
%cudnn-conv-bias-activation.9 = (f32[32,32,128,128]{3,2,1,0}, u8[0]{0}) custom-call(%bitcast.369, %bitcast.485, %arg4.5), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="BrainTumorCNN_1/conv1_1/convolution" source_file="/projects/ai-ml/BrainTumorProject/.venv/lib/python3.11/site-packages/tensorflow/python/framework/ops.py" source_line=1221}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"activation_mode":"kNone","conv_result_scale":1,"side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false,"reification_cost":[]}

Original error: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 83886080 bytes. [tf-allocator-allocation-error='']

To ignore this failure and try to use a fallback algorithm (which may have suboptimal performance), use XLA_FLAGS=--xla_gpu_strict_conv_algorithm_picker=false.  Please also file a bug for the root cause of failing autotuning.
	 [[{{node StatefulPartitionedCall}}]] [Op:__inference_multi_step_on_iterator_2154]
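The failing op is `conv1` on an `f32[32,32,128,128]` tensor (batch × channels × height × width). A back-of-the-envelope estimate shows why a GPU with only about 154 MB free cannot cope: conv1's float32 output alone is 64 MiB at batch size 32, and the failed allocation in the error is 83,886,080 bytes (80 MiB). This sketch counts a single activation tensor, ignoring the backward pass, the other layers, and cuDNN workspaces, so treat it as a lower bound; halving the batch size halves the figure.

```python
def tensor_mib(*shape: int, bytes_per_element: int = 4) -> float:
    """Size of a dense tensor in MiB (default: float32, 4 bytes per element)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element / 2**20


failed_request_mib = 83_886_080 / 2**20  # allocation size quoted in the error
for batch in (32, 16, 8):
    mib = tensor_mib(batch, 32, 128, 128)  # conv1 output: batch x 32 filters x 128 x 128
    print(f"batch {batch:2d}: conv1 output = {mib:5.1f} MiB")
print(f"failed cuDNN allocation = {failed_request_mib:.0f} MiB")
```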

In [None]:
## 💾 Save Final Model & Training History

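# Save the model as it is after training. If EarlyStopping triggered, restore_best_weights=True
# has already rolled the weights back to the lowest-val_loss epoch; the checkpoint file instead
# holds the highest-val_accuracy epoch, so the two files can differ.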
model.save(FINAL_MODEL_PATH)
print(f"✅ Final model saved: {FINAL_MODEL_PATH}")

# Save training history to JSON
history_dict = {
    'loss': [float(x) for x in history.history['loss']],
    'accuracy': [float(x) for x in history.history['accuracy']],
    'val_loss': [float(x) for x in history.history['val_loss']],
    'val_accuracy': [float(x) for x in history.history['val_accuracy']],
    'lr': [float(x) for x in history.history['lr']] if 'lr' in history.history else [],
    'epochs_completed': len(history.history['loss']),
    'training_time_seconds': training_time,
    'config': {
        'epochs_max': EPOCHS,
        'batch_size': BATCH_SIZE,
        'learning_rate_initial': LEARNING_RATE,
        'early_stopping_patience': PATIENCE
    }
}

with open(HISTORY_PATH, 'w') as f:
    json.dump(history_dict, f, indent=2)

print(f"✅ Training history saved: {HISTORY_PATH}")

In [None]:
## 📊 Analyze Training Results

# Extract history
train_loss = history.history['loss']
train_acc = history.history['accuracy']
val_loss = history.history['val_loss']
val_acc = history.history['val_accuracy']
epochs_completed = len(train_loss)
epochs_range = range(1, epochs_completed + 1)

# Print summary statistics
print("📊 Training Summary Statistics:")
print("="*60)
print(f"\n📈 Training Metrics:")
print(f"   Initial accuracy: {train_acc[0]:.4f}")
print(f"   Final accuracy: {train_acc[-1]:.4f}")
print(f"   Best accuracy: {max(train_acc):.4f}")
print(f"   Initial loss: {train_loss[0]:.4f}")
print(f"   Final loss: {train_loss[-1]:.4f}")
print(f"   Best loss: {min(train_loss):.4f}")

print(f"\n📊 Validation Metrics:")
print(f"   Initial accuracy: {val_acc[0]:.4f}")
print(f"   Final accuracy: {val_acc[-1]:.4f}")
print(f"   Best accuracy: {max(val_acc):.4f}")
print(f"   Initial loss: {val_loss[0]:.4f}")
print(f"   Final loss: {val_loss[-1]:.4f}")
print(f"   Best loss: {min(val_loss):.4f}")

print(f"\n🎯 Performance:")
print(f"   Overfitting gap: {(train_acc[-1] - val_acc[-1]) * 100:.2f}%")
print(f"   Epochs to best val acc: {val_acc.index(max(val_acc)) + 1}")
print(f"   Stopped at epoch: {epochs_completed}")
print("="*60)

In [None]:
## 📈 Visualize Training Curves

# Create comprehensive training visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 5))

# Plot 1: Accuracy
axes[0].plot(epochs_range, train_acc, 'b-o', label='Training Accuracy', linewidth=2, markersize=6)
axes[0].plot(epochs_range, val_acc, 'r-s', label='Validation Accuracy', linewidth=2, markersize=6)
axes[0].set_title('Model Accuracy Over Epochs', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Epoch', fontsize=12)
axes[0].set_ylabel('Accuracy', fontsize=12)
axes[0].grid(True, alpha=0.3)
axes[0].axhline(y=max(val_acc), color='g', linestyle='--', alpha=0.5, label=f'Best Val: {max(val_acc):.4f}')
axes[0].legend(loc='lower right', fontsize=11)  # single call, after every labeled line is drawn

# Plot 2: Loss
axes[1].plot(epochs_range, train_loss, 'b-o', label='Training Loss', linewidth=2, markersize=6)
axes[1].plot(epochs_range, val_loss, 'r-s', label='Validation Loss', linewidth=2, markersize=6)
axes[1].set_title('Model Loss Over Epochs', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Epoch', fontsize=12)
axes[1].set_ylabel('Loss', fontsize=12)
axes[1].grid(True, alpha=0.3)
axes[1].axhline(y=min(val_loss), color='g', linestyle='--', alpha=0.5, label=f'Best Val: {min(val_loss):.4f}')
axes[1].legend(loc='upper right', fontsize=11)  # single call, after every labeled line is drawn

plt.tight_layout()
training_curves_path = os.path.join(VIZ_DIR, 'day4_02_training_curves.png')
plt.savefig(training_curves_path, dpi=300, bbox_inches='tight')
print(f"✅ Training curves saved: {training_curves_path}")
plt.show()

In [None]:
## 📉 Visualize Learning Rate Schedule

# Plot learning rate changes
plt.figure(figsize=(12, 4))
plt.plot(lr_tracker.epochs, lr_tracker.lr_history, 'g-o', linewidth=2, markersize=8)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Learning Rate', fontsize=12)
plt.title('Learning Rate Schedule', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.yscale('log')

# Annotate LR reductions
for i in range(1, len(lr_tracker.lr_history)):
    if lr_tracker.lr_history[i] < lr_tracker.lr_history[i-1]:
        plt.axvline(x=lr_tracker.epochs[i], color='r', linestyle='--', alpha=0.5)
        plt.text(lr_tracker.epochs[i], lr_tracker.lr_history[i], 
                f'  LR reduced\n  {lr_tracker.lr_history[i]:.2e}', 
                fontsize=9, color='red')

plt.tight_layout()
lr_schedule_path = os.path.join(VIZ_DIR, 'day4_02_lr_schedule.png')
plt.savefig(lr_schedule_path, dpi=300, bbox_inches='tight')
print(f"✅ LR schedule saved: {lr_schedule_path}")
plt.show()

In [None]:
## 📋 Inspect Training Log CSV

# Load and display CSV log
if os.path.exists(CSV_LOG_PATH):
    training_log = pd.read_csv(CSV_LOG_PATH)
    print(f"✅ Training log loaded: {CSV_LOG_PATH}")
    print(f"\n📊 Log shape: {training_log.shape}")
    print(f"\nColumns: {training_log.columns.tolist()}")
    print(f"\n{training_log}")
    
    # Save formatted log
    formatted_log_path = os.path.join(LOGS_DIR, 'training_log_formatted.csv')
    training_log.to_csv(formatted_log_path, index=False, float_format='%.6f')
    print(f"\n✅ Formatted log saved: {formatted_log_path}")
else:
    print(f"❌ Training log not found: {CSV_LOG_PATH}")

In [None]:
## 🎯 Model Performance Summary

# Create performance summary
performance_summary = {
    'Model': 'BrainTumorCNN',
    'Training Date': datetime.now().strftime('%Y-%m-%d'),
    'Training Time': f"{minutes}m {seconds}s",
    'Epochs Completed': epochs_completed,
    'Max Epochs': EPOCHS,
    'Early Stopped': epochs_completed < EPOCHS,
    'Batch Size': BATCH_SIZE,
    'Initial LR': LEARNING_RATE,
    'Final LR': lr_tracker.lr_history[-1],
    'Train Accuracy (Initial)': f"{train_acc[0]:.4f}",
    'Train Accuracy (Final)': f"{train_acc[-1]:.4f}",
    'Train Accuracy (Best)': f"{max(train_acc):.4f}",
    'Val Accuracy (Initial)': f"{val_acc[0]:.4f}",
    'Val Accuracy (Final)': f"{val_acc[-1]:.4f}",
    'Val Accuracy (Best)': f"{max(val_acc):.4f}",
    'Train Loss (Final)': f"{train_loss[-1]:.4f}",
    'Val Loss (Final)': f"{val_loss[-1]:.4f}",
    'Overfitting Gap': f"{(train_acc[-1] - val_acc[-1]) * 100:.2f}%",
    'Total Parameters': model.count_params()
}

# Print summary
print("📊 FINAL PERFORMANCE SUMMARY")
print("="*60)
for key, value in performance_summary.items():
    print(f"{key:30s}: {value}")
print("="*60)

# Save summary to JSON
summary_path = os.path.join(METRICS_DIR, 'training_summary.json')
with open(summary_path, 'w') as f:
    json.dump(performance_summary, f, indent=2)
print(f"\n✅ Performance summary saved: {summary_path}")

In [None]:
## 🔍 Verify Saved Models

# Check saved models
print("💾 Saved Models:")
print("="*60)

if os.path.exists(CHECKPOINT_PATH):
    size_mb = os.path.getsize(CHECKPOINT_PATH) / (1024 * 1024)
    print(f"✅ Best model checkpoint: {CHECKPOINT_PATH}")
    print(f"   Size: {size_mb:.2f} MB")
    print(f"   Created: {datetime.fromtimestamp(os.path.getmtime(CHECKPOINT_PATH)).strftime('%Y-%m-%d %H:%M:%S')}")
else:
    print(f"❌ Best model NOT found: {CHECKPOINT_PATH}")

print()

if os.path.exists(FINAL_MODEL_PATH):
    size_mb = os.path.getsize(FINAL_MODEL_PATH) / (1024 * 1024)
    print(f"✅ Final model: {FINAL_MODEL_PATH}")
    print(f"   Size: {size_mb:.2f} MB")
    print(f"   Created: {datetime.fromtimestamp(os.path.getmtime(FINAL_MODEL_PATH)).strftime('%Y-%m-%d %H:%M:%S')}")
else:
    print(f"❌ Final model NOT found: {FINAL_MODEL_PATH}")

print("="*60)

## ✅ Training Complete - Summary

### What We Accomplished:

1. ✅ **Full model training** completed
2. ✅ **Callbacks in place**:
   - EarlyStopping: stops once `val_loss` stalls for 3 epochs and restores the best weights
   - ModelCheckpoint: saves the model with the best `val_accuracy`
   - ReduceLROnPlateau: halves the learning rate when `val_accuracy` stalls for 2 epochs
   - CSVLogger: logs per-epoch metrics
3. ✅ **Performance tracked**:
   - Training and validation metrics logged
   - Performance summary saved to JSON
4. ✅ **Saved artifacts**:
   - Best model checkpoint
   - Final trained model
   - Training history (JSON & CSV)
   - Visualizations (curves, LR schedule)

### Next Steps:

**Day 4.3** - Comprehensive evaluation on test set:
- Confusion matrix
- Classification report
- Per-class metrics
- Misclassification analysis

---

**Date:** October 22, 2025  
**Status:** ✅ Completed

In [None]:
## 📊 Final Training Report

print("\n" + "="*70)
print("🎉 TRAINING SESSION COMPLETE")
print("="*70)
print(f"\n📅 Date: {datetime.now().strftime('%Y-%m-%d')}")
print(f"⏱️  Training Time: {minutes}m {seconds}s")
print(f"📈 Epochs Completed: {epochs_completed}/{EPOCHS}")
print(f"🎯 Best Validation Accuracy: {max(val_acc)*100:.2f}%")
print(f"📉 Final Training Accuracy: {train_acc[-1]*100:.2f}%")
print(f"📊 Overfitting Gap: {(train_acc[-1] - val_acc[-1])*100:.2f}%")
print(f"\n💾 Models saved in: {MODELS_DIR}")
print(f"📋 Logs saved in: {LOGS_DIR}")
print(f"📊 Metrics saved in: {METRICS_DIR}")
print(f"📈 Visualizations saved in: {VIZ_DIR}")
print("\n" + "="*70)
print("✅ Ready for Day 4.3 - Test Set Evaluation!")
print("="*70)