# 1. Introduction & Objectives

**Purpose**:

Investigate the impact of data augmentation on image classification.
Compare results with and without augmentation across different models.
Improve model generalization and performance.
Selected Models for this Experiment:  

✅ InceptionResNetV2   
✅ DenseNet121  
✅ Xception  
✅ MobileNetV2  

# 2. Import Required Libraries & Configuration

In [7]:
# Standard libraries
import os
import sys
import importlib
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
# TensorFlow / Keras
import tensorflow as tf
# Sklearn metrics
from sklearn.metrics import classification_report, confusion_matrix


#----------------------Setting Up Project Paths and Configurations---------------------------------------#

# Get the current notebook directory
CURRENT_DIR = Path(os.getcwd()).resolve()
# Automatically find the project root (go up 2 level)
PROJECT_ROOT = CURRENT_DIR.parents[2]
# Add project root to sys.path
sys.path.append(str(PROJECT_ROOT))

# Function to get relative paths from project root
def get_relative_path(absolute_path):
    return str(Path(absolute_path).relative_to(PROJECT_ROOT))

# Print project root directory
print(f"Project Root Directory: {PROJECT_ROOT.name}")  # Display only the root folder name

# Import project configuration
import config  # Now Python can find config.py
importlib.reload(config)  # Reload config to ensure any updates are applied
#-------------------------------------------------------------------------------------------------#

# Define dataset paths using config.py
train_pickle_path = Path(config.XTRAIN_FINAL_ENCODED_PATH)  # Metadata dataframe
images_dir_train_path = Path(config.RAW_IMAGE_TRAIN_DIR)   # Image directory

# Display project root path
print(f"\nMetadata dataframe Directory: {train_pickle_path}")
print(f"\nImage directory: {train_pickle_path}")

Project Root Directory: Data_Scientist_Rakuten_Project-main

Metadata dataframe Directory: D:\Data_Science\Append_Data_Engineer_AWS_MLOPS\Data_Scientist_Rakuten_Project-main\data\processed\X_train_final_encoded.pkl

Image directory: D:\Data_Science\Append_Data_Engineer_AWS_MLOPS\Data_Scientist_Rakuten_Project-main\data\processed\X_train_final_encoded.pkl


# 3. Data Preprocessing with Augmentation

## 3.1 Load Image Data

In [10]:
import importlib
import config
import src.data_preprocessing.data_loader # Initial import
importlib.reload(config)  # Force reload the module
importlib.reload(src.data_preprocessing.data_loader)  # Force reload the module
from src.data_preprocessing.data_loader import load_image_generators


# Load training and validation data with augmentation enabled


# Define parameters
IMG_SIZE = (224, 224)  
BATCH_SIZE = 32

# With  augmentation (par default)
train_generator, valid_generator = load_image_generators(
    X_train_im, X_test_im, config.RAW_IMAGE_TRAIN_DIR, IMG_SIZE, BATCH_SIZE
)




NameError: name 'X_train_im' is not defined

## 3.2  Apply Data Augmentation

In [None]:
from src.data_preprocessing.preprocessing import apply_augmentation

# Apply transformations like flipping, rotation, zoom, etc.
augmented_train_generator = apply_augmentation(train_generator)


## 3.3 Display Sample Augmented Images

In [None]:
from src.data_preprocessing.preprocessing import show_augmented_images

# Display some randomly augmented images
show_augmented_images(augmented_train_generator)


# 4. Model Training with Augmented Data

## 4.1 Define Training Hyperparameters

In [None]:
EPOCHS = 20
BATCH_SIZE = 32
LEARNING_RATE = 0.001

## 4.2 Load & Compile Models

In [None]:
from src.modeling_image.model_builder import build_model

# Initialize models
models = {
    "InceptionResNetV2": build_model("InceptionResNetV2", (299, 299, 3), 27),
    "DenseNet121": build_model("DenseNet121", (224, 224, 3), 27),
    "Xception": build_model("Xception", (299, 299, 3), 27),
    "MobileNetV2": build_model("MobileNetV2", (224, 224, 3), 27)
}


## 4.3 Train Models Using Augmented Data

In [None]:
from src.modeling_image.training import train_model

# Train each model and store history
history_dict = {}
for name, model in models.items():
    print(f" Training {name}...")
    history_dict[name] = train_model(model, augmented_train_generator, val_generator, EPOCHS, BATCH_SIZE)


# 5. Model Evaluation

## 5.1 Learning Curves

In [None]:
from src.modeling_image.evaluation import plot_learning_curves

for name, history in history_dict.items():
    plot_learning_curves(history, title=f"Learning Curve - {name} (Augmented Data)")


## 5.2 Compute Classification Metrics

In [None]:
from src.modeling_image.evaluation import evaluate_model

# Evaluate each model
for name, model in models.items():
    print(f"Evaluating {name}...")
    evaluate_model(model, val_generator)


## 5.3 Compute Confusion Matrix

In [None]:
from src.modeling_image.evaluation import plot_confusion_matrix

# Generate confusion matrices
for name, model in models.items():
    print(f"Confusion Matrix - {name}")
    plot_confusion_matrix(model, val_generator)


# 6. Saving the Trained Model

# 7. Customized Model Performance Analysis

# 8. Results & Insights

🔹 Key Observations
Which augmentation techniques improved model performance?
Are some models more sensitive to augmentation?
What is the performance difference between augmented vs. non-augmented models?
Best Performing Models (Based on F1-Score & Accuracy):

Model	Accuracy	F1 Score (Weighted)
InceptionResNetV2	??	??
DenseNet121	??	??
Xception	??	??
MobileNetV2	??	??


# 9. Next Steps
🔜 Transition to: 18_Image_Fine_Tuning.ipynb

Unfreeze top layers for fine-tuning.
Adjust learning rates for better convergence.
Test advanced optimizations (batch normalization, dropout tuning...)