### <font style="color:blue">Project 2: Kaggle Competition - Classification</font>

#### Maximum Points: 100

<div>
    <table>
        <tr><td><h3>Sr. no.</h3></td> <td><h3>Section</h3></td> <td><h3>Points</h3></td> </tr>
        <tr><td><h3>1</h3></td> <td><h3>Data Loader</h3></td> <td><h3>10</h3></td> </tr>
        <tr><td><h3>2</h3></td> <td><h3>Configuration</h3></td> <td><h3>5</h3></td> </tr>
        <tr><td><h3>3</h3></td> <td><h3>Evaluation Metric</h3></td> <td><h3>10</h3></td> </tr>
        <tr><td><h3>4</h3></td> <td><h3>Train and Validation</h3></td> <td><h3>5</h3></td> </tr>
        <tr><td><h3>5</h3></td> <td><h3>Model</h3></td> <td><h3>5</h3></td> </tr>
        <tr><td><h3>6</h3></td> <td><h3>Utils</h3></td> <td><h3>5</h3></td> </tr>
        <tr><td><h3>7</h3></td> <td><h3>Experiment</h3></td><td><h3>5</h3></td> </tr>
        <tr><td><h3>8</h3></td> <td><h3>TensorBoard Dev Scalars Log Link</h3></td> <td><h3>5</h3></td> </tr>
        <tr><td><h3>9</h3></td> <td><h3>Kaggle Profile Link</h3></td> <td><h3>50</h3></td> </tr>
    </table>
</div>


## <font style="color:green">0.1 Clone git repository to Kaggle workspace</font>

In [None]:
g_inference_only = False

# Delete the directory if it already exists
import shutil
import os
if os.path.exists('prj-2-opencv-dl-pytorch'):
    shutil.rmtree('prj-2-opencv-dl-pytorch')
# Clone repository from GitHub
!git clone https://github.com/ramabyg/prj-2-opencv-dl-pytorch.git

# Add to Python path
import sys
sys.path.insert(0, '/kaggle/working/prj-2-opencv-dl-pytorch')

print("[OK] Repository cloned and added to path")



## <font style="color:green">0.2 Import modules </font>

In [None]:
from src.config import get_config
from src.datamodule import KenyanFood13DataModule
from src.model import KenyanFood13Classifier
from src.trainer import train_model
from src.utils import calculate_dataset_mean_std

print("[OK] All modules imported successfully")

## <font style="color:green">1. Data Loader [10 Points]</font>

In this section, you have to write a class or methods that will be used to get the training and validation data loaders.

You need to write a custom dataset class to load data.

**Note: There is no separate validation data, so you will have to create your own validation set by dividing the train data into train and validation data. An 80:20 ratio for train and validation, respectively, is the usual choice.**


For example:

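```python
from torch.utils.data import Dataset


class KenyanFood13Dataset(Dataset):
    """Custom dataset for the Kenyan Food 13 images."""

    def __init__(self, *args):
        ...

    def __len__(self):
        ...  # needed so a DataLoader can size and sample the dataset

    def __getitem__(self, idx):
        ...
```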


```python
def get_data(args1, *args):
    ...
    return train_loader, test_loader
```


**Please refer to src/datamodule.py and src/dataset.py.**
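
The 80:20 split described in the note above can be sketched with the standard library alone. This is an illustration, not necessarily what `src/datamodule.py` does: the helper name `stratified_split`, the fixed seed, and the choice to split each class separately (so that rare classes still appear in validation) are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_indices, val_indices), splitting each class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one validation sample per class
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy labels standing in for the class column of the training CSV
labels = ["githeri"] * 10 + ["ugali"] * 5
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 12 3
```

The two index lists could then be wrapped with `torch.utils.data.Subset` to build separate training and validation loaders over the same underlying dataset.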

## <font style="color:green">2. Configuration [5 Points]</font>

**Define your configuration here.**

For example:


```python
from dataclasses import dataclass


@dataclass
class TrainingConfiguration:
    """Describes the configuration of the training process."""
    batch_size: int = 10
    epochs_count: int = 50
    init_learning_rate: float = 0.1  # initial learning rate for lr scheduler
    log_interval: int = 5
    test_interval: int = 1
    data_root: str = "/kaggle/input/opencv-pytorch-project-2-classification-round-3"
    num_workers: int = 2
    device: str = 'cuda'
```

In [None]:
from src.config import get_config
# Get configurations (auto-detects Kaggle environment)
# Phase 1 + Phase 2 (RandAugment) settings
train_config, data_config, system_config = get_config(
    num_epochs=70,           # Recommended for Phase 2 (with RandAugment)
    batch_size=8,           # Reduced for memory efficiency with freeze_pct=0.0
    # All other Phase 1+2 settings are now defaults:
    # - freeze_pct=0.0 (train all layers)
    # - learning_rate=0.0001 (optimal for freeze_pct=0.0)
    # - optimizer="adamw"
    # - scheduler="cosine"
    # - label_smoothing=0.1 (in model)
    # - RandAugment (in datamodule)
    # - input_size=384 (memory-efficient)
    use_scheduler=True,
    scheduler="cosine"
)

# Optional: Customize early stopping
# train_config.early_stop_patience = 10  # More patient
# train_config.use_early_stopping = False  # Disable completely

print(f"Training for {train_config.num_epochs} epochs")
print(f"Model: {train_config.model_name}")
print(f"Learning rate: {train_config.learning_rate}")
print(f"Batch size: {train_config.batch_size}")
print(f"Freeze percentage: {train_config.freeze_pct} (0.0 = all layers trainable)")
print(f"Optimizer: {train_config.optimizer}")
print(f"Scheduler: {train_config.scheduler if train_config.use_scheduler else 'none'}")
print(f"Early stopping: patience={train_config.early_stop_patience}")
print(f"Device: {system_config.device}")
print(f"Output directory: {system_config.output_dir}")


# Option 1: Use model-specific preprocessing (RECOMMENDED for pre-trained models)
# This uses the exact same preprocessing (mean, std, resolution) as the original pre-training
print(f"Using model-specific preprocessing for: {train_config.model_name}")

# Option 2 (Alternative): Calculate from dataset (more accurate for custom stats)
# mean, std = calculate_dataset_mean_std(
#     annotations_file=data_config.annotations_file,
#     img_dir=data_config.img_dir,
#     sample_size=None  # Use more samples on Kaggle, None means use all images
# )

# Option 3 (Alternative): Use pre-computed values (fastest)
# mean = [0.5672, 0.4663, 0.3659]
# std = [0.2484, 0.2561, 0.2600]


Create Data Modules

In [None]:
# Create data module with model-specific preprocessing
data_module = KenyanFood13DataModule(
    data_config=data_config,
    model_name=train_config.model_name
)
data_module.setup()

print(f"[OK] Data module created with {data_module.num_classes} classes")

# Create model
model = KenyanFood13Classifier(train_config, data_module.num_classes)

print(f"[OK] Model created: {train_config.model_name}")

In [None]:
# Diagnostics: check labels, mapping, a sample batch and a model forward pass
import pandas as pd
import torch
import numpy as np

print("--- Diagnostics Start ---")
# Check CSV columns and a small preview
try:
    df = pd.read_csv(data_config.annotations_file)
    print("CSV columns:", list(df.columns))
    print(df.head())
except Exception as e:
    print("Could not read annotations file:", e)

# Show detected label column and class mapping
label_col = getattr(data_module.train_dataset, 'label_col', None)
print("Detected label column:", label_col)
print("class_to_idx mapping (sample):", dict(list(data_module.train_dataset.class_to_idx.items())[:10]))

# Show label distribution (if available)
try:
    if label_col and label_col in df.columns:
        print('\nLabel distribution (top counts):')
        print(df[label_col].value_counts().head(20))
except Exception:
    pass

# Inspect a single batch from train loader
train_loader = data_module.train_dataloader()
images, labels = next(iter(train_loader))
print('\nTrain batch images shape:', images.shape)
print('Train batch labels shape:', labels.shape)
print('Sample label indices:', labels[:10].tolist())

# Reverse mapping idx -> class name
idx_to_class = {v: k for k, v in data_module.train_dataset.class_to_idx.items()}
print('Sample label names:', [idx_to_class.get(int(x), '?') for x in labels[:10]])

# Basic image stats (after preprocessing)
print('Image min/max:', float(images.min()), float(images.max()))

# Quick forward pass through model to inspect outputs
device = torch.device('cuda' if torch.cuda.is_available() and system_config.device == 'cuda' else 'cpu')
print('Using device for diagnostics:', device)
model = model.to(device)
images = images.to(device)
with torch.no_grad():
    logits = model(images)
    probs = torch.softmax(logits, dim=1)
    top1 = probs.argmax(dim=1).cpu().numpy().tolist()
    top_conf_vals, _ = probs.max(dim=1)
    top_conf = top_conf_vals.cpu().numpy().tolist()
    # top_conf = probs.max(dim=1).cpu().numpy().tolist()

print('\nModel predictions (top1 indices):', top1)
print('Model top confidences:', [round(float(x), 4) for x in top_conf])

print('\nData module mean/std:', data_module.mean, data_module.std)
print('--- Diagnostics End ---')

## <font style="color:green">3. Evaluation Metric [10 Points]</font>

**Define methods or classes that will be used in model evaluation. For example, accuracy, f1-score etc.**
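
As a rough illustration of the metrics named above, the sketch below computes accuracy and macro-averaged precision, recall, and F1-score from plain lists of class indices. The function name `classification_metrics` and the macro averaging are assumptions made for this example; the implementation in `src/trainer.py` may differ, and in practice a library such as torchmetrics or scikit-learn would usually be used.

```python
def classification_metrics(y_true, y_pred, num_classes):
    """Return accuracy and macro-averaged precision, recall and F1."""
    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    fn = [0] * num_classes  # false negatives per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    def safe_div(num, den):
        # Classes that never occur contribute 0 instead of raising
        return num / den if den else 0.0

    precision = [safe_div(tp[c], tp[c] + fp[c]) for c in range(num_classes)]
    recall = [safe_div(tp[c], tp[c] + fn[c]) for c in range(num_classes)]
    f1 = [safe_div(2 * p * r, p + r) for p, r in zip(precision, recall)]
    return {
        "accuracy": safe_div(sum(tp), len(y_true)),
        "precision": sum(precision) / num_classes,
        "recall": sum(recall) / num_classes,
        "f1": sum(f1) / num_classes,
    }


# Made-up labels for a 3-class toy problem
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = classification_metrics(y_true, y_pred, num_classes=3)
print({name: round(value, 4) for name, value in metrics.items()})
```

Macro averaging weights every class equally, which matters when some food classes have far fewer images than others.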

In [None]:
# trainer.py has been updated to include more metrics:
# methods to calculate accuracy, F1-score, precision, and recall.


## <font style="color:green">4. Train and Validation [5 Points]</font>


**Write the methods or classes to be used for training and validation.**
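
The configuration cell earlier exposes an `early_stop_patience` setting. The idea behind patience-based early stopping can be sketched as below; the class `EarlyStopping`, its `step` method, and the `min_delta` parameter are hypothetical names for this illustration and are not taken from `src/trainer.py`, which may rely on a framework callback instead.

```python
class EarlyStopping:
    """Signal a stop when the score has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = None
        self.epochs_without_improvement = 0

    def step(self, score):
        """Record one epoch's validation score; return True to stop training."""
        if self.best_score is None or score > self.best_score + self.min_delta:
            self.best_score = score
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


stopper = EarlyStopping(patience=2)
val_accuracies = [0.60, 0.65, 0.64, 0.65, 0.70]  # made-up values
stopped_at = None
for epoch, acc in enumerate(val_accuracies):
    if stopper.step(acc):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_score)  # 3 0.65
```

With a patience of 2 the loop stops before it ever sees the 0.70 epoch, which is the trade-off a larger patience value is meant to soften.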

In [None]:
if not g_inference_only:
    # Train the model
    trained_model, _, checkpoint_callback = train_model(
        training_config=train_config,
        data_config=data_config,
        system_config=system_config,
        model=model,
        data_module=data_module
    )

    print(f"\n[OK] Training complete!")
    print(f"Best model: {checkpoint_callback.best_model_path}")
    print(f"Best accuracy: {checkpoint_callback.best_model_score:.4f}")

## <font style="color:green">5. Model [5 Points]</font>

**Define your model in this section.**

**You are allowed to use any pre-trained model.**

**Ans: The model is EfficientNet V2 S; please refer to ./src/model.py.**
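
The configuration comments earlier mention `label_smoothing=0.1` being applied in the model. As a hedged sketch of what that setting does to the loss, the function below computes cross-entropy for a single sample against a smoothed target, using the convention that the target is `(1 - eps)` on the true class plus `eps / K` spread over all `K` classes. The function name `smoothed_cross_entropy` is hypothetical and the code is independent of `src/model.py`.

```python
import math


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy for one sample against a label-smoothed target."""
    num_classes = len(logits)
    # Numerically stable log-softmax
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    log_probs = [z - log_sum_exp for z in logits]

    # Smoothed target: eps / K on every class, plus (1 - eps) on the true class
    uniform = smoothing / num_classes
    loss = 0.0
    for cls, log_p in enumerate(log_probs):
        weight = uniform + (1.0 - smoothing if cls == target else 0.0)
        loss -= weight * log_p
    return loss


logits = [2.0, 0.5, -1.0]  # made-up scores for a 3-class toy problem
plain = smoothed_cross_entropy(logits, target=0, smoothing=0.0)
smoothed = smoothed_cross_entropy(logits, target=0, smoothing=0.1)
print(f"plain={plain:.4f} smoothed={smoothed:.4f}")
```

When the true class already has the highest score, the smoothed loss is larger than the plain one, which discourages the model from becoming overconfident.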

## <font style="color:green">6. Utils [5 Points]</font>

**Define the methods or classes that have not been covered in the above sections.**

**Save the outputs to the Kaggle output folder as a ZIP archive for easy download.**

In [None]:
if not g_inference_only:
    import json
    import shutil
    from pathlib import Path

    # Create a clean output directory for the dataset
    dataset_dir = Path("/kaggle/working/kenyan_food_model_output")
    if dataset_dir.exists():
        shutil.rmtree(dataset_dir)
    dataset_dir.mkdir(exist_ok=True)

    # Copy the best checkpoint
    best_checkpoint = Path(checkpoint_callback.best_model_path)
    if best_checkpoint.exists():
        shutil.copy(best_checkpoint, dataset_dir / "best_model.ckpt")
        print(f"✓ Saved best checkpoint: {best_checkpoint.name}")

    # Save training summary as JSON
    summary = {
        "best_val_accuracy": float(checkpoint_callback.best_model_score),
        "num_epochs": train_config.num_epochs,
        "batch_size": train_config.batch_size,
        "learning_rate": train_config.learning_rate,
        "model": train_config.model_name,
        "optimizer": train_config.optimizer,
        "scheduler": train_config.scheduler if train_config.use_scheduler else "none",
        "dataset_mean": data_module.mean,
        "dataset_std": data_module.std,
        "num_classes": data_module.num_classes,
        "checkpoint_path": str(best_checkpoint.name)
    }

    with open(dataset_dir / "training_summary.json", "w") as f:
        json.dump(summary, f, indent=2)
    print(f"✓ Saved training summary")

    # Copy ALL TensorBoard logs (full logs for offline review)
    tb_logs_src = Path(system_config.output_dir) / "kenyan_food_logs"
    if tb_logs_src.exists():
        tb_logs_dest = dataset_dir / "tensorboard_logs"

        print(f"Copying TensorBoard logs from {tb_logs_src}...")
        shutil.copytree(tb_logs_src, tb_logs_dest, dirs_exist_ok=True)

        # Count total size of logs for user info
        total_size_bytes = sum(f.stat().st_size for f in tb_logs_dest.rglob('*') if f.is_file())
        total_size_mb = total_size_bytes / (1024 * 1024)
        print(f"✓ Saved TensorBoard logs ({total_size_mb:.1f} MB)")
    else:
        print(f"[WARN] TensorBoard logs not found at {tb_logs_src}")

    # Create a ZIP file for easy download
    print("\nCreating ZIP archive for download...")
    zip_path = Path("/kaggle/working/kenyan_food_model_output")
    shutil.make_archive(str(zip_path), 'zip', dataset_dir)
    zip_size_mb = Path(f"{zip_path}.zip").stat().st_size / (1024 * 1024)
    print(f"✓ Created kenyan_food_model_output.zip ({zip_size_mb:.1f} MB)")

    print("\n" + "="*60)
    print("[OK] OUTPUT READY FOR DOWNLOAD!")
    print("="*60)
    print("\nFiles saved to: /kaggle/working/kenyan_food_model_output/")
    print("ZIP file: /kaggle/working/kenyan_food_model_output.zip")
    print("\nContents:")
    print("  - best_model.ckpt           : Best model checkpoint")
    print("  - training_summary.json     : Training metrics and config")
    print("  - tensorboard_logs/         : Full TensorBoard event files")
    print("\nTo download:")
    print("1. Click 'Output' tab in the right sidebar")
    print("2. Find 'kenyan_food_model_output.zip'")
    print("3. Click the download button")
    print("\nOr use Kaggle API to create a dataset for reuse in other notebooks!")
    print("="*60)

## Step 6.a: Generate Test Predictions

After training completes, generate predictions on the test data and create `submission.csv`.
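
To make the shape of the submission file concrete, here is a small sketch that turns predicted class indices into CSV rows. It is not the code in `src/inference.py`: the helper `build_submission_rows`, the `id` column name, and the sample image ids are assumptions for illustration (only the `label` column appears in this notebook), and it writes to an in-memory buffer rather than to disk.

```python
import csv
import io


def build_submission_rows(image_ids, predicted_indices, idx_to_class):
    """Pair each test image id with its predicted class name."""
    return [
        {"id": image_id, "label": idx_to_class[idx]}
        for image_id, idx in zip(image_ids, predicted_indices)
    ]


# Hypothetical ids and predictions for two test images
idx_to_class = {0: "githeri", 1: "ugali"}
rows = build_submission_rows([101, 102], [1, 0], idx_to_class)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "label"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

The `idx_to_class` mapping plays the same role as the reverse mapping built in the diagnostics cell: the model outputs indices, while the submission needs class names.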

In [None]:
from src.inference import create_submission
if g_inference_only:
    checkpoint_path = "/kaggle/working/kenyan_food_model_output/best_model.ckpt"
else:
    checkpoint_path = checkpoint_callback.best_model_path

# Generate predictions and create submission.csv
# Automatically detects Kaggle environment and uses correct paths
submission_df = create_submission(
    checkpoint_path=checkpoint_path,
    output_csv_path="/kaggle/working/submission.csv",
    model_config=train_config,
    batch_size=64
)

print(f"[OK] Submission created: /kaggle/working/submission.csv")
print(f"[INFO] Total predictions: {len(submission_df)}")
print(f"\n[INFO] Prediction distribution:")
print(submission_df['label'].value_counts())
print("\n[OK] Ready to download and submit to Kaggle!")

## <font style="color:green">7. Experiment [5 Points]</font>

**Choose your optimizer and LR-scheduler and use the above methods and classes to train your model.**

**Updated in src/trainer.py module.**
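
The configuration above selects a cosine scheduler. Its closed form can be sketched as follows, assuming the schedule spans the whole run and decays to a minimum of zero; both are assumptions, as are the names `cosine_lr` and `min_lr`, and the actual settings live in `src/trainer.py`.

```python
import math


def cosine_lr(epoch, total_epochs, base_lr, min_lr=0.0):
    """Learning rate at `epoch` under cosine annealing from base_lr to min_lr."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Values from the configuration cell: 70 epochs, learning rate 1e-4
for epoch in (0, 35, 70):
    print(epoch, f"{cosine_lr(epoch, total_epochs=70, base_lr=1e-4):.2e}")
```

The rate starts at the base value, reaches half of it at the midpoint, and approaches the minimum at the end, so most of the large updates happen early in training.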

## <font style="color:green">8. TensorBoard Log Link [5 Points]</font>

**Share your TensorBoard scalars logs link here. You can also share (not mandatory) your GitHub link if you have pushed this project to GitHub.**


Note: In light of the recent shutdown of tensorboard.dev, we have updated the submission requirements for your project. Instead of sharing a tensorboard.dev link, you are now required to upload your generated TensorBoard event files directly onto the lab. As an alternative, you may also include a screenshot of your TensorBoard output within your Jupyter notebook. This adjustment ensures that your data visualization and model training efforts are thoroughly documented and accessible for evaluation.

You are also welcome (and encouraged) to use alternative logging services such as Weights & Biases (wandb) or Comet. In such cases, you can make your project logs publicly accessible and share the link with others.

**The TensorBoard logs will be uploaded along with this code.**

## <font style="color:green">9. Kaggle Profile Link [50 Points]</font>

**Share your Kaggle profile link with us here to score points in the competition.**

**For full points, you need a minimum accuracy of `75%` on the test data. If accuracy is less than `70%`, you gain no points for this section.**


**Submit `submission.csv` (predictions for images in `test.csv`) in the `Submit Predictions` tab in Kaggle to get evaluated for this section.**

[Rama Kaggle Profile](https://www.kaggle.com/ramabyg)