## What this notebook does (simple)

1) Read train/test CSVs that list pig images, bounding boxes, and posture labels.

2) Crop each pig from its bounding box.

3) Train an InceptionV3 image classifier on the cropped pigs.

4) Run the trained model on test crops and save `inception_submission.csv`.



Key files:

- `pig_posture_recognition/train.csv`: rows with `row_id`, `image_id`, `width`, `height`, `bbox`, `class_id`.

- `pig_posture_recognition/test.csv`: same but no `class_id`.

- `pig_posture_recognition/pig_posture_classes.txt`: label names.

- Images live in `pig_posture_recognition/train_images` and `pig_posture_recognition/test_images`.



Main hyperparameters (change if needed):

- Image size `IMGSZ = 299`

- Batch size `BATCH = 16`

- Epochs `EPOCHS = 3`

- Learning rate `LR = 2e-4`

- DataLoader workers `num_workers = 2` (set to 0 on Kaggle if you see worker errors)


## Import Libraries

**Core libraries for the pipeline:**

**Data handling:** `pandas` for CSV manipulation, `PIL` for image operations, `json` for parsing bounding boxes

**Deep learning:** `torch` (PyTorch framework), `torchvision` for pretrained models and image transforms

**Utilities:** `pathlib` for cross-platform paths, `sklearn` for grouped data splitting to prevent leakage

In [None]:
from pathlib import Path
import json
import random
from typing import Iterable, Tuple
import numpy as np
import pandas as pd
from PIL import Image
from sklearn.model_selection import GroupShuffleSplit
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as T
import torchvision.models as models

## Setup Paths, Device, and Seeds

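**Reproducibility through seeds:** Neural networks use random initialization and data shuffling. Setting seeds (`SEED=42`) for `random`, NumPy, and PyTorch makes runs far more repeatable, which helps when debugging and comparing experiments. Exact bit-for-bit reproducibility is not guaranteed: some CUDA kernels are nondeterministic, and results can differ across hardware and library versions.

**Device selection:** PyTorch can run on CPU or GPU. `torch.cuda.is_available()` detects a usable NVIDIA GPU; training a CNN of this size on a GPU is typically many times faster than on a CPU.

**Path resolution:** `Path('.').resolve()` turns the current working directory into an absolute path. The notebook therefore expects to be launched from the folder that contains `pig_posture_recognition/`.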
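
The seeding step can be wrapped in one helper. The sketch below is an illustration, not part of the notebook: `seed_everything` is a hypothetical name, and the cuDNN flags are an optional extra that trades speed for repeatability. NumPy and PyTorch are seeded only if they are installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed every random number generator that is available (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU generator and all CUDA devices
        # Ask cuDNN for deterministic kernels; this can slow training down
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


# Re-seeding reproduces the same random draw
seed_everything(42)
first_draw = random.random()
seed_everything(42)
second_draw = random.random()
```

Even with these settings, DataLoader workers and some GPU operations can still introduce small run-to-run differences.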

In [2]:
PROJECT_DIR = Path('.').resolve()
DATA_DIR = PROJECT_DIR / 'pig_posture_recognition'
TRAIN_IMAGES = DATA_DIR / 'train_images'
TEST_IMAGES = DATA_DIR / 'test_images'

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

SEED = 42

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

<torch._C.Generator at 0x16087ee02f0>

## Load Metadata: CSVs and Class Names

**Purpose:** Load training/test data and class label mappings for the pig posture classification task.

**Data structure:**
- `train.csv`: ~16k rows with columns: `row_id`, `image_id`, `width`, `height`, `bbox`, `class_id`
- `test.csv`: ~7k rows (same format but no `class_id` — our prediction target)
- `pig_posture_classes.txt`: 5 posture class names, one per line

**Key operations:**

1. **Load CSVs:** `pd.read_csv()` reads training and test metadata into DataFrames
   - Example train row: `train_00000001_0000`, `00000001.jpg`, 1920×1080, `[967.5,331.5,463.0,447.0]`, class_id=3

2. **Parse class names:** List comprehension reads class file and strips whitespace
   - Result: `['Lateral_lying_left', 'Lateral_lying_right', 'Sitting', 'Standing', 'Sternal_lying']`

3. **Create mappings:**
   - `id_to_name`: {0: 'Lateral_lying_left', 1: 'Lateral_lying_right', ...} for converting predictions to readable labels
   - `name_to_id`: Reverse mapping for potential string-to-ID conversions

**Output:** Display dataset sizes and mappings as sanity check

In [3]:
# Load metadata

train_df = pd.read_csv(DATA_DIR / 'train.csv')
test_df = pd.read_csv(DATA_DIR / 'test.csv')
class_names = [c.strip() for c in (DATA_DIR / 'pig_posture_classes.txt').read_text().splitlines() if c.strip()]

id_to_name = {i: name for i, name in enumerate(class_names)}
name_to_id = {v: k for k, v in id_to_name.items()}

len(train_df), len(test_df), class_names, id_to_name, name_to_id

(16062,
 6872,
 ['Lateral_lying_left',
  'Lateral_lying_right',
  'Sitting',
  'Standing',
  'Sternal_lying'],
 {0: 'Lateral_lying_left',
  1: 'Lateral_lying_right',
  2: 'Sitting',
  3: 'Standing',
  4: 'Sternal_lying'},
 {'Lateral_lying_left': 0,
  'Lateral_lying_right': 1,
  'Sitting': 2,
  'Standing': 3,
  'Sternal_lying': 4})

## Bounding Box Conversion: From YOLO Format to PIL Crop Coordinates

**The coordinate system problem:**

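Images use pixel coordinates starting from (0, 0) at the **top-left corner**. Our CSV stores bounding boxes in a **YOLO-style center format**: `[center_x, center_y, width, height]`, the center point plus dimensions. Unlike standard YOLO label files, which normalize values to the 0-1 range, the values here are absolute pixels (e.g., `[967.5,331.5,463.0,447.0]` in a 1920×1080 image). PIL's `crop()` function needs **corner format** instead: `(x1, y1, x2, y2)`, i.e. `(left, upper, right, lower)`.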

**Visual representation:**
```
Image coordinate system:          YOLO format stores:
(0,0) ───────────> x              • Center point (cx, cy)
  │                                • Box dimensions (w, h)
  │    ┌─────────┐
  │    │  (cx,cy)│                We need to convert to:
  │    │    ●────┤ h              • Top-left corner (x1, y1)  
  │    │    │    │                • Bottom-right corner (x2, y2)
  v    └────┴────┘
  y         w
```

**Conversion formula (step-by-step):**

Given a bounding box `[center_x, center_y, width, height]`, we calculate corner coordinates by moving half the width/height in each direction from the center:

1. **Top-left corner (x1, y1):**
   - `x1 = center_x - width/2` → Move left by half the width
   - `y1 = center_y - height/2` → Move up by half the height

2. **Bottom-right corner (x2, y2):**
   - `x2 = center_x + width/2` → Move right by half the width
   - `y2 = center_y + height/2` → Move down by half the height

**Concrete example:**
```
Input bbox: [967.5, 331.5, 463, 447]
           center_x=967.5, center_y=331.5, width=463, height=447

Step 1 - Calculate x-coordinates:
  x1 = 967.5 - 463/2 = 967.5 - 231.5 = 736.0
  x2 = 967.5 + 463/2 = 967.5 + 231.5 = 1199.0
  
Step 2 - Calculate y-coordinates:
  y1 = 331.5 - 447/2 = 331.5 - 223.5 = 108.0
  y2 = 331.5 + 447/2 = 331.5 + 223.5 = 555.0

Output: (736, 108, 1199, 555)
        (top-left corner at pixel 736,108; bottom-right at 1199,555)
```

**Edge case handling (critical for robustness):**

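Images have finite dimensions (e.g., 1920×1080). Bounding boxes near the edges can extend past the image boundaries. PIL does not raise an error in that case; it fills the out-of-bounds area with black pixels, which would add artificial borders to the crop. We clamp coordinates to avoid that: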

```python
x1 = max(0, x_c - w/2)          # Can't go left of pixel 0
y1 = max(0, y_c - h/2)          # Can't go above pixel 0
x2 = min(img_w, x_c + w/2)      # Can't exceed image width
y2 = min(img_h, y_c + h/2)      # Can't exceed image height
```

**Example edge case:**
```
Image: 1920×1080 pixels
Bbox: [50, 100, 200, 150]  (box near top-left edge)

Without clamping:
  x1 = 50 - 100 = -50  ❌ Invalid (negative)
  
With clamping:
  x1 = max(0, -50) = 0  ✓ Valid (starts at edge)
```

**Why integer conversion?**
- `crop()` works on whole pixels, so we pass integer coordinates explicitly instead of relying on implicit rounding
- `int()` truncates toward zero → `int(736.0) = 736`, `int(736.9) = 736`
- Losing less than one pixel per edge is negligible for crops that are hundreds of pixels wide
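
The conversion, clamping, and truncation steps above can be checked together in a few lines of plain Python. This is a minimal sketch: `center_box_to_corners` is a hypothetical name, and the empty-box check is an extra safeguard that the notebook's own `bbox_to_xyxy` does not perform.

```python
from typing import Sequence, Tuple


def center_box_to_corners(bbox: Sequence[float], img_w: int, img_h: int) -> Tuple[int, int, int, int]:
    """Convert [center_x, center_y, width, height] in pixels to clamped integer (x1, y1, x2, y2)."""
    x_c, y_c, w, h = bbox
    # Move half the box size away from the center, then clamp to the image
    x1 = max(0.0, x_c - w / 2)
    y1 = max(0.0, y_c - h / 2)
    x2 = min(float(img_w), x_c + w / 2)
    y2 = min(float(img_h), y_c + h / 2)
    corners = (int(x1), int(y1), int(x2), int(y2))
    # Extra safeguard (not in the notebook): reject boxes with no area
    if corners[2] <= corners[0] or corners[3] <= corners[1]:
        raise ValueError(f"empty crop for bbox {list(bbox)}")
    return corners


# The concrete example and the edge case from the text
example = center_box_to_corners([967.5, 331.5, 463.0, 447.0], 1920, 1080)
edge = center_box_to_corners([50, 100, 200, 150], 1920, 1080)
print(example, edge)
```

For the edge case, the left side is clamped to 0, so the crop is 150 pixels wide instead of the nominal 200.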

**Hyperparameters:**
- `IMGSZ = 299`: the input size InceptionV3 was designed and pretrained for (torchvision's implementation can process other sizes, but the pretrained weights expect 299×299)
- `BATCH = 16`: Process 16 images simultaneously (GPU memory vs. speed tradeoff)

In [None]:
IMGSZ = 299
BATCH = 16


def bbox_to_xyxy(bbox: Iterable[float], img_w: int, img_h: int) -> Tuple[int, int, int, int]:

    x_c, y_c, w, h = bbox

    x1 = max(0, x_c - w / 2)
    y1 = max(0, y_c - h / 2)
    x2 = min(img_w, x_c + w / 2)
    y2 = min(img_h, y_c + h / 2)

    return int(x1), int(y1), int(x2), int(y2)



## PyTorch Dataset Class: Custom Data Pipeline

**Why custom datasets?** PyTorch's `Dataset` class provides a standardized interface for data loading. We need custom logic because:
1. Images need cropping via bounding boxes (not standard preprocessing)
2. Each sample requires parsing JSON, computing corners, and on-the-fly cropping
3. Train and test datasets return different outputs (labels vs. row_ids)

**The three required methods:**

**1. `__init__(self, df, transform, img_root, has_label)`**
- Stores metadata (DataFrame, transforms, image folder path)
- `reset_index(drop=True)` ensures 0-based indexing for reliable access

**2. `__len__(self)`**
- Returns dataset size (e.g., 16,000 training samples)
- DataLoader uses this to calculate the number of batches: `num_batches = ceil(len(dataset) / batch_size)` (with the default `drop_last=False`)

**3. `__getitem__(self, idx)`**
- Core logic: given an index, return one preprocessed sample
- **Pipeline:** CSV row → parse bbox → open image → crop pig → apply transforms → return (image_tensor, label)

**Example flow for idx=100:**
```
Step 1: row = df.iloc[100]  # Get row 100
Step 2: bbox = [967.5, 331.5, 463, 447]  # Parse JSON
Step 3: corners = (736, 108, 1199, 555)  # Convert format
Step 4: crop = image.crop(corners)  # Extract pig
Step 5: tensor = transform(crop)  # Resize to 299×299, normalize
Step 6: return (tensor, class_id=3)  # Standing posture
```

**Memory efficiency:** Crops are computed on-the-fly (not stored), saving RAM and enabling dynamic augmentation.
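
The `__len__`/`__getitem__` protocol itself needs nothing from PyTorch, which makes it easy to try in isolation. The sketch below uses a hypothetical `ToyBoxDataset` with made-up rows and no image loading; only the first row mirrors the example row quoted earlier in this notebook.

```python
import json
import math


class ToyBoxDataset:
    """Stand-in that follows the same indexing protocol, without PyTorch or images."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        bbox = json.loads(row["bbox"])  # the CSV stores the box as a JSON string
        return bbox, int(row["class_id"])


rows = [
    {"bbox": "[967.5,331.5,463.0,447.0]", "class_id": 3},
    {"bbox": "[50,100,200,150]", "class_id": 0},   # made-up row
    {"bbox": "[400,300,120,80]", "class_id": 4},   # made-up row
]
ds = ToyBoxDataset(rows)
bbox, label = ds[0]

# With batch_size=2 the last batch is smaller, hence the ceiling
num_batches = math.ceil(len(ds) / 2)
```

A `DataLoader` only relies on these two methods for a map-style dataset, which is why the real class can hide JSON parsing and cropping behind them.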

In [None]:

class PigPostureDataset(Dataset):

    def __init__(self, df: pd.DataFrame, transform, img_root: Path, has_label: bool = True):

        self.df = df.reset_index(drop=True)
        self.transform = transform
        self.img_root = img_root
        self.has_label = has_label

    def __len__(self):
        return len(self.df)


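    def __getitem__(self, idx):

        row = self.df.iloc[idx]
        # bbox is stored as a JSON string: "[center_x, center_y, width, height]"
        bbox = json.loads(row['bbox'])

        x1, y1, x2, y2 = bbox_to_xyxy(bbox, row['width'], row['height'])

        # Keep the opened file inside the context manager; crop first, then convert only the crop to RGB
        with Image.open(self.img_root / row['image_id']) as im:
            crop = im.crop((x1, y1, x2, y2)).convert('RGB')

        image = self.transform(crop)

        if self.has_label:
            return image, int(row['class_id'])

        # Test rows have no label; return row_id so predictions can be matched to rows later
        return image, row['row_id']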




## Image Transforms: Detailed Breakdown

**What this cell does:** Defines preprocessing pipelines for training vs validation/test images.

**Why transforms?** Neural networks need:
1. Consistent input size (299×299)
2. Normalized pixel values (ImageNet mean/std)
3. Data augmentation (training only) to prevent overfitting

**Training Transforms (`train_tfms`):**

`train_tfms = T.Compose([...])` — Chains multiple transforms sequentially

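1. `T.Resize(int(IMGSZ * 1.15))`
   - `int(299 * 1.15) = 343`. Given a single integer, `T.Resize` scales the **shorter side** to 343 pixels and keeps the aspect ratio, so the result is 343×343 only for square crops
   - **Why larger than 299?** It gives the next step some margin: the random crop is taken from a slightly larger image and then scaled down to 299×299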

2. `T.RandomResizedCrop(IMGSZ, scale=(0.8, 1.0), ratio=(0.8, 1.2))`
   - Randomly crops a region that's 80-100% of image area
   - Aspect ratio varies between 0.8:1 and 1.2:1
   - Resizes that crop to 299×299
   - **Effect:** Simulates different zoom levels and perspectives (forces model to recognize pigs at various scales/angles)

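3. `T.RandomHorizontalFlip()`
   - 50% chance to flip image left↔right
   - **Why:** Standing, sitting, and sternal-lying pigs keep their label when mirrored, so flipping adds cheap variety (it does not literally double the dataset)
   - **Caveat:** `Lateral_lying_left` and `Lateral_lying_right` are separate classes. If the side is defined by how the pig appears in the image, a mirrored left-lying pig looks right-lying while keeping its original label, which injects label noise for these two classes. Check a few examples before relying on this augmentation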

4. `T.ColorJitter(brightness=0.15, contrast=0.15, saturation=0.1, hue=0.05)`
   - Randomly adjusts color properties:
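     - **Brightness:** factor drawn from [0.85, 1.15] (simulates different lighting)
     - **Contrast:** factor drawn from [0.85, 1.15] (simulates camera/exposure differences)
     - **Saturation:** factor drawn from [0.9, 1.1] (color intensity variation)
     - **Hue:** shift drawn from [-0.05, 0.05] on a hue scale of [-0.5, 0.5] (slight color shifts)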
   - **Why:** Makes model robust to lighting/camera variations

5. `T.ToTensor()`
   - Converts PIL Image (H×W×C, 0-255) to PyTorch tensor (C×H×W, 0.0-1.0)
   - Changes: NumPy/PIL format → PyTorch format, uint8 → float32, [0,255] → [0,1]

6. `T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])`
   - Standardizes pixel values using ImageNet statistics
   - **Formula:** `pixel_new = (pixel_old - mean) / std`
   - **Why these values?** InceptionV3 was pretrained on ImageNet with these stats; using same normalization ensures compatibility
   - **Result:** For images with ImageNet-like statistics, each channel ends up with roughly mean 0 and std 1
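
The arithmetic behind the resize and normalization steps can be reproduced by hand. The sketch below is an approximation under stated assumptions: `shorter_edge_resize` and `normalize_value` are hypothetical helpers, and the rounding of the longer side is assumed to be truncation, which may differ by a pixel from what torchvision produces.

```python
IMGSZ = 299
resize_target = int(IMGSZ * 1.15)  # the value passed to T.Resize


def shorter_edge_resize(width: int, height: int, target: int):
    """Approximate (width, height) after scaling the shorter edge to `target`."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


def normalize_value(pixel_0_255: int, mean: float, std: float) -> float:
    """ToTensor scales to [0, 1]; Normalize then subtracts the mean and divides by std."""
    return (pixel_0_255 / 255.0 - mean) / std


# The 463x447 crop from the bounding-box example
new_w, new_h = shorter_edge_resize(463, 447, resize_target)

# A mid-gray red-channel value and a black one, using the ImageNet red-channel stats
mid_red = normalize_value(128, mean=0.485, std=0.229)
black_red = normalize_value(0, mean=0.485, std=0.229)
print(resize_target, (new_w, new_h), round(mid_red, 3), round(black_red, 3))
```

The non-square result shows why the resized image is not generally 343×343; the later crop step is what produces the square 299×299 input.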

**Validation/Test Transforms (`val_tfms`):**

`val_tfms = T.Compose([...])`

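1. `T.Resize(int(IMGSZ * 1.15))`
   - Resizes the shorter side to 343 pixels, keeping the aspect ratio (no randomness)

2. `T.CenterCrop(IMGSZ)`
   - Crops the central 299×299 region (deterministic, no randomness)
   - **Why center?** The bounding-box crop already centers the pig, so the center region keeps most of the animal and gives a consistent evaluation. For elongated crops, the ends of the longer side are cut off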

3. `T.ToTensor()` — Same as train

4. `T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])` — Same as train

**Key difference:** Training uses augmentation (random crops, color jitter, flips) to artificially expand the dataset; val/test use deterministic transforms for reproducible evaluation.
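
Because two of the five classes differ only by side, one option is to swap those labels whenever an image is mirrored. The sketch below is a hypothetical alternative, not what this notebook does, and it assumes the left/right labels describe how the pig appears in the image. A nested list stands in for an image so the example runs without any imaging library.

```python
import random

# Assumption: ids follow pig_posture_classes.txt, where 0 and 1 are the two lateral-lying sides
MIRRORED_LABEL = {0: 1, 1: 0}


def flip_rows(image_rows):
    """Mirror a row-major image (list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image_rows]


def random_flip_with_label(image, class_id, flip_fn=flip_rows, p=0.5, rng=random):
    """Flip with probability p and swap side-specific labels when the flip happens."""
    if rng.random() < p:
        return flip_fn(image), MIRRORED_LABEL.get(class_id, class_id)
    return image, class_id


toy_image = [[1, 2, 3], [4, 5, 6]]
flipped, flipped_label = random_flip_with_label(toy_image, 0, p=1.0)   # always flips
_, standing_label = random_flip_with_label(toy_image, 3, p=1.0)        # label unaffected
untouched, kept_label = random_flip_with_label(toy_image, 1, p=0.0)    # never flips
```

In the real pipeline this would have to run inside `__getitem__`, since the label must change together with the image; `torchvision.transforms.functional.hflip` could serve as `flip_fn`, and `T.RandomHorizontalFlip()` would then be dropped from `train_tfms`.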

In [None]:

train_tfms = T.Compose([

    T.Resize(int(IMGSZ * 1.15)),
    T.RandomResizedCrop(IMGSZ, scale=(0.8, 1.0), ratio=(0.8, 1.2)),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.15, contrast=0.15, saturation=0.1, hue=0.05),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


val_tfms = T.Compose([
    T.Resize(int(IMGSZ * 1.15)),
    T.CenterCrop(IMGSZ),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

## Train/Validation Split and DataLoaders: Detailed Breakdown

**What this cell does:** Splits train data into training (80%) and validation (20%) sets while keeping all crops from the same image together, then creates efficient data loaders.

**Why split?** We need:
- **Training set:** To update model weights
- **Validation set:** To check if model generalizes (not just memorizing training data)

**Critical constraint:** Same image can have multiple crops (multiple pigs). We MUST keep all crops from one image in the same split (train OR val, never both) to prevent data leakage.

**Step 1: Grouped Split**

1. `splitter = GroupShuffleSplit(test_size=0.2, n_splits=1, random_state=SEED)`
   - **Class:** scikit-learn's GroupShuffleSplit
   - **test_size=0.2:** About 20% of the *groups* (images) go to validation, so the row-level split is close to, but not exactly, 80/20
   - **n_splits=1:** Only create 1 split (we just need 1 train/val partition)
   - **random_state=SEED:** Reproducible split (same split every run with same seed)
   - **Key feature:** Respects groups—keeps all samples with same group ID together

2. `train_idx, val_idx = next(splitter.split(train_df, groups=train_df['image_id']))`
   - **`split(...)`:** Generates train/val indices
   - **`groups=train_df['image_id']`:** Group by image_id (e.g., `00000001.jpg`)
     - All rows with `image_id='00000001.jpg'` stay together
     - If `00000001.jpg` has 3 pig crops, all 3 go to train OR all 3 go to val
   - **`next(...)`:** Get the first (and only) split
   - **Result:** `train_idx` = array of row indices for training, `val_idx` = array for validation

3. `train_split = train_df.iloc[train_idx].copy()`
   - Select rows at train_idx positions
   - `.copy()` creates an independent DataFrame (avoids `SettingWithCopyWarning` on later edits)
   - **Result:** 12,902 rows in the recorded run (about 80% of 16,062)

4. `val_split = train_df.iloc[val_idx].copy()`
   - Same as above but with validation indices
   - **Result:** 3,160 rows in the recorded run (about 20% of 16,062)

**Step 2: Create Datasets**

5. `train_ds = PigPostureDataset(train_split, train_tfms, TRAIN_IMAGES, has_label=True)`
   - Create dataset with training DataFrame and training transforms (augmentation)
   - `has_label=True` means this data has class_id column

6. `val_ds = PigPostureDataset(val_split, val_tfms, TRAIN_IMAGES, has_label=True)`
   - Create dataset with validation DataFrame and validation transforms (no augmentation)

**Step 3: DataLoader Configuration**

7. `num_workers = 2`
   - Number of parallel subprocesses to load data
   - **Why 2?** Small value to avoid multiprocessing issues on Windows while still gaining some speedup
   - **Trade-off:** Higher = faster loading but more memory + potential crashes

**Step 4: Create DataLoaders**

8. `train_loader = DataLoader(train_ds, batch_size=BATCH, shuffle=True, num_workers=num_workers, pin_memory=True)`
   - **train_ds:** Training dataset (~12.8k samples)
   - **batch_size=BATCH:** Load 16 images at a time
   - **shuffle=True:** Randomize order each epoch
     - **Why?** Prevents model from learning order patterns; better generalization
     - Each epoch sees data in different order
   - **num_workers=2:** Use 2 worker processes for parallel loading
   - **pin_memory=True:** Allocates batches in page-locked host memory, which speeds up CPU→GPU copies (no benefit when running on CPU only)

9. `val_loader = DataLoader(val_ds, batch_size=BATCH, shuffle=False, num_workers=num_workers, pin_memory=True)`
   - **val_ds:** Validation dataset (~3.2k samples)
   - **shuffle=False:** Keep consistent order for reproducible validation
     - Validation should be deterministic—same order every time

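10. `len(train_ds), len(val_ds)`
    - Displays dataset sizes to verify the split worked correctly
    - **Expected:** roughly an 80/20 ratio; the recorded run printed `(12902, 3160)`, which sums to the 16,062 training rows
    - DataLoader length = ceil(dataset_size / batch_size), e.g., ceil(12902/16) = 807 training batches and ceil(3160/16) = 198 validation batches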

**Why GroupShuffleSplit matters:** Without grouping, we might have:
- Train: crop #1 from image A
- Val: crop #2 from same image A
- **Problem:** Model sees image A during training, then "evaluates" on same image A—unfair advantage, validation accuracy would be artificially high!

With grouping: All crops from image A go to train OR val, never both—fair evaluation.
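
A grouped split is easy to verify after the fact: no `image_id` may appear on both sides. The check below is a small sketch with a hypothetical helper name and made-up ids; it works on any two iterables of group labels.

```python
def leaked_groups(train_groups, val_groups):
    """Return the group ids that appear in both splits (empty set means no leakage)."""
    return set(train_groups) & set(val_groups)


# Made-up ids: two crops of image 1 and one of image 3 in train
train_ids = ["00000001.jpg", "00000001.jpg", "00000003.jpg"]
clean_val_ids = ["00000002.jpg", "00000002.jpg"]
leaky_val_ids = ["00000002.jpg", "00000003.jpg"]  # image 3 is also in train

clean_overlap = leaked_groups(train_ids, clean_val_ids)
leaky_overlap = leaked_groups(train_ids, leaky_val_ids)
print(clean_overlap, leaky_overlap)
```

Applied to this notebook, the same idea would be `assert not leaked_groups(train_split['image_id'], val_split['image_id'])` right after the split.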

In [None]:

# Train/val split grouped by image_id to avoid leakage across crops

splitter = GroupShuffleSplit(test_size=0.2, n_splits=1, random_state=SEED)
train_idx, val_idx = next(splitter.split(train_df, groups=train_df['image_id']))
train_split = train_df.iloc[train_idx].copy()
val_split = train_df.iloc[val_idx].copy()


train_ds = PigPostureDataset(train_split, train_tfms, TRAIN_IMAGES, has_label=True)
val_ds = PigPostureDataset(val_split, val_tfms, TRAIN_IMAGES, has_label=True)



num_workers = 2  # keep small for Windows
train_loader = DataLoader(train_ds, batch_size=BATCH, shuffle=True, num_workers=num_workers, pin_memory=True)
val_loader = DataLoader(val_ds, batch_size=BATCH, shuffle=False, num_workers=num_workers, pin_memory=True)

len(train_ds), len(val_ds)

## Transfer Learning with InceptionV3

**Why transfer learning?** Training a deep CNN from scratch usually needs far more labeled images and GPU time than the ~16k crops available here. Instead, we start from ImageNet-pretrained weights (about 1.28M training images, 1000 classes) and adapt them to our 5-class pig problem.

**InceptionV3 architecture:**
- Commonly described as 48 layers deep, built from "inception modules" (parallel filters of different sizes)
- Pretrained on ImageNet: already recognizes edges, textures, shapes, and object parts
- Input: 299×299 RGB images (the size the pretrained weights expect)
- Original output: 1000 classes

**Our modifications:**
```python
base_model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1, aux_logits=True)
base_model.fc = nn.Linear(base_model.fc.in_features, len(class_names))  # Replace final layer: 2048 inputs → 5 outputs
```

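**What we keep:** All convolutional layers with their ImageNet weights → feature extraction backbone

**What we replace:** Final fully-connected layer (1000 outputs → 5 outputs)

**Training strategy:**
1. Freeze conv layers initially? No. We fine-tune ALL weights, which usually beats a frozen backbone when the target images (pig crops) differ from ImageNet photos
2. A low learning rate (2e-4) limits how far the pretrained features drift during the first updates

**aux_logits=True:** InceptionV3 has an auxiliary classifier attached to an intermediate layer (after `Mixed_6e`), originally added to help gradients reach earlier layers. A common recipe trains with `loss = main_loss + weight × aux_loss`. This notebook does **not** do that: `run_epoch` keeps only the main logits, and the auxiliary head still has its original 1000 outputs, so it plays no part in the loss. During inference (`eval()` mode), the model returns only the main output.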

In [None]:
# Inception expects aux_logits=True with weights; grab main logits below

base_model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1, aux_logits=True)
base_model.fc = nn.Linear(base_model.fc.in_features, len(class_names))
base_model.to(DEVICE)


## Training Configuration: Loss, Optimizer, Scheduler

**Hyperparameters:**
- `EPOCHS = 3`: Full passes through training data (more = better fit, but risk overfitting)
- `LR = 2e-4`: Learning rate (step size for weight updates)

**Loss function: CrossEntropyLoss**
- Combines softmax + negative log-likelihood for multi-class classification
- Formula: `loss = -log(softmax(logits)[true_class])`
- Example: softmax probabilities `[0.1, 0.05, 0.2, 0.6, 0.05]`, true class=3 → loss = -log(0.6) ≈ 0.51 (the model itself outputs raw logits; `CrossEntropyLoss` applies the softmax internally)
- Lower loss = more confident correct predictions
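
The loss formula can be verified by hand with the standard library. This sketch reimplements softmax and cross-entropy for a single sample; it is a numerical illustration, not PyTorch's implementation, and the logits are constructed so that their softmax equals the probabilities in the example above.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Negative log of the probability assigned to the true class."""
    return -math.log(softmax(logits)[true_class])


# Logits whose softmax is exactly the example distribution
example_probs = [0.1, 0.05, 0.2, 0.6, 0.05]
logits = [math.log(p) for p in example_probs]

probs = softmax(logits)
loss_correct_class = cross_entropy(logits, true_class=3)   # model favors class 3
loss_wrong_class = cross_entropy(logits, true_class=1)     # true class got only 5%
print(round(loss_correct_class, 3), round(loss_wrong_class, 3))
```

The second value is much larger, which is the behavior the loss relies on: confident mistakes are penalized heavily.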

**Optimizer: AdamW**
- Adaptive learning rates per parameter (automatically adjusts based on gradient history)
- "W" = decoupled weight decay: weights are shrunk directly at each step instead of adding an L2 penalty to the gradient, which is how AdamW differs from Adam with L2 regularization
- `weight_decay=1e-4`: Each step shrinks the weights by a small amount proportional to their magnitude

**Learning rate scheduler: CosineAnnealingLR**
- Gradually decreases LR following a cosine curve: LR(t) = LR_max × 0.5 × (1 + cos(π × t/T_max)), where t is the number of completed epochs and the minimum LR is 0
- Why? Early training benefits from larger steps (far from optimum), late training needs fine adjustments
- Schedule with `T_max=3`: Epoch 1 runs at 0.00020 → Epoch 2 at 0.00015 → Epoch 3 at 0.00005 (the LR reaches 0 only after the final `scheduler.step()`)
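
The cosine formula can be evaluated directly to see which learning rate each epoch uses. This is a closed-form sketch with a hypothetical helper name, assuming a minimum learning rate of 0 and one scheduler step per epoch, as in this notebook.

```python
import math


def cosine_lr(completed_epochs: int, lr_max: float, t_max: int, lr_min: float = 0.0) -> float:
    """Closed-form cosine annealing value after a given number of scheduler steps."""
    cosine = math.cos(math.pi * completed_epochs / t_max)
    return lr_min + (lr_max - lr_min) * 0.5 * (1 + cosine)


LR, EPOCHS = 2e-4, 3
# Index 0..2 are the rates used by epochs 1..3; index 3 is the value after the last step
schedule = [cosine_lr(t, LR, EPOCHS) for t in range(EPOCHS + 1)]
for t, lr in enumerate(schedule):
    print(f"after {t} step(s): lr = {lr:.6f}")
```

With only three epochs the curve is coarse; a longer run would trace the cosine shape more smoothly.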

In [None]:
EPOCHS = 3

LR = 2e-4

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(base_model.parameters(), lr=LR, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

## Training/Validation Epoch Runner

**Purpose:** Unified function to run one complete pass through data (training with weight updates OR validation without updates).

**Key steps:**
1. **Set mode:** `model.train(True)` enables dropout/batch-norm training behavior; `model.eval()` disables it
2. **Batch loop:** Process 16 images at a time (efficient parallelization)
3. **Forward pass:** Input images → model outputs 5 class scores per image
4. **Compute loss:** How far are predictions from true labels?
5. **Backward pass (training only):**
   - `optimizer.zero_grad()`: Clear old gradients
   - `loss.backward()`: Compute new gradients via backpropagation
   - `optimizer.step()`: Update all 25M weights based on gradients
6. **Track metrics:** Accumulate loss and count correct predictions
7. **Return:** Average loss and accuracy for the epoch

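**InceptionV3 quirk:** In training mode with `aux_logits=True`, the model returns a named tuple `(logits, aux_logits)`; in `eval()` mode it returns a plain tensor. The `isinstance(outputs, tuple)` check handles both cases, and `outputs[0]` keeps only the main classifier.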

**Example outputs (illustrative values, not a recorded run):**
- Training: loss=0.65, acc=0.75 (75% correct, weights updated once per batch, about 807 times per epoch)
- Validation: loss=0.80, acc=0.72 (72% correct, no weight changes)
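
`run_epoch` multiplies each batch loss by the batch size before summing. The sketch below, with made-up numbers and a hypothetical helper name, shows why: the last batch is usually smaller, and a plain mean over batches would give it too much weight.

```python
def epoch_average(batch_losses, batch_sizes):
    """Sample-weighted mean loss, matching running_loss / total in run_epoch."""
    total = sum(batch_sizes)
    weighted_sum = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return weighted_sum / total


# Made-up epoch: two full batches of 16 and a final batch of 2 with an unusually low loss
losses = [0.9, 0.7, 0.1]
sizes = [16, 16, 2]

weighted = epoch_average(losses, sizes)      # every sample counts equally
naive = sum(losses) / len(losses)            # every batch counts equally
print(round(weighted, 4), round(naive, 4))
```

The naive mean is pulled down by the two-sample batch, while the weighted value reflects the loss per sample.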

In [None]:
def run_epoch(model, loader, train_mode: bool):

    model.train(mode=train_mode)

    running_loss, correct, total = 0.0, 0, 0

    for images, targets in loader:

        images, targets = images.to(DEVICE), targets.to(DEVICE)

        with torch.set_grad_enabled(train_mode):

            outputs = model(images)

            if isinstance(outputs, tuple):  # main logits, aux logits
                outputs = outputs[0]
            loss = criterion(outputs, targets)

            if train_mode:

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

        running_loss += loss.item() * images.size(0)
        correct += (outputs.argmax(1) == targets).sum().item()

        total += targets.size(0)

    avg_loss = running_loss / total
    acc = correct / total

    return avg_loss, acc


## Training Loop with Checkpointing: Detailed Breakdown

**What this cell does:** Trains the model for multiple epochs, validates after each epoch, saves the best model, and prints progress.

**Initialization:**

1. `best_acc = 0.0`
   - **Track best validation accuracy** seen so far
   - Initialize to 0.0 (any trained model will beat this)
   - **Purpose:** Only save model when it improves (avoids saving worse versions)

2. `best_path = PROJECT_DIR / 'best_inception.pt'`
   - **Path where we'll save the best model**
   - **Location:** Same folder as notebook
   - **Extension:** .pt is standard for PyTorch models
   - **Contains:** Model weights + validation accuracy

**Main Training Loop:**

3. `for epoch in range(1, EPOCHS + 1):`
   - **Iterate through epochs:** 1, 2, 3 (human-readable numbering)
   - **Each epoch:** Complete pass through all training data + validation

**Step 4: Train for one epoch**

4. `train_loss, train_acc = run_epoch(base_model, train_loader, train_mode=True)`
   - **Call run_epoch with train_mode=True**
   - **What happens inside:**
     1. Loop through all training batches (807 for 12,902 samples at batch size 16)
     2. Forward pass → compute loss → backward pass → update weights
     3. Track loss and accuracy across all batches
   - **Returns:**
     - `train_loss`: Average loss on training set (e.g., 0.65)
     - `train_acc`: Fraction correctly classified (e.g., 0.75 = 75%)
   - **Time:** Takes most of the epoch time (about four times more data than validation, plus the backward pass)

**Step 5: Validate**

5. `val_loss, val_acc = run_epoch(base_model, val_loader, train_mode=False)`
   - **Call run_epoch with train_mode=False (evaluation mode)**
   - **What happens inside:**
     1. Loop through all validation batches (198 for 3,160 samples at batch size 16)
     2. Forward pass → compute loss → NO backward pass, NO weight updates
     3. Track loss and accuracy
   - **Returns:**
     - `val_loss`: Average loss on validation set (e.g., 0.80)
     - `val_acc`: Fraction correctly classified (e.g., 0.72 = 72%)
   - **Time:** Faster than training (no backprop)
   - **Purpose:** Check if model generalizes (not just memorizing training data)

**Step 6: Adjust learning rate**

6. `scheduler.step()`
   - **Update learning rate** using CosineAnnealingLR
   - **Effect:** LR decreases after each epoch
     - Epoch 1: LR = 0.00020
     - Epoch 2: LR = 0.00015
     - Epoch 3: LR = 0.00005
   - **Why?** Fine-tune with smaller steps as training progresses

**Step 7: Save best model (checkpoint)**

7. `if val_acc > best_acc:`
   - **Check:** Did validation accuracy improve?
   - **Example:** 
     - Epoch 1: val_acc=0.70 > best_acc=0.00 → Save
     - Epoch 2: val_acc=0.73 > best_acc=0.70 → Save
     - Epoch 3: val_acc=0.72 < best_acc=0.73 → Don't save
   - **Why check validation?** We care about generalization, not just training performance

8. `best_acc = val_acc`
   - **Update best accuracy** to current validation accuracy
   - **Example:** best_acc = 0.73

9. `torch.save({'model_state': base_model.state_dict(), 'acc': val_acc}, best_path)`
   - **Save checkpoint to disk**
   
   - **`base_model.state_dict()`:** Dictionary of all model parameters
     - Keys: layer names (e.g., 'Conv2d_1a_3x3.conv.weight')
     - Values: weight tensors
     - ~25 million parameters for InceptionV3
   
   - **`{'model_state': ..., 'acc': ...}`:** Save multiple items in one file
     - 'model_state': The model weights
     - 'acc': Validation accuracy (for reference)
   
   - **`torch.save(..., best_path)`:** Serialize to file
     - **Format:** PyTorch's binary format (pickled tensors)
     - **Size:** ~90-100 MB (depends on model)
   
   - **Result:** File `best_inception.pt` contains the best model so far
   - **Why?** If training crashes or we stop early, we can load the best model

**Step 8: Print progress**

10. `print(f"Epoch {epoch}/{EPOCHS} | train loss {train_loss:.4f} acc {train_acc:.3f} | val loss {val_loss:.4f} acc {val_acc:.3f}")`
    - **Display training progress** for human monitoring
    - **Format:**
      - `{epoch}/{EPOCHS}`: e.g., "2/3"
      - `{train_loss:.4f}`: 4 decimal places, e.g., "0.6542"
      - `{train_acc:.3f}`: 3 decimal places, e.g., "0.753"
    - **Example output (illustrative values, not a recorded run):**
      ```
      Epoch 1/3 | train loss 0.6542 acc 0.753 | val loss 0.7831 acc 0.702
      Epoch 2/3 | train loss 0.4201 acc 0.841 | val loss 0.5123 acc 0.829
      Epoch 3/3 | train loss 0.3012 acc 0.892 | val loss 0.4856 acc 0.845
      ```
    - **What to look for:**
      - Train loss/acc should improve (decrease/increase)
      - Val acc should improve (if not, might be overfitting)
      - Gap between train/val indicates overfitting (train much better than val)

**Final output:**

11. `best_acc, best_path`
    - **Display best validation accuracy** achieved
    - **Display path** where best model is saved
    - **Example:** `(0.845, PosixPath('/path/to/best_inception.pt'))` (a `WindowsPath` on Windows; the accuracy value is illustrative)
    - **Purpose:** Quick confirmation of training success

**Summary of training flow (in the order the code runs):**
1. Epoch 1: Train → Validate → Decrease LR → Save (first model always best) → Print
2. Epoch 2: Train → Validate → Decrease LR → Save if better → Print
3. Epoch 3: Train → Validate → Decrease LR (to 0, unused) → Save if better → Print
4. The next cell loads the best checkpoint for inference
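
The keep-the-best rule can be traced without training anything. The sketch below replays the illustrative accuracies from the checkpoint example (0.70, 0.73, 0.72) through the same strict `>` comparison; `epochs_that_save` is a hypothetical helper name.

```python
def epochs_that_save(val_accs):
    """Return the 1-based epochs that would write a checkpoint, plus the best accuracy."""
    best_acc = 0.0
    saved_epochs = []
    for epoch, val_acc in enumerate(val_accs, start=1):
        if val_acc > best_acc:  # strict comparison: ties do not overwrite the file
            best_acc = val_acc
            saved_epochs.append(epoch)
    return saved_epochs, best_acc


saved_epochs, best_acc = epochs_that_save([0.70, 0.73, 0.72])
print(saved_epochs, best_acc)
```

Epoch 3 is skipped, so the file on disk still holds the epoch 2 weights when inference starts.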

In [None]:
best_acc = 0.0

best_path = PROJECT_DIR / 'best_inception.pt'

for epoch in range(1, EPOCHS + 1):

    train_loss, train_acc = run_epoch(base_model, train_loader, train_mode=True)
    val_loss, val_acc = run_epoch(base_model, val_loader, train_mode=False)

    scheduler.step()

    if val_acc > best_acc:
        best_acc = val_acc
        torch.save({'model_state': base_model.state_dict(), 'acc': val_acc}, best_path)

    print(f"Epoch {epoch}/{EPOCHS} | train loss {train_loss:.4f} acc {train_acc:.3f} | val loss {val_loss:.4f} acc {val_acc:.3f}")


best_acc, best_path

## Inference and Submission Generation: Detailed Breakdown

**What this cell does:** Loads the best trained model, makes predictions on test set, and creates a CSV submission file.

**Step 1: Load Best Model**

1. `checkpoint = torch.load(best_path, map_location=DEVICE)`
   - **Load saved checkpoint** from disk
   - **`torch.load(...)`:** Deserializes PyTorch file
   - **`map_location=DEVICE`:** Ensures tensors load to correct device
     - If DEVICE='cuda': load to GPU
     - If DEVICE='cpu': load to CPU (even if saved on GPU)
   - **Result:** Dictionary `{'model_state': state_dict, 'acc': 0.845}`

2. `base_model.load_state_dict(checkpoint['model_state'])`
   - **Restore model weights** from checkpoint
   - **`checkpoint['model_state']`:** The state_dict we saved earlier
   - **Effect:** All model parameters now match the best epoch
   - **Why?** Current model might be from epoch 3, but best was epoch 2

3. `base_model.eval()`
   - **Set model to evaluation mode**
   - **Effect:**
     - Disables dropout (no random neuron dropping)
     - Batch norm uses running statistics (not batch statistics)
   - **Why?** Ensures deterministic, consistent predictions

**Step 2: Create Test DataLoader**

4. `test_ds = PigPostureDataset(test_df, val_tfms, TEST_IMAGES, has_label=False)`
   - **Create test dataset**
   - **test_df:** Test CSV (~7k rows with no class_id column)
   - **val_tfms:** Use validation transforms (no augmentation, deterministic)
   - **TEST_IMAGES:** Folder with test images
   - **has_label=False:** Dataset returns (image, row_id) instead of (image, class_id)

5. `test_loader = DataLoader(test_ds, batch_size=BATCH, shuffle=False, num_workers=2, pin_memory=True)`
   - **Batch test data**
   - **shuffle=False:** CRITICAL for submission—maintain row order
   - **Why?** Predictions must align with row_ids
   - **num_workers=2:** Parallel loading for speed
   - **Result:** 430 batches (ceil(6872 / 16) = 430)

**Step 3: Inference Loop**

6. `preds, row_ids = [], []`
   - **Initialize lists** to collect results
   - **preds:** Will store predicted class_ids (0-4)
   - **row_ids:** Will store corresponding row_ids from test.csv

7. `with torch.no_grad():`
   - **Disable gradient computation**
   - **Effect:** Saves memory, speeds up inference
   - **Why?** We're not training, so no need for gradients
   - **Context manager:** Automatically re-enables gradients after block

8. `for images, ids in test_loader:`
   - **Loop through test batches**
   - **images:** Batch of image tensors (16, 3, 299, 299)
   - **ids:** List of 16 row_id strings (the default collate function leaves strings as a Python list)
   - **Note:** ids are strings like 'test_00000001_0000', not class labels

9. `images = images.to(DEVICE)`
   - **Move images to GPU/CPU** where model is located
   - **Required:** Model and data must be on same device

10. `outputs = base_model(images)`
    - **Forward pass:** Run images through model
    - **Input:** (16, 3, 299, 299) tensor
    - **Output:** (16, 5) tensor of logits, because the model is in `eval()` mode
    - **Special case:** In training mode InceptionV3 returns a tuple `(main_logits, aux_logits)`

11. `if isinstance(outputs, tuple): outputs = outputs[0]`
    - **Handle InceptionV3 auxiliary logits**
    - **If tuple:** Extract main output, ignore auxiliary (a defensive check; it should not trigger after `eval()`)
    - **Result:** outputs is (16, 5) tensor of logits

12. `pred = outputs.argmax(1).cpu().tolist()`
    - **Convert logits to class predictions**
    
    - **`outputs.argmax(1)`:** Find class with highest logit for each sample
      - Input: (16, 5) logits
      - `argmax(1)`: argmax along dimension 1 (classes)
      - Output: (16,) tensor of class indices
    
    - **`.cpu()`:** Move tensor to CPU (required before converting to Python)
      - If already on CPU, this is a no-op
    
    - **`.tolist()`:** Convert tensor to Python list
      - Example: `tensor([3, 1, 4, 0, ...])` → `[3, 1, 4, 0, ...]`
    
    - **Result:** List of predicted class_ids (integers 0-4)

13. `preds.extend(pred)`
    - **Append predictions** from this batch to main list
    - **extend vs append:** extend adds each element, append would add entire list as one element

14. `row_ids.extend(ids)`
    - **Append row_ids** from this batch to main list
    - **Critical:** Maintains alignment between predictions and row_ids

**Step 4: Create Submission File**

15. `submission = pd.DataFrame({'row_id': row_ids, 'class_id': preds})`
    - **Build DataFrame** with required columns
    - **Columns:**
      - 'row_id': Test identifiers (e.g., 'test_00000001_0000')
      - 'class_id': Predicted posture classes (0-4)
    - **Example:**
      ```
      row_id                  class_id
      test_00000001_0000      3
      test_00000001_0001      1
      test_00000002_0000      4
      ```
    - **Length:** 6,872 rows (matches test.csv)

16. `submission.to_csv('inception_submission.csv', index=False)`
    - **Write to CSV file**
    - **Filename:** 'inception_submission.csv' in current directory
    - **index=False:** Don't write DataFrame index as a column
    - **Format:** CSV with header row
    - **Result:** File ready for Kaggle submission

17. `submission.head()`
    - **Display first 5 rows** for quick visual check
    - **Purpose:** Verify format looks correct before submitting
    - **Example output:**
      ```
                   row_id  class_id
      0  test_00000001_0000         3
      1  test_00000001_0001         1
      2  test_00000002_0000         4
      3  test_00000002_0001         0
      4  test_00000003_0000         2
      ```

**Class ID mapping (reminder):**
- 0: Lateral_lying_left
- 1: Lateral_lying_right
- 2: Sitting
- 3: Standing
- 4: Sternal_lying

**Critical details:**
- **Order matters:** shuffle=False ensures predictions align with row_ids
- **Best model:** We loaded checkpoint from best validation epoch, not final epoch
- **Deterministic:** eval() mode + no augmentation = the same predictions for a given checkpoint (up to tiny floating-point differences across hardware)
- **Batch processing:** Much faster than one image at a time
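
Before uploading, a few cheap checks on the collected lists can catch misaligned or out-of-range predictions. The sketch below is a hypothetical helper run on made-up predictions; the class names are the ones listed in `pig_posture_classes.txt`.

```python
from collections import Counter

CLASS_NAMES = ['Lateral_lying_left', 'Lateral_lying_right', 'Sitting', 'Standing', 'Sternal_lying']


def check_submission(row_ids, preds, expected_rows, num_classes):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    if len(row_ids) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(row_ids)}")
    if len(preds) != len(row_ids):
        problems.append("row_ids and preds have different lengths")
    if len(set(row_ids)) != len(row_ids):
        problems.append("duplicate row_id values")
    if any(not 0 <= p < num_classes for p in preds):
        problems.append("class_id outside the valid range")
    return problems


# Made-up predictions for three test rows
row_ids = ["test_00000001_0000", "test_00000001_0001", "test_00000002_0000"]
preds = [3, 1, 4]

problems = check_submission(row_ids, preds, expected_rows=3, num_classes=len(CLASS_NAMES))
bad_problems = check_submission(row_ids, [3, 1, 7], expected_rows=3, num_classes=len(CLASS_NAMES))

# Predicted class distribution in readable form
distribution = Counter(CLASS_NAMES[p] for p in preds)
print(problems, bad_problems, dict(distribution))
```

For the real run, `expected_rows` would be `len(test_df)`; a distribution dominated by a single class is a hint that something went wrong upstream.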

In [None]:
# Inference on test set and submission

checkpoint = torch.load(best_path, map_location=DEVICE)
base_model.load_state_dict(checkpoint['model_state'])

base_model.eval()


test_ds = PigPostureDataset(test_df, val_tfms, TEST_IMAGES, has_label=False)
test_loader = DataLoader(test_ds, batch_size=BATCH, shuffle=False, num_workers=2, pin_memory=True)



preds, row_ids = [], []

with torch.no_grad():

    for images, ids in test_loader:

        images = images.to(DEVICE)
        outputs = base_model(images)

        if isinstance(outputs, tuple):

            outputs = outputs[0]

        pred = outputs.argmax(1).cpu().tolist()
        preds.extend(pred)
        row_ids.extend(ids)



submission = pd.DataFrame({'row_id': row_ids, 'class_id': preds})

submission.to_csv('inception_submission.csv', index=False)

submission.head()