# 05 — Modelling & Evaluation (CNN)

**Objective**  
Train and evaluate a baseline CNN to predict powdery mildew from cherry leaf images.

**Inputs**  
- Split manifests: `inputs/manifests/v1/{train,val,test}.csv`
- Images: `inputs/cherry_leaves_dataset/{healthy,powdery_mildew}`

**Outputs (planned)**  
- Trained model artifacts under `artifacts/v1/models/`
- Training history & evaluation plots under `plots/v1/`
- Metrics report under `artifacts/v1/reports/`

**Notes**  
Images will be resized to a fixed input size (`IMG_SIZE`, set to 100×100 in the configuration cell) and their pixel values normalized. Training will use early stopping and model checkpointing.
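
As a rough sketch of how early stopping and checkpointing might be set up with Keras callbacks. The monitored metric (`val_loss`), the patience of 5 epochs, and the file name `best_model.keras` are assumptions chosen for illustration, not settings taken from this project; only the `artifacts/v1/models/` folder comes from the planned outputs above.

```python
from pathlib import Path

import tensorflow as tf

# Planned model folder from the notebook header; the file name is hypothetical.
MODEL_DIR = Path("artifacts/v1/models")

# Stop training once validation loss has not improved for `patience` epochs
# and roll the model back to the best weights seen (assumed settings).
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,
)

# Write the model to disk only when validation loss improves.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath=str(MODEL_DIR / "best_model.keras"),
    monitor="val_loss",
    save_best_only=True,
)

callbacks = [early_stop, checkpoint]
```

The list would then be passed to training, for example `model.fit(train_ds, validation_data=val_ds, callbacks=callbacks)`, where `model`, `train_ds`, and `val_ds` are placeholders for objects defined later.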

In [1]:
from pathlib import Path
import sys

def find_project_root(start: Path) -> Path:
    """Walk up from `start`, checking at most five directories, until one containing 'src' is found; else return `start`."""
    p = start
    for _ in range(5):
        if (p / "src").exists():
            return p
        p = p.parent
    return start

PROJECT_ROOT = find_project_root(Path.cwd())
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

from src.paths import PROJECT_ROOT, DATA_DIR, MANIFESTS_DIR, PLOTS_DIR, ARTIFACTS_DIR

print("PROJECT_ROOT:", PROJECT_ROOT)
print("DATA_DIR:", DATA_DIR)
print("MANIFESTS_DIR:", MANIFESTS_DIR)
print("PLOTS_DIR:", PLOTS_DIR)
print("ARTIFACTS_DIR:", ARTIFACTS_DIR)

PROJECT_ROOT: C:\Users\ksstr\Documents\Coding\milestone-project-5
DATA_DIR: C:\Users\ksstr\Documents\Coding\milestone-project-5\inputs\cherry_leaves_dataset
MANIFESTS_DIR: C:\Users\ksstr\Documents\Coding\milestone-project-5\inputs\manifests\v1
PLOTS_DIR: C:\Users\ksstr\Documents\Coding\milestone-project-5\plots\v1
ARTIFACTS_DIR: C:\Users\ksstr\Documents\Coding\milestone-project-5\artifacts


In [2]:
# Modelling configuration
IMG_SIZE = (100, 100)   # TensorFlow resize ops expect (height, width); square here, so the order does not change the result
BATCH_SIZE = 32
SEED = 42

print("Config → IMG_SIZE:", IMG_SIZE, "| BATCH_SIZE:", BATCH_SIZE, "| SEED:", SEED)

Config → IMG_SIZE: (100, 100) | BATCH_SIZE: 32 | SEED: 42


In [3]:
import pandas as pd

paths = {
    "train": MANIFESTS_DIR / "train.csv",
    "val":   MANIFESTS_DIR / "val.csv",
    "test":  MANIFESTS_DIR / "test.csv",
}

for name, p in paths.items():
    assert p.exists(), f"Missing manifest: {p}"

df_train = pd.read_csv(paths["train"])
df_val   = pd.read_csv(paths["val"])
df_test  = pd.read_csv(paths["test"])

for name, df in [("train", df_train), ("val", df_val), ("test", df_test)]:
    print(f"{name:>5} n={len(df)}")
    vc = df["label"].value_counts(normalize=True).rename("proportion").round(3)
    print(vc, "\n")

display(df_train.head(3))

train n=2945
label
powdery_mildew    0.5
healthy           0.5
Name: proportion, dtype: float64 

  val n=631
label
healthy           0.501
powdery_mildew    0.499
Name: proportion, dtype: float64 

 test n=632
label
powdery_mildew    0.5
healthy           0.5
Name: proportion, dtype: float64 



|   | filepath | label |
|---|----------|-------|
| 0 | `C:\Users\ksstr\Documents\Coding\milestone-proj...` | healthy |
| 1 | `C:\Users\ksstr\Documents\Coding\milestone-proj...` | powdery_mildew |
| 2 | `C:\Users\ksstr\Documents\Coding\milestone-proj...` | healthy |


### Pre-flight checks

- Split manifests found and loaded successfully.
- Class proportions are balanced at roughly 50/50 in each split (train n=2945, val n=631, test n=632).
- Next step: implement a TensorFlow `tf.data` pipeline (decode → resize → normalize → batch → prefetch) using these manifests.
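
The pipeline steps named in the last bullet (decode → resize → normalize → batch → prefetch) could look roughly like the sketch below. It assumes the manifests' `filepath` and `label` columns shown earlier. The helper names `load_image` and `make_dataset`, the `CLASS_TO_INDEX` mapping, and scaling pixels to [0, 1] are illustrative choices, not the project's confirmed implementation.

```python
import tensorflow as tf

IMG_SIZE = (100, 100)   # same values as the configuration cell
BATCH_SIZE = 32
SEED = 42

# Assumed label encoding (hypothetical): healthy -> 0, powdery_mildew -> 1.
CLASS_TO_INDEX = {"healthy": 0, "powdery_mildew": 1}


def load_image(path, label):
    """Read one image file, decode it, resize it, and scale pixels to [0, 1]."""
    raw = tf.io.read_file(path)
    # expand_animations=False keeps the result 3-D so it can be resized.
    img = tf.io.decode_image(raw, channels=3, expand_animations=False)
    img = tf.image.resize(img, IMG_SIZE)   # returns float32
    img = img / 255.0                      # assumed normalization
    return img, label


def make_dataset(filepaths, labels, training=False):
    """Build a batched, prefetched dataset from file paths and integer labels."""
    filepaths, labels = list(filepaths), list(labels)
    ds = tf.data.Dataset.from_tensor_slices((filepaths, labels))
    if training:
        # Shuffle only the training split, before the expensive decode step.
        ds = ds.shuffle(len(filepaths), seed=SEED)
    ds = ds.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
```

With the manifests loaded above, a training dataset might be built as `make_dataset(df_train["filepath"], df_train["label"].map(CLASS_TO_INDEX), training=True)`, with `training=False` for the validation and test splits so their order stays fixed.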