# Tomato Classification (Reproducible Pipeline)

This notebook is the narrative entry point for reproducing the tomato classification study. The logic lives in the `tomato_classifier` package; the cells below only configure it, run it, and inspect the outputs.

## 1. Setup

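Add the project's `src` directory to the import path, then import the config loader and the pipeline entry point. The config itself is loaded in the next section.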

In [None]:
from pathlib import Path
import sys

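# Resolve the repository root whether the notebook is launched from
# `notebooks/` or from the repository root itself.
cwd = Path.cwd().resolve()
project_root = cwd.parent if cwd.name == 'notebooks' else cwd
# Make the `tomato_classifier` package under `src/` importable.
sys.path.insert(0, str(project_root / 'src'))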

from tomato_classifier.config import load_config
from tomato_classifier.pipeline import run_reproducible_pipeline
print('Imports ready')
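
The root detection above only handles two launch locations. If the notebook may be started from a deeper directory, one possible alternative is to walk up the directory tree until a marker folder is found. The sketch below is an assumption, not part of the package: `find_project_root` and its `marker` parameter are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "src") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `marker`.

    Hypothetical helper: the notebook itself uses the simpler
    `notebooks`-name check shown in the cell above.
    """
    start = start.resolve()
    # Check the starting directory first, then each ancestor in turn.
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None
```

Calling `find_project_root(Path.cwd())` would then give a root that does not depend on the working directory being named `notebooks`.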


## 2. Configure Experiment

The default config reproduces the original experimental settings: 5-fold stratified CV, 30 selected attributes, and the same model seeds.

In [None]:
config_path = project_root / 'configs' / 'reproducible_run.yaml'
cfg = load_config(str(config_path))
cfg
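
For orientation, the settings named above could be pictured as a small validated record. This is only an illustrative stand-in: the class name, field names, and the seed default are hypothetical and do not describe the actual schema of `reproducible_run.yaml` or the object returned by `load_config`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSettings:
    """Hypothetical mirror of the settings described in this section."""

    n_splits: int = 5              # folds in the stratified cross-validation
    n_selected_features: int = 30  # attributes kept by feature selection
    random_seed: int = 0           # placeholder; the real seeds live in the YAML file

    def __post_init__(self) -> None:
        # Reject values that cannot describe a valid cross-validation run.
        if self.n_splits < 2:
            raise ValueError("n_splits must be at least 2")
        if self.n_selected_features < 1:
            raise ValueError("n_selected_features must be at least 1")
```

Freezing the dataclass keeps the settings from being changed by accident partway through a run, which is one way to support reproducibility.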


## 3. Run Full Pipeline

This executes data loading, per-fold feature selection, model training, evaluation, SHAP analysis, and figure export.

In [None]:
artifacts = run_reproducible_pipeline(cfg)
print('Run metrics JSON:', artifacts.report_json_path)
print('Summary markdown:', artifacts.summary_md_path)
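
Selecting features per fold matters because selecting them on the full dataset would let test samples influence the training inputs. The standard-library sketch below illustrates the idea under stated assumptions: `stratified_folds` and `top_k_by_variance` are hypothetical helpers, and variance ranking is a placeholder criterion, not necessarily the selection method used inside `run_reproducible_pipeline`.

```python
import random
from collections import defaultdict
from statistics import pvariance
from typing import Dict, List, Sequence, Tuple


def stratified_folds(
    labels: Sequence[str], n_splits: int, seed: int
) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs that preserve class balance."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    test_sets: List[List[int]] = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share.
        for position, idx in enumerate(indices):
            test_sets[position % n_splits].append(idx)

    all_indices = set(range(len(labels)))
    return [(sorted(all_indices - set(test)), sorted(test)) for test in test_sets]


def top_k_by_variance(
    rows: Sequence[Sequence[float]], train_idx: Sequence[int], k: int
) -> List[int]:
    """Rank feature columns using the training rows only; return the top k."""
    n_features = len(rows[0])
    scores = [pvariance([rows[i][j] for i in train_idx]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])
```

Inside a loop over `stratified_folds(...)`, calling `top_k_by_variance(rows, train_idx, k)` with each fold's `train_idx` means the chosen columns can differ between folds, and the held-out rows never affect the choice.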


## 4. Inspect Key Results

In [None]:
payload = artifacts.payload
payload['aggregate']


In [None]:
from pprint import pprint

print('Fold test accuracies:')
# Print the per-fold metrics stored for each model, one model at a time.
for model_key in ['rf_metrics', 'dt_metrics', 'lr_metrics', 'baseline_metrics']:
    print('\n' + model_key)
    pprint(payload[model_key])
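
To compare models at a glance, per-fold scores can be reduced to a mean and a spread. The sketch below assumes a mapping from model name to a flat list of per-fold accuracies; that shape is an assumption, and the entries under `payload` may need to be unpacked into it first. `summarize_folds` is a hypothetical helper.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarize_folds(fold_scores: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each model name to (mean, sample standard deviation) of its fold scores."""
    summary: Dict[str, Tuple[float, float]] = {}
    for model, scores in fold_scores.items():
        # stdev needs at least two values; report 0.0 for a single fold.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[model] = (mean(scores), spread)
    return summary
```

With only five folds the standard deviation is a rough indicator, so it is best read alongside the individual fold values printed above.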


## 5. Generated Artifacts

- Correlation panel
- Dendrogram panel
- Decision plots by class/fold
- Force plots (TP and rounded TP/FN)
- Final-fold decision tree
- Machine-readable metrics and validation-ready reports


In [None]:
payload['artifacts']
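
A quick sanity check after a run is to confirm that every listed artifact exists on disk. The sketch below assumes the artifact listing can be expressed as a mapping from artifact name to file path; the real structure of `payload['artifacts']` may differ, and `missing_artifacts` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, List


def missing_artifacts(artifact_paths: Dict[str, str]) -> List[str]:
    """Return the names of artifacts whose files are not present on disk."""
    return sorted(
        name for name, path in artifact_paths.items() if not Path(path).is_file()
    )
```

An empty list from `missing_artifacts(...)` would indicate that every expected file was written.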
