Confirm the training preprocessing pipeline is correct *before training*:
1) Images load and paths resolve
2) Channel handling is correct (grayscale → RGB if needed)
3) Train/val transforms behave sensibly (no over-cropping)
4) Normalization is correct (for pretrained models)
5) Output tensors have correct shape/range and no NaNs/Infs
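
Checks 2, 4 and 5 can be sketched without any framework. The snippet below is a minimal illustration, not this project's pipeline: `to_rgb`, `preprocess` and `check_tensor` are hypothetical helper names, and it assumes the pretrained model expects the usual ImageNet mean/std (swap in your own statistics if it does not).

```python
import numpy as np

# Assumption: the pretrained backbone was trained with ImageNet statistics.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_rgb(img: np.ndarray) -> np.ndarray:
    """Return an HxWx3 array; grayscale (HxW or HxWx1) is repeated across 3 channels."""
    if img.ndim == 2:
        img = img[:, :, None]
    if img.shape[2] == 1:
        img = np.repeat(img, 3, axis=2)
    return img


def preprocess(img_uint8: np.ndarray) -> np.ndarray:
    """Scale to [0, 1], normalize per channel, and return a CxHxW float32 array."""
    x = to_rgb(img_uint8).astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return np.transpose(x, (2, 0, 1))


def check_tensor(x: np.ndarray, size: int) -> None:
    """Raise AssertionError if shape, finiteness, or value range look wrong."""
    assert x.shape == (3, size, size), f"unexpected shape {x.shape}"
    assert np.isfinite(x).all(), "tensor contains NaN or Inf"
    # With ImageNet statistics, normalized values stay roughly within [-2.2, 2.7].
    assert -3.0 < x.min() and x.max() < 3.0, "values outside expected normalized range"


# Synthetic grayscale image standing in for a real sample.
gray = np.random.default_rng(42).integers(0, 256, size=(224, 224), dtype=np.uint8)
tensor = preprocess(gray)
check_tensor(tensor, size=224)
print(tensor.shape, tensor.dtype)
```

The range bound is deliberately loose: it is meant to catch a forgotten `/ 255` or a double normalization, not to validate exact values.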


Config and Bootstrapping

In [1]:
from _bootstrap import bootstrap
bootstrap()

from pathlib import Path
import pandas as pd
from xai_lab.utils.paths import find_project_root, resolve_path

PROJECT_ROOT = find_project_root()

# --- dataset input ---
CSV_PATH = PROJECT_ROOT / "data/processed/ckplus/splits/train.csv"

# --- column mapping (generic) ---
COL_PATH = "path"
COL_LABEL_NAME = "label_name"   # optional
COL_LABEL_ID = "label"          # optional

# --- params ---
INPUT_SIZE = 224
N_SAMPLES = 12
RANDOM_SEED = 42

OUT_DIR = PROJECT_ROOT / "artifacts/reports/eda"
OUT_DIR.mkdir(parents=True, exist_ok=True)

print("PROJECT_ROOT:", PROJECT_ROOT)
print("CSV_PATH:", CSV_PATH)


PROJECT_ROOT: D:\Kebench\Documents\projects\xai-lab
CSV_PATH: D:\Kebench\Documents\projects\xai-lab\data\processed\ckplus\splits\train.csv


Load CSV + resolve paths + existence check

In [None]:
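# Load the split manifest and strip stray whitespace from the path column.
df = pd.read_csv(CSV_PATH)
df[COL_PATH] = df[COL_PATH].astype(str).str.strip()

# Resolve each path with resolve_path, using PROJECT_ROOT as the base.
df["resolved_path"] = df[COL_PATH].map(lambda p: resolve_path(p, PROJECT_ROOT))

# Collect rows whose resolved image file does not exist on disk.
# Path(...) keeps the check working whether resolve_path returns a str or a Path.
missing_df = df.loc[~df["resolved_path"].map(lambda p: Path(p).exists())].copy()
print("Rows:", len(df), "| Missing:", len(missing_df))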

missing_df.head()


Rows: 686 | Missing: 0


(empty table; columns: Unnamed: 0, path, label, label_name, width, height, mode, sha1, resolved_path)
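
The existence check above can also be exercised in isolation, which is handy when the manifest is produced on one machine and read on another. The following is a minimal sketch using only the standard library; `resolve_against_root` is a hypothetical stand-in for `resolve_path` (whose real behavior is not shown here) and simply assumes that relative paths are joined to the project root.

```python
from pathlib import Path
import tempfile


def resolve_against_root(raw: str, root: Path) -> Path:
    """Hypothetical stand-in for resolve_path: absolute paths pass through,
    relative paths are joined to root."""
    p = Path(raw.strip())
    return p if p.is_absolute() else root / p


def find_missing(raw_paths, root: Path):
    """Return the (stripped) path strings whose resolved file does not exist."""
    return [
        raw.strip()
        for raw in raw_paths
        if not resolve_against_root(raw, root).is_file()
    ]


# Build a throwaway project tree with one image present and one absent.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "images" / "a.png").write_bytes(b"")
    missing = find_missing(["images/a.png", " images/b.png "], root)

print("Missing:", missing)
```

In the notebook itself, a line such as `assert missing_df.empty` after the check would stop execution early instead of letting a later cell fail on a missing file.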
