feat: YOLO-seg training script for pottery crop model (local GPU) #416

@shaoster

Problem / Motivation

Milestone #2 replaces the off-the-shelf rembg/u2net pipeline backing piece_image_crop_service with a pottery-tuned Ultralytics YOLO-seg model. The training dataset will be materialized by #415 in YOLO-seg format with deterministic splits and a versioned dataset card. We need a script that can consume that dataset and produce a trained foreground-segmentation checkpoint on a local developer GPU (Modal GPU remains an optional path for collaborators without local hardware), ready for the evaluation harness in #417 to grade.

This issue STOPS at "can produce a checkpoint with passable Ultralytics val metrics." The rembg-baseline comparison, the locked success-criteria check, and the production go/no-go report all happen in #417.

Proposed Solution

Add tools/train_crop_model.py wrapping the Ultralytics YOLO-seg trainer, with two execution modes:

  • Local (primary): runs ultralytics directly on the maintainer's local GPU against a dataset on disk. This is the gating path — full training runs and the checkpoint that feeds #417 come from here.
  • Modal (optional): same entrypoint, deployable as a Modal GPU function for collaborators without local hardware. Pulls the dataset from the Modal Volume / S3 location materialized by #415 and writes checkpoints to a versioned Modal Volume path that #419's backend registry can later resolve. Not required to ship this issue, but the abstraction should be cleanly factored so it isn't a rewrite to add.
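One way to keep the two modes cleanly factored is to have both call a single helper that builds the Ultralytics training arguments, so that only the output root differs. The following is a minimal sketch under that assumption; `TrainConfig`, `build_train_kwargs`, and `run_training` are hypothetical names, and the config fields are illustrative rather than a settled schema.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical subset of the YAML config; field names are assumptions."""
    base_model: str   # e.g. "yolov8n-seg.pt"
    data_yaml: Path   # dataset YAML produced by the export pipeline
    epochs: int
    batch: int
    imgsz: int


def build_train_kwargs(cfg: TrainConfig, output_root: Path, run_id: str) -> dict:
    """Translate the config into keyword arguments for YOLO.train().

    Both execution modes would call this; only output_root differs
    (a local directory vs. a mounted Modal Volume path).
    """
    return {
        "data": str(cfg.data_yaml),
        "epochs": cfg.epochs,
        "batch": cfg.batch,
        "imgsz": cfg.imgsz,
        "project": str(output_root),  # Ultralytics writes to <project>/<name>/
        "name": run_id,
        "exist_ok": True,             # reuse the run directory instead of suffixing it
    }


def run_training(cfg: TrainConfig, output_root: Path, run_id: str):
    """Shared entrypoint body; imports ultralytics lazily so unit tests stay CPU-only."""
    from ultralytics import YOLO  # deferred: heavy import, requires the dep to be installed

    model = YOLO(cfg.base_model)
    return model.train(**build_train_kwargs(cfg, output_root, run_id))
```

Note that Ultralytics places weights under a `weights/` subdirectory of the run directory, so a small copy or move step would likely be needed to produce the flat layout described under "Checkpoint upload".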

Model output

A foreground segmentation mask (per the locked design decision in milestone #2). Pottery is a single-class problem here — one foreground class, no maker-mark or sub-part heads. The tight bbox and any padding are derived downstream and are explicitly out of scope for this issue.

Configuration

Training is driven by a YAML config (e.g. tools/configs/crop_model/<name>.yaml) covering at minimum:

  • Base model (e.g. yolov8n-seg.pt, yolov8s-seg.pt) and image size.
  • Epochs, batch size, optimizer, LR schedule, augmentation toggles.
  • Dataset version pointer (the versioned dataset card from #415).
  • Output directory layout (under the local output directory or the Modal Volume).

A small set of CLI overrides (--epochs, --batch, --resume, --config, --run-id) is enough — no full hyperparameter sweep machinery in this ticket.
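The override surface could be sketched with `argparse` as below. This assumes the YAML has already been loaded into a plain dict (the loader is omitted); `build_parser` and `apply_overrides` are hypothetical helper names, and the `"latest"` sentinel for a bare `--resume` is an assumption of this sketch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train the pottery crop model.")
    parser.add_argument("--config", required=True, help="Path to the YAML config.")
    parser.add_argument("--epochs", type=int, default=None)
    parser.add_argument("--batch", type=int, default=None)
    parser.add_argument("--run-id", default=None)
    # Bare --resume yields the sentinel "latest"; --resume <path> yields that path.
    parser.add_argument("--resume", nargs="?", const="latest", default=None)
    return parser


def apply_overrides(config: dict, args: argparse.Namespace) -> dict:
    """Return a copy of config with only the CLI values that were actually passed."""
    resolved = dict(config)
    for key in ("epochs", "batch"):
        value = getattr(args, key)
        if value is not None:
            resolved[key] = value
    return resolved


# Example: override epochs and request a resume of the latest checkpoint.
args = build_parser().parse_args(
    ["--config", "tools/configs/crop_model/yolov8n-seg-default.yaml",
     "--epochs", "5", "--resume"]
)
resolved = apply_overrides({"epochs": 100, "batch": 16}, args)
```

Defaulting every override to `None` keeps "flag not given" distinguishable from "flag given with the config's own value", so the config file stays the source of truth unless the user says otherwise.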

Checkpoint upload

Checkpoints land at a versioned path. Local mode writes to a directory under the repo (gitignored) or a configurable --output-dir; Modal mode writes the same layout to a Modal Volume.

crop-models/<run_id>/
  best.pt
  last.pt
  args.yaml          # full resolved Ultralytics args
  run.json           # run_id, dataset version, git sha, hparams, final val metrics
  results.csv        # Ultralytics per-epoch log

run_id is a slug derived deterministically from the config name, UTC start time, and git short SHA (e.g. <config-name>-<utc-timestamp>-<git-shortsha>). #419 will read this structure regardless of whether the checkpoint was written to local disk first or directly to a Modal Volume; this issue only has to write it consistently. A small --upload-to <modal-volume|s3> flag (or a separate subcommand) is sufficient for shipping local checkpoints to the remote location that serving reads from.
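A minimal sketch of the run-id slug and the file layout follows. The timestamp format, the `RUN_FILES` constant, and the function names are assumptions made for illustration; only the `<config-name>-<utc-timestamp>-<git-shortsha>` shape and the five file names come from the description above.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path

# File names taken from the layout above.
RUN_FILES = ("best.pt", "last.pt", "args.yaml", "run.json", "results.csv")


def current_git_short_sha() -> str:
    """Ask git for the short SHA of HEAD (requires running inside a checkout)."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def make_run_id(config_name: str, now: datetime, git_short_sha: str) -> str:
    """Build <config-name>-<utc-timestamp>-<git-shortsha>.

    `now` must be timezone-aware; the compact timestamp format is an assumption.
    """
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{config_name}-{stamp}-{git_short_sha}"


def run_layout(output_root: Path, run_id: str) -> dict[str, Path]:
    """Map each expected file name to its path under crop-models/<run_id>/."""
    run_dir = output_root / "crop-models" / run_id
    return {name: run_dir / name for name in RUN_FILES}
```

Passing the clock value and SHA in as arguments, rather than reading them inside `make_run_id`, is what makes the slug unit-testable without a GPU or a git checkout.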

Resumable runs

--resume with no value resumes the latest checkpoint for the given run_id/output dir; --resume <path> resumes from an explicit checkpoint. Local resume must round-trip a killed-and-restarted process; if Modal mode is exercised, resume must also round-trip a preemption.
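The two `--resume` forms could be resolved to a concrete checkpoint path as sketched below. The `"latest"` sentinel for a bare `--resume` and the function name are assumptions of this sketch; it also assumes `last.pt` sits directly in the run directory, as in the layout above.

```python
from pathlib import Path
from typing import Optional


def resolve_resume_checkpoint(resume: Optional[str], run_dir: Path) -> Optional[Path]:
    """Map the --resume value to a checkpoint path, or None for a fresh run.

    resume is None      -> flag not given, start fresh
    resume == "latest"  -> bare --resume, use <run_dir>/last.pt
    anything else       -> treat as an explicit checkpoint path
    """
    if resume is None:
        return None
    candidate = run_dir / "last.pt" if resume == "latest" else Path(resume)
    if not candidate.is_file():
        # Fail loudly rather than silently starting a fresh run.
        raise FileNotFoundError(f"No checkpoint to resume from: {candidate}")
    return candidate
```

With a resolved path in hand, Ultralytics' documented resume pattern is to load that checkpoint with `YOLO(path)` and call `.train(resume=True)`; whether that is sufficient for the preemption case on Modal would still need to be verified.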

Experiment tracking

Minimal, in-repo: every run writes run.json containing run id, resolved hparams, dataset version, git sha, Ultralytics version, final val metrics (mAP50, mAP50-95, mask IoU on val). No W&B / MLflow integration in this ticket; keep dependencies tight. A small tools/list_crop_model_runs.py (or equivalent subcommand) that enumerates crop-models/*/run.json under the local output directory or the Volume is a nice-to-have but not required.
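As an illustration, writing and enumerating run.json needs nothing beyond the standard library. The JSON key names and function names below are assumptions; the issue fixes only which pieces of information must be recorded.

```python
import json
from pathlib import Path


def write_run_json(run_dir: Path, *, run_id: str, dataset_version: str, git_sha: str,
                   ultralytics_version: str, hparams: dict, val_metrics: dict) -> Path:
    """Write run.json via a temp file so a killed process never leaves a partial file."""
    record = {
        "run_id": run_id,
        "dataset_version": dataset_version,
        "git_sha": git_sha,
        "ultralytics_version": ultralytics_version,
        "hparams": hparams,
        "val_metrics": val_metrics,  # e.g. mAP50, mAP50-95, mask IoU
    }
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / "run.json"
    tmp = run_dir / "run.json.tmp"
    tmp.write_text(json.dumps(record, indent=2, sort_keys=True))
    tmp.replace(path)  # rename over the final name once the content is complete
    return path


def list_runs(root: Path) -> list[dict]:
    """Load every crop-models/*/run.json under root, sorted by path."""
    return [json.loads(p.read_text()) for p in sorted(root.glob("crop-models/*/run.json"))]
```

The Ultralytics version string can be read from `ultralytics.__version__` at training time; it is passed in here so the helper stays importable without the dependency.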

Bazel wiring

tools/train_crop_model.py is exposed as a py_binary (and the Modal entrypoint as a separate target if cleaner). Use the existing patterns in tools/ — no new Python deps without a follow-up (ultralytics lands via the standard dep-add flow; flag if it isn't already declared).

Acceptance Criteria

  • tools/train_crop_model.py exists with a documented CLI and is runnable as a py_binary.
  • At least one example config under tools/configs/crop_model/ (e.g. yolov8n-seg-default.yaml) is checked in and used by tests/docs.
  • Runs locally against a tiny fixture dataset (smoke test) AND against the full #415 dataset on the maintainer's GPU.
  • Modal entrypoint exists and can be invoked (modal run …). The entrypoint itself must work, but a full Modal training run is optional; the gating training run for this milestone is local.
  • Trained checkpoints, args.yaml, run.json, and results.csv land in a versioned crop-models/<run_id>/ directory locally; --upload-to (or equivalent) ships the same layout to a Modal Volume / S3 path.
  • --resume resumes the latest checkpoint after a simulated process kill and produces a continuous results.csv.
  • run.json captures: run id, dataset version, git sha, Ultralytics version, resolved hparams, final Ultralytics val metrics (mAP50, mAP50-95, mask IoU).
  • A README or top-of-file docstring documents: how to launch a local training run, where checkpoints land, how to optionally run on Modal, and how to resume.
  • At least one full training run completes end-to-end on the #415 dataset (locally) and produces a checkpoint with non-trivial Ultralytics val metrics (i.e. clearly above random — exact thresholds are NOT gated here; #417 sets the production bar).
  • Lints + tests pass (rtk bazel build --config=lint //..., rtk bazel test //...). Tests cover config parsing, run-id generation, and the upload-path layout — full GPU training is exercised manually, not in CI.
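The "continuous results.csv" criterion for --resume could be checked mechanically with a helper like the one below. This is a sketch that assumes results.csv has an `epoch` column (some Ultralytics versions pad header names with spaces, so headers are stripped); the function name is hypothetical.

```python
import csv
from pathlib import Path


def epochs_are_continuous(results_csv: Path) -> bool:
    """True if the epoch column is non-empty and increases by exactly 1 per row."""
    with results_csv.open(newline="") as handle:
        rows = [
            {k.strip(): v.strip() for k, v in row.items() if k is not None and v is not None}
            for row in csv.DictReader(handle)
        ]
    epochs = [int(float(row["epoch"])) for row in rows]
    # A gap means lost epochs; a repeat means the resume re-logged an epoch.
    return bool(epochs) and all(b - a == 1 for a, b in zip(epochs, epochs[1:]))
```

Running this after a kill-and-resume cycle on the tiny fixture dataset would give the manual resume check a pass/fail signal without inspecting the CSV by hand.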

Out of Scope

  • Comparing the trained model against rembg, evaluating on the flagged-bad corpus from #1/#2, or making a go/no-go call against the locked success criteria. All of that lives in #417.
  • Serving / inference endpoints, the backend registry, and per-version Modal apps — #419.
  • A/B routing, shadow mode, dashboards — #418.
  • Padding logic (per-potter setting, downstream of the model) — #420 / #419.
  • Hyperparameter sweeps, NAS, distillation, multi-GPU/DDP — explicitly deferred. One config + one GPU is enough to clear this ticket.
  • W&B / MLflow / external experiment trackers.
  • Any changes to piece_image_crop_service.py or the existing rembg path.

Dependencies

  • Depends on #415 (Training Dataset Export Pipeline) — needs the YOLO-seg-formatted dataset and dataset version card.
  • Unblocks #417 (Evaluation Harness & Baseline Report) and #419 (Per-Model-Version Modal Apps + Backend Registry).

Milestone Cross-References

Part of milestone #2 — Custom Pottery Crop Model.
