Problem / Motivation
Milestone #2 replaces the off-the-shelf rembg/u2net pipeline backing piece_image_crop_service with a pottery-tuned Ultralytics YOLO-seg model. The training dataset will be materialized by #415 in YOLO-seg format with deterministic splits and a versioned dataset card. We need a script that can consume that dataset and produce a trained foreground-segmentation checkpoint on a local developer GPU (Modal GPU remains an optional path for collaborators without local hardware), ready for the evaluation harness in #417 to grade.
This issue STOPS at "can produce a checkpoint with passable Ultralytics val metrics." The rembg-baseline comparison, locked success-criteria check, and production go/no-go report all happen in #7.
Proposed Solution
Add tools/train_crop_model.py wrapping the Ultralytics YOLO-seg trainer, with two execution modes:
Local (primary): runs ultralytics directly on the maintainer's local GPU against a dataset on disk. This is the gating path — full training runs and the checkpoint that feeds #417 come from here.
Modal (optional): same entrypoint, deployable as a Modal GPU function for collaborators without local hardware. Pulls the dataset from the Modal Volume / S3 location materialized by #415 and writes checkpoints to a versioned Modal Volume path that #419's backend registry can later resolve. Not required to ship this issue, but the abstraction should be cleanly factored so it isn't a rewrite to add.
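One way to keep the two modes cleanly factored is to separate a pure argument-building step from a thin launcher. The following is a minimal sketch under stated assumptions, not the settled design: `build_train_kwargs`, `train_local`, and the config keys (`dataset_yaml`, `base_model`) are hypothetical names, while `YOLO(...).train(...)` and the keyword arguments passed to it are standard Ultralytics ones.

```python
from pathlib import Path
from typing import Any


def build_train_kwargs(config: dict[str, Any], output_dir: Path, run_id: str) -> dict[str, Any]:
    """Translate a parsed config (hypothetical keys) into Ultralytics train() kwargs."""
    return {
        "data": config["dataset_yaml"],   # path to the YOLO-seg dataset YAML
        "epochs": config.get("epochs", 100),
        "imgsz": config.get("imgsz", 640),
        "batch": config.get("batch", 16),
        "single_cls": True,               # one foreground class
        "project": str(output_dir),       # Ultralytics writes to <project>/<name>/
        "name": run_id,
        "exist_ok": True,
    }


def train_local(config: dict[str, Any], output_dir: Path, run_id: str):
    """Run training in the current process (local GPU, or inside a Modal container)."""
    # Imported lazily so config parsing and unit tests do not need ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(config.get("base_model", "yolov8n-seg.pt"))
    return model.train(**build_train_kwargs(config, output_dir, run_id))
```

A Modal function could call the same `train_local` inside its container, with only the dataset path and output directory pointing at a mounted Volume. Note that Ultralytics places weights under a `weights/` subdirectory of the run directory, so a copy step would be needed to reach the flat layout described under Checkpoint upload.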
Model output
A foreground segmentation mask (per the locked design decision in milestone #2). Pottery is a single-class problem here — one foreground class, no maker-mark or sub-part heads. The tight bbox and any padding are derived downstream and are explicitly out of scope for this issue.
Configuration
Training is driven by a YAML config (e.g. tools/configs/crop_model/<name>.yaml) covering at minimum:
Base model (e.g. yolov8n-seg.pt, yolov8s-seg.pt) and image size.
Epochs, batch size, optimizer, LR schedule, augmentation toggles.
Dataset version pointer (the versioned dataset card from #5).
Output directory layout under the Modal Volume.
A small set of CLI overrides (--epochs, --batch, --resume, --config, --run-id) is enough — no full hyperparameter sweep machinery in this ticket.
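The override behaviour could look roughly like the sketch below: values from the YAML config are the baseline, and a CLI flag wins only when it was actually passed. `build_parser` and `apply_overrides` are hypothetical helper names, the config is shown as an already-parsed dict, and `--resume` is left out here because its optional-value handling is a separate concern.

```python
import argparse
from typing import Any


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train the pottery crop model (sketch).")
    parser.add_argument("--config", required=True, help="Path to the YAML config.")
    # Defaults are None so "flag not given" is distinguishable from any real value.
    parser.add_argument("--epochs", type=int, default=None)
    parser.add_argument("--batch", type=int, default=None)
    parser.add_argument("--run-id", default=None)
    return parser


def apply_overrides(config: dict[str, Any], args: argparse.Namespace) -> dict[str, Any]:
    """Return a copy of config in which CLI values that were given take precedence."""
    resolved = dict(config)
    for key in ("epochs", "batch"):
        value = getattr(args, key)
        if value is not None:
            resolved[key] = value
    return resolved
```

For example, `--config c.yaml --epochs 5` applied to a config with `epochs: 100, batch: 16` would resolve to 5 epochs and leave the batch size at 16.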
Checkpoint upload
Checkpoints land at a versioned path. Local mode writes to a directory under the repo (gitignored) or a configurable --output-dir; Modal mode writes the same layout to a Modal Volume.
crop-models/<run_id>/
  best.pt
  last.pt
  args.yaml     # full resolved Ultralytics args
  run.json      # run_id, dataset version, git sha, hparams, final val metrics
  results.csv   # Ultralytics per-epoch log
run_id is a deterministic slug (e.g. <config-name>-<utc-timestamp>-<git-shortsha>). The structure is what #419 will read from (regardless of whether the checkpoint sat on local disk first or directly on a Modal Volume); this issue only has to write it consistently. A small --upload-to <modal-volume|s3> flag (or a separate subcommand) is sufficient for shipping local checkpoints to a remote location when serving picks them up.
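A possible shape for the run-id and layout helpers is sketched below. The timestamp format, the slug normalization, and the helper names (`make_run_id`, `run_dir`, `missing_files`) are assumptions for illustration; only the `<config-name>-<utc-timestamp>-<git-shortsha>` pattern and the file names come from this issue.

```python
import re
from datetime import datetime, timezone
from pathlib import Path

EXPECTED_FILES = ("best.pt", "last.pt", "args.yaml", "run.json", "results.csv")


def make_run_id(config_name: str, now: datetime, git_short_sha: str) -> str:
    """Build <config-name>-<utc-timestamp>-<git-shortsha> from explicit inputs.

    `now` should be timezone-aware; it is converted to UTC before formatting.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", config_name.lower()).strip("-")
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{slug}-{stamp}-{git_short_sha}"


def run_dir(output_dir: Path, run_id: str) -> Path:
    """Directory that holds one run's artifacts."""
    return output_dir / "crop-models" / run_id


def missing_files(directory: Path) -> list[str]:
    """Names from the expected layout that are not present in the directory."""
    return [name for name in EXPECTED_FILES if not (directory / name).exists()]
```

Passing the clock value and the sha in as arguments keeps the function deterministic for a given set of inputs, which is what makes run-id generation unit-testable without a git checkout.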
Resumable runs
--resume with no value resumes the latest checkpoint for the given run_id/output dir; --resume <path> resumes from an explicit checkpoint. Local resume must round-trip a killed-and-restarted process; if Modal mode is exercised, resume must also round-trip a preemption.
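The two forms of `--resume` map naturally onto argparse's optional-value flags. The sketch below is one hedged way to do it: the `LATEST` sentinel and the helper names are hypothetical, and "latest checkpoint" is assumed to mean `last.pt` in the run directory.

```python
import argparse
from pathlib import Path
from typing import Optional

LATEST = "__latest__"  # sentinel meaning "--resume was given without a value"


def add_resume_flag(parser: argparse.ArgumentParser) -> None:
    # nargs="?" lets the flag appear bare (yielding const) or with an explicit path.
    parser.add_argument("--resume", nargs="?", const=LATEST, default=None)


def resolve_resume(resume: Optional[str], run_directory: Path) -> Optional[Path]:
    """Map the parsed --resume value to a checkpoint path, or None for a fresh run."""
    if resume is None:
        return None
    checkpoint = run_directory / "last.pt" if resume == LATEST else Path(resume)
    if not checkpoint.is_file():
        raise FileNotFoundError(f"No checkpoint to resume from: {checkpoint}")
    return checkpoint
```

With a resolved path in hand, Ultralytics can continue an interrupted run by loading that checkpoint and calling `train(resume=True)` on it.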
Experiment tracking
Minimal, in-repo: every run writes run.json containing run id, resolved hparams, dataset version, git sha, Ultralytics version, final val metrics (mAP50, mAP50-95, mask IoU on val). No W&B / MLflow integration in this ticket — keep dependencies tight. A small tools/list_crop_model_runs.py (or equivalent subcommand) that enumerates crop-models/*/run.json on the Volume is a nice-to-have but not required.
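A minimal sketch of the `run.json` writer follows. The JSON key names and the `write_run_json` signature are assumptions; the set of recorded fields is the one listed above. The caller is expected to supply the metric values and the Ultralytics version string (available as `ultralytics.__version__`).

```python
import json
from pathlib import Path
from typing import Any


def write_run_json(
    directory: Path,
    run_id: str,
    dataset_version: str,
    git_sha: str,
    ultralytics_version: str,
    hparams: dict[str, Any],
    val_metrics: dict[str, float],
) -> Path:
    """Write run.json via a temp file so a killed process never leaves a partial file."""
    record = {
        "run_id": run_id,
        "dataset_version": dataset_version,
        "git_sha": git_sha,
        "ultralytics_version": ultralytics_version,
        "hparams": hparams,
        "val_metrics": val_metrics,  # e.g. mAP50, mAP50-95, mask IoU
    }
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / "run.json"
    tmp = directory / "run.json.tmp"
    tmp.write_text(json.dumps(record, indent=2, sort_keys=True) + "\n")
    tmp.replace(target)  # rename over the final name once the content is complete
    return target
```

Writing to a temporary name and renaming is a small safeguard that fits the resume requirement: a run killed mid-write leaves either the previous `run.json` or none, never a truncated one.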
Bazel wiring
tools/train_crop_model.py is exposed as a py_binary (and the Modal entrypoint as a separate target if cleaner). Use the existing patterns in tools/ — no new Python deps without a follow-up (ultralytics lands via the standard dep-add flow; flag if it isn't already declared).
Acceptance Criteria
tools/train_crop_model.py exists with a documented CLI and is runnable as a py_binary.
At least one example config under tools/configs/crop_model/ (e.g. yolov8n-seg-default.yaml) is checked in and used by tests/docs.
Runs locally against a tiny fixture dataset (smoke test) AND against the full #415 dataset on the maintainer's GPU.
Modal entrypoint exists and can be invoked (modal run …). The entrypoint must work, but a full Modal training run is optional; the gating training run for this milestone is local.
Trained checkpoints, args.yaml, run.json, and results.csv land in a versioned crop-models/<run_id>/ directory locally; --upload-to (or equivalent) ships the same layout to a Modal Volume / S3 path.
--resume resumes the latest checkpoint after a simulated process kill and produces a continuous results.csv.
run.json captures: run id, dataset version, git sha, Ultralytics version, resolved hparams, final Ultralytics val metrics (mAP50, mAP50-95, mask IoU).
A README or top-of-file docstring documents: how to launch a local training run, where checkpoints land, how to optionally run on Modal, and how to resume.
At least one full training run completes end-to-end on the #415 dataset (locally) and produces a checkpoint with non-trivial Ultralytics val metrics (i.e. clearly above random — exact thresholds are NOT gated here; #417 sets the production bar).
Lints + tests pass (rtk bazel build --config=lint //..., rtk bazel test //...). Tests cover config parsing, run-id generation, and the upload-path layout — full GPU training is exercised manually, not in CI.
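The upload-path layout mentioned in the last criterion can be tested without touching any remote storage if the mapping from local files to remote keys is a pure function. The sketch below assumes a hypothetical `plan_upload` helper and a string `remote_prefix`; the actual transfer to a Modal Volume or S3 would consume its output and is not shown.

```python
from pathlib import Path, PurePosixPath

RUN_FILES = ("best.pt", "last.pt", "args.yaml", "run.json", "results.csv")


def plan_upload(run_directory: Path, remote_prefix: str) -> list[tuple[Path, str]]:
    """Pair each local run file with its remote key, preserving crop-models/<run_id>/."""
    run_id = run_directory.name
    # PurePosixPath keeps forward slashes in remote keys regardless of the local OS.
    remote_root = PurePosixPath(remote_prefix) / "crop-models" / run_id
    return [(run_directory / name, str(remote_root / name)) for name in RUN_FILES]
```

A unit test can then assert that, for a run directory named `run-1` and prefix `models`, the first pair targets `models/crop-models/run-1/best.pt`, which checks the layout #419 will read without any GPU or network access.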
Out of Scope
Comparing the trained model against rembg, evaluating on the flagged-bad corpus from #1/#2, or making a go/no-go call against the locked success criteria — all of that lives in #7.
Serving / inference endpoints, the backend registry, and per-version Modal apps — #419.
piece_image_crop_service.py or the existing rembg path.
Dependencies
Milestone Cross-References
Part of milestone #2 — Custom Pottery Crop Model.