feat: add model test harness for validating cog SDK releases #2851

Merged
markphelps merged 5 commits into main from mphelps/test-harness-cog-examples on Mar 20, 2026

@markphelps commented Mar 20, 2026

Summary

  • Adds a declarative test harness (tools/test-harness/) that automates building and running cog models against new SDK versions
  • Covers all 10 models in replicate/cog-examples and is designed to be extensible to any model from any repo
  • Models, inputs, and expected outputs are defined in manifest.yaml — adding a new model requires zero code changes

How it works

  1. Clones the target repo (shallow clone, cached per run)
  2. Patches cog.yaml to inject sdk_version (e.g. 0.17.0rc2)
  3. Runs cog build and captures build logs
  4. Runs cog predict / cog train with the specified inputs
  5. Validates output using pluggable validators (exact, contains, file_exists, json_match, etc.)
  6. Produces a console or JSON report
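
As an illustration of step 5, pluggable validators can be modelled as a name-to-function registry, so that a manifest entry only has to name the strategy it wants. The sketch below is not the code in harness/validators.py: the registry, the decorator, the function signatures, and the subset semantics of json_match are all assumptions made for illustration. Only the four validator names come from the list above.

```python
import json
import os
from typing import Any, Callable, Dict

# Hypothetical registry: validator name -> function(actual_output, expected) -> bool.
Validator = Callable[[str, Any], bool]
VALIDATORS: Dict[str, Validator] = {}


def validator(name: str) -> Callable[[Validator], Validator]:
    """Register a validation function under the name a manifest would use."""
    def register(fn: Validator) -> Validator:
        VALIDATORS[name] = fn
        return fn
    return register


@validator("exact")
def exact(actual: str, expected: Any) -> bool:
    # Trim surrounding whitespace so a trailing newline does not fail the match.
    return actual.strip() == str(expected).strip()


@validator("contains")
def contains(actual: str, expected: Any) -> bool:
    return str(expected) in actual


@validator("file_exists")
def file_exists(actual: str, expected: Any) -> bool:
    # Assumption: `actual` is the path the prediction wrote to; `expected` is unused.
    return os.path.isfile(actual.strip())


@validator("json_match")
def json_match(actual: str, expected: Any) -> bool:
    # Assumption: subset match, every expected key must be present and equal.
    try:
        parsed = json.loads(actual)
    except json.JSONDecodeError:
        return False
    if not isinstance(parsed, dict) or not isinstance(expected, dict):
        return parsed == expected
    return all(k in parsed and parsed[k] == v for k, v in expected.items())


def validate(kind: str, actual: str, expected: Any) -> bool:
    """Look up a validator by name and apply it to the captured output."""
    if kind not in VALIDATORS:
        raise ValueError(f"unknown validator: {kind}")
    return VALIDATORS[kind](actual, expected)
```

With a registry like this, adding a new strategy means adding one decorated function, which fits the "zero code changes per model" goal: models pick a validator by name.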

Usage

cd tools/test-harness
python3 -m venv .venv && source .venv/bin/activate && pip install pyyaml

# List models
python -m harness list

# Run all non-GPU models
python -m harness run --no-gpu

# Run a specific model
python -m harness run --model hello-world

# JSON report
python -m harness run --no-gpu --output json --output-file results/report.json

Initial test results (0.17.0rc2 SDK, non-GPU)

+ hello-world         (12.6s build, 9.8s predict)
+ canary              (12.0s build, 9.3s predict)
+ blur                (9.3s build, 8.6s predict)
+ hello-image         (10.1s build, 8.6s predict)
x hello-concurrency   BUILD FAIL — emit_metric not exported from RC SDK
x hello-context       PREDICT FAIL — current_scope().context missing in coglet
x hello-train         BUILD FAIL — transient DNS error (passes on retry)
- hello-replicate     SKIPPED — no REPLICATE_API_TOKEN

5/8 passed (hello-train only on retry, after 1 transient failure), 1 skipped, 2 FAILED

Two real compatibility issues found:

  1. emit_metric is not importable from cog in the RC SDK (breaks hello-concurrency). Tracked in #2850, "fix(sdk): restore emit_metric as deprecated compat shim".
  2. The Scope.context attribute doesn't exist on coglet's _sdk.Scope (breaks hello-context). Tracked in #2853, "feat(coglet): restore Scope.context for per-prediction context".

Files

File                   Purpose
manifest.yaml          Declarative test definitions for all 10 cog-examples models
harness/cli.py         CLI entry point (run, build, list commands)
harness/runner.py      Core loop: clone, patch, build, predict, validate
harness/patcher.py     Injects sdk_version and overrides into cog.yaml
harness/validators.py  7 validation strategies
harness/report.py      Console + JSON report generation
fixtures/              Test images for blur and resnet
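
To make the patching step concrete, here is a minimal sketch of injecting sdk_version into an already-parsed cog.yaml mapping. It is not the code in harness/patcher.py. The function name is hypothetical, and placing the key under build is an assumption about where cog reads it; the description above only says that sdk_version is injected into cog.yaml.

```python
import copy
from typing import Any, Dict


def inject_sdk_version(config: Dict[str, Any], sdk_version: str) -> Dict[str, Any]:
    """Return a copy of a parsed cog.yaml mapping with build.sdk_version set.

    Assumption: the key lives under the top-level `build` section.
    """
    patched = copy.deepcopy(config)  # leave the caller's mapping untouched
    build = patched.get("build")
    if not isinstance(build, dict):
        # Missing or empty `build:` section: create it.
        build = {}
        patched["build"] = build
    build["sdk_version"] = sdk_version
    return patched


original = {"build": {"python_version": "3.12"}, "predict": "predict.py:Predictor"}
patched = inject_sdk_version(original, "0.17.0rc2")
print(patched["build"])
```

With PyYAML, the mapping would typically come from yaml.safe_load and be written back with yaml.safe_dump. Note that such a round trip drops comments from the original file, which is acceptable for a throwaway clone but not for a file that is committed back.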

Add a declarative test harness (tools/test-harness/) that automates
building and running cog models against new SDK versions. Designed
for testing cog-examples against RC releases but extensible to any
model in any repo.

Models and their expected inputs/outputs are defined in manifest.yaml.
The harness clones repos, patches cog.yaml with the target sdk_version,
runs cog build + cog predict, and validates outputs using pluggable
validators (exact match, contains, file_exists, json_match, etc.).

Includes all 10 cog-examples models in the manifest and fixture images
for blur/resnet tests.
markphelps and others added 2 commits March 20, 2026 11:42
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Signed-off-by: Mark Phelps <209477+markphelps@users.noreply.github.com>
@michaeldwan marked this pull request as ready for review March 20, 2026 15:58
@michaeldwan requested a review from a team as a code owner March 20, 2026 15:58
@michaeldwan marked this pull request as draft March 20, 2026 15:58
@markphelps marked this pull request as ready for review March 20, 2026 16:51
markphelps and others added 2 commits March 20, 2026 13:34
…ness

Default to downloading the latest stable cog CLI from GitHub releases
and resolving the latest stable SDK version from PyPI, skipping any
alpha/beta/rc tags. Both can be overridden via --cog-version and
--sdk-version CLI flags, or pinned in manifest.yaml defaults.
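
A rough sketch of the "latest stable, skipping alpha/beta/rc" selection, assuming the candidate version strings have already been fetched (for example from the keys of the releases object in PyPI's JSON API, or from GitHub release tag names). The function names and the regular expression are illustrative and not taken from the harness. This version treats anything with a suffix as a pre-release, so post-releases such as 1.0.post1 would also be skipped.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches plain releases such as "0.16.9" or "v0.17.0". Any suffix
# (a1, b2, rc2, -rc2, .dev0, ...) fails the match and is skipped.
_STABLE = re.compile(r"^v?(\d+(?:\.\d+)*)$")


def _release_tuple(version: str) -> Optional[Tuple[int, ...]]:
    """Parse a stable version into a tuple of ints, or None if it has a suffix."""
    match = _STABLE.match(version.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def latest_stable(versions: Iterable[str]) -> Optional[str]:
    """Return the highest version with no pre-release suffix, or None."""
    best: Optional[str] = None
    best_key: Optional[Tuple[int, ...]] = None
    for version in versions:
        key = _release_tuple(version)
        if key is None:
            continue  # alpha/beta/rc/dev or otherwise unparseable
        if best_key is None or key > best_key:
            best, best_key = version, key
    return best
```

Comparing integer tuples rather than strings matters here: as strings, "0.16.9" sorts above "0.16.10", which would pick the older release.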
@markphelps requested a review from michaeldwan March 20, 2026 18:08
@michaeldwan left a comment

nice!

@markphelps merged commit 1d69171 into main Mar 20, 2026
37 checks passed
@markphelps deleted the mphelps/test-harness-cog-examples branch March 20, 2026 19:20