feat: add model test harness for validating cog SDK releases by markphelps · Pull Request #2851 · replicate/cog

markphelps · 2026-03-20T15:40:56Z

Summary

Adds a declarative test harness (tools/test-harness/) that automates building and running cog models against new SDK versions
Covers all 10 models in replicate/cog-examples and is designed to be extensible to any model from any repo
Models, inputs, and expected outputs are defined in manifest.yaml — adding a new model requires zero code changes

How it works

Clones the target repo (shallow clone, cached per run)
Patches cog.yaml to inject sdk_version (e.g. 0.17.0rc2)
Runs cog build and captures build logs
Runs cog predict / cog train with the specified inputs
Validates output using pluggable validators (exact, contains, file_exists, json_match, etc.)
Produces a console or JSON report
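The patch step above can be sketched as a small pure function over an already-parsed cog.yaml mapping. This is a hypothetical illustration, not the code in harness/patcher.py: the function name `inject_sdk_version` is invented here, and placing `sdk_version` under the `build` key is an assumption about the cog.yaml layout.

```python
import copy


def inject_sdk_version(config: dict, sdk_version: str) -> dict:
    """Return a patched copy of a parsed cog.yaml mapping.

    Assumption: sdk_version belongs under the `build` key. The real
    patcher may place it elsewhere and also applies manifest overrides.
    """
    patched = copy.deepcopy(config)  # never mutate the caller's config
    build = patched.get("build") or {}  # tolerate a missing or empty build section
    build["sdk_version"] = sdk_version
    patched["build"] = build
    return patched


# Example: pin a release candidate for a model that does not set sdk_version.
original = {"build": {"python_version": "3.12"}, "predict": "predict.py:Predictor"}
patched = inject_sdk_version(original, "0.17.0rc2")
```

Working on a deep copy keeps the cloned repo's original configuration available for comparison or for a second run against a different SDK version.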

Usage

cd tools/test-harness
python3 -m venv .venv && source .venv/bin/activate && pip install pyyaml

# List models
python -m harness list

# Run all non-GPU models
python -m harness run --no-gpu

# Run a specific model
python -m harness run --model hello-world

# JSON report
python -m harness run --no-gpu --output json --output-file results/report.json
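For orientation, the commands above could be parsed by an argparse layout along these lines. This is only a sketch: the subcommand and flag names are taken from this PR, but which flags each subcommand accepts, the defaults, and the `console`/`json` choices are assumptions, and the real harness/cli.py may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented run/build/list commands."""
    parser = argparse.ArgumentParser(prog="harness")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("list", help="List models defined in the manifest")

    # Assumption: run and build share the model-selection and version flags.
    for name in ("run", "build"):
        cmd = sub.add_parser(name)
        cmd.add_argument("--model", help="Run only the named model")
        cmd.add_argument("--no-gpu", action="store_true", help="Skip GPU models")
        cmd.add_argument("--sdk-version", help="Override the SDK version")
        cmd.add_argument("--cog-version", help="Override the cog CLI version")
        if name == "run":
            cmd.add_argument("--output", choices=("console", "json"), default="console")
            cmd.add_argument("--output-file", help="Where to write the report")
    return parser


args = build_parser().parse_args(
    ["run", "--no-gpu", "--output", "json", "--output-file", "results/report.json"]
)
```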

Initial test results (0.17.0-rc2 SDK, non-GPU)

+ hello-world         (12.6s build, 9.8s predict)
+ canary              (12.0s build, 9.3s predict)
+ blur                (9.3s build, 8.6s predict)
+ hello-image         (10.1s build, 8.6s predict)
x hello-concurrency   BUILD FAIL — emit_metric not exported from RC SDK
x hello-context       PREDICT FAIL — current_scope().context missing in coglet
x hello-train         BUILD FAIL — transient DNS error (passes on retry)
- hello-replicate     SKIPPED — no REPLICATE_API_TOKEN

4/8 passed, 1 skipped, 3 failed (1 transient)

Two real compatibility issues found:

emit_metric is not importable from cog in the RC SDK (breaks hello-concurrency); see #2850, "fix(sdk): restore emit_metric as deprecated compat shim"
Scope.context attribute doesn't exist on coglet's _sdk.Scope (breaks hello-context); see #2853, "feat(coglet): restore Scope.context for per-prediction context"

Files

| File | Purpose |
| --- | --- |
| `manifest.yaml` | Declarative test definitions for all 10 cog-examples models |
| `harness/cli.py` | CLI entry point (`run`, `build`, `list` commands) |
| `harness/runner.py` | Core loop: clone, patch, build, predict, validate |
| `harness/patcher.py` | Injects `sdk_version` and overrides into cog.yaml |
| `harness/validators.py` | 7 validation strategies |
| `harness/report.py` | Console + JSON report generation |
| `fixtures/` | Test images for blur and resnet |
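To make "pluggable validators" concrete, here is a minimal registry sketch covering four of the strategy names mentioned in this PR (exact, contains, file_exists, json_match). The function signatures, the registry, and the matching semantics (for example, json_match as a subset comparison) are assumptions for illustration; harness/validators.py may define them differently.

```python
import json
import os
from typing import Callable, Dict

# Each validator takes the raw model output and the expected value.
Validator = Callable[[str, object], bool]
VALIDATORS: Dict[str, Validator] = {}


def validator(name: str):
    """Register a validation function under a manifest-facing name."""
    def register(fn: Validator) -> Validator:
        VALIDATORS[name] = fn
        return fn
    return register


@validator("exact")
def exact(output: str, expected: object) -> bool:
    return output.strip() == str(expected)


@validator("contains")
def contains(output: str, expected: object) -> bool:
    return str(expected) in output


@validator("file_exists")
def file_exists(output: str, expected: object) -> bool:
    # Assumption: the output is a path to the produced file; expected is unused.
    return os.path.isfile(output.strip())


@validator("json_match")
def json_match(output: str, expected: object) -> bool:
    try:
        actual = json.loads(output)
    except ValueError:
        return False
    if isinstance(actual, dict) and isinstance(expected, dict):
        # Assumption: every expected key must be present with an equal value.
        return all(k in actual and actual[k] == v for k, v in expected.items())
    return actual == expected


def validate(kind: str, output: str, expected: object) -> bool:
    """Dispatch to the validator named in a manifest entry."""
    if kind not in VALIDATORS:
        raise ValueError(f"unknown validator: {kind}")
    return VALIDATORS[kind](output, expected)
```

With a registry like this, adding a strategy means adding one decorated function, which fits the goal of keeping new models a manifest-only change.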

Add a declarative test harness (tools/test-harness/) that automates building and running cog models against new SDK versions. Designed for testing cog-examples against RC releases but extensible to any model in any repo. Models and their expected inputs/outputs are defined in manifest.yaml. The harness clones repos, patches cog.yaml with the target sdk_version, runs cog build + cog predict, and validates outputs using pluggable validators (exact match, contains, file_exists, json_match, etc.). Includes all 10 cog-examples models in the manifest and fixture images for blur/resnet tests.

tools/test-harness/harness/runner.py

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> Signed-off-by: Mark Phelps <209477+markphelps@users.noreply.github.com>

tools/test-harness/manifest.yaml

…ness Default to downloading the latest stable cog CLI from GitHub releases and resolving the latest stable SDK version from PyPI, skipping any alpha/beta/rc tags. Both can be overridden via --cog-version and --sdk-version CLI flags, or pinned in manifest.yaml defaults.
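The "latest stable, skipping alpha/beta/rc" selection described in this commit message can be sketched offline as a filter over a list of version strings. This is a simplified stand-in, not the harness's code: it assumes the version list has already been fetched, treats only purely numeric dotted versions as stable (so it also drops dev and post releases), and the name `latest_stable` is hypothetical.

```python
import re
from typing import Iterable, Optional

# Stable here means digits separated by dots, with no a/b/rc/dev/post suffix.
_STABLE = re.compile(r"^\d+(\.\d+)*$")


def latest_stable(versions: Iterable[str]) -> Optional[str]:
    """Return the highest stable version, or None if there is none."""
    stable = [v for v in versions if _STABLE.match(v)]
    if not stable:
        return None
    # Compare numerically so that 0.16.10 sorts above 0.16.9.
    return max(stable, key=lambda v: tuple(int(part) for part in v.split(".")))


newest = latest_stable(["0.16.9", "0.17.0rc2", "0.17.0a1", "0.16.10"])
```

A production version would more likely lean on a PEP 440-aware parser than on a regular expression.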

michaeldwan

nice!

github-code-quality bot found potential problems Mar 20, 2026

tools/test-harness/harness/runner.py (finding marked fixed)

markphelps and others added 2 commits March 20, 2026 11:42

docs: remove internal reference from README

61111a1

Potential fix for pull request finding 'Empty except'

8dff4e7

michaeldwan marked this pull request as ready for review March 20, 2026 15:58

michaeldwan requested a review from a team as a code owner March 20, 2026 15:58

michaeldwan marked this pull request as draft March 20, 2026 15:58

markphelps marked this pull request as ready for review March 20, 2026 16:51

markphelps commented Mar 20, 2026

tools/test-harness/manifest.yaml (outdated, resolved)

markphelps and others added 2 commits March 20, 2026 13:34

Merge branch 'main' into mphelps/test-harness-cog-examples

7f24b86

markphelps requested a review from michaeldwan March 20, 2026 18:08

michaeldwan approved these changes Mar 20, 2026

markphelps merged commit 1d69171 into main Mar 20, 2026
37 checks passed

markphelps deleted the mphelps/test-harness-cog-examples branch March 20, 2026 19:20

markphelps merged 5 commits into main from mphelps/test-harness-cog-examples

2 participants
