Problem
genkit dev:test-model dispatches test requests to the runtime before all model actions have been registered with the reflection server. Models tested early in the sequence receive 404 errors on /api/runAction and are marked "aborted".
Reproduction
genkit dev:test-model --from-file tests/conform/google-genai/model-conformance.yaml \
-- uv run --active tests/conform/google-genai/conformance_entry.py
Observed behavior
The genkit CLI sends POST /api/runAction requests for multiple models concurrently. Models tested before the Python reflection server finishes action registration get 404s:
Testing model: googleai/imagen-4.0-generate-001
Error: ❌ Failed: Image Output (Generation) Conformance - aborted
Testing model: googleai/gemini-2.5-flash-preview-tts
Error: ❌ Failed: TTS Test - aborted
Testing model: googleai/gemini-2.5-pro
Error: ❌ Failed: Tool Request Conformance - aborted
...
# Python server logs "Started Genkit reflection server" AFTER these tests
...
Testing model: googleai/gemini-2.5-flash
✅ Passed: Tool Request Conformance # This model passes because server is ready
✅ Passed: Structured Output Conformance
✅ Passed: Multiturn Conformance
...
Tests Completed: 7 Passed, 19 Failed
Expected behavior
The CLI should wait until all actions listed in the spec file have been registered by the runtime before dispatching tests. The health check (GET /api/__health) passes early, but action registration may still be in progress.
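A minimal sketch of such a readiness gate follows. It is written in Python only for illustration and is not the CLI's actual code. The function name `wait_for_actions` and all of its parameters are hypothetical, and the lookup of registered actions is passed in as a callable because the response shape of the runtime's action listing is not described in this issue.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_actions(
    list_registered: Callable[[], Set[str]],
    required: Iterable[str],
    timeout_s: float = 30.0,
    interval_s: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Set[str]:
    """Poll until every required action key is registered or the timeout hits.

    Returns the set of keys still missing; an empty set means "ready".
    `list_registered` is a hypothetical hook that returns the action keys
    the runtime currently reports.
    """
    required_set = set(required)
    deadline = clock() + timeout_s
    while True:
        missing = required_set - list_registered()
        # Stop as soon as everything is present, or when time runs out.
        if not missing or clock() >= deadline:
            return missing
        sleep(interval_s)
```

A caller would build `required` from the model names in the spec file and start dispatching tests only when the returned set is empty. A non-empty result could be reported as "model never registered", which is more informative than "aborted".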
Root cause
The CLI polls the health endpoint and the runtime file, but these only confirm the server is running — not that the plugin has finished registering all models. The /api/notify endpoint signals readiness, but the CLI appears to start testing immediately after receiving the first health response, before all models from the spec are available.
Suggested fix
After the health check passes, the CLI should verify that the actions needed for testing are actually registered. Options:
- Poll /api/actions: Check that the model actions referenced in the conformance spec are present before starting tests.
- Wait for /api/notify: Ensure the CLI waits for the runtime to send its notify signal before dispatching tests.
- Retry with backoff: When a runAction returns 404, retry after a short delay instead of marking "aborted".
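The retry option in the list above could look roughly like this. It is an illustrative Python sketch under stated assumptions, not the CLI's implementation. `run_with_retry`, its parameters, and the `(status, body)` return shape of the injected `run_action` callable are all hypothetical.

```python
import time
from typing import Callable, Tuple


def run_with_retry(
    run_action: Callable[[], Tuple[int, object]],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, object]:
    """Call `run_action`, retrying only on HTTP 404 with exponential backoff.

    `run_action` is a hypothetical hook that performs one request and
    returns (status_code, body). Any status other than 404 is returned
    immediately; the last 404 is returned once attempts are exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay_s
    attempt = 1
    while True:
        status, body = run_action()
        if status != 404 or attempt >= max_attempts:
            return status, body
        sleep(delay)
        # Double the wait each time, capped at max_delay_s.
        delay = min(delay * 2, max_delay_s)
        attempt += 1
```

Retrying only on 404 keeps genuine model failures (for example a 500 from the model itself) visible on the first attempt. Only the "action not registered yet" case is treated as transient.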
Impact
- 19 out of 26 conformance tests fail due to timing, not actual model issues.
- Only the last model in the spec passes because the server is ready by then.
- This affects all non-JS runtimes (Python, Go) which may have slower startup.
Related
- Discovered during: feat(py/tools): add conform CLI with multi-runtime support #4593 (conform tool)
- Python graceful shutdown fix: fix(genkit): handle graceful SIGTERM shutdown in dev_runner #4597 / fix(genkit): dev_runner raises RuntimeError on graceful SIGTERM shutdown #4598