bug(cli): genkit dev:test-model starts tests before runtime actions are registered #4599

@yesudeep

Description

@yesudeep

Problem

genkit dev:test-model dispatches test requests to the runtime before all model actions have been registered with the reflection server. Models tested early in the sequence receive 404 responses on /api/runAction and are marked "aborted".

Reproduction

genkit dev:test-model --from-file tests/conform/google-genai/model-conformance.yaml \
  -- uv run --active tests/conform/google-genai/conformance_entry.py

Observed behavior

The genkit CLI sends POST /api/runAction requests for multiple models concurrently. Models tested before the Python reflection server finishes action registration receive 404s:

Testing model: googleai/imagen-4.0-generate-001
Error: ❌ Failed: Image Output (Generation) Conformance - aborted
Testing model: googleai/gemini-2.5-flash-preview-tts
Error: ❌ Failed: TTS Test - aborted
Testing model: googleai/gemini-2.5-pro
Error: ❌ Failed: Tool Request Conformance - aborted
...
# Python server logs "Started Genkit reflection server" AFTER these tests
...
Testing model: googleai/gemini-2.5-flash
✅ Passed: Tool Request Conformance  # This model passes because server is ready
✅ Passed: Structured Output Conformance
✅ Passed: Multiturn Conformance
...
Tests Completed: 7 Passed, 19 Failed

Expected behavior

The CLI should wait until all actions listed in the spec file have been registered by the runtime before dispatching tests. The health check (GET /api/__health) passes early, but action registration may still be in progress.

Root cause

The CLI polls the health endpoint and the runtime file, but these only confirm the server is running — not that the plugin has finished registering all models. The /api/notify endpoint signals readiness, but the CLI appears to start testing immediately after receiving the first health response, before all models from the spec are available.

Suggested fix

After the health check passes, the CLI should verify that the actions needed for testing are actually registered. Options:

  1. Poll /api/actions — Check that the model actions referenced in the conformance spec are present before starting tests.
  2. Wait for /api/notify — Ensure the CLI waits for the runtime to send its notify signal before dispatching tests.
  3. Retry with backoff — When a runAction returns 404, retry after a short delay instead of marking "aborted".
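Option 1 could look roughly like the sketch below: a readiness gate that polls /api/actions until every action named in the conformance spec is registered, and only then lets the test dispatcher proceed. This is an illustrative helper, not code from the genkit CLI; the endpoint paths match the reflection API discussed above, but the timeout, polling interval, and the assumption that /api/actions returns a JSON object keyed by action name are all assumptions.

```python
"""Sketch of suggested fix #1: gate test dispatch on action registration.

Hypothetical helper for illustration. Assumes GET {base_url}/api/actions
returns a JSON object whose keys are registered action names; timeout and
interval values are arbitrary choices.
"""
import json
import time
import urllib.request


def wait_for_actions(base_url, required_actions, timeout=30.0, interval=0.5):
    """Block until every name in `required_actions` appears in
    GET /api/actions, or raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    missing = set(required_actions)
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/api/actions") as resp:
                registered = set(json.load(resp).keys())
            missing = set(required_actions) - registered
            if not missing:
                return  # every action from the spec is registered
        except OSError:
            pass  # server not accepting connections yet; keep polling
        time.sleep(interval)
    raise TimeoutError(f"actions never registered: {sorted(missing)}")
```

The same polling loop doubles as a fallback for option 3: instead of aborting on the first 404 from /api/runAction, the CLI could call a gate like this for the one missing action and retry the request once it returns.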

Impact

  • 19 out of 26 conformance tests fail due to timing, not actual model issues.
  • Only the last model in the spec passes because the server is ready by then.
  • This affects all non-JS runtimes (Python, Go) which may have slower startup.

Metadata

Assignees: none
Labels: bug (Something isn't working)
Status: Done
Milestone: none