Skip to content

(Bug) Fix InferenceService readiness bootstrap from missing workload conditions#560

Open
YouNeedCryDear wants to merge 4 commits intomainfrom
fix/isvc-status
Open

(Bug) Fix InferenceService readiness bootstrap from missing workload conditions#560
YouNeedCryDear wants to merge 4 commits intomainfrom
fix/isvc-status

Conversation

@YouNeedCryDear
Copy link
Copy Markdown
Collaborator

@YouNeedCryDear YouNeedCryDear commented Apr 6, 2026

What this PR does

  • converts missing workload readiness conditions into explicit Unknown conditions instead of silent no-ops
  • seeds top-level and component InferenceService conditions on first observation so fresh raw services do not publish readiness before authoritative workload status exists
  • updates readiness-related tests to cover the missing-condition behavior

Why we need it

Fresh raw InferenceService objects can report Ready before DeploymentAvailable / LeaderWorkerSetAvailable has been populated. This closes the bootstrap hole and makes missing workload status block readiness instead of leaking through as Ready=True.

Fixes #545

How to test

  • go test ./pkg/controller/v1beta1/inferenceservice/status
  • go test ./pkg/controller/v1beta1/inferenceservice/reconcilers/deployment
  • go test ./pkg/controller/v1beta1/inferenceservice/reconcilers/lws
  • go test ./pkg/controller/v1beta1/inferenceservice/... currently fails in the envtest-backed controller suite in this sandbox with listen tcp 127.0.0.1:0: bind: operation not permitted

Checklist

  • Tests added/updated (if applicable)
  • Docs updated (if applicable)
  • make test passes locally

@github-actions github-actions Bot added inferenceservice InferenceService controller changes controller Controller changes multi-node Multi-node/distributed inference changes tests Test changes config Configuration changes labels Apr 6, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the ome-manager and model-agent images to v0.1.5 and refactors status condition handling to use explicit Unknown states during initialization and retrieval. It also adds logic to re-fetch Deployment and LeaderWorkerSet resources to capture status updates. The review feedback points out that context.TODO() should be replaced with proper context propagation and warns that immediate re-fetching might return cached data rather than the updated status from the API server.

Comment thread pkg/controller/v1beta1/inferenceservice/reconcilers/lws/lws_reconciler.go Outdated
Comment thread pkg/controller/v1beta1/inferenceservice/reconcilers/lws/lws_reconciler.go Outdated
@YouNeedCryDear YouNeedCryDear changed the title Fix InferenceService readiness bootstrap from missing workload conditions (Bug) Fix InferenceService readiness bootstrap from missing workload conditions Apr 6, 2026
@YouNeedCryDear YouNeedCryDear force-pushed the fix/isvc-status branch 2 times, most recently from fb1cd13 to b244392 Compare April 8, 2026 17:28
@github-actions github-actions Bot added the documentation Documentation changes label Apr 8, 2026
@YouNeedCryDear YouNeedCryDear force-pushed the fix/isvc-status branch 2 times, most recently from c2709da to 724d7b1 Compare April 14, 2026 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

config Configuration changes controller Controller changes documentation Documentation changes inferenceservice InferenceService controller changes multi-node Multi-node/distributed inference changes tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OME raw InferenceService can report Ready before deployment readiness is populated

1 participant