AgentHarness: phaseToCondition should fall back to status.conditions when phase=UNSPECIFIED #1958

@nloke

Summary

phaseToCondition in translate.go switches on sb.GetPhase() to derive the AgentHarness Ready condition. When the OpenShell gateway returns phase=UNSPECIFIED (or omits the field entirely; see NVIDIA/OpenShell#1710), the controller sets Ready=Unknown / reason=SandboxPhaseUnknown on every reconcile and requeues every 10 s indefinitely, even though status.conditions contains Ready=True in the same response.
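For context on why an omitted field behaves the same as an explicit UNSPECIFIED: in proto3 an unset enum field reads back as the enum's zero value, and Go getters generated from protobuf are nil-safe. The sketch below mimics that with hand-written stand-in types. `SandboxPhase`, `Sandbox`, and the assumption that UNSPECIFIED is the zero value are illustrative only and are not taken from the real openshellv1 package.

```go
package main

import "fmt"

// SandboxPhase is a hypothetical stand-in for openshellv1.SandboxPhase.
type SandboxPhase int32

const (
	// Assumption: UNSPECIFIED is the zero value, as is conventional in proto3.
	SandboxPhaseUnspecified SandboxPhase = 0
	// Hypothetical non-zero value, present only for contrast.
	SandboxPhaseReady SandboxPhase = 1
)

// Sandbox is a hypothetical stand-in for the gateway response message.
type Sandbox struct {
	Phase SandboxPhase
}

// GetPhase mimics a generated protobuf getter: it is safe to call on a nil
// receiver and returns the zero value when the field was never set.
func (s *Sandbox) GetPhase() SandboxPhase {
	if s == nil {
		return SandboxPhaseUnspecified
	}
	return s.Phase
}

func main() {
	var omitted Sandbox  // phase never populated from the wire
	var missing *Sandbox // whole message absent

	fmt.Println(omitted.GetPhase() == SandboxPhaseUnspecified)
	fmt.Println(missing.GetPhase() == SandboxPhaseUnspecified)
}
```

Under these assumptions the controller cannot tell "gateway sent UNSPECIFIED" apart from "gateway sent nothing", which is why both cases land in the same switch branch.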

Affected code

go/core/pkg/sandboxbackend/openshell/translate.go

case openshellv1.SandboxPhase_SANDBOX_PHASE_UNKNOWN, openshellv1.SandboxPhase_SANDBOX_PHASE_UNSPECIFIED:
    return metav1.ConditionUnknown, "SandboxPhaseUnknown", msg

Observed behaviour

  • Gateway returns full status.conditions[{type:Ready, status:True}] but no phase field
  • Controller logs: reason=SandboxPhaseUnknown on every reconcile loop (every 10 s via agentHarnessNotReadyRequeue)
  • AgentHarness never reaches Ready=True despite sandbox pod running and supervisor connected

Expected behaviour

When phase is UNSPECIFIED (or missing from the wire response), the controller should fall back to status.conditions and report Ready=True if a Ready condition with status True is present, rather than immediately returning Ready=Unknown.

Proposed fix

case openshellv1.SandboxPhase_SANDBOX_PHASE_UNKNOWN, openshellv1.SandboxPhase_SANDBOX_PHASE_UNSPECIFIED:
    // Gateway may omit the phase field (see NVIDIA/OpenShell#1710).
    // Fall back to status.conditions so an older gateway binary does not
    // permanently block AgentHarness readiness.
    for _, c := range sb.GetStatus().GetConditions() {
        if c.GetType() == "Ready" && c.GetStatus() == "True" {
            return metav1.ConditionTrue, "SandboxReady", msg
        }
    }
    return metav1.ConditionUnknown, "SandboxPhaseUnknown", msg
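To make the fallback easy to try outside the controller, here is a minimal self-contained sketch of the same branch. It uses hand-written stand-ins (`Phase`, `Condition`, `ConditionStatus`) instead of the real openshellv1 and metav1 types, takes the phase and conditions as plain arguments rather than reading them from sb, and models only the UNKNOWN/UNSPECIFIED case. The `PhaseNotModelled` reason is hypothetical and exists only so the sketch compiles.

```go
package main

import "fmt"

// ConditionStatus is a stand-in for metav1.ConditionStatus.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionUnknown ConditionStatus = "Unknown"
)

// Phase is a hypothetical stand-in for openshellv1.SandboxPhase.
type Phase int

const (
	PhaseUnspecified Phase = iota
	PhaseUnknown
)

// Condition is a hypothetical stand-in for one status.conditions entry.
type Condition struct {
	Type   string
	Status string
}

// hasReadyTrue reports whether any condition is type Ready with status True.
func hasReadyTrue(conds []Condition) bool {
	for _, c := range conds {
		if c.Type == "Ready" && c.Status == "True" {
			return true
		}
	}
	return false
}

// phaseToCondition sketches the proposed behaviour for the
// UNKNOWN/UNSPECIFIED branch only.
func phaseToCondition(phase Phase, conds []Condition, msg string) (ConditionStatus, string, string) {
	switch phase {
	case PhaseUnknown, PhaseUnspecified:
		// Phase is not usable, so consult status.conditions first.
		if hasReadyTrue(conds) {
			return ConditionTrue, "SandboxReady", msg
		}
		return ConditionUnknown, "SandboxPhaseUnknown", msg
	default:
		// Other phases are out of scope for this sketch (hypothetical reason).
		return ConditionUnknown, "PhaseNotModelled", msg
	}
}

func main() {
	// Shape of the response in the Evidence section: no phase, Ready=True.
	conds := []Condition{{Type: "Ready", Status: "True"}}
	status, reason, _ := phaseToCondition(PhaseUnspecified, conds, "Pod is Ready; Service Exists")
	fmt.Println(status, reason)

	// No usable phase and no Ready condition: stay Unknown.
	status, reason, _ = phaseToCondition(PhaseUnspecified, nil, "")
	fmt.Println(status, reason)
}
```

Keeping the condition scan in a small helper means the existing Unknown return stays as the final fallthrough, so gateways that do populate phase are unaffected.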

Evidence

# grpcurl GetSandbox — phase field absent, conditions show Ready=True
fields returned: ['metadata', 'spec', 'status']
phase present: False
status.conditions[0]: {type: Ready, status: True, reason: DependenciesReady, message: Pod is Ready; Service Exists}

Controller log before fix:

{"msg":"requeue","reason":"SandboxPhaseUnknown","agentHarness":"kagent/orch-the-orc"}

Environment
