Summary
`phaseToCondition` in `translate.go` switches on `sb.GetPhase()` to derive the AgentHarness `Ready` condition. When the OpenShell gateway returns `phase=UNSPECIFIED`, or omits the field entirely (see NVIDIA/OpenShell#1710), the controller sets `Ready=Unknown` / `reason=SandboxPhaseUnknown` and requeues every 10 s indefinitely, even though the same response contains `status.conditions[Ready=True]`. Both cases land in the same branch: `GetPhase()` returns the enum's zero value when the field is absent, which by proto3 convention is `SANDBOX_PHASE_UNSPECIFIED`.
Affected code
`go/core/pkg/sandboxbackend/openshell/translate.go`:

```go
case openshellv1.SandboxPhase_SANDBOX_PHASE_UNKNOWN, openshellv1.SandboxPhase_SANDBOX_PHASE_UNSPECIFIED:
	return metav1.ConditionUnknown, "SandboxPhaseUnknown", msg
```
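A minimal, self-contained model of the branch above shows why the `Ready=True` condition never matters on this path. It uses hypothetical stand-ins for the generated `openshellv1` types, since the real definitions are not shown here and may differ. An absent `phase` reads back as the zero value through a nil-safe getter, and the `UNSPECIFIED` case returns `Unknown` without consulting conditions. `currentMapping` is an illustrative name, and the rest of `phaseToCondition` is collapsed into a placeholder.

```go
package main

import "fmt"

// Hypothetical stand-ins for the generated openshellv1 types; the real enum
// values and message layout may differ. Conditions are flattened onto Sandbox
// for brevity (the real code reads sb.GetStatus().GetConditions()).
type SandboxPhase int32

const (
	// Assumed to be the enum's zero value, as proto3 style recommends.
	SandboxPhase_SANDBOX_PHASE_UNSPECIFIED SandboxPhase = 0
	SandboxPhase_SANDBOX_PHASE_UNKNOWN     SandboxPhase = 1
)

type Condition struct{ Type, Status string }

type Sandbox struct {
	Phase      SandboxPhase // stays zero when the gateway omits the field
	Conditions []Condition
}

// GetPhase mirrors protoc-gen-go's nil-safe getters: a nil message and an
// unset field both yield the zero value.
func (s *Sandbox) GetPhase() SandboxPhase {
	if s != nil {
		return s.Phase
	}
	return SandboxPhase_SANDBOX_PHASE_UNSPECIFIED
}

// currentMapping reproduces only the quoted branch; other phases collapse into
// a placeholder. Statuses are the string values of metav1.ConditionStatus.
func currentMapping(sb *Sandbox) (status, reason string) {
	switch sb.GetPhase() {
	case SandboxPhase_SANDBOX_PHASE_UNKNOWN, SandboxPhase_SANDBOX_PHASE_UNSPECIFIED:
		// sb.Conditions is never consulted on this path.
		return "Unknown", "SandboxPhaseUnknown"
	default:
		return "", "NotModeled"
	}
}

func main() {
	// No phase on the wire, but Ready=True in the conditions.
	sb := &Sandbox{Conditions: []Condition{{Type: "Ready", Status: "True"}}}
	fmt.Println(currentMapping(sb)) // Unknown SandboxPhaseUnknown
}
```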
Observed behaviour
- The gateway returns a populated `status.conditions` (`{type: Ready, status: True}`) but no `phase` field.
- The controller logs `reason=SandboxPhaseUnknown` on every reconcile loop (every 10 s, via `agentHarnessNotReadyRequeue`).
- The AgentHarness never reaches `Ready=True`, even though the sandbox pod is running and the supervisor is connected.
Expected behaviour
When `phase` is `UNSPECIFIED` (or missing from the wire response), the controller should fall back to `status.conditions` instead of immediately reporting `Ready=Unknown`.
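The fallback can be isolated into a small helper. The sketch below is a possible variant, not the fix proposed in this issue: it also maps an explicit `Ready=False` to `False`, whereas the proposed fix reports `Unknown` for anything other than `True`. `Condition`, `findCondition`, `readyFromConditions` and the `SandboxNotReady` reason are hypothetical names, and the statuses are the string values of `metav1.ConditionStatus`.

```go
package main

import "fmt"

// Condition is a hypothetical stand-in for the gateway's condition message,
// with the string-valued fields seen in the grpcurl output in this issue.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// findCondition returns the first condition with the given type, if any.
func findCondition(conds []Condition, condType string) (Condition, bool) {
	for _, c := range conds {
		if c.Type == condType {
			return c, true
		}
	}
	return Condition{}, false
}

// readyFromConditions derives a tri-state status ("True", "False", "Unknown")
// from the Ready condition. Unlike the proposed fix, it also propagates an
// explicit Ready=False.
func readyFromConditions(conds []Condition) (status, reason string) {
	c, ok := findCondition(conds, "Ready")
	if !ok {
		return "Unknown", "SandboxPhaseUnknown"
	}
	switch c.Status {
	case "True":
		return "True", "SandboxReady"
	case "False":
		// Hypothetical reason string; not taken from the controller.
		return "False", "SandboxNotReady"
	default:
		return "Unknown", "SandboxPhaseUnknown"
	}
}

func main() {
	// The Ready condition reported by grpcurl in this issue.
	conds := []Condition{{Type: "Ready", Status: "True", Reason: "DependenciesReady"}}
	fmt.Println(readyFromConditions(conds)) // True SandboxReady
	fmt.Println(readyFromConditions(nil))   // Unknown SandboxPhaseUnknown
}
```

Whether `Ready=False` should surface as `False` or stay `Unknown` while the phase is unset is a judgment call; surfacing it gives users a clearer signal but trusts the conditions more than the proposed fix does.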
Proposed fix
```go
case openshellv1.SandboxPhase_SANDBOX_PHASE_UNKNOWN, openshellv1.SandboxPhase_SANDBOX_PHASE_UNSPECIFIED:
	// Gateway may omit the phase field (see NVIDIA/OpenShell#1710).
	// Fall back to status.conditions so an older gateway binary does not
	// permanently block AgentHarness readiness.
	for _, c := range sb.GetStatus().GetConditions() {
		if c.GetType() == "Ready" && c.GetStatus() == "True" {
			return metav1.ConditionTrue, "SandboxReady", msg
		}
	}
	return metav1.ConditionUnknown, "SandboxPhaseUnknown", msg
```
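A table-driven check of the patched branch could look like the sketch below. The types are hypothetical stand-ins that copy the nil-safe getter pattern protoc-gen-go generates, which is what keeps the chained `sb.GetStatus().GetConditions()` call safe when `status` is absent. `phaseToConditionSketch` and `withReady` are illustrative names, statuses are written as the string values of `metav1.ConditionStatus`, and phases outside the patched case are not modeled.

```go
package main

import "fmt"

// Hypothetical stand-ins for the openshellv1 types with protoc-gen-go-style
// nil-safe getters; the real message layout may differ.
type SandboxPhase int32

const (
	SandboxPhase_SANDBOX_PHASE_UNSPECIFIED SandboxPhase = 0 // assumed zero value
	SandboxPhase_SANDBOX_PHASE_UNKNOWN     SandboxPhase = 1
)

type Condition struct{ Type, Status string }

func (c *Condition) GetType() string {
	if c != nil {
		return c.Type
	}
	return ""
}

func (c *Condition) GetStatus() string {
	if c != nil {
		return c.Status
	}
	return ""
}

type SandboxStatus struct{ Conditions []*Condition }

func (s *SandboxStatus) GetConditions() []*Condition {
	if s != nil {
		return s.Conditions
	}
	return nil
}

type Sandbox struct {
	Phase  SandboxPhase
	Status *SandboxStatus
}

func (sb *Sandbox) GetPhase() SandboxPhase {
	if sb != nil {
		return sb.Phase
	}
	return SandboxPhase_SANDBOX_PHASE_UNSPECIFIED
}

func (sb *Sandbox) GetStatus() *SandboxStatus {
	if sb != nil {
		return sb.Status
	}
	return nil
}

// phaseToConditionSketch reduces phaseToCondition to the patched branch.
func phaseToConditionSketch(sb *Sandbox) (status, reason string) {
	switch sb.GetPhase() {
	case SandboxPhase_SANDBOX_PHASE_UNKNOWN, SandboxPhase_SANDBOX_PHASE_UNSPECIFIED:
		for _, c := range sb.GetStatus().GetConditions() {
			if c.GetType() == "Ready" && c.GetStatus() == "True" {
				return "True", "SandboxReady"
			}
		}
		return "Unknown", "SandboxPhaseUnknown"
	default:
		// Other phases are handled by the real switch and are not modeled here.
		return "", "NotModeled"
	}
}

func main() {
	withReady := func(status string) *Sandbox {
		return &Sandbox{Status: &SandboxStatus{Conditions: []*Condition{{Type: "Ready", Status: status}}}}
	}
	cases := []struct {
		name string
		sb   *Sandbox
		want string
	}{
		{"phase omitted, Ready=True", withReady("True"), "True"},
		{"phase omitted, Ready=False", withReady("False"), "Unknown"},
		{"status absent", &Sandbox{}, "Unknown"},
		{"nil sandbox", nil, "Unknown"},
	}
	for _, tc := range cases {
		got, _ := phaseToConditionSketch(tc.sb)
		fmt.Printf("%-28s got=%-7s want=%s\n", tc.name, got, tc.want)
	}
}
```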
Evidence
```text
# grpcurl GetSandbox: phase field absent, conditions show Ready=True
fields returned: ['metadata', 'spec', 'status']
phase present: False
status.conditions[0]: {type: Ready, status: True, reason: DependenciesReady, message: Pod is Ready; Service Exists}
```
Controller log before fix:

```json
{"msg":"requeue","reason":"SandboxPhaseUnknown","agentHarness":"kagent/orch-the-orc"}
```
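One caveat when reading the grpcurl output: grpcurl's JSON formatter omits proto3 fields that hold their default value (for fields without explicit presence) unless `-emit-defaults` is passed, so a missing `phase` key is also exactly what a gateway that sets the phase to `UNSPECIFIED` (value 0) would produce. The sketch below checks a saved response for both signals using only the Go standard library. It assumes the JSON is the sandbox object itself rather than a wrapping response, that `phase` is a top-level key, and that enums are rendered by name; `sandboxJSON` and `inspect` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sandboxJSON models only the keys this check needs (hypothetical layout).
type sandboxJSON struct {
	Phase  *string `json:"phase"` // stays nil when the key is absent
	Status struct {
		Conditions []struct {
			Type   string `json:"type"`
			Status string `json:"status"`
		} `json:"conditions"`
	} `json:"status"`
}

// inspect reports whether a phase key is present and whether any condition
// has type Ready with status True.
func inspect(raw []byte) (phasePresent, readyTrue bool, err error) {
	var sb sandboxJSON
	if err = json.Unmarshal(raw, &sb); err != nil {
		return false, false, err
	}
	for _, c := range sb.Status.Conditions {
		if c.Type == "Ready" && c.Status == "True" {
			readyTrue = true
		}
	}
	return sb.Phase != nil, readyTrue, nil
}

func main() {
	// Abbreviated response shaped after the evidence in this issue.
	raw := []byte(`{"metadata":{},"spec":{},"status":{"conditions":[{"type":"Ready","status":"True","reason":"DependenciesReady"}]}}`)
	phasePresent, readyTrue, err := inspect(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("phase present:", phasePresent, "ready:", readyTrue) // phase present: false ready: true
}
```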
Environment
- Gateway image: `ghcr.io/nvidia/openshell/gateway:latest` (`sha256:2d1e54cb`)