
fix(flytek8s): surface init-container waiting failures in DemystifyPending #7244

Merged
pvditt merged 1 commit into v2 from pvditt/init-container-pending-demystify
Apr 21, 2026


@pvditt (Contributor) commented Apr 21, 2026

Summary

  • Port of unionai/flyte#956 to the v2 branch.
  • Fixes an indefinite task hang when an init container is stuck (e.g. ImagePullBackOff on a missing init image, InvalidImageName, CreateContainerConfigError).
  • demystifyPendingHelper now inspects InitContainerStatuses before ContainerStatuses, so the real init-container failure surfaces with the correct reason and per-reason grace-period handling.
  • Previously, the main container's downstream Waiting{PodInitializing} masked the init-container failure; only the coarse PodPendingTimeout would rescue the task, with a generic "timeout expired" message.

Root cause

flyteplugins/go/tasks/pluginmachinery/flytek8s/pod_helper.go — the PodReady == False branch of demystifyPendingHelper only iterated status.ContainerStatuses. DemystifySuccess and DemystifyFailure already walk InitContainerStatuses; only the pending path was inconsistent.
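
A minimal sketch of the masking, assuming simplified stand-in types (PodStatus, ContainerStatus, WaitingState) in place of the k8s.io/api/core/v1 structs, and a hypothetical helper firstWaiting that, like the pending branch, acts on the first not-ready container it finds in a Waiting state. Container names are illustrative; this is not the actual pod_helper.go code.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the k8s.io/api/core/v1 fields the
// pending path reads.
type WaitingState struct {
	Reason string
}

type ContainerStatus struct {
	Name    string
	Ready   bool
	Waiting *WaitingState // nil unless the container is in a Waiting state
}

type PodStatus struct {
	InitContainerStatuses []ContainerStatus
	ContainerStatuses     []ContainerStatus
}

// firstWaiting returns the first not-ready, waiting container, scanning the
// status lists in the order given (hypothetical helper).
func firstWaiting(lists ...[]ContainerStatus) (ContainerStatus, bool) {
	for _, list := range lists {
		for _, cs := range list {
			if !cs.Ready && cs.Waiting != nil {
				return cs, true
			}
		}
	}
	return ContainerStatus{}, false
}

// stuckInitPod models a pod whose init image cannot be pulled: the main
// container sits in PodInitializing behind the failing init container.
func stuckInitPod() PodStatus {
	return PodStatus{
		InitContainerStatuses: []ContainerStatus{
			{Name: "init", Waiting: &WaitingState{Reason: "ImagePullBackOff"}},
		},
		ContainerStatuses: []ContainerStatus{
			{Name: "primary", Waiting: &WaitingState{Reason: "PodInitializing"}},
		},
	}
}

func main() {
	status := stuckInitPod()

	// Old behavior: only regular containers are scanned, so the
	// keep-waiting PodInitializing state is what gets classified.
	before, _ := firstWaiting(status.ContainerStatuses)
	fmt.Println("before:", before.Name, before.Waiting.Reason) // before: primary PodInitializing

	// New behavior: init containers are scanned first, so the real
	// failure reason is the one classified.
	after, _ := firstWaiting(status.InitContainerStatuses, status.ContainerStatuses)
	fmt.Println("after:", after.Name, after.Waiting.Reason) // after: init ImagePullBackOff
}
```

With only regular containers scanned, the task stays pending until PodPendingTimeout fires; scanning init containers first lets the ImagePullBackOff reason reach the per-reason handling.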

Scope

  • Extract the waiting-reason switch into classifyWaitingContainer (logic move, preserving v2's existing classification — v2 uses PhaseInfoFailureWithCleanup for CreateContainerError/CreateContainerConfigError past grace, which differs from v1's retryable variants).
  • Iterate init containers first, then regular containers.
  • Grace-period clock unchanged (PodReady.LastTransitionTime).
  • No changes to outer DemystifyPending, PodPendingTimeout, or plugin GetTaskPhase.

Test plan

  • go build ./flyteplugins/go/tasks/pluginmachinery/flytek8s/...
  • go test ./flyteplugins/go/tasks/pluginmachinery/flytek8s/... -run TestDemystifyPending -v — 16 pre-existing subtests unchanged + 9 new init-container subtests green
  • go test ./flyteplugins/go/tasks/pluginmachinery/flytek8s/... — full package PASS
  • go test ./flyteplugins/go/tasks/plugins/k8s/pod/... — caller path PASS

Refs

  • Linear: ENG26-411
  • v1 PR: unionai/flyte#956

🤖 Generated with Claude Code

  • main
    • Flyte 2 #6583
      • fix(flytek8s): surface init-container waiting failures in DemystifyPending 👈

@github-actions (bot) mentioned this pull request Apr 20, 2026
…nding

Port of unionai/flyte#956 to v2. Fixes indefinite task hang when an init
container is stuck (e.g. ImagePullBackOff on a missing init image,
InvalidImageName, CreateContainerConfigError).

demystifyPendingHelper now inspects InitContainerStatuses before
ContainerStatuses, so the real init-container failure surfaces with the
correct reason and per-reason grace-period handling. Previously, the main
container's downstream Waiting{PodInitializing} masked the init-container
failure; only the coarse PodPendingTimeout would eventually rescue the task.

The waiting-reason switch is extracted into classifyWaitingContainer
(byte-equivalent logic move, preserving v2's existing classification for
CreateContainerError / CreateContainerConfigError past grace).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
@pvditt force-pushed the pvditt/init-container-pending-demystify branch from 746349c to 8b9d01e on April 21, 2026 00:44
@pvditt merged commit dc56811 into v2 on Apr 21, 2026
19 checks passed
@pvditt deleted the pvditt/init-container-pending-demystify branch April 21, 2026 00:47