Skip to content

Fix copilot TerminationGracePeriodSeconds set to nanoseconds instead of seconds#7183

Merged
pingsutw merged 2 commits intoflyteorg:masterfrom
bergman:jocke/its-been-114k-years
Apr 9, 2026
Merged

Fix copilot TerminationGracePeriodSeconds set to nanoseconds instead of seconds#7183
pingsutw merged 2 commits intoflyteorg:masterfrom
bergman:jocke/its-been-114k-years

Conversation

@bergman
Copy link
Copy Markdown
Contributor

@bergman bergman commented Apr 9, 2026

Why are the changes needed?

AddCoPilotToPod sets TerminationGracePeriodSeconds by casting a time.Duration (which is stored as nanoseconds) directly to *int64:

coPilotPod.TerminationGracePeriodSeconds = (*int64)(&cfg.Timeout.Duration)

TerminationGracePeriodSeconds expects seconds, not nanoseconds. With the default 1-hour timeout, this results in a grace period of ~114,000 years (3,600,000,000,000 nanoseconds interpreted as seconds) instead of the intended 3,600 seconds.

This causes pods with copilot sidecars to stay in Terminating state effectively forever when a node is drained, since Kubernetes waits for the grace period to expire before sending SIGKILL.

We discovered this while rotating GKE node pools in production. Pods with copilot sidecars were stuck in Terminating state indefinitely, blocking node drain. Kubernetes was honoring the ~114,000-year grace period and never sending SIGKILL, making it impossible to cleanly decommission nodes without force-deleting pods.

What changes were proposed in this pull request?

Replace the unsafe pointer cast with a proper conversion using Duration.Seconds():

coPilotPod.TerminationGracePeriodSeconds = ptr.To(int64(cfg.Timeout.Duration.Seconds()))

This also eliminates the pointer aliasing into the config struct that the original code introduced.

How was this patch tested?

Added a unit test TestAddCoPilotToPod/grace-period that sets a 1-hour timeout and asserts that TerminationGracePeriodSeconds is 3600 (seconds), not 3600000000000 (nanoseconds).

Labels

  • fixed

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

#6501 — the PR that introduced the bug

bergman and others added 2 commits April 9, 2026 15:50
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Joakim Bergman <jocke@spotify.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Joakim Bergman <jocke@spotify.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.95%. Comparing base (22ac7ca) to head (2f025c8).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7183   +/-   ##
=======================================
  Coverage   56.95%   56.95%           
=======================================
  Files         931      931           
  Lines       58188    58188           
=======================================
  Hits        33143    33143           
  Misses      22004    22004           
  Partials     3041     3041           
Flag Coverage Δ
unittests-datacatalog 53.51% <ø> (ø)
unittests-flyteadmin 53.14% <ø> (ø)
unittests-flytecopilot 43.06% <ø> (ø)
unittests-flytectl 64.02% <ø> (ø)
unittests-flyteidl 75.71% <ø> (ø)
unittests-flyteplugins 60.17% <100.00%> (ø)
unittests-flytepropeller 53.65% <ø> (ø)
unittests-flytestdlib 63.02% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@pingsutw pingsutw merged commit ed8aedb into flyteorg:master Apr 9, 2026
49 checks passed
@welcome
Copy link
Copy Markdown

welcome bot commented Apr 9, 2026

Congrats on merging your first pull request! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants