fix: disable APF feature flag to prevent readyz-blocking informers #30

Open
scotwells wants to merge 1 commit into main from fix/disable-apf-before-apply-to


@scotwells
Contributor

Problem

The IPAM apiserver pods are stuck 0/1 Ready in staging. The readiness probe returns HTTP 500 indefinitely:

informer-sync failed: 2 informers not started yet: [*v1.FlowSchema *v1.PriorityLevelConfiguration]

Why the previous fix (#29) didn't work

PR #29 moved genericConfig.FlowControl = nil to after ApplyTo, reasoning that ApplyTo re-initializes the field. That was correct but incomplete.

The real problem: FeatureOptions.ApplyTo calls utilflowcontrol.New(informers, ...), which registers FlowSchema and PriorityLevelConfiguration event handlers directly on the SharedInformerFactory. Setting FlowControl = nil afterward removes the controller reference but does nothing to the factory — those informers remain registered and appear in the informer-sync readyz check, where they block readyz because the IPAM apiserver has no flowcontrol.apiserver.k8s.io access.
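
To make the failure mode concrete, here is a minimal, self-contained sketch of the mechanism described above. It is a toy model, not the real k8s.io/apiserver code: `informerFactory`, `applyTo`, `informerSyncCheck`, and the other names are hypothetical stand-ins, and the informers are assumed never to sync (as when the apiserver cannot reach flowcontrol.apiserver.k8s.io). It only illustrates that registration is a side effect on the factory, so clearing the controller reference afterward changes nothing about what the readiness check sees.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// informerFactory is a hypothetical stand-in for a shared informer factory.
// It remembers every informer type that was registered and whether it synced.
type informerFactory struct {
	synced map[string]bool
}

// register records an informer type. In this model it never syncs, which
// mirrors an apiserver with no access to the flowcontrol API group.
func (f *informerFactory) register(kind string) {
	if f.synced == nil {
		f.synced = map[string]bool{}
	}
	f.synced[kind] = false
}

// pending returns the registered informer types that have not synced, sorted.
func (f *informerFactory) pending() []string {
	var out []string
	for kind, ok := range f.synced {
		if !ok {
			out = append(out, kind)
		}
	}
	sort.Strings(out)
	return out
}

// flowController and config are hypothetical stand-ins for the APF controller
// and the generic server config that holds a reference to it.
type flowController struct{}

type config struct {
	FlowControl *flowController
}

// applyTo models the side effect: building the controller registers two
// informers on the factory before the controller is stored on the config.
func applyTo(cfg *config, f *informerFactory) {
	f.register("*v1.FlowSchema")
	f.register("*v1.PriorityLevelConfiguration")
	cfg.FlowControl = &flowController{}
}

// informerSyncCheck models a readiness check that inspects only the factory.
func informerSyncCheck(f *informerFactory) error {
	p := f.pending()
	if len(p) == 0 {
		return nil
	}
	return fmt.Errorf("%d informers not started yet: [%s]", len(p), strings.Join(p, " "))
}

func main() {
	f := &informerFactory{}
	cfg := &config{}
	applyTo(cfg, f)
	cfg.FlowControl = nil // the approach from #29: drops the reference only

	// The factory still holds both registrations, so the check still fails.
	fmt.Println(informerSyncCheck(f))
}
```

In this model the check reads only the factory's state, which is why nothing done to `cfg` after `applyTo` can make it pass.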

Fix

Set EnablePriorityAndFairness = false on RecommendedOptions.Features in NewIPAMServerOptions(), before ApplyTo is ever called. This causes FeatureOptions.ApplyTo to skip the utilflowcontrol.New() call entirely — the informers are never registered, and readyz is unblocked.

The now-redundant genericConfig.FlowControl = nil is removed.
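
The shape of the fix can be sketched with a similar toy model. Everything below is illustrative: the lower-case types and constructors are hypothetical stand-ins rather than the real options structs, and the assumption that the flag defaults to true is part of the model. Only the field name `EnablePriorityAndFairness` and the constructor name are taken from the description above.

```go
package main

import "fmt"

// featureOptions is a hypothetical, trimmed-down stand-in for the Features
// options; only the flag discussed in this PR is modelled.
type featureOptions struct {
	EnablePriorityAndFairness bool
}

// recommendedOptions and ipamServerOptions are hypothetical wrappers that
// mirror the nesting described above (options -> RecommendedOptions -> Features).
type recommendedOptions struct {
	Features *featureOptions
}

type ipamServerOptions struct {
	RecommendedOptions *recommendedOptions
}

// registry stands in for the shared informer factory: it lists what got registered.
type registry struct {
	informers []string
}

// applyTo registers the two flow-control informers only when the flag is set.
func (o *featureOptions) applyTo(r *registry) {
	if !o.EnablePriorityAndFairness {
		return // controller construction is skipped, so nothing is registered
	}
	r.informers = append(r.informers, "*v1.FlowSchema", "*v1.PriorityLevelConfiguration")
}

// newRecommendedOptions models defaults with the flag enabled (an assumption of this sketch).
func newRecommendedOptions() *recommendedOptions {
	return &recommendedOptions{Features: &featureOptions{EnablePriorityAndFairness: true}}
}

// newIPAMServerOptions flips the flag at construction time, before applyTo can run.
func newIPAMServerOptions() *ipamServerOptions {
	o := &ipamServerOptions{RecommendedOptions: newRecommendedOptions()}
	o.RecommendedOptions.Features.EnablePriorityAndFairness = false
	return o
}

func main() {
	var withDefaults registry
	newRecommendedOptions().Features.applyTo(&withDefaults)
	fmt.Println("informers registered with defaults:", len(withDefaults.informers))

	var withFix registry
	newIPAMServerOptions().RecommendedOptions.Features.applyTo(&withFix)
	fmt.Println("informers registered with the flag disabled:", len(withFix.informers))
}
```

Flipping the flag in the constructor rather than after `applyTo` is the point of the design: the gate is evaluated inside `applyTo`, so the value has to be in place before that call.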

🤖 Generated with Claude Code

The previous fix (nil FlowControl after ApplyTo) was incomplete.
FeatureOptions.ApplyTo calls utilflowcontrol.New(), which registers
FlowSchema and PriorityLevelConfiguration event handlers on the shared
informer factory before FlowControl is ever set. Those informers then
appear in the informer-sync readyz check and block readyz indefinitely
because the IPAM apiserver has no access to flowcontrol.apiserver.k8s.io.

Setting EnablePriorityAndFairness=false before ApplyTo prevents
utilflowcontrol.New() from being called at all, so the informers are
never registered.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@scotwells
Contributor Author

Closing — this fix is not needed. The APF informers only failed because AUTHENTICATION_SKIP_LOOKUP=true was manually set in the staging deployment, which broke the kube-apiserver client setup and left the informers unable to sync. The quota and activity services both run with APF enabled and work correctly in staging. The real fix is reverting the skip-lookup drift to match the base config defaults (false).

@scotwells scotwells closed this May 24, 2026
@scotwells
Contributor Author

Re-opening. Neither the quota nor the activity service has a NetworkPolicy, so the fact that they work with APF enabled doesn't confirm that the default:443 egress rule is sufficient for APF informers in GKE. IPAM is the only aggregated apiserver behind a NetworkPolicy, so the informer failure is likely a genuine connectivity issue rather than a side effect of skip-lookup=true. Disabling APF is still the correct fix.

@scotwells scotwells reopened this May 24, 2026