Fluentd-scaler causing fluentd pod deletions and messing with ds-controller #61190
Comments
Marked this as release-blocker as at least some part of it seems like a correctness issue (i.e., fluentd pods dying and getting recreated). |
Thanks for filing this!
The only thing that was changing (apart from ds/fluentd-gcp status) was the resource version. |
I modified fluentd scaler to use --dry-run in |
Btw, the addon manager sends tons of PATCH requests for all resources every minute anyway, so even though I will fix fluentd-scaler not to send them, it won't solve everything. |
Fixes kubernetes#61190. This version verifies on its own whether resources should be updated or not, instead of relying on `kubectl set resources`.
Automatic merge from submit-queue (batch tested with PRs 60888, 61225). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Bump fluentd-gcp-scaler version

**What this PR does / why we need it**: This version verifies on its own whether resources should be updated or not, instead of relying on `kubectl set resources`.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: Fixes #61190

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```

cc @shyamjvs
Should this issue actually be closed? It was auto-closed by the bot, but nobody has confirmed that the actual issue is fixed. |
I'll let @shyamjvs confirm (looks like he'll be back on Monday) |
/status in-progress |
Yes, we should keep this issue open until we fully understand what's going wrong and fix the root cause. |
OK, seems like I missed Wojtek's comment above. |
So I just checked the ds-controller code and it seems like we're updating the ds status only if this condition is being violated: `kubernetes/pkg/controller/daemon/daemon_controller.go`, lines 1011 to 1021 at 622ad35.
Which means one of those fields is probably mismatching. I'll try to add some logs and run the test at --v=4 to get more info. |
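For context, the check being referenced boils down to something like the sketch below (a paraphrase based on the lines cited above, not the exact upstream code; the helper name and parameters are illustrative). The status update is skipped only when every field still matches, so any single stale field causes the controller to write a new status and bump the object's resourceVersion:

```go
// Paraphrased sketch of the "skip status update" check in the ds-controller.
// Field names come from the DaemonSetStatus API; the helper itself is hypothetical.
package sketch

import appsv1 "k8s.io/api/apps/v1"

// statusUpToDate reports whether the stored DaemonSet status still matches the
// freshly computed pod counts; when it returns false, the controller sends a
// status update (which bumps resourceVersion).
func statusUpToDate(ds *appsv1.DaemonSet, desired, current, ready, updated,
	available, unavailable, misscheduled int) bool {
	return int(ds.Status.DesiredNumberScheduled) == desired &&
		int(ds.Status.CurrentNumberScheduled) == current &&
		int(ds.Status.NumberReady) == ready &&
		int(ds.Status.UpdatedNumberScheduled) == updated &&
		int(ds.Status.NumberAvailable) == available &&
		int(ds.Status.NumberUnavailable) == unavailable &&
		int(ds.Status.NumberMisscheduled) == misscheduled &&
		ds.Status.ObservedGeneration >= ds.Generation
}
```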
anything is possible, but we do have tests ensuring this works properly (no-op updates do not actually result in an update+resourceVersion bump in etcd): https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/registry/generic/registry/store_test.go#L674-L728 |
So I ran some 100-node experiments against #61472 with more debug logs and I think I now understand the reason for this. The controller-manager logs with the scaler and without the scaler mainly show the following difference:
These logs show up when the scaler is active, and the pod deletions are happening as part of the rolling upgrade of the daemonset to the newer version (the one having resources set). You can see they're getting deleted one by one, which also explains why it takes so long (and why deletions are still happening even 30 mins after cluster creation). The reason they're deleted one by one is that the ds-controller has a rolling-upgrade strategy where
And the increased mem-usage, from my understanding, is because of the rolling upgrade performed by the ds-controller (over the prolonged period), as that's the only difference I observe in the logs. |
I spoke offline with @x13n and suggested that we should increase maxUnavailable for the fluentd daemonset to a large enough value so that we're not bottlenecked by it. My reasoning is:
I'm going to make that change and test it against my PR (thanks @x13n for pointing out that we can change maxUnavailable directly in the ds config). |
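For reference, the change being discussed amounts to a DaemonSet update strategy along these lines (an illustrative YAML fragment, not the exact fluentd-gcp manifest or necessarily the exact value used in #61472):

```yaml
# Illustrative DaemonSet spec fragment: with a large maxUnavailable the
# ds-controller can take down many outdated pods in parallel during a rolling
# upgrade, instead of the default one-at-a-time behavior (maxUnavailable: 1).
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: "100%"
```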
So increasing maxUnavailable indeed seems to help (ref: c-m logs):
All 101 replicas were deleted at once instead of one by one, and that happened pretty much within 2 mins of the original fluentd ds being seen by the controller:
We should get that PR into 1.10. |
Thanks for confirming this! Did increasing maxUnavailable also reduce c-m memory usage? |
So the run was still over the threshold, but I was expecting that, since I ran c-m with increased verbosity (which can cause that). |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Increase fluentd rolling-upgrade maxUnavailable to large value

~For testing wrt #61190 (comment)~ Fixes issue #61190 wrt slow rolling-upgrade

/cc @x13n @wojtek-t
/sig instrumentation
/kind bug
/priority critical-urgent

```release-note
NONE
```
@shyamjvs Is there anything else to do here or should this be closed now? |
Let's close it once we're sure there are no more such failures. We may also need to bump the threshold. |
So bumping maxUnavailable for the fluentd rolling-upgrade in #61472 is causing a couple of issues:
So at this point, we might want to choose one of these two options:
Thoughts? |
TBH, I don't know why it was merged without waiting for opinions... I really vote for reverting that.
We don't want this - it may mean making the cluster "not fully usable" from the user's perspective, which is not really what we want to test. |
SGTM. Let's revert it and increase our thresholds for the time being. The discussion about changing fluentd rolling-upgrades can continue in the background. |
Automatic merge from submit-queue (batch tested with PRs 60519, 61099, 61218, 61166, 61714). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Revert "Increase fluentd rolling-upgrade maxUnavailable to large value"

This reverts commit 7dd6adc.

Ref #61190 (comment)

/cc @wojtek-t

```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60499, 61715, 61688, 61300, 58787). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Increase cpu/mem thresholds for c-m in density test

Follows #61190 (comment)

/cc @wojtek-t

```release-note
NONE
```
[MILESTONENOTIFIER] Milestone Issue Needs Approval

@shyamjvs @x13n @kubernetes/sig-instrumentation-misc

**Action required**: This issue must have the Issue Labels
|
This issue is now fixed after my above PRs. We're having continuous green runs for the job |
Forking from #60500 (comment):
To summarize, here's what we observed:
`PUT pod-status` calls from respective kubelets (but maybe that's expected).

cc @kubernetes/sig-instrumentation-bugs @crassirostris @liggitt
/priority critical-urgent
/assign @x13n