[Bug]: KafkaRebalance resource reconciliation gets stuck if CC pod is restarted during rebalance #10091
Comments
Thanks for raising the issue. Could you please format the YAMLs to make them readable? Thanks.
Thanks, fixed the formatting.
@urbandan I tried it with the latest operator (0.41.0) and was not able to get this error. For me, when you increase the node pool brokers to 4, then the
@ShubhamRwt Why would the Kafka cluster move to
@scholzj I meant when we scale up the nodes, then the pods corresponding to
No, the Kafka cluster should stay in Ready while scaling up.
Also, look at it differently -> what happens if you simply delete the CC pod while a rebalance is in progress?
Discussed on the community call on 16.5.2024: (assuming this can be reproduced - see the discussion above), this should be addressed by failing the rebalance or restarting the process if possible. (Let's keep it in triage for next time to make sure it is reproducible and discuss the options.) Note: this should already be handled by the Topic Operator when changing the replication factor, where the TO detects this and automatically restarts the RF change. @fvaleri will double-check.
Yeah, we have this corner case covered in the Topic Operator. TL;DR: Cruise Control has no memory of the task that it was working on before the restart, so the operator is responsible for detecting this event and resending any ongoing task. The operator periodically calls the
Note that there is a small chance that the task could have been completed just before Cruise Control restarted, but the operator didn't have time to know that. In this case, the new task submission would be a duplicate. This is not a problem in practice, as the work has already been done, and the duplicated task would complete quickly (no-op).
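To make the resend logic concrete, here is a minimal sketch of the detection step described above. It assumes the operator stores the User-Task-ID of the in-flight request and periodically compares it against the task IDs that Cruise Control reports; all class and method names here are illustrative, not actual Strimzi code.

```java
import java.util.Set;

// Hypothetical sketch: after each periodic poll of Cruise Control's task
// list, decide whether the task the operator is tracking must be resent.
public class TaskRecovery {

    /**
     * @param trackedTaskId  the User-Task-ID stored by the operator for the
     *                       in-flight request (null if nothing is in flight)
     * @param tasksKnownToCc task IDs currently reported by Cruise Control
     * @return true if CC has forgotten the task (e.g. it was restarted) and
     *         the operator should resubmit the request
     */
    public static boolean mustResend(String trackedTaskId, Set<String> tasksKnownToCc) {
        // Nothing in flight -> nothing to recover.
        if (trackedTaskId == null) {
            return false;
        }
        // Cruise Control keeps completed tasks around for a while; if the ID
        // is completely unknown, CC lost its in-memory state on restart.
        return !tasksKnownToCc.contains(trackedTaskId);
    }
}
```

The duplicate-submission case from the comment above is covered implicitly: if the task completed just before the restart and the ID is gone, the resubmitted task is a quick no-op.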
Triaged on 30/5/2024: it seems to be a bug which needs to be investigated and reproduced first. @ShubhamRwt let's get in touch to understand how you tried to reproduce it with no success and what we can do.
Hi, I was finally able to reproduce this issue with both the latest branch as well as 0.40.0. I assume this issue happens as soon as the Cruise Control pod is shut down, and then when in
which tries to get the
@scholzj @ppatierno WDYT, what should be the best approach?
I think approach 2 is in line with the suggestion made by @fvaleri and what we have in the TO.
Hi @ppatierno, I think the best way should be to ask for a new proposal, since we have now added some new brokers. Regarding re-issuing the rebalance, does that mean we would be rebalancing based on the previous proposal, which was generated with fewer brokers (in case we can do that at all)?
@ShubhamRwt when scaling up, CC's internal model will take some time to update with the new broker metrics, so you will likely get a transient error, but this should already be handled. If the spec changed, I think you have to stop the current rebalance, delete it and create a new one. Otherwise, you can simply refresh the existing one.
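For reference, the refresh path mentioned above uses the `strimzi.io/rebalance` annotation on the `KafkaRebalance` resource; a sketch, assuming a resource named `my-rebalance`:

```shell
# Ask the operator to discard the cached proposal and request a fresh one
kubectl annotate kafkarebalance my-rebalance strimzi.io/rebalance="refresh"

# Or, if the spec changed, delete and recreate the resource instead
kubectl delete kafkarebalance my-rebalance
kubectl apply -f my-rebalance.yaml
```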
I think requesting a new proposal sounds reasonable. But you should consider it from a wider perspective. The scale-up described in this issue is just one scenario where it happens. But I assume the same will happen if the CC pod is just restarted or evicted during the rebalance.
I think that if the original proposal was reviewed and manually triggered, an automatic new proposal + trigger is counter-intuitive. Especially if the new proposal is significantly different from the previously reviewed one.
Yeah, I definitely agree with this. If there is a new proposal, it would need a new approval (unless auto-approval is enabled).
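The re-approval rule being agreed on here can be sketched as a tiny state decision. The `strimzi.io/rebalance-auto-approval` annotation is a real Strimzi feature; the class, enum and method names below are hypothetical, for illustration only.

```java
// Hypothetical sketch of the rule above: a regenerated proposal goes back to
// ProposalReady and waits for manual approval, unless the KafkaRebalance has
// the strimzi.io/rebalance-auto-approval annotation set to "true".
public class ApprovalPolicy {
    public enum State { PROPOSAL_READY, REBALANCING }

    public static State stateForNewProposal(boolean autoApprovalEnabled) {
        // With auto-approval the operator may start executing right away;
        // otherwise the user must review the (possibly different) plan again.
        return autoApprovalEnabled ? State.REBALANCING : State.PROPOSAL_READY;
    }
}
```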
+1 for a new proposal, but I agree with Jakub. Scaling up is just one use case; the fix should apply to other situations as well (e.g. CC restarting).
Until CC can handle dynamic configuration updates without restarting, or we can devise a simple method of determining the cause of a CC restart (whether it be due to a scaling operation or a CC crash/eviction), it may make sense to just recover from all CC restarts in the same way (assuming the performance costs are reasonable). From what is discussed above, we appear to have two options to recover from a CC restart:
(A) Request a new proposal
(B) Resend the existing proposal of
Given that Cruise Control's window samples persist in Kafka, the delay of proposal generation shouldn't be long. Another thing worth thinking about: if CC did support dynamic configuration updates without restart, would we stop an ongoing rebalance because a new broker was added? What about removed? Should scaling operations be blocked when there is an ongoing rebalance?
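The two recovery options above can be sketched as a small decision function. This is an illustrative sketch only (names are hypothetical, not Strimzi code), assuming the operator records the broker set at proposal time and compares it with the broker set after the restart.

```java
import java.util.Set;

// Sketch of choosing between the two recovery options discussed above.
public class RestartRecovery {
    public enum Action { NEW_PROPOSAL, RESEND_EXISTING }

    /**
     * If the broker set changed while CC was down (scale up/down), the old
     * proposal is stale and a fresh one must be generated and re-approved;
     * otherwise the previously approved proposal can simply be resent.
     */
    public static Action recoverAfterRestart(Set<Integer> brokersAtProposalTime,
                                             Set<Integer> brokersNow) {
        return brokersNow.equals(brokersAtProposalTime)
            ? Action.RESEND_EXISTING
            : Action.NEW_PROPOSAL;
    }
}
```

Under the "recover from all restarts the same way" simplification, the comparison would be dropped and `NEW_PROPOSAL` returned unconditionally.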
+1 sounds reasonable to me
So, summarising all the comments, what I understood is that there are different scenarios:
@ShubhamRwt I'm not sure you can really distinguish what happened with the pod and why. You should probably define this based on some Cruise Control response or something. How will you, in reality, distinguish between scenarios 1 and 2? I also don't understand why you want to delete and recreate any rebalances in scenario 2.
I was taking Fede's point into consideration while summarising. Looking into it more, you are right @scholzj; I had something in mind to distinguish between the two scenarios, but it looks like it will fail :(.
Hi, I was trying to fix this bug. The fix that I was working on works fine if the CC pod is restarted in the middle of a rebalance. But in one of the scenarios, where we scale the nodes in the node pool in the middle of a rebalance and then wait while the CC pod is restarting and the new broker is coming up, the
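A gate like the one being discussed (only resend the rebalance once the Kafka cluster is back to Ready) could be sketched as below. The types and names are hypothetical stand-ins, not the actual `KafkaRebalanceAssemblyOperator` code; a real implementation would read the conditions from the `Kafka` custom resource status.

```java
import java.util.List;

// Illustrative sketch: before resending a rebalance request after a CC
// restart, check that the Kafka CR reports a Ready condition with status
// "True"; otherwise the reconciliation should be requeued.
public class ReadinessGate {
    public record Condition(String type, String status) { }

    public static boolean kafkaReady(List<Condition> statusConditions) {
        return statusConditions.stream()
            .anyMatch(c -> "Ready".equals(c.type()) && "True".equals(c.status()));
    }
}
```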
And which part of the condition fails for you?
Both of the conditions. Here are some logs which show that the pods were not able to roll properly ->
I do not see that log message anywhere in the log. Where does the code come from? Which class? Or maybe I'm not sure what you are trying to say with the log. The reconciliation in the log failed, so sure, it will not be ready. But that has nothing to do with scaling.
KafkaRebalanceAssemblyOperator
@ShubhamRwt Please start a discussion on Slack in #strimzi-dev on this. That should be better than doing chat on the issue. I cannot do that as I do not know which account is actually you. 😉
Bug Description
Encountered a weird behaviour in 0.40. If the CC pod is restarted in the middle of a rebalance, the KafkaRebalance resource reconciliation gets stuck with the following cluster operator logs:
Steps to reproduce
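A sketch of a reproduction, based on the discussion above (deleting the CC pod mid-rebalance); the cluster name `my-cluster` and resource name `my-rebalance` are placeholders:

```shell
# 1. Create a KafkaRebalance, wait for ProposalReady, then approve it
kubectl annotate kafkarebalance my-rebalance strimzi.io/rebalance="approve"

# 2. While the rebalance is running, kill the Cruise Control pod
kubectl delete pod my-cluster-cruise-control-<suffix>

# 3. Watch the KafkaRebalance resource: it stays stuck in Rebalancing
kubectl get kafkarebalance my-rebalance -o yaml -w
```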
Expected behavior
No response
Strimzi version
0.40.0
Kubernetes version
v1.28.7+k3s1
Installation method
helm
Infrastructure
Bare-metal
Configuration files and logs
Additional context
No response