CruiseControl CPU Utilization Computation #5951
Comments
(Updated: Sorry, I uploaded empty files the first time...) |
Hi @roland-haeusler, configuring CPU capacity as a number of cores is optional and useful when you have heterogeneous CPUs, which is not the case with Strimzi. We set CPU capacity as a percentage of each CPU/core (100% by default), which I think should be multiplied by the number of actual Kafka CPUs in order to get the overall capacity (CC assumes the same number of cores on each broker). The CPU utilization computation is correct, as we are setting CC in "Kubernetes mode", which is needed when running in a container environment (see linkedin/cruise-control#1277).
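As a quick sketch of that multiplication (illustrative numbers only, not Cruise Control code; 3 brokers with 2 CPUs each is just an example):

public class OverallCpuCapacitySketch {
    public static void main(String[] args) {
        int brokers = 3;                 // example cluster size
        int coresPerBroker = 2;          // example Kafka CR requests.cpu = limits.cpu
        double perCoreCapacity = 100.0;  // CC default: CPU capacity as a percentage of one core

        // Overall cluster CPU capacity; the allowed capacity is this value times cpu.capacity.threshold.
        double overall = brokers * coresPerBroker * perCoreCapacity;
        System.out.println("overall CPU capacity = " + overall); // 600.0
    }
}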
The problem seems to be in how Strimzi/CC computes the allowed CPU capacity, which assumes that only 1 CPU/core is available (no matter how many Kafka CPUs/cores you configure). I was able to reproduce the issue using a cluster of 3 brokers with 2 CPUs each and a 90% threshold (overall: 600 capacity, 540 allowed capacity), sending a good amount of records in order to drive up CPU utilization. For debugging purposes, here is the KafkaRebalance resource I used and its status:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  annotations:
    strimzi.io/rebalance: refresh
  creationTimestamp: "2021-11-28T11:48:55Z"
  generation: 1
  labels:
    strimzi.io/cluster: my-cluster
  name: my-rebalance
  namespace: test
  resourceVersion: "757844"
  uid: 1cb6fa11-7bdb-4016-8645-18fd7870a0cd
spec: {}
status:
  conditions:
  - lastTransitionTime: "2021-11-28T13:31:32.751239Z"
    message: 'Error for request: my-cluster-cruise-control.streams.svc:9090/kafkacruisecontrol/rebalance?json=true&dryrun=true&verbose=true&skip_hard_goal_check=false.
      Server returned: Error processing POST request ''/rebalance'' due to: ''com.linkedin.kafka.cruisecontrol.exception.OptimizationFailureException:
      [CpuCapacityGoal] Insufficient capacity for cpu (Utilization 392.71, Allowed
      Capacity 270.00, Threshold: 0.90). Add at least 2 brokers with the same cpu
      capacity (100.00) as broker-0. Add at least 2 brokers with the same cpu capacity
      (100.00) as broker-0.''.'
    reason: CruiseControlRestException
    status: "True"
    type: NotReady
  observedGeneration: 1

As you can see, the allowed capacity is 270, which is computed as 3 (brokers) x 100 (CPU capacity, i.e. 1 core at 100%) x 0.9 (threshold) = 270.
But I have 2 CPUs per broker, so the allowed capacity should be 540 (3 x 200 x 0.9), enough to satisfy the utilization (392.71). @tomncooper @kyguy WDYT? |
The CC code that seems to fail is the following, where we compute the CPU limit as:

double cpuLimit = cpuQuota / getCpuPeriod();

According to the container configuration, this should give 2 in my case (quota/period):

$ kubectl exec my-cluster-kafka-0 -c kafka -- cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
200000
$ kubectl exec my-cluster-kafka-0 -c kafka -- cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
100000
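A minimal, self-contained sketch of that quota/period calculation (cgroup v1 paths as queried above; illustrative code, not the actual Cruise Control implementation):

import java.nio.file.Files;
import java.nio.file.Paths;

public class CgroupCpuLimit {
    public static void main(String[] args) throws Exception {
        // The same files queried with kubectl above (cgroup v1 CFS quota and period, in microseconds).
        long quota = Long.parseLong(Files.readString(Paths.get("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")).trim());
        long period = Long.parseLong(Files.readString(Paths.get("/sys/fs/cgroup/cpu/cpu.cfs_period_us")).trim());

        // 200000 / 100000 = 2.0 CPUs for this broker pod, matching limits.cpu=2 in the Kafka CR.
        double cpuLimit = (double) quota / period;
        System.out.println("cpuLimit = " + cpuLimit);
    }
}
|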
Hi @roland-haeusler, Strimzi assumes broker pod resources to be homogeneous, therefore it is recommended to always set the same CPU resources on every broker. Thanks for the detailed investigative debugging @fvaleri! Is the Kafka CR you used for this reproduction something you can share? In hindsight, this information should be better documented in the Cruise Control section to avoid confusion. |
Thanks for the initial feedback @kyguy.
Exactly, I would also add that for any stateful application like Kafka, it is always good to have requests==limits to avoid the risk of pods being killed when the node is under pressure and needs some resources back (BestEffort vs Guaranteed QoS).
Yes. Attached you can find my test Kafka CR. |
Another example of the same issue. Here it happens on 1 out of 3 brokers (limit.cpu=2, threshold=0.9). As you can see, only 1 of the 2 available CPUs is used to compute the capacity limit.
|
@kyguy I think the bug is in updateCachedNumCoresByBroker: it always returns the default numCpuCores (1) when NOT using the cores-based capacity configuration.
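A paraphrased sketch of the defaulting behavior I mean (illustrative only; apart from updateCachedNumCoresByBroker and num.cores, the names below are made up):

import java.util.Map;

public class NumCoresFallbackSketch {
    // Default used when no core count is provided for a broker.
    static final int DEFAULT_CPU_CORE_NUM = 1;

    // With the percentage-based capacity config that Strimzi generates there is no "num.cores"
    // entry, so the cached core count stays at the default of 1 regardless of the Kafka CR limits.
    static int resolveNumCores(Map<String, String> cpuCapacityEntry) {
        String cores = cpuCapacityEntry.get("num.cores");
        return cores == null ? DEFAULT_CPU_CORE_NUM : Integer.parseInt(cores);
    }
}
|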
You are right @fvaleri, those numbers don't add up. I haven't discovered anything definite yet but I wanted to share what I have been thinking about so far.
This happens because Strimzi generates a CPU percentage config [1] instead of a CPU core config [2]. Cruise Control is dependent on Strimzi to provide the number of CPU cores, and when Strimzi doesn't, numCpuCores defaults to 1. We could attempt to fix this by adding some logic to Strimzi to generate a CPU core config here [3] when the broker CPU resources are configured. However, we have one other problem that I am still trying to make sense of, and that is:
The CPU utilization metric should initially be between [0.0, 1.0] when it is collected here [4] and then in [0, 100] after it is multiplied by 100. Anyways, I'll continue looking into this tomorrow. If you think of or find anything else in the meantime, let me know! [1] https://github.com/linkedin/cruise-control/blob/4e5927b48bf2581ab76acbbecbf42b355b871b65/config/capacity.json#L7 |
Thanks @kyguy, I'll continue my analysis as soon as I have some time. In the meantime, as a workaround, we can exclude CPU goals from the preset hard/default goals like this:

cruiseControl:
  config:
    hard.goals: >
      com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal
    default.goals: >
      com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal

This works in my test env with all brokers at around 150% CPU utilization. |
I believe I have found the bug here [1]. The value generated here is supposed to be in the interval [0.0, 1.0]; however, it is possible for this value to exceed 1.0. Looking at the CPU util formula:
Plugging in the numbers from the tests run above, with jvmMeasuredCpuUtil: 1.0 (we will assume this max value for ease of demonstration), we can see how this formula can produce values that violate the [0.0, 1.0] interval:
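Sketching it with illustrative numbers (the availableProcessors value is an assumption for demonstration; jvmMeasuredCpuUtil and the cgroup-derived cpuLimit come from the discussion above):

public class CpuUtilFormulaSketch {
    public static void main(String[] args) {
        double jvmMeasuredCpuUtil = 1.0; // process CPU load reported by the JVM, expected in [0.0, 1.0]
        int availableProcessors = 4;     // assumption: logical processors the JVM sees (e.g. the node's CPUs)
        double cpuLimit = 2.0;           // cpuQuota / cpuPeriod = 200000 / 100000 from the cgroup files

        // CPU utilization expressed relative to the container's CPU limit.
        double cpuUtil = jvmMeasuredCpuUtil * availableProcessors / cpuLimit;
        System.out.println("cpuUtil = " + cpuUtil); // 2.0 here, outside the expected [0.0, 1.0] interval
    }
}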
The resulting cpuUtil value then falls outside the expected [0.0, 1.0] interval. Thanks @fvaleri and @roland-haeusler for finding this and helping debug/investigate! We will put a fix for this into Cruise Control as soon as possible and make sure Strimzi gets an updated Cruise Control build. Until then we can stick with the workaround provided by @fvaleri! |
On second thought, the formula should still hold and provide correct values within the interval [0.0, 1.0] so long as the JVM's process CPU load and available processor reporting behaves the way the formula expects [1]. In the example above, that may not have been the case. That would mean that an upgrade in OpenJDK could have caused the underlying issue. In any case, I am following up on the issue with the upstream Cruise Control project here [2]. [1] https://bugs.openjdk.java.net/browse/JDK-8226575 |
@kyguy thanks for your additional insights. One fact is that the original use case and all my tests show that the reported available CPU capacity is wrong. Original use case (brokers=6, requests.cpu=limit.cpu=3, cpu.capacity.threshold=1.0): reported allowed capacity 600, but with 3 CPUs per broker it should be 1800.
My tests (brokers=3, requests.cpu=limit.cpu=2, cpu.capacity.threshold=0.9): reported allowed capacity 270, but with 2 CPUs per broker it should be 540.
It is like we are always considering 1 CPU per broker, no matter what you set in the Kafka CR (requests.cpu=limit.cpu). If that is confirmed, a not-so-easy way to fix this would be to use the ncores capacity configuration type, reconciling that value with what we have in the Kafka CR configuration in case of changes. |
There are two issues going on here:

(1) Strimzi Cruise Control always displays 1 CPU per broker

This is strictly a UI issue (it does not cause the rebalance errors) but still needs to be fixed. I can confirm that Strimzi Cruise Control is configured to only ever display 1 CPU core per broker. This single "virtual" CPU core has the cycles of 0 or more CPU cores (however many cores are configured in the Kafka CR broker resources). When we configure the numCores correctly, the reported error only changes in scale, because the utilization and the capacity values are both multiplied by the numCores. This leads us to the second issue:

(2) The CPU utilization value is incorrect

This is what is causing the rebalance errors. This was introduced with the JDK regression tracked in [1]. Anyways, @fvaleri, I can confirm both issues for AMQ Streams 1.8, but Strimzi should be free of issue (2) and rebalance without error! Regardless, we will get these issues patched for Strimzi!

[1] https://bugs.openjdk.java.net/browse/JDK-8269851
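For reference, a minimal probe of the JVM values that feed this computation (a sketch; on JDK builds affected by [1] the process CPU load can be reported incorrectly inside containers):

import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

public class ProcessCpuLoadProbe {
    public static void main(String[] args) {
        OperatingSystemMXBean os =
                (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        // Both values are sensitive to the JVM's container awareness and vary across JDK versions.
        System.out.println("processCpuLoad      = " + os.getProcessCpuLoad());      // expected in [0.0, 1.0]
        System.out.println("availableProcessors = " + os.getAvailableProcessors());
    }
}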
|
Hi @kyguy, thanks for the fix and explanation. Great work! |
Just for transparency, the PR linked above solves problem (2) listed in the comment above [1] but not problem (1). We will leave this issue open until problem (1) is addressed! [1] #5951 (comment) |
Sure, I guess we can either fix it or simply document it as you have clearly explained. |
Closing issue as this has been resolved by #6892 |
Describe the bug
When trying to do a rebalance with Cruise Control, we always get an OptimizationFailureException saying the CpuCapacityGoal cannot be satisfied and we should add more brokers. The utilization computation seems off. The CruiseControl REST API call to /kafkacruisecontrol/load shows that it assumes our brokers have 1 core, which may be the cause of the miscalculation.
In CruiseControl you can set a num.cores for broker capacity (https://github.com/linkedin/cruise-control/blob/migrate_to_kafka_2_4/config/capacityCores.json), but in Strimzi, this is not possible (https://strimzi.io/docs/operators/latest/using.html#type-CruiseControlSpec-reference)
To Reproduce
Steps to reproduce the behavior:
Error for request: cluster-main-cruise-control.kafka-devl.svc:9090/kafkacruisecontrol/rebalance?json=true&dryrun=true&verbose=true&skip_hard_goal_check=false. Server returned: Error processing POST request '/rebalance' due to: 'com.linkedin.kafka.cruisecontrol.exception.OptimizationFailureException: [CpuCapacityGoal] Insufficient capacity for cpu (Utilization 858.79, Allowed Capacity 600.00, Threshold: 1.00). Add at least 3 brokers with the same cpu capacity (100.00) as broker-0. Add at least 3 brokers with the same cpu capacity (100.00) as broker-0.'.
Expected behavior
Expect to see a "ProposalReady" Status in the KafkaRebalance CR, something like:
Status:
  Conditions:
    Last Transition Time:  2020-05-19T13:50:12.533Z
    Status:                ProposalReady
    Type:                  State
  Observed Generation:  1
  Optimization Result:
    Data To Move MB:                      0
    Excluded Brokers For Leadership:
    Excluded Brokers For Replica Move:
    Excluded Topics:
    Intra Broker Data To Move MB:         0
    Monitored Partitions Percentage:      100
    Num Intra Broker Replica Movements:   0
    Num Leader Movements:                 0
    Num Replica Movements:                26
    On Demand Balancedness Score After:   81.8666802863978
    On Demand Balancedness Score Before:  78.01176356230222
    Recent Windows:                       1
    Session Id:                           05539377-ca7b-45ef-b359-e13564f1458c
Environment (please complete the following information):
YAML files and logs
Kafka CR (attached in the zip file)
KafkaRebalance CR (attached in the zip file)
REST Call to /kafkacruisecontrol/load
curl https://cluster-main-cruise-control-kafka-devl.apps.ocp4-prod1.helvetia.io/kafkacruisecontrol/load
cluster-main-kafka-0.cluster-main-kafka-brokers.kafka-devl.svc, 0,eu-central-1c, 512000.000, 22101.055/04.32, 1, 113.978, 10000.000, 2.142, 0.518, 10000.000, 5.632, 7.537, 725/1809
cluster-main-kafka-1.cluster-main-kafka-brokers.kafka-devl.svc, 1,eu-central-1a, 512000.000, 23908.867/04.67, 1, 198.530, 10000.000, 2.703, 8.209, 10000.000, 6.910, 142.473, 747/1999
cluster-main-kafka-2.cluster-main-kafka-brokers.kafka-devl.svc, 2,eu-central-1b, 512000.000, 23526.426/04.60, 1, 72.533, 10000.000, 6.253, 4.443, 10000.000, 13.036, 23.018, 693/2006
cluster-main-kafka-3.cluster-main-kafka-brokers.kafka-devl.svc, 3,eu-central-1b, 512000.000, 17625.527/03.44, 1, 35.741, 10000.000, 6.081, 1.061, 10000.000, 123.725, 245.161, 736/1879
cluster-main-kafka-4.cluster-main-kafka-brokers.kafka-devl.svc, 4,eu-central-1c, 512000.000, 17014.631/03.32, 1, 272.399, 10000.000, 0.509, 9.539, 10000.000, 238.128, 371.253, 643/1797
cluster-main-kafka-5.cluster-main-kafka-brokers.kafka-devl.svc, 5,eu-central-1a, 512000.000, 17739.572/03.46, 1, 189.939, 10000.000, 0.825, 0.846, 10000.000, 2.193, 5.035, 683/1734