Add CPU overrides for CC capacity config #6892
e77dfdc to f6e5eb6
I left some comments. Mostly about the naming.
But I also wonder a bit about the use case for this. If the broker resources are set, we should always set the CPU capacity automatically (and your docs changes suggest we do).
So, what is the use case? The only situation where I can see it being useful is when you run on dedicated hosts and don't set the CPU request. So it seems a bit niche. It might be good to explain in the docs when you would use this.
@EqualsAndHashCode
public class BrokerCapacity implements UnknownPropertyPreserving, Serializable {

    private static final long serialVersionUID = 1L;

    private String disk;
    private Integer cpuUtilization;
    private String cpuCores;
Kubernetes uses just `cpu`, not `cpuCores`. What is the reason for using a different name?
I wanted to draw a distinction from how we were setting the CPU resource before with `cpuUtilization` (which is now deprecated).
It is still different from `cpuUtilization`. And if I understand it right, you use the same format as the `cpu` field in Kubernetes.
@ppatierno @tombentley WDYT?
If we can stick with naming and terminology that users already know from built-in Kubernetes features, it's much better imho. My +1 for using `cpu`.
@Description("Broker capacity for CPU resource in cores or milliCPU. " +
        "For example, 1, 1.500, 1500m.")
- @Description("Broker capacity for CPU resource in cores or milliCPU. " +
-         "For example, 1, 1.500, 1500m.")
+ @Description("Broker capacity for CPU resource in cores or millicores. " +
+         "For example, 1, 1.500, 1500m.")
Should we link here to Kubernetes docs for the details about the units?
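To make the accepted formats concrete, here is a minimal sketch that checks the example values from the description (1, 1.500, 1500m) against the `@Pattern` regex that this PR applies to the field (quoted further down in the review). The class and method names are hypothetical and only serve the illustration; this is not code from the PR.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the PR: checks CPU capacity strings
// against the regex quoted from the field's @Pattern annotation.
final class CpuPatternCheck {
    static final Pattern CPU = Pattern.compile("^[0-9]+([.][0-9]{0,3}|[m]?)$");

    static boolean isValidCpu(String value) {
        return CPU.matcher(value).matches();
    }

    public static void main(String[] args) {
        // Whole cores, fractional cores (up to 3 decimals), and millicores,
        // followed by values the regex rejects.
        for (String v : new String[] {"1", "1.500", "1500m", "1.5m", "1.5000", "cores"}) {
            System.out.println(v + " -> " + isValidCpu(v));
        }
    }
}
```

Note that the regex does not allow mixing a fractional part with the `m` suffix, and it limits fractions to three decimal places, which matches millicore precision.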
@EqualsAndHashCode
public class BrokerCapacityOverride implements UnknownPropertyPreserving, Serializable {
    private static final long serialVersionUID = 1L;

    private List<Integer> brokers;
    private String cpuCores;
Same as above.
@@ -9,20 +9,19 @@ public class BrokerCapacity {
    // CC designates the id of this default broker entry as "-1".
    public static final int DEFAULT_BROKER_ID = -1;
    public static final String DEFAULT_BROKER_DOC = "This is the default capacity. Capacity unit used for disk is in MiB, cpu is in percentage, network throughput is in KiB.";

    public static final String DEFAULT_CPU_UTILIZATION_CAPACITY = "100";  // as a percentage (0-100)
    public static final String DEFAULT_CPU_CORE_CAPACITY = "1";  // as a percentage (0-100)
So, is this 1 CPU core? Or is it 100% of available cores? The name suggests to me it is the first one. But the comment suggests the other. It would be great to have it more clear from the name.
I think the comment is just a leftover from copying the previous line. It should be just 1 core. The comment is wrong.
NOTE: Disk capacity limits are automatically generated by Strimzi, so you do not need to set them.

[NOTE]
====
In order to guarantee accurate rebalance proposal when using CPU goals, you can set CPU requests equal to CPU limits in `Kafka.spec.kafka.resources`.
NOTE: CPU capacity limits are automatically generated by Strimzi when you set CPU requests equal to CPU limits in `Kafka.spec.kafka.resources`.
That way, all CPU resources are reserved upfront and are always available.
This configuration allows Cruise Control to properly evaluate the CPU utilization when preparing the rebalance proposals based on CPU goals.
Can we somehow join both notes into one? I think it looks a bit weird to have the two notes right after each other.
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
        this.cores = milliCputoCpu(Quantities.parseCpuAsMilliCpus(cores));
    }

    public static String milliCputoCpu(int milliCPU) {
Just a nit: `milliCpuToCpu`, with a capital T.
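For readers following along, here is a minimal sketch of what a millicores-to-cores conversion with the corrected name could look like. The body is an assumption for illustration only (the PR's actual implementation is not shown in this thread), and the enclosing class name is hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical stand-in, not the PR's implementation.
final class CpuUnits {
    // Converts a millicore count (e.g. 1500) to a plain decimal core string (e.g. "1.5").
    static String milliCpuToCpu(int milliCpu) {
        // valueOf(unscaled, 3) shifts the decimal point three places, i.e. divides by 1000 exactly.
        return BigDecimal.valueOf(milliCpu, 3).stripTrailingZeros().toPlainString();
    }
}
```

Using `BigDecimal` rather than floating-point division avoids rounding surprises and never produces scientific notation in the output, since `toPlainString()` is used.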
public static CpuCapacity processCpu(io.strimzi.api.kafka.model.balancing.BrokerCapacity bc, BrokerCapacityOverride override, String cpuBasedOnRequirements) {
    if (cpuBasedOnRequirements != null) {
        if ((override != null && override.getCpuCores() != null) || (bc != null && bc.getCpuCores() != null)) {
            LOGGER.warnOp("Ignoring CPU capacity override settings since they are set automatically set to resource limits");
too many "set" ?
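The precedence this snippet implies can be sketched as follows. Only the first branch (a value derived from the resource requirements wins, and user-supplied values are ignored with a warning) comes from the quoted diff; the fallback order after that (per-broker override, then the `brokerCapacity` value, then a default of 1 core) is an assumption, and the class, method, and parameter names are hypothetical.

```java
// Hypothetical sketch of the CPU capacity precedence; not the PR's code.
final class CpuCapacityResolver {
    static final String DEFAULT_CPU = "1"; // 1 core, per the review discussion

    static String resolveCpu(String cpuFromResources, String overrideCpu, String brokerCapacityCpu) {
        if (cpuFromResources != null) {
            // From the quoted diff: user-supplied values are ignored in this case.
            if (overrideCpu != null || brokerCapacityCpu != null) {
                System.err.println("Ignoring CPU capacity override settings since they are automatically set to resource limits");
            }
            return cpuFromResources;
        }
        // Assumed fallback order: override, then general capacity, then default.
        if (overrideCpu != null) {
            return overrideCpu;
        }
        if (brokerCapacityCpu != null) {
            return brokerCapacityCpu;
        }
        return DEFAULT_CPU;
    }
}
```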
@@ -140,19 +140,17 @@ You specify capacity limits for Kafka broker resources in the `brokerCapacity` p
They are enabled by default and you can change their default values.
Capacity limits can be set for the following broker resources:

* `cpuCores` - CPU resource in milliCPU or CPU cores (Default: 1)
"millicores"
p2.setId(1);
volumes.add(p2);
Map<String, Quantity> requests = new HashMap<>(1);
requests.put(Capacity.RESOURCE_TYPE, new Quantity("400m"));
We could use `Collections.singletonMap()` here and in a few places below.
I think that in new code we tend to use `Map.of(...)`, if the map can be immutable.
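The three options discussed here can be compared side by side in a small sketch. To keep it self-contained, the map values are plain strings instead of the `Quantity` type used in the test, and the key `"cpu"` is an assumed value for `Capacity.RESOURCE_TYPE`; the class and method names are hypothetical. `Map.of` requires Java 9 or later.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical comparison of the three ways to build a one-entry map.
final class RequestMaps {
    // Current test code: mutable HashMap filled with put().
    static Map<String, String> mutable() {
        Map<String, String> requests = new HashMap<>(1);
        requests.put("cpu", "400m");
        return requests;
    }

    // First suggestion: immutable single-entry map.
    static Map<String, String> singleton() {
        return Collections.singletonMap("cpu", "400m");
    }

    // Second suggestion: immutable map via the Java 9+ factory method.
    static Map<String, String> immutable() {
        return Map.of("cpu", "400m");
    }
}
```

All three maps compare equal with `equals()`, so they are interchangeable wherever the test only reads the map. The latter two throw `UnsupportedOperationException` on `put`, which is why the suggestion only applies if the map can be immutable.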
LGTM. The test seems to be failing due to some uncommitted files.
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
LGTM. Should be reviewed by SMEs.
I left some comments about examples to fix; they are still using `cpuCores` instead of `cpu`. The same goes for the description of this PR; it would be better to fix it to use `cpu`.
After the above changes I will approve it.
* outboundNetwork: 40000KB/s
* - brokers: [1, 2]
* cpuCores: 4000m
The above example should have `cpu` instead of `cpuCores`.
      inboundNetwork: 20000KiB/s
      outboundNetwork: 20000KiB/s
    - brokers: [1, 2]
      cpuCores: 3000m
The above `cpuCores` should be `cpu`.
Looks great! 👍
* `inboundNetwork` - Inbound network throughput in byte units per second (Default: 10000KiB/s)
* `outboundNetwork` - Outbound network throughput in byte units per second (Default: 10000KiB/s)

For network throughput, use an integer value with standard Kubernetes byte units (K, M, G) or their bibyte (power of two) equivalents (Ki, Mi, Gi) per second.

NOTE: Disk and CPU capacity limits are automatically generated by Strimzi, so you do not need to set them.

[NOTE]
====
In order to guarantee accurate rebalance proposal when using CPU goals, you can set CPU requests equal to CPU limits in `Kafka.spec.kafka.resources`.
- In order to guarantee accurate rebalance proposal when using CPU goals, you can set CPU requests equal to CPU limits in `Kafka.spec.kafka.resources`.
+ In order to guarantee accurate rebalance proposals when using CPU goals, you can set CPU requests equal to CPU limits in `Kafka.spec.kafka.resources`.
@Pattern("^[0-9]+([.][0-9]{0,3}|[m]?)$")
@Description("Broker capacity for CPU resource in cores or millicores. " +
        "For example, 1, 1.500, 1500m. " +
        "For more details on valid CPU resource units see https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu")
-         "For more details on valid CPU resource units see https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu")
+         "For more information on valid CPU resource units, see https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu.")
private static Integer getResourceRequirement(ResourceRequirements resources, ResourceRequirementType requirementType) {
    if (resources != null) {
        Map<String, Quantity> resourceRequirement = requirementType == ResourceRequirementType.REQUEST ? resources.getRequests() : resources.getLimits();
If you want to be really fancy, you can add a method to the enum to do this. Then you could just do `Map<String, Quantity> resourceRequirement = requirementType.getResourceMap(resources)`.
    if (resources != null) {
        Map<String, Quantity> resourceRequirement = requirementType == ResourceRequirementType.REQUEST ? resources.getRequests() : resources.getLimits();
        if (resourceRequirement != null) {
            Quantity quantity = resourceRequirement.get(RESOURCE_TYPE);
In fact, you could put this logic (together with the step above) in the enum too, and have `Quantity quantity = requirementType.getQuantity(resources)`.
@tomncooper Can you double check to make sure my refactoring makes sense?
I do want to be fancy
LGTM
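For context, the enum-based refactoring suggested in this thread could be sketched roughly as below. This is not the refactoring that was committed: `Resources` is a hypothetical stand-in for the Kubernetes `ResourceRequirements` model class, quantities are plain strings instead of `Quantity`, and `"cpu"` is an assumed value for `RESOURCE_TYPE`.

```java
import java.util.Map;

// Hypothetical sketch of moving the request/limit lookup into the enum.
final class RequirementSketch {
    static final String RESOURCE_TYPE = "cpu"; // assumed constant value

    // Stand-in for ResourceRequirements: either map may be null.
    static final class Resources {
        final Map<String, String> requests;
        final Map<String, String> limits;

        Resources(Map<String, String> requests, Map<String, String> limits) {
            this.requests = requests;
            this.limits = limits;
        }
    }

    enum ResourceRequirementType {
        REQUEST {
            @Override
            Map<String, String> getResourceMap(Resources resources) {
                return resources.requests;
            }
        },
        LIMIT {
            @Override
            Map<String, String> getResourceMap(Resources resources) {
                return resources.limits;
            }
        };

        // Each constant knows which map it reads, replacing the ternary at the call site.
        abstract Map<String, String> getResourceMap(Resources resources);

        // Null-safe lookup of the CPU quantity for this requirement type.
        String getQuantity(Resources resources) {
            if (resources == null) {
                return null;
            }
            Map<String, String> map = getResourceMap(resources);
            return map == null ? null : map.get(RESOURCE_TYPE);
        }
    }
}
```

With this shape, the caller reduces to `requirementType.getQuantity(resources)` and the null checks live in one place.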
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
@kyguy The regression error seems to be CC related:
So maybe this is related to your changes?
Signed-off-by: Kyle Liberti <kliberti@redhat.com>
787d628 to 00a9527
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
Type of change
Description
For accurate rebalances between brokers running on nodes with heterogeneous CPU resources, Cruise Control must know the CPU capacity limit of individual brokers. This PR allows users to specify capacity limit overrides for lists of individual Kafka brokers in the overrides property in Kafka.spec.cruiseControl.brokerCapacity. This PR also bumps the Cruise Control version to pick up an enhancement [1] which allows us to specify milliCPU and fractional core CPU capacity values.
This PR addresses the CPU capacity issues of #6265 and the UI issues of #5951
[1] linkedin/cruise-control#1831