Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move cruise control server, capacity and server logging metrics config to ConfigMap #8977

Merged
merged 9 commits into from Jan 18, 2024

Conversation

Thealisyed
Copy link
Contributor

@Thealisyed Thealisyed commented Aug 11, 2023

Type of change

  • Enhancement / new feature

Description

Moves cruise control capacity config to ConfigMap
Resolves part of this #8337

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

@kyguy kyguy added this to the 0.37.0 milestone Aug 11, 2023
@Thealisyed Thealisyed force-pushed the cc-capacity-cm branch 3 times, most recently from 0dfdd11 to d417308 Compare August 16, 2023 15:45
Copy link
Member

@kyguy kyguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start @Thealisyed, left some comments, be careful with the IDE auto indenting code in files, it adds noise to the diff. Please go through an undo all of the unnecessary indentation changes, I have marked some of them but not all

}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this newline change, it adds noise to the diff

@@ -93,4 +93,14 @@ public static String logAndMetricsConfigMapName(String clusterName) {
public static String networkPolicyName(String clusterName) {
return clusterName + "-network-policy-cruise-control";
}

/**
* Returns the name of the Cruise Control Broker Capacity ConfigMap Name {@code ConfigMap} for a {@code Kafka} cluster of the given name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Returns the name of the Cruise Control Broker Capacity ConfigMap Name {@code ConfigMap} for a {@code Kafka} cluster of the given name.
* Returns the name of the Cruise Control Broker capacity {@code ConfigMap} for a {@code Kafka} cluster of the given name.

/**
* Returns the name of the Cruise Control Broker Capacity ConfigMap Name {@code ConfigMap} for a {@code Kafka} cluster of the given name.
* @param clusterName The {@code metadata.name} of the {@code Kafka} resource.
* @return The name of the corresponding Cruise Control log {@code ConfigMap}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return The name of the corresponding Cruise Control log {@code ConfigMap}.
* @return The name of the corresponding Cruise Control capacity {@code ConfigMap}.

protected static final String CRUISE_CONTROL_CAPACITY_CONFIGURATION_VOLUME_MOUNT = "/tmp1";
protected static final String DEFAULT_CAPACITY_CONFIG_FILE_CONFIG =
CRUISE_CONTROL_CAPACITY_CONFIGURATION_VOLUME_MOUNT + "/" +
CRUISE_CONTROL_CAPACITY_CONFIGURATION_VOLUME_NAME;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can 103 and 104 be aligned?

@@ -145,6 +149,7 @@ public class CruiseControl extends AbstractModel implements SupportsMetrics, Sup
private InternalServiceTemplate templateService;

private static final Map<String, String> DEFAULT_POD_LABELS = new HashMap<>();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undo this newline change, it adds noise to the diff

.addToLimits("memory", new Quantity("300Mi"))
.addToRequests("memory", new Quantity("300Mi"))
.build())
.withNewJvmOptions()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo indentation

.endJvmOptions()
.endCruiseControl()
.endSpec()
.build());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo intentation

.build());

String ccPodName = kubeClient().listPodsByPrefixInName(clusterOperator.getDeploymentNamespace(),
CruiseControlResources.deploymentName(clusterName)).get(0).getMetadata().getName();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix indentation

.endBrokerCapacity()
.endCruiseControl()
.endSpec()
.build());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix indentation

JsonNode cpuEntry = capacity.get("CPU");
System.out.println("Expected: " + cpu + " Actual: " + cpuEntry.get("num.cores") + " TEST PASSED " + cpu.equals(cpuEntry.get("num.cores")));
JsonNode numCores = cpuEntry.get("num.cores");
System.out.println("Num Cores: " + numCores);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the print statements; add assert statements

@@ -97,7 +97,11 @@ public class CruiseControl extends AbstractModel implements SupportsMetrics, Sup
protected static final String LOG_AND_METRICS_CONFIG_VOLUME_MOUNT = "/opt/cruise-control/custom-config/";
protected static final String API_AUTH_CONFIG_VOLUME_NAME = "api-auth-config";
protected static final String API_AUTH_CONFIG_VOLUME_MOUNT = "/opt/cruise-control/api-auth-config/";

protected static final String CRUISE_CONTROL_CAPACITY_CONFIG_VOLUME_NAME = "capacity-json";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's match the name of the original file, capacity.json, if possible

/**
* Generates data for broker capacity ConfigMap
*
* @param capacity assigned capacity parameter for broker capacity configmap data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param capacity assigned capacity parameter for broker capacity configmap data
* @param capacity data used for broker capacity configuration

* "brokerCapacities":[
* {
* "brokerCapacities":[
* {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undo

@@ -83,6 +83,12 @@ public enum CruiseControlConfigurationParameters {
*/
METRIC_REPORTER_TOPIC_NAME("metric.reporter.topic"),

/**
* Name of the Cruise Control metrics topic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Name of the Cruise Control metrics topic
* Name of the Cruise Control capacity config file path

/**
* Name of the Cruise Control capacity file
*/
public static final String DEFAULT_CRUISE_CONTROL_CAPACITY_CONFIG_NAME = "/tmp1/capacity-json";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we define this path name in the CruiseControl.java class, create this value from the variables defined in the CruiseControl.java class? or vice versa? Just want to make sure this path name is defined in one place

.endMetadata()
.editOrNewSpec()
.editKafka()
.addToConfig("auto.create.topics.enable", "false")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo indentation

.endJvmOptions()
.endCruiseControl()
.endSpec()
.build());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo indentation changes

}

RollingUpdateUtils.waitForComponentScaleUpOrDown(testStorage.getNamespaceName(), testStorage.getKafkaSelector(), initialReplicas);
RollingUpdateUtils.waitForComponentScaleUpOrDown(testStorage.getNamespaceName(), testStorage.getKafkaSelector(),
initialReplicas);
}

@BeforeAll
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undo all the changes here that are no related to the testAutoCreationOfCruiseControlCapacityConfigMap test method i.e. undo changes to all other test methods

@Thealisyed Thealisyed force-pushed the cc-capacity-cm branch 2 times, most recently from 557ed6e to 6b25178 Compare August 29, 2023 15:05
/**
* Name of the Cruise Control capacity file
*/
public static final String DEFAULT_CRUISE_CONTROL_CAPACITY_CONFIG_NAME = "/tmp/capacity-json";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Watch the value here, we should define this value once, either here of in the CruiseControl.java class


@IsolatedTest
@KRaftWithoutUTONotSupported
@UTONotSupported("https://github.com/strimzi/strimzi-kafka-operator/issues/8864")
void testAutoCreationOfCruiseControlTopicsWithResources(ExtensionContext extensionContext) {
final String clusterName = mapWithClusterNames.get(extensionContext.getDisplayName());


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra line?

@@ -121,11 +127,11 @@ void testAutoCreationOfCruiseControlTopicsWithResources(ExtensionContext extensi
.withName(CRUISE_CONTROL_METRICS_TOPIC).get().getSpec();

KafkaTopicUtils.waitForKafkaTopicReady(Constants.TEST_SUITE_NAMESPACE, CRUISE_CONTROL_MODEL_TRAINING_SAMPLES_TOPIC);
KafkaTopicSpec modelTrainingTopic = KafkaTopicResource.kafkaTopicClient().inNamespace(Constants.TEST_SUITE_NAMESPACE)
KafkaTopicSpec modelTrainingTopic = KafkaTopicResource.kafkaTopicClient().inNamespace(clusterOperator.getDeploymentNamespace())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes necessary?

KafkaTopicUtils.waitForKafkaTopicReady(Constants.TEST_SUITE_NAMESPACE, CRUISE_CONTROL_PARTITION_METRICS_SAMPLES_TOPIC);
KafkaTopicSpec partitionMetricsTopic = KafkaTopicResource.kafkaTopicClient().inNamespace(Constants.TEST_SUITE_NAMESPACE)
KafkaTopicUtils.waitForKafkaTopicReady(clusterOperator.getDeploymentNamespace(), CRUISE_CONTROL_PARTITION_METRICS_SAMPLES_TOPIC);
KafkaTopicSpec partitionMetricsTopic = KafkaTopicResource.kafkaTopicClient().inNamespace(clusterOperator.getDeploymentNamespace())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

.withName(CRUISE_CONTROL_MODEL_TRAINING_SAMPLES_TOPIC).get().getSpec();

KafkaTopicUtils.waitForKafkaTopicReady(Constants.TEST_SUITE_NAMESPACE, CRUISE_CONTROL_PARTITION_METRICS_SAMPLES_TOPIC);
KafkaTopicSpec partitionMetricsTopic = KafkaTopicResource.kafkaTopicClient().inNamespace(Constants.TEST_SUITE_NAMESPACE)
KafkaTopicUtils.waitForKafkaTopicReady(clusterOperator.getDeploymentNamespace(), CRUISE_CONTROL_PARTITION_METRICS_SAMPLES_TOPIC);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -162,7 +220,7 @@ void testCruiseControlWithApiSecurityDisabled(ExtensionContext extensionContext)
.endMetadata()
.build());

KafkaRebalanceUtils.waitForKafkaRebalanceCustomResourceState(Constants.TEST_SUITE_NAMESPACE, clusterName, KafkaRebalanceState.ProposalReady);
KafkaRebalanceUtils.waitForKafkaRebalanceCustomResourceState(clusterOperator.getDeploymentNamespace(), clusterName, KafkaRebalanceState.ProposalReady);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo changes

if (Environment.isKafkaNodePoolsEnabled()) {
KafkaNodePoolResource.replaceKafkaNodePoolResourceInSpecificNamespace(testStorage.getKafkaNodePoolName(), knp ->
knp.getSpec().setReplicas(initialReplicas), testStorage.getNamespaceName());
if (Environment.isKafkaNodePoolsEnabled()) {KafkaNodePoolResource.replaceKafkaNodePoolResourceInSpecificNamespace(testStorage.getKafkaNodePoolName(), knp -> knp.getSpec().setReplicas(initialReplicas), testStorage.getNamespaceName());
} else {
KafkaResource.replaceKafkaResourceInSpecificNamespace(testStorage.getClusterName(), kafka -> kafka.getSpec().getKafka().setReplicas(initialReplicas), testStorage.getNamespaceName());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undo changes here that are unrelated to the testAutoCreationOfCruiseControlCapacityConfigMap class

.endBrokerCapacity()
.endCruiseControl()
.endSpec()
.build());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

* @param versions Supported Kafka versions
* @param kafkaBrokerNodes List of the nodes which are part of the Kafka cluster
* @param kafkaStorage A map with storage configuration used by the Kafka cluster and its node pools
* @param kafkaBrokerResources A map with resource configuration used by the Kafka cluster and its node pools
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this changes

@@ -942,7 +942,7 @@ public void testCruiseControlContainerSecurityContext() {
.withAllowPrivilegeEscalation(false)
.withRunAsNonRoot(true)
.withNewCapabilities()
.addToDrop("ALL")
.addToDrop("ALL")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

@@ -102,7 +103,7 @@ public void reconcileEnabledCruiseControl(VertxTestContext context) {

Kafka kafka = new KafkaBuilder(ResourceUtils.createKafka(NAMESPACE, NAME, 3, "foo", 120, 30))
.editSpec()
.withCruiseControl(cruiseControlSpec)
.withCruiseControl(cruiseControlSpec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

client-examples Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong commit?

* @return The name of the corresponding Cruise Control capacity {@code ConfigMap}.
*/
public static String brokerCapacityConfigMapName(String clusterName) {
return clusterName + "-cruise-control-capacity-config";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use the same name as the Cruise Control deployment has. Or actually, why not use the existing Config Map we already have?

Copy link
Contributor

@fvaleri fvaleri Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a single config map does not scale well in this case, also considering the 1MB limit, so I like this approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was until now in an environment variable and nobody complained. So I do not think 1MB limit would be an issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, but I still think it's a good idea to use a dedicated config map, as these configuration files can be quite big. Additionally, logAndMetricsConfigMapName would need to be renamed if we decide to go with one config map for all kind of configurations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config map name is <cluster-name>-cruise-control-config. So I do not think you need to rename it. I think it has like 1kB of size today in typical environment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was talking about the method name.

@@ -95,7 +95,15 @@ public class CruiseControl extends AbstractModel implements SupportsMetrics, Sup
protected static final String LOG_AND_METRICS_CONFIG_VOLUME_MOUNT = "/opt/cruise-control/custom-config/";
protected static final String API_AUTH_CONFIG_VOLUME_NAME = "api-auth-config";
protected static final String API_AUTH_CONFIG_VOLUME_MOUNT = "/opt/cruise-control/api-auth-config/";

protected static final String CRUISE_CONTROL_CAPACITY_CONFIG_VOLUME_NAME = "capacity.json";
protected static final String CRUISE_CONTROL_CAPACITY_CONFIG_VOLUME_MOUNT = "/tmp";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming we keep separate config map with separate mount, it should not mount to /tmp but to something like the other config maps.

@@ -158,8 +165,7 @@ public class CruiseControl extends AbstractModel implements SupportsMetrics, Sup
* Constructor
*
* @param reconciliation The reconciliation
* @param resource Kubernetes resource with metadata containing the namespace and cluster name
* @param sharedEnvironmentProvider Shared environment provider
* @param resource Kubernetes resource with metadata containing the namespace and cluster name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the sharedEnvironmentProvider removed from the Javadoc? That does not seem right.

* @param kafkaBrokerNodes List of the broker nodes which are part of the Kafka cluster
* @param kafkaStorage A map with storage configuration used by the Kafka cluster and its node pools
* @param kafkaBrokerResources A map with resource configuration used by the Kafka cluster and its broker pools
* @param sharedEnvironmentProvider A Interface to be implemented for returning an instance of the environment variables shared by all containers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep the original Javadoc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the indentation part might have happened automatically by the IDE. @Thealisyed I guees you can revert the indentation and keep the things same as they were earlier. Not sure why you edited the javadoc for sharedEnvironmentProvider

Comment on lines 342 to 346
* @param isOpenShift Flag indicating if we are on OpenShift or not
* @param imagePullPolicy Image pull policy
* @param imagePullSecrets Image pull secrets
*
* @return Cruise Control Kubernetes Deployment
* @return Cruise Control Kubernetes Deployment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change the indentation.

@@ -389,12 +395,12 @@ protected List<EnvVar> getEnvVars() {
varList.add(ContainerUtils.createEnvVar(ENV_VAR_STRIMZI_KAFKA_GC_LOG_ENABLED, String.valueOf(gcLoggingEnabled)));
varList.add(ContainerUtils.createEnvVar(ENV_VAR_MIN_INSYNC_REPLICAS, String.valueOf(minInsyncReplicas)));

varList.add(ContainerUtils.createEnvVar(ENV_VAR_CRUISE_CONTROL_CAPACITY_CONFIGURATION, capacity.toString()));
//varList.add(ContainerUtils.createEnvVar(ENV_VAR_CRUISE_CONTROL_CAPACITY_CONFIGURATION, capacity.toString()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be deleted? And should the ENV_VAR_CRUISE_CONTROL_CAPACITY_CONFIGURATION field be removed as well?

Comment on lines 462 to 466
* @param namespace Namespace in which the Cruise Control cluster runs
* @param kafkaName Name of the Kafka cluster (it is used for the SANs in the certificate)
* @param clusterCa The cluster CA.
* @param isMaintenanceTimeWindowsSatisfied Indicates whether we are in the maintenance window or not.
* This is used for certificate renewals
*
* This is used for certificate renewals
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change the indentation here or in the other places. I will stop repeating it, but it applies below as well.

@@ -142,6 +146,52 @@ void testAutoCreationOfCruiseControlTopicsWithResources(ExtensionContext extensi
assertThat(partitionMetricsTopic.getReplicas(), is(2));
}

@IsolatedTest
@KRaftNotSupported("Topic Operator is not supported by KRaft mode and is used in this test class")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this use Topic Operator? And maybe a bit more general, why does this need a separate System test? I would think that:

  • Unit tests should be sufficient to cover this
  • If we want a system test, it can be probably added to one of the existing tests?

assertThat(nwThroughputOutbound, is(capacity.get("NW_OUT")));
}
} catch (Exception e) {
e.printStackTrace();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this fail the test instead of printing the stack trace?

@kyguy kyguy requested a review from fvaleri August 30, 2023 16:08
@scholzj scholzj modified the milestones: 0.37.0, 0.38.0 Aug 31, 2023
Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR. I left a couple of comments, but first you need to fix the build error.

* @return The name of the corresponding Cruise Control capacity {@code ConfigMap}.
*/
public static String brokerCapacityConfigMapName(String clusterName) {
return clusterName + "-cruise-control-capacity-config";
Copy link
Contributor

@fvaleri fvaleri Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a single config map does not scale well in this case, also considering the 1MB limit, so I like this approach.

@@ -188,6 +189,27 @@ protected Future<Void> metricsAndLoggingConfigMap() {
.map((Void) null);
}
}
/**
* Manages the Cruise Control certificates Secret.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this comment should be updated.

@ppatierno
Copy link
Member

Nothing to add right now. I will wait for all the other comments to be fixed before having another pass.

Copy link
Contributor

@ShubhamRwt ShubhamRwt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Thealisyed I think you should first get rid of all the unnecessory indentation since it makes the changes look big when they are not.

* @param kafkaBrokerNodes List of the broker nodes which are part of the Kafka cluster
* @param kafkaStorage A map with storage configuration used by the Kafka cluster and its node pools
* @param kafkaBrokerResources A map with resource configuration used by the Kafka cluster and its broker pools
* @param sharedEnvironmentProvider A Interface to be implemented for returning an instance of the environment variables shared by all containers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the indentation part might have happened automatically by the IDE. @Thealisyed I guees you can revert the indentation and keep the things same as they were earlier. Not sure why you edited the javadoc for sharedEnvironmentProvider

@kyguy kyguy self-assigned this Oct 5, 2023
@kyguy
Copy link
Member

kyguy commented Oct 5, 2023

Ali is back at university so I'll take this over and finish it up over the next couple days

@scholzj scholzj modified the milestones: 0.38.0, 0.39.0 Oct 11, 2023
@kyguy kyguy force-pushed the cc-capacity-cm branch 3 times, most recently from f818fd6 to 095de6f Compare November 8, 2023 00:58
@scholzj
Copy link
Member

scholzj commented Jan 5, 2024

@kyguy In the regression tests I started earlier, several CC related STs seemed to have failed: https://dev.azure.com/cncf/strimzi/_build/results?buildId=159143&view=results ... I restarted it, maybe it was just flaky ... but might be also some issue. That said, I realized the PR does not seem to have any logic to roll CC when the config changes. Is that missing? Or did I missed it in the PR review? Typically, you would for example hash the configs from the CM and add them as annotation to the pods to trigger the restart when the configs change (unless CC reloads them dynamically in which case we would need to fix the STs I guess).

Update: The restarted test failed as well :-/

@kyguy
Copy link
Member

kyguy commented Jan 8, 2024

That said, I realized the PR does not seem to have any logic to roll CC when the config changes. Is that missing? Or did I missed it in the PR review?

You are right, this is missing! Going to fix this and re-request review once it is ready

Signed-off-by: alsyed <thealisyed@gmail.com>
Signed-off-by: Kyle Liberti <kliberti@redhat.com>
Signed-off-by: Kyle Liberti <kliberti@redhat.com>
Signed-off-by: Kyle Liberti <kliberti@redhat.com>
Signed-off-by: Kyle Liberti <kliberti@redhat.com>
Signed-off-by: Kyle Liberti <kliberti@redhat.com>
Signed-off-by: Kyle Liberti <kliberti@redhat.com>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably test the annotation here that they are set to an expected values? Might not need a new test, just enhancing the existing checks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, let me know what you think!

Signed-off-by: Kyle Liberti <kliberti@redhat.com>
Comment on lines 156 to 167
// Verify annotations
CruiseControl expectCruiseControl = CruiseControl.fromCrd(Reconciliation.DUMMY_RECONCILIATION, kafka, VERSIONS, NODES, Map.of("kafka", kafka.getSpec().getKafka().getStorage()), Map.of(), SHARED_ENV_PROVIDER);
ConfigMap configMap = expectCruiseControl.generateConfigMap(new MetricsAndLogging(null, null));

Map<String, String> expectedAnnotations = new HashMap<>();
expectedAnnotations.put(Ca.ANNO_STRIMZI_IO_CLUSTER_CA_CERT_GENERATION, String.valueOf(clusterCa.caCertGeneration()));
expectedAnnotations.put(Ca.ANNO_STRIMZI_IO_CLUSTER_CA_KEY_GENERATION, String.valueOf(clusterCa.caKeyGeneration()));
expectedAnnotations.put(CruiseControl.ANNO_STRIMZI_SERVER_CONFIGURATION_HASH, Util.hashStub(configMap.getData().get(CruiseControl.SERVER_CONFIG_FILENAME)));
expectedAnnotations.put(CruiseControl.ANNO_STRIMZI_CAPACITY_CONFIGURATION_HASH, Util.hashStub(configMap.getData().get(CruiseControl.CAPACITY_CONFIG_FILENAME)));

Map<String, String> actualAnnotations = depCaptor.getAllValues().get(0).getSpec().getTemplate().getMetadata().getAnnotations();
assertThat("Annotations are not equal", actualAnnotations, is(expectedAnnotations));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the right way to do it. You should use the deployment captor to check what annotations you have in the Deployment that the operator created. You already seem to have the captor below, you just need to assert it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use the deployment captor to check what annotations you have in the Deployment that the operator created

I am not sure I understand, isn't that what the following line does?

Map<String, String> actualAnnotations = depCaptor.getAllValues().get(0).getSpec().getTemplate().getMetadata().getAnnotations();

uses the deployment captor, depCaptor to get a map of the annotations the operator created? Then compares against the annotation values we expect for the given configuration?

Or should I just use the deployment captor to check that required annotation keys exist in the deployment created by the operator?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it does -> but all the lines above it do not seem to make any sense. You should just check the annotation value is as expected. The hash should be stable. Also, it should likely be with the other deployment captor checks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored the code a little bit here, hopefully it is more clear. Let me know what you think!

@scholzj
Copy link
Member

scholzj commented Jan 16, 2024

/azp run regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kyguy for adding the annotation part, I also missed that in my previous review. Just few nits.

Running CruiseControlConfigurationST, I have two failed tests due to NPE.

[ERROR] io.strimzi.systemtest.cruisecontrol.CruiseControlConfigurationST.testConfigurationPerformanceOptions(ExtensionContext) -- Time elapsed: 230.3 s <<< ERROR!
java.lang.NullPointerException
	at java.base/java.util.Objects.requireNonNull(Objects.java:209)
	at io.strimzi.systemtest.cruisecontrol.CruiseControlConfigurationST.testConfigurationPerformanceOptions(CruiseControlConfigurationST.java:277)


[ERROR] io.strimzi.systemtest.cruisecontrol.CruiseControlConfigurationST.testConfigurationReflection(ExtensionContext) -- Time elapsed: 144.4 s <<< ERROR!
java.lang.NullPointerException
	at java.base/java.util.Objects.requireNonNull(Objects.java:209)
	at io.strimzi.systemtest.cruisecontrol.CruiseControlConfigurationST.testConfigurationReflection(CruiseControlConfigurationST.java:194)

@@ -77,7 +77,7 @@ public static String secretName(String clusterName) {
* @param clusterName The {@code metadata.name} of the {@code Kafka} resource.
* @return The name of the corresponding Cruise Control log {@code ConfigMap}.
*/
public static String logAndMetricsConfigMapName(String clusterName) {
public static String configMapName(String clusterName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also update the javadoc.

/**
 * Returns the name of the Cruise Control {@code ConfigMap} for a {@code Kafka} cluster of the given name.
 * @param clusterName The {@code metadata.name} of the {@code Kafka} resource.
 * @return The name of the corresponding Cruise Control {@code ConfigMap}.
 */

protected static final String API_AUTH_CREDENTIALS_FILE = API_AUTH_CONFIG_VOLUME_MOUNT + API_AUTH_FILE_KEY;

protected static final String ENV_VAR_CRUISE_CONTROL_METRICS_ENABLED = "CRUISE_CONTROL_METRICS_ENABLED";

/**
* Annotation for rolling a cluster whenever the server configuration has changed.
* By changing the annotation we force a restart since the pod will be out of date compared to the Pod.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* By changing the annotation we force a restart since the pod will be out of date compared to the Pod.
* When the configuration hash annotation change is detected, we force a pod restart.


/**
* Annotation for rolling a cluster whenever the capacity configuration has changed.
* By changing the annotation we force a restart since the pod will be out of date compared to the Pod.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* By changing the annotation we force a restart since the pod will be out of date compared to the Pod.
* When the configuration hash annotation change is detected, we force a pod restart.

@@ -380,7 +398,6 @@ protected List<EnvVar> getEnvVars() {
varList.add(ContainerUtils.createEnvVar(ENV_VAR_STRIMZI_KAFKA_BOOTSTRAP_SERVERS, KafkaResources.bootstrapServiceName(cluster) + ":" + KafkaCluster.REPLICATION_PORT));
varList.add(ContainerUtils.createEnvVar(ENV_VAR_STRIMZI_KAFKA_GC_LOG_ENABLED, String.valueOf(gcLoggingEnabled)));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary empty line.

*
* @return The CpuCapacity object as a JsonObject
*/
public JsonObject getJson() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this, it looks like getCores() is not used anymore, so we can get rid of it.

@@ -93,7 +100,7 @@ public void reconcileEnabledCruiseControl(VertxTestContext context) {
when(mockNetPolicyOps.reconcile(any(), eq(NAMESPACE), eq(CruiseControlResources.networkPolicyName(NAME)), netPolicyCaptor.capture())).thenReturn(Future.succeededFuture());

ArgumentCaptor<ConfigMap> cmCaptor = ArgumentCaptor.forClass(ConfigMap.class);
when(mockCmOps.reconcile(any(), eq(NAMESPACE), eq(CruiseControlResources.logAndMetricsConfigMapName(NAME)), cmCaptor.capture())).thenReturn(Future.succeededFuture());
when(mockCmOps.reconcile(any(), eq(NAMESPACE), eq(CruiseControlResources.configMapName(NAME)), cmCaptor.capture())).thenReturn(Future.succeededFuture());

ArgumentCaptor<Deployment> depCaptor = ArgumentCaptor.forClass(Deployment.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe deployCaptor?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's alright with you, I would like to take care of this in a different PR where will refactor the CC package structure and tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think depCaptor is used in many places. deployCaptor is not used anywhere. So I think the name is fine.

@kyguy kyguy force-pushed the cc-capacity-cm branch 2 times, most recently from d569835 to 89d79e8 Compare January 16, 2024 23:26
Signed-off-by: Kyle Liberti <kliberti@redhat.com>
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming the tests pass.

@scholzj
Copy link
Member

scholzj commented Jan 17, 2024

/azp run regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@ppatierno ppatierno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a couple of nits.

/**
* Server config file name
*/
public static final String SERVER_CONFIG_FILENAME = "cruisecontrol.properties";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we using the SERVER_ naming here? The same applies to the annotation below. Isn't this constant and the annotation related to Cruise Control only, so that something like CRUISE_CONTROL_ is clearer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to draw the distinction between the configuration of the Cruise Control server application and the Cruise Control broker capacity, since both are Cruise Control configurations. I figured since this fields and annotations are in the Cruise Control class and Cruise Control pod respectively, it would be clear they were apart of Cruise Control!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that explanation make sense, does it sound reasonable to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, makes sense. Thanks!

@@ -16,6 +16,7 @@
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non necessary blank line.

Signed-off-by: Kyle Liberti <kliberti@redhat.com>
@kyguy
Copy link
Member

kyguy commented Jan 18, 2024

All outstanding feedback has been addressed, this PR is now ready for merge!

@scholzj scholzj merged commit 3a8de17 into strimzi:main Jan 18, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants