Add validation for unique volume IDs in JBOD storage #9990

Merged: 1 commit merged into strimzi:main from the add-jbod-volume-id-validation branch on Apr 18, 2024

@scholzj scholzj commented Apr 17, 2024

Type of change

  • Bugfix

Description

Currently, we do not seem to validate that all JBOD storage volume IDs are unique. When IDs are duplicated, we end up breaking the first pod because we generate an invalid Pod definition:

io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://10.96.0.1:443/api/v1/namespaces/myproject/pods. Message: Pod "my-cluster-bodymoor-3000" is invalid: [spec.volumes[2].name: Duplicate value: "data-1", spec.containers[0].volumeMounts[2].mountPath: Invalid value: "/var/lib/kafka/data-1": must be unique]. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.volumes[2].name, message=Duplicate value: "data-1", reason=FieldValueDuplicate, additionalProperties={}), StatusCause(field=spec.containers[0].volumeMounts[2].mountPath, message=Invalid value: "/var/lib/kafka/data-1": must be unique, reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Pod, name=my-cluster-bodymoor-3000, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Pod "my-cluster-bodymoor-3000" is invalid: [spec.volumes[2].name: Duplicate value: "data-1", spec.containers[0].volumeMounts[2].mountPath: Invalid value: "/var/lib/kafka/data-1": must be unique], metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
at io.fabric8.kubernetes.client.KubernetesClientException.copyAsCause(KubernetesClientException.java:238) ~[io.fabric8.kubernetes-client-api-6.12.0.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:507) ~[io.fabric8.kubernetes-client-6.12.0.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:524) ~[io.fabric8.kubernetes-client-6.12.0.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:340) ~[io.fabric8.kubernetes-client-6.12.0.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:754) ~[io.fabric8.kubernetes-client-6.12.0.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:98) ~[io.fabric8.kubernetes-client-6.12.0.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:42) ~[io.fabric8.kubernetes-client-6.12.0.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1155) ~[io.fabric8.kubernetes-client-6.12.0.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:98) ~[io.fabric8.kubernetes-client-6.12.0.jar:?]
at io.strimzi.operator.cluster.operator.assembly.StrimziPodSetController.maybeCreateOrPatchPod(StrimziPodSetController.java:470) ~[io.strimzi.cluster-operator-0.41.0-SNAPSHOT.jar:0.41.0-SNAPSHOT]
at io.strimzi.operator.cluster.operator.assembly.StrimziPodSetController.reconcile(StrimziPodSetController.java:398) ~[io.strimzi.cluster-operator-0.41.0-SNAPSHOT.jar:0.41.0-SNAPSHOT]
at io.strimzi.operator.cluster.operator.assembly.StrimziPodSetController.run(StrimziPodSetController.java:566) ~[io.strimzi.cluster-operator-0.41.0-SNAPSHOT.jar:0.41.0-SNAPSHOT]
at java.lang.Thread.run(Thread.java:840) ~[?:?]
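
To illustrate why the Pod is rejected, here is a minimal sketch of how duplicate volume IDs turn into duplicate volume names and mount paths. It is not the operator's code: the class and method names are hypothetical, and the `data-<id>` and `/var/lib/kafka/data-<id>` patterns are assumptions inferred from the error message above.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DuplicateVolumeNames {
    // Assumption: volume names follow the "data-<id>" pattern seen in the error above.
    public static String volumeName(int id) {
        return "data-" + id;
    }

    // Assumption: mount paths follow "/var/lib/kafka/data-<id>", as in the error above.
    public static String mountPath(int id) {
        return "/var/lib/kafka/" + volumeName(id);
    }

    // Returns the volume names that would appear more than once for the given volume IDs.
    public static Set<String> collidingNames(List<Integer> volumeIds) {
        Set<String> seen = new HashSet<>();
        Set<String> collisions = new LinkedHashSet<>();
        for (int id : volumeIds) {
            String name = volumeName(id);
            if (!seen.add(name)) {
                collisions.add(name);
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // Two volumes share ID 1, so both map to the same name and mount path.
        System.out.println(collidingNames(List.of(0, 1, 1))); // prints [data-1]
        System.out.println(mountPath(1));                     // prints /var/lib/kafka/data-1
    }
}
```

Kubernetes requires volume names and mount paths to be unique within a Pod, which is why the duplicated ID surfaces as a 422 at Pod creation time.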

This PR adds validation for duplicate volume IDs to the StorageDiff class. That allows us to proceed with reconciliation using the old storage definition. This PR also refactors the previously added check for multiple KRaft metadata volumes to reduce its cyclomatic complexity, and adds a debug log message to both checks indicating what the actual problem is.
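
As a rough sketch of the idea (not the actual StorageDiff code), the check can be thought of as follows. The `Volume` class, the method names, and the log text are all hypothetical and only stand in for the real types; the fallback to the current storage mirrors the behaviour described above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JbodVolumeIdCheck {
    /** Hypothetical stand-in for a JBOD volume; only the ID matters for this check. */
    public static final class Volume {
        final int id;

        public Volume(int id) {
            this.id = id;
        }
    }

    /** Returns true when every volume ID appears exactly once. */
    public static boolean hasUniqueIds(List<Volume> volumes) {
        Set<Integer> seen = new HashSet<>();
        for (Volume volume : volumes) {
            // Set.add returns false if the ID was already present.
            if (!seen.add(volume.id)) {
                return false;
            }
        }
        return true;
    }

    /** Falls back to the current (old) storage when the desired storage has duplicate IDs. */
    public static List<Volume> storageToUse(List<Volume> current, List<Volume> desired) {
        if (!hasUniqueIds(desired)) {
            // Stand-in for the debug log message mentioned in the description.
            System.out.println("Desired JBOD storage contains duplicate volume IDs; keeping the old storage");
            return current;
        }
        return desired;
    }

    public static void main(String[] args) {
        List<Volume> current = List.of(new Volume(0), new Volume(1));
        List<Volume> desired = List.of(new Volume(0), new Volume(1), new Volume(1));
        System.out.println(storageToUse(current, desired).size()); // prints 2 after the log line
    }
}
```

Rejecting the change at the diff stage means the invalid definition never reaches Pod generation, so existing pods keep running with their previous volumes.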

Checklist

  • Write tests
  • Make sure all tests pass
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally

Signed-off-by: Jakub Scholz <www@scholzj.com>
@scholzj scholzj added this to the 0.41.0 milestone Apr 17, 2024
@scholzj scholzj requested a review from ppatierno April 17, 2024 14:45

scholzj commented Apr 17, 2024

/azp run regression

Azure Pipelines successfully started running 1 pipeline(s).


@ppatierno ppatierno left a comment

LGTM.

@scholzj scholzj merged commit 6805f40 into strimzi:main Apr 18, 2024
21 checks passed
@scholzj scholzj deleted the add-jbod-volume-id-validation branch April 18, 2024 11:57