Proposal for moving data between two JBOD disks using Cruise Control #106

Draft: wants to merge 2 commits into base: main
Conversation

ShubhamRwt
Contributor

This proposal shows how we can implement the feature to move data between two JBOD disks using Cruise Control.

Signed-off-by: ShubhamRwt <shubhamrwt02@gmail.com>
Signed-off-by: ShubhamRwt <shubhamrwt02@gmail.com>
Member

@scholzj scholzj left a comment

Thanks for the proposal. I left some quick comments.

strimzi.io/cluster: my-cluster
spec:
mode: remove-disks
brokerandlogdirs: [brokerid1-logdir1,brokerid2-logdir2...] # for eg. 0-/var/lib/kafka/data-0/kafka-log0
Member

brokerandlogdirs seems strange as a name. If nothing else, you should use proper case: brokerAndLogDirs. For the users, it is quite complicated to understand what the paths are etc. as they are an internal implementation. So I also wonder if you should create more elegant APIs. E.g. something like:

```yaml
spec:
  mode: remove-disks
  brokerandlogdirs:
    - brokerId: 0
      volumeId: 0
    - brokerId: 1
      volumeId: 0
    - brokerId: 2
      volumeId: 0
```

Or something similar that would be based on what the user configures in the Kafka / KafkaNodePool resources and not on the internal paths.

Member

I agree with Jakub. In our current Strimzi implementation, the users have no knowledge of the internal paths on the volume where we configure the brokers to store logs. The only thing the user knows is the volume on a specific broker as pointed out in the above example.
The user doesn't know about /var/lib/kafka/data-0/kafka-log0 at all, but we are using a very well-defined pattern to derive it from the volume configured for a broker.
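For illustration, the mapping from the user-visible broker/volume ids to the internal log-dir path could be sketched as below. This is a hypothetical helper, assuming the path pattern shown in the example in this thread (`0-/var/lib/kafka/data-0/kafka-log0`) holds for all ids: the volume id selects the `data-<n>` mount and the broker id selects the `kafka-log<n>` directory.

```python
def log_dir_path(broker_id: int, volume_id: int) -> str:
    """Derive the internal log-dir path for a broker/volume pair.

    Illustrative only: assumes the pattern visible in the example above
    (0-/var/lib/kafka/data-0/kafka-log0).
    """
    return f"/var/lib/kafka/data-{volume_id}/kafka-log{broker_id}"


def broker_and_log_dirs(pairs: list[tuple[int, int]]) -> str:
    """Render the brokerid-logdir list the remove_disks endpoint expects."""
    return ",".join(f"{b}-{log_dir_path(b, v)}" for b, v in pairs)


print(broker_and_log_dirs([(0, 0), (1, 0)]))
# -> 0-/var/lib/kafka/data-0/kafka-log0,1-/var/lib/kafka/data-0/kafka-log1
```

With such a mapping, the operator could accept the friendlier brokerId/volumeId API and translate it to the raw paths internally.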

Contributor

Totally agree. That endpoint has a really bad user interface, and we should offer a better view.


### Flow

- The user should be using the `Kafka` resource with JBOD configured.
Member

What if the user uses KafkaNodePool resources? What if the user does not use JBOD type storage?

- The user accepts the proposal by applying the `strimzi.io/rebalance=approve` annotation on it.
- The `KafkaRebalanceAssemblyOperator` starts the interaction with Cruise Control via the `/remove_disks` endpoint for running the actual rebalancing.

Note: The movement of data between the JBOD disks doesn't affect the broker load, therefore there will be no changes in the before/after broker load.
Member

Why does it not impact it? The moving of data between disks has to impact performance and load of the broker.

Member

I misled Shubham on this one. I thought it wouldn't affect the broker load, since the data going in and out of the broker would be the same regardless of how the data was distributed amongst the disks. But I understand that if one disk was faster than another, moving the data between them within a broker certainly would impact the performance.

@ShubhamRwt did we ever find out why the loadBefore/loadAfter information was not provided by this remove_disk endpoint? Is it just an oversight by upstream Cruise Control?


## Current situation

Currently, we get a multiple requests to add the ability for moving all Kafka logs between two disks on the JBOD storage array. This feature can be useful in following scenarios:
Member

multiple requests ... from community users I guess? Let's make it clear.

Currently, we get a multiple requests to add the ability for moving all Kafka logs between two disks on the JBOD storage array. This feature can be useful in following scenarios:
- The current disk is too big and the user wants to use smaller one
- When we want to use different Storage Class with different parameters or different storage types.
For now, we can do this using the Kafka CLI tools but it is not very user-friendly.
Member

which Kafka CLI tools?

Contributor

I guess we want to mention kafka-reassign-partitions.sh tool

@@ -0,0 +1,71 @@
# Moving data between two JBOD disks using Cruise Control

This proposal is about integrating the `remove_disks` endpoint from Cruise Control into Strimzi cluster operator.
Member

I can't find any documentation about this endpoint in Cruise Control, for example it's missing in the wiki.
Is there an official place where we can find how it works and put a link here?

Contributor

Do we want to link to the yaml?

Contributor

I think the correct link is this one, from LinkedIn repo.

You could also raise a documentation bug in Cruise Control for the missing endpoint documentation.

Two things I find weird about this endpoint:

  • The stop_ongoing_execution parameter, when we have a dedicated endpoint for that.
  • The fact that you use the same endpoint to poll for status updates, instead of using the user_tasks endpoint.

Contributor Author

I will create a documentation bug and open a PR for it on the upstream repo. The only information available about using this endpoint is on the PR itself.

- The user should be using the `Kafka` resource with JBOD configured.
- When the Kafka cluster is ready, the user creates a `KafkaRebalance` custom resource with the `spec.mode` field as `remove-disks` and the list of the broker and the corresponding logdirs to move in the `spec.brokerandlogdirs` field.
- The `KafkaRebalanceAssemblyOperator` starts the interaction with Cruise Control via the `/remove_disks` endpoint for getting an optimization proposal (by using the dryrun feature).
- The user accepts the proposal by applying the `strimzi.io/rebalance=approve` annotation on it.
Member

I guess the user can still use the auto approval feature we have. It's better to highlight it here.

@ShubhamRwt ShubhamRwt marked this pull request as draft January 29, 2024 13:44
Contributor

@PaulRMellor PaulRMellor left a comment

Useful, sought-after feature. 👍
I made a few suggestions as I read through.

## Current situation

Currently, we get a multiple requests to add the ability for moving all Kafka logs between two disks on the JBOD storage array. This feature can be useful in following scenarios:
- The current disk is too big and the user wants to use smaller one
Contributor

Suggested change
- The current disk is too big and the user wants to use smaller one
- The current disk is too big and the user wants to use a smaller one


Currently, we get a multiple requests to add the ability for moving all Kafka logs between two disks on the JBOD storage array. This feature can be useful in following scenarios:
- The current disk is too big and the user wants to use smaller one
- When we want to use different Storage Class with different parameters or different storage types.
Contributor

Suggested change
- When we want to use different Storage Class with different parameters or different storage types.
- When we want to use a different Storage Class with different parameters or different storage types.

## Proposal

Cruise Control provides `remove_disks` HTTP REST endpoint to move replicas from a specified disk to other disks of the same broker.
This endpoint allows to trigger a rebalancing operation by moving replicas in a round-robin manner to the remaining disks, from the largest to the smallest, while checking the following constraint:
Contributor

Suggested change
This endpoint allows to trigger a rebalancing operation by moving replicas in a round-robin manner to the remaining disks, from the largest to the smallest, while checking the following constraint:
This endpoint triggers a rebalancing operation by moving replicas in a round-robin manner to the remaining disks, from the largest to the smallest, while checking the following constraint:

errorMargin = configurable property (default 0.1); it makes sure that a disk percentage is always free when moving replicas
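As a back-of-the-envelope illustration of that constraint (a sketch with invented names, not Cruise Control's actual code): the move is only allowed while the usage landing on the remaining disks stays below their combined capacity minus the error margin.

```python
def removal_allowed(remaining_usage: list[float],
                    removed_usage: list[float],
                    remaining_capacity: list[float],
                    error_margin: float = 0.1) -> bool:
    """Illustrative check of the remove_disks constraint described above.

    remaining_usage / remaining_capacity: per-disk numbers for the disks
    that stay; removed_usage: per-disk usage of the disks being emptied.
    The error margin keeps a fraction of the remaining capacity free.
    """
    remaining_usage_after_removal = sum(remaining_usage) + sum(removed_usage)
    capacity = sum(remaining_capacity)
    return remaining_usage_after_removal < capacity * (1 - error_margin)


# 400 GiB already used + 300 GiB moved in = 700 GiB < 1000 * 0.9 -> allowed
print(removal_allowed([400], [300], [1000]))   # True
# 700 GiB + 300 GiB = 1000 GiB, which violates the 10% margin -> rejected
print(removal_allowed([700], [300], [1000]))   # False
```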

In order to use the above endpoint in the Strimzi cluster operator, it would be added to the [`CruiseControlApi`](https://github.com/strimzi/strimzi-kafka-operator/blob/main/cluster-operator/src/main/java/io/strimzi/operator/cluster/operator/resource/cruisecontrol/CruiseControlApi.java) interface and developing the corresponding implementation.
Contributor

Suggested change
In order to use the above endpoint in the Strimzi cluster operator, it would be added to the [`CruiseControlApi`](https://github.com/strimzi/strimzi-kafka-operator/blob/main/cluster-operator/src/main/java/io/strimzi/operator/cluster/operator/resource/cruisecontrol/CruiseControlApi.java) interface and developing the corresponding implementation.
In order to use the `remove_disks` endpoint in the Strimzi cluster operator, it would be added to the [`CruiseControlApi`](https://github.com/strimzi/strimzi-kafka-operator/blob/main/cluster-operator/src/main/java/io/strimzi/operator/cluster/operator/resource/cruisecontrol/CruiseControlApi.java) interface and the corresponding implementation developed.


### Implementation

For implementing this feature, We will be adding a new mode to the `KafkaRebalanceMode` class.
Contributor

Suggested change
For implementing this feature, We will be adding a new mode to the `KafkaRebalanceMode` class.
To implement this feature, we will be adding a new mode to the `KafkaRebalanceMode` class:


- The user should be using the `Kafka` resource with JBOD configured.
- When the Kafka cluster is ready, the user creates a `KafkaRebalance` custom resource with the `spec.mode` field as `remove-disks` and the list of the broker and the corresponding logdirs to move in the `spec.brokerandlogdirs` field.
- The `KafkaRebalanceAssemblyOperator` starts the interaction with Cruise Control via the `/remove_disks` endpoint for getting an optimization proposal (by using the dryrun feature).
Contributor

Suggested change
- The `KafkaRebalanceAssemblyOperator` starts the interaction with Cruise Control via the `/remove_disks` endpoint for getting an optimization proposal (by using the dryrun feature).
- The `KafkaRebalanceAssemblyOperator` interacts with Cruise Control via the `/remove_disks` endpoint to generate an optimization proposal (by using the dryrun feature).

- When the Kafka cluster is ready, the user creates a `KafkaRebalance` custom resource with the `spec.mode` field as `remove-disks` and the list of the broker and the corresponding logdirs to move in the `spec.brokerandlogdirs` field.
- The `KafkaRebalanceAssemblyOperator` starts the interaction with Cruise Control via the `/remove_disks` endpoint for getting an optimization proposal (by using the dryrun feature).
- The user accepts the proposal by applying the `strimzi.io/rebalance=approve` annotation on it.
- The `KafkaRebalanceAssemblyOperator` starts the interaction with Cruise Control via the `/remove_disks` endpoint for running the actual rebalancing.
Contributor

Suggested change
- The `KafkaRebalanceAssemblyOperator` starts the interaction with Cruise Control via the `/remove_disks` endpoint for running the actual rebalancing.
- The `KafkaRebalanceAssemblyOperator` interacts with Cruise Control via the `/remove_disks` endpoint to perform the actual rebalancing.


## Current situation

Currently, we get a multiple requests to add the ability for moving all Kafka logs between two disks on the JBOD storage array. This feature can be useful in following scenarios:
Member

Would this be better in the Motivation section?

Currently, we get a multiple requests to add the ability for moving all Kafka logs between two disks on the JBOD storage array. This feature can be useful in following scenarios:
- The current disk is too big and the user wants to use smaller one
- When we want to use different Storage Class with different parameters or different storage types.
For now, we can do this using the Kafka CLI tools but it is not very user-friendly.
Member

This is a reference to the kafka-reassign-partitions script, right? If so, you could emphasize how devising the partition mappings to migrate data between disks in this scenario is a manual, time-consuming process. Solutions exist for moving data off brokers, but not off specific disks of brokers which have JBOD config.

# Moving data between two JBOD disks using Cruise Control

This proposal is about integrating the `remove_disks` endpoint from Cruise Control into Strimzi cluster operator.
This endpoint will allow us to move the data between two JBOD disks.
Member

Does it only allow moving data between two disks?

Contributor

Yes, according to the endpoint definition (YAML). It would help to see it in action with curl commands.


Contributor

@fvaleri fvaleri left a comment

Hi @ShubhamRwt. I left just a few comments. I will have another pass when it's out of draft state.

## Current situation

Currently, we get a multiple requests to add the ability for moving all Kafka logs between two disks on the JBOD storage array. This feature can be useful in following scenarios:
- The current disk is too big and the user wants to use smaller one
Contributor

I guess the other way around too. Current disk is too small, and the user wants a bigger one, to avoid running out of disk space. Disk removal to reduce the total storage size is another use case.

## Proposal

Cruise Control provides `remove_disks` HTTP REST endpoint to move replicas from a specified disk to other disks of the same broker.
This endpoint allows to trigger a rebalancing operation by moving replicas in a round-robin manner to the remaining disks, from the largest to the smallest, while checking the following constraint:
Contributor

So this is not round-robin, but a size-based scheduling.

```sh
remainingUsageAfterRemoval = current usage for remaining disks + additional usage from removed disks
remainingCapacity = sum of capacities of the remaining disks
errorMargin = configurable property (default 0.1); it makes sure that a disk percentage is always free when moving replicas
```
Contributor

I guess users with very busy clusters may need more than 10% of available disk space. From the endpoint definition (YAML), I don't see how we can configure the errorMargin. Do you know how?

### Flow

- The user should be using the `Kafka` resource with JBOD configured.
- When the Kafka cluster is ready, the user creates a `KafkaRebalance` custom resource with the `spec.mode` field as `remove-disks` and the list of the broker and the corresponding logdirs to move in the `spec.brokerandlogdirs` field.
Member

You have remove_disks on one place and remove-disks one another - it should be the same across whole proposal

@tombentley
Member

Thanks @ShubhamRwt, what you describe here could certainly be made to work. But I'd like there to be some discussion about whether this is the right approach.

Before I get on to that, let's talk about terminology for a moment:

  • Kafka provides an API to move replicas (logs) between log dirs.
    The main (only?) use case for configuring a broker to use multiple log dirs is so that those directories are on separate disks/devices/volumes.
  • Cruise Control has decided to use the term "disks", rather than log dirs.
  • Kubernetes uses the term "volume" for something mounted into the container vfs.
  • Strimzi uses the Kubernetes terminology, so the Jbod storage API lists a number of volumes, which are each identified by integer ids.

I think this feature should stick with the established Strimzi terminology, so if we're going to add a new mode to the KafkaRebalance resource it should be mode: remove-volumes, rather than mode: remove-disks.

User experience

Kafka clusters need rebalancing because of uneven load on brokers imposed by the replicas they're hosting.
"Load" is composed of various resources that brokers have in limited supply (CPU cycles, disk space, network bandwidth)
Rebalancing estimates the load imposed by each replica on a broker and tries to rearrange partition leader- and followership to get a better balance.
That rearrangement involves moving the data between brokers.

But moving data between log dirs is quite different. It's not a global rebalance at all. What CC is providing is much less involved, because the space of possible changes is much smaller: It can only move replicas to the remaining disks on the same broker, and all it's using for that assignment (currently) is knowledge of the disk space required by each replica.

It's possible that these things are done by different people, and they're always done for different reasons. Removing disks is usually going to be done because it's related to some operational necessity. Rebalancing is usually going to be done to get better resource usage.

The schema of the KafkaRebalance CRD has all sorts of things which apply to the global rebalance case which do not apply for the volume removal case. Things like goals to do with CPU usage and network bandwidth are irrelevant for the volume removal case, because they'll be the same however the replicas are spread over the disks.

All this is to say that I'm not convinced by the approach of repurposing KafkaRebalance for this use case. I can see that it's convenient to lean on KafkaRebalance, because a lot of the underlying machinery will be the same.
But I don't think it fully addresses what the user wants to do.
With the KafkaRebalance approach the user will have to perform a multi-step change:

  1. Do the KafkaRebalance with mode: remove-volumes to move the data. This can take quite some time if there are many TB to move.
  2. Update the Kafka, or KafkaNodePool, to actually remove the volumes.
  3. Possibly do a global rebalance.

There's a few things that can go wrong between steps 1 and 2:

  • There's nothing in Kafka which will prevent replicas of new topics getting assigned to the to-be-removed volumes. We have some check in place to prevent disaster, but the fact remains that step 1 may need to be repeated if new topics get created.
  • The user can screw up and name different volumes in step 1 than in step 2. There are two cases:
    • They don't remove data off volumes in step 1 which they try to remove from brokers in step 2. The safety check prevents disaster, but they have to re-do step 1.
    • They move data off volumes in step 1 which aren't ultimately removed from the brokers in step 2.
      They then need to do step 3 to make use of those volumes again.
  • The user can do step 1 and then forget about step 2 entirely.

Furthermore, we need to consider the complexity we're imposing on users with this approach. They need to operationally coordinate steps 1 and 2. If they're using gitops, this would involve applying a change for step 1 and at some later point one for step 2. This might involve different people (e.g. on different shifts). This all adds to the burden we're placing on users.

I think we should consider if this is the experience we want users to have.

Possible alternative

One alternative would be for this to be driven from the Kafka (or KafkaNodePool) CRs:

  1. The user removes a volume from the JBOD storage
  2. The CO checks whether this volume contains any replicas
  3. If so, the CO:
    1. Checks for existing Cruise Control tasks: If any exist, then we update the status with some new condition to say the volume removal is still pending, and proceed with the current reconciliation without taking further action. The user has the choice of cancelling those tasks themselves or waiting till they finish so that a later CO reconciliation is able to proceed with the volume removal (at the next step). We could finesse this at some point so that, for example, KafkaRebalance itself declares whether it's auto-cancellable, and if so the CO could cancel it automatically.
    2. Eventually there are no on-going Cruise Control tasks, so the CO uses the remove_disks endpoint to orchestrate the data movement, updating the Kafka or KafkaNodePool CR status
    3. Eventually the remove_disks fails or succeeds: If it fails, we give up and fail the reconciliation with a suitable message. If it succeeds, then the to-be-removed volumes no longer have any replicas and we can proceed to the next step.
  4. The CO proceeds with deleting the PVCs and restarting those brokers.

This is certainly more complicated for us to implement. (It happens over a number of reconciliations, for a start). But I think it's much closer to what users actually need: In the normal case they have a single CR to interact with (the one which already declares the volumes for their brokers) and they have to interact with it only once, to express their intent.
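For illustration only, the decision flow of steps 2 and 3 above could be sketched roughly like this. Every name here is invented for the sketch; it is not actual operator code:

```python
from enum import Enum, auto


class Outcome(Enum):
    PENDING = auto()   # CC busy: surface a "removal pending" condition, retry later
    FAILED = auto()    # remove_disks failed: fail the reconciliation
    REMOVED = auto()   # data moved (or none present): safe to delete PVCs, restart


def reconcile_volume_removal(has_replicas: bool,
                             cc_tasks_running: bool,
                             remove_disks_succeeds: bool) -> Outcome:
    """Illustrative decision flow for the alternative proposed above."""
    if not has_replicas:
        return Outcome.REMOVED      # nothing to move, proceed directly
    if cc_tasks_running:
        return Outcome.PENDING      # wait for ongoing Cruise Control tasks
    if not remove_disks_succeeds:
        return Outcome.FAILED
    return Outcome.REMOVED


print(reconcile_volume_removal(True, False, True))   # Outcome.REMOVED
```

The point of the sketch is that the outcome is resolved over several reconciliations, not in one pass, which is exactly the extra implementation complexity acknowledged here.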

With this alternative the KafkaRebalance remains a CR used for optimizing the cluster. The API it expresses is more "type safe", in the sense that it doesn't have a load of options to do with global rebalancing which get ignored by a special mode which isn't really a rebalance.

@ShubhamRwt
Contributor Author

Hi @scholzj @ppatierno, I do think that the above suggestion by @tombentley looks good, so do we want to put the remove_disks endpoint usage in the Kafka resource?

@scholzj
Member

scholzj commented Apr 10, 2024

I do not think it looks good.

  • The original design follows the existing logic for adding or removing brokers. The proposal from Tom creates yet another way to do what is basically the same thing.
  • As I explained in a different discussion the other day, we should avoid tight couplings in the KafkaAssemblyOperator. It is already today overloaded and tightly coupling it with Cruise Control would make it even worse. It will make it hard to test things.

It might make sense to one day automate the removal of the disks. But I think that should be left to another proposal. It should be also done through the KafkaRebalance resources to decouple the logic of the different operations.

@ShubhamRwt
Copy link
Contributor Author

@ppatierno @tombentley Hi, it would be great to have some views from you too. I will be starting to re-write this.

@ShubhamRwt
Contributor Author

Another thing to consider is that upstream CC is not so happy to enable the verbose parameter for this remove_disks endpoint, while all the other CC endpoints that we currently have in Strimzi have the verbose parameter enabled. The issue we will get with it is that we won't be able to get any loadBeforeOptimization part, and that can cause errors since we need both loadBeforeOptimization and loadAfterOptimization for proposal generation in Strimzi. I am currently trying to find a workaround for that. I am not sure why upstream CC has disabled verbose for this endpoint, but on testing myself I was able to get this by enabling verbose -> https://gist.github.com/ShubhamRwt/f09f7c1a856eaa452b2696a86f1b7896
