
Basic node decommission progress API #8167

Merged

Conversation

@mmaslankaprv (Member) commented Jan 11, 2023

Added a REST endpoint allowing the caller to query node decommissioning status.

GET /v1/brokers/<id>/decommission

When a node is being decommissioned, Redpanda automatically moves all replicas from
the decommissioned node to other nodes. This process may take a long time as it
involves data transfer between nodes. The added REST API allows the user to observe
the progress of the decommissioning process.
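For illustration, here is a minimal sketch of how an operator might poll this endpoint from Python, assuming the admin API listens on the default port 9644 and using the requests library; the exact response fields are discussed in the review below.

import requests

# Hypothetical values for illustration: adjust the admin address and node id.
ADMIN_URL = "http://localhost:9644"
node_id = 1

# Query decommissioning progress for the given broker.
resp = requests.get(f"{ADMIN_URL}/v1/brokers/{node_id}/decommission", timeout=10)
resp.raise_for_status()

# The JSON body reports whether the node has finished draining and, for each
# partition still being moved, how much data remains to transfer.
print(resp.json())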

Fixes: #7874

Backports Required

  • none - not a bug fix
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v22.3.x
  • v22.2.x
  • v22.1.x

UX Changes

Release Notes

Improvements

  • new admin API GET /v1/brokers/<id>/decommission

@mmaslankaprv changed the title from "Node decommission monitoring" to "Basic node decommission progress API" on Jan 11, 2023
@mmaslankaprv force-pushed the node-decommission-monitoring branch 3 times, most recently from e42038e to 9acee6a on Jan 11, 2023 at 14:16
@mmaslankaprv assigned bharathv and unassigned bharathv on Jan 11, 2023
@bharathv (Contributor) left a comment

This patch is just awesome.., think we should hook this up as rpk decommission status <node_id>.. mostly nits, lgtm.

src/v/cluster/controller_api.cc (resolved)
src/v/cluster/controller_api.cc (resolved)
Comment on lines +385 to +388
if (moving_to) {
    it->second.already_transferred_bytes.emplace_back(
      replica_bytes{
        .node = node_report.id, .bytes = p.size_bytes});
@bharathv (Contributor)

this is 🔥 super useful

status.moving_to = moving_to;
size_t left_to_move = 0;
for (auto as : p.already_transferred_bytes) {
    left_to_move += (p.current_partition_size - as.bytes);
@bharathv (Contributor) commented Jan 11, 2023

q: should we pair left_to_move with either already_transferred_bytes or current_total_size (or, even better, both)? If data is being appended to the partition faster than recovery can keep up, left_to_move may be going up, which is not intuitive to the operator. If they can see the other two metrics, they can tell that the partition is still being written to.
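To make the concern concrete, here is a small worked sketch with hypothetical numbers, mirroring the accumulation loop quoted above: if the partition grows faster than the new replica catches up, left_to_move increases even though bytes are being transferred.

def left_to_move(current_partition_size, transferred_bytes_per_target):
    # Sum, over target replicas, the gap between the current partition size
    # and what each target has already received.
    return sum(current_partition_size - b for b in transferred_bytes_per_target)

# Snapshot 1: the partition is 100 units, the new replica has received 40.
print(left_to_move(100, [40]))   # 60

# Snapshot 2: 10 more units were transferred, but 30 units were appended.
print(left_to_move(130, [50]))   # 80 -- larger, despite real transfer progress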

@mmaslankaprv (Member, Author)

very good point, i will add this information to the response

src/v/cluster/controller_backend.cc (outdated, resolved)
@mmaslankaprv force-pushed the node-decommission-monitoring branch 3 times, most recently from 75dd68b to b2e336e on Jan 12, 2023 at 10:26
bharathv previously approved these changes Jan 12, 2023
@bharathv (Contributor) left a comment

Please consider one small nit, otherwise lgtm 🔥

"type": "long",
"description": "bytes left to move to new replica"
},
"target_size": {
@bharathv (Contributor)

nit: consider using "total_bytes_to_move", analogous to "bytes_left_to_move". I think "target" will be a bit vague to the operator, because there are two parties here.

@mmaslankaprv (Member, Author)

good point, i will include the current partition size and the amount of data that was transferred to the target node
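Putting the two threads together, the per-partition entry could then expose all three quantities, so an operator can distinguish a stalled move from a partition that keeps growing. The shape below is only an illustration with hypothetical field names (apart from bytes_left_to_move, which is discussed above); it is not the PR's final schema.

# Illustrative only: a Python dict standing in for one per-partition status entry.
partition_status = {
    "ns": "kafka",
    "topic": "events",
    "partition": 0,
    "bytes_left_to_move": 80 * 1024**2,   # gap still to close
    "partition_size": 130 * 1024**2,      # current total partition size
    "bytes_moved": 50 * 1024**2,          # already transferred to the target
}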

Comment on lines +119 to +124
def wait_for_removal(self):
    self.last_update = time.time()
    # wait for removal only if progress was reported
    while self._made_progress():
        try:
            decommission_status = self.admin.get_decommission_status(

🔥
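The excerpt above comes from a test helper that polls the decommission status and gives up if no progress is observed. As an illustration of the idea only (names other than _made_progress are hypothetical, and this is not the PR's actual implementation): progress can be defined as the reported bytes-left-to-move shrinking between polls, with a stall timeout.

import time

class DecommissionWaiter:
    # Sketch: tracks whether decommissioning made progress between polls.

    def __init__(self, admin, node_id, stall_timeout_sec=120):
        self.admin = admin                    # admin API client (assumed)
        self.node_id = node_id
        self.stall_timeout_sec = stall_timeout_sec
        self.last_bytes_left = None
        self.last_update = time.time()

    def _made_progress(self):
        # Consider the operation stuck if bytes-left has not shrunk recently.
        return time.time() - self.last_update < self.stall_timeout_sec

    def observe(self, bytes_left):
        # Call this with each freshly polled bytes-left value.
        if self.last_bytes_left is None or bytes_left < self.last_bytes_left:
            self.last_bytes_left = bytes_left
            self.last_update = time.time()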

Introduced `cluster::reconfiguration_state`, a top-level type indicating the
reconfiguration state, as an ongoing reconfiguration may be the result of a
partition move or its cancellation.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added a method allowing the caller to query node decommissioning status. When a
node is decommissioned, Redpanda automatically moves all replicas off the
decommissioned node and reassigns them to other nodes. This process may take a
long time as it involves data transfer between nodes. The added API allows the
user to observe the progress of the decommissioning process.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added `GET /v1/brokers/<id>/decommission` endpoint that returns basic
information about node decommissioning progress.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added a simple helper that waits for node removal and checks progress. It will
be used in tests to detect stuck decommissioning operations.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
@mmaslankaprv (Member, Author) commented Jan 13, 2023

ci failure: #7758

@mmaslankaprv merged commit 643d77f into redpanda-data:dev on Jan 13, 2023
@daisukebe (Contributor)

This is great and helpful! Can we backport this to 22.3?

Development

Successfully merging this pull request may close these issues.

test: test_flipping_decommission_recommission timing out