Basic node decommission progress API #8167
This patch is just awesome. I think we should hook this up as `rpk decommission status <node_id>`.
Mostly nits, lgtm.
if (moving_to) {
    it->second.already_transferred_bytes.emplace_back(
      replica_bytes{
        .node = node_report.id, .bytes = p.size_bytes});
this is 🔥 super useful
src/v/redpanda/admin_server.cc
Outdated
status.moving_to = moving_to;
size_t left_to_move = 0;
for (auto as : p.already_transferred_bytes) {
    left_to_move += (p.current_partition_size - as.bytes);
q: should we pair left_to_move with either already_transferred_bytes or current_total_size, or both (even better)? If data is being appended to the partition faster than recovery can copy it, left_to_move may go up, which is not intuitive to the operator. If they see the other two metrics, they can tell that the partition is being written to.
very good point, i will add this information to the response
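A minimal sketch of the concern raised in this thread, written in Python rather than the C++ of the patch. The function and parameter names are hypothetical and are not the PR's API, and the clamp at zero is an extra precaution of this sketch, not something the diff above does. It shows how the bytes left to move can rise between two observations even though recovery copied data in the meantime.

```python
def bytes_left_to_move(partition_size: int, transferred: list[int]) -> int:
    """Sum, over target replicas, of the bytes still missing on each one.

    Clamps at zero per replica so that a replica reporting more bytes than
    the current partition size never contributes a negative amount.
    """
    return sum(max(partition_size - t, 0) for t in transferred)


# Two observations of one partition moving to a single new replica.
# Recovery copied 300 bytes between them, but producers appended 500.
before = bytes_left_to_move(partition_size=1_000, transferred=[400])
after = bytes_left_to_move(partition_size=1_500, transferred=[700])

print(before, after)  # 600 800
```

Seen alone, the jump from 600 to 800 looks like a regression. Seen next to the partition size (1000 to 1500) and the transferred bytes (400 to 700), it is clear that the partition is being written to while it moves.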
Please consider one small nit, otherwise lgtm 🔥
    "type": "long",
    "description": "bytes left to move to new replica"
},
"target_size": {
nit: consider using "total_bytes_to_move", analogous to "bytes_left_to_move". I think "target" will be a bit vague to the operator because there are two parties here.
good point, i will include the current partition size and the amount of data that has been transferred to the target node
def wait_for_removal(self):
    self.last_update = time.time()
    # wait for removal only if progress was reported
    while self._made_progress():
        try:
            decommission_status = self.admin.get_decommission_status(
🔥
Introduced `cluster::reconfiguration_state`, a top-level type indicating the reconfiguration state, as an ongoing reconfiguration may be the result of either a partition move or its cancellation. Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added a method allowing the caller to query node decommissioning status. When a node is decommissioned, Redpanda automatically moves all replicas off the decommissioned node and reassigns them to other nodes. This process may take a long time as it involves data transfer between nodes. The added API will allow the user to observe the progress of the decommissioning process. Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added `GET /v1/brokers/<id>/decommission` endpoint that returns basic information about node decommissioning progress. Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added simple helper that waits for node removal and checks progress. It will be used in tests to detect stuck decommissioning operations. Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
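The wait-and-check-progress helper described in the last commit can be sketched as a polling loop with a stall timer. This is an illustration under stated assumptions, not the helper from the PR: `get_bytes_left`, the timeout parameters, and the rule that any change in the reported value counts as progress are all hypothetical choices made here.

```python
import time
from typing import Callable, Optional


def wait_for_removal(
    get_bytes_left: Callable[[], Optional[int]],
    progress_timeout_s: float = 60.0,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the node is removed, False if progress stalls.

    get_bytes_left (hypothetical) returns the bytes still to move, or None
    when the node is no longer part of the cluster.
    """
    last_update = clock()
    last_bytes_left = None
    while clock() - last_update < progress_timeout_s:
        bytes_left = get_bytes_left()
        if bytes_left is None:
            return True
        if bytes_left != last_bytes_left:
            # Any change counts as progress and resets the stall timer.
            last_bytes_left = bytes_left
            last_update = clock()
        sleep(poll_interval_s)
    return False
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting. A test would call it with a function that queries the decommission status and fail when it returns False, which is how a stuck decommissioning operation would surface.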
ci failure: #7758
This is great and helpful! Can we backport this to 22.3?
Added a REST endpoint allowing the caller to query node decommissioning status.
When a node is being decommissioned, Redpanda automatically moves all replicas from
the decommissioned node to other nodes. This process
may take a long time as it involves data transfer between nodes. The added
REST API will allow the user to observe the progress of the decommissioning process.
Fixes: #7874
Backports Required
UX Changes
Release Notes
Improvements
GET /v1/brokers/<id>/decommission
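A minimal sketch of querying the endpoint from Python using only the standard library. The endpoint path comes from this PR. Everything else is an assumption of the sketch: the helper names are hypothetical, the Admin API is assumed to be reachable over plain HTTP without authentication on port 9644, and the response body is assumed to be JSON.

```python
import json
import urllib.request


def decommission_status_url(host: str, node_id: int, port: int = 9644) -> str:
    """Build the URL of the decommission status endpoint for one broker."""
    return f"http://{host}:{port}/v1/brokers/{node_id}/decommission"


def get_decommission_status(host: str, node_id: int, port: int = 9644) -> dict:
    """GET the endpoint and decode the body, assuming it is JSON."""
    url = decommission_status_url(host, node_id, port)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


print(decommission_status_url("localhost", 3))
# http://localhost:9644/v1/brokers/3/decommission
```

Calling `get_decommission_status("localhost", 3)` against a running cluster would return the decoded status for broker 3. The field names in that response are not listed here because only fragments of the schema appear in this conversation.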