
Basic node decommission progress API #8167

Merged

Conversation

@mmaslankaprv (Member) commented Jan 11, 2023

Added a REST endpoint allowing the caller to query node decommissioning status.

GET /v1/brokers/<id>/decommission

When a node is being decommissioned, Redpanda automatically moves all replicas from
the decommissioned node to other nodes. This process may take a long time as it
involves data transfer between nodes. The added REST API allows the user to observe
the progress of the decommissioning process.
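For illustration, here is a minimal sketch of how an operator might poll this endpoint from Python, assuming the admin API listens on the default port 9644 and using the requests library; the exact response fields are discussed in the review below.

import requests

# Hypothetical values for illustration: adjust the admin address and node id.
ADMIN_URL = "http://localhost:9644"
node_id = 1

# Query decommissioning progress for the given broker.
resp = requests.get(f"{ADMIN_URL}/v1/brokers/{node_id}/decommission", timeout=10)
resp.raise_for_status()

# The JSON body reports whether the node has finished draining and, for each
# partition still being moved, how much data remains to transfer.
print(resp.json())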

Fixes: #7874

Backports Required

  • none - not a bug fix
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v22.3.x
  • v22.2.x
  • v22.1.x

UX Changes

Release Notes

Improvements

  • new admin API GET /v1/brokers/<id>/decommission

@mmaslankaprv changed the title from "Node decommission monitoring" to "Basic node decommission progress API" on Jan 11, 2023
@mmaslankaprv force-pushed the node-decommission-monitoring branch 3 times, most recently from e42038e to 9acee6a on Jan 11, 2023 at 14:16
@mmaslankaprv assigned bharathv and unassigned bharathv on Jan 11, 2023
@bharathv (Contributor) left a comment

This patch is just awesome.., think we should hook this up as rpk decommission status <node_id>.. mostly nits, lgtm.

src/v/cluster/controller_api.cc (resolved)
src/v/cluster/controller_api.cc (resolved)
Comment on lines +385 to +388
if (moving_to) {
    it->second.already_transferred_bytes.emplace_back(
      replica_bytes{
        .node = node_report.id, .bytes = p.size_bytes});
@bharathv (Contributor)

this is 🔥 super useful

status.moving_to = moving_to;
size_t left_to_move = 0;
for (auto as : p.already_transferred_bytes) {
    left_to_move += (p.current_partition_size - as.bytes);
@bharathv (Contributor) commented Jan 11, 2023

q: should we pair left_to_move with either already_transferred_bytes or current_total_size (or, even better, both)? If data is being appended to the partition faster than recovery can keep up, left_to_move may be going up, which is not intuitive to the operator. If they can see the other two metrics, they can tell that the partition is still being written to.
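To make the concern concrete, here is a small worked sketch with hypothetical numbers, mirroring the accumulation loop quoted above: if the partition grows faster than the new replica catches up, left_to_move increases even though bytes are being transferred.

def left_to_move(current_partition_size, transferred_bytes_per_target):
    # Sum, over target replicas, the gap between the current partition size
    # and what each target has already received.
    return sum(current_partition_size - b for b in transferred_bytes_per_target)

# Snapshot 1: the partition is 100 units, the new replica has received 40.
print(left_to_move(100, [40]))   # 60

# Snapshot 2: 10 more units were transferred, but 30 units were appended.
print(left_to_move(130, [50]))   # 80 -- larger, despite real transfer progress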

@mmaslankaprv (Member, Author)

very good point, i will add this information to the response

src/v/cluster/controller_backend.cc (outdated, resolved)
@mmaslankaprv force-pushed the node-decommission-monitoring branch 3 times, most recently from 75dd68b to b2e336e on Jan 12, 2023 at 10:26
bharathv previously approved these changes Jan 12, 2023
@bharathv (Contributor) left a comment

Please consider one small nit, otherwise lgtm 🔥

"type": "long",
"description": "bytes left to move to new replica"
},
"target_size": {
@bharathv (Contributor)

nit: consider using "total_bytes_to_move", analogous to "bytes_left_to_move". I think "target" will be a bit vague to the operator, because there are two parties here.

@mmaslankaprv (Member, Author)

good point, i will include the current partition size and the amount of data that was transferred to the target node
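Putting the two threads together, the per-partition entry could then expose all three quantities, so an operator can distinguish a stalled move from a partition that keeps growing. The shape below is only an illustration with hypothetical field names (apart from bytes_left_to_move, which is discussed above); it is not the PR's final schema.

# Illustrative only: a Python dict standing in for one per-partition status entry.
partition_status = {
    "ns": "kafka",
    "topic": "events",
    "partition": 0,
    "bytes_left_to_move": 80 * 1024**2,   # gap still to close
    "partition_size": 130 * 1024**2,      # current total partition size
    "bytes_moved": 50 * 1024**2,          # already transferred to the target
}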

Comment on lines +119 to +124
def wait_for_removal(self):
    self.last_update = time.time()
    # wait for removal only if progress was reported
    while self._made_progress():
        try:
            decommission_status = self.admin.get_decommission_status(

🔥
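The excerpt above comes from a test helper that polls the decommission status and gives up if no progress is observed. As an illustration of the idea only (names other than _made_progress are hypothetical, and this is not the PR's actual implementation): progress can be defined as the reported bytes-left-to-move shrinking between polls, with a stall timeout.

import time

class DecommissionWaiter:
    # Sketch: tracks whether decommissioning made progress between polls.

    def __init__(self, admin, node_id, stall_timeout_sec=120):
        self.admin = admin                    # admin API client (assumed)
        self.node_id = node_id
        self.stall_timeout_sec = stall_timeout_sec
        self.last_bytes_left = None
        self.last_update = time.time()

    def _made_progress(self):
        # Consider the operation stuck if bytes-left has not shrunk recently.
        return time.time() - self.last_update < self.stall_timeout_sec

    def observe(self, bytes_left):
        # Call this with each freshly polled bytes-left value.
        if self.last_bytes_left is None or bytes_left < self.last_bytes_left:
            self.last_bytes_left = bytes_left
            self.last_update = time.time()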

Introduced `cluster::reconfiguration_state`, a top-level type indicating the
reconfiguration state, as an ongoing reconfiguration may be the result of a
partition move or its cancellation.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added a method allowing the caller to query node decommissioning status. When a
node is decommissioned, Redpanda automatically moves all replicas off the
decommissioned node and reassigns them to other nodes. This process may take a
long time as it involves data transfer between nodes. The added API allows the
user to observe the progress of the decommissioning process.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added `GET /v1/brokers/<id>/decommission` endpoint that returns basic
information about node decommissioning progress.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added a simple helper that waits for node removal and checks progress. It will
be used in tests to detect stuck decommissioning operations.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
@mmaslankaprv (Member, Author) commented Jan 13, 2023

ci failure: #7758

@mmaslankaprv merged commit 643d77f into redpanda-data:dev on Jan 13, 2023
@daisukebe (Contributor)

This is great and helpful! Can we backport this to 22.3?

Development

Successfully merging this pull request may close these issues.

test: test_flipping_decommission_recommission timing out