Improve graceful shutdown of RegionServers #508

Relevant docs: https://hbase.apache.org/book.html#decommission
Relevant script: [graceful_stop.sh](https://github.com/apache/hbase/blob/master/bin/graceful_stop.sh)
Relevant class: [org.apache.hadoop.hbase.util.RegionMover](https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java), with relevant [function](https://github.com/apache/hbase/blob/8efd67b9aaf629e338ddc0283e29beec1adcfeaf/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java#L538)

In https://github.com/stackabletech/hbase-operator/issues/400 we implemented a graceful shutdown for all HBase components, similar to `./bin/hbase-daemon.sh stop <service>`. While this works in general, it has downsides, such as regions being offline for some time, resulting in (short) outages.

Instead, we should try to call or mimic `graceful_stop.sh`. The `graceful_stop.sh` script moves the regions off the decommissioned RegionServer one at a time to minimize region churn. It verifies that each region is deployed in its new location before it moves the next one, and so on until the decommissioned server is carrying zero regions. At this point, `graceful_stop.sh` tells the RegionServer to stop. The master then notices that the RegionServer is gone, but all regions have already been redeployed, and because the RegionServer went down cleanly, there are no WAL logs to split.
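To make the sequence above concrete, here is a minimal Python sketch of the move-verify-stop loop. It runs against a toy in-memory model, not against HBase: `ToyCluster`, `graceful_stop` and the round-robin target choice are hypothetical names and simplifications for illustration, and the real script or `RegionMover` may pick targets and verify deployment differently.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """In-memory stand-in for a cluster (hypothetical, not an HBase API)."""

    # Maps RegionServer name -> set of region names it currently hosts.
    assignments: dict[str, set[str]] = field(default_factory=dict)
    stopped: set[str] = field(default_factory=set)

    def move_region(self, region: str, source: str, target: str) -> None:
        self.assignments[source].remove(region)
        self.assignments[target].add(region)

    def is_deployed_on(self, region: str, server: str) -> bool:
        return region in self.assignments.get(server, set())

    def stop_server(self, server: str) -> None:
        self.stopped.add(server)


def graceful_stop(cluster: ToyCluster, server: str) -> list[tuple[str, str]]:
    """Drain `server` one region at a time, then stop it.

    Returns the (region, target) moves in the order they were made.
    """
    targets = sorted(
        s for s in cluster.assignments if s != server and s not in cluster.stopped
    )
    if not targets:
        raise RuntimeError("no other RegionServer available to take the regions")

    moves = []
    # sorted() copies the region names, so moving regions during the loop is safe.
    for i, region in enumerate(sorted(cluster.assignments[server])):
        target = targets[i % len(targets)]  # assumption: simple round-robin
        cluster.move_region(region, server, target)
        # Verify the region is deployed in its new location before moving the next.
        if not cluster.is_deployed_on(region, target):
            raise RuntimeError(f"region {region} did not come up on {target}")
        moves.append((region, target))

    # Only stop the server once it carries zero regions.
    if cluster.assignments[server]:
        raise RuntimeError(f"{server} still hosts regions")
    cluster.stop_server(server)
    return moves
```

With regions `r1`, `r2`, `r3` on `rs-0` and two other servers available, calling `graceful_stop(cluster, "rs-0")` empties `rs-0` before it is marked as stopped, so no region is ever offline because of the shutdown itself.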

```[tasklist]
### Acceptance criteria
- [x] Must: Call or mimic `graceful_stop.sh`
- [x] (see comment) Must: The docs say "Disable the Load Balancer before Decommissioning a node". We have a solution for this: either disable it, or make sure that we (or our customers) are not using the load balancer
- [x] Should: Decommission several RegionServers concurrently: To gracefully drain multiple RegionServers at the same time, RegionServers can be put into a "draining" state. This is done by marking a RegionServer as a draining node by creating an entry in ZooKeeper under the hbase_root/draining znode. Take care to clean up this entry, or make sure the RegionServer does so when it starts up again
```
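The draining criterion and its cleanup caveat can be sketched as follows. This is a toy model only: `ToyZooKeeper`, `draining_znode` and `clear_stale_draining_markers` are hypothetical names, no real ZooKeeper client is used, and the `hostname,port,startcode` entry name is an assumption based on how RegionServer entries are named under the `rs` znode.

```python
class ToyZooKeeper:
    """A plain set of znode paths (hypothetical stand-in, not a ZooKeeper client)."""

    def __init__(self) -> None:
        self.paths: set[str] = set()

    def create(self, path: str) -> None:
        self.paths.add(path)

    def delete(self, path: str) -> None:
        self.paths.discard(path)


def draining_znode(hbase_root: str, hostname: str, port: int, startcode: int) -> str:
    """Build the draining marker path for one RegionServer incarnation.

    Assumption: the entry is named `hostname,port,startcode`.
    """
    return f"{hbase_root.rstrip('/')}/draining/{hostname},{port},{startcode}"


def clear_stale_draining_markers(
    zk: ToyZooKeeper, hbase_root: str, hostname: str, port: int
) -> list[str]:
    """Remove draining markers left behind by earlier incarnations of host:port.

    Intended to run when the RegionServer starts up again, so that a restarted
    server is not treated as draining forever. Returns the removed paths.
    """
    prefix = f"{hbase_root.rstrip('/')}/draining/{hostname},{port},"
    stale = sorted(p for p in zk.paths if p.startswith(prefix))
    for path in stale:
        zk.delete(path)
    return stale
```

Because the start code changes on every restart, a marker created before shutdown would not match the new incarnation's name, which is one reason the cleanup here matches on host and port rather than on the full entry name.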

