docs: added two node operations documentation
Added docs about version upgrade
Added docs about maintenance mode
zimnx committed Jan 4, 2021
1 parent 3b6ece8 commit 7758a12
Showing 1 changed file with 125 additions and 2 deletions.
127 changes: 125 additions & 2 deletions docs/source/node_operations.md
@@ -1,5 +1,108 @@
# Node operations using Scylla Operator

### Upgrading version of Scylla

To upgrade the Scylla version using the Operator, the user has to modify the existing ScyllaCluster definition.

In this example the cluster will be upgraded to version `4.2.2`.
```bash
$ kubectl -n scylla patch ScyllaCluster simple-cluster -p '{"spec":{"version": "4.2.2"}}' --type=merge
```
```
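
To confirm the new desired version was recorded, it can be read back from the spec (a quick check, assuming the same cluster name and namespace as above):
```bash
$ kubectl -n scylla get ScyllaCluster simple-cluster -o jsonpath='{.spec.version}'
4.2.2
```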

The Operator supports two types of version upgrades:
1. Patch upgrade
1. Generic upgrade


**Patch upgrade**

A patch upgrade is executed when only a patch-level version change is detected, according to the [semantic versioning format](https://semver.org/).
The procedure simply rolls out a restart of the whole cluster, upgrading the Scylla container image for each node one by one.

Example: `4.0.0 -> 4.0.1`

**Generic upgrade**

Generic upgrades are executed for all non-patch version changes.

Example: `4.0.0 -> 2020.1.0` or `4.0.0 -> 4.1.0` or even `4.0.0 -> nightly`

The user can observe the current state of the upgrade in the ScyllaCluster status.
```bash
$ kubectl -n scylla describe ScyllaCluster simple-cluster
[...]
Status:
  Racks:
    us-east-1a:
      Members:        3
      Ready Members:  3
      Version:        4.1.9
  Upgrade:
    Current Node:         simple-cluster-us-east-1-us-east-1a-2
    Current Rack:         us-east-1a
    Data Snapshot Tag:    so_data_20201228135002UTC
    From Version:         4.1.9
    State:                validate_upgrade
    System Snapshot Tag:  so_system_20201228135002UTC
    To Version:           4.2.2
```

Each upgrade begins by taking a snapshot of the `system` and `system_schema` keyspaces on all nodes in parallel.
The name of this snapshot tag is saved in the upgrade status under `System Snapshot Tag`.
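
As a quick sketch, the snapshots present on a node can be listed with `nodetool` (the Pod name is taken from the examples in this document, and the `scylla` container name is an assumption):
```bash
$ kubectl -n scylla exec simple-cluster-us-east-1-us-east-1a-0 -c scylla -- nodetool listsnapshots
```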

Before the nodes in a rack are upgraded, the underlying StatefulSet is changed to use the `OnDelete` UpdateStrategy.
This gives the Operator full control over when the Pod image is changed.
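
You can verify the switch on the StatefulSet itself; the StatefulSet name below is an assumption composed from the cluster, datacenter and rack names used in this document:
```bash
$ kubectl -n scylla get statefulset simple-cluster-us-east-1-us-east-1a -o jsonpath='{.spec.updateStrategy.type}'
OnDelete
```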

When a node is being upgraded, [maintenance mode](#maintenance-mode) is enabled, then the node is drained and a snapshot of all data keyspaces is taken.
The snapshot tag is saved under `Data Snapshot Tag` and is the same for all nodes during the procedure.
Once everything is set up, maintenance mode is disabled and the Scylla Pod is deleted. The underlying StatefulSet brings up a new
Pod with the upgraded version.
Once the Pod becomes ready, the data snapshot from this particular node is removed, and the Operator moves on to the next node.

Once every rack is upgraded, the system snapshot is removed from all nodes in parallel and the previous StatefulSet UpdateStrategy is restored.
At this point, all of your nodes should already be running the desired version.
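
A hedged way to confirm this is to compare the version desired in the spec with the one reported per rack in the status (the lowercase jsonpath field paths are assumptions inferred from the `describe` output above):
```bash
# Desired version
$ kubectl -n scylla get ScyllaCluster simple-cluster -o jsonpath='{.spec.version}'
# Version the rack actually reports after the upgrade
$ kubectl -n scylla get ScyllaCluster simple-cluster -o jsonpath='{.status.racks.us-east-1a.version}'
```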

The current state of the upgrade can be traced using the `Current Node`, `Current Rack`, and `State` status fields; a sketch for polling them follows the list of states below.
* `Current Node` shows which node is being upgraded.
* `Current Rack` displays which rack is being upgraded.
* `State` contains information about which stage the upgrade is at.

`State` can have the following values:
* `begin_upgrade` - the upgrade is starting
* `check_schema_agreement` - the Operator waits until all nodes reach schema agreement. It waits for up to 1 minute; if agreement is not reached, an error log message is printed and the check is retried.
* `create_system_backup` - a snapshot of the system keyspaces is being taken
* `find_next_rack` - the Operator determines which rack must be upgraded next; the decision is saved in `Current Rack`
* `upgrade_image_in_pod_spec` - the image and UpdateStrategy are updated in the underlying StatefulSet
* `find_next_node` - the Operator determines which node must be upgraded next; the decision is saved in `Current Node`
* `enable_maintenance_mode` - maintenance mode is being enabled
* `drain_node` - the node is being drained
* `backup_data` - a snapshot of the data keyspaces is being taken
* `disable_maintenance_mode` - maintenance mode is being disabled
* `delete_pod` - the Scylla Pod is being deleted
* `validate_upgrade` - the Operator validates that the new Pod enters the Ready state and that the Scylla version was upgraded
* `clear_data_backup` - the snapshot of the data keyspaces is being removed
* `clear_system_backup` - the snapshot of the system keyspaces is being removed
* `restore_upgrade_strategy` - the previous UpdateStrategy is restored in the underlying StatefulSet
* `finish_upgrade` - upgrade cleanup
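
As a minimal sketch, these fields can be polled while the upgrade runs (the lowercase jsonpath field names under `.status.upgrade` are assumptions inferred from the `describe` output shown earlier):
```bash
# Print the current rack, node and state every 5 seconds
$ watch -n 5 "kubectl -n scylla get ScyllaCluster simple-cluster -o jsonpath='{.status.upgrade.currentRack} {.status.upgrade.currentNode} {.status.upgrade.state}'"
```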

**Recovering from upgrade failure**

The upgrade may get stuck at the `validate_upgrade` stage. This happens when the Scylla Pod refuses to boot up properly.

To continue with the upgrade, first turn off the Operator by scaling its replicas to zero:
```bash
$ kubectl -n scylla-operator-system scale sts scylla-operator-controller-manager --replicas=0
```
Then the user has to resolve the issue with Scylla manually, by checking the root cause of the failure in the Scylla container logs.
If needed, SSTable snapshots of the data and system keyspaces are available on the node. You can check the ScyllaCluster status for their names.
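
For example, to inspect the Scylla container logs (the Pod name is the one from the status example above; `--previous` helps when the container keeps restarting):
```bash
$ kubectl -n scylla logs simple-cluster-us-east-1-us-east-1a-2 -c scylla
$ kubectl -n scylla logs simple-cluster-us-east-1-us-east-1a-2 -c scylla --previous
```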

Once the issue is resolved and the Scylla Pod is up and running (the Pod is in the Ready state), scale the Operator back to one replica:
```bash
$ kubectl -n scylla-operator-system scale sts scylla-operator-controller-manager --replicas=1
```

The Operator should continue the upgrade process from where it left off.

### Replacing a Scylla node
In the case of a host failure, it may not be possible to bring back the node to life.

@@ -31,7 +134,7 @@ _This procedure is for replacing one dead node. To replace more than one dead no
simple-cluster-us-east-1-us-east-1a-1 ClusterIP 10.43.125.110 <none> 7000/TCP,7001/TCP,7199/TCP,10001/TCP,9042/TCP,9142/TCP,9160/TCP 3h11m
simple-cluster-us-east-1-us-east-1a-2 ClusterIP 10.43.43.51 <none> 7000/TCP,7001/TCP,7199/TCP,10001/TCP,9042/TCP,9142/TCP,9160/TCP 3h5m
```
1. Drain the node which we would like to replace. **This command may delete your data from the local disks attached to the given node!**
```bash
$ kubectl drain gke-scylla-demo-default-pool-b4b390a1-6j12 --ignore-daemonsets --delete-local-data
```
@@ -77,4 +180,24 @@ _This procedure is for replacing one dead node. To replace more than one dead no
In case your k8s cluster loses one of its nodes due to an incident or explicit removal, Scylla Pods may become unschedulable due to PVC node affinity.

When the `automaticOrphanedNodeCleanup` flag is enabled in your ScyllaCluster, the Scylla Operator will perform automatic
node replacement of a Pod which lost its bound resources.

### Maintenance mode

When maintenance mode is enabled, the readiness probe of the Scylla Pod will always return failure and the liveness probe will always succeed. As a result, the Pod under maintenance
is removed from K8s load balancing and the DNS registry, but the Pod itself stays alive.
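
One way to observe this is to check the endpoints of the member Service; while maintenance mode is on, they should be empty (the Service name is the one used in the examples below):
```bash
$ kubectl -n scylla get endpoints simple-cluster-us-east1-b-us-east1-2
```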

This allows the Scylla Operator to interact with Scylla and its dependencies inside the Pod.
For example, the user may turn off the Scylla process, do something with the filesystem, and bring the process back again.
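
As a sketch only: if the Scylla container is managed by supervisord (an assumption about the image, not confirmed by this document), stopping and restarting the process might look like this:
```bash
# Hypothetical: assumes supervisord manages the scylla process inside the container
$ kubectl -n scylla exec simple-cluster-us-east1-b-us-east1-2 -c scylla -- supervisorctl stop scylla
$ kubectl -n scylla exec simple-cluster-us-east1-b-us-east1-2 -c scylla -- supervisorctl start scylla
```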

To enable maintenance mode, add the `scylla/node-maintenance` label to the Service in front of the Scylla Pod.

```bash
$ kubectl -n scylla label svc simple-cluster-us-east1-b-us-east1-2 scylla/node-maintenance=""
```
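
Since the member Service shares its name with the Pod (an assumption based on the naming in this document), you can confirm the Pod dropped out of the ready state:
```bash
# READY should show 0/1 while maintenance mode is enabled
$ kubectl -n scylla get pod simple-cluster-us-east1-b-us-east1-2
```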

To disable it, simply remove this label from the Service.

```bash
$ kubectl -n scylla label svc simple-cluster-us-east1-b-us-east1-2 scylla/node-maintenance-
```
