Article about migrating from Consul to memberlist. Added documentation for /memberlist endpoint. #2166

Merged
merged 11 commits into from Jun 24, 2022
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -102,6 +102,8 @@
* [ENHANCEMENT] Clarify "Set rule group" API specification. #1869
* [ENHANCEMENT] Published Mimir jsonnet documentation. #2024
* [ENHANCEMENT] Documented required scrape interval for using alerting and recording rules from Mimir jsonnet. #2147
* [ENHANCEMENT] Added step-by-step article about migrating from Consul to Memberlist KV store using jsonnet without downtime. #2166
* [ENHANCEMENT] Documented `/memberlist` admin page. #2166
* [ENHANCEMENT] Documented how to configure queriers’ autoscaling with Jsonnet. #2128
* [BUGFIX] Fixed ruler configuration used in the getting started guide. #2052
* [BUGFIX] Fixed Mimir Alertmanager datasource in Grafana used by "Play with Grafana Mimir" tutorial. #2115
@@ -0,0 +1,142 @@
---
title: "Migrating from Consul to Memberlist KV store for hash rings without downtime"
menuTitle: "Migrating from Consul to Memberlist"
description: "Learn how to migrate from using Consul as KV store for hash rings to using memberlist without any downtime."
weight: 40
---

# Migrating from Consul to Memberlist KV store for hash rings without downtime

Since Mimir 2.2.0, Mimir Jsonnet uses `memberlist` as the KV store for hash rings.

Memberlist can be disabled by using the following configuration:

```jsonnet
{
  _config+:: {
    memberlist_ring_enabled: false
  }
}
```

If you are running Mimir hash rings with Consul and would like to migrate to `memberlist` without any downtime, follow the instructions in this document.

## Step 1: Enable memberlist and multi KV store

```jsonnet
{
  _config+:: {
    memberlist_ring_enabled: true,
    multikv_migration_enabled: true,
  }
}
```

Step 1 configures components to use the `multi` KV store, with `consul` as the primary store and `memberlist` as the secondary store.
This step requires a rollout of all Mimir components.
After applying this step, all Mimir components expose the [`/memberlist`]({{< relref "../../reference-http-api/index.md#memberlist-cluster" >}}) page on the HTTP admin interface, which you can use to check the health of the Memberlist cluster.

## Step 2: Enable KV store mirroring

```jsonnet
{
  _config+:: {
    memberlist_ring_enabled: true,
    multikv_migration_enabled: true,
    multikv_mirror_enabled: true,  // Changed in this step.
  }
}
```

This step enables mirroring of writes from the primary KV store (Consul) to the secondary store (Memberlist).
Applying this change does not restart Mimir components.

You can monitor the following metrics to check whether mirroring is enabled on all components and works correctly:

- `cortex_multikv_mirror_enabled` – shows which components have KV store mirroring enabled. All Mimir components should start mirroring to the secondary KV store after reloading their runtime configuration.
- `rate(cortex_multikv_mirror_writes_total[1m])` – shows the rate of writes to the secondary KV store, in writes per second.
- `rate(cortex_multikv_mirror_write_errors_total[1m])` – shows the rate of write errors to the secondary KV store, in errors per second.

After mirroring is enabled, you should see a key for each Mimir hash ring on the [Memberlist cluster information]({{< relref "../../reference-http-api/index.md#memberlist-cluster" >}}) admin page.
See the [list of components that use a hash ring]({{< relref "../../architecture/hash-ring/index.md" >}}).

## Step 3: Switch primary and secondary stores

```jsonnet
{
  _config+:: {
    memberlist_ring_enabled: true,
    multikv_migration_enabled: true,
    multikv_mirror_enabled: true,
    multikv_switch_primary_secondary: true,  // Changed in this step.
  }
}
```

This change switches the primary and secondary stores used by the `multi` KV store.
From this point on, components use memberlist as the primary KV store and mirror updates to Consul.
This step does not require a restart of Mimir components.

To check that all components have started to use `memberlist` as the primary store, watch the `cortex_multikv_primary_store` metric.

## Step 4: Disable mirroring to Consul

```jsonnet
{
  _config+:: {
    memberlist_ring_enabled: true,
    multikv_migration_enabled: true,
    multikv_mirror_enabled: false,  // Changed in this step.
    multikv_switch_primary_secondary: true,
  }
}
```

This step does not require a restart of any Mimir component. After applying the change, components stop writing ring updates to Consul and only use `memberlist`.
You can watch the `cortex_multikv_mirror_enabled` metric to check that all components have picked up the updated configuration.

## Step 5: Disable `multi` KV store

```jsonnet
{
  _config+:: {
    memberlist_ring_enabled: true,
    multikv_migration_enabled: false,  // Changed in this step.
    multikv_mirror_enabled: false,
    multikv_switch_primary_secondary: true,
    multikv_migration_teardown: true,  // Added in this step.
  }
}
```

This configuration change causes a new rollout of all components.
After the restart, components no longer use the `multi` KV store and are configured to use `memberlist` only.
Setting `multikv_migration_teardown: true` preserves the runtime configuration for the `multi` KV store for components that haven't restarted yet.

All `cortex_multikv_*` metrics are only exposed by components that use `multi` KV store. As components restart, these metrics will disappear.

> **Note**: setting `multikv_migration_enabled: false` while keeping `memberlist_ring_enabled: true` will also remove Consul! That's expected, since Consul is not used anymore – mirroring to it was disabled in step 4.

If you need to keep Consul running, you can explicitly set `consul_enabled: true` in `_config`.
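
If Consul has to stay deployed, the step 5 configuration might look like the following sketch. It assumes that `consul_enabled` sits in the same `_config` object as the other options, as described above; every other field is copied unchanged from step 5.

```jsonnet
{
  _config+:: {
    memberlist_ring_enabled: true,
    multikv_migration_enabled: false,
    multikv_mirror_enabled: false,
    multikv_switch_primary_secondary: true,
    multikv_migration_teardown: true,
    consul_enabled: true,  // Assumption: keeps Consul deployed although hash rings no longer use it.
  }
}
```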

## Step 6: Cleanup

We have successfully migrated the Mimir cluster from Consul to Memberlist without any downtime!
As a final step, we can remove all migration-related config options:

- `multikv_migration_enabled`
- `multikv_mirror_enabled`
- `multikv_switch_primary_secondary`
- `multikv_migration_teardown`

Our final memberlist configuration will be:

```jsonnet
{
  _config+:: {
    memberlist_ring_enabled: true,
  }
}
```

This does not trigger a new restart of components. After applying this change, the migration is finished.
13 changes: 13 additions & 0 deletions docs/sources/operators-guide/reference-http-api/index.md
@@ -32,6 +32,7 @@ This document groups API endpoints by service. Note that the API endpoints are e
| [Pprof](#pprof) | _All services_ | `GET /debug/pprof` |
| [Fgprof](#fgprof) | _All services_ | `GET /debug/fgprof` |
| [Build information](#build-information) | _All services_ | `GET /api/v1/status/buildinfo` |
| [Memberlist cluster](#memberlist-cluster) | _All services_ | `GET /memberlist` |
| [Remote write](#remote-write) | Distributor | `POST /api/v1/push` |
| [Tenants stats](#tenants-stats) | Distributor | `GET /distributor/all_user_stats` |
| [HA tracker status](#ha-tracker-status) | Distributor | `GET /distributor/ha_tracker` |
@@ -208,6 +209,18 @@ GET <alertmanager-http-prefix>/api/v1/status/buildinfo

This endpoint returns in JSON format information about the build and enabled features. The format returned is not identical, but is similar to the [Prometheus Build Information endpoint](https://prometheus.io/docs/prometheus/latest/querying/api/#build-information).

### Memberlist cluster

```
GET /memberlist
```

This admin page shows information about the Memberlist cluster (the list of nodes and their health) and the KV store (its keys and values).

If memberlist message history is enabled, this page also shows all received and sent messages stored in the buffers. This can be useful for troubleshooting the memberlist cluster. To enable message history buffers, use the `-memberlist.message-history-buffer-bytes` option or the corresponding config key.

This page exists even if Mimir doesn't use memberlist for any of the hash rings, but in that case it only shows a message saying that memberlist is not in use.

## Distributor

The following endpoints relate to the [distributor]({{< relref "../architecture/components/distributor.md" >}}).
18 changes: 1 addition & 17 deletions operations/mimir/memberlist.libsonnet
@@ -19,23 +19,7 @@
// but "primary" KV depends on value of multikv_primary.
memberlist_ring_enabled: true,

// Migrating from consul to memberlist is a multi-step process:
//
// 1) Enable memberlist_ring_enabled=true and multikv_migration_enabled=true, restart components.
//
// 2) Set multikv_mirror_enabled=true. This doesn't require restart.
//
// 3) Set multikv_switch_primary_secondary=true. This doesn't require restart. From this point on components use memberlist as primary KV store!
//
// 4) Set multikv_mirror_enabled=false. Stop mirroring writes to Consul. Doesn't require restart.
//
// 5) Set multikv_migration_enabled=false and multikv_migration_teardown=true. This requires a restart.
// After restart components will only use memberlist. Using multikv_migration_teardown=true guarantees that runtime config
// with multi KV configuration is preserved for components that haven't restarted yet.
//
// Note: this also removes Consul. That's fine, because it's not used anymore (mirroring to it was disabled in step 4).
//
// 6) Set multikv_migration_teardown=false. This step removes runtime configuration for multi KV. It doesn't require a restart of components.
// To migrate from Consul to Memberlist, see the "Migrating from Consul to Memberlist KV store for hash rings" article.
multikv_migration_enabled: false, // Enable multi KV.
multikv_migration_teardown: false, // If multikv_migration_enabled=false and multikv_migration_teardown=true, runtime configuration for multi KV is preserved.
multikv_switch_primary_secondary: false, // Switch primary and secondary KV stores in runtime configuration for multi KV.