Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs: Add Documentation to do HA deduplication on Mimir Helm Deployment #2972

Merged
merged 57 commits into from Oct 20, 2022

Conversation

lamida
Copy link
Contributor

@lamida lamida commented Sep 16, 2022

What this PR does

Add Documentation to do HA deduplication on Mimir Helm Deployment.

Which issue(s) this PR fixes or relates to

Fixes #2597

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@lamida lamida changed the title Add Documentation to do HA deduplication on Mimir Helm Deployment Docs: Add Documentation to do HA deduplication on Mimir Helm Deployment Sep 16, 2022
@lamida lamida added the type/docs Improvements or additions to documentation label Sep 19, 2022
Copy link
Contributor

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start! In general I think we need to add Grafana agent setup and assume that the user already has a working Mimir - to avoid duplicating documentation.

@lamida
Copy link
Contributor Author

lamida commented Sep 19, 2022

@krajorama please help to do another pass when you are free. I will unset draft from this PR too now.

Changes from your first pass:

  • Removed install mimir step and make mimir installated as prerequisite
  • Because reference to install mimir is removed therefore I don't see it makes sense to keep Prometheus helm install guide too
  • Removed grafana port forwarding and data source setup
  • Added Grafana agent setup (although the config and setup are exactly same like Prometheus)

@lamida lamida marked this pull request as ready for review September 19, 2022 14:28
@lamida lamida requested review from osg-grafana and a team as code owners September 19, 2022 14:28
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for writing this! I would also add a mention to this article in the existing "Configuring Grafana Mimir high-availability deduplication"


# Configuring Grafana Mimir Helm Chart for high-availability deduplication with Consul

Grafana Mimir can deduplicate data from high-availability (HA) Prometheus setup. In this guide, you will see how to configure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd also add a link to the HA dedup docs here, since here is the first mention of HA dedup in this doc.

You also have to configure HA for Prometheus or Grafana Agent. Last, you need a KV store. In this guide you will
use Consul. You will be guided on the setup if you haven't had one.

## Configure Prometheus or Grafana Agent to send HA external labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is covered by the other guide. Should we repeat the info here too?

Copy link
Contributor Author

@lamida lamida Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried that keep referring to the external document will make the reader keep jumping around to different documents. External label configuration is the most important part of HA configuration, hence I feel that for good reading/guide continuity, it will be better to keep this here.

Copy link
Contributor

@osg-grafana osg-grafana Oct 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We need to add a note about what content is copied and pasted between documents so that we name one source as definitive. This is a workaround because we do not have single-sourcing capabilities for documentation that is this granular.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to put a note that this section contains similar content from another HA doc. Please review and advise if this note is useful or better just to remove it.

weight: 70
---

# Configuring Grafana Mimir Helm Chart for high-availability deduplication with Consul
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the name of the chart - mimir-distributed? Maybe a tech writer can also pitch in, but using the name seems more concrete and searchable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take a look at this soon; on UTC-4 atm and might not get to this until next Monday.

Comment on lines 47 to 50
Install Consul in Kubernetes using
[Consul helm chart](https://github.com/hashicorp/consul-k8s/tree/main/charts/consul). Follow the documentation to get
the chart and installing a Consul release. Put a note on the Consul endpoint, because you will need it for Mimir
configuration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tried installing consul. Is there anything specific about this installation that they should do? Like configure this as X or the other thing as [A, B, C]? or the default config would work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default config will work.


### HA Deduplication Global

You need a Mimir setup installed by Helm. Add the following configuration to your `values.yaml` to configure HA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick about values.yaml. I think we've used custom.yaml in a few articles so far to refer to the values file that the user supplies. It makes it obvious that it's a different file from the values.yaml in the root of the helm chart. Those values.yaml are the default ones and helm uses them every time, even without providing a flag.

helm -n <mimir-namespace> upgrade --install mimir grafana/mimir-distributed -f values.yaml
```

## Verifying deduplication
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the distributor also exposes some metrics around this. I think it's worth mentioning. They all start with cortex_ha_tracker_, I think the ones that start with cortex_ha_tracker_elected_replica_ will be most useful to users.

These are also exposed on the "Mimir / Tenants" dashboard in the "Distributor deduplicated/non-HA" panel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on another topic - what do you think about moving this section to the other article - "Configuring Grafana Mimir high-availability deduplication"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the distributor also exposes some metrics around this...
I will mention the metrics info in this section.

on another topic - what do you think about moving this section to the other article - "Configuring Grafana Mimir high-availability deduplication"?

I am fine to move the section to the "Configuring Grafana Mimir high-availability deduplication". But maybe we want to hear first what @krajorama or @osg-grafana think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mimir does not have self monitoring like GEM so that won't make sense there in itself.
And the Mimir documentation tries to be deployment agnostic so talking about port forward seems out of scope.
Given that I'd keep these here.

@@ -0,0 +1,138 @@
---
aliases:
- /docs/mimir/latest/operators-guide/configuring/setting-helm-ha-deduplication-consul/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this alias and title to use an imperative: configure-helm-ha-deduplication-consul.

deduplication globally.

```yaml
# other configurations above
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean, "Put other configurations before this one." or something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the section above this comment is all configurations added when installing mimir-distributed helm chart in the first time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway to make it non confusing, I just removed the whole comment.

host: <consul-endpoint> # example: http://consul.consul.svc.cluster.local:8500
```

Upgrade the Mimir's helm release using the above configuration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Upgrade the Mimir's helm release using the above configuration.
1. Upgrade the Mimirs helm chart version by using the following configuration.

This looks inaccurate, so I changed it to "following".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "following" code block is not a configuration but is a command. Hence I changed this to:

2. Upgrade the Mimir's helm release using the following command:

ha_replica_label: __replica__
```

Upgrade the Mimir's helm release using the above configuration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Upgrade the Mimir’s helm chart version by using the following configuration:

@osg-grafana
Copy link
Contributor

The image docs/sources/operators-guide/configure/setting-helm-ha-deduplication-consul/verify-deduplication.png is unreadable. Do you have one that has higher readability?

@pracucci
Copy link
Collaborator

pracucci commented Oct 7, 2022

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG document. Thanks!

CHANGELOG.md Outdated Show resolved Hide resolved
Comment on lines 26 to 30
Configure Prometheus or Grafana Agent HA setup by setting label called `cluster` and `__replica__`. These two labels
name are default labels for HA setup in Grafana Mimir. If you want to change the labels, make sure to change it in Mimir
too to make Grafana Mimir and Prometheus/Grafana Agent config matches with each other, otherwise HA deduplication
wouldn't work. `cluster` label value must be same across replica belong to the same cluster. `__replica__` must be
unique across different replica in the cluster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Configure Prometheus or Grafana Agent HA setup by setting label called `cluster` and `__replica__`. These two labels
name are default labels for HA setup in Grafana Mimir. If you want to change the labels, make sure to change it in Mimir
too to make Grafana Mimir and Prometheus/Grafana Agent config matches with each other, otherwise HA deduplication
wouldn't work. `cluster` label value must be same across replica belong to the same cluster. `__replica__` must be
unique across different replica in the cluster.
Configure the Prometheus or Grafana Agent HA setup by setting the labels named `cluster` and `__replica__`,
which are the default labels for a HA setup in Grafana Mimir. If you want to change a label,
make sure to change it in Mimir as well. This ensures that the configurations of Grafana Mimir, Prometheus, and Grafana Agent all match each other. Otherwise, HA deduplication will not work.
* The value of the `cluster` label must be same across replica that belong to the same cluster.
* The value of the `__replica__` label must be unique across different replica within the same cluster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please vet this for technical accuracy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For technical accuracy, the second sentence should be something like

If you want to change the HA labels, make sure to change them in Mimir as well.

instead of

If you want to change a label, make sure to change it in Mimir as well.

because we are talking about specific label pairs with the default name cluster and __replica__.

Please review it again.

host: <consul-endpoint> # example: http://consul.consul.svc.cluster.local:8500
```

2. Upgrade the Mimir's helm release using the following command:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
2. Upgrade the Mimir's helm release using the following command:
2. Upgrade the Mimir's Helm chart release using the following command:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this Helm or the Helm chart? Please vet my change for technical accuracy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are upgrading mimir-distributed helm chart release. Since we are upgrading the mimir-distributed chart using the commands that follow that line, a new release will be installed.

See this reference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are upgrading mimir-distributed helm chart release. Since we are upgrading the mimir-distributed chart using the commands that follow that line, a new release will be installed.

See this reference

lamida and others added 8 commits October 18, 2022 16:07
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
…lication-consul/index.md

Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
…lication-consul/index.md

Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
…lication-consul/index.md

Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
…lication-consul/index.md

Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
@krajorama
Copy link
Contributor

This is looking good to me now overall. I wonder where to add some information about using sharding in prometheus operator that I discovered during a support case: https://github.com/grafana/support-escalations/issues/4030#issuecomment-1262289642
I'm fine to add it in a separate PR though after we merge this.

Copy link
Contributor

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With two nitpicks

@lamida
Copy link
Contributor Author

lamida commented Oct 19, 2022

This is looking good to me now overall. I wonder where to add some information about using sharding in prometheus operator that I discovered during a support case: https://github.com/grafana/support-escalations/issues/4030#issuecomment-1262289642
I'm fine to add it in a separate PR though after we merge this.

Yes most likely it is better to be covered in separate PR. I am still more than happy to work on that.

CHANGELOG.md Outdated
@@ -138,6 +138,7 @@
### Documentation

* [ENHANCEMENT] Added documentation on how to configure storage retention. #2970
* [ENHANCEMENT] Documented how to configure HA deduplication using Consul in a Mimir Helm deployment. #2972
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase to current main and move this line to around line 15 in the main/unreleased section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry missed to do it immediately yesterday. Updated now.

@krajorama krajorama merged commit f51c880 into main Oct 20, 2022
@krajorama krajorama deleted the lamida/helm-doc-ha-consul branch October 20, 2022 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release/notified-changelog-cut type/docs Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Docs: Helm: how to set up HA dedup with external Consul
5 participants