docs: add proposal for grafana alerting #1349

theSuess · 2023-12-13T09:52:13Z

This picks up #911 and #1144. The proposal contains three different options for realizing alerting support in the operator and should serve as a base for discussion regarding this topic.

theSuess · 2023-12-13T09:55:50Z

Personally, I'm in favor of option three. This would also allow us to implement this in multiple passes by implementing alert rules first, as the routing is label based anyway.

NissesSenap · 2023-12-18T09:03:04Z

Thanks for taking another shot at this @theSuess , here are my initial thoughts.

Trying to figure out what I feel about the suggestion, I will probably change my mind 3 times ;)
My main concern about reusing GrafanaFolder is that none will find Alerts. It's an awful name, and unless you read the docs of the grafana-opreator or know how the Grafana HTTP API works, you have no idea why it is like this.

I lean towards following how it's done in terraform and have a specific resource for Rule Groups, so doing option 1 but instead of calling itGrafanaAlertFolder I would call it GrafanaRuleGroup or GrafanaAlertGroup.
But then we of course have to ask our self, should we auto create a folder if it don't exist, just as we do for dashboards? Or should we instead start forcing people to use GrafanaFolder?

One of my main concern with this, due to the number of resources that point to each other is how we should connect them.
In a AlertRule should we point to a Datasource UID or a Datasource name? If we use a name we will have to lookup all datasource during every reconciliation and make sure the UID is still the same.
One issue with Datasource is that you can't specify UID in the HTTP request which makes using UID a bit painful.
This is not the same issue in Dashboards since you can create a variable that find Prometheus instances.

This is an easy problem to solve in terraform since you can just reference the Datasource resource in your RuleGroup resource, this is sadly not an easy thing to do in Kubernetes.
It can of course be solved using label selectors in Kubernetes, but my feeling is that it will be a bit painful for end users since it will be a few selectors.

The issue is the same in contact point and notification policy, but in contact point we can at least hard-code UID if we want to. For example, using the Kubernetes UID which we have as a backup solution for dashboards if it's not set.

How do you see GrafanaMuteTimings and GrafanaMessageTemplate? For me, I feel they are rather low priority, but we probably want to implement them in the future so we should at least consider how they should connect with the other resources.

theSuess · 2023-12-18T09:16:55Z

Calling the resource GrafanaRuleGroup Sounds good to me. Creating a folder if it does not exist can certainly be done, not sure if it should be the default though.

Regarding references to the datasource, UIDs can be set during creation - might require some changes in the operator though.

I'm trying to get every API to support UIDs at creation on the grafana side 🤞

Mute timings and Notification templates are identified by name, we should be good here.

NissesSenap · 2023-12-18T09:40:02Z

That is great, I have been looking at the http API docs allot https://grafana.com/docs/grafana/latest/developers/http_api/data_source/, but thinking of it now, those are mostly examples and not the full schema 🤦

Getting API to support UID at creation would be awesome, Igor got an old issue around that for teams if I remember correctly.

How do you see the linking happening between the resources? Should we use UID? It's not the most user-friendly way of solving it. But it sure makes life easy for us ;)

Around performing changes to our existing APIs I'm all open for it. We can always release a v1beta2 CRD version if needed. But if we feel that we need to do that, it would be nice to explain that part in this document as well.

Also, to make this design document easier to read in the future, it would be nice with full examples on how we want the CRDs to look like. I think it will also make it clearer when we do the implementation so we have something to compare it to. Of course, we might change the design a bit during the implementation, but that we at least have a discussion point before doing so.

theSuess · 2023-12-18T11:49:59Z

I think UID linking is the way to go here. We don't have many places where we actually need to link things.

I think this linking makes sense:

flowchart LR
    AlertGroup-- uid -->Folder
    AlertGroup-- label selector --> GrafanaInstance
    AlertGroup-- uid --> DataSource
    Folder-- label selector --> GrafanaInstance

    NotificationPolicy-- label selector --> GrafanaInstance
    NotificationPolicy-- uid --> ContactPoint
    ContactPoint-- label selector --> GrafanaInstance

Everything needs a label selector to decide to which grafana instance this applies to and cross-references are made using UIDs

pb82 · 2023-12-19T10:02:47Z

@theSuess is the idea then to use predictable UIDs (chosen by the user) to make it easier to link the resources together?

theSuess · 2023-12-19T10:09:49Z

Yes, that's the way I'd go about it

theSuess · 2023-12-19T12:52:17Z

We'll also need to gather infos on how alert rule groups interact with subfolders

theSuess · 2023-12-20T16:22:07Z

Updated the proposal with more examples and some more findings

NissesSenap

Added an implementation detail in a comment. Don't know if it needs to be added in the design document, though. But good to think about.

When implementing this feature it would be nice to use the new go-sdk for the grafana API as stataed in #1357.

All in all I think this looks great. Nice job @theSuess

docs/docs/proposals/002-alerting-support.md

NissesSenap · 2023-12-21T09:30:55Z

Would love some feedback on this PR from the community.
So I will ping some old requestors
@apryor6 @R-Studio @siegenthalerroger @rammelmueller

docs/docs/proposals/002-alerting-support.md

This picks up #911 and #1144. The proposal contains three different options for realizing alerting support in the operator and should serve as a base for discussion regarding this topic.

HVBE mentioned this pull request Dec 15, 2023

bump/set default image to grafana 10 #1347

Open

theSuess force-pushed the docs/alerting-proposal branch from 26a9688 to 22877e9 Compare December 20, 2023 16:13

NissesSenap approved these changes Dec 21, 2023

View reviewed changes

docs/docs/proposals/002-alerting-support.md Show resolved Hide resolved

HVBE reviewed Dec 22, 2023

View reviewed changes

docs/docs/proposals/002-alerting-support.md Show resolved Hide resolved

theSuess marked this pull request as ready for review January 9, 2024 12:47

docs: add proposal for grafana alerting

89b4dbb

This picks up #911 and #1144. The proposal contains three different options for realizing alerting support in the operator and should serve as a base for discussion regarding this topic.

theSuess force-pushed the docs/alerting-proposal branch from d928e11 to f464cf1 Compare January 9, 2024 12:48

NissesSenap approved these changes Jan 9, 2024

View reviewed changes

pb82 enabled auto-merge January 9, 2024 12:48

docs: add decision to alerting proposal

c04e2e5

theSuess force-pushed the docs/alerting-proposal branch from f464cf1 to c04e2e5 Compare January 9, 2024 12:49

HVBE approved these changes Jan 9, 2024

View reviewed changes

pb82 merged commit 7df688e into master Jan 9, 2024
10 checks passed

pb82 deleted the docs/alerting-proposal branch January 9, 2024 12:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: add proposal for grafana alerting #1349

docs: add proposal for grafana alerting #1349

theSuess commented Dec 13, 2023

theSuess commented Dec 13, 2023

NissesSenap commented Dec 18, 2023

theSuess commented Dec 18, 2023

NissesSenap commented Dec 18, 2023

theSuess commented Dec 18, 2023 •

edited

pb82 commented Dec 19, 2023

theSuess commented Dec 19, 2023

theSuess commented Dec 19, 2023

theSuess commented Dec 20, 2023

NissesSenap left a comment

NissesSenap commented Dec 21, 2023

docs: add proposal for grafana alerting #1349

docs: add proposal for grafana alerting #1349

Conversation

theSuess commented Dec 13, 2023

theSuess commented Dec 13, 2023

NissesSenap commented Dec 18, 2023

theSuess commented Dec 18, 2023

NissesSenap commented Dec 18, 2023

theSuess commented Dec 18, 2023 • edited

pb82 commented Dec 19, 2023

theSuess commented Dec 19, 2023

theSuess commented Dec 19, 2023

theSuess commented Dec 20, 2023

NissesSenap left a comment

Choose a reason for hiding this comment

NissesSenap commented Dec 21, 2023

theSuess commented Dec 18, 2023 •

edited