
Use Kubernetes Recommended Labels #38

Open
rlex opened this issue Oct 21, 2021 · 8 comments

Comments

@rlex

rlex commented Oct 21, 2021

Hello, and thanks for great tool!
I think it needs two things:

  1. Delete generated rules after a Pyrra object is removed. Right now they are left as-is. What about adding labels like

     managed-by=pyrra
     pyrra-slo-name=prometheus-http-error

     and some controller to ensure the rules are in place / removed?

  2. Adjust the alert name by adding a prefix / suffix to the alertname. At the moment all alerts are named "ErrorBudgetBurn".
@metalmatze
Member

Hey, thanks for trying it out and opening this issue. I'm sure there are plenty more things (I have such a long list in my head too), so feel free to keep opening issues.

  1. I created "Delete generated PrometheusRule and rule files" (#39) for the deletion issue.

Could you give a bit more context about the labels? Is that just adding the common Kubernetes labels to the generated PrometheusRule CRs?

  2. We can certainly make the alertnames configurable. Keep in mind that the alertname is just another label on the alert itself, so alerts are usually identified by their full label set too, like:

     {alertname="ErrorBudgetBurn", job="kube-dns", long="30m", short="3m", slo="coredns-response-errors"}
     {alertname="ErrorBudgetBurn", job="kube-dns", long="3h", short="15m", slo="coredns-response-errors"}

Therefore I'm not super concerned about the alertname itself, but we can still happily make it configurable.

I'll rename this issue to be about the second part of the original comment, as I opened a separate issue for 1.

@rlex
Author

rlex commented Oct 22, 2021

Could you give a bit more context about the labels? Is that just adding the common Kubernetes labels to the generated PrometheusRule CRs?

Yes, precisely
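
For reference, the common labels in question are the app.kubernetes.io/* set recommended by the Kubernetes documentation. A sketch of what a generated PrometheusRule could carry (the pyrra.dev/slo label name is an assumption for illustration, not an existing Pyrra convention):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-http-error
  labels:
    # Kubernetes recommended labels
    app.kubernetes.io/name: prometheus-http-error
    app.kubernetes.io/managed-by: pyrra
    app.kubernetes.io/part-of: pyrra
    # hypothetical Pyrra-specific label, usable for lookup / garbage collection
    pyrra.dev/slo: prometheus-http-error
spec:
  groups: []
```

With a managed-by label like this, a controller could list and clean up the rules it owns after the SLO object is deleted.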

@metalmatze metalmatze changed the title Improve generated prometheus-operator CRD objects? Use Kubernetes Recommended Labels Nov 12, 2021
@hsolberg
Contributor

Hi! We have a similar use case. When more than one team uses the same Pyrra and Prometheus Operator, it's difficult to tell the alert rules apart, and Pyrra can get a little cluttered. Some customization to include existing Kubernetes labels, as well as to add new labels, would be great!

type Objective struct {
	Labels      map[string]string `json:"labels"`
	Description string            `json:"description"`
	Target      float64           `json:"target"`
	Window      int64             `json:"window"`
	Config      string            `json:"config"`
	Indicator   *Indicator        `json:"indicator,omitempty"`
}

// NewObjective instantiates a new Objective object
// This constructor will assign default values to properties that have it defined,
// and makes sure properties required by API are set, but the set of arguments
// will change when the set of required properties is changed
func NewObjective(labels map[string]string, description string, target float64, window int64, config string) *Objective {
	this := Objective{}
	this.Labels = labels
	this.Description = description
	this.Target = target
	this.Window = window
	this.Config = config
	return &this
}

From rules.go#L103-L104

        ruleLabels := map[string]string{}
        ruleLabels["slo"] = sloName

Adding a new entry here for custom labels, for instance:

        ruleLabels["team"] = "your-team-name"

The recording rules also seem to be missing namespace: namespace is added automatically to the Pyrra config, but not to the recording rules, and is therefore not visible in the alert rules.
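
A minimal Go sketch of the label merging discussed above (the mergeRuleLabels helper and its signature are assumptions for illustration, not Pyrra's actual code):

```go
package main

import "fmt"

// mergeRuleLabels merges custom labels from an SLO object into the
// generated rule labels. Reserved labels such as "slo" and "namespace"
// are applied last, so user-supplied labels cannot overwrite them.
// Hypothetical helper for illustration only.
func mergeRuleLabels(sloName, namespace string, custom map[string]string) map[string]string {
	ruleLabels := map[string]string{}
	for k, v := range custom {
		ruleLabels[k] = v
	}
	// Reserved labels always win over custom ones.
	ruleLabels["slo"] = sloName
	ruleLabels["namespace"] = namespace
	return ruleLabels
}

func main() {
	labels := mergeRuleLabels("prometheus-http-error", "monitoring", map[string]string{
		"team": "your-team-name",
		"slo":  "attempted-override", // ignored: reserved label wins
	})
	fmt.Println(labels["slo"], labels["namespace"], labels["team"])
}
```

Applying custom labels first and reserved labels last keeps the generated rules consistent even if a user accidentally sets a label Pyrra relies on.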

// MultiBurnrateAlert struct for MultiBurnrateAlert
type MultiBurnrateAlert struct {
	Severity string   `json:"severity"`
	For      int64    `json:"for"`
	Factor   float64  `json:"factor"`
	Short    Burnrate `json:"short"`
	Long     Burnrate `json:"long"`
	State    string   `json:"state"`
}

For Pyrra (frontend), maybe add some (optional) top-level grouping as well. That way you could add more SLOs for the same team and group them under a team page.

@metalmatze
Member

That sounds very reasonable @hsolberg 👍
Can you give an even more concrete example? I'd use that as the baseline for implementing future unit tests to make sure things are properly propagated through all components.

@hsolberg
Contributor

hsolberg commented Mar 3, 2022

Thanks for the fast reply! Maybe I should create a separate issue for the frontend part, since it's a separate thing? I don't know whether you prefer "user story" tickets or tickets specific to the part that needs to change (the frontend, or the logic for generating alerts, for instance).

Frontend
Regarding concrete examples. After conferring with a colleague (developer) to cover some scenarios, something like this for the presentation in Pyrra (frontend) perhaps?

|__<group-1>
|  |__<app-1>
|  |  |__ <metric-1>
|  |  |__ <metric-2>
|  |  .
|  |  .
|  |__<app-2>
|  |  |__<metric-1>
|  |  .
|
|__<group-2>
|  |__<app-1>
|  |  |__<metric-1>

A concrete instance would look something like this:

|__team-api
|  |__api-app-auth
|  |  |__auth-error-rate
|  |  |__latency
|  |  |__system-error-rate
|  |__api-app-integration
|  |  |__system-error-rate
|
|__team-frontend
|  |__frontend-app
|  |  |__frontend-error-rate
|  |  |__frontend-latency

Alerts
As for the alerts, I already see the label slo being added based on what we put in the name field in metadata (see the example below). So that part should be easy to differentiate once the new Prometheus filter lands in the next version (hopefully), along with the possibility of using pre-filtered links from Pyrra in the near future. That only leaves the need to include namespace in the alert rule, as well as the possibility of adding more (custom) labels.

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: your-slo-name-here
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: '99.0'
  window: 7d
  indicator:
    ratio:
      errors:
        metric: frontend_request_counter_total{status=~"5..",app="app-name"}
      total:
        metric: frontend_request_counter_total{app="app-name"}

Looking at the alert when firing with the rules above (this is the same as the last entry in the table Pyrra shows):

ALERTS{alertname="ErrorBudgetBurn", alertstate="firing", long="1d", severity="warning", short="1h30m", slo="your-slo-name-here"}

When comparing this to what Pyrra shows in the table, maybe the table should be consistent with the labels added to the alert? Also, namespace is missing; it's only visible as a label under the objective's name and at the top of the objective page you navigate to.


Since the alerts are generated, it's a bit difficult to name them without being generic, as they are now. One solution could be to add the exhaustion to the alertname as well as adding the labels. That way you get a quick overview without having to expand all the alerts in Prometheus, and you can use the labels to filter further. Adding namespace would also help with filtering, so maybe something like this?

ALERTS{alertname="ErrorBudgetBurn1w", alertstate="firing", long="1d", severity="warning", short="1h30m", slo="your-slo-name-here", namespace="your-namespace", exhaustion="1w", threshold="0.010", team="your-team"}

The last label would be a custom one, just to be able to identify the target responder for the alert. I didn't want to change too much of how it works now, as I'm not 100% sure what it ideally would look like. So I'm trying to stick to the bare minimum in the examples and see if anyone else has other views and suggestions. I'll need to think it over some more, but this is the gist of it. 😅

@metalmatze
Member

Perfect, thank you for that very in-depth comment. Indeed, we should probably track that in a separate issue from this one.
All this really comes down to is propagating the labels all the way from the SLO CRD through Pyrra until they show up in the alerts.

Let's imagine this label is added in here:

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: your-slo-name-here
  labels:
    prometheus: k8s
    role: alert-rules
+   team: pyrra
spec:
  target: '99.0'
  window: 7d
  indicator:
    ratio:
      errors:
        metric: frontend_request_counter_total{status=~"5..",app="app-name"}
      total:
        metric: frontend_request_counter_total{app="app-name"}

Then it should show up in the list page of Pyrra too as team=pyrra.

In the end, the same label needs to be part of the alert:

ALERTS{alertname="ErrorBudgetBurn1w", alertstate="firing", long="1d", severity="warning", short="1h30m", slo="your-slo-name-here", namespace="your-namespace", exhaustion="1w", threshold="0.010", team="pyrra"}

I'll open a separate issue and we'll look into it. 👍
Lastly, you need to configure your Alertmanager to route the alerts correctly based on the team label, as per our example.
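
A minimal Alertmanager routing sketch for that, assuming a receiver named team-pyrra exists (receiver names here are placeholders):

```yaml
route:
  receiver: default
  routes:
    # Alerts carrying team=pyrra go to that team's receiver.
    - matchers:
        - team = "pyrra"
      receiver: team-pyrra
receivers:
  - name: default
  - name: team-pyrra
    # slack_configs / email_configs / webhook_configs go here
```

The matchers syntax requires Alertmanager 0.22 or later; older versions use the match / match_re fields instead.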

@hsolberg
Contributor

hsolberg commented Oct 5, 2022

Hi! A follow-up question; I'm not sure if it should be asked here or in the closed ticket.
The namespace label is used in the Pyrra UI etc., but it doesn't seem to be included in the generated alert rules. Is this something that could be added by default? Or maybe it's already part of the recent changes on master that haven't landed in a release yet? 😄

@metalmatze
Member

Good point, I'm not sure why it never came up.
We should indeed add it.
Would you mind opening another issue for this one?
Thanks!
