
feat: add prometheusRule #1703

Open · wants to merge 7 commits into base: main from feat/add.prometheusRule
Conversation

micborens

  • Adds PrometheusRule manifest creation, with the flexibility for users to enrich alerts
  • Follows best practice for the alert naming convention
  • The alert enrichment logic is based on what is done in the official kube-prometheus-stack

Default alerts are based on what I found in this PR

I'm happy to maintain this part of the Helm chart if needed.

@micborens micborens force-pushed the feat/add.prometheusRule branch 2 times, most recently from 7fd5784 to d1cc9dd Compare February 8, 2024 21:44
@Vad1mo
Member

Vad1mo commented Feb 8, 2024

@micborens this is probably related to PR #1635.
I would suggest syncing with @Allex1 so we can land a single version.

@micborens
Author

@Allex1 I appreciate your earlier PR, but it isn't flexible enough for the way we use the chart at my company, nor for the community in general. That's why I opened this PR.
If you take a look at my changes you will see that I covered everything in yours except the Grafana dashboard, which is in my view a specific case.

@Vad1mo
Member

Vad1mo commented Feb 9, 2024

This generally looks good. I still need to test it and give it a try myself, but I support this approach!

@Allex1

Allex1 commented Feb 9, 2024

@micborens tbh the extra labels and annotations for each alert group seem unnecessary, but it's not my call. If you add Grafana support to this PR, I'm willing to close mine in favor of it.

@micborens micborens force-pushed the feat/add.prometheusRule branch 3 times, most recently from e6ee715 to fa75afb Compare February 12, 2024 08:19
@micborens
Author

micborens commented Feb 12, 2024

Hello @Allex1, I've included your Grafana dashboard file in my PR.
As in yours, it's disabled by default so it doesn't impact users who don't need this ConfigMap.

> tbh the extra labels and annotations for each alert group seems unnecessary but it's not my call

Having the option to enrich a Kubernetes resource (or not) is necessary when the chart is open source. And I need it for my own usage :)
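To make that enrichment concrete, here is a hypothetical values fragment; the key names follow the `metrics.rules.*` options documented later in this PR, and the exact semantics are my assumption rather than a quote from the chart:

```yaml
# Hypothetical values.yaml fragment (key names assumed from this PR's docs)
metrics:
  rules:
    enabled: true
    additionalLabels:         # labels added to the PrometheusRule object itself
      prometheus: platform
    additionalRuleLabels:     # labels injected into every default alert
      team: registry
    additionalRuleGroupLabels:
      HarborCoreDown:         # labels applied only to the HarborCoreDown group
        runbook: "harbor-core-down"
```

With a layout like this, an operator can route or annotate alerts per team or per group without forking the chart's rule templates.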

@micborens
Author

Hello @Vad1mo do you think we can merge this PR?

@Vad1mo
Member

Vad1mo commented Feb 21, 2024

IMO it's good, but you need to address the open points.
Keep in mind that it should be clear and understandable for thousands of developers just by looking at the values.yaml file.

@MinerYang
Collaborator

MinerYang commented Feb 22, 2024

Hi @micborens @Allex1,

We are aware of the many requests for PrometheusRule support in Harbor, and having it would make it more convenient for users to visualize metrics.
However, our concerns are:

  • The Harbor Helm chart is not an all-in-one toolkit, and we would prefer not to include CRs other than core Kubernetes resources.
  • Since this introduces external CRs such as PrometheusRule and ServiceMonitor, we have to take long-term maintenance into consideration given our limited resources.
  • Making all of these rules configurable may be too much for values.yaml, however necessary it is for the various use cases.

So we are wondering whether it could be made into a plugin with a standalone repository that we co-maintain on demand; that would possibly also need a process to move the existing metrics-svcmon part along with it.

We would appreciate it if you are willing to continue contributing this part to our community!

Best,
Miner

@micborens
Author

micborens commented Feb 26, 2024

Hello

@Vad1mo I understood your point about keeping the values as clear as possible. That's why I've put comments in the files to explain every part.
To give you another point of view: people who need this feature can simply enable it without touching the rest. Of course, I included a lot :D of variables to let community users customize their alerts and deployment.
I think anyone who understands Helm and/or the Prometheus Operator CRs can easily understand this configuration.
By the way, several open source projects (Mimir, Loki, etc.) take the same approach in their Helm charts, shipping default alerts and Grafana components that can be turned off.

@MinerYang the Prometheus Operator does look like it has become a standard on Kubernetes. In any case, the templates are guarded by a check condition (presence of the CRDs).
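A guard of that kind is commonly written with Helm's built-in capability check; this is a sketch of the pattern, not necessarily the exact template in this PR:

```yaml
# Sketch: render the CR only when enabled AND the PrometheusRule CRD is installed
{{- if and .Values.metrics.rules.enabled
           (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1/PrometheusRule") }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: {{ .Release.Name }}-harbor-rules
{{- end }}
```

Note that `.Capabilities.APIVersions.Has` only sees CRDs already installed in the cluster at render time, so `helm template` without a cluster connection will evaluate it as false.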

> So, we are thinking if it's possible to make it as a plugin which should have a standalone repository to co-work on demand, and possibly need a process to move existing metrics-svcmon part along with it.

You are probably referring to a subchart or a library chart. I don't think that is useful here, unless you want to reuse it in another chart.

I reiterate my interest in the monitoring part, and I am ready to help maintain the feature.


This PR is being marked stale due to a period of inactivity. If this PR is still relevant, please comment or remove the stale label. Otherwise, this PR will close in 30 days.

@Allex1

Allex1 commented Jun 27, 2024

@micborens @Vad1mo This is becoming ridiculous. There are two PRs from two different people trying to help improve your Helm chart, open since Nov 2, 2023, which have not been merged or even commented on since Feb 26, although other people expressed their support. This attitude makes us very worried about using the product at all.

@micborens
Author

> @micborens @Vad1mo This is becoming ridiculous. There are 2 prs from 2 different ppl trying to help improve your helm chart since Nov 2 2023 which have not been merged or at least commented on since Feb 26 although other ppl expressed their support. This attitude is making us very worried about using the product at all.

Hello @Allex1, since my previous comment I haven't received any response, even though I explained why I implemented the feature the way I did. It is inspired by the official community charts (Grafana and/or Prometheus), so I think it respects the majority of the guidelines.
The goal is to give people the flexibility to configure their resources without being able to break the component itself.
Yes, the config is long, but at least it covers all the configuration allowed by the Prometheus Operator.

| `metrics.rules.additionalAggregationLabels` | Additional labels to use when aggregating in expressions | `{}` |
| `metrics.rules.additionalLabels` | Additional labels for the PrometheusRule resource | `{}` |
| `metrics.rules.additionalRuleLabels` | Additional labels for every PrometheusRule alert | `{}` |
| `metrics.rules.additionalRuleGroupLabels.HarborCoreDown` | Additional labels for the HarborCoreDown alert group | `{}` |
Member

Can you elaborate and make it clear in the README comments what "down" means?
How is it determined that the component is down?

Author

"Down" means the component is not responding. Based on the exported metrics, I've set a default alert in the file templates/metrics/_alerts.yaml.tpl:

- alert: HarborCoreDown
  expr: |-
    harbor_up{exported_component="core"} == 0
  for: 5m
  labels:
    severity: critical
...
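Such a rule can be exercised with promtool's rule unit tests before shipping it; the file names below are hypothetical, and the alert body is the one quoted above:

```yaml
# tests.yaml -- run with: promtool test rules tests.yaml
rule_files:
  - alerts.yaml              # file containing the HarborCoreDown rule above
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'harbor_up{exported_component="core"}'
        values: '0x10'       # the core exporter reports "down" for 10 samples
    alert_rule_test:
      - eval_time: 6m        # past the 5m "for" window, so the alert fires
        alertname: HarborCoreDown
        exp_alerts:
          - exp_labels:
              severity: critical
              exported_component: core
```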

| `metrics.serviceMonitor.additionalLabels` | Additional labels to add to the ServiceMonitor manifest | `""` |
| `metrics.serviceMonitor.interval` | Scrape interval for Harbor metrics | `""` |
| `metrics.serviceMonitor.metricRelabelings` | Metric relabelings to add/modify/drop metrics before ingestion | `[]` |
| `metrics.serviceMonitor.relabelings` | Relabelings to add/modify/drop sample labels before scraping | `[]` |
| `metrics.rules.enabled` | Create a PrometheusRule resource. Requires the Prometheus Operator CRDs | `false` |
| `metrics.rules.disabled` | Specify which individual default alerts should be disabled | `{}` |
| `metrics.rules.alerting` | Instead of turning off each alert one by one, set this to `false` to disable all default alerts | `true` |
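Read together, those toggles compose like this hypothetical values fragment (the alert name and the exact semantics are assumed from this PR's docs):

```yaml
metrics:
  rules:
    enabled: true       # render the PrometheusRule manifest at all
    alerting: true      # master switch for all default alerts
    disabled:
      HarborQuotaProjectLimit: true   # opt out of one specific default alert
```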
Member

default should be false

Member

Now I understand it. I think it is not clear what the options actually mean. Can you improve the descriptions?

Author

I will push a commit to explain it a bit more, maybe with an example.

| `metrics.rules.additionalRuleGroupAnnotations.HarborLatency99` | Additional annotations for specific PrometheusRule alert groups HarborLatency99 | `{}` |
| `metrics.rules.additionalRuleGroupAnnotations.HarborRateErrors` | Additional annotations for specific PrometheusRule alert groups HarborRateErrors | `{}` |
| `metrics.rules.additionalRuleGroupAnnotations.HarborQuotaProjectLimit` | Additional annotations for specific PrometheusRule alert groups HarborQuotaProjectLimit | `{}` |
| `metrics.rules.additionalGroups` | Additional rule groups to add to the rules file | `[]` |
Member

What are the additional groups, and how can I set them? From the description, it is not obvious what this option does.

Author

micborens commented Jul 12, 2024

It's consumed in the file templates/metrics/metrics-rules.yaml:

spec:
  groups:
  {{- include "harbor.ruleGroupToYaml" (include "harbor.rules" . | fromYaml).groups | indent 2 }}
  {{- include "harbor.ruleGroupToYaml" .Values.metrics.rules.additionalGroups | indent 2 }}

Prometheus evaluates PrometheusRule resources group by group.
I've defined `include "harbor.rules"` in templates/metrics/_alerts.yaml.tpl, which is the default group.
In addition, a developer using the chart can add more alerts or recording rules in a separate, named group:

metrics:
  rules:
    additionalGroups:
    - name: my_custom_alerts_group
      rules:
      - alert: ImATest
        ....

By the way, I've given an example of how to use it in values.yaml:

    additionalGroups: []
    # - name: additional-harbor-rules
    #   rules:
    #     - record: my:record:name
    #       expr: my_expression
    #     - alert: MyAlertName
    #       expr: my:record:name > 0
    #       annotations:
    #       ...
    #       labels:
    #       ...
    #       for: 1m

@micborens micborens force-pushed the feat/add.prometheusRule branch 2 times, most recently from 1c3325e to 9ff2824 Compare July 18, 2024 16:21
Signed-off-by: Michael Borens <michael.borens@contentsquare.com>