
feat: add prometheusRule #1703

Open · wants to merge 7 commits into base: main from feat/add.prometheusRule
Conversation

micborens

  • Adds PrometheusRule manifest creation, with the flexibility for users to enrich alerts
  • Follows best practice for the alert naming convention
  • The alert enrichment logic is based on what is done in the official kube-prometheus-stack

Default alerts are based on what I found in this PR

I'm happy to maintain this part of the Helm chart if needed.

@micborens micborens force-pushed the feat/add.prometheusRule branch 2 times, most recently from 7fd5784 to d1cc9dd Compare February 8, 2024 21:44
@Vad1mo
Member

Vad1mo commented Feb 8, 2024

@micborens this is probably related to PR #1635.
I would suggest syncing with @Allex1 so we can land a single version.

@micborens
Author

@Allex1 I appreciate your earlier PR, but it isn't flexible enough for the way we use the chart at my company, nor for the community in general. That's why I opened this PR.
If you take a look at my changes you will see that I covered everything in yours except the Grafana dashboard, which is in my view a specific case.

@Vad1mo
Member

Vad1mo commented Feb 9, 2024

This generally looks good. I still need to test it and give it a try myself, but I support this approach!

@Allex1

Allex1 commented Feb 9, 2024

@micborens tbh the extra labels and annotations for each alert group seem unnecessary, but it's not my call. If you add Grafana support to this PR, I'm willing to close mine in favor of it.

@micborens micborens force-pushed the feat/add.prometheusRule branch 3 times, most recently from e6ee715 to fa75afb Compare February 12, 2024 08:19
@micborens
Author

micborens commented Feb 12, 2024

Hello @Allex1, I've included your Grafana dashboard file in my PR.
As in yours, it's disabled by default so it doesn't impact users who don't need this ConfigMap.

> tbh the extra labels and annotations for each alert group seems unnecessary but it's not my call

Having the option to enrich a Kubernetes resource (or not) is necessary when the chart is open source. And I need it for my own usage :)
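To make that enrichment concrete, here is a hypothetical values fragment; the key names follow the `metrics.rules.*` options documented later in this PR, and the exact semantics are my assumption rather than a quote from the chart:

```yaml
# Hypothetical values.yaml fragment (key names assumed from this PR's docs)
metrics:
  rules:
    enabled: true
    additionalLabels:         # labels added to the PrometheusRule object itself
      prometheus: platform
    additionalRuleLabels:     # labels injected into every default alert
      team: registry
    additionalRuleGroupLabels:
      HarborCoreDown:         # labels applied only to the HarborCoreDown group
        runbook: "harbor-core-down"
```

With a layout like this, an operator can route or annotate alerts per team or per group without forking the chart's rule templates.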

@micborens
Author

Hello @Vad1mo do you think we can merge this PR?

@Vad1mo
Member

Vad1mo commented Feb 21, 2024

IMO it's good, but you need to address the open points.
Keep in mind that it should be clear and understandable for thousands of developers just by looking at the values.yaml file.

@MinerYang
Collaborator

MinerYang commented Feb 22, 2024

Hi @micborens @Allex1,

We are aware of the many requests for PrometheusRule support in Harbor, and having it would make it more convenient for users to visualize metrics.
However, our concerns are:

  • The Harbor Helm chart is not an all-in-one toolkit, and we would prefer not to include CRs other than core Kubernetes resources.
  • Since this introduces external CRs such as PrometheusRule and ServiceMonitor, we have to take long-term maintenance into consideration given our limited resources.
  • Making all of these rules configurable may be too much for values.yaml, however necessary it is for the various use cases.

So we are wondering whether it could be made into a plugin with a standalone repository that we co-maintain on demand; that would possibly also need a process to move the existing metrics-svcmon part along with it.

We would appreciate it if you are willing to continue contributing this part to our community!

Best,
Miner

@micborens
Author

micborens commented Feb 26, 2024

Hello

@Vad1mo I understood your point about keeping the values as clear as possible. That's why I've put comments in the files to explain every part.
To give you another point of view: people who need this feature can simply enable it without touching the rest. Of course, I included a lot :D of variables to let community users customize their alerts and deployment.
I think anyone who understands Helm and/or the Prometheus Operator CRs can easily understand this configuration.
By the way, several open source projects (Mimir, Loki, etc.) take the same approach in their Helm charts, shipping default alerts and Grafana components that can be turned off.

@MinerYang the Prometheus Operator does look like it has become a standard on Kubernetes. In any case, the templates are guarded by a check condition (presence of the CRDs).
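A guard of that kind is commonly written with Helm's built-in capability check; this is a sketch of the pattern, not necessarily the exact template in this PR:

```yaml
# Sketch: render the CR only when enabled AND the PrometheusRule CRD is installed
{{- if and .Values.metrics.rules.enabled
           (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1/PrometheusRule") }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: {{ .Release.Name }}-harbor-rules
{{- end }}
```

Note that `.Capabilities.APIVersions.Has` only sees CRDs already installed in the cluster at render time, so `helm template` without a cluster connection will evaluate it as false.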

> So, we are thinking if it's possible to make it as a plugin which should have a standalone repository to co-work on demand, and possibly need a process to move existing metrics-svcmon part along with it.

You are probably referring to a subchart or a library chart. I don't think that is useful here, unless you want to reuse it in another chart.

I reiterate my interest in the monitoring part, and I am ready to help maintain the feature.


This PR is being marked stale due to a period of inactivity. If this PR is still relevant, please comment or remove the stale label. Otherwise, this PR will close in 30 days.

@Allex1

Allex1 commented Jun 27, 2024

@micborens @Vad1mo This is becoming ridiculous. There are two PRs from two different people trying to help improve your Helm chart, open since Nov 2, 2023, which have not been merged or even commented on since Feb 26, although other people expressed their support. This attitude makes us very worried about using the product at all.

@micborens
Author

> @micborens @Vad1mo This is becoming ridiculous. There are 2 prs from 2 different ppl trying to help improve your helm chart since Nov 2 2023 which have not been merged or at least commented on since Feb 26 although other ppl expressed their support. This attitude is making us very worried about using the product at all.

Hello @Allex1, since my previous comment I haven't received any response, even though I explained why I implemented the feature the way I did. It is inspired by the official community charts (Grafana and/or Prometheus), so I think it respects the majority of the guidelines.
The goal is to give people the flexibility to configure their resources without being able to break the component itself.
Yes, the config is long, but at least it covers all the configuration allowed by the Prometheus Operator.

| `metrics.rules.additionalAggregationLabels` | Additional labels to use when aggregating in expressions | `{}` |
| `metrics.rules.additionalLabels` | Additional labels for the PrometheusRule resource | `{}` |
| `metrics.rules.additionalRuleLabels` | Additional labels for every PrometheusRule alert | `{}` |
| `metrics.rules.additionalRuleGroupLabels.HarborCoreDown` | Additional labels for the HarborCoreDown alert group | `{}` |
Member

Can you elaborate and make it clear in the README comments what "down" means?
How is it determined that the component is down?

Author

"Down" means the component is not responding. Based on the exported metrics, I've set a default alert in the file templates/metrics/_alerts.yaml.tpl:

- alert: HarborCoreDown
  expr: |-
    harbor_up{exported_component="core"} == 0
  for: 5m
  labels:
    severity: critical
...
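Such a rule can be exercised with promtool's rule unit tests before shipping it; the file names below are hypothetical, and the alert body is the one quoted above:

```yaml
# tests.yaml -- run with: promtool test rules tests.yaml
rule_files:
  - alerts.yaml              # file containing the HarborCoreDown rule above
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'harbor_up{exported_component="core"}'
        values: '0x10'       # the core exporter reports "down" for 10 samples
    alert_rule_test:
      - eval_time: 6m        # past the 5m "for" window, so the alert fires
        alertname: HarborCoreDown
        exp_alerts:
          - exp_labels:
              severity: critical
              exported_component: core
```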

| `metrics.serviceMonitor.additionalLabels` | Additional labels to add to the ServiceMonitor manifest | `""` |
| `metrics.serviceMonitor.interval` | Scrape interval for Harbor metrics | `""` |
| `metrics.serviceMonitor.metricRelabelings` | Metric relabelings to add/modify/drop metrics before ingestion | `[]` |
| `metrics.serviceMonitor.relabelings` | Relabelings to add/modify/drop sample labels before scraping | `[]` |
| `metrics.rules.enabled` | Create a PrometheusRule resource. Requires the Prometheus Operator CRDs | `false` |
| `metrics.rules.disabled` | Specify which individual default alerts should be disabled | `{}` |
| `metrics.rules.alerting` | Instead of turning off each alert one by one, set this to `false` to disable all default alerts | `true` |
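Read together, those toggles compose like this hypothetical values fragment (the alert name and the exact semantics are assumed from this PR's docs):

```yaml
metrics:
  rules:
    enabled: true       # render the PrometheusRule manifest at all
    alerting: true      # master switch for all default alerts
    disabled:
      HarborQuotaProjectLimit: true   # opt out of one specific default alert
```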
Member

default should be false

Member

Now I understand it. I think it is not clear what the options actually mean. Can you improve the descriptions?

Author

I will push a commit to explain it a bit more, maybe with an example.

| `metrics.rules.additionalRuleGroupAnnotations.HarborLatency99` | Additional annotations for specific PrometheusRule alert groups HarborLatency99 | `{}` |
| `metrics.rules.additionalRuleGroupAnnotations.HarborRateErrors` | Additional annotations for specific PrometheusRule alert groups HarborRateErrors | `{}` |
| `metrics.rules.additionalRuleGroupAnnotations.HarborQuotaProjectLimit` | Additional annotations for specific PrometheusRule alert groups HarborQuotaProjectLimit | `{}` |
| `metrics.rules.additionalGroups` | Additional rule groups to add to the rules file | `[]` |
Member

What are the additional groups, and how can I set them? From the description, it is not obvious what this option does.

Author

micborens commented Jul 12, 2024

It's consumed in the file templates/metrics/metrics-rules.yaml:

spec:
  groups:
  {{- include "harbor.ruleGroupToYaml" (include "harbor.rules" . | fromYaml).groups | indent 2 }}
  {{- include "harbor.ruleGroupToYaml" .Values.metrics.rules.additionalGroups | indent 2 }}

Prometheus evaluates PrometheusRule resources group by group.
I've defined `include "harbor.rules"` in templates/metrics/_alerts.yaml.tpl, which is the default group.
In addition, a developer using the chart can add more alerts or recording rules in a separate, named group:

metrics:
  rules:
    additionalGroups:
    - name: my_custom_alerts_group
      rules:
      - alert: ImATest
        ....

By the way, I've given an example of how to use it in values.yaml:

    additionalGroups: []
    # - name: additional-harbor-rules
    #   rules:
    #     - record: my:record:name
    #       expr: my_expression
    #     - alert: MyAlertName
    #       expr: my:record:name > 0
    #       annotations:
    #       ...
    #       labels:
    #       ...
    #       for: 1m

@micborens micborens force-pushed the feat/add.prometheusRule branch 2 times, most recently from 1c3325e to 9ff2824 Compare July 18, 2024 16:21
Signed-off-by: Michael Borens <michael.borens@contentsquare.com>