Bug: velero metrics stripped #769

Closed
1 task done
felicianmv opened this issue Jul 25, 2022 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@felicianmv

Contact Details

felician.moldovan@flex.com

Describe bug

Hi guys,

I'm trying to set up alerting and reporting for our OADP instance (1.0.3), and Prometheus/Grafana is the logical solution, but from what I can see only two Velero metrics are recorded: velero_backup_total and velero_restore_total.

The upstream Velero docs and the /metrics URL show that Velero actually exposes a lot more data (vanilla_metrics.txt attached), but Prometheus doesn't pick it up.
vanilla_metrics.txt

In the OADP namespace I found a ServiceMonitor object (openshift-adp-velero-metrics-sm) which seems to filter the data recorded by Prometheus, specifically this section:

    metricRelabelings:
    - action: keep
      regex: velero_backup|velero_restore
      sourceLabels:
      - __name__

I changed the above section to

    metricRelabelings:
    - action: keep
      regex: velero_backup.*|velero_restore.*
      sourceLabels:
      - __name__

and now I have all the data to set up monitoring.
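
To illustrate why the trailing `.*` matters: Prometheus anchors relabeling regexes, so a `keep` rule on `__name__` retains a series only when the pattern matches the whole metric name. The sketch below mimics that behaviour with Python's `re.fullmatch`. It is an approximation (Prometheus uses RE2, not Python's engine), the helper `keep_by_name` is hypothetical, and the list of scraped names is an assumed sample rather than the contents of the attached file.

```python
import re


def keep_by_name(metric_names, pattern):
    """Approximate a Prometheus `keep` rule on __name__.

    Prometheus anchors the regex at both ends, so fullmatch is used here.
    """
    rx = re.compile(pattern)
    return [name for name in metric_names if rx.fullmatch(name)]


# Assumed sample of metric names seen on a /metrics endpoint.
SCRAPED = [
    "velero_backup_total",
    "velero_restore_total",
    "velero_backup_failure_total",
    "velero_backup_duration_seconds_bucket",
    "velero_restore_success_total",
    "go_goroutines",
]

# The widened pattern from the modified ServiceMonitor above.
WIDENED = r"velero_backup.*|velero_restore.*"

kept = keep_by_name(SCRAPED, WIDENED)
for name in kept:
    print(name)
```

With the widened pattern every name starting with `velero_backup` or `velero_restore` survives, while unrelated series such as `go_goroutines` are still dropped.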

What happened?

Is there a specific reason for that filter in the ServiceMonitor?
Is it fine to use my solution above for production?
Any plans to remove the filter in a future version of OADP?

OADP Version

1.0.3 (Stable)

OpenShift Version

4.8

Velero pod logs

No response

Restic pod logs

No response

Operator pod logs

No response

New issue

  • This issue is new
felicianmv added the kind/bug label Jul 25, 2022
@shubham-pampattiwar
Member

@felicianmv Thank you for reaching out and raising these concerns. We added a ServiceMonitor that filters the metrics exposed by Velero because most of them did not have bounded cardinality, and we were concerned that this might adversely affect OpenShift's in-cluster monitoring stack.
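
As a rough way to gauge that cardinality before widening the filter, one can count how many distinct label sets each metric has in a scrape of /metrics. The sketch below does this for text in the Prometheus exposition format. The function `series_per_metric` is hypothetical, and the sample text, including its label names and values, is made up for illustration and not taken from the attached file.

```python
import re
from collections import defaultdict

# Made-up sample in the Prometheus text exposition format.
SAMPLE = """\
# TYPE velero_backup_total gauge
velero_backup_total 12
velero_backup_success_total{schedule="daily"} 30
velero_backup_success_total{schedule="hourly"} 700
velero_backup_success_total{schedule=""} 4
velero_backup_failure_total{schedule="daily"} 1
"""

# Metric name, optional {label set}, then whitespace and a value.
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+\S+')


def series_per_metric(text):
    """Count the distinct label sets (time series) seen for each metric name."""
    seen = defaultdict(set)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match:
            seen[match.group(1)].add(match.group(2) or "")
    return {name: len(label_sets) for name, label_sets in seen.items()}


counts = series_per_metric(SAMPLE)
for name, count in sorted(counts.items()):
    print(name, count)
```

A metric whose count grows with every new label value is the kind that can put pressure on a shared Prometheus instance.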

kaovilai closed this as not planned Jul 25, 2022
@QuingKhaos

QuingKhaos commented Aug 1, 2022

@shubham-pampattiwar Can we please have an exposeAllMetrics: true setting on the operator, so that we can override this as long as we are aware of the cardinality consequences? We need to create alerts on production backup schedules.

@kaovilai
Member

kaovilai commented Sep 7, 2022

Do not check the Enable Monitoring checkbox during installation of the OADP Operator.

Then you can have your own Prometheus instance record everything.

mpryc added a commit to mpryc/oadp-operator that referenced this issue Jul 5, 2023
…the Velero deployment.

First change to remove cluster monitoring of OADP that will be replaced by the user workload
monitoring (UWM).

Related Issues:
  openshift#769
  https://issues.redhat.com/browse/OADP-1887
  https://issues.redhat.com/browse/OADP-661

The openshift-adp-velero-metrics-svc is left in place to ease the process of enabling UWM.

Once UWM is enabled, it will require setting up a ServiceMonitor and configuring alerts
or dashboards that are crucial for a particular use case.

Enablement of user workload monitoring with additional documentation is not part
of this PR, so that cluster monitoring can easily be re-added in the future by reverting this change.

Signed-off-by: Michal Pryc <mpryc@redhat.com>
mpryc added a commit to mpryc/oadp-operator that referenced this issue Jul 6, 2023
Documentation for the OADP to use User Workload Monitoring and sample
Alerting Rule.

Depends-On: openshift#1081

Fixes:
  openshift#769
  https://issues.redhat.com/browse/OADP-1887
  https://issues.redhat.com/browse/OADP-661

Signed-off-by: Michal Pryc <mpryc@redhat.com>
openshift-merge-robot pushed a commit that referenced this issue Jul 12, 2023
Documentation for the OADP to use User Workload Monitoring and sample
Alerting Rule.

Depends-On: #1081

Fixes:
  #769
  https://issues.redhat.com/browse/OADP-1887
  https://issues.redhat.com/browse/OADP-661

Signed-off-by: Michal Pryc <mpryc@redhat.com>
openshift-merge-robot pushed a commit that referenced this issue Jul 18, 2023
…the Velero deployment. (#1081)

First change to remove cluster monitoring of OADP that will be replaced by the user workload
monitoring (UWM).

Related Issues:
  #769
  https://issues.redhat.com/browse/OADP-1887
  https://issues.redhat.com/browse/OADP-661

The openshift-adp-velero-metrics-svc is left in place to ease the process of enabling UWM.

Once UWM is enabled, it will require setting up a ServiceMonitor and configuring alerts
or dashboards that are crucial for a particular use case.

Enablement of user workload monitoring with additional documentation is not part
of this PR, so that cluster monitoring can easily be re-added in the future by reverting this change.

Signed-off-by: Michal Pryc <mpryc@redhat.com>
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/oadp-operator that referenced this issue Jul 18, 2023
…the Velero deployment.

First change to remove cluster monitoring of OADP that will be replaced by the user workload
monitoring (UWM).

Related Issues:
  openshift#769
  https://issues.redhat.com/browse/OADP-1887
  https://issues.redhat.com/browse/OADP-661

The openshift-adp-velero-metrics-svc is left in place to ease the process of enabling UWM.

Once UWM is enabled, it will require setting up a ServiceMonitor and configuring alerts
or dashboards that are crucial for a particular use case.

Enablement of user workload monitoring with additional documentation is not part
of this PR, so that cluster monitoring can easily be re-added in the future by reverting this change.

Signed-off-by: Michal Pryc <mpryc@redhat.com>
openshift-merge-robot pushed a commit that referenced this issue Jul 18, 2023
…the Velero deployment. (#1092)

First change to remove cluster monitoring of OADP that will be replaced by the user workload
monitoring (UWM).

Related Issues:
  #769
  https://issues.redhat.com/browse/OADP-1887
  https://issues.redhat.com/browse/OADP-661

The openshift-adp-velero-metrics-svc is left in place to ease the process of enabling UWM.

Once UWM is enabled, it will require setting up a ServiceMonitor and configuring alerts
or dashboards that are crucial for a particular use case.

Enablement of user workload monitoring with additional documentation is not part
of this PR, so that cluster monitoring can easily be re-added in the future by reverting this change.

Signed-off-by: Michal Pryc <mpryc@redhat.com>
Co-authored-by: Michal Pryc <mpryc@redhat.com>