Update fleet grafana dashboards #4091

Merged on Jun 21, 2024 (3 commits)

@p-se p-se commented Jun 14, 2024

Issue:

rancher/fleet#2460

Problem

Grafana dashboards were recently added for the existing Fleet metrics (and for the Prometheus configuration that scrapes them). Those dashboards were rough and barely usable. This PR updates them (see the screenshots at the bottom).

Solution

Update Grafana dashboards for visualizing Fleet metrics.

Testing

  • On a cluster with Rancher and a Fleet version >= v0.10.0-rc13, install the rancher-monitoring chart that includes the changes of this PR.
  • Open Grafana through the Rancher UI ("Monitoring" section) and look for the 5 dashboards whose names start with "Fleet". These dashboards are not embedded in the Rancher UI itself.
  • Create a GitRepo to start seeing GitRepo metrics.
  • Check that all dashboards show data. Depending on the age of the monitoring installation, you may need to shorten the time range before visualizations appear in those dashboards.
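The dashboard check in the steps above could also be scripted. The following is a minimal sketch, not part of this PR, assuming the dashboard list has already been fetched as JSON (for example from Grafana's `/api/search` endpoint, whose entries carry a `title` field). The helper name `fleet_dashboards` and the sample titles are hypothetical; the real dashboard titles may differ.

```python
import json

EXPECTED_COUNT = 5  # the PR description mentions 5 Fleet dashboards


def fleet_dashboards(search_results, prefix="Fleet"):
    """Return the sorted titles of all dashboards whose title starts with prefix."""
    return sorted(
        item["title"]
        for item in search_results
        if item.get("title", "").startswith(prefix)
    )


# Hypothetical sample payload; real titles may differ.
sample = json.loads("""
[
  {"title": "Fleet / Example A", "type": "dash-db"},
  {"title": "Fleet / Example B", "type": "dash-db"},
  {"title": "Kubernetes / Pods", "type": "dash-db"}
]
""")

found = fleet_dashboards(sample)
print(found)
print("all expected dashboards present:", len(found) == EXPECTED_COUNT)
```

With a real Grafana response in place of `sample`, the last line would report whether all five dashboards were found.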

Engineering Testing

Manual Testing

See Testing.

Automated Testing

Automated tests cover fetching and checking metrics from fleet-controller (see https://github.com/rancher/fleet/tree/main/e2e/metrics). They do not cover the use of the ServiceMonitor to configure the Prometheus instance to scrape those metrics, nor anything involving Grafana.
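To illustrate what "fetching and checking metrics" can look like, here is a small sketch that extracts metric names from Prometheus text-format output. It is not the e2e suite linked above; the `fleet_` prefix and the metric names in the sample are assumptions made for illustration only.

```python
def metric_names(exposition_text, prefix="fleet_"):
    """Collect metric names with the given prefix from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: name{label="value"} 1.0  or  name 1.0
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return names


# Hypothetical scrape output; the metric names are placeholders.
sample = """\
# HELP fleet_example_total An illustrative counter.
# TYPE fleet_example_total counter
fleet_example_total{namespace="fleet-local"} 3
fleet_other_gauge 1
go_goroutines 42
"""

print(sorted(metric_names(sample)))
```

A test could then assert that the set of names returned for a real scrape contains the metrics the dashboards rely on.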

QA Testing Considerations

No additional testing required; the tests described in the Testing section are deemed sufficient.

Regressions Considerations

I cannot think of any cases in which updating (not yet shipped) Grafana dashboards would conflict with existing dashboards.

Backporting considerations

Does not need to be backported.

Screenshots

[Five screenshots of the updated Fleet dashboards, taken 2024-06-14]

@p-se p-se requested a review from a team as a code owner June 14, 2024 10:38

Validation steps

  • Ensure all container images have repository and tag on the same level, so that all container images are included in rancher-images.txt, which is used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.
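The "repository and tag on the same level" rule from the first validation step can be checked mechanically. This is a rough sketch, assuming the chart values have already been loaded into nested Python dicts and lists. It is not the check that CI actually runs, and the `example-sidecar` entry is a hypothetical counterexample.

```python
def images_missing_tag(values, path=""):
    """Yield dotted paths of mappings that set `repository` without a sibling `tag`."""
    if isinstance(values, dict):
        if "repository" in values and "tag" not in values:
            yield path or "<root>"
        for key, child in values.items():
            child_path = f"{path}.{key}" if path else str(key)
            yield from images_missing_tag(child, child_path)
    elif isinstance(values, list):
        for index, child in enumerate(values):
            yield from images_missing_tag(child, f"{path}[{index}]")


values = {
    "longhorn-controller": {
        "repository": "rancher/hardened-sriov-cni",
        "tag": "v2.6.3-build20230913",
    },
    # Hypothetical bad entry: the tag is nested one level too deep.
    "example-sidecar": {
        "repository": "rancher/example",
        "image": {"tag": "v1.0.0"},
    },
}

print(list(images_missing_tag(values)))
```

Only the entry whose `tag` is not a sibling of `repository` is reported; the `longhorn-controller` entry from the example above passes.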

@thehejik thehejik left a comment

I checked all 5 modified Grafana dashboards for Fleet and they are showing reasonable data. The Prometheus Targets and Prometheus Graph pages also work as expected for Fleet resources.

I also experimented with creating Fleet ClusterGroups, assigning various clusters to them, and using the filters in Grafana; all good.

@mallardduck mallardduck left a comment

@p-se - looks pretty good to me overall. The PR is out of date with HEAD, so other than updating for that before merge, LGTM. (Edit: weird, after I submitted the review it stopped showing the branch as outdated.)

@p-se p-se commented Jun 18, 2024

@thehejik @mallardduck
Thank you very much for testing and the review!

@p-se p-se requested a review from a team June 20, 2024 12:59
@recena recena requested review from a team and removed request for a team June 20, 2024 13:01
@thehejik thehejik merged commit f82ffa9 into rancher:dev-v2.9 Jun 21, 2024
5 checks passed
krunalhinguu pushed a commit to krunalhinguu/charts that referenced this pull request Jul 15, 2024