
[Bug] Dashboards constantly refreshing in operator #1045

Closed
redminion0 opened this issue May 10, 2023 · 11 comments · Fixed by #1051
Labels
bug/critical Bug with a critical severity, breaking functionality triage/accepted Indicates an issue or PR is ready to be actively worked on. v5 A v5 specific issue/feature
Milestone

Comments

@redminion0

redminion0 commented May 10, 2023

Describe the bug

When using the v5.0.0-uid image for dashboard UID support, if multiple dashboards are imported, the operator gets stuck in a loop, constantly updating all dashboards.

Version
quay.io/weisdd/grafana-operator:v5.0.0-uid (#1027)
To Reproduce
Steps to reproduce the behavior:

Load the quay.io/weisdd/grafana-operator:v5.0.0-uid image,
then load in a large number of gzipJson dashboards (in our case 36).

or

Load dashboards one at a time (using the resource), then force the pods to move to another node.

Expected behavior
Dashboards are not constantly updated.


Screenshots
(screenshot attached)

  • Grafana Operator Version: v5.0.0-uid (test build, see #1027)
  • Environment: K8s


@redminion0 redminion0 added bug Something isn't working needs triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 10, 2023
@redminion0
Author

When comparing the JSON, the only change Grafana notices is the version field.
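To illustrate the observation above (a sketch, not code from the operator): two dashboard models that differ only in the server-managed version field compare as equal once that field is ignored. The payloads and helper name here are hypothetical.

```python
# Hypothetical dashboard payloads, identical except for the "version"
# field that Grafana bumps on every save.
stored = {"uid": "abc123", "title": "Node Exporter", "version": 7}
incoming = {"uid": "abc123", "title": "Node Exporter", "version": 8}

def same_except_version(a: dict, b: dict) -> bool:
    """Compare two dashboard models while ignoring the server-managed version."""
    strip = lambda d: {k: v for k, v in d.items() if k != "version"}
    return strip(a) == strip(b)

print(same_except_version(stored, incoming))  # True: only the version differs
```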

@NissesSenap
Collaborator

@weisdd seems like we have a fun UID overwrite issue.

@NissesSenap NissesSenap added the v5 A v5 specific issue/feature label May 11, 2023
@NissesSenap
Collaborator

@redminion0 thanks for reporting this issue. Could you provide an example of a dashboard that we can use to verify this?

@NissesSenap NissesSenap added this to the Version 5.0 milestone May 11, 2023
@weisdd
Collaborator

weisdd commented May 11, 2023

@redminion0 have you applied the CRDs from that PR as well? (There's a new status field to store UID)

@redminion0
Author

Hi @weisdd, yes, I have applied the CRDs.
After some more investigation, it goes like this:

adding dashboards one at a time

  1. Start from 0 dashboards in the cluster.
  2. Load dashboards one at a time (I left around 2 minutes per dashboard), in my case up to 36.
  3. Dashboards get the UIDs from their JSON (in this case gzipped and base64'd) correctly.
  4. Dashboards are resynced only at the resync period; all seems fine.
  5. Pods restart, and the operator goes into a loop of constantly reconciling/updating the dashboards.
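Step 3 mentions dashboards supplied as gzipped, base64-encoded JSON. A minimal sketch of what producing and unpacking such a payload looks like (the dashboard model is a placeholder; the encoding steps are the general gzip + base64 pattern, not code taken from the operator):

```python
import base64
import gzip
import json

# Hypothetical minimal dashboard model; real payloads are full Grafana dashboard JSON.
dashboard = {"uid": "node-exporter", "title": "Node Exporter"}

# Serialize, gzip, then base64-encode -- the shape of a gzipped dashboard payload.
encoded = base64.b64encode(gzip.compress(json.dumps(dashboard).encode())).decode()

# Decoding reverses the steps and recovers the UID embedded in the JSON.
decoded = json.loads(gzip.decompress(base64.b64decode(encoded)))
print(decoded["uid"])  # node-exporter
```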

mass adding dashboards

  1. Start from 0 dashboards in the cluster.
  2. Apply all 36 dashboards, and the loop starts immediately.

The error doesn't happen if smaller numbers are mass loaded at once.

If the operator is left in this state, we eventually get an error of:
dial tcp <IP>:3000: connect: cannot assign requested address

I will attempt to find the number of dashboards that causes the problem.

@redminion0
Author

redminion0 commented May 11, 2023

It seems to happen at around 18 dashboards (although I'm also thinking it might be size rather than volume). I can easily recreate it by applying all the dashboards from https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/grafana-dashboardDefinitions.yaml (although I have updated some of the panels, e.g. graph to time series).

@weisdd
Collaborator

weisdd commented May 11, 2023

@redminion0 Great to hear it's reproducible. Could you share it in the form of a step-by-step guide plus an archive with the manifests? (A full set of manifests would be useful for us to have the same resync timers and other settings.)

@weisdd weisdd added bug/critical Bug with a critical severity, breaking functionality and removed bug Something isn't working labels May 13, 2023
@weisdd
Collaborator

weisdd commented May 13, 2023

Alright, I think the repro in the archive should be enough to investigate it further:
dashboards-loop.zip

@weisdd weisdd added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 13, 2023
@weisdd
Collaborator

weisdd commented May 14, 2023

@redminion0 I've prepared a fix in #1051, the test image is here: quay.io/weisdd/grafana-operator:v5.0.0-correct-uid. Sorry for the bug and thanks for reporting it :)

@evgenii-denisov

evgenii-denisov commented Jun 11, 2023

Seems this problem still exists even in v5.0.0 with the Simple Dashboard from the basic example:

      image: ghcr.io/grafana-operator/grafana-operator:v5.0.0
The operator constantly creates a new version of the dashboard every 30 seconds. And it's the same problem: only the version changed. (screenshots attached)

Could you describe what to check, and what I should provide so you can reproduce it on your side?

@weisdd
Collaborator

weisdd commented Jun 11, 2023

@evgenii-denisov The behaviour you're seeing is not a bug and has nothing to do with the original issue that was fixed in #1051. The basic example contains resyncPeriod: 30s in its spec, which is why the operator re-uploads the dashboard every 30 seconds.
The mechanism itself is needed to cope with dashboard spec drift (e.g. due to users changing panels through the UI). Calculating diffs in this case would have been over-engineering (keys can come in a different order with different indentation, so everything would need to be represented in the same way before the "original" dashboard and the one that exists in Grafana could be compared), so the operator simply re-uploads the dashboard. The default interval is 5 minutes, though you're free to choose any other value or even disable resync :)
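For reference, the resync interval is set per dashboard resource. A sketch of where the field lives (the apiVersion and field name follow the v5 operator's GrafanaDashboard kind; the metadata, selector labels, and JSON body are placeholders):

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: simple-dashboard
spec:
  resyncPeriod: 5m        # the default interval; the basic example sets 30s
  instanceSelector:
    matchLabels:
      dashboards: "grafana"
  json: >
    {
      "title": "Simple Dashboard"
    }
```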
