What happened:
I have recently set up the Docker version of the Grafana image renderer. This service is part of a larger Compose setup that also includes InfluxDB and Grafana. The services talk to each other, and I can successfully download images via curl and other methods.
However, I am seeing fairly sporadic latency issues that I believe I've traced back to the renderer retrying after an initial failed network request.
Most of the time the images download in about ~2s; however, roughly every 2-5 requests, one takes ~61-62s:
(base) [user@server grafana]$ curl -L "http://[URL]" -o tmp.png
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 52310  100 52310    0     0  31474      0  0:00:01  0:00:01 --:--:-- 31474
(base) [user@server grafana]$ curl -L "http://[URL]" -o tmp.png
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11196  100 11196    0     0   6397      0  0:00:01  0:00:01 --:--:--  6401
(base) [user@server grafana]$ curl -L "http://[URL]" -o tmp.png
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11187  100 11187    0     0    181      0  0:01:01  0:01:01 --:--:--  3199
(base) [user@server grafana]$ curl -L "http://admin:admin@localhost:3001/render/d-solo/cemi669r5v5s0f?orgId=1&from=now()&to=now()-1m&var-Drainages=Carbon&panelId=panel-67&&width=400&height=300&tz=UTC&theme=light" -o tmp.png
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10246  100 10246    0     0    165      0  0:01:02  0:01:01  0:00:01  2180
In the renderer logs, I get the following only when I experience the 60s+ requests:
renderer | {"failure":"net::ERR_ABORTED","level":"error","message":"Browser request failed","method":"POST","url":"http://grafana:3000/api/ds/query?ds_type=influxdb&requestId=SQR100"}
renderer | {"err":"TimeoutError: Waiting failed: 60000ms exceeded\n at new WaitTask (/home/nonroot/node_modules/puppeteer-core/lib/cjs/puppeteer/common/WaitTask.js:50:34)\n at IsolatedWorld.waitForFunction (/home/nonroot/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Realm.js:25:26)\n at CdpFrame.waitForFunction (/home/nonroot/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Frame.js:561:43)\n at CdpFrame.<anonymous> (/home/nonroot/node_modules/puppeteer-core/lib/cjs/puppeteer/util/decorators.js:98:27)\n at CdpPage.waitForFunction (/home/nonroot/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:1366:37)\n at waitForQueriesAndVisualizations (/home/nonroot/build/browser/browser.js:595:16)\n at /home/nonroot/build/browser/browser.js:375:19\n at callback (/home/nonroot/build/browser/browser.js:546:34)\n at ClusteredBrowser.withMonitoring (/home/nonroot/build/browser/browser.js:553:16)\n at ClusteredBrowser.performStep (/home/nonroot/build/browser/browser.js:509:36)","level":"error","message":"Error while performing step","step":"panelsRendered","url":"http://grafana:3000/d-solo/[URL....]"}
The reason I think a retry is involved is that the image always downloads successfully after the 60s timeout. This also only seems to happen with the InfluxDB queries: I tried replicating the issue with a Prometheus backend and never hit it. To be clear, I don't experience any delay when running the InfluxDB query in Grafana directly.
What you expected to happen:
I'd expect consistent download times. It feels a bit like this could be resolved by just shortening the retry period.
How to reproduce it (as minimally and precisely as possible):
I simply retry the same image rendering requests.
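For example, a loop along the following lines reproduces it for me. This is just a sketch: [URL] is the same redacted render URL as in the curl calls above, and the iteration count is arbitrary.

#!/usr/bin/env bash
# Repeatedly request the same rendered panel and print only the total time
# per request; replace [URL] with the full /render/d-solo/... URL from above.
RENDER_URL="http://admin:admin@localhost:3001/render/d-solo/[URL]"

for i in $(seq 1 10); do
  t=$(curl -s -L -o /dev/null -w '%{time_total}' "$RENDER_URL")
  echo "request $i: ${t}s"
done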
Anything else we need to know?:
Environment:
- Grafana Image Renderer version: latest (4.x.x+)
- Grafana version: latest (12.x.x+)
- Installed plugin or remote renderer service: remote
- OS Grafana Image Renderer is installed on: default docker OS
- User OS & Browser: RHEL
- Others:
  - InfluxDB v2.7
  - I've included the compose setup below for Grafana and the renderer
renderer:
  image: grafana/grafana-image-renderer:latest
  container_name: renderer
  shm_size: 1g
  environment:
    - AUTH_TOKEN=test-token
    - RENDERING_MODE=clustered
    - RENDERING_CLUSTERING_TIMEOUT=600
    - RENDERING_VIEWPORT_MAX_WIDTH=3000
    - RENDERING_VIEWPORT_MAX_HEIGHT=3000
    - ENABLE_METRICS=true
    - RENDERING_TIMING_METRICS=true
    # Try timeout
    - LOG_LEVEL=debug
  ports:
    - "8081:8081"
  networks:
    - test_network

grafana:
  build:
    context: ./grafana
    dockerfile: Dockerfile
  image: grafana:latest
  container_name: grafana
  environment:
    - GF_SECURITY_ADMIN_USER=test
    - GF_SECURITY_ADMIN_PASSWORD=test
    - GF_SERVER_DOMAIN=grafana
    - GF_SERVER_ROOT_URL=http://grafana:3000/
    - GF_RENDERING_CALLBACK_URL=http://grafana:3000/
    - GF_RENDERING_SERVER_URL=http://renderer:8081/render
    - GF_RENDERING_RENDERER_TOKEN=test-token
    - GF_RENDERING_RENDERING_TIMEOUT=30
  ports:
    - "3001:3000"
  volumes:
    - grafana-data:/var/lib/grafana
  depends_on:
    - influxdb
    - renderer
  networks:
    - test_network
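Since the compose file already sets ENABLE_METRICS=true and RENDERING_TIMING_METRICS=true, the renderer's Prometheus metrics can also be pulled from the published 8081 port to look at the timing distribution. This is only a rough check; the exact metric names may differ between renderer versions, so the grep pattern below is a guess.

# Scrape the renderer's /metrics endpoint (exposed via ports "8081:8081")
# and filter for duration-related metrics; adjust the pattern as needed.
curl -s http://localhost:8081/metrics | grep -i duration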