Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release v3.6.2/v3.6.3 runs into timeouts for some panels #389

Closed
feuerrot opened this issue Dec 5, 2022 · 14 comments
Closed

Release v3.6.2/v3.6.3 runs into timeouts for some panels #389

feuerrot opened this issue Dec 5, 2022 · 14 comments

Comments

@feuerrot
Copy link

feuerrot commented Dec 5, 2022

What happened:
renovate upgraded our grafana-image-renderer docker container from version 3.6.1@sha256:b78baa730828d2f64e2d6a41f0314124147c3405c3890a444529bab6b1cebb6c to version 3.6.2@sha256:cbb508cee8a55c9f2f850e33bfcf7e487757e309a5a417df82d2d590b1333698.

Afterwards some of our automated render requests for reports started to fail: The chrome process inside the container runs at 100% cpu until the configured timeout. I can reproduce it by switching between the two container versions.

What you expected to happen:
Dashboards render without any problems whether we're using the v3.6.1 or v3.6.2 docker container.

How to reproduce it (as minimally and precisely as possible):
Docker Container env:

RENDERING_MODE=clustered
RENDERING_CLUSTERING_MODE=default
RENDERING_CLUSTERING_MAX_CONCURRENCY=50
RENDERING_CLUSTERING_TIMEOUT=120
ENABLE_METRICS=true
LOG_LEVEL=debug

Grafana configuration:

[rendering]
server_url = http://localhost:8081/render
callback_url = http://localhost:3000/
concurrent_render_request_limit = 1000
Panel JSON
{
  "id": 2,
  "gridPos": {
    "h": 10,
    "w": 24,
    "x": 0,
    "y": 0
  },
  "type": "timeseries",
  "title": "$path: Ladezeiten",
  "repeat": "path",
  "repeatDirection": "h",
  "maxPerRow": 2,
  "transformations": [],
  "interval": "1m",
  "datasource": {
    "uid": "000000016",
    "type": "prometheus"
  },
  "description": "",
  "fieldConfig": {
    "defaults": {
      "custom": {
        "drawStyle": "line",
        "lineInterpolation": "linear",
        "barAlignment": 0,
        "lineWidth": 1,
        "fillOpacity": 0,
        "gradientMode": "none",
        "spanNulls": false,
        "showPoints": "auto",
        "pointSize": 5,
        "stacking": {
          "mode": "none",
          "group": "A"
        },
        "axisPlacement": "auto",
        "axisLabel": "",
        "axisColorMode": "text",
        "scaleDistribution": {
          "type": "linear"
        },
        "axisCenteredZero": false,
        "hideFrom": {
          "tooltip": false,
          "viz": false,
          "legend": false
        },
        "thresholdsStyle": {
          "mode": "off"
        }
      },
      "color": {
        "mode": "palette-classic"
      },
      "mappings": [],
      "thresholds": {
        "mode": "absolute",
        "steps": [
          {
            "color": "green",
            "value": null
          },
          {
            "color": "red",
            "value": 80
          }
        ]
      },
      "max": 10,
      "min": 0,
      "unit": "s"
    },
    "overrides": []
  },
  "options": {
    "tooltip": {
      "mode": "single",
      "sort": "none"
    },
    "legend": {
      "showLegend": true,
      "displayMode": "table",
      "placement": "bottom",
      "calcs": [
        "last",
        "min",
        "max"
      ]
    }
  },
  "targets": [
    {
      "exemplar": true,
      "expr": "holt_winters(headless_navigate_cached_duration_seconds{project=\"REDACTED\",path=~\"$path\"}[1d], 0.1, 0.5)",
      "hide": false,
      "interval": "$__interval",
      "legendFormat": "{{ environment }}",
      "refId": "B",
      "datasource": {
        "uid": "000000016",
        "type": "prometheus"
      }
    }
  ],
  "timeFrom": null,
  "scopedVars": {
    "path": {
      "text": "/",
      "value": "/",
      "selected": true
    }
  }
}
Container Logs for both versions
3.6.1:
{"collectDefaultMetrics":true,"level":"info","message":"Metrics enabled","requestDurationBuckets":"1,5,7,9,11,13,15,20,30"}
{"level":"info","maxConcurrency":50,"message":"using clustered browser","mode":"default","timeout":120}
{"config":{"acceptLanguage":null,"args":["--no-sandbox","--disable-gpu"],"chromeBin":"/usr/bin/chromium-browser","clustering":{"maxConcurrency":50,"mode":"default","monitor":false,"timeout":120},"deviceScaleFactor":1,"dumpio":false,"emulateNetworkConditions":false,"headed":false,"height":500,"ignoresHttpsErrors":false,"maxDeviceScaleFactor":4,"maxHeight":3000,"maxWidth":3080,"mode":"clustered","timingMetrics":false,"verboseLogging":false,"width":1000},"level":"debug","message":"Browser initialized"}
{"level":"info","message":"HTTP Server started, listening at http://localhost:8081"}
{"level":"debug","message":"Launching Browser cluster with puppeteer-cluster"}
{"level":"debug","message":"Render request received","url":"http://localhost:3000/d-solo/000000015/icinga2-default?var-hostname=REDACTED&var-service=check_REDACTED&var-command=prometheus_alert&panelId=1&orgId=5&width=640&height=280&theme=light&from=now-7d&to=now&render=1"}
{"level":"debug","message":"Render request received","url":"http://localhost:3000/d-solo/icinga2-reports?from=1667257200000&height=500&orgId=1&panelId=6&theme=light&timeout=10000&to=1669762800000&var-host=REDACTED&var-service=check_REDACTED&width=1000&render=1"}
{"file":"/tmp/e0775560.png","level":"debug","message":"Deleting temporary file"}
{"level":"debug","message":"::ffff:127.0.0.1 - - [05/Dec/2022:14:27:43 +0000] \"GET /render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=280&renderKey=REDACTED&timeout=60&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2F000000015%2Ficinga2-default%3Fvar-hostname%3DREDACTED%26var-service%3Dcheck_REDACTED%26var-command%3Dprometheus_alert%26panelId%3D1%26orgId%3D5%26width%3D640%26height%3D280%26theme%3Dlight%26from%3Dnow-7d%26to%3Dnow%26render%3D1&width=640 HTTP/1.1\" 200 14703 \"-\" \"Grafana/9.2.6\"\n"}
{"level":"debug","message":"Connection closed","url":"http://localhost:3000/d-solo/000000015/icinga2-default?var-hostname=REDACTED&var-service=check_REDACTED&var-command=prometheus_alert&panelId=1&orgId=5&width=640&height=280&theme=light&from=now-7d&to=now&render=1"}
{"file":"/tmp/0a4a7388.png","level":"debug","message":"Deleting temporary file"}
{"level":"debug","message":"::ffff:127.0.0.1 - - [05/Dec/2022:14:27:45 +0000] \"GET /render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=500&renderKey=REDACTED&timeout=10000&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2Ficinga2-reports%3Ffrom%3D1667257200000%26height%3D500%26orgId%3D1%26panelId%3D6%26theme%3Dlight%26timeout%3D10000%26to%3D1669762800000%26var-host%3DREDACTED%26var-service%3Dcheck_REDACTED%26width%3D1000%26render%3D1&width=1000 HTTP/1.1\" 200 61113 \"-\" \"Grafana/9.2.6\"\n"}
{"level":"debug","message":"Connection closed","url":"http://localhost:3000/d-solo/icinga2-reports?from=1667257200000&height=500&orgId=1&panelId=6&theme=light&timeout=10000&to=1669762800000&var-host=REDACTED&var-service=check_REDACTED&width=1000&render=1"}
{"level":"debug","message":"Render request received","url":"http://localhost:3000/d-solo/clIh2vHnz/REDACTED?from=1667257200000&height=500&orgId=1&panelId=2&theme=light&timeout=100000&to=1669762800000&var-path=%2F&width=1000&render=1"}
{"file":"/tmp/a04b0494.png","level":"debug","message":"Deleting temporary file"}
{"level":"debug","message":"::ffff:127.0.0.1 - - [05/Dec/2022:14:27:49 +0000] \"GET /render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=500&renderKey=REDACTED&timeout=100000&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FclIh2vHnz%2FREDACTED%3Ffrom%3D1667257200000%26height%3D500%26orgId%3D1%26panelId%3D2%26theme%3Dlight%26timeout%3D100000%26to%3D1669762800000%26var-path%3D%252F%26width%3D1000%26render%3D1&width=1000 HTTP/1.1\" 200 56671 \"-\" \"Grafana/9.2.6\"\n"}
{"level":"debug","message":"Connection closed","url":"http://localhost:3000/d-solo/clIh2vHnz/REDACTED?from=1667257200000&height=500&orgId=1&panelId=2&theme=light&timeout=100000&to=1669762800000&var-path=%2F&width=1000&render=1"}
{"level":"debug","message":"Render request received","url":"http://localhost:3000/d-solo/node-info?from=1667257200000&height=500&orgId=1&panelId=5&theme=light&timeout=10000&to=1669762800000&var-hostname=REDACTED&width=1000&render=1"}
{"file":"/tmp/90a5f675.png","level":"debug","message":"Deleting temporary file"}
{"level":"debug","message":"::ffff:127.0.0.1 - - [05/Dec/2022:14:27:53 +0000] \"GET /render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=500&renderKey=REDACTED&timeout=10000&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2Fnode-info%3Ffrom%3D1667257200000%26height%3D500%26orgId%3D1%26panelId%3D5%26theme%3Dlight%26timeout%3D10000%26to%3D1669762800000%26var-hostname%3DREDACTED%26width%3D1000%26render%3D1&width=1000 HTTP/1.1\" 200 22349 \"-\" \"Grafana/9.2.6\"\n"}
{"level":"debug","message":"Connection closed","url":"http://localhost:3000/d-solo/node-info?from=1667257200000&height=500&orgId=1&panelId=5&theme=light&timeout=10000&to=1669762800000&var-hostname=REDACTED&width=1000&render=1"}
{"level":"debug","message":"Render request received","url":"http://localhost:3000/d-solo/clIh2vHnz/REDACTED?from=1668210604609&height=500&orgId=1&panelId=2&render=1&theme=light&timeout=100000&to=1668597640274&var-path=%2F&width=1000&render=1"}
{"file":"/tmp/c118bbe3.png","level":"debug","message":"Deleting temporary file"}
{"level":"debug","message":"::ffff:127.0.0.1 - - [05/Dec/2022:14:27:56 +0000] \"GET /render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=500&renderKey=REDACTED&timeout=100000&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FclIh2vHnz%2FREDACTED%3Ffrom%3D1668210604609%26height%3D500%26orgId%3D1%26panelId%3D2%26render%3D1%26theme%3Dlight%26timeout%3D100000%26to%3D1668597640274%26var-path%3D%252F%26width%3D1000%26render%3D1&width=1000 HTTP/1.1\" 200 57743 \"-\" \"Grafana/9.2.6\"\n"}
{"level":"debug","message":"Connection closed","url":"http://localhost:3000/d-solo/clIh2vHnz/REDACTED?from=1668210604609&height=500&orgId=1&panelId=2&render=1&theme=light&timeout=100000&to=1668597640274&var-path=%2F&width=1000&render=1"}


3.6.2:
{"collectDefaultMetrics":true,"level":"info","message":"Metrics enabled","requestDurationBuckets":"1,5,7,9,11,13,15,20,30"}
{"level":"info","maxConcurrency":50,"message":"using clustered browser","mode":"default","timeout":120}
{"config":{"acceptLanguage":null,"args":["--no-sandbox","--disable-gpu"],"chromeBin":"/usr/bin/chromium-browser","clustering":{"maxConcurrency":50,"mode":"default","monitor":false,"timeout":120},"deviceScaleFactor":1,"dumpio":false,"emulateNetworkConditions":false,"headed":false,"height":500,"ignoresHttpsErrors":false,"maxDeviceScaleFactor":4,"maxHeight":3000,"maxWidth":3080,"mode":"clustered","timingMetrics":false,"verboseLogging":false,"width":1000},"level":"debug","message":"Browser initialized"}
{"level":"info","message":"HTTP Server started, listening at http://localhost:8081"}
{"level":"debug","message":"Launching Browser cluster with puppeteer-cluster"}
{"level":"debug","message":"Render request received","url":"http://localhost:3000/d-solo/icinga2-reports?from=1667257200000&height=500&orgId=1&panelId=6&theme=light&timeout=10000&to=1669762800000&var-host=REDACTED&var-service=check_REDACTED&width=1000&render=1"}
{"file":"/tmp/47de72d3.png","level":"debug","message":"Deleting temporary file"}
{"level":"debug","message":"::ffff:127.0.0.1 - - [05/Dec/2022:14:31:02 +0000] \"GET /render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=500&renderKey=REDACTED&timeout=10000&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2Ficinga2-reports%3Ffrom%3D1667257200000%26height%3D500%26orgId%3D1%26panelId%3D6%26theme%3Dlight%26timeout%3D10000%26to%3D1669762800000%26var-host%3DREDACTED%26var-service%3Dcheck_REDACTED%26width%3D1000%26render%3D1&width=1000 HTTP/1.1\" 200 60954 \"-\" \"Grafana/9.2.6\"\n"}
{"level":"debug","message":"Connection closed","url":"http://localhost:3000/d-solo/icinga2-reports?from=1667257200000&height=500&orgId=1&panelId=6&theme=light&timeout=10000&to=1669762800000&var-host=REDACTED&var-service=check_REDACTED&width=1000&render=1"}
{"level":"debug","message":"Render request received","url":"http://localhost:3000/d-solo/clIh2vHnz/REDACTED?from=1667257200000&height=500&orgId=1&panelId=2&theme=light&timeout=100000&to=1669762800000&var-path=%2F&width=1000&render=1"}
{"level":"debug","message":"::ffff:127.0.0.1 - - [05/Dec/2022:14:32:08 +0000] \"GET /render/version HTTP/1.1\" 200 19 \"-\" \"Grafana/9.2.6\"\n"}
{"err":"Error: Navigation failed because browser has disconnected!\n    at new LifecycleWatcher (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/LifecycleWatcher.js:95:223)\n    at Frame.goto (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Frame.js:194:25)\n    at Page.goto (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:1155:88)\n    at /usr/src/app/build/browser/browser.js:260:29\n    at ClusteredBrowser.withTimingMetrics (/usr/src/app/build/browser/browser.js:411:20)\n    at ClusteredBrowser.takeScreenshot (/usr/src/app/build/browser/browser.js:256:24)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at async /usr/src/app/build/browser/clustered.js:66:32","level":"error","message":"Error while trying to prepare page for screenshot","url":"http://localhost:3000/d-solo/clIh2vHnz/REDACTED?from=1667257200000&height=500&orgId=1&panelId=2&theme=light&timeout=100000&to=1669762800000&var-path=%2F&width=1000&render=1"}
{"err":"Error: Protocol error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.\n    at CDPSession.send (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:285:35)\n    at ExecutionContext._ExecutionContext_evaluate (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:210:46)\n    at ExecutionContext.evaluateHandle (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:158:107)\n    at WaitTask.rerun (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/IsolatedWorld.js:473:37)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)","level":"error","message":"Error while waiting for the panels to load","url":"http://localhost:3000/d-solo/clIh2vHnz/REDACTED?from=1667257200000&height=500&orgId=1&panelId=2&theme=light&timeout=100000&to=1669762800000&var-path=%2F&width=1000&render=1"}
{"level":"error","message":"Request failed","stack":"Error: Timeout hit: 120000\n    at /usr/src/app/node_modules/puppeteer-cluster/dist/util.js:69:23\n    at Generator.next (<anonymous>)\n    at fulfilled (/usr/src/app/node_modules/puppeteer-cluster/dist/util.js:5:58)","url":"/render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=500&renderKey=REDACTED&timeout=100000&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FclIh2vHnz%2FREDACTED%3Ffrom%3D1667257200000%26height%3D500%26orgId%3D1%26panelId%3D2%26theme%3Dlight%26timeout%3D100000%26to%3D1669762800000%26var-path%3D%252F%26width%3D1000%26render%3D1&width=1000"}
{"level":"error","message":"::ffff:127.0.0.1 - - [05/Dec/2022:14:33:04 +0000] \"GET /render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=500&renderKey=REDACTED&timeout=100000&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FclIh2vHnz%2FREDACTED%3Ffrom%3D1667257200000%26height%3D500%26orgId%3D1%26panelId%3D2%26theme%3Dlight%26timeout%3D100000%26to%3D1669762800000%26var-path%3D%252F%26width%3D1000%26render%3D1&width=1000 HTTP/1.1\" 500 96 \"-\" \"Grafana/9.2.6\"\n"}
{"level":"debug","message":"Connection closed","url":"http://localhost:3000/d-solo/clIh2vHnz/REDACTED?from=1667257200000&height=500&orgId=1&panelId=2&theme=light&timeout=100000&to=1669762800000&var-path=%2F&width=1000&render=1"}
{"level":"debug","message":"Render request received","url":"http://localhost:3000/d-solo/node-info?from=1667257200000&height=500&orgId=1&panelId=5&theme=light&timeout=10000&to=1669762800000&var-hostname=REDACTED&width=1000&render=1"}
{"err":"Error: Navigation failed because browser has disconnected!\n    at new LifecycleWatcher (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/LifecycleWatcher.js:95:223)\n    at Frame.goto (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Frame.js:194:25)\n    at Page.goto (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:1155:88)\n    at /usr/src/app/build/browser/browser.js:260:29\n    at ClusteredBrowser.withTimingMetrics (/usr/src/app/build/browser/browser.js:411:20)\n    at ClusteredBrowser.takeScreenshot (/usr/src/app/build/browser/browser.js:256:24)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at async /usr/src/app/build/browser/clustered.js:66:32","level":"error","message":"Error while trying to prepare page for screenshot","url":"http://localhost:3000/d-solo/node-info?from=1667257200000&height=500&orgId=1&panelId=5&theme=light&timeout=10000&to=1669762800000&var-hostname=REDACTED&width=1000&render=1"}
{"err":"Error: Protocol error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.\n    at CDPSession.send (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:285:35)\n    at ExecutionContext._ExecutionContext_evaluate (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:210:46)\n    at ExecutionContext.evaluateHandle (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:158:107)\n    at WaitTask.rerun (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/IsolatedWorld.js:473:37)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)","level":"error","message":"Error while waiting for the panels to load","url":"http://localhost:3000/d-solo/node-info?from=1667257200000&height=500&orgId=1&panelId=5&theme=light&timeout=10000&to=1669762800000&var-hostname=REDACTED&width=1000&render=1"}
{"level":"error","message":"Request failed","stack":"Error: Timeout hit: 120000\n    at /usr/src/app/node_modules/puppeteer-cluster/dist/util.js:69:23\n    at Generator.next (<anonymous>)\n    at fulfilled (/usr/src/app/node_modules/puppeteer-cluster/dist/util.js:5:58)","url":"/render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=500&renderKey=REDACTED&timeout=10000&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2Fnode-info%3Ffrom%3D1667257200000%26height%3D500%26orgId%3D1%26panelId%3D5%26theme%3Dlight%26timeout%3D10000%26to%3D1669762800000%26var-hostname%3DREDACTED%26width%3D1000%26render%3D1&width=1000"}
{"level":"error","message":"::ffff:127.0.0.1 - - [05/Dec/2022:14:35:54 +0000] \"GET /render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=500&renderKey=REDACTED&timeout=10000&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2Fnode-info%3Ffrom%3D1667257200000%26height%3D500%26orgId%3D1%26panelId%3D5%26theme%3Dlight%26timeout%3D10000%26to%3D1669762800000%26var-hostname%3DREDACTED%26width%3D1000%26render%3D1&width=1000 HTTP/1.1\" 500 96 \"-\" \"Grafana/9.2.6\"\n"}
{"level":"debug","message":"Connection closed","url":"http://localhost:3000/d-solo/node-info?from=1667257200000&height=500&orgId=1&panelId=5&theme=light&timeout=10000&to=1669762800000&var-hostname=REDACTED&width=1000&render=1"}
{"level":"debug","message":"Render request received","url":"http://localhost:3000/d-solo/clIh2vHnz/REDACTED?from=1668210604609&height=500&orgId=1&panelId=2&render=1&theme=light&timeout=100000&to=1668597640274&var-path=%2F&width=1000&render=1"}
{"err":"Error: Navigation failed because browser has disconnected!\n    at new LifecycleWatcher (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/LifecycleWatcher.js:95:223)\n    at Frame.goto (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Frame.js:194:25)\n    at Page.goto (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:1155:88)\n    at /usr/src/app/build/browser/browser.js:260:29\n    at ClusteredBrowser.withTimingMetrics (/usr/src/app/build/browser/browser.js:411:20)\n    at ClusteredBrowser.takeScreenshot (/usr/src/app/build/browser/browser.js:256:24)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at async /usr/src/app/build/browser/clustered.js:66:32","level":"error","message":"Error while trying to prepare page for screenshot","url":"http://localhost:3000/d-solo/clIh2vHnz/REDACTED?from=1668210604609&height=500&orgId=1&panelId=2&render=1&theme=light&timeout=100000&to=1668597640274&var-path=%2F&width=1000&render=1"}
{"err":"Error: Protocol error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.\n    at CDPSession.send (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:285:35)\n    at ExecutionContext._ExecutionContext_evaluate (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:210:46)\n    at ExecutionContext.evaluateHandle (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:158:107)\n    at WaitTask.rerun (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/IsolatedWorld.js:473:37)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)","level":"error","message":"Error while waiting for the panels to load","url":"http://localhost:3000/d-solo/clIh2vHnz/REDACTED?from=1668210604609&height=500&orgId=1&panelId=2&render=1&theme=light&timeout=100000&to=1668597640274&var-path=%2F&width=1000&render=1"}
{"level":"error","message":"Request failed","stack":"Error: Timeout hit: 120000\n    at /usr/src/app/node_modules/puppeteer-cluster/dist/util.js:69:23\n    at Generator.next (<anonymous>)\n    at fulfilled (/usr/src/app/node_modules/puppeteer-cluster/dist/util.js:5:58)","url":"/render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=500&renderKey=REDACTED&timeout=100000&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FclIh2vHnz%2FREDACTED%3Ffrom%3D1668210604609%26height%3D500%26orgId%3D1%26panelId%3D2%26render%3D1%26theme%3Dlight%26timeout%3D100000%26to%3D1668597640274%26var-path%3D%252F%26width%3D1000%26render%3D1&width=1000"}
{"level":"error","message":"::ffff:127.0.0.1 - - [05/Dec/2022:14:37:58 +0000] \"GET /render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=500&renderKey=REDACTED&timeout=100000&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FclIh2vHnz%2FREDACTED%3Ffrom%3D1668210604609%26height%3D500%26orgId%3D1%26panelId%3D2%26render%3D1%26theme%3Dlight%26timeout%3D100000%26to%3D1668597640274%26var-path%3D%252F%26width%3D1000%26render%3D1&width=1000 HTTP/1.1\" 500 96 \"-\" \"Grafana/9.2.6\"\n"}
{"level":"debug","message":"Connection closed","url":"http://localhost:3000/d-solo/clIh2vHnz/REDACTED?from=1668210604609&height=500&orgId=1&panelId=2&render=1&theme=light&timeout=100000&to=1668597640274&var-path=%2F&width=1000&render=1"}
Anything else we need to know?:

Environment:

  • Grafana Image Renderer version: v3.6.1 -> v3.6.2
  • Grafana version: v9.2.6
  • Installed plugin or remote renderer service: remote, using a docker container on the same host
  • OS Grafana Image Renderer is installed on: Grafana itself and the Docker daemon are running on CentOS 7.9.2009
  • User OS & Browser: Debian testing, Firefox 106.0b5
  • Others: -
@feuerrot feuerrot changed the title Release v3.6.2 runs into timeouts for some dashboards Release v3.6.2 runs into timeouts for some panels Dec 5, 2022
@jgournet
Copy link

jgournet commented Dec 9, 2022

I know such a comment usually does not help, but just to say you're not alone:
we're getting the same issue.
Reverting to 3.6.1 "solves" it

@joanlopez
Copy link
Collaborator

Hi,

Thanks for reporting it!

The only change introduced from v3.6.1 that could cause this issue is this one, unless it's related with any of the dependencies update.

Any thoughts @spinillos? 🤔

Maybe worth trying to reproduce it / what happens when de/applying these changes and see if we can, at least, identify the set of changes that caused it @feuerrot @jgournet

Thanks!

@WesselAtWork
Copy link

WesselAtWork commented Jan 4, 2023

We are also having a similar issue.

Every render on every panel on the specific dashboard succeeds except one: the only state-timeline in the dashboard.
At first I thought it was because the timeline is still in beta but as it turns out it seems to be related to this issue.

When we reverted back to v3.6.1 the panel got rendered perfectly.

Also another thing that happend was a MAJOR speed up; on v3.6.2 it easily took 3-7m for the entire dashboard to render (failing without completing so might even take longer) now (on v3.6.1) it only takes 1-2m and succeeds flawlessly!
Memory and CPU are also WAY down and stayed down.

v3.6.2 Resources

(Scale 1h)

CPU

image

Memory

image

Memory (Full lifetime ~3h)

image

v3.6.1 Resources

(Scale 5m [!!!])

CPU

image

Memory

image

Debug

When the panel failed we observed this error log in the image render. (v3.6.2)

{"level":"error","message":"Request failed","stack":"TimeoutError: Timed out after 30000 ms while trying to connect to the browser! Only Chrome at revision r1036745 is guaranteed to work.\n    at Timeout.onTimeout (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:301:20)\n    at listOnTimeout (node:internal/timers:559:17)\n    at processTimers (node:internal/timers:502:7)","url":"/render?deviceScaleFactor=1.000000&domain=grafana.monitorns&encoding=&height=440&renderKey=xxxxxxxxxxxxxxxxxxxxxxx&timeout=60&timezone=&url=http%3A%2F%2Fgrafana.monitorns%3A80%2Fd-solo%2Fdashuid%2F_%3Ffrom%3Dnow-24h%26height%3D440%26panelId%3D111%26theme%3Dlight%26to%3Dnow%26width%3D560%26render%3D1&width=560"}

{"level":"error","message":"x.x.x.x - - [Z] \"GET /render?deviceScaleFactor=1.000000&domain=grafana.monitorns&encoding=&height=440&renderKey=xxxxxxxxxxxxxxxxxxxxxxx&timeout=60&timezone=&url=http%3A%2F%2Fgrafana.monitorns%3A80%2Fd-solo%2Fdashuid%2F_%3Ffrom%3Dnow-24h%26height%3D440%26panelId%3D111%26theme%3Dlight%26to%3Dnow%26width%3D560%26render%3D1&width=560 HTTP/1.1\" 500 96 \"-\" \"Grafana/9.3.1\"\n"}

On grafana we observed these logs

logger=rendering renderer=http t=Z level=info msg=Rendering path="d-solo/dashuid/_?from=now-24h&height=440&panelId=111&theme=light&to=now&width=560"

logger=rendering renderer=http t=Z level=debug msg="calling remote rendering service" url="http://grafana-image-renderer.monitorns:9999/render?deviceScaleFactor=1.000000&domain=grafana.monitorns&encoding=&height=440&renderKey=xxxxxxxxxxxxxxxxxxxxxxx&timeout=60&timezone=&url=http%3A%2F%2Fgrafana.monitorns%3A80%2Fd-solo%2FSRExOVRVx%2F_%3Ffrom%3Dnow-24h%26height%3D440%26panelId%3D111%26theme%3Dlight%26to%3Dnow%26width%3D560%26render%3D1&width=560"

logger=rendering renderer=http t=Z level=error msg="Remote rendering request failed" error="500 Internal Server Error" url="http://grafana.monitorns:80/d-solo/dashuid/_?from=now-24h&height=440&panelId=111&theme=light&to=now&width=560&render=1"

logger=context userId=x orgId=1 uname=sa-userx t=Z level=error msg="Rendering failed." error="remote rendering request failed, status code: 500, status: 500 Internal Server Error"

logger=context userId=x orgId=1 uname=sa-userx t=Z level=error msg="Request Completed" method=GET path=/render/d-solo/dashuid/_ status=500 remote_addr=x.x.x.x time_ms=31457 duration=31.457985829s size=1344 referer= handler=/render/*

logger=rendering renderer=http t=Z level=debug msg="calling remote rendering service" url=http://grafana-image-renderer.monitorns:9999/render/version

logger=cleanup t=Z level=info msg="Completed cleanup jobs" duration=398.773195ms

Seeing the time out, I tried to run the request with timeout=500 but that didn't change anything because it failed (printed the image-renderer errors) within 1m of the request being sent

For now using v3.6.1 is working for us

@WesselAtWork
Copy link

@feuerrot Are you using the grafana helm chart?
I noticed that the liveness probe is very short (1 sec) and I was seeing a lot of restarts on the pod.

@jgournet
Copy link

not OP, but: we use prometheus-kube-prometheus-stack helm chart - we don't have restarts issues (but we do have the issues from this ticket)

@WesselAtWork
Copy link

We actually use the exact same helm chart!
(It uses OP Grafana as a sub-chart anyway)

Sadly the chart doesn't allow us to change the liveness probe timeout directly.

Seems a value of 10s on the timeout guaranteed the containers to be up.

As an extra also saw that increasing the replicas seemed to increase stability without increasing the resource requirement with a small (20%) speed-up.

Will test to see if increasing the replicas with the default liveness works

@WesselAtWork
Copy link

not OP, but: we use prometheus-kube-prometheus-stack helm chart - we don't have restarts issues (but we do have the issues from this ticket)

@jgournet
What is the liveness timeout on your deployment? Might be that our k8s defaults to 1s timeout. Yours might be more generous.

@jgournet
Copy link

Well, now that I am checking ... turns out that I forgot:
Due to some old issues, we actually disable renderer from the original helm chart and patch the deploy to add a barebone one:

spec:
  template:
    spec:
      containers:
      - image: grafana/grafana-image-renderer:latest
        imagePullPolicy: IfNotPresent
        name: grafana-image-renderer
        ports:
        - containerPort: 8081
          name: web
          protocol: TCP
        resources:
          limits:
            cpu: 1000m
            memory: 1024Mi
          requests:
            cpu: 100m
            memory: 256Mi

so yeah ... ignore that part for me :)
(I'll try adding more CPU though)

@feuerrot
Copy link
Author

@WesselAtWork no, we just use puppet to deploy the image renderer container on the same host as grafana:

  if $deploy_image_renderer {
    include profile::docker
    docker::run { 'image_renderer':
      image   => $image_renderer_image,
      net     => 'host',
      env     => [
        'RENDERING_MODE=clustered',
        'RENDERING_CLUSTERING_MODE=default',
        'RENDERING_CLUSTERING_MAX_CONCURRENCY=50',
        'RENDERING_CLUSTERING_TIMEOUT=120',
        'ENABLE_METRICS=true',
        'LOG_LEVEL=debug',
      ],
      require => Class['profile::docker']
    }
[…]

I also tried the current version 3.6.3, which fails too:

{"collectDefaultMetrics":true,"level":"info","message":"Metrics enabled","requestDurationBuckets":"1,5,7,9,11,13,15,20,30"}
{"level":"info","maxConcurrency":50,"message":"using clustered browser","mode":"default","timeout":120}
{"config":{"acceptLanguage":null,"args":["--no-sandbox","--disable-gpu"],"chromeBin":"/usr/bin/chromium-browser","clustering":{"maxConcurrency":50,"mode":"default","monitor":false,"timeout":120},"deviceScaleFactor":1,"dumpio":false,"emulateNetworkConditions":false,"headed":false,"height":500,"ignoresHttpsErrors":false,"maxDeviceScaleFactor":4,"maxHeight":3000,"maxWidth":3080,"mode":"clustered","pageZoomLevel":1,"timingMetrics":false,"verboseLogging":false,"width":1000},"level":"debug","message":"Browser initialized"}
{"level":"info","message":"HTTP Server started, listening at http://localhost:8081"}
{"level":"debug","message":"Launching Browser cluster with puppeteer-cluster"}
{"level":"debug","message":"Render request received","url":"http://localhost:3000/d-solo/node-info?from=1667257200000&height=500&orgId=1&panelId=5&theme=light&timeout=10000&to=1669762800000&var-hostname=[…]&width=1000&render=1"}
{"err":"Error: Navigation failed because browser has disconnected!\n    at new LifecycleWatcher (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/LifecycleWatcher.js:95:223)\n    at Frame.goto (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Frame.js:194:25)\n    at Page.goto (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:1155:88)\n    at /usr/src/app/build/browser/browser.js:260:29\n    at ClusteredBrowser.withTimingMetrics (/usr/src/app/build/browser/browser.js:412:20)\n    at ClusteredBrowser.takeScreenshot (/usr/src/app/build/browser/browser.js:256:24)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at async /usr/src/app/build/browser/clustered.js:66:32","level":"error","message":"Error while trying to prepare page for screenshot","url":"http://localhost:3000/d-solo/node-info?from=1667257200000&height=500&orgId=1&panelId=5&theme=light&timeout=10000&to=1669762800000&var-hostname=[…]&width=1000&render=1"}
{"err":"Error: Protocol error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.\n    at CDPSession.send (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:285:35)\n    at ExecutionContext._ExecutionContext_evaluate (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:210:46)\n    at ExecutionContext.evaluateHandle (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:158:107)\n    at WaitTask.rerun (/usr/src/app/node_modules/puppeteer/lib/cjs/puppeteer/common/IsolatedWorld.js:473:37)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)","level":"error","message":"Error while waiting for the panels to load","url":"http://localhost:3000/d-solo/node-info?from=1667257200000&height=500&orgId=1&panelId=5&theme=light&timeout=10000&to=1669762800000&var-hostname=[…]&width=1000&render=1"}
{"level":"error","message":"Request failed","stack":"Error: Timeout hit: 120000\n    at /usr/src/app/node_modules/puppeteer-cluster/dist/util.js:69:23\n    at Generator.next (<anonymous>)\n    at fulfilled (/usr/src/app/node_modules/puppeteer-cluster/dist/util.js:5:58)","url":"/render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=500&renderKey=[…]&timeout=10000&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2Fnode-info%3Ffrom%3D1667257200000%26height%3D500%26orgId%3D1%26panelId%3D5%26theme%3Dlight%26timeout%3D10000%26to%3D1669762800000%26var-hostname%3D[…]%26width%3D1000%26render%3D1&width=1000"}
{"level":"error","message":"::ffff:127.0.0.1 - - [24/Jan/2023:10:33:16 +0000] \"GET /render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=500&renderKey=[…]&timeout=10000&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2Fnode-info%3Ffrom%3D1667257200000%26height%3D500%26orgId%3D1%26panelId%3D5%26theme%3Dlight%26timeout%3D10000%26to%3D1669762800000%26var-hostname%3D[…]%26width%3D1000%26render%3D1&width=1000 HTTP/1.1\" 500 96 \"-\" \"Grafana/9.3.2\"\n"}
{"level":"debug","message":"Connection closed","url":"http://localhost:3000/d-solo/node-info?from=1667257200000&height=500&orgId=1&panelId=5&theme=light&timeout=10000&to=1669762800000&var-hostname=[…]&width=1000&render=1"}
{"level":"debug","message":"::ffff:127.0.0.1 - - [24/Jan/2023:10:33:52 +0000] \"GET /render/version HTTP/1.1\" 200 19 \"-\" \"Grafana/9.3.2\"\n"}

@feuerrot feuerrot changed the title Release v3.6.2 runs into timeouts for some panels Release v3.6.2/v3.6.3 runs into timeouts for some panels Jan 24, 2023
@feuerrot
Copy link
Author

At least for our test cases the problem seems to be fixed in the latest version (v3.6.4).

@astrolemonade
Copy link

It happens with 3.7.1 again @feuerrot

@astrolemonade
Copy link

TimeoutError: Timed out after 30000 ms while trying to connect to the browser! Only Chrome at revision r1036745 is guaranteed to work.

@astrolemonade
Copy link

https://storage.googleapis.com/chromium-find-releases-static/index.html#r1036745
The version returned is

image

and the version from the 3.7.1 image is

Chromium 112.0.5615.165

@feuerrot
Copy link
Author

feuerrot commented Jun 1, 2023

@astrolemonade I haven't been able to test v3.7.1 yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants