
No data and no alerts if site goes down #1219

Closed
2 tasks done
katsil opened this issue Jan 24, 2022 · 9 comments
Labels
area:core (issues describing changes to the core of Uptime Kuma), bug (Something isn't working)

Comments

katsil commented Jan 24, 2022

⚠️ Please verify that this bug has NOT been raised before.

  • I checked and didn't find a similar issue

🛡️ Security Policy

Description

At some point, my monitoring stops checking the availability of the site: it "freezes" and keeps displaying the last recorded data for some period of time. The same thing happens when I use the Prometheus exporter.

👟 Reproduction steps

This also happened on version 1.1 (before switching to the new one). I even moved monitoring to a separate server, but the problem still recurs. I have about 15-20 HTTP checks, and none of them work.

👀 Expected behavior

Monitoring continues to operate and displays correct data.

😓 Actual Behavior

At some point in time, monitoring "freezes" and it's not clear to me how to fix it. This persists until I stop/start the monitors or restart the monitoring server.
Here are some screenshots:

How it looks in Uptime Kuma: [screenshot]

The 24 h view: [screenshot]

Grafana: [screenshot]

As you can see, it stopped posting status data after 00:00 on 24.01.22.

How can I fix this? My monitoring VM has about 100 GB of free NVMe space.

🐻 Uptime-Kuma Version

1.11.3

💻 Operating System and Arch

Ubuntu 18.04

🌐 Browser

Safari

🐋 Docker Version

No response

🟩 NodeJS Version

No response

📝 Relevant log output

No response

katsil added the bug label on Jan 24, 2022
katsil commented Jan 24, 2022

I see the error again after a restart.

chakflying (Collaborator) commented:
Are there any logs in the server output?

katsil commented Jan 25, 2022

Here are the logs from the Docker container; can you tell me where to find other debug logs?

https://pb0.superhub.xyz/?fdff20a2a4ba2348#KNj9yiwtEjvqlqkkBtTBBJIqUpFOwLQok1xJvYnRCjw=

katsil commented Jan 25, 2022

Also, error.log inside the container:

[2022-01-20 06:49:43] KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:305:26)
    at runMicrotasks (<anonymous>)
    at runNextTicks (internal/process/task_queues.js:60:5)
    at listOnTimeout (internal/timers.js:526:9)
    at processTimers (internal/timers.js:500:7)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:259:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.storeCore (/app/node_modules/redbean-node/dist/redbean-node.js:166:26)
    at async RedBeanNode.store (/app/node_modules/redbean-node/dist/redbean-node.js:126:20)
    at async beat (/app/server/model/monitor.js:417:13) {
  sql: undefined,
  bindings: undefined
}
[2022-01-20 06:49:43] KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:305:26)
    at runMicrotasks (<anonymous>)
    at runNextTicks (internal/process/task_queues.js:60:5)
    at processTimers (internal/timers.js:497:9)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:259:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.storeCore (/app/node_modules/redbean-node/dist/redbean-node.js:166:26)
    at async RedBeanNode.store (/app/node_modules/redbean-node/dist/redbean-node.js:126:20)
    at async beat (/app/server/model/monitor.js:417:13)
    at async Timeout.safeBeat [as _onTimeout] (/app/server/model/monitor.js:443:17) {
  sql: undefined,
  bindings: undefined
}
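This KnexTimeoutError means a query waited out the acquisition timeout without ever getting a connection from Knex's pool. For orientation only, here is a hypothetical knex configuration sketch (not Uptime Kuma's actual config; the filename path is an assumption) showing the settings involved:

```javascript
// Hypothetical knex setup, NOT Uptime Kuma's actual configuration.
// A query waits up to acquireConnectionTimeout ms for a free pool slot,
// then throws KnexTimeoutError. SQLite allows only one writer at a time,
// so a "full" pool usually means writes are queuing faster than the disk
// can absorb them.
const knex = require("knex")({
  client: "sqlite3",
  connection: { filename: "./data/kuma.db" }, // path is an assumption
  useNullAsDefault: true,
  pool: { min: 1, max: 10 },       // how many connections may be handed out
  acquireConnectionTimeout: 60000, // ms to wait before KnexTimeoutError
});
```

Note that raising these limits only hides the symptom; the underlying issue in this thread is the database not keeping up with writes.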

katsil commented Jan 26, 2022

Hey, guys, any news please?

chakflying (Collaborator) commented:
Relevant previous discussion in #218. Unfortunately it's a generic database connection error and there isn't much to go on.

katsil commented Jan 26, 2022

"generic database connection error"

But I'm using the native SQLite database inside the Docker container; how can there be an error connecting to the database?

louislam (Owner) commented Feb 1, 2022

It may be caused by a busy database.

The monitor should restart if there is any error in general.
Unfortunately, I don't know why, but most Knex errors are not caught by try-catch.
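A minimal sketch (not Uptime Kuma code) of why a try-catch around the code that schedules a beat cannot catch an error thrown later inside the scheduled callback: the try block has already exited by the time the timer fires, so the error is only reachable inside the callback itself.

```javascript
// Sketch: a try-catch around scheduling never sees errors thrown later
// inside the timer callback. The names below are illustrative only.
function scheduleBeat(onResult) {
  try {
    setTimeout(async () => {
      // Simulate the database call failing long after scheduling.
      try {
        throw new Error("KnexTimeoutError (simulated)");
      } catch (err) {
        // Catching *inside* the callback is the only place the error
        // is still reachable, which is what a safeBeat-style wrapper does.
        onResult({ caughtInside: true, message: err.message });
      }
    }, 0);
  } catch (err) {
    // Never reached: setTimeout itself does not throw here.
    onResult({ caughtOutside: true });
  }
}

scheduleBeat((result) => console.log(result));
// → { caughtInside: true, message: 'KnexTimeoutError (simulated)' }
```

This is also why, absent such a wrapper, these rejections can surface only as unhandled errors in the log rather than restarting the monitor.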

CommanderStorm (Collaborator) commented:

v1.23.X included some improvements in this direction (using incremental_vacuum), improving the situation.

A lot of performance improvements (using aggregated vs. non-aggregated tables to store heartbeats, letting users choose MariaDB as a database backend, pagination of important events) have been made in v2.0 (our next release), resolving™️ this problem area.
=> I'm going to close this issue.

You can subscribe to our releases to get notified when a new release (such as v2.0-beta.0) is made.
See #4171 for the bugs that need addressing before that can happen.

In the meantime (the issue is SQLite not reading data fast enough to keep up):

  • limit how much retention you have configured
  • limit yourself to a reasonable number of monitors (hardware-dependent; there is no good universal measure)
  • don't run on slow or high-latency disks such as HDDs, SD cards, or a USB stick attached to a router
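The incremental_vacuum improvement mentioned above can be illustrated at the SQLite level. A hedged sketch, assuming the sqlite3 CLI is installed; it uses a throwaway database, and pointing DB at a real kuma.db path is left to the reader:

```shell
# Throwaway database for illustration; substitute your real kuma.db to inspect it.
DB="$(mktemp /tmp/kuma-demo.XXXXXX.db)"

# auto_vacuum must be configured before tables exist, or be followed by a
# one-off full VACUUM on an existing database. Mode 2 = incremental.
sqlite3 "$DB" "PRAGMA auto_vacuum = INCREMENTAL; VACUUM;"
sqlite3 "$DB" "PRAGMA auto_vacuum;"   # prints: 2

# With incremental mode enabled, freed pages can be returned to the OS in
# small batches instead of one long blocking VACUUM of the whole file:
sqlite3 "$DB" "PRAGMA incremental_vacuum(1000);"

rm "$DB"
```

This matters here because deleting old heartbeats does not shrink the database file by itself; without some form of vacuuming, the file keeps its high-water-mark size.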

4 participants