Node restarts with no error message #5934
Comments
One of the ways this can happen is that the out-of-memory killer (OOM killer, an OS feature) decides that the system would be more stable with some free memory and kills the largest memory user (databases are large users for good reason). Unfortunately, no catchable signal is sent, so there is nothing we can put in our logs. But if this is the case, then there is probably an entry in the system log about the event. Usually they have the text
@bsharpe: can you check for that message? If it is there, then I would check your memory settings to make sure that RethinkDB's cache size is not set to larger than, say, 2/3 of the total memory size:
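A rough illustration of the two checks described above (this is not the snippet from the original comment). It assumes the kernel log uses the common Linux OOM-killer phrasings such as `invoked oom-killer` and `Out of memory: Kill process`; the exact wording varies by kernel version. The function names and the sample log line are hypothetical.

```python
import re

# Common Linux kernel OOM-killer phrasings. This is an assumption:
# the exact wording differs between kernel versions.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Kill(ed)? process|Killed process \d+",
    re.IGNORECASE,
)


def find_oom_events(log_lines):
    """Return the log lines that look like OOM-killer entries."""
    return [line.rstrip("\n") for line in log_lines if OOM_PATTERN.search(line)]


def max_cache_mb(total_ram_mb):
    """Upper bound for the cache size using the 2/3 rule of thumb from the comment."""
    return total_ram_mb * 2 // 3


def cache_too_large(cache_mb, total_ram_mb):
    """True if the configured cache exceeds 2/3 of total memory."""
    return cache_mb > max_cache_mb(total_ram_mb)


if __name__ == "__main__":
    # Hypothetical log excerpt, for illustration only.
    sample = [
        "kernel: Out of memory: Kill process 4321 (rethinkdb) score 912 or sacrifice child",
        "rethinkdb: Server ready",
    ]
    for event in find_oom_events(sample):
        print(event)
    print(cache_too_large(cache_mb=6000, total_ram_mb=8192))
```

On Ubuntu the kernel messages usually end up in `/var/log/syslog` or `/var/log/kern.log`, so the lines from one of those files can be passed to `find_oom_events`. The 2/3 threshold is only the rule of thumb mentioned in the comment, not a hard limit.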
thx @larkost -- cache size was >2/3rds of available RAM... will tone it down. :)
This would be very useful info to have in the docs...
One thing is unclear to me though: The OOM killer would terminate the
@danielmewes yes, we set up
@larkost confirmed...
OK, we need to figure out why something started using so much memory. We've had some reports recently of increased memory usage by RethinkDB under disk I/O contention. Taking a backup might have pushed it into that scenario. We're still investigating this problem though, so it's too early to tell whether it's a plausible explanation of this or not. Another thing we're looking into is whether the backup script itself could be using too much memory.
Unfortunately, we were not running a backup at this time.
@bsharpe So the out-of-memory situation happened after the backup finished (or before it started)? I'm a bit confused, because you wrote
in the first comment. Just want to make sure I understand what happened...
@danielmewes sorry -- we had two of these today. Yes, the first was during a backup. The second one was not. When the backup was happening, it was run from a different machine via a proxy.
One of our 3 nodes just restarted with no error message in the logs.
Ubuntu 14.04.01
Using RethinkDB 2.3.4 built from source as per instructions in the docs
We were under normal operations, plus a full backup was being performed at the same time.