RethinkDB shuts down slowly on rotational drive #1547

Closed
danielmewes opened this issue Oct 16, 2013 · 12 comments

@danielmewes
Member

Just something I've noticed which might be worth investigating:

After inserting a lot of data (~100 GB) into a single RethinkDB instance on rotational drives, I shut down the server. There was only one table, with a cache size of just 1 GB. After about 15 minutes, it was still shutting down.

What's weird is the output of iostat -m:

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda             305.50         2.86         0.75          5          1
dm-0            201.50         0.74         0.75          1          1
dm-1            542.00         2.12         0.00          4          0

First of all, the write throughput is extremely low at less than a MB per second. Interestingly, RethinkDB is reading more data than it is writing. I assume this is the garbage collector at work.
At the same time, the relatively high number of transfers per second (305 on sda) could indicate that it is bound by disk seeks, but that's just speculation.

The server uses only very little CPU at this point.
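
(Not part of the original report, but for anyone trying to reproduce this: a minimal Python sketch that samples the same per-device read/write rates from /proc/diskstats once per second during shutdown. The device name sda is an assumption; use whatever device backs the data directory.)

```python
# Poll /proc/diskstats once per second and print iostat-like numbers.
# Assumes Linux; DEVICE is an assumption, adjust as needed.
import time

DEVICE = "sda"          # device backing the RethinkDB data directory (assumption)
SECTOR_BYTES = 512      # /proc/diskstats reports 512-byte sectors

def read_stats(device):
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                # reads completed, sectors read, writes completed, sectors written
                return (int(fields[3]), int(fields[5]),
                        int(fields[7]), int(fields[9]))
    raise RuntimeError("device %s not found in /proc/diskstats" % device)

prev = read_stats(DEVICE)
while True:
    time.sleep(1)
    cur = read_stats(DEVICE)
    d_reads, d_rsect, d_writes, d_wsect = (c - p for c, p in zip(cur, prev))
    print("iops=%d read=%.2f MB/s write=%.2f MB/s"
          % (d_reads + d_writes,
             d_rsect * SECTOR_BYTES / 1e6,
             d_wsect * SECTOR_BYTES / 1e6))
    prev = cur
```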

@danielmewes
Member Author

hdparm -t /dev/sda reports that it can only read 8 MB/s while RethinkDB is shutting down. Without RethinkDB running the throughput is 124 MB/s on the same disk.

This indicates that the disk is actually saturated by the server.

If this is indeed the garbage collector:

  1. Why does the GC seem to have such a random access pattern? (Or is it inefficient on rotational drives in some other way?)
  2. Why is garbage collection not interrupted at shutdown? It just seems to keep going.

Another interesting effect is that memory utilization as reported by htop fluctuated significantly during the shutdown process. It repeatedly went up and down between about 30% and more than 67% of the machine's 6 GB of memory. I wouldn't expect memory consumption to increase (that much) during shutdown.
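
(Again not from the original report: a small Python sketch, assuming Linux and a single process whose comm name is rethinkdb, that logs the server's resident set size once per second so the fluctuation can be captured more precisely than by eyeballing htop.)

```python
# Log the RSS of the rethinkdb process once per second during shutdown.
import os
import time

def find_pid(name="rethinkdb"):
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/comm" % pid) as f:
                if f.read().strip() == name:
                    return pid
        except IOError:
            continue
    return None

pid = find_pid()
while pid is not None:
    try:
        with open("/proc/%s/status" % pid) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    print(time.strftime("%H:%M:%S"), line.split()[1], "kB")
    except IOError:
        break  # the process exited, i.e. shutdown finished
    time.sleep(1)
```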

@coffeemug
Contributor

I don't think it's necessarily clear that it's GC that causes a slow shutdown and isn't being interrupted. We need to validate that theory first.

@wojons
Contributor

wojons commented Oct 22, 2013

If I may ask, did you shut down immediately after writing all this data? Also, how much memory does the server itself have? What sort of write durability was used for inserting the data?

@danielmewes
Member Author

Hi @wojons, I shut it down relatively shortly after writing the data, which I inserted with soft durability.
The RethinkDB server used between 2 and 4 GB of the system's 6 GB of memory.
Do you see similar (or different) behavior?
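
(For context, this is roughly what the soft-durability insert looks like from the Python driver of that era; the database, table name, batch size, and document shape below are placeholders, not the ones used in the original test.)

```python
import rethinkdb as r  # pre-2.x driver style: the module itself is the entry point

conn = r.connect("localhost", 28015)
r.db("test").table_create("bulk_test").run(conn)

# Placeholder documents; the real test inserted ~100 GB.
batch = [{"id": i, "payload": "x" * 1000} for i in range(10000)]

# durability="soft" acknowledges writes before they are flushed to disk,
# so a lot of unflushed data can accumulate in memory/cache.
r.db("test").table("bulk_test").insert(batch, durability="soft").run(conn)
```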

@wojons
Contributor

wojons commented Oct 22, 2013

@danielmewes So far in all of my tests I have had fast shutdowns on spinning disks (less than 30 seconds). I have seen your issue with other databases. Did you have any extra indexes other than the primary?

@danielmewes
Member Author

@wojons Thanks for chiming in. I did not have any secondary indexes.
The odd thing is that RethinkDB was actually reading from disk during shutdown. Because this was just a single machine, that couldn't have been due to backfills or other cluster operations either. As far as I know, the only other thing that could cause disk reads during shutdown is our garbage collector. As @coffeemug pointed out, that hypothesis has yet to be verified, though.

I will check later which exact circumstances are required to reproduce it.

@wojons
Contributor

wojons commented Oct 24, 2013

@danielmewes I think I just ran into your problem. I did a mass delete of 2 million items and the shutdown took a while.

@danielmewes danielmewes modified the milestones: backlog, subsequent Nov 25, 2014
@danielmewes danielmewes modified the milestones: outdated, backlog Jan 21, 2015
@danielmewes
Member Author

Closing as outdated.

@wojons
Contributor

wojons commented Jan 21, 2015

@danielmewes do we want to test this before closing it?

@Tryneus
Member

Tryneus commented Jan 21, 2015

If this was happening in debug mode, it may have been due to some debug checks that were performed. I ran into that a couple of months ago and removed the check, as it was rather antiquated. If this was happening in release mode, never mind.

@wojons
Contributor

wojons commented Jan 21, 2015

@Tryneus I can find some time this weekend to load 2M real records into a spinning-disk DB, then delete them, then shut down and see if it's still happening, if @danielmewes can assign it to me.
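
(A sketch of that reproduction, not from the thread itself; the database/table names match the earlier placeholder example, and the timing loop assumes pgrep is available on the host.)

```python
import subprocess
import time
import rethinkdb as r

conn = r.connect("localhost", 28015)

# Mass-delete the previously loaded rows; with durability="soft" the
# acknowledgements come back before the deletions reach disk.
r.db("test").table("bulk_test").delete(durability="soft").run(conn)
conn.close()

# Now stop the server from another terminal (Ctrl-C or `kill -INT <pid>`)
# and time how long the process takes to exit.
start = time.time()
while subprocess.call(["pgrep", "-x", "rethinkdb"],
                      stdout=subprocess.DEVNULL) == 0:
    time.sleep(1)
print("shutdown took %.0f seconds" % (time.time() - start))
```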

@danielmewes
Member Author

Yeah, I closed this because the last instance I'm aware of was quite a while ago, and we've changed and fixed a lot of things since then.

Please just re-open this if you see it again.
