
hangs with asynchronous writes, millions of records, and ≥60% cache usage #5634

Closed
mark-kubacki opened this issue Apr 9, 2016 · 20 comments


@mark-kubacki

I seem to be hitting an invisible wall when inserting some millions of rows (avg. 160 bytes in size):
as soon as I've inserted 18M rows, RethinkDB just stops responding.

Any pending INSERTs just hang.
CPU is at 0% at this point and there is no disk activity (no consolidation of the data on disk). About 60% of the cache has been used (I tried setting the cache to 16 GiB and, independently, to 22 GiB, with no difference), and Linux shows that I still have a few gigabytes of memory free out of the 32 GiB total.

Some time after the stall, the web interface shows "NaN% cache used".
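
For anyone trying to watch the cache outside the web UI, here is a minimal sketch of polling cache usage via the system stats table. It assumes the official Python driver and a local server on the default port; the storage_engine.cache.in_use_bytes field path and the db/table/server name fields follow the 2.x system-tables documentation and may differ between versions.

import rethinkdb as r

# Poll per-table, per-server cache usage via the system "stats" table.
# Field layout is taken from the 2.x system-tables docs (hedged).
conn = r.connect('localhost', 28015)
for row in r.db('rethinkdb').table('stats').run(conn):
    if row['id'][0] == 'table_server':
        in_use = row['storage_engine']['cache']['in_use_bytes']
        print(row['db'], row['table'], row['server'], in_use)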

@mark-kubacki mark-kubacki changed the title from "stalls after 18M of record (20–30 GiB data on disk)" to "hangs with asynchronous writes, millions of records, and ≥60% cache usage" Apr 9, 2016
@mark-kubacki
Author

I've disabled swap (removed the partition) and reduced RethinkDB's cache size to below 16 GiB: the hanging just happened earlier. I guess a million or two records have not been written to disk.

While it is in this state, I cannot kill RethinkDB's main process.

@mark-kubacki
Author

The issue does not exist in RethinkDB 2.2.6.

@danielmewes
Member

On which operating system/distribution + version are you running this? Is there a chance that you could send us a copy of the rethinkdb data directory and the script that you're using for the writes? We can set up a secure upload location for uploading the data files. We're also happy to sign an NDA if the data is sensitive.

@mark-kubacki
Author

Running on Ubuntu 16.04 (Xenial), updated as of today, with its kernel Linux 4.4.0-4-generic, on an SSD with ext4.
32 GiB ECC DDR4, amd64, Xeon E5-2676v3 (12 cores + HT, Haswell). No swap.

RethinkDB is your official Docker image, started this way:

# sysctl -w vm.overcommit_memory=1
# (no memory limits were hit, though)

docker run --name bench_rethinkdb \
  -v /var/lib/rethinkdb:/data \
  --net=host --privileged \
  -d \
  rethinkdb:2.3 rethinkdb --bind all --no-update-check

(I've used 2.2.6 and 2.3.0.)

I cannot send you the data directory, but I can send the raw data, ready for your import tool, packed as a squashfs image of about 2.5 GiB. That's all that is needed to reproduce the issue. No NDA needed. If that's okay with you, I will start the upload and link to it here in a day or two.

@danielmewes
Member

Data in a squashfs image sounds good. Do you have a place to upload it to, or do you want us to provide you with one? Thanks for your help in tracking this down!

@mark-kubacki
Author

Thanks for looking into this!

https://[link not public anymore]/

docker run \
  -v /var/lib/rethinkdb:/data \
  --net=host --privileged \
  -d \
  rethinkdb:2.3 rethinkdb --bind all --no-update-check

In the database 'test', create a new table 'benchdata';
make sure "Ack writes only when written to disk" is NOT selected.
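
For reference, a minimal sketch of the same table setup done through the driver instead of the web UI, assuming the official Python driver; durability='soft' should correspond to leaving that acknowledgement option unticked:

import rethinkdb as r

# Create the target table with soft durability: writes are acknowledged
# before they are flushed to disk, matching the web UI option above.
conn = r.connect('localhost', 28015)
r.db('test').table_create('benchdata', durability='soft').run(conn)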

sudo mount benchdata-people.csv.xz.squashfs /mnt

/usr/local/bin/rethinkdb-import \
  --force --table test.benchdata \
  --format csv -f /mnt/*.csv 

After a few minutes, depending on RethinkDB's cache size, writes/s drop to zero and RethinkDB stops responding.

The error occurs even with said importer. I had formerly used a script written in Go for that (which pushed the data in batches of 100 rows per transaction), but found that it doesn't make any difference. It made no difference whether I used 'address' as a string or as a nested map, whether 'isMale' was a string or a bool, or whether 'birthday' was a datetime or a string.

I guess this is about millions of rows <512 bytes each and something kicking in eventually.
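
For completeness, a minimal sketch of the batching approach described above, written against the official Python driver rather than Go; the CSV path is a placeholder and soft durability matches the table setup above:

import csv
import rethinkdb as r

BATCH = 100  # rows per insert, matching the batch size described above
CSV_PATH = '/mnt/benchdata.csv'  # placeholder; point this at the CSV from the squashfs image

conn = r.connect('localhost', 28015)
table = r.db('test').table('benchdata')

with open(CSV_PATH, newline='') as f:
    batch = []
    for row in csv.DictReader(f):
        batch.append(row)
        if len(batch) == BATCH:
            table.insert(batch, durability='soft').run(conn)
            batch = []
    if batch:  # flush the last partial batch
        table.insert(batch, durability='soft').run(conn)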

@danielmewes
Member

@wmark Sorry for the delay, I was out of office last week. The download link no longer seems to be valid. Could you send me a new one via email to daniel@rethinkdb.com?

@danielmewes danielmewes added this to the 2.3.x milestone Apr 19, 2016
@danielmewes danielmewes modified the milestones: 2.3.x, 2.3.1 Apr 20, 2016
@danielmewes
Member

It looks like Docker has a default size limit for the changed data of a given container, depending on the backend used.
That limit used to be 10 GB (moby/moby#5151), though it was apparently bumped to 100 GB recently (moby/moby#14709).

I wonder what happens if that limit is exhausted. Do you think it's possible that you're hitting this?

@mark-kubacki
Author

mark-kubacki commented Apr 21, 2016

I don't think I hit that limit: version 2.2.6 runs just fine, even with a cache of 22 GiB.

The data is here: https://s.blitznote.com/unclassified/benchdata-people.csv.xz.squashfs (2.5 GiB)

@danielmewes
Member

Thanks for the re-upload, @wmark. I got the data this time. Trying to reproduce now.

@danielmewes
Member

So far I couldn't reproduce it. It has imported 46M rows at this point and is still writing at a steady rate.
This was with a relatively small 2 GB cache size, because I set things up on a small machine that has only 6 GB of RAM. I might need to try this on a bigger box.

@danielmewes
Member

My server went through and imported all 50M rows fine. I'll retry this on a larger machine in the next days to see if that changes anything.

@mark-kubacki
Author

I will pull some memory (down from 32 GiB) and disable some threads (down from 24) tomorrow.

@danielmewes
Member

@wmark You mentioned that you tried both larger and smaller cache sizes. I used a cache of just 2 GB. Had you previously tried anything that small?

@mark-kubacki
Author

mark-kubacki commented Apr 26, 2016

No, I've never tried it with anything less than the 7 GiB that RethinkDB sets automatically with 16 GiB of memory, which is the lowest I can go (without setting anything myself) because I only have memory sticks of 16 GiB each.

Anyway, I've now started a test series on a fresh Ubuntu 16.04, 2×SSDs with ext4, Linux 4.4.0-21, Docker 1.11.0 (driver: overlay), and will update this comment with the results.

  1. 16 GiB memory, 24 threads, cache auto=7.1 GiB, 2.3.0 → success
  2. 16 GiB memory, 24 threads, cache auto=7.1 GiB, 2.3.1
  3. 16 GiB memory, 24 threads, cache fixed=2 GiB, 2.3.1 →
  4. 16 GiB memory, 24 threads, cache fixed=2 GiB, 2.3.0

… combinations with 32 GiB and 4 threads follow.

Due to (1), and because the new Linux kernel scheduled RethinkDB on the first half of the CoD (roughly one NUMA node), I suspect that a variable in RethinkDB is used as if it were atomic, but without proper locking.

@danielmewes
Member

Thanks for running more tests and helping us track this down.

Can you explain a bit more why you think that the success of configuration 1 is an indication of a locking issue?

@mark-kubacki
Author

It's because this time RethinkDB's threads didn't span two or more NUMA nodes. I will run some more tests to confirm that.

@mark-kubacki
Author

I've changed cache sizes, forced the kernel to schedule RethinkDB across two different NUMA nodes, and switched to an importer written in Go to make inserts burstier, but I cannot reproduce this with 4.4.0-21 anymore.

Thanks for staying on this with me!

@danielmewes
Member

danielmewes commented Apr 27, 2016

@wmark Thanks for putting so much work into this. I'll see if I can find anything suspicious in the kernel change logs. So it seems Ubuntu 4.4.0-4 was bad and 4.4.0-21 is good, right?

In the meantime, I have also tested 2.3.0 and 2.3.1 on a larger server with more RAM (though I limited the cache size to 16 GB) and with two NUMA nodes. Both worked fine, despite the scheduler placing RethinkDB across both nodes. However, I was running a much older kernel on the host, namely 3.13.0-85-generic. So it seems plausible that this is somehow related to a kernel detail.

@danielmewes
Member

Well, I skimmed through the changelog at http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_4.4.0-21.37/changelog, but nothing popped out at me. That doesn't mean much, though.

@danielmewes danielmewes modified the milestones: invalid, 2.3.x Apr 27, 2016