
hangs with asynchronous writes, millions of records, and ≥60% cache usage #5634

Closed
mark-kubacki opened this issue Apr 9, 2016 · 20 comments


@mark-kubacki

I seem to be hitting an invisible wall when inserting some millions of rows (avg. 160 bytes in size):
as soon as I've inserted 18M rows, RethinkDB just stops responding.

Any pending INSERTs just hang.
CPU is at 0% at this point and there is no disk activity (no consolidation of the data on disk). About 60% of the cache has been used (I tried setting the cache to 16 GiB and, independently, to 22 GiB, with no difference), and Linux shows that I still have a few gigabytes of memory free out of the 32 GiB total.

Some time after the stall, the web interface shows "NaN% cache used".
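
For anyone trying to watch the cache outside the web UI, here is a minimal sketch of polling cache usage via the system stats table. It assumes the official Python driver and a local server on the default port; the storage_engine.cache.in_use_bytes field path and the db/table/server name fields follow the 2.x system-tables documentation and may differ between versions.

import rethinkdb as r

# Poll per-table, per-server cache usage via the system "stats" table.
# Field layout is taken from the 2.x system-tables docs (hedged).
conn = r.connect('localhost', 28015)
for row in r.db('rethinkdb').table('stats').run(conn):
    if row['id'][0] == 'table_server':
        in_use = row['storage_engine']['cache']['in_use_bytes']
        print(row['db'], row['table'], row['server'], in_use)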

@mark-kubacki mark-kubacki changed the title from "stalls after 18M of record (20–30 GiB data on disk)" to "hangs with asynchronous writes, millions of records, and ≥60% cache usage" Apr 9, 2016
@mark-kubacki
Author

I've disabled swap (removed the partition) and reduced RethinkDB's cache size to below 16 GiB: the hanging just happened earlier. I guess a million or two records have not been written to disk.

While it is in this state, I cannot kill RethinkDB's main process.

@mark-kubacki
Author

The issue does not exist in RethinkDB 2.2.6.

@danielmewes
Member

On which operating system/distribution + version are you running this? Is there a chance that you could send us a copy of the rethinkdb data directory and the script that you're using for the writes? We can set up a secure upload location for uploading the data files. We're also happy to sign an NDA if the data is sensitive.

@mark-kubacki
Author

Running on Ubuntu 16.04 (Xenial), updated as of today, with its kernel Linux 4.4.0-4-generic, on an SSD with ext4.
32 GiB ECC DDR4, amd64, Xeon E5-2676v3 (12 cores + HT, Haswell). No swap.

RethinkDB is your official Docker image, started this way:

# sysctl -w vm.overcommit_memory=1
# (no memory limits were hit, though)

docker run --name bench_rethinkdb \
  -v /var/lib/rethinkdb:/data \
  --net=host --privileged \
  -d \
  rethinkdb:2.3 rethinkdb --bind all --no-update-check

(I've used 2.2.6 and 2.3.0.)

I cannot send you the data directory, but I can send the raw data, ready for your import tool, packed as a squashfs image of about 2.5 GiB. That's all that is needed to reproduce the issue. No NDA needed. If that's okay with you, I will start the upload and link to it here in a day or two.

@danielmewes
Member

Data in a squashfs image sounds good. Do you have a place to upload it to, or do you want us to provide you with one? Thanks for your help in tracking this down!

@mark-kubacki
Author

Thanks for looking into this!

https://[link not public anymore]/

docker run \
  -v /var/lib/rethinkdb:/data \
  --net=host --privileged \
  -d \
  rethinkdb:2.3 rethinkdb --bind all --no-update-check

In the database 'test', create a new table 'benchdata';
make sure "Ack writes only when written to disk" is NOT selected.
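
For reference, a minimal sketch of the same table setup done through the driver instead of the web UI, assuming the official Python driver; durability='soft' should correspond to leaving that acknowledgement option unticked:

import rethinkdb as r

# Create the target table with soft durability: writes are acknowledged
# before they are flushed to disk, matching the web UI option above.
conn = r.connect('localhost', 28015)
r.db('test').table_create('benchdata', durability='soft').run(conn)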

sudo mount benchdata-people.csv.xz.squashfs /mnt

/usr/local/bin/rethinkdb-import \
  --force --table test.benchdata \
  --format csv -f /mnt/*.csv 

After a few minutes, depending on RethinkDB's cache size, writes/s drop to zero and RethinkDB stops responding.

The error occurs even with said importer. I had formerly used a script written in Go for that (which pushed the data in batches of 100 rows per transaction), but found that it doesn't make any difference. It made no difference whether I used 'address' as a string or as a nested map, whether 'isMale' was a string or a bool, or whether 'birthday' was a datetime or a string.

I guess this is about millions of rows <512 bytes each and something kicking in eventually.
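
For completeness, a minimal sketch of the batching approach described above, written against the official Python driver rather than Go; the CSV path is a placeholder and soft durability matches the table setup above:

import csv
import rethinkdb as r

BATCH = 100  # rows per insert, matching the batch size described above
CSV_PATH = '/mnt/benchdata.csv'  # placeholder; point this at the CSV from the squashfs image

conn = r.connect('localhost', 28015)
table = r.db('test').table('benchdata')

with open(CSV_PATH, newline='') as f:
    batch = []
    for row in csv.DictReader(f):
        batch.append(row)
        if len(batch) == BATCH:
            table.insert(batch, durability='soft').run(conn)
            batch = []
    if batch:  # flush the last partial batch
        table.insert(batch, durability='soft').run(conn)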

@danielmewes
Member

@wmark Sorry for the delay, I was out of office last week. The download link no longer seems to be valid. Could you send me a new one via email to daniel@rethinkdb.com?

@danielmewes danielmewes added this to the 2.3.x milestone Apr 19, 2016
@danielmewes danielmewes modified the milestones: 2.3.x, 2.3.1 Apr 20, 2016
@danielmewes
Member

It looks like Docker has a default size limit for the changed data of a given container, depending on the backend used.
That limit used to be 10 GB (moby/moby#5151), though it was apparently bumped to 100 GB recently (moby/moby#14709).

I wonder what happens if that limit is exhausted. Do you think it's possible that you're hitting this?

@mark-kubacki
Author

mark-kubacki commented Apr 21, 2016

I don't think I hit that limit: version 2.2.6 runs just fine, even with a cache of 22 GiB.

The data is here: https://s.blitznote.com/unclassified/benchdata-people.csv.xz.squashfs (2.5 GiB)

@danielmewes
Member

Thanks for the re-upload, @wmark. I got the data this time. Trying to reproduce now.

@danielmewes
Member

So far I couldn't reproduce it. It has imported 46M rows at this point and is still writing at a steady rate.
This was with a relatively small 2 GB cache size, because I set things up on a small machine that has only 6 GB of RAM. I might need to try this on a bigger box.

@danielmewes
Member

My server went through and imported all 50M rows fine. I'll retry this on a larger machine in the next days to see if that changes anything.

@mark-kubacki
Author

I will pull some memory (down from 32 GiB) and disable some threads (down from 24) tomorrow.

@danielmewes
Member

@wmark You mentioned that you tried both larger and smaller cache sizes. I used a cache of just 2 GB. Had you previously tried anything that small?

@mark-kubacki
Author

mark-kubacki commented Apr 26, 2016

No, I've never tried it with anything less than the 7 GiB that RethinkDB sets automatically with 16 GiB of memory, which is the lowest I can go (without setting anything myself) because I only have memory sticks of 16 GiB each.

Anyway, I've now started a test series on a fresh Ubuntu 16.04, 2×SSDs with ext4, Linux 4.4.0-21, Docker 1.11.0 (driver: overlay), and will update this comment with the results.

  1. 16 GiB memory, 24 threads, cache auto=7.1 GiB, 2.3.0 → success
  2. 16 GiB memory, 24 threads, cache auto=7.1 GiB, 2.3.1
  3. 16 GiB memory, 24 threads, cache fixed=2 GiB, 2.3.1 →
  4. 16 GiB memory, 24 threads, cache fixed=2 GiB, 2.3.0

… combinations with 32 GiB and 4 threads follow.

Due to (1), and because the new Linux kernel scheduled RethinkDB on the first half of the CoD (roughly one NUMA node), I suspect that a variable in RethinkDB is used as if it were atomic, but without proper locking.

@danielmewes
Member

Thanks for running more tests and helping us track this down.

Can you explain a bit more why you think that the success of configuration 1 is an indication of a locking issue?

@mark-kubacki
Author

It's because this time RethinkDB's threads didn't span two or more NUMA nodes. I will run some more tests to confirm that.

@mark-kubacki
Author

I've changed cache sizes, forced the kernel to schedule RethinkDB across two different NUMA nodes, and switched to an importer written in Go to make inserts burstier, but I cannot reproduce this with 4.4.0-21 anymore.

Thanks for staying on this with me!

@danielmewes
Member

danielmewes commented Apr 27, 2016

@wmark Thanks for putting so much work into this. I'll see if I can find anything suspicious in the kernel change logs. So it seems Ubuntu 4.4.0-4 was bad and 4.4.0-21 is good, right?

In the meantime, I have also tested 2.3.0 and 2.3.1 on a larger server with more RAM (though I limited the cache size to 16 GB) and with two NUMA nodes. Both worked fine, despite the scheduler placing RethinkDB across both nodes. However, I was running a much older kernel on the host, namely 3.13.0-85-generic. So it seems plausible that this is somehow related to a kernel detail.

@danielmewes
Member

Well, I skimmed through the changelog at http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_4.4.0-21.37/changelog, but nothing popped out at me. That doesn't mean much, though.

@danielmewes danielmewes modified the milestones: invalid, 2.3.x Apr 27, 2016