Cache size does not have any effect #6275

Closed

Andarius opened this issue Feb 20, 2017 · 10 comments

Comments
@Andarius

I'm running RethinkDB in Docker (alpine 2.3.5) with --cache-size 15000 set (one node here).
However, on some heavy queries RethinkDB uses a lot more than the allowed memory (up to 64 GB), then runs out of memory and dies.

Here is the full stack trace of the error:

warn: Some RethinkDB data on this server has been placed into swap memory. This may impact performance.
rethinkdb: Memory allocation failed. This usually means that we have run out of RAM. Aborting.
Version: rethinkdb 2.3.5~0jessie (GCC 4.9.2)
error: Error in src/arch/runtime/thread_pool.cc at line 367:
error: Segmentation fault from reading the address (nil).
error: Backtrace:
error: Mon Feb 20 12:20:32 2017

   1 [0xae7500]: backtrace_t::backtrace_t() at 0xae7500 (rethinkdb)
   2 [0xae7879]: format_backtrace(bool) at 0xae7879 (rethinkdb)
   3 [0xd9f6c3]: report_fatal_error(char const*, int, char const*, ...) at 0xd9f6c3 (rethinkdb)
   4 [0x9f0254]: linux_thread_pool_t::fatal_signal_handler(int, siginfo_t*, void*) at 0x9f0254 (rethinkdb)
   5 [0x7f46af6db8d0]: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0) [0x7f46af6db8d0] at 0x7f46af6db8d0 (/lib/x86_64-linux-gnu/libpthread.so.0)
   6 [0x7f46af357532]: abort+0x232 at 0x7f46af357532 (/lib/x86_64-linux-gnu/libc.so.6)
   7 [0xd9f424]: rethinkdb() [0xd9f424] at 0xd9f424 ()
   8 [0xd9f439]: rethinkdb() [0xd9f439] at 0xd9f439 ()
   9 [0x7f46afe5f2fc]: operator new(unsigned long) at 0x7f46afe5f2fc (/usr/lib/x86_64-linux-gnu/libstdc++.so.6)
   10 [0xa4f932]: void std::vector<char, std::allocator<char> >::_M_range_insert<char const*>(__gnu_cxx::__normal_iterator<char*, std::vector<char, std::allocator<char> > >, char const*, char const*, std::forward_iterator_tag) at 0xa4f932 (rethinkdb)
   11 [0xa4f7b1]: vector_stream_t::write(void const*, long) at 0xa4f7b1 (rethinkdb)
   12 [0xa5161e]: send_write_message(write_stream_t*, write_message_t const*) at 0xa5161e (rethinkdb)
   13 [0xa147da]: raw_mailbox_writer_t::write(write_stream_t*) at 0xa147da (rethinkdb)
   14 [0x9fd35a]: connectivity_cluster_t::send_message(connectivity_cluster_t::connection_t*, auto_drainer_t::lock_t, unsigned char, cluster_send_message_write_callback_t*) at 0x9fd35a (rethinkdb)
   15 [0xa133dd]: send_write(mailbox_manager_t*, raw_mailbox_t::address_t, mailbox_write_callback_t*) at 0xa133dd (rethinkdb)
   16 [0xbaafd2]: primary_query_server_t::client_t::perform_request(boost::variant<primary_query_bcard_t::read_request_t, primary_query_bcard_t::write_request_t, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, signal_t*) at 0xbaafd2 (rethinkdb)
   17 [0xbb3536]: multi_client_server_t<boost::variant<primary_query_bcard_t::read_request_t, primary_query_bcard_t::write_request_t, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, primary_query_server_t*, primary_query_server_t::client_t>::client_t::on_request(signal_t*, boost::variant<primary_query_bcard_t::read_request_t, primary_query_bcard_t::write_request_t, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) at 0xbb3536 (rethinkdb)
   18 [0xbb32bf]: mailbox_t<void (boost::variant<primary_query_bcard_t::read_request_t, primary_query_bcard_t::write_request_t, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>)>::read_impl_t::read(read_stream_t*, signal_t*) at 0xbb32bf (rethinkdb)
   19 [0xa14442]: mailbox_manager_t::mailbox_read_coroutine(threadnum_t, unsigned long, std::vector<char, std::allocator<char> >*, long, mailbox_manager_t::force_yield_t) at 0xa14442 (rethinkdb)
   20 [0xa14542]: rethinkdb() [0xa14542] at 0xa14542 ()
   21 [0x9f2c47]: coro_t::run() at 0x9f2c47 (rethinkdb)

error: Exiting.

@srh
Contributor

srh commented Feb 20, 2017

The cache size parameter doesn't affect intermediate query computation sizes. But there should be some way to deal with that, without the server dying. In my opinion this is a known defect of the product, and I'd like it to be fixed someday. But you have 49 GB to burn through! What sort of query are you running?

@Andarius
Author

The cache size parameter doesn't affect intermediate query computation sizes

I agree. But then, when the computation ends, why doesn't it free the cache memory used for the query? Also, I don't understand the X% cache used shown in the interface. Right now, no requests are running and RethinkDB is using 25 GB, but I can see 93% cache used with a --cache-size of 15000 and a max memory for the Docker container of 35 GB.
The query I ran was an aggregation on a 14-million-row table.

@srh
Contributor

srh commented May 24, 2017

What was the actual query?

@Andarius
Author

Here is one:

req = (rdb.table(Price.table)
    .group(rdb.row['store_id'], index='field_id')
    .count()
    .ungroup()
)

@marshall007
Contributor

@Andarius I don't think you can group by a secondary index and a field like that. You have to group by either a secondary index or one or more fields/functions.
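
For reference, the two valid forms would look roughly like this (a sketch reusing the table, field, and index names from the query above, which are not verified here):

# Group by the secondary index only:
req = (rdb.table(Price.table)
    .group(index='field_id')
    .count()
    .ungroup()
)

# Or group by one or more fields/functions, without an index:
req = (rdb.table(Price.table)
    .group('store_id')
    .count()
    .ungroup()
)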

@AtnNn
Member

AtnNn commented Jul 13, 2017

no requests are running and rethink is using 25G but I can see 93% cache used with a --cache-size of 15000

That would mean it is using 14.2 GB for the cache and 10.8 GB for other data.

Does the memory usage go down if you query it with r.expr(1) a few times?

Do you have a large number of tables? I believe the metadata for those can consume a lot of memory in some cases.

It is also possible there is a memory leak somewhere.
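
A minimal sketch of the r.expr(1) probe suggested above, using the Python driver (host, port, and iteration count are assumptions):

import rethinkdb as r

conn = r.connect(host='localhost', port=28015)
for _ in range(10):
    r.expr(1).run(conn)  # trivial queries; check whether resident memory drops afterwards
conn.close()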

@Andarius
Author

Andarius commented Jul 16, 2017

Does the memory usage go down if you query it with r.expr(1) a few times?

No it does not.

Do you have a large amount of tables?

In total I have around 10 tables, but the request I run only touches 1 table, and it fills the cache.

@lciummo

lciummo commented Dec 11, 2017

Are you confusing cache usage and disk space? The GB number is disk space, I believe. I have a 4 GB RAM VM and 100 GB of disk space.

@GeoffreyPlitt

This exact thing happens to me a lot on my cluster serving a production environment. I have a sense that lowering my cache by a certain amount will free up the headroom needed for table metadata and per-query memory, but I have no idea how to figure out what that amount is. And if I'm even slightly off and the instance swaps more than a little, everything comes to a halt.
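
One way to reason about it is a back-of-the-envelope budget like the following (every number here is an assumption for illustration, not a measurement from this thread):

instance_ram_mb = 32 * 1024  # total RAM available to the server (assumed)
metadata_mb = 2 * 1024       # guess at table/shard metadata overhead
per_query_mb = 8 * 1024      # guess at peak working memory of the heaviest query
cache_size_mb = instance_ram_mb - metadata_mb - per_query_mb  # candidate --cache-size value
print(cache_size_mb)  # 22528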

@lciummo

lciummo commented Jan 5, 2018

The original error in this thread (a segfault) is different from the error we see with RethinkDB on memory issues: we saw an "out of memory" error in RethinkDB that caused the process to halt.

We added --restart always to the Docker container to get around it. Adding RAM just made it fail less often (two days vs a few hours).

It doesn't seem like large databases of hundreds of megabytes are handled well.

If you believe swapping is an issue, you might look into Linux hugetlb/hugepage processing. That helped with a similar MySQL issue a few years back.

Andarius closed this as completed Jan 3, 2021