Changefeed Crashing Bugs #4678

Closed
mlucy opened this Issue Aug 13, 2015 · 33 comments

Comments

Member

mlucy commented Aug 13, 2015

This was reported by SageMath on the mailing list.

The first bug is a failed sanity check:

FEED 30288716-0d35-41e0-acbf-a81be2ae9527 is broken, so canceling -- {"message":"SANITY CHECK FAILED: [batch.size() != 0 || (parent->coro_env->return_empty_normal_batches == return_empty_normal_batches_t::YES) || (bs.get_batch_type() == batch_type_t::NORMAL_FIRST)] at `src/rdb_protocol/datum_stream.cc:1286` (server is buggy).  Backtrace:
Wed Aug 12 21:11:16 2015

1: backtrace_t::backtrace_t() at 0xa4acb0 (/usr/bin/rethinkdb)
2: lazy_backtrace_formatter_t::lazy_backtrace_formatter_t() at 0xa4afae (/usr/bin/rethinkdb)
3: ql::runtime_sanity_check_failed(char const*, int, char const*, std::string const&) at 0x789736 (/usr/bin/rethinkdb)
4: ql::coro_stream_t::cb(auto_drainer_t::lock_t) at 0x7cdd57 (/usr/bin/rethinkdb)
5: callable_action_instance_t<ql::coro_stream_t::maybe_launch_read()::{lambda()#2}>::run_action() at 0x7ce120 (/usr/bin/rethinkdb)
6: coro_t::run() at 0x954418 (/usr/bin/rethinkdb)
 in:
r.db("smc").table("file_use").getAll("0ab77792-c73f-4173-b3f8-3ee9b62b265e", ...

The second is a crash:

2015-08-12T22:21:30.086420124 47.641620s notice: Connected to server "rethink1" 5c0f5431-e72d-4e30-bda3-a4804f19d281
2015-08-12T22:23:19.046464266 156.601664s error: Error in src/rdb_protocol/changefeed.cc at line 1739:
2015-08-12T22:23:19.046499149 156.601698s error: Guarantee failed: [active()]
2015-08-12T22:23:19.046508241 156.601707s error: Backtrace:
2015-08-12T22:23:19.200879225 156.756080s error: Error in src/rdb_protocol/changefeed.cc at line 1739:
2015-08-12T22:23:19.200918701 156.756118s error: Guarantee failed: [active()]
2015-08-12T22:23:19.200928371 156.756127s error: Backtrace:
2015-08-12T22:23:19.286724209 156.841926s error: Wed Aug 12 22:23:19 2015
1: backtrace_t::backtrace_t() at ??:?
2: format_backtrace(bool) at ??:?
3: report_fatal_error(char const*, int, char const*, ...) at ??:?
4: ql::changefeed::range_sub_t::apply_ops(ql::datum_t) at ??:?
5: ql::changefeed::msg_visitor_t::operator()(ql::changefeed::msg_t::change_t const&) const::{lambda(ql::changefeed::range_sub_t*)#1}::operator()(ql::changefeed::range_sub_t*) const at ??:?
6: void ql::changefeed::feed_t::each_sub_in_vec_cb<ql::changefeed::range_sub_t>(std::function<void (ql::changefeed::range_sub_t*)> const&, std::vector<std::set<ql::changefeed::range_sub_t*, std::less<ql::changefeed::range_sub_t*>, std::allocator<ql::changefeed::range_sub_t*> >, std::allocator<std::set<ql::changefeed::range_sub_t*, std::less<ql::changefeed::range_sub_t*>, std::allocator<ql::changefeed::range_sub_t*> > > > const&, std::vector<int, std::allocator<int> > const&, int) at ??:?
7: callable_action_instance_t<pmap_runner_one_arg_t<std::_Bind<std::_Mem_fn<void (ql::changefeed::feed_t::*)(std::function<void (ql::changefeed::range_sub_t*)> const&, std::vector<std::set<ql::changefeed::range_sub_t*, std::less<ql::changefeed::range_sub_t*>, std::allocator<ql::changefeed::range_sub_t*> >, std::allocator<std::set<ql::changefeed::range_sub_t*, std::less<ql::changefeed::range_sub_t*>, std::allocator<ql::changefeed::range_sub_t*> > > > const&, std::vector<int, std::allocator<int> > const&, int)> (ql::changefeed::feed_t*, std::reference_wrapper<std::function<void (ql::changefeed::range_sub_t*)> const>, std::reference_wrapper<std::vector<std::set<ql::changefeed::range_sub_t*, std::less<ql::changefeed::range_sub_t*>, std::allocator<ql::changefeed::range_sub_t*> >, std::allocator<std::set<ql::changefeed::range_sub_t*, std::less<ql::changefeed::range_sub_t*>, std::allocator<ql::changefeed::range_sub_t*> > > > const>, std::reference_wrapper<std::vector<int, std::allocator<int> > const>, std::_Placeholder<1>)>, long> >::run_action() at ??:?
8: coro_t::run() at ??:?

@mlucy mlucy self-assigned this Aug 13, 2015

@mlucy mlucy added this to the 2.1.x milestone Aug 13, 2015

Member

mlucy commented Aug 13, 2015

@danielmewes and I put together a fix for the second bug. We still don't know what's going on with the first one.

Member

mlucy commented Aug 13, 2015

(CR 3150 by @danielmewes .)

williamstein commented Aug 13, 2015

I just dumped all my data using rethinkdb's dump, then restored into 2.0.4 without using indexes (since I guess the dump format is otherwise not compatible), then recreated my indexes (which is easy for me, since I define them all in code). I'm now running that combination live in production and it seems to be working very, very well.

Something I didn't mention above is that when I was running 2.1, many of my Node.js clients for rethinkdb were pegged at 100% CPU while rethinkdb itself was often at about 500% CPU (or more), and moreover, when it did sort of work for requests, it was slow. Now with 2.0.4, the rethinkdb server is at 20% CPU and the clients are at about 3%. Also, all the Reql* errors that were in the log with 2.1 are gone.

I'm going to try to write a stress test in node.js tomorrow to simulate the load that my node.js program puts on rethinkdb, and use that to test out anything you guys provide. Thanks for being so responsive and helpful!!

williamstein commented Aug 13, 2015

I switched to 2.0.4 as mentioned above, and was just rudely awakened when my 2.0.4 one-node server (with SSD, 32GB RAM, 8 cores) crashed with:

2015-08-13T04:21:46.948300789 0.239325s notice: Server ready, "dbstable" afe6a4c9-92d2-4add-9bf0-1648507b3efc
2015-08-13T13:55:34.076442491 34427.367467s error: Error in src/rdb_protocol/changefeed.cc at line 1476:
2015-08-13T13:55:34.076478044 34427.367503s error: Guarantee failed: [active()]
2015-08-13T13:55:34.076488347 34427.367513s error: Backtrace:
2015-08-13T13:55:34.239396061 34427.530424s error: Thu Aug 13 13:55:34 2015\n\n1: backtrace_t::backtrace_t() at ??:?\n2: format_backtrace(bool) at ??:?\n3: rep
ort_fatal_error(char const*, int, char const*, ...) at ??:?\n4: ql::changefeed::range_sub_t::apply_ops(ql::datum_t) at ??:?\n5: ql::changefeed::msg_visitor_t::
operator()(ql::changefeed::msg_t::change_t const&) const::{lambda(ql::changefeed::range_sub_t*)#1}::operator()(ql::changefeed::range_sub_t*) const at ??:?\n6:
void ql::changefeed::feed_t::each_sub_in_vec_cb<ql::changefeed::range_sub_t>(std::function<void (ql::changefeed::range_sub_t*)> const&, std::vector<std::set<ql
::changefeed::range_sub_t*, std::less<ql::changefeed::range_sub_t*>, std::allocator<ql::changefeed::range_sub_t*> >, std::allocator<std::set<ql::changefeed::ra
nge_sub_t*, std::less<ql::changefeed::range_sub_t*>, std::allocator<ql::changefeed::range_sub_t*> > > > const&, std::vector<int, std::allocator<int> > const&,
int) at ??:?\n7: callable_action_instance_t<pmap_runner_one_arg_t<std::_Bind<std::_Mem_fn<void (ql::changefeed::feed_t::*)(std::function<void (ql::changefeed::
range_sub_t*)> const&, std::vector<std::set<ql::changefeed::range_sub_t*, std::less<ql::changefeed::range_sub_t*>, std::allocator<ql::changefeed::range_sub_t*>
 >, std::allocator<std::set<ql::changefeed::range_sub_t*, std::less<ql::changefeed::range_sub_t*>, std::allocator<ql::changefeed::range_sub_t*> > > > const&, s
td::vector<int, std::allocator<int> > const&, int)> (ql::changefeed::feed_t*, std::reference_wrapper<std::function<void (ql::changefeed::range_sub_t*)> const>,
 std::reference_wrapper<std::vector<std::set<ql::changefeed::range_sub_t*, std::less<ql::changefeed::range_sub_t*>, std::allocator<ql::changefeed::range_sub_t*
> >, std::allocator<std::set<ql::changefeed::range_sub_t*, std::less<ql::changefeed::range_sub_t*>, std::allocator<ql::changefeed::range_sub_t*> > > > const>,
std::reference_wrapper<std::vector<int, std::allocator<int> > const>, std::_Placeholder<1>)>, long> >::run_action() at ??:?\n8: coro_t::run() at ??:?
2015-08-13T13:55:34.239486888 34427.530511s error: Exiting.

Here:

salvus@db-stable:/var/lib/rethinkdb/default/data$ rethinkdb -v
rethinkdb 2.0.4~0vivid (GCC 4.9.2)

Trying to restart again resulted in

2015-08-13T14:10:05.151074210 0.259386s notice: Listening on addresses: 127.0.0.1, 10.240.186.14
2015-08-13T14:10:05.151091344 0.259402s notice: Server ready, "dbstable" afe6a4c9-92d2-4add-9bf0-1648507b3efc
2015-08-13T14:10:13.970131861 9.078444s error: accept() failed: Too many open files.
2015-08-13T14:11:58.017734655 113.126046s notice: Server got SIGINT from pid 3901, uid 0; shutting down...
2015-08-13T14:11:58.017819601 113.126130s notice: Shutting down client connections...
2015-08-13T14:11:58.017945535 113.126257s error: Error in src/errors.cc at line 125:
2015-08-13T14:11:58.017971071 113.126282s error: Uncaught exception of type "interrupted_exc_t"\n  what(): interrupted
2015-08-13T14:11:58.017986454 113.126297s error: Backtrace:
2015-08-13T14:11:58.020727180 113.129038s error: Thu Aug 13 14:11:58 2015\n\n1: backtrace_t::backtrace_t() at 0xa381f0 (/usr/bin/rethinkdb)\n2: format_backtrac
e(bool) at 0xa385bc (/usr/bin/rethinkdb)\n3: report_fatal_error(char const*, int, char const*, ...) at 0xc217f3 (/usr/bin/rethinkdb)\n4: terminate_handler() at
 0xc21a7e (/usr/bin/rethinkdb)\n5: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5eee6) [0x7fef60afdee6] at 0x7fef60afdee6 (/usr/lib/x86_64-linux-gnu/libstdc++.s
o.6)\n6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5ef31) [0x7fef60afdf31] at 0x7fef60afdf31 (/usr/lib/x86_64-linux-gnu/libstdc++.so.6)\n7: /usr/lib/x86_64-l
inux-gnu/libstdc++.so.6(+0x5f149) [0x7fef60afe149] at 0x7fef60afe149 (/usr/lib/x86_64-linux-gnu/libstdc++.so.6)\n8: wait_interruptible(signal_t const*, signal_
t const*) at 0xa43620 (/usr/bin/rethinkdb)\n9: nap(long, signal_t const*) at 0x942999 (/usr/bin/rethinkdb)\n10: linux_nonthrowing_tcp_listener_t::accept_loop(a
uto_drainer_t::lock_t) at 0x937ec4 (/usr/bin/rethinkdb)\n11: callable_action_instance_t<std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(auto
_drainer_t::lock_t)> (linux_nonthrowing_tcp_listener_t*, auto_drainer_t::lock_t)> >::run_action() at 0x93c3b0 (/usr/bin/rethinkdb)\n12: coro_t::run() at 0x94d2
18 (/usr/bin/rethinkdb)
2015-08-13T14:11:58.020757903 113.129069s error: Exiting.

Then I tried again (no reboot), and it's now been running fine for 20 minutes.

Rethinkdb itself is not using a lot of CPU (this server is dedicated to running rethinkdb):

salvus@db-stable:/var/lib/rethinkdb/default/data$ uptime
 14:32:15 up 10:16,  3 users,  load average: 0.01, 0.27, 0.31

Any thoughts? Note that I haven't tried the new 2.1 binary you sent me as I just woke up. Also just "trying it" isn't straightforward since right now my only way to test is to use it in production...

If the fix you used there is the same as what you would use to fix 2.0.4 (since the first error above is basically the same), would it be hard to produce a fixed 2.0.4 binary that I could just swap in, try, and report back on after 24 hours of running it live?

williamstein commented Aug 13, 2015

Still working fine an hour later. Due to the new semester, we suddenly have really heavy load, and the web servers are all at <5% and the database at less than 50% of a core (variable, of course). The only change is RethinkDB 2.0.4 versus 2.1. With 2.1 it was "everything at 100% of possible". This makes me extremely happy, since I was worried about whether RethinkDB could fundamentally handle how I'm using it (with lots of changefeeds for each user, etc.). Clearly it works well, which is exciting (except for some crashes that I'm sure you will help me sort out once I have more data).

I will try the bugfixed 2.1 with a copy of the data on another machine, and write a stress test in node.js using queries similar to my live ones. I can then compare 2.1+bugfix to 2.0.4 and see what happens.

Member

mlucy commented Aug 14, 2015

@williamstein -- we can definitely put together a version of 2.0 with the same patch. We can probably get it to you tomorrow.

It's surprising that 2.0 and 2.1 are behaving so differently on your workload. I wonder if it's an unrelated regression.

Sorry again that your first experience putting us into production was so rocky. Hopefully we can get these things sorted out pretty soon.

williamstein commented Aug 14, 2015

Sorry again that your first experience putting us into production was so rocky.
Hopefully we can get these things sorted out pretty soon.

Thanks. RethinkDB 2.0.4 is working extremely well regarding efficiency and solving my problems. I'm running it live in production right now at https://cloud.sagemath.com, and I'm very happy with how it is working. Once I get load simulated, I'll push hard to get 2.1 (or whatever) tested and we'll get to the bottom of things. You guys are great!

v3ss0n commented Aug 14, 2015

From reading this, I am going to stay with 2.0.4 for now.

Member

mlucy commented Aug 15, 2015

@williamstein -- I just sent you an email with a patched version of 2.0.

Member

mlucy commented Aug 15, 2015

The fixes for the active guarantee are in 2.0.x and 2.1.x. They should go out in the next point release.

Member

mlucy commented Aug 15, 2015

William mentioned to me in an email that he's using RethinkDB Dash which opens a lot of connections for his workload. We should double-check that there isn't a regression in 2.1 when lots of connections are open at once.

williamstein commented Aug 15, 2015

On Saturday, August 15, 2015, Michael Lucy notifications@github.com wrote:

William mentioned to me in an email that he's using RethinkDB Dash which
opens a lot of connections for his workload. We should double-check that
there isn't a regression in 2.1 when lots of connections are open at once.

Here, "a lot" is on the order of thousands. E.g., 400 simultaneous users with maybe 7 changefeeds each would mean 2800 simultaneous open connections. I will of course be looking into not using rethinkdbdash to see what difference that makes.

Thanks - william


Sent from my massive iPhone 6 plus.

williamstein commented Aug 16, 2015

Problem solved.

TL;DR: The load problem wasn't the fault of RethinkDB at all, and there's no performance regression. (So @v3ss0n, don't let my experience dissuade you...) However, there is a recent change in semantics (probably a bug fix) that leads to a major problem in my application (which I've fixed).

The number of connections mentioned above was another problem, but not the core issue. The root cause of my problem seems to be a recent change to the semantics of getAll and change feeds in RethinkDB 2.1, which must not have been in 2.1.beta. For example, I was doing this sort of query:

db.table('accounts').getAll('0c5fc6d4-5d36-422e-8799-513c5b19d2b3', '25e2cae4-05c7-4c28-ae22-1e6d3d2e8bb5', 'a76956a7-e59a-47e4-9a1c-d2cf896fd7cc').changes()...

  • In 2.0.x (and the beta) this would only produce output when there were new changes.
  • In 2.1 it acts more like a "point changefeed", and produces an output for each of the inputs.

My application uses changefeeds as "kill feeds" in some cases; when a change is seen it then kills and resets other changefeeds. (For example, your collaborators change when you get added to a new project, so another changefeed for the names of all your friends is cancelled and re-created. With the above change in semantics, my application would constantly cancel and recreate changefeeds, thus leading to infinite loops, which overloaded/crashed everything.)

Of course, you guys have already nicely implemented what I need to make things work with either semantics in a consistent way. With {includeStates: true}, I can just read from the iterator until state is 'ready', which works in both 2.0.x and 2.1. I implemented this small change and everything works fine with 2.1 now, as far as I can tell. In fact, I've been stress testing 2.1 very hard (e.g., making thousands of somewhat complicated changefeeds at once, making changes, etc.), and it's impressively efficient!
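
(For reference, a minimal sketch of that pattern, assuming the official JavaScript driver; conn, the id values, and resetDependentFeeds are placeholder names:)

r.table('accounts').getAll(id1, id2, id3)
  .changes({includeStates: true})
  .run(conn, function(err, cursor) {
    if (err) throw err;
    var ready = false;
    cursor.each(function(err, x) {
      if (err) return;
      if (x.state) { ready = (x.state === 'ready'); return; }  // state notifications
      if (!ready) return;        // drop anything emitted before the feed is 'ready'
      resetDependentFeeds(x);    // a real change: trigger the "kill feed" reaction
    });
  });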

Many, many thanks again for all the help and fixing the edge case crashes that I reported (which may have been unrelated to the issue above). In my opinion, you may close this ticket.

v3ss0n commented Aug 16, 2015

@williamstein and RethinkDB, thanks a lot! That's awesome, and it's nice to learn the way you're using changefeeds as kill feeds. I am a fan of SageMath from previous projects too. Nice to see it migrating to RethinkDB!

williamstein commented Aug 17, 2015

Bad news. I actually tried to switch to 2.1 in production, using the binary that you sent me by email, and with the "kill feed"-related issue above fixed. It was a disaster. Again, 99% CPU load on all machines. Also, I saw "SANITY CHECK FAILED" entries like this in my log:

2015-08-17T02:32:18.997Z - debug: FEED 64523879-a9e4-40a7-a143-829a543475fa -- saw a change! [{"message":"SANITY CHECK FAILED: [batch.s
ize() != 0 || (parent->coro_env->return_empty_normal_batches == return_empty_normal_batches_t::YES) || (bs.get_batch_type() == batch_ty
pe_t::NORMAL_FIRST)] at `src/rdb_protocol/datum_stream.cc:1286` (server is buggy).  Backtrace:\nMon Aug 17 02:32:18 2015\n\n1: backtrac
e_t::backtrace_t() at 0xa4aa10 (/usr/bin/rethinkdb)\n2: lazy_backtrace_formatter_t::lazy_backtrace_formatter_t() at 0xa4ad0e (/usr/bin/
rethinkdb)\n3: ql::runtime_sanity_check_failed(char const*, int, char const*, std::string const&) at 0x789716 (/usr/bin/rethinkdb)\n4:
ql::coro_stream_t::cb(auto_drainer_t::lock_t) at 0x7cdd37 (/usr/bin/rethinkdb)\n5: callable_action_instance_t<ql::coro_stream_t::maybe_
launch_read()::{lambda()#2}>::run_action() at 0x7ce100 (/usr/bin/rethinkdb)\n6: coro_t::run() at 0x954178 (/usr/bin/rethinkdb)\n in:\nr
.db(\"smc\").table(\"accounts\").getAll(\"45a74e89-3165-43b1-a25f-08f97466159e\", \"25e2cae4-05c7-4c28-ae22-1e6d3d2e8bb5\", \"3fcf140d-
b4c1-4eab-a80e-c3f8dd980a0d\", \"7676bc45-5623-4a5a-84f3-c042ef361edd\", \"79c4b5b1-1348-424d-804c-89200606ea16\", \"91dcf4bc-5b4a-4ade
-b1cb-26ae581f77d0\", \"f4cf9216-765b-4638-9811-5f834bd552fc\")\n^^^^

I'm not sure what to do at this point. RethinkDB 2.0.4 works very well; 2.1 seems seriously broken. Maybe I should wait for 2.2.

v3ss0n commented Aug 17, 2015

I believe the team will fix it in an upcoming point release. It probably won't take till 2.2. This is a high-priority bug.

Member

danielmewes commented Aug 17, 2015

Arr this is embarrassing. We have implemented an extension of changefeeds that allows them to return initial values before starting to stream changes. As part of the change, we also enabled that behavior by default for getAll, which is the changed behavior that you've observed in those queries.

However, this feature was not supposed to ship with 2.1, because it is largely untested (and, as it turns out, indeed buggy, assuming the errors actually stem from this).
We disabled it in #4355, but for some reason (I think this was probably my mistake) the feature ended up being enabled again in 2.1.0 and 2.1.1.

@williamstein You can explicitly disable the new (buggy) behavior in 2.1 by supplying the optarg include_initial_vals=False (Python) / includeInitialVals: false (JavaScript) to the changes term, or as a global optarg to run.
I understand you've already gone through a lot of trouble because of this, so I'll understand if you're not up for further experiments at the moment. If you have a second testing setup somewhere, it would be interesting to know, though, whether supplying that argument with 2.1 actually avoids the issue.

We will also revert this for the upcoming 2.1.2 point release, and getAll changefeeds will be back to their previous behavior after that.
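
(For example, the term-level form in JavaScript would look like this; conn and the id values are placeholders, and the Python equivalent is the include_initial_vals=False optarg:)

r.table('accounts')
  .getAll(id1, id2, id3)
  .changes({includeInitialVals: false})
  .run(conn, function(err, cursor) {
    // consume the cursor as usual; no initial values will be emitted
  });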

Member

danielmewes commented Aug 17, 2015

include_initial_vals is disabled again as of a0a210c in the branch v2.1.x. This will be included in RethinkDB 2.1.2.

williamstein commented Aug 17, 2015

Thanks. I will test soon with that option to disable initial vals in production with 2.1.x, now that I'm good at switching back and forth. (I settled on switching by setting up a separate 2.1.x machine, then rsyncing the live data from 2.0.4 repeatedly, then shutting down the 2.0.4 server, then doing a final rsync, then starting the 2.1.x server on that data, then finally rebuilding indexes manually. I end up with about 1 min of downtime by doing this, as opposed to using "rethinkdb dump/restore", which can take 15-20 minutes with my data.)

I can also happily report that RethinkDB 2.0.4 has stopped crashing for me in production (with the above backtrace) when I used the slightly updated binary you guys emailed me. THANKS.

williamstein commented Aug 18, 2015

I'm using JavaScript (with rethinkdbdash, but I think that shouldn't matter for building the query). When I use the includeInitialVals option, I get:

Uncaught exception: ReqlDriverError: Unrecognized option `includeInitialVals` in `changes` after:
...
Available options are squash <bool>

That option isn't documented here, so I don't know if there is a typo: http://rethinkdb.com/api/javascript/changes/
Also, the "Available options" error message isn't right, since it only includes squash, thus omitting the documented includeStates option.

I also tried passing the option to run and got this error:

{ [ReqlDriverError: Unrecognized option `includeInitialVals` in `run`. Available options are readMode <string>, durability <string>, noreply <bool>, timeFormat <string>, groupFormat, profile, binaryFormat, cursor, stream.]

I'm using the RethinkDB 2.1 dpkg that Michael emailed to me. And I'm sorry again if rethinkdbdash is somehow relevant to this.

I've figured out how to fully stress test a testing setup, so I can now easily test anything...

Member

danielmewes commented Aug 18, 2015

That indeed looks like a rethinkdbdash limitation, in that it appears to use a whitelist of flags. Since we didn't intend to ship this flag, it's not on the whitelist.

@neumino What's the easiest way to get an extra optarg through rethinkdbdash? Could it be passed to run maybe?

williamstein commented Aug 18, 2015

Thanks for checking. I can modify the source code of rethinkdbdash as well. (But I won't be able to work on this until later.) I've also been working on writing a monkey patch of the run method of the official JavaScript RethinkDB driver that gives it the same semantics as the rethinkdbdash run command, so that I can flip a switch and test with either rethinkdbdash or the official driver without having to change any of my other code.

Member

neumino commented Aug 19, 2015

williamstein commented Aug 20, 2015

I rewrote my code to optionally use the native driver instead of rethinkdbdash (and also just wrote my own threadpool system for better control/monitoring). I've run a lot of tests in the last day, and I guess the main issues that I reported can be worked around. I just started hitting this new traceback/crash a few seconds after starting one of the main nodes in my testing cluster:

2015-08-19T23:56:52.203766831 26.607850s error: Error in src/clustering/query_routing/table_query_client.cc at line 89:
2015-08-19T23:56:52.203808424 26.607891s error: Guarantee failed: [!r.route_to_primary()]
2015-08-19T23:56:52.203816429 26.607899s error: Backtrace:
2015-08-19T23:56:52.318109819 26.722195s error: Wed Aug 19 23:56:52 2015

1: backtrace_t::backtrace_t() at ??:?
2: format_backtrace(bool) at ??:?
3: report_fatal_error(char const*, int, char const*, ...) at ??:?
4: table_query_client_t::read(read_t const&, read_response_t*, order_token_t, signal_t*) at ??:?
5: real_table_t::read_with_profile(ql::env_t*, read_t const&, read_response_t*) at ??:?
6: ql::rget_response_reader_t::do_read(ql::env_t*, read_t const&) at ??:?
7: ql::rget_reader_t::do_range_read(ql::env_t*, read_t const&) at ??:?
8: ql::rget_reader_t::load_items(ql::env_t*, ql::batchspec_t const&) at ??:?
9: ql::rget_response_reader_t::next_batch(ql::env_t*, ql::batchspec_t const&) at ??:?
10: ql::lazy_datum_stream_t::next_batch_impl(ql::env_t*, ql::batchspec_t const&) at ??:?
11: ql::datum_stream_t::next_batch(ql::env_t*, ql::batchspec_t const&) at ??:?
12: ql::changefeed::splice_stream_t::next_stream_batch(ql::env_t*, ql::batchspec_t const&) at ??:?
13: ql::changefeed::stream_t<ql::changefeed::range_sub_t>::next_raw_batch(ql::env_t*, ql::batchspec_t const&) at ??:?
14: ql::eager_datum_stream_t::next_grouped_batch(ql::env_t*, ql::batchspec_t const&, std::map<ql::datum_t, std::vector<ql::datum_t, std::allocator<ql::datum_t> >, optional_datum_less_t, std::allocator<std::pair<ql::datum_t const, std::vector<ql::datum_t, std::allocator<ql::datum_t> > > > >*) at ??:?
15: ql::eager_datum_stream_t::next_batch_impl(ql::env_t*, ql::batchspec_t const&) at ??:?
16: ql::datum_stream_t::next_batch(ql::env_t*, ql::batchspec_t const&) at ??:?
17: ql::coro_stream_t::cb(auto_drainer_t::lock_t) at ??:?
18: callable_action_instance_t<ql::coro_stream_t::maybe_launch_read()::{lambda()#2}>::run_action() at ??:?
19: coro_t::run() at ??:?
2015-08-19T23:56:52.318167726 26.722251s error: Exiting.

I'm just testing making one changefeed using a getAll query with an index (over a few hundred keys).
I'm using this binary: rethinkdb_2.1.1+1+g989a97-0vivid_amd64.deb, which was emailed to me. The above crash isn't at all random. It happens every time I try to do this query (I realize this may mean little without more info about the schema):

table('file_use').getAll(ids..., index:'project_id').pluck('project_id', 'path', 'users', 'last_edited').changes(includeStates: false).

My plan at this point is to rewrite my application to use only the simplest features of changefeeds, namely listening for all changes to a table (which works without crashing), and basically re-implement the same idea as changefeeds (but in node.js) in a way that is rich enough to support my application. Even when things don't crash, the performance of doing certain more complicated queries like above is such that I really have to just move more to be in memory in my web server, unfortunately. (Obviously, it's possible I'm just naive, though I've spent a lot of time now trying a lot of different approaches and benchmarks systematically...)
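
(For illustration, a minimal sketch of that fallback, assuming the official JavaScript driver; conn, interestingProjects, and handleFileUseChange are placeholders:)

r.table('file_use').changes().run(conn, function(err, cursor) {
  if (err) throw err;
  cursor.each(function(err, change) {
    if (err) return;
    var row = change.new_val;
    // Filter in Node instead of in the query, so only a single
    // whole-table changefeed stays open on the server.
    if (row && interestingProjects.has(row.project_id)) {
      handleFileUseChange(row);
    }
  });
});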

Member

danielmewes commented Aug 20, 2015

@williamstein Are you setting readMode: 'outdated' anywhere? (obviously this still shouldn't crash).

Member

Tryneus commented Aug 20, 2015

@danielmewes - I can reproduce the crash by adding {readMode: 'outdated'} to the table term. Here is a backtrace including line numbers:

Version: rethinkdb 2.1.0-0BETA2-161-g6b2755 (CLANG 3.2 (tags/RELEASE_32/final))
error: Error in ../src/clustering/query_routing/table_query_client.cc at line 89:
error: Guarantee failed: [!r.route_to_primary()]
error: Backtrace:
error: Wed Aug 19 17:56:39 2015

       1: backtrace_t::backtrace_t() at backtrace.cc:203
       2: lazy_backtrace_formatter_t::lazy_backtrace_formatter_t() at basic_string.h:269
       3: format_backtrace(bool) at backtrace.cc:197
       4: report_fatal_error(char const*, int, char const*, ...) at basic_string.h:287
       5: table_query_client_t::read(read_t const&, read_response_t*, order_token_t, signal_t*) at table_query_client.cc:89
       6: real_table_t::read_with_profile(ql::env_t*, read_t const&, read_response_t*) at real_table.cc:332
       7: ql::rget_response_reader_t::do_read(ql::env_t*, read_t const&) at datum_stream.cc:165
       8: ql::rget_reader_t::do_range_read(ql::env_t*, read_t const&) at datum_stream.cc:202
       9: ql::rget_reader_t::load_items(ql::env_t*, ql::batchspec_t const&) at datum_stream.cc:256
       10: ql::rget_response_reader_t::next_batch(ql::env_t*, ql::batchspec_t const&) at datum_stream.cc:69
       11: ql::lazy_datum_stream_t::next_batch_impl(ql::env_t*, ql::batchspec_t const&) at datum_stream.cc:979
       12: ql::datum_stream_t::next_batch(ql::env_t*, ql::batchspec_t const&) at datum_stream.cc:850
       13: ql::changefeed::splice_stream_t::next_stream_batch(ql::env_t*, ql::batchspec_t const&) at changefeed.cc:2435
       14: ql::changefeed::stream_t<ql::changefeed::range_sub_t>::next_raw_batch(ql::env_t*, ql::batchspec_t const&) at changefeed.cc:1180
       15: ql::eager_datum_stream_t::next_grouped_batch(ql::env_t*, ql::batchspec_t const&, std::map<ql::datum_t, std::vector<ql::datum_t, std::allocator<ql::datum_t> >, optional_datum_less_t, std::allocator<std::pair<ql::datum_t const, std::vector<ql::datum_t, std::allocator<ql::datum_t> > > > >*) at stl_vector.h:571
       16: ql::eager_datum_stream_t::next_batch_impl(ql::env_t*, ql::batchspec_t const&) at datum_stream.cc:932
       17: ql::datum_stream_t::next_batch(ql::env_t*, ql::batchspec_t const&) at datum_stream.cc:850
       18: ql::coro_stream_t::cb(auto_drainer_t::lock_t) at stl_vector.h:571
       19: callable_action_instance_t<ql::coro_stream_t::maybe_launch_read()::{lambda()#2}>::run_action() at datum_stream.cc:1251
       20: coro_t::run() at coroutines.cc:214

Member

danielmewes commented Aug 20, 2015

Ok I can reproduce that crash by running these commands:

r.tableCreate('t1')
r.table('t1', {readMode: "outdated"}).getAll("").changes()

This seems to be another side-effect of our accidentally included change to getAll(). This works fine for me:

r.tableCreate('t1')
r.table('t1', {readMode: "outdated"}).getAll("").changes({includeInitialVals: false})

williamstein commented Aug 20, 2015

Yes, I was testing using readMode:'outdated'. I just tried turning that off and the crash goes away. I turned it back on and got the crash. I've also added includeInitialVals: false to my run wrapper function so I won't forget it again.
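
(A minimal sketch of one way to bake that in; openChangefeed, query, conn, and cb are hypothetical names, and this is not necessarily how the actual wrapper works:)

// Every changefeed in the app goes through this helper, so the
// term-level optarg can't be forgotten on an individual query.
function openChangefeed(query, conn, cb) {
  return query.changes({includeInitialVals: false}).run(conn, cb);
}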

williamstein commented Aug 20, 2015

I'm happy to report that I'm now finally successfully running a 5-node n1-standard-1 (with 50GB SSD/each) RethinkDB 2.1.1 cluster in production backing https://cloud.sagemath.com. Thanks for all the help!

I did get this backtrack/crash when sharding/replicating from 1 to 5 nodes as it was finishing the last table:

2015-08-20T05:03:49.608418117 873.940379s error: Error in src/arch/runtime/thread_pool.cc at line 354:
2015-08-20T05:03:49.608481873 873.940442s error: Segmentation fault from reading the address (nil).
2015-08-20T05:03:49.608493462 873.940453s error: Backtrace:
2015-08-20T05:03:50.015825973 874.347788s error: Thu Aug 20 05:03:49 2015\n\n1: backtrace_t::backtrace_t() at ??:?\n2: format_backtrace(bool) at ??:?\n3: report_fatal_err
or(char const*, int, char const*, ...) at ??:?\n4: linux_thread_pool_t::fatal_signal_handler(int, siginfo_t*, void*) at ??:?\n5: /lib/x86_64-linux-gnu/libpthread.so.0(+0x
10d10) [0x7f9d90d16d10] at 0x7f9d90d16d10 (/lib/x86_64-linux-gnu/libpthread.so.0)\n6: std::_Rb_tree_increment(std::_Rb_tree_node_base*) at 0x7f9d914b8223 (/usr/lib/x86_64
-linux-gnu/libstdc++.so.6)\n7: rdb_update_single_sindex(store_t*, store_t::sindex_access_t const*, deletion_context_t const*, rdb_modification_report_t const*, unsigned l
ong*, auto_drainer_t::lock_t, cond_t*, std::vector<std::pair<ql::datum_t, boost::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned
 long> > > >*, std::vector<std::pair<ql::datum_t, boost::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*) at ??:?\n
8: callable_action_instance_t<std::_Bind<void (*(store_t*, store_t::sindex_access_t*, deletion_context_t const*, rdb_modification_report_t const*, unsigned long*, auto_dr
ainer_t::lock_t, cond_t*, std::vector<std::pair<ql::datum_t, boost::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*
, std::vector<std::pair<ql::datum_t, boost::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*))(store_t*, store_t::si
ndex_access_t const*, deletion_context_t const*, rdb_modification_report_t const*, unsigned long*, auto_drainer_t::lock_t, cond_t*, std::vector<std::pair<ql::datum_t, boo
st::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*, std::vector<std::pair<ql::datum_t, boost::optional<unsigned lo
ng> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*)> >::run_action() at ??:?\n9: coro_t::run() at ??:?

I just restarted rethinkdb immediately after this and it worked.

I am fully using nontrivial changefeed queries with rethinkdb still, thanks to the hints above.

I'm happy to report that I'm now finally successfully running a 5-node n1-standard-1 (with 50GB SSD/each) RethinkDB 2.1.1 cluster in production backing https://cloud.sagemath.com. Thanks for all the help!

I did get this backtrack/crash when sharding/replicating from 1 to 5 nodes as it was finishing the last table:

2015-08-20T05:03:49.608418117 873.940379s error: Error in src/arch/runtime/thread_pool.cc at line 354:
2015-08-20T05:03:49.608481873 873.940442s error: Segmentation fault from reading the address (nil).
2015-08-20T05:03:49.608493462 873.940453s error: Backtrace:
2015-08-20T05:03:50.015825973 874.347788s error: Thu Aug 20 05:03:49 2015\n\n1: backtrace_t::backtrace_t() at ??:?\n2: format_backtrace(bool) at ??:?\n3: report_fatal_err
or(char const*, int, char const*, ...) at ??:?\n4: linux_thread_pool_t::fatal_signal_handler(int, siginfo_t*, void*) at ??:?\n5: /lib/x86_64-linux-gnu/libpthread.so.0(+0x
10d10) [0x7f9d90d16d10] at 0x7f9d90d16d10 (/lib/x86_64-linux-gnu/libpthread.so.0)\n6: std::_Rb_tree_increment(std::_Rb_tree_node_base*) at 0x7f9d914b8223 (/usr/lib/x86_64
-linux-gnu/libstdc++.so.6)\n7: rdb_update_single_sindex(store_t*, store_t::sindex_access_t const*, deletion_context_t const*, rdb_modification_report_t const*, unsigned l
ong*, auto_drainer_t::lock_t, cond_t*, std::vector<std::pair<ql::datum_t, boost::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned
 long> > > >*, std::vector<std::pair<ql::datum_t, boost::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*) at ??:?\n
8: callable_action_instance_t<std::_Bind<void (*(store_t*, store_t::sindex_access_t*, deletion_context_t const*, rdb_modification_report_t const*, unsigned long*, auto_dr
ainer_t::lock_t, cond_t*, std::vector<std::pair<ql::datum_t, boost::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*
, std::vector<std::pair<ql::datum_t, boost::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*))(store_t*, store_t::si
ndex_access_t const*, deletion_context_t const*, rdb_modification_report_t const*, unsigned long*, auto_drainer_t::lock_t, cond_t*, std::vector<std::pair<ql::datum_t, boo
st::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*, std::vector<std::pair<ql::datum_t, boost::optional<unsigned lo
ng> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*)> >::run_action() at ??:?\n9: coro_t::run() at ??:?

I just restarted rethinkdb immediately after this and it worked.

I am fully using nontrivial changefeed queries with rethinkdb still, thanks to the hints above.

@danielmewes
Member

danielmewes commented Aug 20, 2015

@williamstein I'm really sorry for the huge amount of trouble you had to go through until you got it working. 2.1.0 ended up being much less stable than we had anticipated based on our internal testing. The good news is that your feedback helped us a lot in fixing various issues, so we'll end up with much better software once we ship the next point release.

@danielmewes
Member

danielmewes commented Aug 20, 2015

There's a race condition involving changefeeds that's consistent with the observed segmentation fault. I'm currently testing a fix for it. The race condition can only happen while a table is being reconfigured, so it should be rare in normal operation.

An open question is whether this is the same problem as the segfaults in #4675. One report there describes the crash as happening while deleting a database with tables in it, which would be consistent with this, but the other report doesn't mention any reconfiguration.

@danielmewes danielmewes removed this from the 2.1.x milestone Aug 21, 2015

@danielmewes danielmewes modified the milestones: 2.1.2, 2.1.x Aug 21, 2015

@danielmewes
Member

danielmewes commented Aug 25, 2015

Fixes for that last segmentation fault are in v2.1.x 6391bb8 and next 294f9e4.

@williamstein

williamstein commented Nov 15, 2015

I upgraded to RethinkDB 2.2, and I'm getting this constantly:

2015-11-15T15:52:55.896809798 5334.337322s error: Error in src/rdb_protocol/datum_stream.cc at line 855:
2015-11-15T15:52:55.896843712 5334.337355s error: Guarantee failed: [found_hash_pair]
2015-11-15T15:52:55.896851427 5334.337363s error: Backtrace:
2015-11-15T15:52:56.101595162 5334.542109s error: Sun Nov 15 15:52:55 2015
1 [0xa99990]: backtrace_t::backtrace_t() at ??:?
2 [0xa99d5c]: format_backtrace(bool) at ??:?
3 [0xd12ea3]: report_fatal_error(char const*, int, char const*, ...) at ??:?
4 [0x7ef4a9]: ql::primary_readgen_t::restrict_active_ranges(sorting_t, ql::active_ranges_t*) const at ??:?
5 [0x7f5ba6]: ql::rget_response_reader_t::unshard(sorting_t, rget_read_response_t&&) at ??:?
6 [0x7f8dd2]: ql::rget_reader_t::do_range_read(ql::env_t*, read_t const&) at ??:?
7 [0x7f932a]: ql::rget_reader_t::load_items(ql::env_t*, ql::batchspec_t const&) at ??:?
8 [0x7e9d2a]: ql::rget_response_reader_t::next_batch(ql::env_t*, ql::batchspec_t const&) at ??:?
9 [0x7e7345]: ql::lazy_datum_stream_t::next_batch_impl(ql::env_t*, ql::batchspec_t const&) at ??:?
10 [0x7e8907]: ql::datum_stream_t::next_batch(ql::env_t*, ql::batchspec_t const&) at ??:?
11 [0x889239]: ql::replace_term_t::eval_impl(ql::scope_env_t*, ql::args_t*, ql::eval_flags_t) const at ??:?
12 [0x7ce492]: ql::op_term_t::term_eval(ql::scope_env_t*, ql::eval_flags_t) const at ??:?
13 [0x8f07aa]: ql::runtime_term_t::eval_on_current_stack(ql::scope_env_t*, ql::eval_flags_t) const at ??:?
14 [0x8f0adc]: ql::runtime_term_t::eval(ql::scope_env_t*, ql::eval_flags_t) const at ??:?
15 [0x6f6cb9]: ql::query_cache_t::ref_t::run(ql::env_t*, ql::response_t*) at ??:?
16 [0x6f8676]: ql::query_cache_t::ref_t::fill_response(ql::response_t*) at ??:?
17 [0x7c8d5b]: rdb_query_server_t::run_query(ql::query_params_t*, ql::response_t*, signal_t*) at ??:?
18 [0xa963ba]: void save_exception<void query_server_t::connection_loop<json_protocol_t>(linux_tcp_conn_t*, unsigned long, ql::query_cache_t*, signal_t*)::{lambda()#1}::operator()() const::{lambda()#1}>(std::__exception_ptr::exception_ptr*, std::string*, cond_t*, void query_server_t::connection_loop<json_protocol_t>(linux_tcp_conn_t*, unsigned long, ql::query_cache_t*, signal_t*)::{lambda()#1}::operator()() const::{lambda()#1}&&) at ??:?
19 [0xa96a2d]: void query_server_t::connection_loop<json_protocol_t>(linux_tcp_conn_t*, unsigned long, ql::query_cache_t*, signal_t*)::{lambda()#1}::operator()() const at ??:?
20 [0x9a4f68]: coro_t::run() at ??:?
2015-11-15T15:52:56.101699892 5334.542211s error: Exiting.
2015-11-15T15:54:01.444564463 0.232422s notice: Running rethinkdb 2.2.0~0vivid (GCC 4.9.2)...
2015-11-15T15:54:01.448363754 0.236220s notice: Running on Linux 3.19.0-31-generic x86_64
2015-11-15T15:54:01.448399911 0.236255s notice: Loading data from directory /var/lib/rethinkdb/default/data
2015-11-15T15:54:01.492573965 0.280430s info: Cache size is set to 9000 MB