crash on ubuntu 12.04 #2399
Comments
Here's a cleaned up backtrace, hope it's useful:
|
Thank you for the report @colprog. Sorry you are experiencing this problem. The error message is typical for problems related to snapshotting in our cache, and we have seen and fixed a couple of issues before that resulted in similar crashes. @colprog Can you tell us a bit more about your cluster configuration, data and workload?
I know it takes some time to answer all of these questions, and I really appreciate your help. I'm asking them because I would like to replicate your setup and workload as closely as possible in order to reproduce the bug.
We have 1 node with 22 tables on it, no sharding or replication. Running the following query crashes the server:
@colprog Thank you for the info. The db file would be helpful. If you (g)zip the data directory, it should compress fairly well. You can try sending it to me in an email to daniel@rethinkdb.com. If that doesn't work, we can provide you with an upload site.
You can try this link:
Thank you @colprog for the data. I downloaded it and can start a server with it. (Internal note: the data is on newton in /home/daniel/2399-data.tar.bz2)
Great! If you need anything else from me, please let me know.
I can reproduce the crash with the query you gave me, @colprog. Your information and data have been extremely helpful.
The table file is already corrupted, and it is difficult or even impossible to trace how it got to that state from looking at the crash and the file. My best bet right now is to a) read all the involved code again and b) try to replicate as much of the table configuration as possible and leave a server running over a long time with additional checks enabled, hoping that that will trigger the issue. I don't know if that is actually relevant, but had you run previous versions of RethinkDB (e.g. 1.12.3) on the same data directory @colprog? Or did you start from scratch after installing 1.12.4?
@danielmewes The server has been upgraded from previous releases since 1.11.0, but I started a new set of tables yesterday, and now it is crashing as well.
@danielmewes And it seems to be worse: a simple r.table("role") crashes the server.
On a newly created and healthy table, the following query sequence seems to trigger the bug and corrupt the table:
r.db('sheath_dev').table("role").insert({"bar":{},"cleared":{},"coins":120,"contribs":0,"createTime":{"$reql_type$":"TIME","epoch_time":1400221967.619,"timezone":"+00:00"},"dailyRefreshData":{"dailyReward":true,"qhourlyReward":1},"energy":5,"energyRefreshTime":{"$reql_type$":"TIME","epoch_time":1400221967.619,"timezone":"+00:00"},"exp":0,"formation":0,"formationLevel":[0,0,0,0,0,0],"golds":20,"id":"66583953-4553-4759-8918-18e3960452dc","lastLogOff":{"$reql_type$":"TIME","epoch_time":1400221968.917,"timezone":"+00:00"},"level":1,"levelCleared":{},"levelGain":{},"name":"新玩家","owner":"8f240ce6-8c3e-4cf2-a997-4736816fb8c0","partition":"f79279b1-3b29-4e08-83d9-bdec419355ce","souls":{},"spent":0,"storageRoom":25,"store":{},"taskClaimed":[],"taskData":{},"taskDone":[],"team":["a470caaf-983c-4516-a470-30c1fd2587d0","49b90dbd-e135-450b-bbc5-f311ec3925db",null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null],"title":"","tutorial":1,"vip":0})
r.db('sheath_dev').table("role").update({update: {"dailyRefreshData": r.literal({})}})
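For readers skimming the thread: r.literal is meant to make update() replace a sub-document wholesale instead of deep-merging into it. Here is a plain-Python sketch of the intended before/after for the dailyRefreshData field (field values simplified from the report; the "leaked" form is inferred from later comments in this thread about the LITERAL pseudotype ending up inside the document, not from inspecting the actual file):

```python
# Intended semantics of update({"dailyRefreshData": r.literal({})}):
# r.literal marks a value to *replace* the existing field outright,
# rather than being deep-merged into it.
doc = {"coins": 120,
       "dailyRefreshData": {"dailyReward": True, "qhourlyReward": 1}}

# Correct result: the sub-object is replaced wholesale with {}.
expected = dict(doc, dailyRefreshData={})

# What the corruption discussion suggests happened instead: the
# LITERAL pseudotype wrapper itself was stored in the document.
leaked = dict(doc, dailyRefreshData={"$reql_type$": "LITERAL", "value": {}})
```

Without r.literal, update({"dailyRefreshData": {}}) would deep-merge and leave the existing keys untouched, which is why the wrapper exists in the first place.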
@colprog Thank you for figuring out the exact query that causes this to happen. That is incredibly helpful!
Here is the simplest sequence of commands that I could find to trigger the corruption:
Replacing the update by
Valgrind reports a jump depending on uninitialized data. Could be related:
/cc @mlucy
Initializing |
This crashes the server as well, without having to read any data back from the table:
The error and backtrace don't look very helpful:
It doesn't seem necessary to serialize the updated value into the table to get bad behavior.
@mlucy: This is the datum that's actually generated by the
As you can see, the LITERAL pseudotype becomes part of the document. Any idea how the literal keyword can end up in the result of an
Oh, I see: it never recurses into "a" because the field "a" does not exist on the left side of the merge. Hmm... Besides the problem with
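The merge behavior described above can be sketched in plain Python. This is an illustrative model only, not RethinkDB's actual C++ implementation; the LITERAL tag and wrapper shape follow the wire format shown earlier in the thread. The key point is that a literal wrapper must be unwrapped even for fields that are absent on the left side of the merge:

```python
LITERAL = "LITERAL"

def is_literal(v):
    """True if v is a wrapped r.literal(...) pseudotype value."""
    return isinstance(v, dict) and v.get("$reql_type$") == LITERAL

def merge(left, right):
    """Deep-merge `right` into `left`, modeling r.literal semantics:
    a literal wrapper means "replace, don't merge" and is unwrapped;
    a literal with no value deletes the field."""
    if is_literal(right):
        return right.get("value")  # unwrap: replace the left value outright
    if not (isinstance(left, dict) and isinstance(right, dict)):
        return right
    out = dict(left)
    for key, rval in right.items():
        if is_literal(rval) and "value" not in rval:
            out.pop(key, None)              # r.literal() with no value: delete
        elif key in left:
            out[key] = merge(left[key], rval)
        else:
            # The failure mode discussed above: assigning rval verbatim here
            # would keep the LITERAL wrapper for fields absent on the left,
            # leaking the pseudotype into the document. Recursing unwraps it.
            out[key] = merge({}, rval)
    return out
```

For example, merge({}, {"a": {"$reql_type$": "LITERAL", "value": {}}}) yields {"a": {}} rather than storing the wrapper itself.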
A fix is implemented in branch daniel_2399, in code review 1509 by @mlucy.
How is the review going?
@colprog The fix for this issue has been released today as part of RethinkDB 1.12.5. |
Great, thanks! I'll upgrade right away.
I had been running RethinkDB 1.12.4 for two weeks without a problem, but today it just crashed.
2014-05-01T01:25:32.395479491 0.024969s info: Running rethinkdb 1.12.4-0ubuntu1~precise (GCC 4.6.3)...
2014-05-01T01:25:32.397210840 0.026700s info: Running on Linux 3.2.0-29-generic x86_64
2014-05-01T01:25:32.397258453 0.026748s info: Using cache size of 593 MB
2014-05-01T01:25:32.397418814 0.026909s info: Loading data from directory /data/rethinkdb/db0
2014-05-01T01:25:32.564420794 0.193910s info: Listening for intracluster connections on port 29015
2014-05-01T01:25:32.588540351 0.218030s info: Listening for client driver connections on port 28015
2014-05-01T01:25:32.588758499 0.218248s info: Listening for administrative HTTP connections on port 8080
2014-05-01T01:25:32.588759897 0.218249s info: Listening on addresses: 127.0.0.1, 10.161.142.205
2014-05-01T01:25:32.588760444 0.218250s info: To fully expose RethinkDB on the network, bind to all addresses
2014-05-01T01:25:32.588761001 0.218250s info: by adding
`bind=all' to the config file (/etc/rethinkdb/instances.d/db0.conf).
2014-05-01T01:25:32.588761535 0.218251s info: Server ready
2014-05-15T11:12:29.635045878 1244817.264535s error: Error in src/rdb_protocol/lazy_json.cc at line 19:
2014-05-15T11:12:29.635112958 1244817.264602s error: Guarantee failed: [res == archive_result_t::SUCCESS] Deserialization of rdb value failed with error archive_result_t::RANGE_ERROR.
2014-05-15T11:12:29.635135037 1244817.264624s error: Backtrace:
2014-05-15T11:12:29.819389752 1244817.448880s error: Thu May 15 11:12:29 2014\n\n1: format_backtrace(bool) at 0xd1cb43 (/usr/bin/rethinkdb)\n2: report_fatal_error(char const*, int, char const*, ...) at 0xdd4794 (/usr/bin/rethinkdb)\n3: get_data(rdb_value_t const*, buf_parent_t) at 0xd101a8 (/usr/bin/rethinkdb)\n4: lazy_json_t::get() const at 0xd10304 (/usr/bin/rethinkdb)\n5: rget_cb_t::handle_pair(scoped_key_value_t&&, concurrent_traversal_fifo_enforcer_signal_t) at 0xc95f89 (/usr/bin/rethinkdb)\n6: concurrent_traversal_adapter_t::handle_pair_coro(scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t) at 0xb38ef7 (/usr/bin/rethinkdb)\n7: callable_action_instance_t<std::_Bind<std::_Mem_fn<void (concurrent_traversal_adapter_t::*)(scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t)> (concurrent_traversal_adapter_t*, scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t)> >::run_action() at 0xb38dd8 (/usr/bin/rethinkdb)\n8: coro_t::run() at 0xdc751e (/usr/bin/rethinkdb)
2014-05-15T11:12:29.819499407 1244817.448989s error: Exiting.
2014-05-15T11:18:02.048885620 0.028158s info: Running rethinkdb 1.12.4-0ubuntu1~precise (GCC 4.6.3)...
2014-05-15T11:18:02.050985612 0.030258s info: Running on Linux 3.2.0-29-generic x86_64
2014-05-15T11:18:02.051027521 0.030300s info: Using cache size of 602 MB
2014-05-15T11:18:02.051150303 0.030423s info: Loading data from directory /data/rethinkdb/db0
2014-05-15T11:18:02.237516228 0.216788s info: Listening for intracluster connections on port 29015
2014-05-15T11:18:02.269815418 0.249088s info: Listening for client driver connections on port 28015
2014-05-15T11:18:02.270194158 0.249466s info: Listening for administrative HTTP connections on port 8080
2014-05-15T11:18:02.270195561 0.249468s info: Listening on addresses: 127.0.0.1, 10.161.142.205
2014-05-15T11:18:02.270196172 0.249468s info: To fully expose RethinkDB on the network, bind to all addresses
2014-05-15T11:18:02.270196707 0.249469s info: by adding
`bind=all' to the config file (/etc/rethinkdb/instances.d/db0.conf).
2014-05-15T11:18:02.270200599 0.249473s info: Server ready
2014-05-15T11:18:33.728954227 31.708227s error: Error in src/rdb_protocol/lazy_json.cc at line 19:
2014-05-15T11:18:33.729024758 31.708297s error: Guarantee failed: [res == archive_result_t::SUCCESS] Deserialization of rdb value failed with error archive_result_t::RANGE_ERROR.
2014-05-15T11:18:33.729044585 31.708317s error: Backtrace:
2014-05-15T11:18:33.744542726 31.723815s error: Error in src/rdb_protocol/lazy_json.cc at line 19:
2014-05-15T11:18:33.744636516 31.723909s error: Guarantee failed: [res == archive_result_t::SUCCESS] Deserialization of rdb value failed with error archive_result_t::RANGE_ERROR.
2014-05-15T11:18:33.744659853 31.723932s error: Backtrace:
2014-05-15T11:18:33.892180415 31.871454s error: Thu May 15 11:18:33 2014\n\n1: format_backtrace(bool) at 0xd1cb43 (/usr/bin/rethinkdb)\n2: report_fatal_error(char const*, int, char const*, ...) at 0xdd4794 (/usr/bin/rethinkdb)\n3: get_data(rdb_value_t const*, buf_parent_t) at 0xd101a8 (/usr/bin/rethinkdb)\n4: lazy_json_t::get() const at 0xd10304 (/usr/bin/rethinkdb)\n5: rget_cb_t::handle_pair(scoped_key_value_t&&, concurrent_traversal_fifo_enforcer_signal_t) at 0xc95f89 (/usr/bin/rethinkdb)\n6: concurrent_traversal_adapter_t::handle_pair_coro(scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t) at 0xb38ef7 (/usr/bin/rethinkdb)\n7: callable_action_instance_t<std::_Bind<std::_Mem_fn<void (concurrent_traversal_adapter_t::*)(scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t)> (concurrent_traversal_adapter_t*, scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t)> >::run_action() at 0xb38dd8 (/usr/bin/rethinkdb)\n8: coro_t::run() at 0xdc751e (/usr/bin/rethinkdb)
2014-05-15T11:18:33.892286047 31.871558s error: Exiting.