Deserialization of rdb value failed with error archive_result_t::RANGE_ERROR. #2410
Hi @janih, thank you for the issue report. Do you think you could send us the RethinkDB data directory with the table files for further debugging on our end? If you had important data in the tables, we can also try to recover some of it, though it might not be completely possible. Another question: were you running queries using the r.literal() term?
The data directory size is ~375 MB after archiving. Can you provide an upload page? I haven't used r.literal(), at least not explicitly (I'm using the Java driver). I ran some automated tests against the database and captured the protobuf, but didn't notice any use of LITERAL keywords.
@mglukhovsky Can you arrange an upload option for @janih?
@janih, I'm happy to get that set up for you. Could you email me at mike@rethinkdb.com, so I can prepare a private upload page?
@danielmewes, the data directory is available on our internal server. Thanks @janih for helping us track this down!
@janih I've recovered the faulty table. Do you have a chance to run a memory tester on your machine, @janih (e.g. Memtest86+, http://www.memtest.org/)? We will run more tests on our side, because this is obviously a very serious issue. Still, I would like to make sure that it wasn't caused by faulty hardware in the first place.
Great work, thanks! I would like a copy of the data. I'll run Memtest86+ on the machine; it has 32 GB of RAM in four sticks, so maybe there is a faulty stick or something.
@danielmewes I ran Memtest86+ a couple of times; it didn't find any errors.
@janih Thank you for running Memtest. That's very helpful. So we have to assume that this was indeed caused by a bug in RethinkDB.
I'll send you an email with information on how to download the recovered data file.
@danielmewes I'm inserting documents one at a time to the
Here are a couple of other unusual things that I did before this data corruption happened:
I hope this is of some help :)
@janih Thank you for all the information. It's definitely very helpful!
@danielmewes The same error occurred with version 1.2.5:
I was doing the same kind of insert/update queries as I described earlier: #2410 (comment)
@janih -- did the second error occur on the same hardware as the first? (I'd like to know for scientific reasons; we're not blaming the bug on the hardware or anything, we're working hard to get to the bottom of it.)
@coffeemug Yes, same hardware. |
Thanks for the update @janih. I've had a test case running continuously for the past week, with the same RethinkDB binary you are using (and the same data), but the bug hasn't shown up yet. @janih: which operating system are you using (Ubuntu 14.04)? Thank you so much for sticking with us in debugging this issue! (Also: are you on our IRC channel by any chance? (#rethinkdb on Freenode). My name there is "danielmewes", just in case I have additional questions later...)
@danielmewes I did mostly the same things again, but I think I didn't import the complete table this time.
Here is the startup log:
I'm not on the IRC channel, but I'll see if I can join later.
@danielmewes I installed RethinkDB on another older machine, restored a dump from a few days back and when the index for
I just executed this:
@janih First of all, thanks for the additional info. The crash on the older machine is interesting. Did you restore the data dump using rethinkdb import? @mglukhovsky, can you set up another upload page/server for @janih, please?
@danielmewes I used
Thank you @janih for the table dump.
Hmm, I could get a crash once, but haven't been able to repeat it since. @janih: if I sent you a modified rethinkdb binary, could you install and use it instead, to see if it works better for you?
@danielmewes Yes, I can try the binary. I'm also on #rethinkdb at Freenode; my nick is janih if you have any additional questions.
Sorry, the one crash I got with
We should really fix #1945.
@danielmewes With the new binary, I got this after running my app for a while:
Thanks to @janih's unit test https://github.com/janih/rethinkdb-junit, I can now reliably reproduce the following two ways of crashing.
or with
These are reproducible on v1.12.x, but no longer on v1.13.0, and @janih also mentioned that he hadn't seen crashes on 1.13 anymore. We still have to find out what causes this. I'm currently bisecting between 1.12 and 1.13 to find the change that made this disappear.
After further testing, I have to take back the observation that this works on 1.13. I also got the crash on 1.13. |
There is a problem where a disk read can return data from before it has been written. It's still unclear how that happens in detail. (Edited to remove unreliable work-arounds.)
@danielmewes I also got the crash on 1.13.
It turns out that on my machine (kernel 3.13.0), ReiserFS, when mounted with data journalling enabled, misbehaves under direct I/O. I've written a small test program to verify whether the file system behaves correctly under direct I/O.
@janih Can you run that program to see if your server is affected by such a file system / I/O issue? If you don't have a C++ compiler installed, I can also send you a binary.
@danielmewes I was able to compile and run the program. I ran it on my server and two other machines, and it didn't output "Data mismatch" on any of them (all have ext4 file systems). I'll try the
I still had no luck reproducing this on any file system other than ReiserFS with data journalling. Also, a late thank you @janih for running the test program. The fact that it passed in your case indicates that there is a different problem from the one that exists on ReiserFS + journalling. How is running with
I've been using
There's some hope, finally: issue #2840 turned up a corruption issue that was easy to reproduce and for which there is now a fix. See #2840 (comment) onwards.
Good to hear, because I haven't been able to reproduce the issue anymore with 1.13.1.
Thanks everyone for your patience, and especially @janih for helping a lot with debugging!
I have one node and was inserting data. I don't think it was anything that I haven't done many times before. Then I got this:
Running on:
Lines in dmesg (after a few retries and errors):
I was going to try to dump the database and recreate it, but I get a deserialization error while running rethinkdb dump: