Improve "You forgot to include one of the files/devices" error #1535

Closed
coffeemug opened this issue Oct 11, 2013 · 16 comments

@coffeemug
Contributor

A few times, people have hit the error message "You forgot to include one of the files/devices that you included when the database was first created." Unfortunately, it's really confusing because it gives no additional information.

We should change it to print the names of the missing files it's expecting.

@danielmewes
Member

This error does not seem to have anything to do with missing files anymore. At least I can delete any of the files in rethinkdb_data and I do not see this error message. Maybe it's caused by corrupted metadata?

This of course makes it even more confusing and only adds to the need to change the error message once we know what this error really means nowadays.

@ghost ghost assigned srh Oct 12, 2013
@coffeemug
Contributor Author

Cool, we'll talk to @srh about it when he gets back. If anyone knows what this means -- it's him.

@danielmewes
Member

@mlucy seems to be investigating this as part of #1534 (comment)

@mlucy
Member

mlucy commented Oct 15, 2013

I'm looking into this a little more, but my current working hypothesis is that you get this error message when you load a 32-bit rethinkdb_data directory with 64-bit rethinkdb.

@mlucy
Member

mlucy commented Oct 16, 2013

I can confirm that one cause of this issue is loading a 32-bit rethinkdb_data directory with 64-bit rethinkdb. We should fix this.

@coffeemug
Contributor Author

@mlucy -- would you mind opening a separate issue for that? (It seems different from fixing the "forgot to include files" error).

@mlucy
Member

mlucy commented Oct 18, 2013

@coffeemug -- I don't think you actually get this error if you forget to include a file. I think this is a general "fix this error message to say what's actually going wrong" issue.

@coffeemug
Contributor Author

I see, makes sense. So from what I understand there are three questions/issues here:

  • Fix the 32/64 bit issue
  • Once that's fixed, figure out in what circumstances the message is actually printed (there were some questions about that)
  • Improve the error message itself

@danielmewes
Member

As far as I understand it will never be printed unless the DB is corrupted (or we have other bugs like the 32bit one). Maybe it should just be made a guarantee?

@ghost ghost assigned Tryneus Oct 23, 2013
@coffeemug
Contributor Author

Assigning to @Tryneus.

@Tryneus
Member

Tryneus commented Nov 1, 2013

So, as near as I can tell, this is a combination of two problems. First of all, the multiplexer_config_block_t type is neither __attribute__((packed)) nor padded. Secondly, its creation_timestamp_t member is a typedef of time_t, which has a platform-dependent size.

The first is a simple enough fix. As for time_t, we could convert all existing time_t code to use a statically-sized type, or add padding in a wrapper struct. Any suggestions?
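
A minimal sketch of the two problems described above, assuming GCC or Clang. The struct and field names here (unstable_block_t, stable_block_t, magic, n_files) are hypothetical and are not RethinkDB's actual multiplexer_config_block_t layout. The point is only that a time_t member makes the layout depend on the build target, while a packed struct with a fixed-width timestamp can be pinned down with static_assert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ctime>

// Hypothetical stand-in for an on-disk config block (NOT RethinkDB's real layout).
// time_t is commonly 4 bytes on 32-bit ABIs and 8 bytes on 64-bit ones, so the
// size and member offsets of this struct depend on the build target.
struct unstable_block_t {
    std::uint32_t magic;
    std::time_t creation_timestamp;
    std::int32_t n_files;
};

// Same fields, but with a fixed-width timestamp and no compiler-inserted padding.
struct stable_block_t {
    std::uint32_t magic;
    std::int64_t creation_timestamp;
    std::int32_t n_files;
} __attribute__((packed));

// Pin the layout so an accidental change fails at compile time on every platform.
static_assert(sizeof(stable_block_t) == 16, "on-disk size changed");
static_assert(offsetof(stable_block_t, creation_timestamp) == 4, "timestamp offset changed");
static_assert(offsetof(stable_block_t, n_files) == 12, "n_files offset changed");

// Convert at the boundary between in-memory time_t and the on-disk field.
inline std::int64_t to_disk_timestamp(std::time_t t) {
    return static_cast<std::int64_t>(t);
}

inline std::time_t from_disk_timestamp(std::int64_t v) {
    // On a platform with a narrower time_t, the stored value may not fit.
    assert(static_cast<std::int64_t>(static_cast<std::time_t>(v)) == v);
    return static_cast<std::time_t>(v);
}
```

With this shape, sizeof(unstable_block_t) may differ between a 32-bit and a 64-bit build, while sizeof(stable_block_t) is 16 on both.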

@mlucy
Member

mlucy commented Nov 1, 2013

Using a platform-independent size for time_t sounds best to me.

@Tryneus
Member

Tryneus commented Nov 1, 2013

Ok, after fixing those, ran across another crash, so there's more to do.

error: Error in ../src/buffer_cache/semantic_checking.tcc at line 146:
error: Assertion failed: [block_id == get_block_id()] 
error: Backtrace:
error: Sat Oct 19 03:36:07 2013

       1: rethinkdb_backtrace(void**, int) at thread_stack_pcs.cc:151
       2: lazy_backtrace_t::lazy_backtrace_t() at backtrace.cc:250
       3: format_backtrace(bool) at backtrace.cc:197
       4: report_fatal_error(char const*, int, char const*, ...) at errors.cc:68
       5: scc_buf_lock_t<mc_cache_t>::scc_buf_lock_t(scc_transaction_t<mc_cache_t>*, unsigned long long, access_t, buffer_cache_order_mode_t, lock_in_line_callback_t*) at semantic_checking.tcc:146
       6: btree_store_t<rdb_protocol_t>::acquire_sindex_block_for_read(read_token_pair_t*, scc_transaction_t<mc_cache_t>*, scoped_ptr_t<scc_buf_lock_t<mc_cache_t> >*, unsigned long long, signal_t*) at btree_store.cc:324
       7: btree_store_t<rdb_protocol_t>::btree_store_t(serializer_t*, std::string const&, long long, bool, perfmon_collection_t*, rdb_protocol_t::context_t*, io_backender_t*, base_path_t const&) at btree_store.cc:78
       8: rdb_protocol_t::store_t::store_t(serializer_t*, std::string const&, long long, bool, perfmon_collection_t*, rdb_protocol_t::context_t*, io_backender_t*, base_path_t const&) at protocol.cc:1096
       9: void do_construct_existing_store<rdb_protocol_t>(std::vector<threadnum_t, std::allocator<threadnum_t> > const&, int, store_args_t<rdb_protocol_t>, serializer_multiplexer_t*, scoped_array_t<scoped_ptr_t<rdb_protocol_t::store_t> >*, store_view_t<rdb_protocol_t>**) at file_based_svs_by_namespace.cc:54
       10: void boost::_bi::list6<boost::_bi::value<std::vector<threadnum_t, std::allocator<threadnum_t> > >, boost::arg<1>, boost::_bi::value<store_args_t<rdb_protocol_t> >, boost::_bi::value<serializer_multiplexer_t*>, boost::_bi::value<scoped_array_t<scoped_ptr_t<rdb_protocol_t::store_t> >*>, boost::_bi::value<store_view_t<rdb_protocol_t>**> >::operator()<void (*)(std::vector<threadnum_t, std::allocator<threadnum_t> > const&, int, store_args_t<rdb_protocol_t>, serializer_multiplexer_t*, scoped_array_t<scoped_ptr_t<rdb_protocol_t::store_t> >*, store_view_t<rdb_protocol_t>**), boost::_bi::list1<int const&> >(boost::_bi::type<void>, void (* const&)(std::vector<threadnum_t, std::allocator<threadnum_t> > const&, int, store_args_t<rdb_protocol_t>, serializer_multiplexer_t*, scoped_array_t<scoped_ptr_t<rdb_protocol_t::store_t> >*, store_view_t<rdb_protocol_t>**), boost::_bi::list1<int const&>&, int) const at bind.hpp:594
       11: void boost::_bi::bind_t<void, void (*)(std::vector<threadnum_t, std::allocator<threadnum_t> > const&, int, store_args_t<rdb_protocol_t>, serializer_multiplexer_t*, scoped_array_t<scoped_ptr_t<rdb_protocol_t::store_t> >*, store_view_t<rdb_protocol_t>**), boost::_bi::list6<boost::_bi::value<std::vector<threadnum_t, std::allocator<threadnum_t> > >, boost::arg<1>, boost::_bi::value<store_args_t<rdb_protocol_t> >, boost::_bi::value<serializer_multiplexer_t*>, boost::_bi::value<scoped_array_t<scoped_ptr_t<rdb_protocol_t::store_t> >*>, boost::_bi::value<store_view_t<rdb_protocol_t>**> > >::operator()<int>(int const&) const at bind_template.hpp:54
       12: void pmap<boost::_bi::bind_t<void, void (*)(std::vector<threadnum_t, std::allocator<threadnum_t> > const&, int, store_args_t<rdb_protocol_t>, serializer_multiplexer_t*, scoped_array_t<scoped_ptr_t<rdb_protocol_t::store_t> >*, store_view_t<rdb_protocol_t>**), boost::_bi::list6<boost::_bi::value<std::vector<threadnum_t, std::allocator<threadnum_t> > >, boost::arg<1>, boost::_bi::value<store_args_t<rdb_protocol_t> >, boost::_bi::value<serializer_multiplexer_t*>, boost::_bi::value<scoped_array_t<scoped_ptr_t<rdb_protocol_t::store_t> >*>, boost::_bi::value<store_view_t<rdb_protocol_t>**> > > >(int, boost::_bi::bind_t<void, void (*)(std::vector<threadnum_t, std::allocator<threadnum_t> > const&, int, store_args_t<rdb_protocol_t>, serializer_multiplexer_t*, scoped_array_t<scoped_ptr_t<rdb_protocol_t::store_t> >*, store_view_t<rdb_protocol_t>**), boost::_bi::list6<boost::_bi::value<std::vector<threadnum_t, std::allocator<threadnum_t> > >, boost::arg<1>, boost::_bi::value<store_args_t<rdb_protocol_t> >, boost::_bi::value<serializer_multiplexer_t*>, boost::_bi::value<scoped_array_t<scoped_ptr_t<rdb_protocol_t::store_t> >*>, boost::_bi::value<store_view_t<rdb_protocol_t>**> > > const&) at pmap.hpp:48
       13: file_based_svs_by_namespace_t<rdb_protocol_t>::get_svs(perfmon_collection_t*, uuid_u, long long, stores_lifetimer_t<rdb_protocol_t>*, scoped_ptr_t<multistore_ptr_t<rdb_protocol_t> >*, rdb_protocol_t::context_t*) at file_based_svs_by_namespace.cc:142
       14: watchable_and_reactor_t<rdb_protocol_t>::initialize_reactor(io_backender_t*) at reactor_driver.tcc:312
       15: boost::_mfi::mf1<void, watchable_and_reactor_t<rdb_protocol_t>, io_backender_t*>::operator()(watchable_and_reactor_t<rdb_protocol_t>*, io_backender_t*) const at mem_fn_template.hpp:163
       16: void boost::_bi::list2<boost::_bi::value<watchable_and_reactor_t<rdb_protocol_t>*>, boost::_bi::value<io_backender_t*> >::operator()<boost::_mfi::mf1<void, watchable_and_reactor_t<rdb_protocol_t>, io_backender_t*>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf1<void, watchable_and_reactor_t<rdb_protocol_t>, io_backender_t*>&, boost::_bi::list0&, int) at bind.hpp:307
       17: boost::_bi::bind_t<void, boost::_mfi::mf1<void, watchable_and_reactor_t<rdb_protocol_t>, io_backender_t*>, boost::_bi::list2<boost::_bi::value<watchable_and_reactor_t<rdb_protocol_t>*>, boost::_bi::value<io_backender_t*> > >::operator()() at bind_template.hpp:21
       18: callable_action_instance_t<boost::_bi::bind_t<void, boost::_mfi::mf1<void, watchable_and_reactor_t<rdb_protocol_t>, io_backender_t*>, boost::_bi::list2<boost::_bi::value<watchable_and_reactor_t<rdb_protocol_t>*>, boost::_bi::value<io_backender_t*> > > >::run_action() at callable_action.hpp:28
       19: callable_action_wrapper_t::run() at runtime_utils.cc:67
       20: coro_t::run() at coroutines.cc:178

@Tryneus
Member

Tryneus commented Nov 4, 2013

Ok, had to add __attribute__((packed)) to two more structures: btree_superblock_t and metablock_manager_t<metablock_t>::crc_metablock_t, and I can get the database files running between 32-bit and 64-bit architectures. Going to do a little more testing, but we still can't guarantee this works unless we are sure that all data serialization is also safe, which is a little tougher to do.
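
For the remaining concern about serialization, one common alternative to relying on struct layout is to write each fixed-width field byte by byte in a fixed order. The sketch below is only an illustration of that idea, not the fix made in this thread (which added packed attributes); the function names write_u64_le and read_u64_le are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append a 64-bit value as 8 bytes, least significant byte first.
// The output does not depend on the host's word size, padding, or endianness.
inline void write_u64_le(std::vector<unsigned char>* out, std::uint64_t v) {
    assert(out != nullptr);
    for (int i = 0; i < 8; ++i) {
        out->push_back(static_cast<unsigned char>((v >> (8 * i)) & 0xff));
    }
}

// Read back 8 bytes written by write_u64_le.
inline std::uint64_t read_u64_le(const unsigned char* p) {
    assert(p != nullptr);
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) {
        v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    }
    return v;
}
```

A value written this way on a 32-bit build reads back identically on a 64-bit build, at the cost of an explicit encode and decode step for every field.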

@Tryneus
Member

Tryneus commented Nov 5, 2013

Alright, fixes are up in code review 1007. I have not run across any other crashes during testing. The only thing I am slightly concerned about is that reql_time object precision is slightly different when I export the same database on 32-bit and 64-bit machines. For example:

32-bit export:

"datetime": {"timezone": "+00:00", "$reql_type$": "TIME", "epoch_time": 1383607457.7049999}

64-bit export:

"datetime": {"timezone": "+00:00", "$reql_type$": "TIME", "epoch_time": 1383607457.705}

As you can see, the 64-bit data is 'more' correct, since we are supposed to truncate to milliseconds. I have not tracked down the source of this discrepancy, but it appears to happen only on 32-bit, so at the moment I believe it is just a problem with our reql_time implementation on the 32-bit architecture. In any case, this can be its own issue, as it is not a problem with our on-disk format's portability.

@Tryneus
Member

Tryneus commented Nov 6, 2013

The portability changes have been approved and merged to next in commits 4bdab21, c1097e3, f005bfa, and 9a9dcfc. This will be in release 1.11.

I also investigated the minor differences I found in exported time objects, and they appear to be a result of the Python implementation on the two systems; there was no difference in the data received by the client. Therefore, there will be no new issue for that.
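
The two exported values shown earlier are consistent with a single double printed at different precisions. The sketch below checks that, assuming IEEE 754 doubles and the default "C" locale. The helper names are hypothetical, and the thread itself only attributes the difference to the Python implementations without stating the mechanism, so treat the digit-count explanation as an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Format a double with the given number of significant digits (printf %g style).
inline std::string format_double(double v, int significant_digits) {
    assert(significant_digits > 0 && significant_digits <= 17);
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", significant_digits, v);
    return std::string(buf);
}

// True if two decimal strings parse to exactly the same double.
inline bool same_double(const char* a, const char* b) {
    return std::strtod(a, nullptr) == std::strtod(b, nullptr);
}
```

Here same_double("1383607457.7049999", "1383607457.705") is true, and formatting that value with 17 significant digits gives the longer form while 13 digits gives the shorter one, so both exports can carry the same underlying number.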

@Tryneus Tryneus closed this as completed Nov 6, 2013