
Segfault during TestMaterializedViews.interrupt_build_process_with_resharding_max_to_half_test #4386

Closed
bhalevy opened this issue Apr 2, 2019 · 13 comments

@bhalevy
Contributor

commented Apr 2, 2019

scylla version 77a0d5c

Seen in dtest-release/75/artifact/logs-release.2/1554080746355_materialized_views_test.TestMaterializedViews.interrupt_build_process_with_resharding_max_to_half_test/node2.log:

WARN  2019-04-01 01:03:16,204 [shard 22] seastar - Exceptional future ignored: exceptions::mutation_write_timeout_exception (Operation timed out for ks.t_by_v2 - received only 0 responses from 1 CL=ONE.), backtrace:   0x4498efb
  0x402a1f4
  0x1f5c0d4
  0x1f5c503
  0x1e9fe74
  0x2f41c8b
  0x98ad41

Segmentation fault on shard 22.
Backtrace:
  0x000000000412d802
  0x000000000402b8f5
  0x000000000402bbf5
  0x000000000402bc43
  0x00007f665813c02f
  0x00000000005b4aa9
  0x0000000001ea44c3
  0x0000000001ebf8ed
  0x0000000004028d61
  0x0000000004028f5e
  0x00000000040fdb95
  0x00000000041249ab
  0x0000000003ff8b3d
  /jenkins/workspace/scylla-master/dtest-release@2/scylla-dtest/../scylla/dynamic_libs/libpthread.so.0+0x000000000000858d
  /jenkins/workspace/scylla-master/dtest-release@2/scylla-dtest/../scylla/dynamic_libs/libc.so.6+0x00000000000fd6a2
$ addr2line -Cfpi -e logs-release.2/scylla
  0x402a1f4
  0x1f5c0d4
  0x1f5c503
  0x1e9fe74
  0x2f41c8b
  0x98ad41

seastar::report_failed_future(std::__exception_ptr::exception_ptr) at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/build/release/../../src/core/reactor.cc:5506
service::abstract_write_response_handler::~abstract_write_response_handler() at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/future.hh:1345
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/future.hh:1338
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/future.hh:470
 (inlined by) service::abstract_write_response_handler::~abstract_write_response_handler() at /jenkins/workspace/scylla-master/dtest-release@2/scylla/service/storage_proxy.cc:255
service::view_update_write_response_handler::~view_update_write_response_handler() at /jenkins/workspace/scylla-master/dtest-release@2/scylla/service/storage_proxy.cc:459
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release@2/scylla/service/storage_proxy.cc:474
 (inlined by) service::view_update_write_response_handler::~view_update_write_response_handler() at /jenkins/workspace/scylla-master/dtest-release@2/scylla/service/storage_proxy.cc:474
service::storage_proxy::on_down(gms::inet_address const&) at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/shared_ptr.hh:535
 (inlined by) service::storage_proxy::on_down(gms::inet_address const&) at /jenkins/workspace/scylla-master/dtest-release@2/scylla/service/storage_proxy.cc:3618
operator() at /jenkins/workspace/scylla-master/dtest-release@2/scylla/service/storage_service.cc:3327
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/apply.hh:35
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/apply.hh:43
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/future.hh:1430
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/future.hh:1464
 (inlined by) operator() at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/thread.hh:324
 (inlined by) call at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/util/noncopyable_function.hh:71
seastar::thread_context::main() at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/build/release/../../include/seastar/util/noncopyable_function.hh:145
 (inlined by) seastar::thread_context::main() at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/build/release/../../src/core/thread.cc:317
?? ??:0
$ addr2line -Cfpi -e logs-release.2/scylla
  0x00000000005b4aa9
  0x0000000001ea44c3
  0x0000000001ebf8ed
  0x0000000004028d61
  0x0000000004028f5e
  0x00000000040fdb95
  0x00000000041249ab
  0x0000000003ff8b3d
seastar::continuation<seastar::future<>::then_wrapped_impl<seastar::future<>::handle_exception<service::storage_proxy::send_to_live_endpoints(unsigned long, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(std::__exception_ptr::exception_ptr)#3}>(service::storage_proxy::send_to_live_endpoints(unsigned long, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(std::__exception_ptr::exception_ptr)#3}&&)::{lambda(auto:1)#1}, seastar::future<> >(seastar::future<>::handle_exception<service::storage_proxy::send_to_live_endpoints(unsigned long, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(std::__exception_ptr::exception_ptr)#3}>(service::storage_proxy::send_to_live_endpoints(unsigned long, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(std::__exception_ptr::exception_ptr)#3}&&)::{lambda(auto:1)#1})::{lambda(auto:1)#1}>::run_and_dispose() [clone .cold.5624] at /usr/include/c++/8/bits/unique_ptr.h:273
operator() at /jenkins/workspace/scylla-master/dtest-release@2/scylla/service/storage_proxy.cc:1104
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/future.hh:1486
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/future.hh:1558
 (inlined by) parallel_for_each<__gnu_cxx::__normal_iterator<service::storage_proxy::unique_response_handler*, std::vector<service::storage_proxy::unique_response_handler> >, service::storage_proxy::mutate_begin(std::vector<service::storage_proxy::unique_response_handler>, db::consistency_level, std::optional<std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long int, std::ratio<1, 1000> > > >)::<lambda(service::storage_proxy::unique_response_handler&)> > at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/future-util.hh:129
seastar::continuation<seastar::future<std::vector<service::storage_proxy::unique_response_handler, std::allocator<service::storage_proxy::unique_response_handler> > >::then_impl<service::storage_proxy::send_to_endpoint(std::unique_ptr<service::mutation_holder, std::default_delete<service::mutation_holder> >, gms::inet_address, std::vector<gms::inet_address, std::allocator<gms::inet_address> >, db::write_type, service::storage_proxy_stats::write_stats&, seastar::bool_class<service::allow_hints_tag>)::{lambda(std::vector<service::storage_proxy::unique_response_handler, std::allocator<service::storage_proxy::unique_response_handler> >)#2}, seastar::future<> >(service::storage_proxy::send_to_endpoint(std::unique_ptr<service::mutation_holder, std::default_delete<service::mutation_holder> >, gms::inet_address, std::vector<gms::inet_address, std::allocator<gms::inet_address> >, db::write_type, service::storage_proxy_stats::write_stats&, seastar::bool_class<service::allow_hints_tag>)::{lambda(std::vector<service::storage_proxy::unique_response_handler, std::allocator<service::storage_proxy::unique_response_handler> >)#2}&&)::{lambda()#1}::operator()() const::{lambda(auto:1)#1}, std::vector<service::storage_proxy::unique_response_handler, std::allocator<service::storage_proxy::unique_response_handler> > >::run_and_dispose() at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/future-util.hh:180
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release@2/scylla/service/storage_proxy.cc:1111
 (inlined by) operator() at /jenkins/workspace/scylla-master/dtest-release@2/scylla/service/storage_proxy.cc:1463
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/apply.hh:35
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/apply.hh:43
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/future.hh:1476
 (inlined by) operator()<seastar::future_state<std::vector<service::storage_proxy::unique_response_handler, std::allocator<service::storage_proxy::unique_response_handler> > > > at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/future.hh:1018
 (inlined by) run_and_dispose at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/include/seastar/core/future.hh:425
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/build/release/../../src/core/reactor.cc:3633
seastar::reactor::run_some_tasks() [clone .part.2867] at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/build/release/../../src/core/reactor.cc:4058
seastar::reactor::run() at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/build/release/../../src/core/reactor.cc:4041
 (inlined by) seastar::reactor::run() at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/build/release/../../src/core/reactor.cc:4201
seastar::smp::configure(boost::program_options::variables_map)::{lambda()#3}::operator()() const at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/build/release/../../src/core/reactor.cc:5369
seastar::posix_thread::start_routine(void*) at /usr/include/c++/8/bits/std_function.h:687
 (inlined by) seastar::posix_thread::start_routine(void*) at /jenkins/workspace/scylla-master/dtest-release@2/scylla/seastar/build/release/../../src/core/posix.cc:52

@bhalevy added the bug and dtest labels Apr 2, 2019

@bhalevy

Contributor Author

commented Apr 2, 2019

@nyh

Contributor

commented Apr 2, 2019

I think the unhandled exception and subsequent segfault are related:

Let's start with the unhandled exception:

abstract_write_response_handler has a "promise<> _ready" member. ~abstract_write_response_handler() may set this promise to an exception and then destroy it. If nobody holds a future for that promise, the "Exceptional future ignored" message we saw above is generated.
That future is returned by handler.wait(), which is used by storage_proxy::response_wait().
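To make the mechanism concrete, here is a minimal standalone sketch (my own illustration assuming Seastar's promise/future API, not the real handler code) of how that warning is produced when an exceptional future is dropped:

    #include <seastar/core/future.hh>
    #include <stdexcept>

    void drop_exceptional_future() {
        seastar::promise<> ready;                   // stands in for _ready
        auto f = ready.get_future();                // the future handler.wait() would hand out
        ready.set_exception(std::make_exception_ptr(
                std::runtime_error("mutation_write_timeout")));
        // f is destroyed here without anyone calling .then()/.get() on it, so
        // the reactor logs "Exceptional future ignored" plus a backtrace,
        // just like the log excerpt above.
    }
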
This is called in just one place, in storage_proxy::mutate_begin():

    return parallel_for_each(ids, [this, cl, timeout_opt] (unique_response_handler& protected_response) {
        ...
        auto f = response_wait(response_id, timeout);
        send_to_live_endpoints(protected_response.release(), timeout); // response is now running and it will either complete or timeout
        return std::move(f);
    });

The reason the future f is never waited for is probably that send_to_live_endpoints() throws an exception. If send_to_live_endpoints() throws, f is lost and we never wait for the send to finish.

I'm guessing (but am not sure...) that the crash was caused by this same problem: because nobody waits for the future, perhaps the mutation (or something else) is freed, so a continuation started earlier inside send_to_live_endpoints() now crashes.

So I think we have two bugs here:

  1. We should add exception safety to the above snippet in case send_to_live_endpoints() throws (see the sketch right after this list).
  2. send_to_live_endpoints() apparently threw here, and I don't know why.
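
For concreteness, here is a rough, untested sketch of the exception safety I have in mind for bug 1 (assuming Seastar's future API; the surrounding lambda is the mutate_begin() snippet quoted above):

    auto f = response_wait(response_id, timeout);
    try {
        send_to_live_endpoints(protected_response.release(), timeout);
    } catch (...) {
        // Don't drop f on the floor: consume the handler's future (so its
        // exception is not "ignored") and propagate the exception thrown by
        // send_to_live_endpoints() instead.
        return f.then_wrapped([ex = std::current_exception()] (future<> rf) mutable {
            try { rf.get(); } catch (...) { /* swallowed; we report ex below */ }
            return make_exception_future<>(std::move(ex));
        });
    }
    return std::move(f);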

@gleb-cloudius You're the expert in this code. Do you have a guess what's going on? Do you agree that the lack of exception safety is a bug? Do you have an idea why send_to_live_endpoints() can throw?

Looking at the logs, I see that the last message for shard 22 before the ignored exception was:

INFO  2019-04-01 01:03:16,069 [shard 22] query_processor - Column definitions for ks.t changed, invalidating related prepared statements

In a lot of other shards I also see std::out_of_range. Perhaps this is the exception that also afflicted send_to_live_endpoints() on shard 22, but in our unlucky case it caused a crash?

I can easily send a patch for the exception safety, but would like to hear what @gleb-cloudius thinks first.
@bhalevy how reproducible is this issue?

@bhalevy

Contributor Author

commented Apr 2, 2019

@bhalevy how reproducible is this issue?

@nyh, this was the first and only time it was hit, and unfortunately the core dump was not collected (the code for collecting it is in next).

@gleb-cloudius

Contributor

commented Apr 3, 2019

@bhalevy Why is the trace for the segfault not complete?

@gleb-cloudius

Contributor

commented Apr 3, 2019

@nyh send_to_live_endpoints applies mutations inside a lambda wrapped in futurize::apply, so it should not throw but return an exceptional future instead. The crash seems to be inside send_to_live_endpoints (although one frame points to service/storage_proxy.cc:1104, which is not even on the path to the next frame in my tree at the same commit hash).
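
(For reference, a minimal sketch of the futurize::apply pattern being described, assuming Seastar and not quoting the actual storage_proxy code: a throwing callable is turned into an exceptional future rather than an escaping exception.)

    #include <seastar/core/future.hh>
    #include <stdexcept>

    seastar::future<> apply_mutation_sketch() {
        // If the lambda throws, futurize_apply captures the exception and
        // returns an exceptional future<> instead of letting it propagate.
        return seastar::futurize_apply([] {
            throw std::runtime_error("apply failed");
        });
    }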

@slivne added this to the 3.1 milestone Apr 7, 2019

@nyh

Contributor

commented May 12, 2019

@bhalevy @psarna this is a materialized-view-only bug coming from the new on_down() handling of materialized-view writes (to time out writes immediately when a view replica goes down):

We can see that on_down() is implicated in the unhandled-exception backtrace; the relevant line is:

(inlined by) service::storage_proxy::on_down(gms::inet_address const&) at /jenkins/workspace/scylla-master/dtest-release@2/scylla/service/storage_proxy.cc:3618

This line of code is in:

    for (auto it = _view_update_handlers_list->begin(); it != _view_update_handlers_list->end(); ++it) {
        auto guard = it->shared_from_this();
        if (it->get_targets().count(endpoint) > 0) {
            it->timeout_cb();          // <-------------- HERE!
        }
        seastar::thread::yield();
    }

This timeout_cb() removes the handler from the list, and apparently nobody is waiting for it yet. Let's understand why, and where the crash comes from:

Let's look at the storage_proxy::send_to_endpoint() code:

    return mutate_prepare(std::array{std::move(m)}, cl, type,
            [this, target = std::array{target}, pending_endpoints = std::move(pending_endpoints), &stats] (
                std::unique_ptr<mutation_holder>& m,
                db::consistency_level cl,
                db::write_type type) mutable {
        std::unordered_set<gms::inet_address> targets;
        targets.reserve(pending_endpoints.size() + 1);
        std::vector<gms::inet_address> dead_endpoints;
        boost::algorithm::partition_copy(
                boost::range::join(pending_endpoints, target),
                std::inserter(targets, targets.begin()),
                std::back_inserter(dead_endpoints),
                [] (gms::inet_address ep) { return gms::get_local_gossiper().is_alive(ep); });
        auto& ks = _db.local().find_keyspace(m->schema()->ks_name());
        slogger.trace("Creating write handler with live: {}; dead: {}", targets, dead_endpoints);
        db::assure_sufficient_live_nodes(cl, ks, targets, pending_endpoints);
        return create_write_response_handler(
            ks,
            cl,
            type,
            std::move(m),
            std::move(targets),
            pending_endpoints,
            std::move(dead_endpoints),
            nullptr,
            stats);
    }).then([this, cl, timeout = std::move(timeout)] (std::vector<unique_response_handler> ids) mutable {
        return mutate_begin(std::move(ids), cl, std::move(timeout));
    }).then_wrapped([p = shared_from_this(), lc, &stats] (future<>&& f) {
        return p->mutate_end(std::move(f), lc, stats, nullptr);
    });
}

Let's see what can happen now, and how we get first the unhandled exception warning, and then the crash:

  1. If the node is still alive (is_alive()), create_write_response_handler() is called and creates the handler object. Because this is a view update, it also puts this handler in a special view_update_handlers list.
  2. Now, because we have a then(), we can defer before calling mutate_begin().
  3. This deferral can give Scylla time to recognize that the remote node is down and call the on_down() code shown earlier. That code will remove the response handler we just created from the list. At this point the code adding this handler holds just its id - not a shared pointer - so the handler will be destroyed. When destroyed after a timeout, it sets a promise that nobody is waiting for, so we get the unhandled-exception warning.
  4. Finally the scheduler goes back to the above code and runs mutate_begin().
  5. mutate_begin() calls hint_to_dead_endpoints(response_id, cl). This function just calls find() for the id without checking the result, and crashes (see the illustrative sketch right after this list).
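
A hypothetical illustration (my own sketch, not the actual Scylla code) of why that unchecked find() ends in a segfault once on_down() has already erased the handler:

    #include <cstdint>
    #include <memory>
    #include <unordered_map>

    struct response_handler { /* ... */ };

    std::unordered_map<uint64_t, std::unique_ptr<response_handler>> handlers;

    response_handler& get_handler_unchecked(uint64_t id) {
        // If `id` was erased by on_down() in the meantime, find() returns
        // end(), and dereferencing it is undefined behaviour, hence the crash.
        return *handlers.find(id)->second;
    }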

Thanks to @gleb-cloudius for helping me to analyze this bug.

@nyh

Contributor

commented May 12, 2019

@gleb-cloudius @psarna I was thinking about how to solve this bug, and considered various complicated changes involving shared pointers, exceptions, and other refactoring, but then came up with the following very simple solution (just 3 lines of actual code). What do you think? It compiles, but I haven't tested it (I don't really know how). Perhaps you can confirm (or deny...) that this approach makes sense.

diff --git a/service/storage_proxy.cc b/service/storage_proxy.cc
index 8ae91f36f..85f823454 100644
--- a/service/storage_proxy.cc
+++ b/service/storage_proxy.cc
@@ -1097,6 +1097,23 @@ future<> storage_proxy::mutate_begin(std::vector<unique_response_handler> ids, d
                                      std::optional<clock_type::time_point> timeout_opt) {
     return parallel_for_each(ids, [this, cl, timeout_opt] (unique_response_handler& protected_response) {
         auto response_id = protected_response.id;
+        // This function, mutate_begin(), is called after a preemption point
+        // so it's possible that other code besides our caller just ran. In
+        // particular, Scylla may have noticed that a remote node went down,
+        // called storage_proxy::on_down(), and removed some of the ongoing
+        // handlers, including this id. If this happens, we need to ignore
+        // this id - not try to look it up or start a send.
+        if (_response_handlers.find(response_id) == _response_handlers.end()) {
+            protected_response.release(); // Don't try to remove this id again
+            // Requests that time-out normally below after response_wait()
+            // result in an exception (see ~abstract_write_response_handler())
+            // However, here we no longer have the handler or its information
+            // to put in the exception. The exception is not needed for
+            // correctness (e.g., hints are written by timeout_cb(), not
+            // because of an exception here). The only thing we lose is
+            // accounting of these rare cases as timeouts.
+            return make_ready_future<>();
+        }
         // it is better to send first and hint afterwards to reduce latency
         // but request may complete before hint_to_dead_endpoints() is called and
         // response_id handler will be removed, so we will have to do hint with separate
@gleb-cloudius

Contributor

commented May 13, 2019

@psarna

Member

commented May 15, 2019

note: 3.0 deserves a backport too.

@bhalevy

Contributor Author

commented May 22, 2019

Hit again in test_remove_node_during_index_build_2

@nyh how about going forward with your proposed fix?

@bhalevy

Contributor Author

commented Jun 5, 2019

ping @nyh

@nyh

Contributor

commented Jun 5, 2019

@bhalevy I sent a patch 3 weeks ago (May 13), titled "storage_proxy: fix race and crash in case of MV and other node shutdown". After a long thread, it was never committed, and I don't remember why. I'll go back and check the discussion, and if there is no real complaint, I'll ping :-(

tgrabiec added a commit that referenced this issue Jun 21, 2019

storage_proxy: fix race and crash in case of MV and other node shutdown
Recently, in merge commit 2718c90,
we added the ability to cancel pending view-update requests when we detect
that the target node went down. This is important for view updates because
these have a very long timeout (5 minutes), and we wanted to make this
timeout even longer.

However, the implementation caused a race: Between *creating* the update's
request handler (create_write_response_handler()) and actually starting
the request with this handler (mutate_begin()), there is a preemption point
and we may end up deleting the request handler before starting the request.
So mutate_begin() must gracefully handle the case of a missing request
handler, and not crash with a segmentation fault as it did before this patch.

Eventually the lifetime management of request handlers could be refactored
to avoid this delicate fix (which requires more comments to explain than
code), or even better, it would be more correct to cancel individual writes
when a node goes down, not drop the entire handler (see issue #4523).
However, for now, let's not make such invasive changes and just fix the bug that
we set out to fix.

Fixes #4386.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190620123949.22123-1-nyh@scylladb.com>

avikivity added a commit that referenced this issue Jun 23, 2019

storage_proxy: fix race and crash in case of MV and other node shutdown
(cherry picked from commit 6e87bca)

avikivity added a commit that referenced this issue Jun 23, 2019

storage_proxy: fix race and crash in case of MV and other node shutdown
(cherry picked from commit 6e87bca)