seastar - Exceptional future ignored: std::out_of_range (_Map_base::at) #1656

Closed
slivne opened this issue Sep 8, 2016 · 20 comments

@slivne
Contributor

slivne commented Sep 8, 2016

Installation details
Scylla version (or git commit hash):
Cluster size: 3
OS (RHEL/CentOS/Ubuntu/AWS AMI): ScyllaDB 1.3 (ami-78d7b46f)
Instance c3.large

on initial boot we get:

Sep 08 12:06:20 ip-10-167-238-78 scylla[2144]:  [shard 0] seastar - Exceptional future ignored: std::out_of_range (_Map_base::at)

I was not able to reproduce this with ccm easily.

@slivne slivne added the bug label Sep 8, 2016
@slivne slivne added this to the 1.3 milestone Sep 8, 2016
@slivne
Contributor Author

slivne commented Sep 8, 2016

@asias I think I have also seen this in cases where nodes have data and we do a restart.

@asias
Contributor

asias commented Sep 8, 2016

any other log messages, the issue looks strange ...

@penberg
Contributor

penberg commented Sep 8, 2016

@asias @slivne There's a full backtrace of this issue on the mailing list captured by @gleb-cloudius himself!

@gleb-cloudius
Contributor

On Thu, Sep 08, 2016 at 10:20:36AM -0700, Asias He wrote:

any other log messages, the issue looks strange ...

This looks like the one I reported last week:

https://groups.google.com/forum/#!topic/scylladb-dev/YZjSRf8RU4s

        Gleb.

@asias
Contributor

asias commented Sep 12, 2016

I can't reproduce this either locally or on AWS.

@slivne slivne assigned elcallio and unassigned asias Sep 14, 2016
@slivne
Contributor Author

slivne commented Sep 14, 2016

@slivne find a reproducer

@elcallio
Contributor

@gleb-cloudius - you managed to get this locally, no? What did you run?

@gleb-cloudius
Contributor

On Wed, Sep 14, 2016 at 02:51:30AM -0700, Calle Wilund wrote:

@gleb-cloudius - you managed to get this locally, no? What did you run?

Just restarted a process and got this during the boot. Maybe it was
there even during the initial run, but I did not notice.

        Gleb.

@elcallio
Contributor

Ok, checking a little further:

out_of_range("_Map_base::at") is thrown (not surprisingly) in std::__detail::_Map_base. However, this is only used as super type in _Hashtable derivatives, i.e. only std::unordered_map.

Now, there is an unordered_map in the collectd loop, in the "cpwriter" that outputs data to packets. However, it does not use the "at" method. In fact, it only uses operator[] and "clear". So unless there is a serious library bug or compiler weirdness going on, I don't see how that could be the culprit either.
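
To make the distinction above concrete, here is a minimal sketch in standard C++ of how the two accessors differ on a missing key. The map type, key strings, and helper names are hypothetical and are not taken from Scylla or Seastar. The "_Map_base::at" message text seen in the log is specific to libstdc++, so the sketch only checks the exception type.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical map type used only for this illustration.
using version_map = std::unordered_map<std::string, int>;

// Returns true if at() throws std::out_of_range for the given key.
// at() never inserts, so the map is left unchanged either way.
bool at_throws_on_missing(const version_map& m, const std::string& key) {
    try {
        (void)m.at(key);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// operator[] never throws for a missing key: it inserts a
// value-initialized element (0 for int) and returns a reference to it.
int subscript_or_default(version_map& m, const std::string& key) {
    return m[key];
}
```

Under these assumptions, code that only ever calls operator[] and clear() cannot be the origin of this exception, which is consistent with looking for a different unordered_map whose at() is being called.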

@slivne
Contributor Author

slivne commented Sep 14, 2016

how to reproduce

  1. create a cluster of 3 servers on i2.2x using the ami (
  2. run cassandra-stress: cassandra-stress write n=1000000 -node hostname
  3. run sudo service scylla-server restart
  4. check journalctl

I did it 5 times and on every restart I got the error.

@eyalgutkind

@slivne @asias Please update on progress on this issue.
Thx.

@kthommandra

Pasting the snippet from scylladb-users group

software: scylla-server-1.3.0-20160824.ec3ace5.el7.centos.x86_64

setup: 3 racks, 2 nodes per rack, 1 seed node per rack

test: start the full 6 node cluster and then shutdown the 6 node cluster .. repeat..

  • PASS: when I stop the services across all nodes and delete the data and commitlogs and then start the services then everything works as expected
  • FAIL: when I stop the services across all nodes and without deleting the data and commitlogs and then start the services then startup doesn't complete

In both cases above, I flush the nodes prior to shutting down the services. I'm not doing any IO between the start and stop. Just cluster up and cluster down.

In the failure case, I see continuous stream of following messages on the nodes during startup (results in startup failure)

Sep 18 11:33:16 XXX scylla[19483]: [shard 0] seastar - Exceptional future ignored: std::out_of_range (_Map_base::at)
Sep 18 11:33:16 XXX scylla[19483]: [shard 0] gossip - Connect seeds again ... (73 seconds passed)
Sep 18 11:33:17 XXX scylla[19483]: [shard 0] gossip - Connect seeds again ... (74 seconds passed)
Sep 18 11:33:18 XXX scylla[19483]: [shard 0] gossip - Connect seeds again ... (75 seconds passed)
Sep 18 11:33:19 XXX scylla[19483]: [shard 0] seastar - Exceptional future ignored: std::out_of_range (_Map_base::at)
Sep 18 11:33:19 XXX scylla[19483]: [shard 0] gossip - Connect seeds again ... (76 seconds passed)
Sep 18 11:33:20 XXX scylla[19483]: [shard 0] gossip - Connect seeds again ... (77 seconds passed)
Sep 18 11:33:21 XXX scylla[19483]: [shard 0] gossip - Connect seeds again ... (78 seconds passed)
Sep 18 11:33:22 XXX scylla[19483]: [shard 0] seastar - Exceptional future ignored: std::out_of_range (_Map_base::at)
Sep 18 11:33:22 XXX scylla[19483]: [shard 0] gossip - Connect seeds again ... (79 seconds passed)
Sep 18 11:33:23 XXX scylla[19483]: [shard 0] gossip - Connect seeds again ... (80 seconds passed)
Sep 18 11:33:24 XXX scylla[19483]: [shard 0] gossip - Connect seeds again ... (81 seconds passed)

Again, above issue does not happen if I start the cluster without any data or commitlog directories i.e. clean.

This is very easily recreatable and happens every single time.

I'm not suspecting anything wrong with configuration files since the cluster comes up successfully when started without the data and commitlog dirs.

Initially I did not have "nodetool flush" before the service shutdown so I added it but it didn't help.

Are there any "startup ordering" constraints in the current version ?
I start all the seed nodes first and once they are stable as per nodetool, then I startup the non-seed nodes. Is this necessary ?
I'm using our internal ansible scripts to orchestrate the cluster startup and shutdown. I'm aware of the github project with scylla and ansible for EC2.

Any help is appreciated

-krishna

@slivne
Contributor Author

slivne commented Sep 20, 2016

@asias, @eyalgutkind I am not sure the issue is related to the error message (we are still chasing that one, and currently it appears to be in a different part of the code, metrics related).

I have created scylladb/scylla#1679 to follow this one.


@asias
Contributor

asias commented Sep 20, 2016

So, there are two issues:

  1. "seastar - Exceptional future ignored: std::out_of_range (_Map_base::at)"
  2. "Sep 18 11:33:24 XXX scylla[19483]: [shard 0] gossip - Connect seeds again ... (81 seconds passed)", which is tracked by issue #1679 (Nodes do not start if they are not able to connect to seed nodes on initial start)

@gleb-cloudius
Contributor

Here is a better trace:

#0 0x00007ffff7251eb0 in std::out_of_range::out_of_range(char const*)@plt () from /lib64/libstdc++.so.6
#1 0x00007ffff727e659 in std::__throw_out_of_range (__s=__s@entry=0x16912cf "_Map_base::at") at ../../../../../libstdc++-v3/src/c++11/functexcept.cc:90
#2 0x0000000000ea208d in std::__detail::_Map_base<gms::inet_address, std::pair<gms::inet_address const, gms::endpoint_state>, std::allocator<std::pair<gms::inet_address const, gms::endpoint_state> >, std::__detail::_Select1st, std::equal_to<gms::inet_address>, std::hash<gms::inet_address>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::at (this=, __k=...) at /usr/include/c++/5.3.1/bits/hashtable_policy.h:646
#3 0x0000000000e792e4 in std::unordered_map<gms::inet_address, gms::endpoint_state, std::hash<gms::inet_address>, std::equal_to<gms::inet_address>, std::allocator<std::pair<gms::inet_address const, gms::endpoint_state> > >::at (__k=..., this=) at /usr/include/c++/5.3.1/bits/unordered_map.h:685
#4 gms::gossiper::<lambda()>::operator() (__closure=0x7fffffffcaf8) at gms/gossiper.cc:127
#5 scollectd::valuegms::gossiper::setup_collectd()::<lambda() >::operator() (this=0x7fffffffcaf0) at /home/gleb/work/seastar/seastar/core/scollectd.hh:603
#6 scollectd::valuegms::gossiper::setup_collectd()::<lambda() >::operator uint64_t (this=0x7fffffffcaf0) at /home/gleb/work/seastar/seastar/core/scollectd.hh:612
#7 scollectd::values_implscollectd::value<scollectd::typed<gms::gossiper::setup_collectd()::<lambda() > > >::<lambda(scollectd::value<scollectd::typed<gms::gossiper::setup_collectd()::<lambda()> > >)>::operator() (args#0=...,
__closure=) at /home/gleb/work/seastar/seastar/core/scollectd.hh:724
#8 scollectd::values_implscollectd::value<scollectd::typed<gms::gossiper::setup_collectd()::<lambda() > > >::do_unpack<0ul, scollectd::values_impl::values(net::packed
) const [with Args = {scollectd::valuescollectd::typed<gms::gossiper::setup_collectd()::<lambda() > >}; net::packed = unaligned]::<lambda(scollectd::value<scollectd::typed<gms::gossiper::setup_collectd()::<lambda()> > >)> > (this=, op=,
t=...) at /home/gleb/work/seastar/seastar/core/scollectd.hh:736
#9 scollectd::values_implscollectd::value<scollectd::typed<gms::gossiper::setup_collectd()::<lambda() > > >::unpackscollectd::values_impl<Args::values(net::packed
) const [with Args = {scollectd::valuescollectd::typed<gms::gossiper::setup_collectd()::<lambda() > >}; net::packed = unaligned]::<lambda(scollectd::value<scollectd::typed<gms::gossiper::setup_collectd()::<lambda()> > >)> > (this=, op=, t=...)
at /home/gleb/work/seastar/seastar/core/scollectd.hh:731
#10 scollectd::values_implscollectd::value<scollectd::typed<gms::gossiper::setup_collectd()::<lambda() > > >::values(net::packed ) const (this=, p=0x600008a78dc7) at /home/gleb/work/seastar/seastar/core/scollectd.hh:723
#11 0x0000000000534c8a in scollectd::cpwriter::put (v=warning: RTTI symbol not found for class 'scollectd::values_implscollectd::value<scollectd::typed<gms::gossiper::setup_collectd()::{lambda()#1} > >'
..., type=scollectd::part_type::Values, this=0x600008a78c08) at core/scollectd.cc:183
#12 scollectd::cpwriter::put (v=warning: RTTI symbol not found for class 'scollectd::values_implscollectd::value<scollectd::typed<gms::gossiper::setup_collectd()::{lambda()#1} > >'
..., id=..., period=..., host=..., this=0x600008a78c08) at core/scollectd.cc:214
#13 scollectd::impl::<lambda()>::operator() (__closure=0x600008bf6dc0) at core/scollectd.cc:353
#14 do_until_continuedscollectd::impl::run()::<lambda()&, scollectd::impl::run()::<lambda()>&>(scollectd::impl::<lambda()> &, scollectd::impl::<lambda()> &, promise<>) (stop_cond=..., action=..., p=...) at ./core/future-util.hh:148
#15 0x0000000000537010 in <lambda(std::result_of_t<scollectd::impl::run()::<lambda()>&()>)>::operator() (fut=..., __closure=) at ./core/future-util.hh:153
#16 do_void_futurize_apply<do_until_continued(StopCondition&&, AsyncAction&&, promise<>) [with AsyncAction = scollectd::impl::run()::<lambda()>&; StopCondition = scollectd::impl::run()::<lambda()>&]::<lambda(std::result_of_t<scollectd::impl::run()::<lambda()>&()>)>, future<> >(<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x62f57c, DIE 0x7008b4>) (func=func@entry=<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x62f57c, DIE 0x7008b4>) at ./core/future.hh:1183
#17 0x0000000000537134 in futurize::apply<do_until_continued(StopCondition&&, AsyncAction&&, promise<>) [with AsyncAction = scollectd::impl::run()::<lambda()>&; StopCondition = scollectd::impl::run()::<lambda()>&]::<lambda(std::result_of_t<scollectd::impl::run()::<lambda()>&()>)>, future<> > (func=) at ./core/future.hh:1231
#18 future<>::<lambda(auto:2&&)>::operator()<future_state<> > (state=<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x62f57c, DIE 0x6da9c6>, _closure=0x600008bf6da0) at ./core/future.hh:887
#19 continuation<future::then_wrapped(Func&&) [with Func = do_until_continued(StopCondition&&, AsyncAction&&, promise<>) [with AsyncAction = scollectd::impl::run()::<lambda()>&; StopCondition = scollectd::impl::run()::<lambda()>&]::<lambda(std::result_of_t<scollectd::impl::run()::<lambda()>&()>)>; Result = future<>; T = {}]::<lambda(auto:2&&)> >::run(void) (this=0x600008bf6d90) at ./core/future.hh:390
#20 0x000000000048bb2f in reactor::run_tasks (this=this@entry=0x60000028e000, tasks=...) at core/reactor.cc:1896
#21 0x00000000004c60fb in reactor::run (this=0x60000028e000) at core/reactor.cc:2316
#22 0x0000000000541604 in app_template::run_deprecated(int, char**, std::function<void ()>&&) (this=this@entry=0x7fffffffd820, ac=ac@entry=29, av=av@entry=0x7fffffffda98, func=func@entry=<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x7232ef, DIE 0x7e9d35>) at core/app-template.cc:131
#23 0x000000000041da9b in main (ac=29, av=0x7fffffffda98) at main.cc:663

        Gleb.

@slivne
Contributor Author

slivne commented Sep 20, 2016

Thanks Gleb. @asias - is this collectd info related to gossip?


@asias
Contributor

asias commented Sep 20, 2016

@gleb-cloudius Thanks. I think I found the culprit.

@asias
Contributor

asias commented Sep 20, 2016

Gleb, can you try this patch if you can reproduce locally?

--- a/gms/gossiper.cc
+++ b/gms/gossiper.cc
@@ -124,7 +124,12 @@ gossiper::setup_collectd() {
             scollectd::type_instance_id("gossip", scollectd::per_cpu_plugin_instance,
                     "derive", "heart_beat_version"),
             scollectd::make_typed(scollectd::data_type::DERIVE, [ep, this] {
-                return this->endpoint_state_map.at(ep).get_heart_beat_state().get_heart_beat_version(); })),
+                if (this->endpoint_state_map.count(ep)) {
+                    return this->endpoint_state_map.at(ep).get_heart_beat_state().get_heart_beat_version();
+                } else {
+                    return 0;
+                }
+            })),
     };
 }
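
For readers who want to try the idea outside the Scylla tree, the guard in the patch above can be sketched standalone in standard C++. This is only an illustration under stated assumptions: `endpoint_state_sketch`, `endpoint_map_sketch`, and `heart_beat_version_or_zero` are hypothetical names, a string key stands in for gms::inet_address, and the sketch uses find() so the key is looked up once instead of once in count() and again in at(). It is not the code that was committed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for gms::endpoint_state; it carries only the
// one field this illustration needs.
struct endpoint_state_sketch {
    int32_t heart_beat_version = 0;
};

// Hypothetical stand-in for the gossiper's endpoint_state_map.
using endpoint_map_sketch =
    std::unordered_map<std::string, endpoint_state_sketch>;

// Returns the heartbeat version for ep, or 0 when the endpoint has no
// entry in the map yet. Never throws std::out_of_range.
int64_t heart_beat_version_or_zero(const endpoint_map_sketch& m,
                                   const std::string& ep) {
    auto it = m.find(ep);
    if (it == m.end()) {
        return 0;  // same fallback value the patch returns
    }
    return it->second.heart_beat_version;
}
```

Giving the function an explicit return type also keeps both branches on the same type, which matters if the metric callback is written as a lambda with a deduced return type.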

penberg pushed a commit that referenced this issue Sep 20, 2016
It is possible that endpoint_state_map does not contain the entry for
the node itself when collectd accesses it.

Fixes the issue:

Sep 18 11:33:16 XXX scylla[19483]: [shard 0] seastar - Exceptional
future ignored: std::out_of_range (_Map_base::at)

Fixes #1656

Message-Id: <8ffe22a542ff71e8c121b06ad62f94db54cc388f.1474377722.git.asias@scylladb.com>
(cherry picked from commit aa47265)
@gleb-cloudius
Contributor

On Tue, Sep 20, 2016 at 06:17:25AM -0700, Asias He wrote:

Gleb, can you try this patch if you can reproduce locally?

The patch fixed the issue.

        Gleb.

@asias
Contributor

asias commented Sep 21, 2016

@gleb-cloudius thanks for confirming.

denesb pushed a commit to denesb/scylla that referenced this issue Oct 20, 2021
…cross versions (2021.1)' from Eliran Sinvani

This is the 2021.1 version of scylladb#1656
It has a lot less commits because almost all of the API commits and the reverted commits (from 2020.1) are already there.

Closes scylladb#1660

* github.com:scylladb/scylla-enterprise:
  Merge 'Fix inconsistencies in MV and SI (reworked)' from Eliran Sinvani
  storage_proxy: Add .local_db() getters