Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gossip cannot rejoin a node #403

Closed
gleb-cloudius opened this issue Sep 24, 2015 · 32 comments
Closed

gossip cannot rejoin a node #403

gleb-cloudius opened this issue Sep 24, 2015 · 32 comments
Assignees
Labels

Comments

@gleb-cloudius
Copy link
Contributor

I run 3 instance cluster with ami-afb6c8ca and loaded it with write (RF=3 Cl=ALL) until some node fell out of cluster due to big networking latencies, but after traffic stopped it never rejoined the cluster. Now I see very strange picture: there are three nodes 81, 82 and 83. 81 and 83 see only 2 nodes in the cluster (node 81 and 83), but node 82 sees all three!

@gleb-cloudius
Copy link
Contributor Author

After restarting node 82 everything is back to normal.

@asias
Copy link
Contributor

asias commented Sep 28, 2015

@gleb-cloudius This can happen when node 82 is not running the gossiper::run() again due to stuck in

          _the_gossiper.invoke_on_all([this, endpoint_map_changed,
              live_endpoint_changed] (gossiper& local_gossiper) {
              // Don't copy gossiper(CPU0) maps into themselves!
              if (engine().cpu_id() != 0) {
                  if (endpoint_map_changed) {
                      local_gossiper.endpoint_state_map = _shadow_endpoint_state_map;
                  }

                  if (live_endpoint_changed) {
                      local_gossiper._live_endpoints = _shadow_live_endpoints;
                  }
              }
          }).get();

I saw this happen myself once.

In this state, gossiper::do_status_check() will not be called, node 82 will not update other nodes's status information (up or down), thus you saw node 2 thought all the node were up.

However, node 83 and 81 can not connect to node 82, so it thought 82 is down which is correct.

When you restart node 82, it is recovered from the hang, the node 83 and 81 can talk to it again, thus cluster becomes normal again.

I will send a patch to expose the heart beat version info though our HTTP api, so that we can check if gossiper::run() still runs when this issue happens.

@asias
Copy link
Contributor

asias commented Sep 28, 2015

@gleb-cloudius I verified my theory with cl=all,rf=3 on a 3 nodes scylla cluster + 2 c-s nodes, after 1h of run.

3547 is the heart beat version for node 155. It is not changed after

[fedora@ip-172-31-47-55 c-s]$ date
Mon Sep 28 08:50:01 UTC 2015
[fedora@ip-172-31-47-55 c-s]$ sh run.live.sh
===================check 172.31.33.155
["172.31.33.155","172.31.33.156","172.31.33.157"]
3547
===================check 172.31.33.156
["172.31.33.156","172.31.33.157"]
3745
===================check 172.31.33.157
["172.31.33.156","172.31.33.157"]
3739

a few seconds later

[fedora@ip-172-31-47-55 c-s]$ date
Mon Sep 28 08:50:09 UTC 2015
[fedora@ip-172-31-47-55 c-s]$ sh run.live.sh
===================check 172.31.33.155
["172.31.33.155","172.31.33.156","172.31.33.157"]
3547
===================check 172.31.33.156
["172.31.33.156","172.31.33.157"]
3754
===================check 172.31.33.157
["172.31.33.156","172.31.33.157"]
3748

Let's take a closer look at node 155. In node 155's eye, node 156 and node 157's heart beat version are increasing this means node 155 can receive gossip data from them and send node 155's info back, however, node 155's heart beat version is not updated, so the other node will think node 155 is not alive.
So, we stuck somewhere in gossip::run and the gossiper timer is not enabled again, next gossip round is not scheduled.

[fedora@ip-172-31-47-55 c-s]$ curl --silent -X GET "http://172.31.33.155:10000/gossiper/heart_beat_version/172.31.33.155";echo
3547
[fedora@ip-172-31-47-55 c-s]$ curl --silent -X GET "http://172.31.33.155:10000/gossiper/heart_beat_version/172.31.33.156";echo
4201
[fedora@ip-172-31-47-55 c-s]$ curl --silent -X GET "http://172.31.33.155:10000/gossiper/heart_beat_version/172.31.33.157";echo
4198
[fedora@ip-172-31-47-55 c-s]$ sleep 5
[fedora@ip-172-31-47-55 c-s]$ curl --silent -X GET "http://172.31.33.155:10000/gossiper/heart_beat_version/172.31.33.155";echo
3547
[fedora@ip-172-31-47-55 c-s]$ curl --silent -X GET "http://172.31.33.155:10000/gossiper/heart_beat_version/172.31.33.156";echo
4225
[fedora@ip-172-31-47-55 c-s]$ curl --silent -X GET "http://172.31.33.155:10000/gossiper/heart_beat_version/172.31.33.157";echo
4222

On node 155, last 100 lines of journal

Sep 28 08:46:06 ip-172-31-33-155 scylla[1651]:  [shard 0] compaction - Compacting [/data/data/pie5Eohe/standard1-4759745065bc11e59ad1000000000007/pie5Eohe-standard1-ka-403-Data.db:level=0, /data/data/pie5Eohe/standard1-475974
Sep 28 08:46:06 ip-172-31-33-155 scylla[1651]:  [shard 13] compaction - Compacting [/data/data/pie5Eohe/standard1-4759745065bc11e59ad1000000000007/pie5Eohe-standard1-ka-416-Data.db:level=0, /data/data/pie5Eohe/standard1-47597
Sep 28 08:46:07 ip-172-31-33-155 scylla[1651]:  [shard 22] compaction - Compacted 4 sstables to [/data/data/pie5Eohe/standard1-4759745065bc11e59ad1000000000007/pie5Eohe-standard1-ka-549-Data.db]. 36349036 bytes to 9087259 (~2
Sep 28 08:46:07 ip-172-31-33-155 scylla[1651]:  [shard 2] compaction - Compacted 4 sstables to [/data/data/Iefaihu5/standard1-53b53a6065b511e58e2a000000000000/Iefaihu5-standard1-ka-2761-Data.db]. 36304076 bytes to 9076019 (~2
Sep 28 08:46:07 ip-172-31-33-155 scylla[1651]:  [shard 2] compaction - Compacting [/data/data/pie5Eohe/standard1-4759745065bc11e59ad1000000000007/pie5Eohe-standard1-ka-405-Data.db:level=0, /data/data/pie5Eohe/standard1-475974
Sep 28 08:46:07 ip-172-31-33-155 scylla[1651]:  [shard 30] compaction - Compacted 4 sstables to [/data/data/Iefaihu5/standard1-53b53a6065b511e58e2a000000000000/Iefaihu5-standard1-ka-2789-Data.db]. 36335548 bytes to 9083887 (~
Sep 28 08:46:07 ip-172-31-33-155 scylla[1651]:  [shard 12] compaction - Compacted 4 sstables to [/data/data/pie5Eohe/standard1-4759745065bc11e59ad1000000000007/pie5Eohe-standard1-ka-539-Data.db]. 36289464 bytes to 9072366 (~2
Sep 28 08:46:08 ip-172-31-33-155 scylla[1651]:  [shard 0] compaction - Compacted 4 sstables to [/data/data/pie5Eohe/standard1-4759745065bc11e59ad1000000000007/pie5Eohe-standard1-ka-527-Data.db]. 36272604 bytes to 9068151 (~25
Sep 28 08:46:08 ip-172-31-33-155 scylla[1651]:  [shard 13] compaction - Compacted 4 sstables to [/data/data/pie5Eohe/standard1-4759745065bc11e59ad1000000000007/pie5Eohe-standard1-ka-540-Data.db]. 36280472 bytes to 9070118 (~2
Sep 28 08:46:08 ip-172-31-33-155 scylla[1651]:  [shard 2] compaction - Compacted 4 sstables to [/data/data/pie5Eohe/standard1-4759745065bc11e59ad1000000000007/pie5Eohe-standard1-ka-529-Data.db]. 36304076 bytes to 9076019 (~25
Sep 28 08:46:09 ip-172-31-33-155 scylla[1651]:  [shard 23] compaction - Compacting [/data/data/pie5Eohe/standard1-4759745065bc11e59ad1000000000007/pie5Eohe-standard1-ka-426-Data.db:level=0, /data/data/pie5Eohe/standard1-47597
Sep 28 08:46:10 ip-172-31-33-155 scylla[1651]:  [shard 23] compaction - Compacted 4 sstables to [/data/data/pie5Eohe/standard1-4759745065bc11e59ad1000000000007/pie5Eohe-standard1-ka-550-Data.db]. 35803896 bytes to 8950974 (~2
Sep 28 08:46:10 ip-172-31-33-155 scylla[1651]:  [shard 23] compaction - Compacting [/data/data/Iefaihu5/standard1-53b53a6065b511e58e2a000000000000/Iefaihu5-standard1-ka-2658-Data.db:level=0, /data/data/Iefaihu5/standard1-53b5
Sep 28 08:46:11 ip-172-31-33-155 scylla[1651]:  [shard 23] compaction - Compacted 4 sstables to [/data/data/Iefaihu5/standard1-53b53a6065b511e58e2a000000000000/Iefaihu5-standard1-ka-2782-Data.db]. 35803896 bytes to 8950974 (~
Sep 28 08:47:19 ip-172-31-33-155 systemd[1]: Cannot add dependency job for unit tmp.mount, ignoring: Unit tmp.mount is masked.
Sep 28 08:47:19 ip-172-31-33-155 systemd[1]: Starting dnf makecache...
Sep 28 08:47:19 ip-172-31-33-155 dnf[7906]: cachedir: /var/cache/dnf
Sep 28 08:47:19 ip-172-31-33-155 dnf[7906]: Loaded plugins: download, kickstart, config-manager, reposync, needs-restarting, debuginfo-install, noroot, generate_completion_cache, builddep, Query, protected_packages, copr, pla
Sep 28 08:47:19 ip-172-31-33-155 dnf[7906]: DNF version: 1.1.1
Sep 28 08:47:19 ip-172-31-33-155 dnf[7906]: Making cache files for all metadata files.
Sep 28 08:47:19 ip-172-31-33-155 dnf[7906]: Metadata cache refreshed recently.
Sep 28 08:47:19 ip-172-31-33-155 systemd[1]: Started dnf makecache.
top
  1651 scylla    20   0 16.000t 0.015t  25888 R  3099 26.5   2045:37 scylla

@asias
Copy link
Contributor

asias commented Sep 28, 2015

I added debug code to check if replicate data to other cores will block:

diff --git a/gms/gossiper.cc b/gms/gossiper.cc
index 012a19a..bbb569a 100644
--- a/gms/gossiper.cc
+++ b/gms/gossiper.cc
@@ -533,6 +533,8 @@ void gossiper::run() {
             do_status_check();
         }

+        logger.info("After do_status_check");
+
         //
         // Gossiper task runs only on CPU0:
         //
@@ -552,8 +554,10 @@ void gossiper::run() {
                 _shadow_live_endpoints = _live_endpoints;
             }

+            auto sem = make_shared<semaphore>(0);
+            logger.info("Before invoke_on_all");
             _the_gossiper.invoke_on_all([this, endpoint_map_changed,
-                live_endpoint_changed] (gossiper& local_gossiper) {
+                live_endpoint_changed, sem] (gossiper& local_gossiper) {
                 // Don't copy gossiper(CPU0) maps into themselves!
                 if (engine().cpu_id() != 0) {
                     if (endpoint_map_changed) {
@@ -564,8 +568,14 @@ void gossiper::run() {
                         local_gossiper._live_endpoints = _shadow_live_endpoints;
                     }
                 }
+            }).finally([sem] {
+                sem->signal();
+            });
+            sem->wait(std::chrono::seconds(10)).handle_exception([] (auto ep) {
+                logger.info("Fail to replicate gossip data to other nodes: {}", ep);
             }).get();
         }
+        logger.info("After invoke_on_all");
     }).then_wrapped([this] (auto&& f) {
         try {
             f.get();
@@ -579,6 +589,8 @@ void gossiper::run() {
                 logger.debug("ep={}, eps={}", x.first, x.second);
             }
         }
+    }).finally([this] {
+        logger.info("=== Gossip SET TIMER");
         _scheduled_gossip_task.arm(INTERVAL);
     });
 }
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]:  [shard 5] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-160-Data.db]. 30050983 bytes to 9092879 (~30
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]:  [shard 22] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-177-Data.db]. 30050140 bytes to 9087259 (~3
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]:  [shard 2] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-157-Data.db]. 30080769 bytes to 9076019 (~30
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]:  [shard 30] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-185-Data.db]. 30045925 bytes to 9083887 (~3
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]:  [shard 0] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-155-Data.db]. 30045082 bytes to 9068151 (~30
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]:  [shard 7] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-162-Data.db]. 30065314 bytes to 9098218 (~30
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]:  [shard 12] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-167-Data.db]. 30149614 bytes to 9072366 (~3
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]:  [shard 10] compaction - Compacting [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-41-Data.db:level=0, /data/data/kei4Ohfe/standard1-b42bc7
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]:  [shard 17] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-172-Data.db]. 30081893 bytes to 9068151 (~3
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]:  [shard 15] compaction - Compacting [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-46-Data.db:level=0, /data/data/kei4Ohfe/standard1-b42bc7
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - After do_status_check
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - Before invoke_on_all
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - After invoke_on_all
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - === Gossip SET TIMER
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]:  [shard 1] compaction - Compacting [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-32-Data.db:level=0, /data/data/kei4Ohfe/standard1-b42bc7d
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]:  [shard 24] compaction - Compacting [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-55-Data.db:level=0, /data/data/kei4Ohfe/standard1-b42bc7
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]:  [shard 26] compaction - Compacting [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-57-Data.db:level=0, /data/data/kei4Ohfe/standard1-b42bc7
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]:  [shard 28] compaction - Compacting [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-59-Data.db:level=0, /data/data/kei4Ohfe/standard1-b42bc7
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]:  [shard 13] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-168-Data.db]. 30102406 bytes to 9070118 (~3
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]:  [shard 6] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-161-Data.db]. 30110555 bytes to 9051853 (~30
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]:  [shard 29] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-184-Data.db]. 30116456 bytes to 9054382 (~3
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]:  [shard 4] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-159-Data.db]. 30107183 bytes to 9056349 (~30
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - After do_status_check
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - Before invoke_on_all
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]:  [shard 25] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-180-Data.db]. 30145680 bytes to 9055787 (~3
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]:  [shard 16] compaction - Compacting [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-47-Data.db:level=0, /data/data/kei4Ohfe/standard1-b42bc7
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]:  [shard 11] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-166-Data.db]. 30109993 bytes to 9042861 (~3
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]:  [shard 3] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-158-Data.db]. 30131630 bytes to 9052134 (~30
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]:  [shard 10] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-165-Data.db]. 30162259 bytes to 9048481 (~2
Sep 28 10:38:15 ip-172-31-33-155 scylla[8435]:  [shard 1] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-156-Data.db]. 30108588 bytes to 9035274 (~30
Sep 28 10:38:15 ip-172-31-33-155 scylla[8435]:  [shard 24] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-179-Data.db]. 30108026 bytes to 9031340 (~2
Sep 28 10:38:15 ip-172-31-33-155 scylla[8435]:  [shard 15] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-170-Data.db]. 30111960 bytes to 9028249 (~2
Sep 28 10:38:15 ip-172-31-33-155 scylla[8435]:  [shard 26] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-181-Data.db]. 30051264 bytes to 9015604 (~3
Sep 28 10:38:15 ip-172-31-33-155 scylla[8435]:  [shard 28] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-183-Data.db]. 30067843 bytes to 9012232 (~2
Sep 28 10:38:15 ip-172-31-33-155 scylla[8435]:  [shard 16] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-171-Data.db]. 30017544 bytes to 8991157 (~2
Sep 28 10:38:24 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - Fail to replicate gossip data to other nodes: semaphore_timed_out (Semaphore timedout)
Sep 28 10:38:24 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - After invoke_on_all
Sep 28 10:38:24 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - === Gossip SET TIMER
Sep 28 10:38:25 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - After do_status_check
Sep 28 10:38:25 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - Before invoke_on_all
Sep 28 10:38:35 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - Fail to replicate gossip data to other nodes: semaphore_timed_out (Semaphore timedout)
Sep 28 10:38:35 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - After invoke_on_all
Sep 28 10:38:35 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - === Gossip SET TIMER
Sep 28 10:38:36 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - After do_status_check
Sep 28 10:38:36 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - Before invoke_on_all
Sep 28 10:38:46 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - Fail to replicate gossip data to other nodes: semaphore_timed_out (Semaphore timedout)
Sep 28 10:38:46 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - After invoke_on_all
Sep 28 10:38:46 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - === Gossip SET TIMER
Sep 28 10:38:47 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - After do_status_check
Sep 28 10:38:47 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - Before invoke_on_all
Sep 28 10:38:57 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - Fail to replicate gossip data to other nodes: semaphore_timed_out (Semaphore timedout)
Sep 28 10:38:57 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - After invoke_on_all
Sep 28 10:38:57 ip-172-31-33-155 scylla[8435]:  [shard 0] gossip - === Gossip SET TIMER

[fedora@ip-172-31-37-234 c-s]$ sh run.live.sh
===================check 172.31.33.155
["172.31.33.155","172.31.33.156","172.31.33.157"]640
===================check 172.31.33.156
["172.31.33.155","172.31.33.156","172.31.33.157"]4262
===================check 172.31.33.157
["172.31.33.155","172.31.33.156","172.31.33.157"]4282
[fedora@ip-172-31-37-234 c-s]$
[fedora@ip-172-31-37-234 c-s]$ sleep 5
[fedora@ip-172-31-37-234 c-s]$ sh run.live.sh
===================check 172.31.33.155
["172.31.33.155","172.31.33.156","172.31.33.157"]641
===================check 172.31.33.156
["172.31.33.155","172.31.33.156","172.31.33.157"]4275
===================check 172.31.33.157
["172.31.33.155","172.31.33.156","172.31.33.157"]4296

$ sudo perf top
 26.83%  [kernel]             [k] pvclock_clocksource_read
  26.63%  scylla               [.] futurize<future<> >::apply<do_flush_directory(basic_sstring<char, unsigned int, 15u>)::{lambda(file)#1}, file>
   9.66%  scylla               [.] std::deque<smp_message_queue::work_item*, std::allocator<smp_message_queue::work_item*> >::_M_erase
   6.89%  scylla               [.] _GLOBAL__sub_I__ZN11cql3_parser19CqlParserTokenNamesE
   6.83%  scylla               [.] reactor::stop
   3.19%  scylla               [.] future<> parallel_for_each<boost::range_detail::integer_iterator<unsigned int>, seastar::sharded<cql3::query_processor>::stop()::{lambda(unsigned int)#1}>(boost::range_detail::integer_iterat
   2.50%  [kernel]             [k] system_call_after_swapgs
   1.67%  scylla               [.] reactor::reactor
   1.48%  [kernel]             [k] __fget
   1.34%  [kernel]             [k] _raw_spin_unlock_irqrestore
   1.26%  libc-2.21.so         [.] __libc_enable_asynccancel
   1.17%  [vdso]               [.] __vdso_clock_gettime
   0.74%  scylla               [.] smp_message_queue::async_work_item<reactor::exit(int)::{lambda()#1}>::process
   0.72%  libc-2.21.so         [.] 0x0000000000103193
   0.64%  [kernel]             [k] copy_user_enhanced_fast_string
   0.58%  [kernel]             [k] sys_clock_gettime
   0.51%  libc-2.21.so         [.] __libc_disable_asynccancel
   0.47%  [kernel]             [k] fput
   0.47%  [kernel]             [k] sys_epoll_wait
   0.45%  [kernel]             [k] __fget_light
   0.44%  [kernel]             [k] __getnstimeofday64
   0.32%  libaio.so.1.0.1      [.] io_getevents
   0.31%  [kernel]             [k] ep_poll
   0.30%  scylla               [.] sstables::sstable::prepare_write_components
   0.27%  scylla               [.] validate
   0.27%  scylla               [.] __static_initialization_and_destruction_0
   0.24%  libstdc++.so.6.0.21  [.] std::chrono::_V2::system_clock::now
   0.22%  libc-2.21.so         [.] __clock_gettime
   0.21%  scylla               [.] __static_initialization_and_destruction_0
   0.19%  scylla               [.] continuation<future<> future<>::finally<auto do_with<file, do_flush_directory(basic_sstring<char, unsigned int, 15u>)::{lambda(file)#1}::operator()(file) const::{lambda(file&)#1}>(file&&, do
   0.17%  scylla               [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN7reactor11rename_fileE13basic_sstringIcjLj15EES7_EUlS2_E0_S0_IJEEEET0_OT_EUlSC_E_JS2_EED2Ev
   0.16%  scylla               [.] _GLOBAL__sub_I__ZN3api18set_stream_managerERNS_12http_contextERN5httpd6routesE
   0.14%  scylla               [.] _ZN8futurizeI6futureIJ16temporary_bufferIcEEEE5applyIZNS3_7finallyIZ7do_withI12input_streamIcEZN8sstables7sstable9data_readEmmEUlRT_E_EDaOSC_OT0_EUlvE_EES3_SF_EUlS3_E_JS3_EEES3_SF_DpOT0_.isr
   0.14%  [kernel]             [k] _copy_to_user
   0.14%  [kernel]             [k] _raw_spin_lock_irqsave
   0.13%  scylla               [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN15posix_file_impl5flushEvEUlS2_E0_S0_IJEEEET0_OT_EUlSA_E_JS2_EED0Ev
   0.12%  [kernel]             [k] xen_clocksource_get_cycles
   0.12%  [kernel]             [k] posix_clock_realtime_get
   0.10%  scylla               [.] __static_initialization_and_destruction_0
   0.09%  scylla               [.] continuation<future<> future<>::finally<auto do_with<file, do_flush_directory(basic_sstring<char, unsigned int, 15u>)::{lambda(file)#1}::operator()(file) const::{lambda(file&)#1}>(file&&, do
   0.09%  scylla               [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN18blockdev_file_impl7discardEmmEUlS2_E0_S0_IJEEEET0_OT_EUlSA_E_JS2_EED0Ev
   0.09%  scylla               [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN7reactor14make_directoryE13basic_sstringIcjLj15EEEUlS2_E0_S0_IJEEEET0_OT_EUlSC_E_JS2_EED0Ev
   0.08%  [kernel]             [k] __fdget
   0.07%  scylla               [.] future<std::tuple<future<>, future<> > >::then<future<std::tuple<future<>, future<> > >::discard_result()::{lambda(std::tuple<future<>, future<> >&&)#1}, future<> >
   0.07%  scylla               [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN15posix_file_impl5flushEvEUlS2_E0_S0_IJEEEET0_OT_EUlSA_E_JS2_EED2Ev
   0.07%  [kernel]             [k] system_call_fastpath
   0.07%  [kernel]             [k] getnstimeofday64
   0.06%  scylla               [.] boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error> >::~clone_impl
   0.06%  libc-2.21.so         [.] epoll_wait
   0.05%  libc-2.21.so         [.] 0x0000000000103197

This is on a 32 cores system.

processor       : 31
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz

@asias
Copy link
Contributor

asias commented Sep 28, 2015

@avikivity ideas?

@asias
Copy link
Contributor

asias commented Sep 28, 2015

$ sudo perf record --call-graph dwarf -p pgrep scylla
$ sudo perf report

-   60.60%     0.00%  scylla   libc-2.21.so         [.] __clone                                                                                                                                                                 ▒
     __clone                                                                                                                                                                                                                    ◆
-   60.60%     0.00%  scylla   libpthread-2.21.so   [.] start_thread                                                                                                                                                            ▒
     start_thread                                                                                                                                                                                                               ▒
     __clone                                                                                                                                                                                                                    ▒
-   60.60%     0.00%  scylla   scylla               [.] 0x00000000000d7dfe                                                                                                                                                      ▒
     0x4d7dfe                                                                                                                                                                                                                   ▒
     start_thread                                                                                                                                                                                                               ▒
     __clone                                                                                                                                                                                                                    ▒
-   60.60%     0.00%  scylla   scylla               [.] 0x0000000000099ee6                                                                                                                                                      ▒
     0x499ee6                                                                                                                                                                                                                   ▒
     0x4d7dfe                                                                                                                                                                                                                   ▒
     start_thread                                                                                                                                                                                                               ▒
     __clone                                                                                                                                                                                                                    ▒
-   60.43%     0.09%  scylla   scylla               [.] 0x000000000008e04a                                                                                                                                                      ▒
   + 0x48e04a                                                                                                                                                                                                                   ▒
-   34.91%     0.06%  scylla   [kernel.kallsyms]    [k] system_call_fastpath                                                                                                                                                    ▒
   - system_call_fastpath                                                                                                                                                                                                       ▒
      + 0x7f351b1c4193                                                                                                                                                                                                          ▒
-   29.38%     0.61%  scylla   [kernel.kallsyms]    [k] sys_clock_gettime                                                                                                                                                       ▒
   - sys_clock_gettime                                                                                                                                                                                                          ▒
        system_call_fastpath                                                                                                                                                                                                    ▒
-   28.82%     0.11%  scylla   [kernel.kallsyms]    [k] posix_clock_realtime_get                                                                                                                                                ▒
   - posix_clock_realtime_get                                                                                                                                                                                                   ▒
      + sys_clock_gettime                                                                                                                                                                                                       ▒
-   28.72%     0.07%  scylla   [kernel.kallsyms]    [k] getnstimeofday64                                                                                                                                                        ▒
   - getnstimeofday64                                                                                                                                                                                                           ▒
      + posix_clock_realtime_get                                                                                                                                                                                                ▒
+   28.65%     0.44%  scylla   [kernel.kallsyms]    [k] __getnstimeofday64                                                                                                                                                      ▒
+   28.21%     0.13%  scylla   [kernel.kallsyms]    [k] xen_clocksource_get_cycles                                                                                                                                              ▒
+   28.11%    28.11%  scylla   [kernel.kallsyms]    [k] pvclock_clocksource_read                                                                                                                                                ▒
+   27.42%     0.45%  scylla   scylla               [.] 0x00000000000ace36                                                                                                                                                      ▒
+   14.66%    14.66%  scylla   scylla               [.] 0x000000000008a798                                                                                                                                                      ▒
+    7.64%     0.00%  scylla   scylla               [.] 0x0000000000072728                                                                                                                                                      ▒
+    7.45%     0.60%  scylla   scylla               [.] 0x00000000000acdf3                                                                                                                                                      ▒
+    5.63%     0.74%  scylla   libc-2.21.so         [.] 0x0000000000103193                                                                                                                                                      ▒
+    4.61%     0.58%  scylla   scylla               [.] 0x00000000000ace42                                                                                                                                                      ▒
+    4.19%     0.48%  scylla   [kernel.kallsyms]    [k] sys_epoll_wait                                                                                                                                                          ▒
+    4.10%     0.64%  scylla   scylla               [.] 0x00000000000acdfe                                                                                                                                                      ▒
+    3.77%     3.77%  scylla   scylla               [.] 0x000000000008a79f                                                                                                                                                      ▒
+    3.14%     0.00%  scylla   libc-2.21.so         [.] __start_context                                                                                                                                                         ▒
+    3.14%     0.00%  scylla   scylla               [.] 0x00000000000eea02                                                                                                                                                      ▒
+    3.14%     0.00%  scylla   scylla               [.] 0x0000000000013dde                                                                                                                                                      ▒
+    3.14%     0.00%  scylla   scylla               [.] 0x00000000002bf181                                                                                                                                                      ▒
+    3.14%     0.00%  scylla   scylla               [.] 0x00000000002e0032                                                                                                                                                      ▒
+    3.14%     0.00%  scylla   scylla               [.] 0x00000000002dfdff                                                                                                                                                      ▒
+    3.14%     0.00%  scylla   scylla               [.] 0x00000000002df848                                                                                                                                                      ▒
+    3.14%     0.00%  scylla   scylla               [.] 0x00000000002d9e33                                                                                                                                                      ▒
+    3.14%     0.00%  scylla   scylla               [.] 0x00000000002e4566                                                                                                                                                      ▒
+    3.14%     0.00%  scylla   scylla               [.] 0x00000000002d7c00                                                                                                                                                      ▒
+    3.14%     0.00%  scylla   scylla               [.] 0x00000000002d767d

@asias
Copy link
Contributor

asias commented Sep 28, 2015

scylla$ addr2line -Cfpi -e build/release/scylla.aws.symbol 0x00000000000d7dfe
?? ??:0
scylla$ addr2line -Cfpi -e build/release/scylla.aws.symbol 0x0000000000099ee6
?? ??:0
scylla$ addr2line -Cfpi -e build/release/scylla.aws.symbol 0x000000000008e04a
?? ??:0

@gleb-cloudius
Copy link
Contributor Author

  •        auto sem = make_shared<semaphore>(0);
    
  •        logger.info("Before invoke_on_all");
         _the_gossiper.invoke_on_all([this, endpoint_map_changed,
    
  •            live_endpoint_changed] (gossiper& local_gossiper) {
    
  •            live_endpoint_changed, sem] (gossiper& local_gossiper) {
             // Don't copy gossiper(CPU0) maps into themselves!
    

    Add logging here to see which cpu does not respond.

             if (engine().cpu_id() != 0) {
                 if (endpoint_map_changed) {
    

    @@ -564,8 +568,14 @@ void gossiper::run() {
    local_gossiper._live_endpoints = _shadow_live_endpoints;
    }
    }

  •        }).finally([sem] {
    
  •            sem->signal();
    
  •        });
    
  •        sem->wait(std::chrono::seconds(10)).handle_exception([](auto ep) {
    
  •            logger.info("Fail to replicate gossip data to other nodes: {}", ep);
         }).get();
     }
    

        Gleb.

@tgrabiec
Copy link
Contributor

2015-09-28 13:55 GMT+02:00 Asias He notifications@github.com:

I added debug code to check if replicate data to other cores will block:

diff --git a/gms/gossiper.cc b/gms/gossiper.cc
index 012a19a..bbb569a 100644
--- a/gms/gossiper.cc
+++ b/gms/gossiper.cc
@@ -533,6 +533,8 @@ void gossiper::run() {
do_status_check();
}

  •    logger.info("After do_status_check");
    
    • //
      // Gossiper task runs only on CPU0:
      //
      @@ -552,8 +554,10 @@ void gossiper::run() {
      _shadow_live_endpoints = _live_endpoints;
      }
  •        auto sem = make_shared<semaphore>(0);
    
  •        logger.info("Before invoke_on_all");
         _the_gossiper.invoke_on_all([this, endpoint_map_changed,
    
  •            live_endpoint_changed] (gossiper& local_gossiper) {
    
  •            live_endpoint_changed, sem] (gossiper& local_gossiper) {
             // Don't copy gossiper(CPU0) maps into themselves!
             if (engine().cpu_id() != 0) {
                 if (endpoint_map_changed) {
    

    @@ -564,8 +568,14 @@ void gossiper::run() {
    local_gossiper._live_endpoints = _shadow_live_endpoints;
    }
    }

  •        }).finally([sem] {
    
  •            sem->signal();
    
  •        });
    
  •        sem->wait(std::chrono::seconds(10)).handle_exception([](auto ep) {
    
  •            logger.info("Fail to replicate gossip data to other nodes: {}", ep);
         }).get();
     }
    
  •    logger.info("After invoke_on_all");
    

    }).then_wrapped([this](auto&& f) {
    try {
    f.get();
    @@ -579,6 +589,8 @@ void gossiper::run() {
    logger.debug("ep={}, eps={}", x.first, x.second);
    }
    }

  • }).finally([this] {

  •    logger.info("=== Gossip SET TIMER");
     _scheduled_gossip_task.arm(INTERVAL);
    

    });
    }

Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]: [shard 5] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-160-Data.db]. 30050983 bytes to 9092879 (~30
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]: [shard 22] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-177-Data.db]. 30050140 bytes to 9087259 (~3
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]: [shard 2] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-157-Data.db]. 30080769 bytes to 9076019 (~30
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]: [shard 30] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-185-Data.db]. 30045925 bytes to 9083887 (~3
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]: [shard 0] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-155-Data.db]. 30045082 bytes to 9068151 (~30
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]: [shard 7] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-162-Data.db]. 30065314 bytes to 9098218 (~30
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]: [shard 12] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-167-Data.db]. 30149614 bytes to 9072366 (~3
Sep 28 10:38:12 ip-172-31-33-155 scylla[8435]: [shard 10] compaction - Compacting [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-41-Data.db:level=0, /data/data/kei4Ohfe/standard1-b42bc7
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]: [shard 17] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-172-Data.db]. 30081893 bytes to 9068151 (~3
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]: [shard 15] compaction - Compacting [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-46-Data.db:level=0, /data/data/kei4Ohfe/standard1-b42bc7
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - After do_status_check
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - Before invoke_on_all
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - After invoke_on_all
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - === Gossip SET TIMER
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]: [shard 1] compaction - Compacting [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-32-Data.db:level=0, /data/data/kei4Ohfe/standard1-b42bc7d
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]: [shard 24] compaction - Compacting [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-55-Data.db:level=0, /data/data/kei4Ohfe/standard1-b42bc7
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]: [shard 26] compaction - Compacting [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-57-Data.db:level=0, /data/data/kei4Ohfe/standard1-b42bc7
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]: [shard 28] compaction - Compacting [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-59-Data.db:level=0, /data/data/kei4Ohfe/standard1-b42bc7
Sep 28 10:38:13 ip-172-31-33-155 scylla[8435]: [shard 13] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-168-Data.db]. 30102406 bytes to 9070118 (~3
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]: [shard 6] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-161-Data.db]. 30110555 bytes to 9051853 (~30
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]: [shard 29] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-184-Data.db]. 30116456 bytes to 9054382 (~3
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]: [shard 4] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-159-Data.db]. 30107183 bytes to 9056349 (~30
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - After do_status_check
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - Before invoke_on_all
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]: [shard 25] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-180-Data.db]. 30145680 bytes to 9055787 (~3
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]: [shard 16] compaction - Compacting [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-47-Data.db:level=0, /data/data/kei4Ohfe/standard1-b42bc7
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]: [shard 11] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-166-Data.db]. 30109993 bytes to 9042861 (~3
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]: [shard 3] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-158-Data.db]. 30131630 bytes to 9052134 (~30
Sep 28 10:38:14 ip-172-31-33-155 scylla[8435]: [shard 10] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-165-Data.db]. 30162259 bytes to 9048481 (~2
Sep 28 10:38:15 ip-172-31-33-155 scylla[8435]: [shard 1] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-156-Data.db]. 30108588 bytes to 9035274 (~30
Sep 28 10:38:15 ip-172-31-33-155 scylla[8435]: [shard 24] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-179-Data.db]. 30108026 bytes to 9031340 (~2
Sep 28 10:38:15 ip-172-31-33-155 scylla[8435]: [shard 15] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-170-Data.db]. 30111960 bytes to 9028249 (~2
Sep 28 10:38:15 ip-172-31-33-155 scylla[8435]: [shard 26] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-181-Data.db]. 30051264 bytes to 9015604 (~3
Sep 28 10:38:15 ip-172-31-33-155 scylla[8435]: [shard 28] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-183-Data.db]. 30067843 bytes to 9012232 (~2
Sep 28 10:38:15 ip-172-31-33-155 scylla[8435]: [shard 16] compaction - Compacted 4 sstables to [/data/data/kei4Ohfe/standard1-b42bc7d065cc11e59d7b000000000009/kei4Ohfe-standard1-ka-171-Data.db]. 30017544 bytes to 8991157 (~2
Sep 28 10:38:24 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - Fail to replicate gossip data to other nodes: semaphore_timed_out (Semaphore timedout)
Sep 28 10:38:24 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - After invoke_on_all
Sep 28 10:38:24 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - === Gossip SET TIMER
Sep 28 10:38:25 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - After do_status_check
Sep 28 10:38:25 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - Before invoke_on_all
Sep 28 10:38:35 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - Fail to replicate gossip data to other nodes: semaphore_timed_out (Semaphore timedout)
Sep 28 10:38:35 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - After invoke_on_all
Sep 28 10:38:35 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - === Gossip SET TIMER
Sep 28 10:38:36 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - After do_status_check
Sep 28 10:38:36 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - Before invoke_on_all
Sep 28 10:38:46 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - Fail to replicate gossip data to other nodes: semaphore_timed_out (Semaphore timedout)
Sep 28 10:38:46 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - After invoke_on_all
Sep 28 10:38:46 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - === Gossip SET TIMER
Sep 28 10:38:47 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - After do_status_check
Sep 28 10:38:47 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - Before invoke_on_all
Sep 28 10:38:57 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - Fail to replicate gossip data to other nodes: semaphore_timed_out (Semaphore timedout)
Sep 28 10:38:57 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - After invoke_on_all
Sep 28 10:38:57 ip-172-31-33-155 scylla[8435]: [shard 0] gossip - === Gossip SET TIMER

[fedora@ip-172-31-37-234 c-s]$ sh run.live.sh
===================check 172.31.33.155
["172.31.33.155","172.31.33.156","172.31.33.157"]640
===================check 172.31.33.156
["172.31.33.155","172.31.33.156","172.31.33.157"]4262
===================check 172.31.33.157
["172.31.33.155","172.31.33.156","172.31.33.157"]4282
[fedora@ip-172-31-37-234 c-s]$
[fedora@ip-172-31-37-234 c-s]$ sleep 5
[fedora@ip-172-31-37-234 c-s]$ sh run.live.sh
===================check 172.31.33.155
["172.31.33.155","172.31.33.156","172.31.33.157"]641
===================check 172.31.33.156
["172.31.33.155","172.31.33.156","172.31.33.157"]4275
===================check 172.31.33.157
["172.31.33.155","172.31.33.156","172.31.33.157"]4296

26.83% [kernel] [k] pvclock_clocksource_read
26.63% scylla [.] futurize<future<> >::apply<do_flush_directory(basic_sstring<char, unsigned i

If this is accurate location, it looks suspicious. I can see that
do_flush_directory() is called only from do_recursive_touch_directory(),
which is called from recursive_touch_directory(), which is called during
shard startup. Could it be that do_recursive_touch_directory() entered an
infinite recursion?

nt, 15u>)::{lambda(file)#1}, file>
9.66% scylla [.] std::deque<smp_message_queue::work_item*, std::allocator<smp_message_queue::work_item*> >::_M_erase
6.89% scylla [.] _GLOBAL__sub_I__ZN11cql3_parser19CqlParserTokenNamesE
6.83% scylla [.] reactor::stop
3.19% scylla [.] future<> parallel_for_each<boost::range_detail::integer_iterator, seastar::shardedcql3::query_processor::stop()::{lambda(unsigned int)#1}>(boost::range_detail::integer_iterat
2.50% [kernel] [k] system_call_after_swapgs
1.67% scylla [.] reactor::reactor
1.48% [kernel] [k] __fget
1.34% [kernel] [k] _raw_spin_unlock_irqrestore
1.26% libc-2.21.so [.] __libc_enable_asynccancel
1.17% [vdso] [.] __vdso_clock_gettime
0.74% scylla [.] smp_message_queue::async_work_itemreactor::exit(int)::{lambda()#1}::process
0.72% libc-2.21.so [.] 0x0000000000103193
0.64% [kernel] [k] copy_user_enhanced_fast_string
0.58% [kernel] [k] sys_clock_gettime
0.51% libc-2.21.so [.] __libc_disable_asynccancel
0.47% [kernel] [k] fput
0.47% [kernel] [k] sys_epoll_wait
0.45% [kernel] [k] __fget_light
0.44% [kernel] [k] __getnstimeofday64
0.32% libaio.so.1.0.1 [.] io_getevents
0.31% [kernel] [k] ep_poll
0.30% scylla [.] sstables::sstable::prepare_write_components
0.27% scylla [.] validate
0.27% scylla [.] __static_initialization_and_destruction_0
0.24% libstdc++.so.6.0.21 [.] std::chrono::_V2::system_clock::now
0.22% libc-2.21.so [.] __clock_gettime
0.21% scylla [.] __static_initialization_and_destruction_0
0.19% scylla [.] continuation<future<> future<>::finally<auto do_with<file, do_flush_directory(basic_sstring<char, unsigned int, 15u>)::{lambda(file)#1}::operator()(file) const::{lambda(file&)#1}>(file&&, do
0.17% scylla [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN7reactor11rename_fileE13basic_sstringIcjLj15EES7_EUlS2_E0_S0_IJEEEET0_OT_EUlSC_E_JS2_EED2Ev
0.16% scylla [.] _GLOBAL__sub_I__ZN3api18set_stream_managerERNS_12http_contextERN5httpd6routesE
0.14% scylla [.] ZN8futurizeI6futureIJ16temporary_bufferIcEEEE5applyIZNS3_7finallyIZ7do_withI12input_streamIcEZN8sstables7sstable9data_readEmmEUlRT_E_EDaOSC_OT0_EUlvE_EES3_SF_EUlS3_E_JS3_EEES3_SF_DpOT0.isr
0.14% [kernel] [k] _copy_to_user
0.14% [kernel] [k] _raw_spin_lock_irqsave
0.13% scylla [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN15posix_file_impl5flushEvEUlS2_E0_S0_IJEEEET0_OT_EUlSA_E_JS2_EED0Ev
0.12% [kernel] [k] xen_clocksource_get_cycles
0.12% [kernel] [k] posix_clock_realtime_get
0.10% scylla [.] __static_initialization_and_destruction_0
0.09% scylla [.] continuation<future<> future<>::finally<auto do_with<file, do_flush_directory(basic_sstring<char, unsigned int, 15u>)::{lambda(file)#1}::operator()(file) const::{lambda(file&)#1}>(file&&, do
0.09% scylla [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN18blockdev_file_impl7discardEmmEUlS2_E0_S0_IJEEEET0_OT_EUlSA_E_JS2_EED0Ev
0.09% scylla [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN7reactor14make_directoryE13basic_sstringIcjLj15EEEUlS2_E0_S0_IJEEEET0_OT_EUlSC_E_JS2_EED0Ev
0.08% [kernel] [k] __fdget
0.07% scylla [.] futurestd::tuple<future<, future<> > >::then<future<std::tuple<future<>, future<> > >::discard_result()::{lambda(std::tuple<future<>, future<> >&&)#1}, future<> >
0.07% scylla [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN15posix_file_impl5flushEvEUlS2_E0_S0_IJEEEET0_OT_EUlSA_E_JS2_EED2Ev
0.07% [kernel] [k] system_call_fastpath
0.07% [kernel] [k] getnstimeofday64
0.06% scylla [.] boost::exception_detail::clone_implboost::exception_detail::error_info_injector<boost::lock_error >::~clone_impl
0.06% libc-2.21.so [.] epoll_wait
0.05% libc-2.21.so [.] 0x0000000000103197


Reply to this email directly or view it on GitHub
#403 (comment).

@asias
Copy link
Contributor

asias commented Sep 28, 2015

I did see compaction in the log around the replication timeout. @raphaelsc Can you take a look at do_recursive_touch_directory?

@gleb-cloudius
Copy link
Contributor Author

On Mon, Sep 28, 2015 at 06:13:20AM -0700, Tomasz Grabiec wrote:

26.83% [kernel] [k] pvclock_clocksource_read
26.63% scylla [.] futurize<future<> >::apply<do_flush_directory(basic_sstring<char, unsigned i

If this is accurate location, it looks suspicious. I can see that
do_flush_directory() is called only from do_recursive_touch_directory(),
which is called from recursive_touch_directory(), which is called during
shard startup. Could it be that do_recursive_touch_directory() entered an
infinite recursion?

I wouldn't put to much trust into symbol names here since there are
reactor::stop at 6.83% which is definitely nonsense.

nt, 15u>)::{lambda(file)#1}, file>
9.66% scylla [.] std::deque<smp_message_queue::work_item*, std::allocator<smp_message_queue::work_item*> >::_M_erase
6.89% scylla [.] _GLOBAL__sub_I__ZN11cql3_parser19CqlParserTokenNamesE
6.83% scylla [.] reactor::stop

        Gleb.

@asias
Copy link
Contributor

asias commented Sep 29, 2015

 perf top

  28.32%  scylla               [.] futurize<future<> >::apply<future<> future<>::finally<futurize<std::result_of<reactor::stop()::{lambda()#1}::operator()() const::{lambda()#1} ()>::type>::type smp::submit_to<reactor::stop():
  25.05%  [kernel]             [k] pvclock_clocksource_read
  10.27%  scylla               [.] std::deque<smp_message_queue::work_item*, std::allocator<smp_message_queue::work_item*> >::_M_push_back_aux<smp_message_queue::work_item* const&>
   8.32%  scylla               [.] reactor::stop()::{lambda()#1}::operator()
   6.13%  scylla               [.] _GLOBAL__sub_I__ZN11cql3_parser19CqlParserTokenNamesE
   3.21%  scylla               [.] future<> parallel_for_each<boost::range_detail::integer_iterator<unsigned int>, seastar::sharded<cql3::query_processor>::stop()::{lambda(unsigned int)#1}>(boost::range_detail::integer_iterat
   2.44%  [kernel]             [k] system_call_after_swapgs
   1.41%  scylla               [.] _ZN12continuationIZN6futureIJEE4thenIZ28do_recursive_touch_directory13basic_sstringIcjLj15EES4_EUlvE_S1_EET0_OT_EUlS8_E_JEE3runEv
   1.39%  [kernel]             [k] __fget
   1.29%  [kernel]             [k] _raw_spin_unlock_irqrestore
   1.17%  [vdso]               [.] __vdso_clock_gettime
   1.14%  libc-2.21.so         [.] __libc_enable_asynccancel
   0.69%  libc-2.21.so         [.] 0x0000000000103193
   0.61%  [kernel]             [k] copy_user_enhanced_fast_string
   0.59%  [kernel]             [k] sys_clock_gettime
   0.55%  [kernel]             [k] __fget_light
   0.48%  libc-2.21.so         [.] __libc_disable_asynccancel
   0.47%  [kernel]             [k] __getnstimeofday64
   0.47%  [kernel]             [k] fput
   0.45%  [kernel]             [k] sys_epoll_wait
   0.34%  libaio.so.1.0.1      [.] io_getevents
   0.32%  scylla               [.] sstables::sstable::prepare_write_components
   0.30%  [kernel]             [k] ep_poll
   0.28%  scylla               [.] validate
   0.23%  scylla               [.] reactor::~reactor
   0.22%  scylla               [.] __static_initialization_and_destruction_0
   0.21%  libstdc++.so.6.0.21  [.] std::chrono::_V2::system_clock::now
   0.19%  scylla               [.] std::fill
   0.19%  scylla               [.] continuation<future<> future<>::finally<auto do_with<file, do_flush_directory(basic_sstring<char, unsigned int, 15u>)::{lambda(file)#1}::operator()(file) const::{lambda(file&)#1}>(file&&, do
   0.19%  scylla               [.] __static_initialization_and_destruction_0
   0.19%  libc-2.21.so         [.] __clock_gettime
   0.16%  scylla               [.] _ZN8futurizeI6futureIJ16temporary_bufferIcEEEE5applyIZNS3_7finallyIZ7do_withI12input_streamIcEZN8sstables7sstable9data_readEmmEUlRT_E_EDaOSC_OT0_EUlvE_EES3_SF_EUlS3_E_JS3_EEES3_SF_DpOT0_.isr
   0.16%  scylla               [.] _GLOBAL__sub_I__ZN3api18set_stream_managerERNS_12http_contextERN5httpd6routesE
   0.14%  [kernel]             [k] _copy_to_user
   0.13%  scylla               [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN7reactor11rename_fileE13basic_sstringIcjLj15EES7_EUlS2_E0_S0_IJEEEET0_OT_EUlSC_E_JS2_EED2Ev
   0.13%  [kernel]             [k] _raw_spin_lock_irqsave
   0.12%  [kernel]             [k] xen_clocksource_get_cycles
   0.12%  [kernel]             [k] posix_clock_realtime_get
   0.10%  scylla               [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN15posix_file_impl5flushEvEUlS2_E0_S0_IJEEEET0_OT_EUlSA_E_JS2_EED0Ev
   0.09%  scylla               [.] continuation<future<> future<>::finally<auto do_with<file, do_flush_directory(basic_sstring<char, unsigned int, 15u>)::{lambda(file)#1}::operator()(file) const::{lambda(file&)#1}>(file&&, do
   0.09%  scylla               [.] __static_initialization_and_destruction_0
   0.09%  scylla               [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN18blockdev_file_impl7discardEmmEUlS2_E0_S0_IJEEEET0_OT_EUlSA_E_JS2_EED0Ev
   0.09%  scylla               [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN7reactor14make_directoryE13basic_sstringIcjLj15EEEUlS2_E0_S0_IJEEEET0_OT_EUlSC_E_JS2_EED0Ev
   0.08%  scylla               [.] prefetch<2ul, 3>
   0.08%  libc-2.21.so         [.] epoll_wait
   0.07%  [kernel]             [k] __fdget
   0.07%  [kernel]             [k] system_call_fastpath
   0.06%  [kernel]             [k] getnstimeofday64
   0.05%  scylla               [.] futurize<future<std::tuple<future<> > > >::apply<future<std::tuple<future<>> > when_all<>(future<>&&)::{lambda(future<>&&)#1}::operator()(future<>&&)::{lambda(future<std::tuple<> >&&)#1}, st
   0.05%  scylla               [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN15posix_file_impl5flushEvEUlS2_E0_S0_IJEEEET0_OT_EUlSA_E_JS2_EED2Ev
   0.05%  libc-2.21.so         [.] 0x0000000000103197

perf top -C 6

  31.65%  scylla               [.] futurize<future<> >::apply<future<> future<>::finally<futurize<std::result_of<reactor::stop()::{lambda()#1}::operator()() const::{lambda()#1} ()>::type>::type smp::submit_to<reactor::stop():
  26.13%  [kernel]             [k] pvclock_clocksource_read
   8.60%  scylla               [.] std::deque<smp_message_queue::work_item*, std::allocator<smp_message_queue::work_item*> >::_M_push_back_aux<smp_message_queue::work_item* const&>
   6.92%  scylla               [.] reactor::stop()::{lambda()#1}::operator()
   6.67%  scylla               [.] _GLOBAL__sub_I__ZN11cql3_parser19CqlParserTokenNamesE
   2.66%  [kernel]             [k] system_call_after_swapgs
   1.76%  scylla               [.] _ZN12continuationIZN6futureIJEE4thenIZ28do_recursive_touch_directory13basic_sstringIcjLj15EES4_EUlvE_S1_EET0_OT_EUlS8_E_JEE3runEv
   1.36%  [kernel]             [k] __fget
   1.32%  [kernel]             [k] _raw_spin_unlock_irqrestore
   1.13%  [vdso]               [.] __vdso_clock_gettime
   1.13%  libc-2.21.so         [.] __libc_enable_asynccancel
   1.10%  [kernel]             [k] __fget_light
   0.67%  libc-2.21.so         [.] 0x0000000000103193
   0.66%  [kernel]             [k] copy_user_enhanced_fast_string
   0.62%  [kernel]             [k] sys_clock_gettime
   0.57%  [kernel]             [k] sys_epoll_wait
   0.54%  [kernel]             [k] fput
   0.50%  libc-2.21.so         [.] __libc_disable_asynccancel
   0.47%  [kernel]             [k] xen_clocksource_get_cycles
   0.37%  [kernel]             [k] __getnstimeofday64
   0.35%  scylla               [.] __static_initialization_and_destruction_0
   0.32%  scylla               [.] sstables::sstable::prepare_write_components
   0.29%  scylla               [.] __static_initialization_and_destruction_0
   0.27%  scylla               [.] validate
   0.25%  [kernel]             [k] ep_poll
   0.25%  scylla               [.] continuation<future<> future<>::finally<auto do_with<file, do_flush_directory(basic_sstring<char, unsigned int, 15u>)::{lambda(file)#1}::operator()(file) const::{lambda(file&)#1}>(file&&, do
   0.24%  scylla               [.] reactor::~reactor


perf top -C 5

  32.51%  scylla               [.] futurize<future<> >::apply<future<> future<>::finally<futurize<std::result_of<reactor::stop()::{lambda()#1}::operator()() const::{lambda()#1} ()>::type>::type smp::submit_to<reactor::stop():
  22.95%  [kernel]             [k] pvclock_clocksource_read
  10.55%  scylla               [.] reactor::stop()::{lambda()#1}::operator()
   7.87%  scylla               [.] std::deque<smp_message_queue::work_item*, std::allocator<smp_message_queue::work_item*> >::_M_push_back_aux<smp_message_queue::work_item* const&>
   6.18%  scylla               [.] _GLOBAL__sub_I__ZN11cql3_parser19CqlParserTokenNamesE
   2.04%  [kernel]             [k] system_call_after_swapgs
   1.91%  [kernel]             [k] __fget
   1.60%  scylla               [.] _ZN12continuationIZN6futureIJEE4thenIZ28do_recursive_touch_directory13basic_sstringIcjLj15EES4_EUlvE_S1_EET0_OT_EUlS8_E_JEE3runEv
   1.40%  [kernel]             [k] _raw_spin_unlock_irqrestore
   1.26%  [vdso]               [.] __vdso_clock_gettime
   1.08%  libc-2.21.so         [.] __libc_enable_asynccancel
   1.03%  libaio.so.1.0.1      [.] io_getevents
   1.00%  [kernel]             [k] sys_epoll_wait
   0.77%  libc-2.21.so         [.] 0x0000000000103193
   0.67%  [kernel]             [k] copy_user_enhanced_fast_string
   0.63%  [kernel]             [k] sys_clock_gettime
   0.56%  scylla               [.] _ZN8futurizeI6futureIJ16temporary_bufferIcEEEE5applyIZNS3_7finallyIZ7do_withI12input_streamIcEZN8sstables7sstable9data_readEmmEUlRT_E_EDaOSC_OT0_EUlvE_EES3_SF_EUlS3_E_JS3_EEES3_SF_DpOT0_.isr
   0.51%  libc-2.21.so         [.] __libc_disable_asynccancel
   0.49%  [kernel]             [k] fput
   0.45%  [kernel]             [k] __getnstimeofday64
   0.37%  scylla               [.] sstables::sstable::prepare_write_components
   0.32%  [kernel]             [k] ep_poll
   0.30%  scylla               [.] validate
   0.29%  scylla               [.] reactor::~reactor
   0.26%  [kernel]             [k] __fget_light
   0.21%  scylla               [.] std::fill
   0.21%  scylla               [.] __static_initialization_and_destruction_0
   0.20%  [kernel]             [k] _copy_to_user
   0.18%  libstdc++.so.6.0.21  [.] std::chrono::_V2::system_clock::now
   0.17%  scylla               [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN15posix_file_impl5flushEvEUlS2_E0_S0_IJEEEET0_OT_EUlSA_E_JS2_EED0Ev
   0.17%  scylla               [.] continuation<future<> future<>::finally<auto do_with<file, do_flush_directory(basic_sstring<char, unsigned int, 15u>)::{lambda(file)#1}::operator()(file) const::{lambda(file&)#1}>(file&&, do
   0.15%  scylla               [.] _GLOBAL__sub_I__ZN3api18set_stream_managerERNS_12http_contextERN5httpd6routesE
   0.14%  [kernel]             [k] _raw_spin_lock_irqsave
   0.13%  scylla               [.] __static_initialization_and_destruction_0
   0.12%  scylla               [.] __static_initialization_and_destruction_0
   0.10%  scylla               [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN7reactor14make_directoryE13basic_sstringIcjLj15EEEUlS2_E0_S0_IJEEEET0_OT_EUlSC_E_JS2_EED0Ev
   0.09%  scylla               [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN7reactor11rename_fileE13basic_sstringIcjLj15EES7_EUlS2_E0_S0_IJEEEET0_OT_EUlSC_E_JS2_EED2Ev
   0.08%  [kernel]             [k] xen_clocksource_get_cycles
   0.08%  scylla               [.] _ZN12continuationIZN6futureIJ14syscall_resultIiEEE4thenIZN18blockdev_file_impl7discardEmmEUlS2_E0_S0_IJEEEET0_OT_EUlSA_E_JS2_EED0Ev
   0.08%  [kernel]             [k] posix_clock_realtime_get
   0.08%  libc-2.21.so         [.] __clock_gettime
   0.07%  [kernel]             [k] __fdget
   0.06%  [kernel]             [k] getnstimeofday64
   0.06%  scylla               [.] futurize<future<std::tuple<future<> > > >::apply<future<std::tuple<future<>> > when_all<>(future<>&&)::{lambda(future<>&&)#1}::operator()(future<>&&)::{lambda(future<std::tuple<> >&&)#1}, st
   0.06%  [kernel]             [k] system_call_fastpath

sudo perf record --call-graph dwarf -p `pgrep scylla` && sudo perf report

-   65.34%     1.69%  scylla   scylla               [.] reactor::run                                                                                                                                                            ▒
   - reactor::run                                                                                                                                                                                                               ▒
      + 96.67% smp::configure(boost::program_options::variables_map)::{lambda()#1}::operator()                                                                                                                                  ▒
      + 3.33% app_template::run_deprecated                                                                                                                                                                                      ▒
-   63.16%     0.00%  scylla   libc-2.21.so         [.] __clone                                                                                                                                                                 ▒
     __clone                                                                                                                                                                                                                    ◆
-   63.16%     0.00%  scylla   libpthread-2.21.so   [.] start_thread                                                                                                                                                            ▒
     start_thread                                                                                                                                                                                                               ▒
     __clone                                                                                                                                                                                                                    ▒
-   63.16%     0.00%  scylla   scylla               [.] posix_thread::start_routine                                                                                                                                             ▒
     posix_thread::start_routine                                                                                                                                                                                                ▒
     start_thread                                                                                                                                                                                                               ▒
     __clone                                                                                                                                                                                                                    ▒
-   63.16%     0.00%  scylla   scylla               [.] smp::configure(boost::program_options::variables_map)::{lambda()#1}::operator()                                                                                         ▒
     smp::configure(boost::program_options::variables_map)::{lambda()#1}::operator()                                                                                                                                            ▒
     posix_thread::start_routine                                                                                                                                                                                                ▒
     start_thread                                                                                                                                                                                                               ▒
     __clone                                                                                                                                                                                                                    ▒
-   52.75%     9.20%  scylla   scylla               [.] smp::poll_queues                                                                                                                                                        ▒
     smp::poll_queues                                                                                                                                                                                                           ▒
   + reactor::run                                                                                                                                                                                                               ▒
-   32.39%     0.06%  scylla   [kernel.kallsyms]    [k] system_call_fastpath                                                                                                                                                    ▒
   - system_call_fastpath                                                                                                                                                                                                       ▒
      + 0x7f409d07c193                                                                                                                                                                                                          ▒
-   28.92%    28.92%  scylla   scylla               [.] smp_message_queue::move_pending                                                                                                                                         ▒
     smp_message_queue::move_pending                                                                                                                                                                                            ▒
     smp::poll_queues                                                                                                                                                                                                           ▒
   + reactor::run                                                                                                                                                                                                               ▒
-   26.99%     0.60%  scylla   [kernel.kallsyms]    [k] sys_clock_gettime                                                                                                                                                       ▒
   - sys_clock_gettime                                                                                                                                                                                                          ▒
        system_call_fastpath                                                                                                                                                                                                    ▒
-   26.43%     0.11%  scylla   [kernel.kallsyms]    [k] posix_clock_realtime_get                                                                                                                                                ▒
   - posix_clock_realtime_get                                                                                                                                                                                                   ▒
      + sys_clock_gettime                                                                                                                                                                                                       ▒
+   26.33%     0.06%  scylla   [kernel.kallsyms]    [k] getnstimeofday64                                                                                                                                                        ▒
+   26.26%     0.43%  scylla   [kernel.kallsyms]    [k] __getnstimeofday64                                                                                                                                                      ▒
+   25.84%     0.12%  scylla   [kernel.kallsyms]    [k] xen_clocksource_get_cycles                                                                                                                                              ▒
+   25.75%    25.75%  scylla   [kernel.kallsyms]    [k] pvclock_clocksource_read                                                                                                                                                ▒
+    7.54%     0.22%  scylla   scylla               [.] reactor_backend_epoll::wait_and_process                                                                                                                                 ▒
-    6.34%     6.34%  scylla   scylla               [.] smp_message_queue::flush_response_batch                                                                                                                                 ▒
     smp_message_queue::flush_response_batch                                                                                                                                                                                    ▒
     smp::poll_queues                                                                                                                                                                                                           ▒
   + reactor::run                                                                                                                                                                                                               ▒
+    5.47%     0.70%  scylla   libc-2.21.so         [.] 0x0000000000103193                                                                                                                                                      ▒
+    4.15%     4.15%  scylla   scylla               [.] smp_message_queue::process_queue<2ul, smp_message_queue::process_incoming()::{lambda(smp_message_queue::work_item*)#1}>                                                 ▒
+    4.12%     4.12%  scylla   scylla               [.] smp_message_queue::process_queue<4ul, smp_message_queue::process_completions()::{lambda(smp_message_queue::work_item*)#1}>                                              ▒
+    4.10%     0.45%  scylla   [kernel.kallsyms]    [k] sys_epoll_wait                                                                                                                                                          ▒
+    3.18%     0.00%  scylla   libc-2.21.so         [.] __start_context                                                                                                                                                         ▒
+    3.18%     0.00%  scylla   scylla               [.] 0x00000000000eec02                                                                                                                                                      ▒
+    3.18%     0.00%  scylla   scylla               [.] seastar::thread_context::main                                                                                                                                           ▒
+    3.18%     0.00%  scylla   scylla               [.] std::_Function_handler<void (), row_cache::update(memtable&, std::function<partition_presence_checker_result (partition_key const&)>)::{lambda()#1}>::_M_invoke         ▒

system log: CPU 0 to CPU 30 (--cpuset 1-31 --smp 31). CPU 5 is not responding.

Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 0] gossip - After do_status_check
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 0] gossip - Before invoke_on_all
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 0] gossip - CPU0 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 8] gossip - CPU8 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 10] gossip - CPU10 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 28] gossip - CPU28 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 16] gossip - CPU16 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 2] gossip - CPU2 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 15] gossip - CPU15 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 4] gossip - CPU4 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 1] gossip - CPU1 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 3] gossip - CPU3 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 6] gossip - CPU6 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 12] gossip - CPU12 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 11] gossip - CPU11 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 13] gossip - CPU13 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 22] gossip - CPU22 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 29] gossip - CPU29 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 30] gossip - CPU30 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 14] gossip - CPU14 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 7] gossip - CPU7 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 17] gossip - CPU17 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 18] gossip - CPU18 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 20] gossip - CPU20 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 19] gossip - CPU19 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 9] gossip - CPU9 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 21] gossip - CPU21 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 24] gossip - CPU24 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 23] gossip - CPU23 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 25] gossip - CPU25 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 26] gossip - CPU26 replicated
Sep 29 00:56:16 ip-172-31-44-93 scylla[1694]:  [shard 27] gossip - CPU27 replicated
Sep 29 00:56:26 ip-172-31-44-93 scylla[1694]:  [shard 0] gossip - Fail to replicate gossip data to other nodes: semaphore_timed_out (Semaphore timedout)
Sep 29 00:56:26 ip-172-31-44-93 scylla[1694]:  [shard 0] gossip - After invoke_on_all
Sep 29 00:56:26 ip-172-31-44-93 scylla[1694]:  [shard 0] gossip - === Gossip SET TIMER

@gleb-cloudius
Copy link
Contributor Author

On Mon, Sep 28, 2015 at 06:16:20PM -0700, Asias He wrote:

 perf top

  28.32%  scylla               [.] futurize<future<> >::apply<future<> future<>::finally<futurize<std::result_of<reactor::stop()::{lambda()#1}::operator()() const::{lambda()#1} ()>::type>::type smp::submit_to<reactor::stop():
  25.05%  [kernel]             [k] pvclock_clocksource_read
  10.27%  scylla               [.] std::deque<smp_message_queue::work_item*, std::allocator<smp_message_queue::work_item*> >::_M_push_back_aux<smp_message_queue::work_item* const&>
   8.32%  scylla               [.] reactor::stop()::{lambda()#1}::operator()
Why there are reactor::stop() related function is the perf? Have you
tried to stop scylla?

        Gleb.

@asias
Copy link
Contributor

asias commented Sep 29, 2015

On Tue, Sep 29, 2015 at 3:49 PM, Gleb Natapov notifications@github.com
wrote:

On Mon, Sep 28, 2015 at 06:16:20PM -0700, Asias He wrote:

perf top

28.32% scylla [.] futurize<future<> >::apply<future<>
future<>::finally<futurize<std::result_of<reactor::stop()::{lambda()#1}::operator()()
const::{lambda()#1} ()>::type>::type smp::submit_to<reactor::stop():
25.05% [kernel] [k] pvclock_clocksource_read
10.27% scylla [.] std::deque<smp_message_queue::work_item*,
std::allocator<smp_message_queue::work_item*>
::_M_push_back_aux<smp_message_queue::work_item* const&>
8.32% scylla [.] reactor::stop()::{lambda()#1}::operator()
Why there are reactor::stop() related function is the perf? Have you
tried to stop scylla?

No, I did not. It looks like the symbol is bogus. BTW, I used unstripped
binary this time.

Gleb.


Reply to this email directly or view it on GitHub
#403 (comment).

@gleb-cloudius
Copy link
Contributor Author

On Tue, Sep 29, 2015 at 12:51:57AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 3:49 PM, Gleb Natapov notifications@github.com
wrote:

On Mon, Sep 28, 2015 at 06:16:20PM -0700, Asias He wrote:

perf top

28.32% scylla [.] futurize<future<> >::apply<future<>
future<>::finally<futurize<std::result_of<reactor::stop()::{lambda()#1}::operator()()
const::{lambda()#1} ()>::type>::type smp::submit_to<reactor::stop():
25.05% [kernel] [k] pvclock_clocksource_read
10.27% scylla [.] std::deque<smp_message_queue::work_item*,
std::allocator<smp_message_queue::work_item*>
::_M_push_back_aux<smp_message_queue::work_item* const&>
8.32% scylla [.] reactor::stop()::{lambda()#1}::operator()
Why there are reactor::stop() related function is the perf? Have you
tried to stop scylla?

No, I did not. It looks like the symbol is bogus. BTW, I used unstripped
binary this time.

Can you connect to it with gdb and see what CPU5 is busy with?

        Gleb.

@asias
Copy link
Contributor

asias commented Sep 29, 2015

On Tue, Sep 29, 2015 at 4:03 PM, Gleb Natapov notifications@github.com
wrote:

On Tue, Sep 29, 2015 at 12:51:57AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 3:49 PM, Gleb Natapov notifications@github.com
wrote:

On Mon, Sep 28, 2015 at 06:16:20PM -0700, Asias He wrote:

perf top

28.32% scylla [.] futurize<future<> >::apply<future<>

future<>::finally<futurize<std::result_of<reactor::stop()::{lambda()#1}::operator()()
const::{lambda()#1} ()>::type>::type smp::submit_to<reactor::stop():

25.05% [kernel] [k] pvclock_clocksource_read
10.27% scylla [.] std::deque<smp_message_queue::work_item*,
std::allocator<smp_message_queue::work_item*>
::_M_push_back_aux<smp_message_queue::work_item* const&>
8.32% scylla [.] reactor::stop()::{lambda()#1}::operator()
Why there are reactor::stop() related function is the perf? Have you
tried to stop scylla?

No, I did not. It looks like the symbol is bogus. BTW, I used unstripped
binary this time.

Can you connect to it with gdb and see what CPU5 is busy with?

no, i can not. The node is terminated.

Gleb.


Reply to this email directly or view it on GitHub
#403 (comment).

@gleb-cloudius
Copy link
Contributor Author

On Tue, Sep 29, 2015 at 01:15:53AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 4:03 PM, Gleb Natapov notifications@github.com
wrote:

On Tue, Sep 29, 2015 at 12:51:57AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 3:49 PM, Gleb Natapov notifications@github.com
wrote:

On Mon, Sep 28, 2015 at 06:16:20PM -0700, Asias He wrote:

perf top

28.32% scylla [.] futurize<future<> >::apply<future<>

future<>::finally<futurize<std::result_of<reactor::stop()::{lambda()#1}::operator()()
const::{lambda()#1} ()>::type>::type smp::submit_to<reactor::stop():

25.05% [kernel] [k] pvclock_clocksource_read
10.27% scylla [.] std::deque<smp_message_queue::work_item*,
std::allocator<smp_message_queue::work_item*>
::_M_push_back_aux<smp_message_queue::work_item* const&>
8.32% scylla [.] reactor::stop()::{lambda()#1}::operator()
Why there are reactor::stop() related function is the perf? Have you
tried to stop scylla?

No, I did not. It looks like the symbol is bogus. BTW, I used unstripped
binary this time.

Can you connect to it with gdb and see what CPU5 is busy with?

no, i can not. The node is terminated.

Can you reproduce again? It look like cpu is in endless loop somewhere,
so we should easily see it in the debugger.

        Gleb.

@asias
Copy link
Contributor

asias commented Sep 29, 2015

On Tue, Sep 29, 2015 at 4:21 PM, Gleb Natapov notifications@github.com
wrote:

On Tue, Sep 29, 2015 at 01:15:53AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 4:03 PM, Gleb Natapov notifications@github.com
wrote:

On Tue, Sep 29, 2015 at 12:51:57AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 3:49 PM, Gleb Natapov <
notifications@github.com>
wrote:

On Mon, Sep 28, 2015 at 06:16:20PM -0700, Asias He wrote:

perf top

28.32% scylla [.] futurize<future<> >::apply<future<>

future<>::finally<futurize<std::result_of<reactor::stop()::{lambda()#1}::operator()()

const::{lambda()#1} ()>::type>::type
smp::submit_to<reactor::stop():

25.05% [kernel] [k] pvclock_clocksource_read
10.27% scylla [.] std::deque<smp_message_queue::work_item*,
std::allocator<smp_message_queue::work_item*>
::_M_push_back_aux<smp_message_queue::work_item* const&>
8.32% scylla [.] reactor::stop()::{lambda()#1}::operator()
Why there are reactor::stop() related function is the perf? Have
you
tried to stop scylla?

No, I did not. It looks like the symbol is bogus. BTW, I used
unstripped
binary this time.

Can you connect to it with gdb and see what CPU5 is busy with?

no, i can not. The node is terminated.

Can you reproduce again? It look like cpu is in endless loop somewhere,
so we should easily see it in the debugger.

Gleb, can you do it yourself this time, I'm not familiar with the code with
regard to the seastar smp thing.
When you reproduce it you can get whatever you want to check?

Gleb.


Reply to this email directly or view it on GitHub
#403 (comment).

@gleb-cloudius
Copy link
Contributor Author

On Tue, Sep 29, 2015 at 01:32:03AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 4:21 PM, Gleb Natapov notifications@github.com
wrote:

On Tue, Sep 29, 2015 at 01:15:53AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 4:03 PM, Gleb Natapov notifications@github.com
wrote:

On Tue, Sep 29, 2015 at 12:51:57AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 3:49 PM, Gleb Natapov <
notifications@github.com>
wrote:

On Mon, Sep 28, 2015 at 06:16:20PM -0700, Asias He wrote:

perf top

28.32% scylla [.] futurize<future<> >::apply<future<>

future<>::finally<futurize<std::result_of<reactor::stop()::{lambda()#1}::operator()()

const::{lambda()#1} ()>::type>::type
smp::submit_to<reactor::stop():

25.05% [kernel] [k] pvclock_clocksource_read
10.27% scylla [.] std::deque<smp_message_queue::work_item*,
std::allocator<smp_message_queue::work_item*>
::_M_push_back_aux<smp_message_queue::work_item* const&>
8.32% scylla [.] reactor::stop()::{lambda()#1}::operator()
Why there are reactor::stop() related function is the perf? Have
you
tried to stop scylla?

No, I did not. It looks like the symbol is bogus. BTW, I used
unstripped
binary this time.

Can you connect to it with gdb and see what CPU5 is busy with?

no, i can not. The node is terminated.

Can you reproduce again? It look like cpu is in endless loop somewhere,
so we should easily see it in the debugger.

Gleb, can you do it yourself this time, I'm not familiar with the code with
regard to the seastar smp thing.
When you reproduce it you can get whatever you want to check?

I can certainly try. Any specific instructions that may help? Patch that
adds debug output? But there is no need to be familiar with smp thing, just
check what cpu that is not responding is doing, it will probably be in
an endless loop somewhere.

        Gleb.

@asias
Copy link
Contributor

asias commented Sep 29, 2015

On Tue, Sep 29, 2015 at 4:36 PM, Gleb Natapov notifications@github.com
wrote:

On Tue, Sep 29, 2015 at 01:32:03AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 4:21 PM, Gleb Natapov notifications@github.com
wrote:

On Tue, Sep 29, 2015 at 01:15:53AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 4:03 PM, Gleb Natapov <
notifications@github.com>
wrote:

On Tue, Sep 29, 2015 at 12:51:57AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 3:49 PM, Gleb Natapov <
notifications@github.com>
wrote:

On Mon, Sep 28, 2015 at 06:16:20PM -0700, Asias He wrote:

perf top

28.32% scylla [.] futurize<future<> >::apply<future<>

future<>::finally<futurize<std::result_of<reactor::stop()::{lambda()#1}::operator()()

const::{lambda()#1} ()>::type>::type
smp::submit_to<reactor::stop():

25.05% [kernel] [k] pvclock_clocksource_read
10.27% scylla [.] std::deque<smp_message_queue::work_item*,
std::allocator<smp_message_queue::work_item*>
::_M_push_back_aux<smp_message_queue::work_item* const&>
8.32% scylla [.] reactor::stop()::{lambda()#1}::operator()
Why there are reactor::stop() related function is the perf?
Have
you
tried to stop scylla?

No, I did not. It looks like the symbol is bogus. BTW, I used
unstripped
binary this time.

Can you connect to it with gdb and see what CPU5 is busy with?

no, i can not. The node is terminated.

Can you reproduce again? It look like cpu is in endless loop somewhere,
so we should easily see it in the debugger.

Gleb, can you do it yourself this time, I'm not familiar with the code
with
regard to the seastar smp thing.
When you reproduce it you can get whatever you want to check?

I can certainly try. Any specific instructions that may help? Patch that
adds debug output? But there is no need to be familiar with smp thing, just
check what cpu that is not responding is doing, it will probably be in
an endless loop somewhere.

The code is here:

    dev  asias/improve_gossip

.

I think you see this problem originally. Anyway, start 3 scylla nodes using
3 c3.8xlarge instance. start 2 c4.8xlarge instance for c-s.

I will send you my scripts to generate the load.

If it turns out it is not an endless loop in gdb, you might want to check
more stuff.

Gleb.


Reply to this email directly or view it on GitHub
#403 (comment).

@gleb-cloudius
Copy link
Contributor Author

On Tue, Sep 29, 2015 at 01:49:41AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 4:36 PM, Gleb Natapov notifications@github.com
wrote:

On Tue, Sep 29, 2015 at 01:32:03AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 4:21 PM, Gleb Natapov notifications@github.com
wrote:

On Tue, Sep 29, 2015 at 01:15:53AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 4:03 PM, Gleb Natapov <
notifications@github.com>
wrote:

On Tue, Sep 29, 2015 at 12:51:57AM -0700, Asias He wrote:

On Tue, Sep 29, 2015 at 3:49 PM, Gleb Natapov <
notifications@github.com>
wrote:

On Mon, Sep 28, 2015 at 06:16:20PM -0700, Asias He wrote:

perf top

28.32% scylla [.] futurize<future<> >::apply<future<>

future<>::finally<futurize<std::result_of<reactor::stop()::{lambda()#1}::operator()()

const::{lambda()#1} ()>::type>::type
smp::submit_to<reactor::stop():

25.05% [kernel] [k] pvclock_clocksource_read
10.27% scylla [.] std::deque<smp_message_queue::work_item*,
std::allocator<smp_message_queue::work_item*>
::_M_push_back_aux<smp_message_queue::work_item* const&>
8.32% scylla [.] reactor::stop()::{lambda()#1}::operator()
Why there are reactor::stop() related function is the perf?
Have
you
tried to stop scylla?

No, I did not. It looks like the symbol is bogus. BTW, I used
unstripped
binary this time.

Can you connect to it with gdb and see what CPU5 is busy with?

no, i can not. The node is terminated.

Can you reproduce again? It look like cpu is in endless loop somewhere,
so we should easily see it in the debugger.

Gleb, can you do it yourself this time, I'm not familiar with the code
with
regard to the seastar smp thing.
When you reproduce it you can get whatever you want to check?

I can certainly try. Any specific instructions that may help? Patch that
adds debug output? But there is no need to be familiar with smp thing, just
check what cpu that is not responding is doing, it will probably be in
an endless loop somewhere.

The code is here:

    dev  asias/improve_gossip

.

I think you see this problem originally. Anyway, start 3 scylla nodes using
3 c3.8xlarge instance. start 2 c4.8xlarge instance for c-s.

Yes, I saw it only once.

        Gleb.

@gleb-cloudius
Copy link
Contributor Author

Here is the stack of infinity loop:

#0  get_next (n=<synthetic pointer>) at /usr/include/boost/intrusive/detail/list_node.hpp:58
#1  operator++ (this=<synthetic pointer>) at /usr/include/boost/intrusive/detail/list_iterator.hpp:76
#2  boost::heap::binomial_heap<logalloc::segment*, boost::heap::compare<logalloc::segment_occupancy_descending_less_compare>, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::merge_and_clear_nodes (this=this@entry=0x601000193030, rhs=...) at /usr/include/boost/heap/binomial_heap.hpp:730
#3  0x00000000007a0a2d in boost::heap::binomial_heap<logalloc::segment*, boost::heap::compare<logalloc::segment_occupancy_descending_less_compare>, boost::parameter::void_, boost::parameter::void_, boost::parameter::void_>::pop (this=this@entry=0x601000193030) at /usr/include/boost/heap/binomial_heap.hpp:410
#4  0x00000000007a0fb0 in erase (handle=..., this=0x601000193030) at /usr/include/boost/heap/binomial_heap.hpp:581
#5  logalloc::region_impl::free (this=0x601000193010, obj=<optimized out>) at utils/logalloc.cc:819
#6  0x00000000007ad916 in ~managed_bytes (this=0x60100e599ed8, __in_chrg=<optimized out>) at utils/managed_bytes.hh:111
#7  managed_bytes::operator=(managed_bytes&&) (this=this@entry=0x60100e599ed8, o=o@entry=<unknown type in /usr/lib/debug/usr/bin/scylla.debug, CU 0x1bec27, DIE 0x1beccd>)
    at utils/managed_bytes.hh:119
#8  0x00000000007a31e3 in operator= (this=0x60100e599ed8) at ./atomic_cell.hh:265
#9  merge_column(column_definition const&, atomic_cell_or_collection&, atomic_cell_or_collection&&) (def=..., old=..., 
    neww=neww@entry=<unknown type in /usr/lib/debug/usr/bin/scylla.debug, CU 0x1d30283, DIE 0x1d74999>) at mutation_partition.cc:497
#10 0x00000000007a8bf8 in row::apply(column_definition const&, atomic_cell_or_collection&&) (this=this@entry=0x60100e599e90, column=..., 
    value=value@entry=<unknown type in /usr/lib/debug/usr/bin/scylla.debug, CU 0x1d30283, DIE 0x1d74999>) at mutation_partition.cc:525
#11 0x00000000007a91af in operator() (cell=..., id=4, __closure=<optimized out>) at mutation_partition.cc:872
#12 for_each_cell_until<row::merge(const schema&, column_kind, row&&)::<lambda(column_id, atomic_cell_or_collection&)> > (func=<optimized out>, this=0x60100822b140)
    at mutation_partition.hh:200
#13 row::merge(schema const&, column_kind, row&&) (this=0x60100e599e90, s=..., kind=kind@entry=column_kind::regular_column, 
    other=other@entry=<unknown type in /usr/lib/debug/usr/bin/scylla.debug, CU 0x1d30283, DIE 0x1db4896>) at mutation_partition.cc:874
#14 0x00000000007a93e2 in mutation_partition::apply(schema const&, mutation_partition&&) (this=this@entry=0x6010136426b8, s=..., 
    p=p@entry=<unknown type in /usr/lib/debug/usr/bin/scylla.debug, CU 0x1d30283, DIE 0x1db5b56>) at mutation_partition.cc:99
#15 0x0000000000788531 in operator() (__closure=<optimized out>) at row_cache.cc:265
#16 operator()<row_cache::update(memtable&, partition_presence_checker)::<lambda()>::<lambda()>::<lambda()> > (func=<optimized out>, r=..., this=0x601000398530) at utils/logalloc.hh:294
#17 operator() (__closure=<optimized out>) at row_cache.cc:279
#18 with_allocator<row_cache::update(memtable&, partition_presence_checker)::<lambda()>::<lambda()> > (func=<optimized out>, alloc=...) at utils/allocation_strategy.hh:204
#19 operator() (__closure=0x601000617540) at row_cache.cc:295
#20 std::_Function_handler<void(), row_cache::update(memtable&, partition_presence_checker)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /usr/include/c++/5.1.1/functional:1871
#21 0x000000000041a24e in seastar::thread_context::main (this=0x601025025400) at /usr/include/c++/5.1.1/functional:2271
#22 0x00000000004f5802 in seastar::thread_context::s_main (lo=<optimized out>, hi=<optimized out>) at core/thread.cc:130
#23 0x00007f0d08293fa0 in ?? () from /lib64/libc.so.6
#24 0x0000000000000000 in ?? ()

        Gleb.

@slivne slivne assigned tgrabiec and unassigned asias Sep 29, 2015
@slivne slivne added this to the barracuda milestone Oct 1, 2015
@tgrabiec
Copy link
Contributor

tgrabiec commented Oct 6, 2015

Looking at the GDB dump, it looks like some sort of a segment heap corruption. This happens when a segment is freed. The segment descriptor looks sane, I ruled out double-free. I'm not yet sure what could cause the corruption.

I tried to reproduce it on EC2 using 3 scylla machines and 5 cassandra-stress machines (config as Asias mentioned). It survived a 5 hour run of RF=3 CL=ALL, and also a couple of shorter 1-2 hour runs.

Maybe it's related to how c-s clients are started, @asias @gleb-cloudius , how exactly do you start the load? Was it on a fresh database, or an already populated one?

@gleb-cloudius
Copy link
Contributor Author

On Tue, Oct 06, 2015 at 12:45:04AM -0700, Tomasz Grabiec wrote:

Looking at the GDB dump, it looks like some sort of a segment heap corruption. This happens when a segment is freed. The segment descriptor looks sane, I ruled out double-free. I'm not yet sure what could cause the corruption.

I tried to reproduce it on EC2 using 3 scylla machines and 5 cassandra-stress machines (config as Asias mentioned). It survived a 5 hour run of RF=3 CL=ALL, and also a couple of shorter 1-2 hour runs.

Maybe it's related to how c-s clients are started, @asias @gleb-cloudius , how exactly do you start the load? Was it on a fresh database, or an already populated one?

I am starting two clients from two different loaders on empty DB like
that:

cassandra-stress write no-warmup duration=1h cl=ALL -mode native cql3
-rate threads=1000 -node $IP -schema
replication(strategy=org.apache.cassandra.locator.SimpleStrategy,factor=3)
-pop seq=1..10000000

cassandra-stress write no-warmup duration=1h cl=ALL -mode native cql3
-rate threads=1000 -node $IP -schema
replication(strategy=org.apache.cassandra.locator.SimpleStrategy,factor=3)
-pop seq=10000000..20000000

        Gleb.

@tgrabiec
Copy link
Contributor

tgrabiec commented Oct 6, 2015

@gleb-cloudius Is $IP is an IP of one of the servers, or a list of IPs ?

@gleb-cloudius
Copy link
Contributor Author

On Tue, Oct 06, 2015 at 01:00:17AM -0700, Tomasz Grabiec wrote:

@gleb-cloudius Is $IP is an IP of one of the servers, or a list of IPs ?

$IP is ip of one of the servers. I use same ip in both loaders.

        Gleb.

@slivne slivne modified the milestones: catfish, barracuda Oct 15, 2015
@tgrabiec
Copy link
Contributor

@gleb-cloudius @asias Which AMIs did you use for Scylla and c-s ?

@gleb-cloudius
Copy link
Contributor Author

On Mon, Oct 19, 2015 at 05:12:14AM -0700, Tomasz Grabiec wrote:

@gleb-cloudius @asias Which AMIs did you use for Scylla and c-s ?

c3.8xlarge for scylla. c3.4xlarge for c-s.

        Gleb.

@tgrabiec
Copy link
Contributor

19 paź 2015 2:15 PM "Gleb Natapov" notifications@github.com napisał(a):

On Mon, Oct 19, 2015 at 05:12:14AM -0700, Tomasz Grabiec wrote:

@gleb-cloudius @asias Which AMIs did you use for Scylla and c-s ?

c3.8xlarge for scylla. c3.4xlarge for c-s.

That's instance type, I'm asking about the image.

Gleb.


Reply to this email directly or view it on GitHub.

@gleb-cloudius
Copy link
Contributor Author

On Mon, Oct 19, 2015 at 05:31:06AM -0700, Tomasz Grabiec wrote:

19 paź 2015 2:15 PM "Gleb Natapov" notifications@github.com napisał(a):

On Mon, Oct 19, 2015 at 05:12:14AM -0700, Tomasz Grabiec wrote:

@gleb-cloudius @asias Which AMIs did you use for Scylla and c-s ?

c3.8xlarge for scylla. c3.4xlarge for c-s.

That's instance type, I'm asking about the image.

Ah, sorry, this will be harder to recall unless AWS has some kind of
logs. I terminated all instances since they are not reusable currently.

        Gleb.

@asias
Copy link
Contributor

asias commented Oct 19, 2015

On Mon, Oct 19, 2015 at 8:35 PM, Gleb Natapov notifications@github.com
wrote:

On Mon, Oct 19, 2015 at 05:31:06AM -0700, Tomasz Grabiec wrote:

19 paź 2015 2:15 PM "Gleb Natapov" notifications@github.com
napisał(a):

On Mon, Oct 19, 2015 at 05:12:14AM -0700, Tomasz Grabiec wrote:

@gleb-cloudius @asias Which AMIs did you use for Scylla and c-s ?

c3.8xlarge for scylla. c3.4xlarge for c-s.

That's instance type, I'm asking about the image.

Ah, sorry, this will be harder to recall unless AWS has some kind of
logs. I terminated all instances since they are not reusable currently.

I'm not 100% sure, but it probably is:
us-west-2
ami-79100849

Gleb.


Reply to this email directly or view it on GitHub
#403 (comment).

@tgrabiec
Copy link
Contributor

I managed to reproduce it with 2ccb5fe when starting clients exactly like Gleb mentioned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants