
ARM only: test_topology_smp.test_nodes_with_different_smp fails in debug mode #14093

Closed
avikivity opened this issue May 31, 2023 · 10 comments · Fixed by #14178
Labels: area/test (Issues related to the testing system code and environment), P1 (Urgent), symptom/ci stability (Issues that failed in ScyllaDB CI - tests and framework)

@avikivity
Member

https://jenkins.scylladb.com/view/nexts/job/scylla-master/job/next/lastCompletedBuild/consoleFull

05:26:44  ------------------------------------------------------------------------------
05:26:44  The following test(s) have failed: topology_custom/test_topology_smp
05:26:44  Output of pytest --manager-api=/tmp/manager-smgzqzq9/api -s --log-level=DEBUG -o junit_family=xunit2 --junit-xml=/jenkins/workspace/scylla-master/next/scylla/testlog/x86_64/debug/xml/topology_custom.test_topology_smp.1.xunit.xml test/topology_custom/test_topology_smp.py:
05:26:44  === TEST.PY STARTING TEST topology_custom.test_topology_smp.1 ===
05:26:44  export UBSAN_OPTIONS='halt_on_error=1:abort_on_error=1:suppressions=/jenkins/workspace/scylla-master/next/scylla/ubsan-suppressions.supp'
05:26:44  export ASAN_OPTIONS='disable_coredump=0:abort_on_error=1:detect_stack_use_after_return=1'
05:26:44  pytest --manager-api=/tmp/manager-smgzqzq9/api -s --log-level=DEBUG -o junit_family=xunit2 --junit-xml=/jenkins/workspace/scylla-master/next/scylla/testlog/x86_64/debug/xml/topology_custom.test_topology_smp.1.xunit.xml test/topology_custom/test_topology_smp.py
05:26:44  === TEST.PY TEST topology_custom.test_topology_smp.1 OUTPUT ===
05:26:44  Driver name Scylla Python Driver, version 3.26.1
05:26:44  ============================= test session starts ==============================
05:26:44  platform linux -- Python 3.11.3, pytest-7.1.3, pluggy-1.0.0
05:26:44  rootdir: /jenkins/workspace/scylla-master/next/scylla/test/topology_custom, configfile: pytest.ini
05:26:44  plugins: asyncio-0.19.0
05:26:44  asyncio: mode=Mode.AUTO
05:26:44  collected 1 item
05:26:44  
05:26:44  test/topology_custom/test_topology_smp.py::test_nodes_with_different_smp 
05:26:44  -------------------------------- live log setup --------------------------------
05:26:44  04:57:33.928 DEBUG> Using selector: EpollSelector
05:26:44  04:57:33.929 DEBUG> Using selector: EpollSelector
05:26:44  04:57:33.930 DEBUG> before_test for test_nodes_with_different_smp
05:26:44  04:57:33.930 DEBUG> RESTClient fetching GET http+unix://api/cluster/is-dirty
05:26:44  04:57:33.932 DEBUG> RESTClient fetching GET http+unix://api/cluster/before-test/test_nodes_with_different_smp
05:26:44  04:57:33.934 INFO> Using cluster: ScyllaCluster(name: 81f7feec-ff56-11ed-83dd-1272aa0d8b0d, running: , stopped: ) for test test_nodes_with_different_smp
05:26:44  04:57:33.934 DEBUG> RESTClient fetching GET http+unix://api/cluster/running-servers
05:26:44  -------------------------------- live log call ---------------------------------
05:26:44  04:57:33.936 INFO> Adding --smp=3 server
05:26:44  04:57:33.936 DEBUG> RESTClient fetching PUT http+unix://api/cluster/addserver
05:26:44  04:58:05.590 DEBUG> ManagerClient added Server(416, 127.207.166.32)
05:26:44  04:58:05.590 DEBUG> RESTClient fetching GET http+unix://api/cluster/running-servers
05:26:44  04:58:05.710 DEBUG> driver connecting to ['127.207.166.32']
05:26:44  04:58:05.718 DEBUG> Connecting to cluster, contact points: ['127.207.166.32']; protocol version: 4
05:26:44  04:58:05.718 DEBUG> Host 127.207.166.32:9042 is now marked up
05:26:44  04:58:05.719 DEBUG> [control connection] Opening new connection to 127.207.166.32:9042
05:26:44  04:58:05.720 DEBUG> Connection 139733517298000 127.0.0.1:56098 -> 127.207.166.32:9042
05:26:44  04:58:05.720 DEBUG> Sending initial options message for new connection (139733517298000) to 127.207.166.32:9042
05:26:44  04:58:05.720 DEBUG> Starting libev event loop
05:26:44  04:58:05.723 DEBUG> Received options response on new connection (139733517298000) from 127.207.166.32:9042
05:26:44  04:58:05.723 DEBUG> No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['lz4', 'snappy']
05:26:44  04:58:05.723 DEBUG> Sending StartupMessage on <LibevConnection(139733517298000) 127.207.166.32:9042>
05:26:44  04:58:05.723 DEBUG> Sent StartupMessage on <LibevConnection(139733517298000) 127.207.166.32:9042>
05:26:44  04:58:05.725 DEBUG> Got ReadyMessage on new connection (139733517298000) from 127.207.166.32:9042
05:26:44  04:58:05.726 DEBUG> [control connection] Established new connection <LibevConnection(139733517298000) 127.207.166.32:9042>, registering watchers and refreshing schema and topology
05:26:44  04:58:05.762 DEBUG> [control connection] Refreshing node list and token map using preloaded results
05:26:44  04:58:05.762 DEBUG> [control connection] Removing host not found in peers metadata: <Host: 127.207.166.32:9042 datacenter1>
05:26:44  04:58:05.762 DEBUG> [control connection] Finished fetching ring info
05:26:44  04:58:05.763 DEBUG> [control connection] Rebuilding token map due to topology changes
05:26:44  04:58:06.358 DEBUG> Control connection created
05:26:44  04:58:06.360 DEBUG> Initializing connection for host 127.207.166.32:9042
05:26:44  04:58:06.360 DEBUG> Connection 139733499506896 127.0.0.1:56114 -> 127.207.166.32:9042
05:26:44  04:58:06.361 DEBUG> Sending initial options message for new connection (139733499506896) to 127.207.166.32:9042
05:26:44  04:58:06.364 DEBUG> Received options response on new connection (139733499506896) from 127.207.166.32:9042
05:26:44  04:58:06.365 DEBUG> No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['lz4', 'snappy']
05:26:44  04:58:06.365 DEBUG> Sending StartupMessage on <LibevConnection(139733499506896) 127.207.166.32:9042>
05:26:44  04:58:06.365 DEBUG> Sent StartupMessage on <LibevConnection(139733499506896) 127.207.166.32:9042>
05:26:44  04:58:06.368 DEBUG> Got ReadyMessage on new connection (139733499506896) from 127.207.166.32:9042
05:26:44  04:58:06.371 DEBUG> First connection created to 127.207.166.32:9042 for shard_id=2
05:26:44  04:58:06.371 DEBUG> Finished initializing connection for host 127.207.166.32:9042
05:26:44  04:58:06.372 DEBUG> shard_aware_endpoint=None
05:26:44  04:58:06.372 DEBUG> Added pool for host 127.207.166.32:9042 to session
05:26:44  04:58:06.374 DEBUG> shard_aware_endpoint=None
05:26:44  04:58:06.375 DEBUG> Not starting MonitorReporter thread for Insights; not supported by server version 3.0.8 on ControlConnection host 127.207.166.32:9042
05:26:44  04:58:06.375 DEBUG> Started Session with client_id fba8f33e-74db-4512-bfec-424dd56d5f76 and session_id 5aa0e3d8-7564-4b62-a098-8057a06f742b
05:26:44  04:58:06.376 INFO> Adding --smp=4 server
05:26:44  04:58:06.376 DEBUG> RESTClient fetching PUT http+unix://api/cluster/addserver
05:26:44  04:58:06.378 DEBUG> Connection 139733518069264 127.0.0.1:56122 -> 127.207.166.32:9042
05:26:44  04:58:06.378 DEBUG> Connection 139733518072272 127.0.0.1:56118 -> 127.207.166.32:9042
05:26:44  04:58:06.378 DEBUG> Sending initial options message for new connection (139733518069264) to 127.207.166.32:9042
05:26:44  04:58:06.379 DEBUG> Sending initial options message for new connection (139733518072272) to 127.207.166.32:9042
05:26:44  04:58:06.381 DEBUG> Received options response on new connection (139733518069264) from 127.207.166.32:9042
05:26:44  04:58:06.382 DEBUG> No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['lz4', 'snappy']
05:26:44  04:58:06.382 DEBUG> Sending StartupMessage on <LibevConnection(139733518069264) 127.207.166.32:9042>
05:26:44  04:58:06.382 DEBUG> Sent StartupMessage on <LibevConnection(139733518069264) 127.207.166.32:9042>
05:26:44  04:58:06.383 DEBUG> Received options response on new connection (139733518072272) from 127.207.166.32:9042
05:26:44  04:58:06.384 DEBUG> No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['lz4', 'snappy']
05:26:44  04:58:06.384 DEBUG> Sending StartupMessage on <LibevConnection(139733518072272) 127.207.166.32:9042>
05:26:44  04:58:06.384 DEBUG> Sent StartupMessage on <LibevConnection(139733518072272) 127.207.166.32:9042>
05:26:44  04:58:06.384 DEBUG> Got ReadyMessage on new connection (139733518069264) from 127.207.166.32:9042
05:26:44  04:58:06.385 DEBUG> Received a connection 139733518069264 for shard_id=0 on host 127.207.166.32:9042
05:26:44  04:58:06.385 DEBUG> New connection (139733518069264) created to shard_id=0 on host 127.207.166.32:9042
05:26:44  04:58:06.386 DEBUG> Connected to 2/3 shards on host 127.207.166.32:9042 (1 missing or needs replacement)
05:26:44  04:58:06.386 DEBUG> Got ReadyMessage on new connection (139733518072272) from 127.207.166.32:9042
05:26:44  04:58:06.387 DEBUG> Received a connection 139733518072272 for shard_id=1 on host 127.207.166.32:9042
05:26:44  04:58:06.387 DEBUG> New connection (139733518072272) created to shard_id=1 on host 127.207.166.32:9042
05:26:44  04:58:06.388 DEBUG> Connected to 3/3 shards on host 127.207.166.32:9042 (0 missing or needs replacement)
05:26:44  04:58:06.388 DEBUG> All shards of host 127.207.166.32:9042 have at least one connection, closing 0 excess connections
05:26:44  04:59:06.360 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  04:59:06.360 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  04:59:06.360 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  04:59:06.361 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  04:59:06.363 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  04:59:06.364 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  04:59:06.382 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  04:59:06.387 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  04:59:36.361 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  04:59:36.362 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  04:59:36.362 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  04:59:36.362 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  04:59:36.365 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  04:59:36.366 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  04:59:36.367 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  04:59:36.369 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:00:06.363 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:00:06.363 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:00:06.364 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:00:06.364 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:00:06.366 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:00:06.367 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:00:06.367 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:00:06.368 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:00:36.364 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:00:36.365 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:00:36.365 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:00:36.366 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:00:36.368 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:00:36.368 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:00:36.368 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:00:36.369 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:01:06.366 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:01:06.367 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:01:06.367 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:01:06.367 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:01:06.369 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:01:06.371 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:01:06.377 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:01:06.382 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:01:36.368 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:01:36.368 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:01:36.368 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:01:36.369 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:01:36.371 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:01:36.372 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:01:36.375 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:01:36.377 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:02:06.369 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:02:06.370 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:02:06.370 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:02:06.370 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:02:06.374 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:02:06.375 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:02:06.378 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:02:06.383 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:02:36.371 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:02:36.372 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:02:36.372 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:02:36.372 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:02:36.375 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:02:36.375 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:02:36.377 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:02:36.382 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:03:06.373 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:03:06.373 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:03:06.373 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:03:06.374 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:03:06.376 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:03:06.377 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:03:06.378 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:03:06.383 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:03:36.374 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:03:36.375 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:03:36.375 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:03:36.375 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:03:36.378 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:03:36.378 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:03:36.381 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:03:36.383 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:04:06.376 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:04:06.376 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:04:06.377 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:04:06.377 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:04:06.381 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:04:06.381 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:04:06.383 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:04:06.385 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:04:36.378 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:04:36.378 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:04:36.379 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:04:36.379 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:04:36.381 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:04:36.382 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:04:36.385 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:04:36.386 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:05:06.379 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:05:06.380 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:05:06.380 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:05:06.380 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:05:06.383 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:05:06.384 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:05:06.394 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:05:06.398 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:05:36.381 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:05:36.382 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:05:36.382 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:05:36.382 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:05:36.386 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:05:36.387 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:05:36.390 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:05:36.393 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:06:06.383 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:06:06.383 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:06:06.384 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:06:06.384 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:06:06.387 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:06:06.388 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:06:06.393 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:06:06.395 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:06:36.384 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:06:36.385 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:06:36.385 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:06:36.386 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:06:36.389 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:06:36.390 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:06:36.391 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:06:36.393 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:07:06.386 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:07:06.387 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:07:06.387 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:07:06.387 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:07:06.390 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:07:06.391 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:07:06.392 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:07:06.398 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:07:36.388 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:07:36.388 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:07:36.388 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:07:36.389 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:07:36.391 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:07:36.392 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:07:36.393 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:07:36.395 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:08:06.389 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:08:06.390 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:08:06.390 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:08:06.390 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:08:06.393 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:08:06.393 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:08:06.394 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:08:06.395 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:08:36.391 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:08:36.391 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:08:36.391 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:08:36.392 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:08:36.396 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:08:36.396 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:08:36.404 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:08:36.410 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:09:06.392 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:09:06.393 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:09:06.393 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:09:06.393 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:09:06.396 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:09:06.397 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:09:06.397 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:09:06.397 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:09:36.394 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:09:36.394 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:09:36.394 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:09:36.395 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:09:36.397 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:09:36.398 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:09:36.398 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:09:36.402 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:10:06.395 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:10:06.396 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:10:06.396 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:10:06.396 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:10:06.399 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:10:06.400 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:10:06.405 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:10:06.405 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:10:36.397 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:10:36.397 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:10:36.397 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:10:36.398 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:10:36.401 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:10:36.402 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:10:36.402 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:10:36.407 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:11:06.398 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:11:06.399 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:11:06.399 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:11:06.399 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:11:06.402 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:11:06.403 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:11:06.403 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:11:06.404 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:11:36.400 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:11:36.400 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:11:36.401 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:11:36.401 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:11:36.404 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:11:36.404 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:11:36.405 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:11:36.405 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:12:06.401 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:12:06.402 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:12:06.402 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:12:06.402 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:12:06.405 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:12:06.405 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:12:06.406 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:12:06.406 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:12:36.403 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:12:36.403 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:12:36.404 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:12:36.404 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:12:36.406 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:12:36.407 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:12:36.412 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:12:36.461 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:13:06.404 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:13:06.405 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:13:06.405 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:13:06.405 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:13:06.408 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:13:06.408 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:13:06.408 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:13:06.410 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:13:36.406 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:13:36.407 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:13:36.407 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:13:36.407 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:13:36.410 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:13:36.411 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:13:36.411 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:13:36.411 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:14:06.408 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:14:06.408 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:14:06.408 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:14:06.409 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:14:06.411 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:14:06.412 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:14:06.415 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  05:14:06.418 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:14:36.410 DEBUG> Sending options message heartbeat on idle connection (139733499506896) 127.207.166.32:9042
05:26:44  05:14:36.410 DEBUG> Sending options message heartbeat on idle connection (139733518069264) 127.207.166.32:9042
05:26:44  05:14:36.411 DEBUG> Sending options message heartbeat on idle connection (139733518072272) 127.207.166.32:9042
05:26:44  05:14:36.411 DEBUG> Sending options message heartbeat on idle connection (139733517298000) 127.207.166.32:9042
05:26:44  05:14:36.413 DEBUG> Received options response on connection (139733499506896) from 127.207.166.32:9042
05:26:44  05:14:36.414 DEBUG> Received options response on connection (139733518072272) from 127.207.166.32:9042
05:26:44  05:14:36.416 DEBUG> Received options response on connection (139733518069264) from 127.207.166.32:9042
05:26:44  05:14:36.417 DEBUG> Received options response on connection (139733517298000) from 127.207.166.32:9042
05:26:44  FAILED
05:26:44  ------------------------------ live log teardown -------------------------------
05:26:44  05:14:47.173 DEBUG> after_test for test_nodes_with_different_smp (success: False)
05:26:44  05:14:47.174 DEBUG> RESTClient fetching GET http+unix://api/cluster/after-test/False
05:26:44  05:14:47.184 INFO> Cluster after test test_nodes_with_different_smp: ScyllaCluster(name: 81f7feec-ff56-11ed-83dd-1272aa0d8b0d, running: ScyllaServer(416, 127.207.166.32, 106854d8-7e3e-472d-8504-54e2604cd411), stopped: )
05:26:44  05:14:47.184 DEBUG> shutting down driver
05:26:44  05:14:47.186 DEBUG> Shutting down Cluster Scheduler
05:26:44  05:14:47.187 DEBUG> Shutting down control connection
05:26:44  05:14:47.187 DEBUG> Closing connection (139733517298000) to 127.207.166.32:9042
05:26:44  05:14:47.187 DEBUG> Closed socket to 127.207.166.32:9042
05:26:44  05:14:47.188 DEBUG> Shutting down connections to 127.207.166.32:9042
05:26:44  05:14:47.188 DEBUG> Closing connection (139733499506896) to 127.207.166.32:9042
05:26:44  05:14:47.189 DEBUG> Closing connection (139733499506896) to 127.207.166.32:9042
05:26:44  05:14:47.190 DEBUG> Closed socket to 127.207.166.32:9042
05:26:44  05:14:47.190 DEBUG> Closing connection (139733518069264) to 127.207.166.32:9042
05:26:44  05:14:47.190 DEBUG> Closing connection (139733518069264) to 127.207.166.32:9042
05:26:44  05:14:47.191 DEBUG> Closed socket to 127.207.166.32:9042
05:26:44  05:14:47.191 DEBUG> Closing connection (139733518072272) to 127.207.166.32:9042
05:26:44  05:14:47.191 DEBUG> Closing connection (139733518072272) to 127.207.166.32:9042
05:26:44  05:14:47.191 DEBUG> Closed socket to 127.207.166.32:9042
05:26:44  05:14:47.192 DEBUG> All Connections currently closed, event loop ended
05:26:44  05:14:47.195 DEBUG> Using selector: EpollSelector
05:26:44  
05:26:44  
05:26:44  =================================== FAILURES ===================================
05:26:44  ________________________ test_nodes_with_different_smp _________________________
05:26:44  
05:26:44  self = <test.pylib.manager_client.ManagerClient object at 0x7f163ec7b950>
05:26:44  replace_cfg = None, cmdline = ['--smp', '4'], config = None, start = True
05:26:44  
05:26:44      async def server_add(self, replace_cfg: Optional[ReplaceConfig] = None, cmdline: Optional[List[str]] = None, config: Optional[dict[str, Any]] = None, start: bool = True) -> ServerInfo:
05:26:44          """Add a new server"""
05:26:44          try:
05:26:44              data: dict[str, Any] = {'start': start}
05:26:44              if replace_cfg:
05:26:44                  data['replace_cfg'] = replace_cfg._asdict()
05:26:44              if cmdline:
05:26:44                  data['cmdline'] = cmdline
05:26:44              if config:
05:26:44                  data['config'] = config
05:26:44  >           server_info = await self.client.put_json("/cluster/addserver", data, response_type="json",
05:26:44                                                       timeout=ScyllaServer.TOPOLOGY_TIMEOUT)
05:26:44  
05:26:44  test/pylib/manager_client.py:174: 
05:26:44  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
05:26:44  
05:26:44  self = <test.pylib.rest_client.UnixRESTClient object at 0x7f163ec7ba10>
05:26:44  resource_uri = '/cluster/addserver'
05:26:44  data = {'cmdline': ['--smp', '4'], 'start': True}, host = None, port = None
05:26:44  params = None, response_type = 'json', timeout = 1000
05:26:44  
05:26:44      async def put_json(self, resource_uri: str, data: Mapping, host: Optional[str] = None,
05:26:44                         port: Optional[int] = None, params: Optional[dict[str, str]] = None,
05:26:44                         response_type: Optional[str] = None, timeout: Optional[float] = None) -> Any:
05:26:44  >       ret = await self._fetch("PUT", resource_uri, response_type = response_type, host = host,
05:26:44                                  port = port, params = params, json = data, timeout = timeout)
05:26:44  
05:26:44  test/pylib/rest_client.py:104: 
05:26:44  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
05:26:44  
05:26:44  self = <test.pylib.rest_client.UnixRESTClient object at 0x7f163ec7ba10>
05:26:44  method = 'PUT', resource = '/cluster/addserver', response_type = 'json'
05:26:44  host = None, port = None, params = None
05:26:44  json = {'cmdline': ['--smp', '4'], 'start': True}, timeout = 1000
05:26:44  
05:26:44      async def _fetch(self, method: str, resource: str, response_type: Optional[str] = None,
05:26:44                       host: Optional[str] = None, port: Optional[int] = None,
05:26:44                       params: Optional[Mapping[str, str]] = None,
05:26:44                       json: Optional[Mapping] = None, timeout: Optional[float] = None) -> Any:
05:26:44          # Can raise exception. See https://docs.aiohttp.org/en/latest/web_exceptions.html
05:26:44          assert method in ["GET", "POST", "PUT", "DELETE"], f"Invalid HTTP request method {method}"
05:26:44          assert response_type is None or response_type in ["text", "json"], \
05:26:44                  f"Invalid response type requested {response_type} (expected 'text' or 'json')"
05:26:44          # Build the URI
05:26:44          port = port if port else self.default_port if hasattr(self, "default_port") else None
05:26:44          port_str = f":{port}" if port else ""
05:26:44          assert host is not None or hasattr(self, "default_host"), "_fetch: missing host for " \
05:26:44                  "{method} {resource}"
05:26:44          host_str = host if host is not None else self.default_host
05:26:44          uri = self.uri_scheme + "://" + host_str + port_str + resource
05:26:44          logging.debug(f"RESTClient fetching {method} {uri}")
05:26:44      
05:26:44          client_timeout = ClientTimeout(total = timeout if timeout is not None else 300)
05:26:44  >       async with request(method, uri,
05:26:44                             connector = self.connector if hasattr(self, "connector") else None,
05:26:44                             params = params, json = json, timeout = client_timeout) as resp:
05:26:44  
05:26:44  test/pylib/rest_client.py:64: 
05:26:44  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
05:26:44  
05:26:44  self = <aiohttp.client._SessionRequestContextManager object at 0x7f163eac6880>
05:26:44  
05:26:44      async def __aenter__(self) -> ClientResponse:
05:26:44          try:
05:26:44  >           self._resp = await self._coro
05:26:44  
05:26:44  /usr/lib64/python3.11/site-packages/aiohttp/client.py:1189: 
05:26:44  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
05:26:44  
05:26:44  self = <aiohttp.client.ClientSession object at 0x7f163eac6fd0>, method = 'PUT'
05:26:44  str_or_url = 'http+unix://api/cluster/addserver'
05:26:44  
05:26:44      async def _request(
05:26:44          self,
05:26:44          method: str,
05:26:44          str_or_url: StrOrURL,
05:26:44          *,
05:26:44          params: Optional[Mapping[str, str]] = None,
05:26:44          data: Any = None,
05:26:44          json: Any = None,
05:26:44          cookies: Optional[LooseCookies] = None,
05:26:44          headers: Optional[LooseHeaders] = None,
05:26:44          skip_auto_headers: Optional[Iterable[str]] = None,
05:26:44          auth: Optional[BasicAuth] = None,
05:26:44          allow_redirects: bool = True,
05:26:44          max_redirects: int = 10,
05:26:44          compress: Optional[str] = None,
05:26:44          chunked: Optional[bool] = None,
05:26:44          expect100: bool = False,
05:26:44          raise_for_status: Optional[bool] = None,
05:26:44          read_until_eof: bool = True,
05:26:44          proxy: Optional[StrOrURL] = None,
05:26:44          proxy_auth: Optional[BasicAuth] = None,
05:26:44          timeout: Union[ClientTimeout, object] = sentinel,
05:26:44          verify_ssl: Optional[bool] = None,
05:26:44          fingerprint: Optional[bytes] = None,
05:26:44          ssl_context: Optional[SSLContext] = None,
05:26:44          ssl: Optional[Union[SSLContext, bool, Fingerprint]] = None,
05:26:44          proxy_headers: Optional[LooseHeaders] = None,
05:26:44          trace_request_ctx: Optional[SimpleNamespace] = None,
05:26:44          read_bufsize: Optional[int] = None,
05:26:44      ) -> ClientResponse:
05:26:44      
05:26:44          # NOTE: timeout clamps existing connect and read timeouts.  We cannot
05:26:44          # set the default to None because we need to detect if the user wants
05:26:44          # to use the existing timeouts by setting timeout to None.
05:26:44      
05:26:44          if self.closed:
05:26:44              raise RuntimeError("Session is closed")
05:26:44      
05:26:44          ssl = _merge_ssl_params(ssl, verify_ssl, ssl_context, fingerprint)
05:26:44      
05:26:44          if data is not None and json is not None:
05:26:44              raise ValueError(
05:26:44                  "data and json parameters can not be used at the same time"
05:26:44              )
05:26:44          elif json is not None:
05:26:44              data = payload.JsonPayload(json, dumps=self._json_serialize)
05:26:44      
05:26:44          if not isinstance(chunked, bool) and chunked is not None:
05:26:44              warnings.warn("Chunk size is deprecated #1615", DeprecationWarning)
05:26:44      
05:26:44          redirects = 0
05:26:44          history = []
05:26:44          version = self._version
05:26:44      
05:26:44          # Merge with default headers and transform to CIMultiDict
05:26:44          headers = self._prepare_headers(headers)
05:26:44          proxy_headers = self._prepare_headers(proxy_headers)
05:26:44      
05:26:44          try:
05:26:44              url = self._build_url(str_or_url)
05:26:44          except ValueError as e:
05:26:44              raise InvalidURL(str_or_url) from e
05:26:44      
05:26:44          skip_headers = set(self._skip_auto_headers)
05:26:44          if skip_auto_headers is not None:
05:26:44              for i in skip_auto_headers:
05:26:44                  skip_headers.add(istr(i))
05:26:44      
05:26:44          if proxy is not None:
05:26:44              try:
05:26:44                  proxy = URL(proxy)
05:26:44              except ValueError as e:
05:26:44                  raise InvalidURL(proxy) from e
05:26:44      
05:26:44          if timeout is sentinel:
05:26:44              real_timeout: ClientTimeout = self._timeout
05:26:44          else:
05:26:44              if not isinstance(timeout, ClientTimeout):
05:26:44                  real_timeout = ClientTimeout(total=timeout)  # type: ignore[arg-type]
05:26:44              else:
05:26:44                  real_timeout = timeout
05:26:44          # timeout is cumulative for all request operations
05:26:44          # (request, redirects, responses, data consuming)
05:26:44          tm = TimeoutHandle(self._loop, real_timeout.total)
05:26:44          handle = tm.start()
05:26:44      
05:26:44          if read_bufsize is None:
05:26:44              read_bufsize = self._read_bufsize
05:26:44      
05:26:44          traces = [
05:26:44              Trace(
05:26:44                  self,
05:26:44                  trace_config,
05:26:44                  trace_config.trace_config_ctx(trace_request_ctx=trace_request_ctx),
05:26:44              )
05:26:44              for trace_config in self._trace_configs
05:26:44          ]
05:26:44      
05:26:44          for trace in traces:
05:26:44              await trace.send_request_start(method, url.update_query(params), headers)
05:26:44      
05:26:44          timer = tm.timer()
05:26:44          try:
05:26:44              with timer:
05:26:44                  while True:
05:26:44                      url, auth_from_url = strip_auth_from_url(url)
05:26:44                      if auth and auth_from_url:
05:26:44                          raise ValueError(
05:26:44                              "Cannot combine AUTH argument with "
05:26:44                              "credentials encoded in URL"
05:26:44                          )
05:26:44      
05:26:44                      if auth is None:
05:26:44                          auth = auth_from_url
05:26:44                      if auth is None:
05:26:44                          auth = self._default_auth
05:26:44                      # It would be confusing if we support explicit
05:26:44                      # Authorization header with auth argument
05:26:44                      if (
05:26:44                          headers is not None
05:26:44                          and auth is not None
05:26:44                          and hdrs.AUTHORIZATION in headers
05:26:44                      ):
05:26:44                          raise ValueError(
05:26:44                              "Cannot combine AUTHORIZATION header "
05:26:44                              "with AUTH argument or credentials "
05:26:44                              "encoded in URL"
05:26:44                          )
05:26:44      
05:26:44                      all_cookies = self._cookie_jar.filter_cookies(url)
05:26:44      
05:26:44                      if cookies is not None:
05:26:44                          tmp_cookie_jar = CookieJar()
05:26:44                          tmp_cookie_jar.update_cookies(cookies)
05:26:44                          req_cookies = tmp_cookie_jar.filter_cookies(url)
05:26:44                          if req_cookies:
05:26:44                              all_cookies.load(req_cookies)
05:26:44      
05:26:44                      if proxy is not None:
05:26:44                          proxy = URL(proxy)
05:26:44                      elif self._trust_env:
05:26:44                          with suppress(LookupError):
05:26:44                              proxy, proxy_auth = get_env_proxy_for_url(url)
05:26:44      
05:26:44                      req = self._request_class(
05:26:44                          method,
05:26:44                          url,
05:26:44                          params=params,
05:26:44                          headers=headers,
05:26:44                          skip_auto_headers=skip_headers,
05:26:44                          data=data,
05:26:44                          cookies=all_cookies,
05:26:44                          auth=auth,
05:26:44                          version=version,
05:26:44                          compress=compress,
05:26:44                          chunked=chunked,
05:26:44                          expect100=expect100,
05:26:44                          loop=self._loop,
05:26:44                          response_class=self._response_class,
05:26:44                          proxy=proxy,
05:26:44                          proxy_auth=proxy_auth,
05:26:44                          timer=timer,
05:26:44                          session=self,
05:26:44                          ssl=ssl,
05:26:44                          proxy_headers=proxy_headers,
05:26:44                          traces=traces,
05:26:44                      )
05:26:44      
05:26:44                      # connection timeout
05:26:44                      try:
05:26:44                          async with ceil_timeout(real_timeout.connect):
05:26:44                              assert self._connector is not None
05:26:44                              conn = await self._connector.connect(
05:26:44                                  req, traces=traces, timeout=real_timeout
05:26:44                              )
05:26:44                      except asyncio.TimeoutError as exc:
05:26:44                          raise ServerTimeoutError(
05:26:44                              "Connection timeout " "to host {}".format(url)
05:26:44                          ) from exc
05:26:44      
05:26:44                      assert conn.transport is not None
05:26:44      
05:26:44                      assert conn.protocol is not None
05:26:44                      conn.protocol.set_response_params(
05:26:44                          timer=timer,
05:26:44                          skip_payload=method.upper() == "HEAD",
05:26:44                          read_until_eof=read_until_eof,
05:26:44                          auto_decompress=self._auto_decompress,
05:26:44                          read_timeout=real_timeout.sock_read,
05:26:44                          read_bufsize=read_bufsize,
05:26:44                      )
05:26:44      
05:26:44                      try:
05:26:44                          try:
05:26:44                              resp = await req.send(conn)
05:26:44                              try:
05:26:44  >                               await resp.start(conn)
05:26:44  
05:26:44  /usr/lib64/python3.11/site-packages/aiohttp/client.py:560: 
05:26:44  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
05:26:44  
05:26:44  self = <ClientResponse(http+unix://api/cluster/addserver) [None None]>
05:26:44  None
05:26:44  
05:26:44  connection = Connection<ConnectionKey(host='api', port=None, is_ssl=False, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=None)>
05:26:44  
05:26:44      async def start(self, connection: "Connection") -> "ClientResponse":
05:26:44          """Start response processing."""
05:26:44          self._closed = False
05:26:44          self._protocol = connection.protocol
05:26:44          self._connection = connection
05:26:44      
05:26:44  >       with self._timer:
05:26:44  
05:26:44  /usr/lib64/python3.11/site-packages/aiohttp/client_reqrep.py:894: 
05:26:44  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
05:26:44  
05:26:44  self = <aiohttp.helpers.TimerContext object at 0x7f163ea97210>
05:26:44  exc_type = <class 'asyncio.exceptions.CancelledError'>
05:26:44  exc_val = CancelledError(), exc_tb = <traceback object at 0x7f163eac7880>
05:26:44  
05:26:44      def __exit__(
05:26:44          self,
05:26:44          exc_type: Optional[Type[BaseException]],
05:26:44          exc_val: Optional[BaseException],
05:26:44          exc_tb: Optional[TracebackType],
05:26:44      ) -> Optional[bool]:
05:26:44          if self._tasks:
05:26:44              self._tasks.pop()
05:26:44      
05:26:44          if exc_type is asyncio.CancelledError and self._cancelled:
05:26:44  >           raise asyncio.TimeoutError from None
05:26:44  E           TimeoutError
05:26:44  
05:26:44  /usr/lib64/python3.11/site-packages/aiohttp/helpers.py:721: TimeoutError
05:26:44  
05:26:44  The above exception was the direct cause of the following exception:
05:26:44  
05:26:44  request = <FixtureRequest for <Function test_nodes_with_different_smp>>
05:26:44  manager = <test.pylib.manager_client.ManagerClient object at 0x7f163ec7b950>
05:26:44  
05:26:44      @pytest.mark.asyncio
05:26:44      async def test_nodes_with_different_smp(request: FixtureRequest, manager: ManagerClient) -> None:
05:26:44          # In this test it's more convenient to start with a fresh cluster.
05:26:44      
05:26:44          # When the node starts it tries to communicate with others
05:26:44          # by sending group0_peer_exchange message to them.
05:26:44          # This message can be handled on arbitrary shard of the target node.
05:26:44          # The method manager.server_add waits for node to start, which can only happen
05:26:44          # if this message has been handled correctly.
05:26:44          #
05:26:44          # Note: messaging_service is initialized with server_socket::load_balancing_algorithm::port
05:26:44          # policy, this means that the shard for message will be chosen as client_port % smp::count.
05:26:44          # The client port in turn is chosen as rand() * smp::count + current_shard
05:26:44          # (posix_socket_impl::find_port_and_connect).
05:26:44          # If this succeeds to occupy a free port in 5 tries and smp::count is the same
05:26:44          # on both nodes, then it's guaranteed that the message will be
05:26:44          # processed on the same shard as the calling code.
05:26:44          # In the general case, we cannot assume that this same shard guarantee holds.
05:26:44          logger.info(f'Adding --smp=3 server')
05:26:44          await manager.server_add(cmdline=['--smp', '3'])
05:26:44          logger.info(f'Adding --smp=4 server')
05:26:44  >       await manager.server_add(cmdline=['--smp', '4'])
05:26:44  
05:26:44  test/topology_custom/test_topology_smp.py:43: 
05:26:44  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
05:26:44  
05:26:44  self = <test.pylib.manager_client.ManagerClient object at 0x7f163ec7b950>
05:26:44  replace_cfg = None, cmdline = ['--smp', '4'], config = None, start = True
05:26:44  
05:26:44      async def server_add(self, replace_cfg: Optional[ReplaceConfig] = None, cmdline: Optional[List[str]] = None, config: Optional[dict[str, Any]] = None, start: bool = True) -> ServerInfo:
05:26:44          """Add a new server"""
05:26:44          try:
05:26:44              data: dict[str, Any] = {'start': start}
05:26:44              if replace_cfg:
05:26:44                  data['replace_cfg'] = replace_cfg._asdict()
05:26:44              if cmdline:
05:26:44                  data['cmdline'] = cmdline
05:26:44              if config:
05:26:44                  data['config'] = config
05:26:44              server_info = await self.client.put_json("/cluster/addserver", data, response_type="json",
05:26:44                                                       timeout=ScyllaServer.TOPOLOGY_TIMEOUT)
05:26:44          except Exception as exc:
05:26:44  >           raise Exception("Failed to add server") from exc
05:26:44  E           Exception: Failed to add server
05:26:44  
05:26:44  test/pylib/manager_client.py:177: Exception
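The shard-selection arithmetic described in the test's comment can be sketched as follows (a minimal model, not the actual Seastar implementation; the function names are hypothetical):

```python
# Sketch of the port-based shard selection the test comment describes.

def client_port(rand: int, client_smp: int, current_shard: int) -> int:
    # Client side: posix_socket_impl::find_port_and_connect picks
    # port = rand() * smp::count + current_shard.
    return rand * client_smp + current_shard

def handling_shard(port: int, server_smp: int) -> int:
    # Server side: load_balancing_algorithm::port routes the connection
    # to shard = client_port % smp::count.
    return port % server_smp

# Equal smp counts: (rand * smp + shard) % smp == shard, so the message
# is always handled on the caller's shard.
assert handling_shard(client_port(100, 4, 3), 4) == 3

# Mismatched smp counts (3 on the client vs 4 on the server): the
# same-shard guarantee breaks — caller was shard 2, handler is shard 1.
assert handling_shard(client_port(101, 3, 2), 4) == 1
```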
@avikivity avikivity added area/test Issues related to the testing system code and environment symptom/ci stability Issues that failed in ScyllaDB CI - tests and framework labels May 31, 2023
@gusev-p

gusev-p commented May 31, 2023

Extremely slow bootstrap via repair when starting the second node; this log is for keyspace=system_auth:

INFO  2023-05-31 05:11:33,940 [shard 0] repair - bootstrap_with_repair: started with keyspace=system_auth, nr_ranges=51
...
INFO  2023-05-31 05:13:50,582 [shard 1] repair - repair[4bc6b5a2-1468-4c66-89f3-3a9c1e4bb93e]: stats: repair_reason=bootstrap, keyspace=system_auth, tables={roles, role_members, role_attributes}, ranges_nr=17, round_nr=51, round_nr_fast_path_already_synced=51, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=255, tx_hashes_nr=0, rx_hashes_nr=0, duration=136.60667 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.207.166.23, 0}, {127.207.166.32, 0}}, row_from_disk_nr={{127.207.166.23, 0}, {127.207.166.32, 0}}, row_from_disk_bytes_per_sec={{127.207.166.23, 0}, {127.207.166.32, 0}} MiB/s, row_from_disk_rows_per_sec={{127.207.166.23, 0}, {127.207.166.32, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
INFO  2023-05-31 05:13:50,582 [shard 1] repair - repair[4bc6b5a2-1468-4c66-89f3-3a9c1e4bb93e]: completed successfully, keyspace=system_auth
INFO  2023-05-31 05:13:50,585 [shard 0] repair - repair[4bc6b5a2-1468-4c66-89f3-3a9c1e4bb93e]: completed successfully
INFO  2023-05-31 05:13:50,586 [shard 0] repair - repair[4bc6b5a2-1468-4c66-89f3-3a9c1e4bb93e]: sync data for keyspace=system_auth, status=succeeded
INFO  2023-05-31 05:13:50,587 [shard 0] repair - bootstrap_with_repair: finished with keyspace=system_auth, nr_ranges=17

nr_ranges and the other counters are small, yet duration=136.60667 seconds.

Ran the same test locally:

INFO  2023-05-31 10:01:49,899 [shard 0] repair - bootstrap_with_repair: started with keyspace=system_auth, nr_ranges=51

INFO  2023-05-31 10:02:07,173 [shard 1] repair - repair[052c0540-651e-495e-af0c-2983cf17044d]: stats: repair_reason=bootstrap, keyspace=system_auth, tables={roles, role_attributes, role_members}, ranges_nr=17, round_nr=51, round_nr_fast_path_already_synced=51, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=255, tx_hashes_nr=0, rx_hashes_nr=0, duration=17.268816 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.127.120.2, 0}, {127.127.120.3, 0}}, row_from_disk_nr={{127.127.120.2, 0}, {127.127.120.3, 0}}, row_from_disk_bytes_per_sec={{127.127.120.2, 0}, {127.127.120.3, 0}} MiB/s, row_from_disk_rows_per_sec={{127.127.120.2, 0}, {127.127.120.3, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
INFO  2023-05-31 10:02:07,173 [shard 1] repair - repair[052c0540-651e-495e-af0c-2983cf17044d]: completed successfully, keyspace=system_auth
INFO  2023-05-31 10:02:07,173 [shard 0] repair - repair[052c0540-651e-495e-af0c-2983cf17044d]: completed successfully
INFO  2023-05-31 10:02:07,174 [shard 0] repair - repair[052c0540-651e-495e-af0c-2983cf17044d]: sync data for keyspace=system_auth, status=succeeded
INFO  2023-05-31 10:02:07,174 [shard 0] repair - bootstrap_with_repair: finished with keyspace=system_auth, nr_ranges=17

duration=17 seconds. This log is from the second, bootstrapping node. Its CPU usage is low, but the CPU usage of the first (source) node is huge (248.2%). We should probably run it under perf.

@gusev-p

gusev-p commented May 31, 2023

debug, two nodes, --smp=2 on both, starting second node
    13:57:45.198 INFO> [debug/topology_custom.test_topology_smp.1] starting server at host 127.80.247.3 in scylla-2...
    13:58:03.826 INFO> [debug/topology_custom.test_topology_smp.1] started server at host 127.80.247.3 in scylla-2, pid 22532

18 seconds
debug, two nodes, --smp=3 on the first, --smp=4 on the second, starting second node
    14:03:16.618 INFO> [debug/topology_custom.test_topology_smp.1] starting server at host 127.231.213.3 in scylla-2...
    14:05:18.298 INFO> [debug/topology_custom.test_topology_smp.1] started server at host 127.231.213.3 in scylla-2, pid 23183

2 minutes

@gusev-p

gusev-p commented Jun 7, 2023

`multishard_combining_reader_v2 -> shard_reader_v2 -> (smp::submit_to) -> evictable_reader_v2 -> (_lifecycle_policy) -> table.make_streaming_reader`

INFO  2023-06-06 23:30:57,311 [shard 1] repair - REPAIR_GET_SYNC_BOUNDARY
stage #0, duration 283342603, count 1137
    the register_repair_get_sync_boundary handler was called 1137 times and took a lot of time
stage #1, duration 283207080, count 1137
    all the time in repair_meta::get_sync_boundary
stage #2, duration 282604366, count 1137
    all the time in repair_meta::read_rows_from_disk
stage #3, duration 281519155, count 1175
    _repair_reader.read_mutation_fragment()

   called more times because of the repeat loop; sometimes it is called multiple times
stage #4, duration 204514177, count 138922
    multishard_combining_reader_v2::fill_buffer
        number of calls can't be smaller than the number of ranges after selective_token_range_sharder

    the time is smaller, perhaps we spend a lot of time on fast_forward_to when we jump between the ranges

stage #5, duration 174128091, count 231346, value #0: 120
    shard_reader_v2::do_fill_buffer

    120 is the number of fragments; strange that it's smaller than outside

stage #6, duration 735342, count 866
    creating underlying-reader and its wrapper evictable_reader_v2 on the other shard

    less time here, since we are here only once per repair_get_sync_boundary handler call

stage #7, duration 604651, count 866
stage #8, duration 603345, count 866
    first reader from the underlying reader on the other shard

stage #9, duration 62431970, count 92619
    evictable_reader_v2::fill_buffer

stage #10, duration 0, count 0
stage #11, duration 57816223, count 92619
    reading from underlying reader inside evictable_reader_v2
stage #12, duration 0, count 0
stage #13, duration 167650, count 92619
stage #14, duration 163691423, count 229408
    calling smp::submit_to inside shard_reader_v2::do_fill_buffer when the underlying reader has already been created

stage #15, duration 64585145, count 91753
    _reader->fill_buffer(); inside shard_reader_v2::do_fill_buffer on the other shard

stage #16, duration 281337445, count 1137, value #0: 38, value #1: 137759
    flat_multi_range_mutation_reader::fill_buffer
        this reader just merges the data from the different ranges

    all the time from the register_repair_get_sync_boundary handler is here

    38 mutation_fragments

    touched 137759 different partition_range-s
      it's a lot

questions
    why 1137 calls on repair_get_sync_boundary?

    why 137759 shard ranges after selective_token_range_sharder?

@gusev-p

gusev-p commented Jun 7, 2023

In row_level_repair::run we do set_estimated_partitions, which updates repair_meta::_estimated_partitions on all peers,
but repair_meta::_estimated_partitions is used only in repair_meta::repair_meta to construct a repair_writer and it’s always zero at that moment. Do we need this _estimated_partitions? Should we create the writer later when we know the actual value for _estimated_partitions?
@asias @bhalevy

@gusev-p

gusev-p commented Jun 20, 2023

A new failure for the same reason (aarch64):

07:17:50.891 INFO> [debug/topology_custom.test_topology_smp.1] starting server at host 127.102.79.22 in scylla-157...
...
07:34:30.977 ERROR> [debug/topology_custom.test_topology_smp.1] last line of /jenkins/workspace/scylla-enterprise/build/scylla/testlog/aarch64/debug/scylla-157.log:
INFO  2023-06-20 07:34:29,228 [shard 2] repair - repair[4e46a00a-8440-451c-b2eb-ba58b7a731ce]: completed successfully, keyspace=system_replicated_keys

@bhalevy

bhalevy commented Jun 28, 2023

In row_level_repair::run we do set_estimated_partitions, which updates repair_meta::_estimated_partitions on all peers, but repair_meta::_estimated_partitions is used only in repair_meta::repair_meta to construct a repair_writer and it’s always zero at that moment. Do we need this _estimated_partitions? Should we create the writer later when we know the actual value for _estimated_partitions? @asias @bhalevy

I'm not familiar enough with this area...
@asias?

@DoronArazii DoronArazii added the P1 Urgent label Jun 28, 2023
@DoronArazii DoronArazii modified the milestones: 5.4, 5.3 Jun 28, 2023
@mykaul mykaul changed the title test_topology_smp.test_nodes_with_different_smp fails in debug mode ARM only: test_topology_smp.test_nodes_with_different_smp fails in debug mode Jun 29, 2023
@DoronArazii

@asias ping

@DoronArazii

@gusev-p we need this urgently; it's failing master

@gusev-p

gusev-p commented Jul 5, 2023

@gusev-p we need this urgently; it's failing master

The PR is approved; waiting for the final CI run and merge.

@DoronArazii DoronArazii modified the milestones: 5.3, 5.4 Jul 9, 2023
avikivity added a commit that referenced this issue Jul 9, 2023
…m Gusev Petr

Consider a cluster with no data, e.g. in tests. When a new node is bootstrapped with repair, we iterate over all (shard, table, range) tuples, read data from all the peer nodes for the range, look for any discrepancies, and heal them. Even for a small num_tokens (16 in the tests), the number of affected ranges (those we need to consider) amounts to the total number of tokens in the cluster, which is 32 for the second node and 48 for the third. Multiplying this by the number of shards and the number of tables in each keyspace gives thousands of ranges. For each of them we need to follow the row-level repair protocol, which includes several RPC exchanges between the peer nodes and creating some data structures on them. These exchanges are processed sequentially for each shard; there are `parallel_for_each` calls in the code, but they are throttled by the chosen memory constraints and in fact execute sequentially.

When the bootstrapping node (master) reaches a peer node and asks for data in a specific range and master shard, two options exist. If the sharder parameters (primarily `--smp`) are the same on the master and on the peer, we can just read one local shard, which is fast. If, on the other hand, `--smp` is different, we need to do a multishard query. The given range from the master can contain data from different peer shards, so we split this range into a number of subranges such that each of them contains data only from the given master shard (`dht::selective_token_range_sharder`). The number of these subranges can be quite big (300 in the tests). For each of these subranges we do `fast_forward_to` on the `multishard_reader`, and this incurs a lot of overhead, mainly because of `smp::submit_to`.

In this series we optimize this case. Instead of splitting the master range and reading only what's needed, we read all the data in the range and then apply the filter by the master shard. We do this if the estimated number of partitions is small (<=100).

This is the logs of starting a second node with `--smp 4`, first node was `--smp 3`:

```
with this patch
    20:58:49.644 INFO> [debug/topology_custom.test_topology_smp.1] starting server at host 127.222.46.3 in scylla-2...
    20:59:22.713 INFO> [debug/topology_custom.test_topology_smp.1] started server at host 127.222.46.3 in scylla-2, pid 1132859

without this patch
    21:04:06.424 INFO> [debug/topology_custom.test_topology_smp.1] starting server at host 127.181.31.3 in scylla-2...
    21:06:01.287 INFO> [debug/topology_custom.test_topology_smp.1] started server at host 127.181.31.3 in scylla-2, pid 1134140
```

Fixes: #14093

Closes #14178

* github.com:scylladb/scylladb:
  repair_test: add test_reader_with_different_strategies
  repair: extract repair_reader declaration into reader.hh
  repair_meta: get_estimated_partitions fix
  repair_meta: use multishard_filter reader if the number of partitions is small
  repair_meta: delay _repair_reader creation
  database.hh: make_multishard_streaming_reader with range parameter
  database.cc: extract streaming_reader_lifecycle_policy
k0machi added a commit to k0machi/scylla-cluster-tests that referenced this issue Aug 17, 2023
This change adds new configurations for 200gb-48 longevities and one
low load 4 hour longevity, intended to simulate a low load happening
during repair processes, to cover potential overhead like in
scylladb/scylladb#14093.

Task: scylladb/qa-tasks#1416
roydahan pushed a commit to scylladb/scylla-cluster-tests that referenced this issue Sep 13, 2023
This change adds new configurations for 200gb-48 longevities and one
low load 4 hour longevity, intended to simulate a low load happening
during repair processes, to cover potential overhead like in
scylladb/scylladb#14093.

Task: scylladb/qa-tasks#1416
@avikivity
Member Author

The fix is a performance optimization, so it is not being backported.
