TestUpdateClusterLayout::test_simple_decommission_node_while_query_info failed #17903

Status: Closed

Annamikhlin opened this issue Mar 19, 2024 · 30 comments
Labels: area/gossip, symptom/ci stability (Issues that failed in ScyllaDB CI - tests and framework)

Annamikhlin commented Mar 19, 2024
Seen on https://jenkins.scylladb.com/job/scylla-master/job/next/7407/testReport/junit/update_cluster_layout_tests/TestUpdateClusterLayout/Build___x86___dtest___test_simple_decommission_node_while_query_info_1_/

self = <update_cluster_layout_tests.TestUpdateClusterLayout object at 0x7f63aaf49b50>
rf = 1

    @pytest.mark.parametrize("rf", [1, 2])
    def test_simple_decommission_node_while_query_info(self, rf):
        """
        Test decommissioning node streams all data
        1. Create a cluster with a three nodes with rf, insert data
        2. Decommission node, while node is decommissioning query data
        3. Check that the cluster returns all
        """
        cluster = self.cluster
        consistency = {1: ConsistencyLevel.ONE, 2: ConsistencyLevel.TWO}[rf]
    
        # Disable hinted handoff and set batch commit log so this doesn't
        # interfer with the test (this must be after the populate)
        cluster.set_configuration_options(values=self.default_config_options(), batch_commitlog=True)
        cluster.populate(3).start()
        node1, node2, node3 = cluster.nodelist()
    
        ep = ExecutionProfile(speculative_execution_policy=ConstantSpeculativeExecutionPolicy(delay=5, max_attempts=10))
        session = self.patient_cql_connection(node1, execution_profiles={"speculative": ep})
        create_ks(session, "ks", rf)
        create_cf(session, "cf", read_repair=0.0, columns={"c1": "text", "c2": "text"})
    
        num_keys = 2000
        insert_c1c2(session, keys=range(num_keys), consistency=consistency)
    
        def run(stop_run):
            logger.debug("Background SELECT loop starting")
            query = SimpleStatement("SELECT * FROM cf", is_idempotent=True)
            query.consistency_level = consistency
            while not stop_run.is_set():
                # stabilize the flakiness of  cassandra.OperationTimedOut error
                result = list(session.execute(query, timeout=60, execution_profile="speculative"))
                assert len(result) == num_keys
                time.sleep(0.01)
            logger.debug("Background SELECT loop done")
    
        executor = ThreadPoolExecutor(max_workers=1)
        stop_run = Event()
        t = executor.submit(run, stop_run)
    
        logger.debug("Decommissioning node2")
        node2.decommission()
    
        stop_run.set()
>       t.result()

update_cluster_layout_tests.py:1355: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python3.11/concurrent/futures/_base.py:449: in result
    return self.__get_result()
/usr/lib64/python3.11/concurrent/futures/_base.py:401: in __get_result
    raise self._exception
/usr/lib64/python3.11/concurrent/futures/thread.py:58: in run
    result = self.fn(*self.args, **self.kwargs)
update_cluster_layout_tests.py:1342: in run
    result = list(session.execute(query, timeout=60, execution_profile="speculative"))
cassandra/cluster.py:2729: in cassandra.cluster.Session.execute
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   cassandra.Unavailable: Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level for cl ONE. Requires 1, alive 0" info={'consistency': 'ONE', 'required_replicas': 1, 'alive_replicas': 0}

cassandra/cluster.py:5120: Unavailable

link to full log:
https://jenkins.scylladb.com/job/scylla-master/job/gating-dtest-release/8608/artifact/logs-full.release.001/dtest-gw3.log/*view*/

Annamikhlin added the triage/master (Looking for assignee) label on Mar 19, 2024
fruch (Contributor) commented Mar 20, 2024

@xemul

Looks like the speculative execution from #17124 isn't enough, and we still have issues with this case.

xemul (Contributor) commented Mar 21, 2024

@fruch , that's different. This time it's the RF=1 case and hence a different error.
Though the root cause is the same -- the node being decommissioned drops out in the middle of the coordinator reading data from it.

fruch (Contributor) commented Mar 21, 2024

@fruch , that's different. This time it's the RF=1 case and hence a different error.
Though the root cause is the same -- the node being decommissioned drops out in the middle of the coordinator reading data from it.

So, any ideas on what should be done here?

Are retries needed at the test level too?

xemul (Contributor) commented Mar 21, 2024

Yes, this particular test case (RF=1) also needs retry policy

fruch (Contributor) commented Mar 21, 2024

Yes, this particular test case (RF=1) also needs retry policy

By default there are retries, but without any wait time between attempts.

In the latest driver I've introduced retries with exponential backoff; it might help.

Worst case, we'll use the @retrying decorator.
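
A minimal sketch of what such a retry policy could look like with the Python driver, wired into the same execution profile the test already uses (the class name and retry cap here are assumptions, not the actual dtest change):

from cassandra.cluster import ExecutionProfile
from cassandra.policies import ConstantSpeculativeExecutionPolicy, RetryPolicy

class RetryOnUnavailable(RetryPolicy):
    """Keep retrying Unavailable errors instead of failing on the first one."""
    def on_unavailable(self, query, consistency, required_replicas,
                       alive_replicas, retry_num):
        if retry_num < 10:  # arbitrary cap for the sketch
            # Try another coordinator, which may already see the post-decommission topology.
            return self.RETRY_NEXT_HOST, consistency
        return self.RETHROW, None

ep = ExecutionProfile(
    retry_policy=RetryOnUnavailable(),
    speculative_execution_policy=ConstantSpeculativeExecutionPolicy(delay=5, max_attempts=10),
)

This only papers over the race on the client side; the SELECT must remain idempotent (as it already is in the test) for speculative execution and retries to be safe.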

bhalevy (Member) commented Apr 17, 2024

https://jenkins.scylladb.com/job/scylla-master/job/gating-dtest-release/9003/artifact/logs-full.release.001/dtest-gw3.log

11:25:16,510 802     update_cluster_layout_tests    DEBUG    update_cluster_layout_tests.py:1317 | test_simple_decommission_node_while_query_info[2]: Background SELECT loop starting
11:25:16,511 802     update_cluster_layout_tests    DEBUG    update_cluster_layout_tests.py:1331 | test_simple_decommission_node_while_query_info[2]: Decommissioning node2
11:25:16,512 802     ccm                            DEBUG    cluster.py          :762  | test_simple_decommission_node_while_query_info[2]: node2: nodetool cmd=['/jenkins/workspace/scylla-master/gating-dtest-release/scylla/.ccm/scylla-repository/eb9ba914a3b5dad74b390bd56cb5e0b0b221f3a3/share/cassandra/bin/nodetool', '-h', '127.0.19.2', '-p', '7199', '-Dcom.sun.jndi.rmiURLParsing=legacy', '-Dcom.scylladb.apiPort=10000', 'decommission'] wait=True timeout=None
11:25:26,775 802     cassandra.cluster              INFO     thread.py           :58   | test_simple_decommission_node_while_query_info[2]: Cassandra host 127.0.19.2:9042 removed
11:25:37,100 802     errors                         ERROR    conftest.py         :201  | test_simple_decommission_node_while_query_info[2]: test failed: 
self = <update_cluster_layout_tests.TestUpdateClusterLayout object at 0x7f06c69742d0>
rf = 2

    @pytest.mark.parametrize("rf", [1, 2])
    def test_simple_decommission_node_while_query_info(self, rf):
...
>       t.result()

update_cluster_layout_tests.py:1335: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python3.11/concurrent/futures/_base.py:449: in result
    return self.__get_result()
/usr/lib64/python3.11/concurrent/futures/_base.py:401: in __get_result
    raise self._exception
/usr/lib64/python3.11/concurrent/futures/thread.py:58: in run
    result = self.fn(*self.args, **self.kwargs)
update_cluster_layout_tests.py:1322: in run
    result = list(session.execute(query, timeout=60, execution_profile="speculative"))
cassandra/cluster.py:2729: in cassandra.cluster.Session.execute
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   cassandra.Unavailable: Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level for cl TWO. Requires 2, alive 1" info={'consistency': 'TWO', 'required_replicas': 2, 'alive_replicas': 1}

We don't know exactly when the failure happened, but there is some time skew between the gossiper state on the different nodes; maybe it is related.

https://jenkins.scylladb.com/job/scylla-master/job/gating-dtest-release/9003/artifact/logs-full.release.001/1713266740135_update_cluster_layout_tests.py%3A%3ATestUpdateClusterLayout%3A%3Atest_simple_decommission_node_while_query_info%5B2%5D/node1.log

INFO  2024-04-16 11:25:16,761 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: Added node=127.0.19.2 as leaving node, coordinator=127.0.19.2
INFO  2024-04-16 11:25:26,769 [shard 0: gms] gossip - Node 127.0.19.2 will be removed from gossip at [2024-04-19 11:25:26]: (expire = 1713525926650808194, now = 1713266726769557352, diff = 259199 seconds)
INFO  2024-04-16 11:25:26,769 [shard 0: gms] storage_service - Removing tokens {5891315356779712207, 6644659875106048232, -2031446522143542976, -1339657334030648931, 5838800144473790431, -1087334779704532933, -8383015161740972323, 8773471205176591131, 6912798132461832631, 3717260124234998954, 8135580524606499218, 4933321621651553896, -3529854142061287875, 944908903197350322, 2323213516530356194, 5210426810099403453, -843934660261095933, 2759279211832045506, -1786456376184600316, 5025060517413836954, -4349969502544043667, -6428069016026695978, -7757793885581674551, 5053358897542410693, -9123492370604733927, -3906728608793360141, 8987579983231929028, 2166186930472097772, -7290137992606841054, -1605963558749861388, 6905126352450554444, -8250833397122565434, -2666532607924357436, 1215248352812595075, -8554757200446758708, 4000325743140511997, -7256096156876373137, 4754961149759252595, 4356042801330801630, -3585952018635665764, -681108285369060237, -890634243769899107, 1290479945171787923, 490508265641925696, -7638162489423952013, 5991668076577058186, -6846446430853575501, -418122622487981631, 1163362551414476681, 8341748621233819274, -4357689942768086381, -4793763540871442734, 4454469103699165864, 884970651562637249, -7995142503368220435, -8512729302308700154, 2184526208529264499, -2210124173410267824, -6001434137684958872, 7563710321617009371, -5213978271186008579, 281782964130013334, 2575262936716339675, 7288510930298791853, -3695537609538691868, -2026841578939377143, 1689254937207465260, 6769179491252691833, -1710928851820482709, -7495698103938377309, 1335835558472006077, -6608429648366051397, -6982370174209044761, -4892838996119274779, 4319900748812173419, 2351099028959321612, 4136573589411882082, -483737536017999223, 7508019147268439240, -3227366782462686254, -8493504934087279512, 5445728252211477690, 5596356869627243435, -3581469472219835472, 5765144158164561348, 3171612364226044170, -1069037103026158528, 4932196898354234264, 4841676013308949158, 3805248283753355501, -1660693761005670849, -6475166866379544332, 1283885037852735622, 73237807841118136, 7344535293637662646, -555840834402098957, -3220061649498783275, -5260705750725696223, 5065273683178989740, -3132056941341919104, -2128448015943753105, -6237895571494309166, -8257467152724812505, 4809056495046093060, 5286163765150215785, 2517480053976404036, 7934588651401386120, 8863440582040505883, -5355642581755262732, 8867323084617227568, -2861824630524274521, 5776956868809591593, 536605655663666810, 4839466488747731164, 346274002625390229, -1412568537117720854, 6703377118113130965, -8753625470583795148, -2482528345926185687, 4473861638386060030, -761785685472326945, -4653016610283253804, -7307002206290252570, 6790578762033027203, 7365873025645016848, -1308882598114554236, 2881939345991343725, 330356054951637239, -7471650378831701210, -3718044853112813470, -5237155506983993904, -4322707272056588769, 7274163241398874658, -4328273820866165713, 6233951190839803636, -5347988187907652033, -6998784162274967568, 6841645083700731738, 7531068057029355097, 7506190771825138757, -3443357348592608456, -3867369555498443551, -3606032365505199453, 458612298264247947, -990911556236338444, -3567908088463334192, 7450165448553565473, -5963647082639245007, -7938419236006887180, -3178626983328396412, -2366858862616935868, 6914926159380859275, 1483879124364805465, -6547351477720796458, -6968615697949295180, -5900065862775838322, -3983697879061704664, -6542949937169553779, -3414495331200089013, 2642610129339370100, -2858548199820433014, 211383771780500332, 
8601797220760160316, 172300028043658342, 6320587992748016843, -6012958947976741772, 973205644783078422, -4591334571100566658, -395646771075481983, -2405206232743394646, -929858687441420751, -8493989283486966391, -663672797910092149, 1664604425903602073, -4817812307515838538, 4127886456153855100, 3433960951770185831, -8140251277961778339, 1632311276169607909, -4518318492681372027, 5971437662375852051, -4415798652866769798, 1569845083638992367, -7985849684787394815, 2129824080546893913, 7098018678586638120, 780378183898194417, 6103429096176593197, -7050085566522277913, -4826328894877873266, -642162754761390133, 6556572881174326460, 6804672528331881225, 7904245421341331333, -800063994131700890, -6154417049109997506, -4027547119579080126, -3680723863647929642, 8714754348719589938, 7325452533743630083, 905034297321147957, 3288697737534049979, -7467134825333908308, -4830353164696999746, -4786359414579141008, -2851168185663217572, -1849303305513063199, 5747264797394444525, -1097201187066386653, 210168207066501541, -8771184042031226931, -2246085644763286912, 2727789141367505640, 6922218549077074152, 6665375491264621087, -421796115288114185, 5910505856771488586, 1221858477319857334, -5427294909339199760, 6691496144871184403, 9130255957642153722, -6014878435421761686, 8236329181209882066, 3008113070067010506, -585398302636352438, -4723506843766988094, 9103986645464215128, 5176536278660852104, -5294923803900852767, -7730001888207452277, -8984702201626944438, 9219017292486173348, 9117346164178134408, -689079972408618492, 6753651768033787538, 5634550712798799984, 2576851616749061304, -3647332396282185934, -6400779486513499350, -4714519167820546956, 4988427383081601709, 5104585783524001938, -2371312252419132585, -6172130783907659898, -6363797426307627053, -6423718704098759614, 5672886711910360127, -3470367200531792276, 2217990690910410207, 1683205516478513101, -5762845774329330223, 8745112632757820255, 2122883438539877718, 728664244813829160, -4431523980487374147, 8907576002763265837} for 127.0.19.2
INFO  2024-04-16 11:25:26,771 [shard 0: gms] gossip - Removed endpoint 127.0.19.2
INFO  2024-04-16 11:25:26,772 [shard 0: gms] gossip - InetAddress 4f70ff29-f312-451a-9701-3c25f84b1063/127.0.19.2 is now DOWN, status = LEFT
INFO  2024-04-16 11:25:36,655 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: Started to check if nodes={127.0.19.2} have left the cluster, coordinator=127.0.19.2
INFO  2024-04-16 11:25:36,655 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: Finished to check if nodes={127.0.19.2} have left the cluster, coordinator=127.0.19.2
INFO  2024-04-16 11:25:36,655 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: Marked ops done from coordinator=127.0.19.2

https://jenkins.scylladb.com/job/scylla-master/job/gating-dtest-release/9003/artifact/logs-full.release.001/1713266740135_update_cluster_layout_tests.py%3A%3ATestUpdateClusterLayout%3A%3Atest_simple_decommission_node_while_query_info%5B2%5D/node2.log

INFO  2024-04-16 11:25:26,645 [shard 0:strm] storage_service - DECOMMISSIONING: unbootstrap done
INFO  2024-04-16 11:25:26,645 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: becoming a group 0 non-voter
INFO  2024-04-16 11:25:26,645 [shard 0:strm] raft_group0 - becoming a non-voter (my id = 4f70ff29-f312-451a-9701-3c25f84b1063)...
INFO  2024-04-16 11:25:26,647 [shard 0:strm] raft_group0 - became a non-voter.
INFO  2024-04-16 11:25:26,647 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: became a group 0 non-voter
INFO  2024-04-16 11:25:26,648 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: leaving token ring
INFO  2024-04-16 11:25:26,651 [shard 0:strm] gossip - Node 127.0.19.2 will be removed from gossip at [2024-04-19 11:25:26]: (expire = 1713525926650808194, now = 1713266726651274423, diff = 259199 seconds)
INFO  2024-04-16 11:25:26,651 [shard 0:strm] storage_service - Removing tokens {5891315356779712207, 6644659875106048232, -2031446522143542976, -1339657334030648931, 5838800144473790431, -1087334779704532933, -8383015161740972323, 8773471205176591131, 6912798132461832631, 3717260124234998954, 8135580524606499218, 4933321621651553896, -3529854142061287875, 944908903197350322, 2323213516530356194, 5210426810099403453, -843934660261095933, 2759279211832045506, -1786456376184600316, 5025060517413836954, -4349969502544043667, -6428069016026695978, -7757793885581674551, 5053358897542410693, -9123492370604733927, -3906728608793360141, 8987579983231929028, 2166186930472097772, -7290137992606841054, -1605963558749861388, 6905126352450554444, -8250833397122565434, -2666532607924357436, 1215248352812595075, -8554757200446758708, 4000325743140511997, -7256096156876373137, 4754961149759252595, 4356042801330801630, -3585952018635665764, -681108285369060237, -890634243769899107, 1290479945171787923, 490508265641925696, -7638162489423952013, 5991668076577058186, -6846446430853575501, -418122622487981631, 1163362551414476681, 8341748621233819274, -4357689942768086381, -4793763540871442734, 4454469103699165864, 884970651562637249, -7995142503368220435, -8512729302308700154, 2184526208529264499, -2210124173410267824, -6001434137684958872, 7563710321617009371, -5213978271186008579, 281782964130013334, 2575262936716339675, 7288510930298791853, -3695537609538691868, -2026841578939377143, 1689254937207465260, 6769179491252691833, -1710928851820482709, -7495698103938377309, 1335835558472006077, -6608429648366051397, -6982370174209044761, -4892838996119274779, 4319900748812173419, 2351099028959321612, 4136573589411882082, -483737536017999223, 7508019147268439240, -3227366782462686254, -8493504934087279512, 5445728252211477690, 5596356869627243435, -3581469472219835472, 5765144158164561348, 3171612364226044170, -1069037103026158528, 4932196898354234264, 4841676013308949158, 3805248283753355501, -1660693761005670849, -6475166866379544332, 1283885037852735622, 73237807841118136, 7344535293637662646, -555840834402098957, -3220061649498783275, -5260705750725696223, 5065273683178989740, -3132056941341919104, -2128448015943753105, -6237895571494309166, -8257467152724812505, 4809056495046093060, 5286163765150215785, 2517480053976404036, 7934588651401386120, 8863440582040505883, -5355642581755262732, 8867323084617227568, -2861824630524274521, 5776956868809591593, 536605655663666810, 4839466488747731164, 346274002625390229, -1412568537117720854, 6703377118113130965, -8753625470583795148, -2482528345926185687, 4473861638386060030, -761785685472326945, -4653016610283253804, -7307002206290252570, 6790578762033027203, 7365873025645016848, -1308882598114554236, 2881939345991343725, 330356054951637239, -7471650378831701210, -3718044853112813470, -5237155506983993904, -4322707272056588769, 7274163241398874658, -4328273820866165713, 6233951190839803636, -5347988187907652033, -6998784162274967568, 6841645083700731738, 7531068057029355097, 7506190771825138757, -3443357348592608456, -3867369555498443551, -3606032365505199453, 458612298264247947, -990911556236338444, -3567908088463334192, 7450165448553565473, -5963647082639245007, -7938419236006887180, -3178626983328396412, -2366858862616935868, 6914926159380859275, 1483879124364805465, -6547351477720796458, -6968615697949295180, -5900065862775838322, -3983697879061704664, -6542949937169553779, -3414495331200089013, 2642610129339370100, -2858548199820433014, 211383771780500332, 
8601797220760160316, 172300028043658342, 6320587992748016843, -6012958947976741772, 973205644783078422, -4591334571100566658, -395646771075481983, -2405206232743394646, -929858687441420751, -8493989283486966391, -663672797910092149, 1664604425903602073, -4817812307515838538, 4127886456153855100, 3433960951770185831, -8140251277961778339, 1632311276169607909, -4518318492681372027, 5971437662375852051, -4415798652866769798, 1569845083638992367, -7985849684787394815, 2129824080546893913, 7098018678586638120, 780378183898194417, 6103429096176593197, -7050085566522277913, -4826328894877873266, -642162754761390133, 6556572881174326460, 6804672528331881225, 7904245421341331333, -800063994131700890, -6154417049109997506, -4027547119579080126, -3680723863647929642, 8714754348719589938, 7325452533743630083, 905034297321147957, 3288697737534049979, -7467134825333908308, -4830353164696999746, -4786359414579141008, -2851168185663217572, -1849303305513063199, 5747264797394444525, -1097201187066386653, 210168207066501541, -8771184042031226931, -2246085644763286912, 2727789141367505640, 6922218549077074152, 6665375491264621087, -421796115288114185, 5910505856771488586, 1221858477319857334, -5427294909339199760, 6691496144871184403, 9130255957642153722, -6014878435421761686, 8236329181209882066, 3008113070067010506, -585398302636352438, -4723506843766988094, 9103986645464215128, 5176536278660852104, -5294923803900852767, -7730001888207452277, -8984702201626944438, 9219017292486173348, 9117346164178134408, -689079972408618492, 6753651768033787538, 5634550712798799984, 2576851616749061304, -3647332396282185934, -6400779486513499350, -4714519167820546956, 4988427383081601709, 5104585783524001938, -2371312252419132585, -6172130783907659898, -6363797426307627053, -6423718704098759614, 5672886711910360127, -3470367200531792276, 2217990690910410207, 1683205516478513101, -5762845774329330223, 8745112632757820255, 2122883438539877718, 728664244813829160, -4431523980487374147, 8907576002763265837} for 127.0.19.2
INFO  2024-04-16 11:25:26,653 [shard 0:strm] gossip - Removed endpoint 127.0.19.2
INFO  2024-04-16 11:25:26,655 [shard 0:main] service_level_controller - update_from_distributed_data: configuration polling loop aborted
INFO  2024-04-16 11:25:26,655 [shard 0:strm] storage_service - Announcing that I have left the ring for 10000ms
INFO  2024-04-16 11:25:36,655 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: left token ring
INFO  2024-04-16 11:25:36,655 [shard 0:strm] node_ops - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: Stopped heartbeat_updater
INFO  2024-04-16 11:25:36,655 [shard 0:strm] node_ops - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: Started decommission_done[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: ignore_nodes={}, leaving_nodes={127.0.19.2}, replace_nodes={}, bootstrap_nodes={}, repair_tables={}
INFO  2024-04-16 11:25:36,655 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: Started to check if nodes={127.0.19.2} have left the cluster, coordinator=127.0.19.2
INFO  2024-04-16 11:25:36,655 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: Finished to check if nodes={127.0.19.2} have left the cluster, coordinator=127.0.19.2
INFO  2024-04-16 11:25:36,655 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: Marked ops done from coordinator=127.0.19.2
INFO  2024-04-16 11:25:36,657 [shard 0:strm] node_ops - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: Finished decommission_done[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: ignore_nodes={}, leaving_nodes={127.0.19.2}, replace_nodes={}, bootstrap_nodes={}, repair_tables={}
INFO  2024-04-16 11:25:36,657 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: leaving Raft group 0
INFO  2024-04-16 11:25:36,657 [shard 0:strm] raft_group0 - leaving group 0 (my id = 4f70ff29-f312-451a-9701-3c25f84b1063)...
INFO  2024-04-16 11:25:36,658 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: left Raft group 0

https://jenkins.scylladb.com/job/scylla-master/job/gating-dtest-release/9003/artifact/logs-full.release.001/1713266740135_update_cluster_layout_tests.py%3A%3ATestUpdateClusterLayout%3A%3Atest_simple_decommission_node_while_query_info%5B2%5D/node3.log

INFO  2024-04-16 11:25:16,766 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: Added node=127.0.19.2 as leaving node, coordinator=127.0.19.2
INFO  2024-04-16 11:25:26,798 [shard 0: gms] gossip - Node 127.0.19.2 will be removed from gossip at [2024-04-19 11:25:26]: (expire = 1713525926650808194, now = 1713266726798554526, diff = 259199 seconds)
INFO  2024-04-16 11:25:26,798 [shard 0: gms] storage_service - Removing tokens {5891315356779712207, 6644659875106048232, -2031446522143542976, -1339657334030648931, 5838800144473790431, -1087334779704532933, -8383015161740972323, 8773471205176591131, 6912798132461832631, 3717260124234998954, 8135580524606499218, 4933321621651553896, -3529854142061287875, 944908903197350322, 2323213516530356194, 5210426810099403453, -843934660261095933, 2759279211832045506, -1786456376184600316, 5025060517413836954, -4349969502544043667, -6428069016026695978, -7757793885581674551, 5053358897542410693, -9123492370604733927, -3906728608793360141, 8987579983231929028, 2166186930472097772, -7290137992606841054, -1605963558749861388, 6905126352450554444, -8250833397122565434, -2666532607924357436, 1215248352812595075, -8554757200446758708, 4000325743140511997, -7256096156876373137, 4754961149759252595, 4356042801330801630, -3585952018635665764, -681108285369060237, -890634243769899107, 1290479945171787923, 490508265641925696, -7638162489423952013, 5991668076577058186, -6846446430853575501, -418122622487981631, 1163362551414476681, 8341748621233819274, -4357689942768086381, -4793763540871442734, 4454469103699165864, 884970651562637249, -7995142503368220435, -8512729302308700154, 2184526208529264499, -2210124173410267824, -6001434137684958872, 7563710321617009371, -5213978271186008579, 281782964130013334, 2575262936716339675, 7288510930298791853, -3695537609538691868, -2026841578939377143, 1689254937207465260, 6769179491252691833, -1710928851820482709, -7495698103938377309, 1335835558472006077, -6608429648366051397, -6982370174209044761, -4892838996119274779, 4319900748812173419, 2351099028959321612, 4136573589411882082, -483737536017999223, 7508019147268439240, -3227366782462686254, -8493504934087279512, 5445728252211477690, 5596356869627243435, -3581469472219835472, 5765144158164561348, 3171612364226044170, -1069037103026158528, 4932196898354234264, 4841676013308949158, 3805248283753355501, -1660693761005670849, -6475166866379544332, 1283885037852735622, 73237807841118136, 7344535293637662646, -555840834402098957, -3220061649498783275, -5260705750725696223, 5065273683178989740, -3132056941341919104, -2128448015943753105, -6237895571494309166, -8257467152724812505, 4809056495046093060, 5286163765150215785, 2517480053976404036, 7934588651401386120, 8863440582040505883, -5355642581755262732, 8867323084617227568, -2861824630524274521, 5776956868809591593, 536605655663666810, 4839466488747731164, 346274002625390229, -1412568537117720854, 6703377118113130965, -8753625470583795148, -2482528345926185687, 4473861638386060030, -761785685472326945, -4653016610283253804, -7307002206290252570, 6790578762033027203, 7365873025645016848, -1308882598114554236, 2881939345991343725, 330356054951637239, -7471650378831701210, -3718044853112813470, -5237155506983993904, -4322707272056588769, 7274163241398874658, -4328273820866165713, 6233951190839803636, -5347988187907652033, -6998784162274967568, 6841645083700731738, 7531068057029355097, 7506190771825138757, -3443357348592608456, -3867369555498443551, -3606032365505199453, 458612298264247947, -990911556236338444, -3567908088463334192, 7450165448553565473, -5963647082639245007, -7938419236006887180, -3178626983328396412, -2366858862616935868, 6914926159380859275, 1483879124364805465, -6547351477720796458, -6968615697949295180, -5900065862775838322, -3983697879061704664, -6542949937169553779, -3414495331200089013, 2642610129339370100, -2858548199820433014, 211383771780500332, 
8601797220760160316, 172300028043658342, 6320587992748016843, -6012958947976741772, 973205644783078422, -4591334571100566658, -395646771075481983, -2405206232743394646, -929858687441420751, -8493989283486966391, -663672797910092149, 1664604425903602073, -4817812307515838538, 4127886456153855100, 3433960951770185831, -8140251277961778339, 1632311276169607909, -4518318492681372027, 5971437662375852051, -4415798652866769798, 1569845083638992367, -7985849684787394815, 2129824080546893913, 7098018678586638120, 780378183898194417, 6103429096176593197, -7050085566522277913, -4826328894877873266, -642162754761390133, 6556572881174326460, 6804672528331881225, 7904245421341331333, -800063994131700890, -6154417049109997506, -4027547119579080126, -3680723863647929642, 8714754348719589938, 7325452533743630083, 905034297321147957, 3288697737534049979, -7467134825333908308, -4830353164696999746, -4786359414579141008, -2851168185663217572, -1849303305513063199, 5747264797394444525, -1097201187066386653, 210168207066501541, -8771184042031226931, -2246085644763286912, 2727789141367505640, 6922218549077074152, 6665375491264621087, -421796115288114185, 5910505856771488586, 1221858477319857334, -5427294909339199760, 6691496144871184403, 9130255957642153722, -6014878435421761686, 8236329181209882066, 3008113070067010506, -585398302636352438, -4723506843766988094, 9103986645464215128, 5176536278660852104, -5294923803900852767, -7730001888207452277, -8984702201626944438, 9219017292486173348, 9117346164178134408, -689079972408618492, 6753651768033787538, 5634550712798799984, 2576851616749061304, -3647332396282185934, -6400779486513499350, -4714519167820546956, 4988427383081601709, 5104585783524001938, -2371312252419132585, -6172130783907659898, -6363797426307627053, -6423718704098759614, 5672886711910360127, -3470367200531792276, 2217990690910410207, 1683205516478513101, -5762845774329330223, 8745112632757820255, 2122883438539877718, 728664244813829160, -4431523980487374147, 8907576002763265837} for 127.0.19.2
INFO  2024-04-16 11:25:26,801 [shard 0: gms] gossip - removed 127.0.19.2 from _seeds, updated _seeds list = {127.0.19.1}
INFO  2024-04-16 11:25:26,803 [shard 0: gms] gossip - Removed endpoint 127.0.19.2
INFO  2024-04-16 11:25:26,803 [shard 0: gms] gossip - InetAddress 4f70ff29-f312-451a-9701-3c25f84b1063/127.0.19.2 is now DOWN, status = LEFT
INFO  2024-04-16 11:25:36,655 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: Started to check if nodes={127.0.19.2} have left the cluster, coordinator=127.0.19.2
INFO  2024-04-16 11:25:36,655 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: Finished to check if nodes={127.0.19.2} have left the cluster, coordinator=127.0.19.2
INFO  2024-04-16 11:25:36,655 [shard 0:strm] storage_service - decommission[e322f230-5e23-41a6-9370-d2c01a1eb5f4]: Marked ops done from coordinator=127.0.19.2

Maybe without consistent topology changes this test is bound to be flaky...
@kbr-scylla WDYT?

bhalevy (Member) commented Apr 17, 2024

@Annamikhlin the recent case is with rf=2 and it could be a distinct issue (compared to rf=1).

bhalevy (Member) commented Apr 17, 2024

A wild guess:
The coordinator races with decommission and uses a stale effective replication map that still expects a replica on node2, which is being decommissioned concurrently; by the time storage_proxy tries to read the data from node2, the node has already completed decommission.

bhalevy (Member) commented Apr 17, 2024

@xemul mentions that #17124 (comment) could be related.

bhalevy (Member) commented Apr 17, 2024

This seems like a race between the node state in e_r_m and the gossip state:

inet_address_vector_replica_set storage_proxy::get_endpoints_for_reading(const sstring& ks_name, const locator::effective_replication_map& erm, const dht::token& token) const {
    auto endpoints = erm.get_endpoints_for_reading(token);
    auto it = boost::range::remove_if(endpoints, std::not_fn(std::bind_front(&storage_proxy::is_alive, this)));
    endpoints.erase(it, endpoints.end());

bhalevy (Member) commented Apr 17, 2024

Maybe we should reconsider returning an unavailable error in this case, when we don't have enough live nodes to satisfy the CL.
It might be better to attempt to contact even the nodes we think are down, let the RPC fail or time out, and wait for the client to retry.

bhalevy (Member) commented Apr 17, 2024

Also, as @xemul suggests, we could keep the node liveness state in locator::node (available via locator::topology) and update it via the storage_service, so it would stick to the effective_replication_map / token_metadata / topology version in use.

Cc @avikivity @tgrabiec @kbr-scylla @gleb-cloudius @kostja

kbr-scylla (Contributor) commented:

#17903 (comment)

A similar situation here: #17786 (comment):
decommission racing with a coordinated write; the write uses the "wrong" replica set (in that case there was a crash).

kbr-scylla (Contributor) commented Apr 18, 2024

@bhalevy

Reads to the decommissioning node should be drained before we mark it as dead or purge it from the cluster.

The correct decommissioning procedure is as follows:

  • start double writing -- to old and new replicas
  • stream data from old to new replicas
  • start reading from new replicas. Drain reads that use old replicas.
  • start writing only to new replicas. Drain writes that go to both.
  • Only now can we mark the node as dead and/or purge it from the cluster.

We don't need any hacks like storing liveness in locator::topology or retrying in the driver or whatever.
We just need to follow the correct order of operations, and then all reads and writes will naturally be successful.
There should be no races.
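
A minimal sketch (illustrative only, not Scylla code) of the ordering described above, where the coordinator's replica sets only move forward after in-flight requests using the previous sets have been drained:

from enum import Enum

class DecommissionPhase(Enum):
    WRITE_BOTH_READ_OLD = 1   # double-write to old and new replicas, still read old
    WRITE_BOTH_READ_NEW = 2   # reads moved to new replicas, reads using old replicas drained
    WRITE_NEW_READ_NEW = 3    # writes moved to new replicas, double-writes drained
    NODE_REMOVED = 4          # only now may the leaving node be marked dead or purged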

I believe we did follow the correct order of operations. There had to be a regression at some point, which caused this issue to pop up. And issues like #17786 (comment).

The problem is that it's very easy to break the order if we are not extra careful. We need good, precise tests in test.py that would not become flaky once we introduce a regression like that, but catch the problem instantly.

tgrabiec (Contributor) commented:

@kbr-scylla This is a run without consistent topology, so there's no synchronization.

kbr-scylla (Contributor) commented:

There is -- through node_ops RPCs and ring_delay sleeps.

It is not fault tolerant and it is not reliable, but in the happy case, when there are no network problems and no node failures, it should work. We should not be observing data path failures.

It worked at some point. This issue did not exist forever. There was a regression.

yaronkaikov (Contributor) commented:

Seen again in https://jenkins.scylladb.com/job/scylla-master/job/next/7549/ (still running)

kbr-scylla added this to the 6.0 milestone on May 7, 2024
kbr-scylla added the symptom/ci stability (Issues that failed in ScyllaDB CI - tests and framework) label and removed the triage/master (Looking for assignee) label on May 7, 2024
mykaul added the P1 Urgent label on May 15, 2024
mykaul modified the milestones: 6.0, 6.1 on May 15, 2024
gleb-cloudius (Contributor) commented:

There is -- through node_ops RPCs and ring_delay sleeps.

Where exactly? There is no ring delay with node_ops between write_both_read_old and write_both_read_new state because there is no write_both_read_new state. It goes directly from write_both_read_old to write_new_read_new (AKA normal).

The problem here happens during a table scan ("select * from..."). This is a long-running operation that holds an erm with a topology. If a node is decommissioned while a scan read is running, the read will still target it according to the topology it is working with, but the node will be declared dead (non-existing nodes are dead in the gossiper::is_alive implementation), so the read will fail.
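
An illustrative sketch (hypothetical names, not Scylla code) of that race: the scan keeps the replica set it started with, while liveness is checked against the current cluster view, so a replica that has already left the ring is filtered out and the remaining endpoints cannot satisfy the consistency level:

def endpoints_for_reading(replicas_at_scan_start, alive_now):
    # replicas_at_scan_start: replicas per the erm captured when the scan began
    # alive_now: nodes the gossiper currently considers alive
    return [ep for ep in replicas_at_scan_start if ep in alive_now]

replicas_at_scan_start = ["node2", "node3"]   # rf=2 before decommission
alive_now = {"node1", "node3"}                # node2 already removed from gossip
endpoints = endpoints_for_reading(replicas_at_scan_start, alive_now)
required = 2                                  # CL=TWO
if len(endpoints) < required:
    raise RuntimeError(f"Unavailable: requires {required}, alive {len(endpoints)}")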

It worked at some point. This issue did not exist forever. There was a regression.

At what point do you think it worked?

gleb-cloudius (Contributor) commented:

Reproduces more easily with:

diff --git a/service/storage_proxy.cc b/service/storage_proxy.cc
index 21ec5f8ba3..f9f23c17ee 100644
--- a/service/storage_proxy.cc
+++ b/service/storage_proxy.cc
@@ -5842,6 +5842,7 @@ storage_proxy::query_partition_key_range_concurrent(storage_proxy::clock_type::t
                     co_await coroutine::maybe_yield();
                 }
             }
+            co_await sleep(std::chrono::milliseconds(100));
             slogger.trace("creating range read executor for range {} in table {}.{} with targets {}",
                         range, schema->ks_name(), schema->cf_name(), filtered_endpoints);
             try {

kbr-scylla (Contributor) commented:

So it looks like this is not a regression; the problem has been there forever, and the test got modified at some point and started catching it: #17124 (comment)

@fruch mentioned:

It is known that during topology changes we can get client-side errors (in c-s we have a default retry of 10 times), so I think we need better retries in this case.

There were attempts to fix the flakiness, but they succeeded only partially. https://github.com/scylladb/scylla-dtest/pull/4036.

There's no point in fixing it for gossip mode. We moved to raft-based topology to fix issues like this one among other things.

The most sensible way out, IMO, is to disable the test in gossip mode. @temichus reassigning to you.

And taking off the P1 label.

kbr-scylla assigned kostja and temichus and unassigned kbr-scylla on Jun 4, 2024
kbr-scylla added the area/gossip label and removed the P1 Urgent label on Jun 4, 2024
temichus (Contributor) commented Jun 4, 2024

PR to disable this test for gossip topology changes: https://github.com/scylladb/scylla-dtest/pull/4398
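
For reference, a hypothetical sketch of how such a skip could look when applied to the existing test method; the way the topology mode is detected here (an environment variable) is an assumption, and the actual scylla-dtest change may use a fixture or marker instead:

import os
import pytest

# Assumed flag; the real dtest harness may expose the topology mode differently.
GOSSIP_TOPOLOGY = os.environ.get("SCYLLA_TOPOLOGY_MODE", "raft") == "gossip"

@pytest.mark.skipif(GOSSIP_TOPOLOGY,
                    reason="flaky with gossip-based topology changes; see scylladb/scylladb#17903")
@pytest.mark.parametrize("rf", [1, 2])
def test_simple_decommission_node_while_query_info(self, rf):
    ...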

kbr-scylla (Contributor) commented:

The test was disabled in gossip mode, and it should not be flaky in raft-topo mode -- closing.

We can reopen if we observe it in raft-topo mode.

kbr-scylla closed this as not planned (won't fix, can't repro, duplicate, stale) on Jun 5, 2024