node.start(wait_other_notice=True, ...) may return too early because implementation of wait_other_notice may observe past state #563

Closed
kbr-scylla opened this issue Mar 8, 2024 · 4 comments · Fixed by #564

wait_other_notice in ccmlib/scylla_node.py currently works by querying the /gossiper/live REST API endpoint:

    if wait_other_notice:

    def watch_rest_for_alive(self, nodes, timeout=120, wait_normal_token_owner=True):

If other nodes see the node as UP, and the node sees other nodes as UP, the check will succeed.
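
For illustration, here is a minimal sketch (not the actual ccm code) of what such a REST-based check boils down to: poll each other node's API until this node's address shows up in its list of live endpoints. The endpoint path and JSON handling are simplified assumptions.

    import time
    import requests

    def wait_seen_as_live(node_address, other_api_addresses, timeout=120):
        # Hypothetical helper: poll each other node's gossiper REST API until
        # it reports node_address among its live endpoints, or time out.
        deadline = time.time() + timeout
        remaining = set(other_api_addresses)
        while remaining and time.time() < deadline:
            for api in list(remaining):
                try:
                    live = requests.get(f"http://{api}/gossiper/live", timeout=5).json()
                except (requests.RequestException, ValueError):
                    continue  # that node's REST API may not be reachable yet
                if node_address in live:
                    remaining.discard(api)
            time.sleep(1)
        if remaining:
            raise TimeoutError(f"{sorted(remaining)} never reported {node_address} as live")

Note that nothing in such a check ties the "live" observation to this particular restart: if the other node never noticed the kill, the address is already in its live list and the check passes immediately, which is exactly the race described next.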

But consider the following:

  • 2 nodes A, B
  • kill node A
  • start node A with wait_other_notice=True

The start call will often return quickly, but not because node B noticed that A restarted but because node B didn't even notice that A was killed yet!

So node B may still consider A as UP because it was UP before it was killed; B's failure detector just didn't react to that event yet.

That's why the old implementation in ccmlib/node.py used log markers: it watched other nodes' logs from a point after the node was restarted, effectively serializing the events.
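
A rough sketch of that marker-based idea, using ccm-style mark_log / watch_log_for helpers (the exact "is now UP" log pattern is an approximation):

    # Record a log position on each other node right before (re)starting the
    # node, then only accept an "is now UP" line written after that position.
    markers = {other: other.mark_log() for other in other_nodes}
    node.start(wait_other_notice=False)
    for other, mark in markers.items():
        other.watch_log_for(f"{node.address()} is now UP", from_mark=mark, timeout=120)

Because a match can only come from log lines written after the marker, a stale pre-kill UP state cannot satisfy the wait.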

This bug was introduced in #462.

cc @temichus

@kbr-scylla (author) commented:

We could fix it by combining the old behavior (with log markers) with the new behavior (watching REST). For example -- we could wait for the "is now UP" message first, then use watch_rest_for_alive.

Although, looking at the Scylla code, I don't really understand why "is now UP" is not sufficient; it happens strictly after the live_endpoints set in the gossiper is modified, which is what the /gossiper/live REST API reads.
Perhaps the REST check is needed for some older versions of Scylla.
And I see that watch_rest_for_alive contains additional logic for waiting for NORMAL too, so it is still useful.
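
Concretely, a sketch of that combination could look roughly like this (helper names follow ccm's node API; the call site inside the real start path is simplified):

    def start_and_wait_other_notice(node, other_nodes):
        # 1. Serialize on log markers: only accept an "is now UP" message
        #    logged after the node was (re)started.
        marks = [(other, other.mark_log()) for other in other_nodes]
        node.start(wait_other_notice=False)
        for other, mark in marks:
            other.watch_log_for_alive(node, from_mark=mark)
        # 2. Keep the REST check, which additionally waits for the node to be
        #    seen as a NORMAL token owner.
        node.watch_rest_for_alive(other_nodes, wait_normal_token_owner=True)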

@kbr-scylla (author) commented:

Example failure:
https://github.com/scylladb/scylla-dtest/pull/4042/files

the test sometimes fails with

        # starting node1 - it should reconnect and run as is
        node1.start(wait_other_notice=True, wait_for_binary_proto=True)
        #  due to CL=1 it does not make sense to read while not all nodes are UP
        # for node in cluster.nodelist():
        #     wait_for_nodes_status(node, ["UN", "UN", "UN"])
    
>       result = list(session.execute("SELECT * FROM cf"))

update_cluster_layout_tests.py:1136: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
cassandra/cluster.py:2726: in cassandra.cluster.Session.execute
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   cassandra.Unavailable: Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level for cl ONE. Requires 1, alive 0" info={'consistency': 'ONE', 'required_replicas': 1, 'alive_replicas': 0}

(https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/artsiom_mishuta/job/dtest-pytest-gating/240/testReport/junit/update_cluster_layout_tests/TestUpdateClusterLayout/test_simple_kill_remained_node_while_decommissioning_96_100_/)

because we execute the SELECT too fast: start returned too early, then node2 marked node1 as DOWN, and we executed the SELECT before node2 marked node1 as UP again.

bhalevy added a commit to bhalevy/scylla-ccm that referenced this issue Mar 12, 2024
…r_alive

scylladb#462 introduced
watch_rest_for_alive, which replaced the calls to watch_log_for_alive
on the scylla node(s) start path.

But if a node is killed and then restarted, and other nodes
miss the kill event, `watch_rest_for_alive` will consider
that node as already up as seen by the other nodes,
whereas previously watch_log_for_alive waited until
other nodes discovered this node as up again, based
on markers taken right before (re)starting that node.

This change brings this call back.

Fixes scylladb#563

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
@nyh (contributor) commented Mar 14, 2024:

But consider the following:

* 2 nodes A, B
* kill node A
* start node A with `wait_other_notice=True`

The start call will often return quickly, but not because node B noticed that A restarted but because node B didn't even notice that A was killed yet!

I see that you and Benny liked #564 so I'll merge it, but I want to point out that I think the real problem in your example above is that "kill node A" neglects to wait for node A to be killed. If you want to do "kill node A" followed in quick succession by "start node A", it doesn't make sense to not wait for node A to be killed before starting it again.

The benefit of #564 is that it allows the test to succeed despite the test bug of forgetting to wait for "kill node A" to finish, so in that sense it is a good patch: it makes tests harder to write incorrectly.

@kbr-scylla (author) commented:

but I want to point out that I think the real problem in your example above is that "kill node A" neglects to wait for node A to be killed. If you want to do "kill node A" followed in quick succession by "start node A", it doesn't make sense to not wait for node A to be killed before starting it again.

I see it differently. The semantics of wait_other_notice should be as the name suggests: we wait until other nodes notice this particular start event. (Not some earlier start event.)

The test is correct when written under this assumption, with these semantics. (And it was indeed so before #462.)

fruch pushed a commit that referenced this issue Mar 15, 2024