storage_service: Replicate and advertise tokens early in the boot up process #4710
Conversation
On Mon, Jul 15, 2019 at 07:20:43PM -0700, Asias He wrote:
> I don't understand this explanation. What's `proxy_service`? Why is a race between gossip and token replication causing problems? The replica side is not supposed to need token metadata. Maybe the problem is that there's a race between token metadata replication and CQL server init?
Sorry, I meant storage_proxy.cc.
I will give you an example of the race:
n1, n2
n2 is down, n1 thinks n2 is down
n2 starts again, n2 starts the gossip service, n1 thinks n2 is up and sends reads/writes to n2, but n2 hasn't replicated the token_metadata to all the shards.
Regular reads/writes do not consult token_metadata on a replica. Counters
and MV updates do, though.
n2 complains:
```
[shard 52] token_metadata - sorted_tokens is empty in first_token_index!
[shard 45] token_metadata - sorted_tokens is empty in first_token_index!
[shard 38] token_metadata - sorted_tokens is empty in first_token_index!
[shard 41] token_metadata - sorted_tokens is empty in first_token_index!
[shard 7] token_metadata - sorted_tokens is empty in first_token_index!
[shard 7] token_metadata - sorted_tokens is empty in first_token_index!
[shard 47] storage_proxy - Failed to apply mutation from $ip#4: std::runtime_error (sorted_tokens is empty in first_token_index!)
```
--
Gleb.
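For readers unfamiliar with the error above, here is a minimal, self-contained sketch of the kind of per-shard check that produces it. This is illustrative only, not Scylla's actual code; the type and field names are assumptions. The point is that a token lookup on a shard whose local copy of the ring has not been populated yet throws, so a counter or MV update that arrives before the replication step has run on that shard fails with exactly this message.

```cpp
// Illustrative sketch only: a per-shard copy of the token ring. Until the
// replication step fills sorted_tokens on this shard, any lookup throws,
// which is what the "sorted_tokens is empty in first_token_index!" lines show.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <vector>

struct shard_token_metadata {
    std::vector<int64_t> sorted_tokens;  // filled by the per-shard replication step

    // Index of the first token >= t, wrapping to 0 past the last token.
    // Throws if this shard's copy has not been populated yet.
    std::size_t first_token_index(int64_t t) const {
        if (sorted_tokens.empty()) {
            throw std::runtime_error("sorted_tokens is empty in first_token_index!");
        }
        auto it = std::lower_bound(sorted_tokens.begin(), sorted_tokens.end(), t);
        return it == sorted_tokens.end() ? 0 : std::size_t(it - sorted_tokens.begin());
    }
};

int main() {
    shard_token_metadata shard;                 // e.g. shard 52 right after restart
    try {
        shard.first_token_index(42);            // a remote write arrives too early
    } catch (const std::runtime_error& e) {
        std::printf("request fails: %s\n", e.what());
    }
    shard.sorted_tokens = {-100, 0, 100};       // after tokens are replicated to this shard
    std::printf("after replication, index = %zu\n", shard.first_token_index(42));
    return 0;
}
```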
storage_service: Replicate and advertise tokens early in the boot up process

When a node is restarted, there is a race between the start of gossip
(other nodes will mark this node up again and send requests) and the
replication of the tokens to other shards. Here is an example:

- n1, n2
- n2 is down, n1 thinks n2 is down
- n2 starts again, n2 starts the gossip service, n1 thinks n2 is up and
  sends reads/writes to n2, but n2 hasn't replicated the token_metadata
  to all the shards.
- n2 complains:

      token_metadata - sorted_tokens is empty in first_token_index!
      token_metadata - sorted_tokens is empty in first_token_index!
      token_metadata - sorted_tokens is empty in first_token_index!
      token_metadata - sorted_tokens is empty in first_token_index!
      token_metadata - sorted_tokens is empty in first_token_index!
      token_metadata - sorted_tokens is empty in first_token_index!
      storage_proxy - Failed to apply mutation from $ip#4: std::runtime_error (sorted_tokens is empty in first_token_index!)

The code path looks like below:

      0 storage_service::init_server
      1 prepare_to_join()
      2 add gossip application state of NET_VERSION, SCHEMA and so on
      3 _gossiper.start_gossiping().get()
      4 join_token_ring()
      5 _token_metadata.update_normal_tokens(tokens, get_broadcast_address());
      6 replicate_to_all_cores().get()
      7 storage_service::set_gossip_tokens(), which adds the gossip application state of TOKENS and STATUS

The race described above is between line 3 and line 6.

To fix, we replicate the token_metadata early, after it is filled with
the tokens read from the system table and before gossip starts, so that
when other nodes think this restarting node is up, the tokens are
already replicated to all the shards.

In addition, this patch also fixes the issue that other nodes might see
a node missing the TOKENS and STATUS application state in gossip if
that node failed in the middle of a restart, i.e., it was killed after
line 3 and before line 7. As a result we could not replace the node.

Tests: update_cluster_layout_tests.py
Fixes: scylladb#4709
Fixes: scylladb#4723
89c4525 to e645b03
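To make the commit's ordering change easier to follow, here is a hedged sketch of the boot sequence before and after the fix. The function names mirror the code path listed in the commit message, but the bodies are placeholders, not Scylla's implementation.

```cpp
// Sketch of the ordering change described in the commit message above.
// Names follow the commit message's code path; bodies are placeholders.
#include <cstdint>
#include <vector>

static std::vector<int64_t> read_tokens_from_system_table() { return {-100, 0, 100}; }
static void update_normal_tokens(const std::vector<int64_t>&) { /* fill local token_metadata */ }
static void replicate_to_all_cores() { /* copy token_metadata to every shard */ }
static void set_gossip_tokens() { /* advertise TOKENS and STATUS via gossip */ }
static void start_gossiping() { /* other nodes may now mark this node UP and send requests */ }

// Before the fix: gossip starts (line 3) before the tokens are replicated
// (line 6), so remote requests can hit shards with empty token_metadata.
static void boot_before_fix() {
    start_gossiping();
    auto tokens = read_tokens_from_system_table();
    update_normal_tokens(tokens);
    replicate_to_all_cores();   // too late: the node is already marked up
    set_gossip_tokens();
}

// After the fix: tokens are loaded, replicated to all shards, and advertised
// before gossip starts, closing the window for the restart case.
static void boot_after_fix() {
    auto tokens = read_tokens_from_system_table();
    update_normal_tokens(tokens);
    replicate_to_all_cores();
    set_gossip_tokens();
    start_gossiping();
}

int main() {
    boot_after_fix();
    (void)&boot_before_fix;     // kept only for comparison
    return 0;
}
```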
I pushed a new branch.
@tgrabiec please re-review
The fix for #4723 looks good. I am not convinced that #4709 is fixed, though. What prevents requests from other nodes from arriving at the node before you load and replicate the tokens? You do it earlier, but the server is already up, right? The other nodes may not have noticed that the node is down before they sent the requests.
If other nodes do not notice this node is down, they can send requests at any time during this node's restart, and this node might not be fully initialized. It won't be a big problem, because the request will fail since the node is not fully initialized. This patch completely eliminates the race if other nodes have noticed the node is down, which is very likely because a restart sends a SHUTDOWN message. Even if they do not notice, this patch reduces the big race window to a tiny one.
Is it guaranteed to fail? Or is it undefined behavior? Isn't the fact that the request fails the problem which this pull request is trying to fix?
Well, I think we should eliminate the race completely rather than add complexity to reduce the likelihood of failure.
If the request queries the token_metadata, it is guaranteed to fail because we throw if it is not initialized.
It tries to fix the restarted-node case.
This patch tries to fix the restarted-node case, and it fixes it 100%. For the case where a node does not notice that this node is down, it also helps, but not 100%. We can add a new issue to track that.
On Fri, Jul 26, 2019 at 6:06 PM Asias He ***@***.***> wrote:
> If the request queries the token_metadata, it is guaranteed to fail
> because we throw if it is not initialized.
Yes. But it may rely on something else, which may result in more subtle
errors. In general, the system may not behave correctly if you allow
requests to act on an uninitialized system. The proper fix is to not let
requests in when the system is in such a state.
Initializing things sooner, as this PR does, may reduce the likelihood of
error but doesn't eliminate it. We should solve the problem completely now
that we have identified it, and not cover up some cases and wait until a
disaster happens.
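A hypothetical sketch of the kind of gate described here (this is not what the PR or Scylla implements; the class and names are illustrative): remote-request handlers check an initialization flag and refuse cleanly until boot has finished loading and replicating the token metadata, instead of racing against it.

```cpp
// Hypothetical request gate, for illustration only.
#include <atomic>
#include <stdexcept>

class request_gate {
    std::atomic<bool> _ready{false};
public:
    // Boot code calls this only after token metadata is loaded and
    // replicated to all shards.
    void mark_ready() { _ready.store(true, std::memory_order_release); }

    // Every remote-request handler calls this first.
    void check() const {
        if (!_ready.load(std::memory_order_acquire)) {
            // Refuse cleanly instead of touching uninitialized state.
            throw std::runtime_error("node is still starting up");
        }
    }
};

int main() {
    request_gate gate;
    try { gate.check(); } catch (const std::runtime_error&) { /* rejected early */ }
    gate.mark_ready();
    gate.check();   // allowed once initialization is complete
    return 0;
}
```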
On Mon, Jul 29, 2019, 18:36 Tomasz Grabiec ***@***.***> wrote:
> Initializing things sooner, as this PR does, may reduce the likelihood of
> error but doesn't eliminate it. We should solve the problem completely now
> that we have identified it, and not cover up some cases and wait until a
> disaster happens.
As I mentioned, we should create a new issue to track the proxy service startup
and early-request race in general. This patch already fixes the original issue.
I created #4771 to track the storage proxy initialization issue.