Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Later updates to EpochSlots can overwrite earlier updates #17711

Closed
carllin opened this issue Jun 3, 2021 · 1 comment
Closed

Later updates to EpochSlots can overwrite earlier updates #17711

carllin opened this issue Jun 3, 2021 · 1 comment
Assignees

Comments

@carllin
Copy link
Contributor

carllin commented Jun 3, 2021

Problem

We currently track 255 EpochSlots bitvectors that are updated via push_epoch_slots(slots) where we:

  1. Read out all the individually indexed EpochSlots here:
    let epoch_slots = gossip.crds.get(&label)?.value.epoch_slots()?;
  2. Update the appropriate EpochSlots, then push them to the pending push queue
    self.local_message_pending_push_queue

An issue arises when there are two updates to the same indexed EpochSlots structure (say index 0), before the asynchronous flush of the pending flush queue happens in the gossip thread, so the updated value is not yet observable in the Crds table.

In this case, the second update will read out a stale EpochSlots from the Crds table in step 1) above that doesn't include the first update, updates that value with the second update, then pushes that item to the pending flush queue. Then when the push happens later, the later update doesn't have the first update, thereby overwriting the first update when both values are pushed to the network.

We could probably simulate the problem in this test if we got rid of the calls to flush_push_queue() in between the updates

cluster_info.push_epoch_slots(&range[..16000]);
cluster_info.flush_push_queue();
.

Proposed Solution

Should we also check the push queue for more recent versions of the same message label in crds.get()?

behzadnouri added a commit to behzadnouri/solana that referenced this issue Jun 3, 2021
epoch-slots may be overwritten before they are written to crds table:
solana-labs#17711

This commit writes new epoch-slots to crds table synchronously with
push_epoch_slots. The functions is still not thread-safe as commented in
the code, however currently only one threads is invoking this code.
carllin pushed a commit to carllin/solana that referenced this issue Jun 3, 2021
epoch-slots may be overwritten before they are written to crds table:
solana-labs#17711

This commit writes new epoch-slots to crds table synchronously with
push_epoch_slots. The functions is still not thread-safe as commented in
the code, however currently only one threads is invoking this code.
carllin pushed a commit to carllin/solana that referenced this issue Jun 3, 2021
epoch-slots may be overwritten before they are written to crds table:
solana-labs#17711

This commit writes new epoch-slots to crds table synchronously with
push_epoch_slots. The functions is still not thread-safe as commented in
the code, however currently only one threads is invoking this code.
behzadnouri added a commit that referenced this issue Jun 4, 2021
epoch-slots may be overwritten before they are written to crds table:
#17711

This commit writes new epoch-slots to crds table synchronously with
push_epoch_slots. The functions is still not thread-safe as commented in
the code, however currently only one threads is invoking this code.
mergify bot pushed a commit that referenced this issue Jun 4, 2021
epoch-slots may be overwritten before they are written to crds table:
#17711

This commit writes new epoch-slots to crds table synchronously with
push_epoch_slots. The functions is still not thread-safe as commented in
the code, however currently only one threads is invoking this code.

(cherry picked from commit 60b0a13)
mergify bot added a commit that referenced this issue Jun 4, 2021
epoch-slots may be overwritten before they are written to crds table:
#17711

This commit writes new epoch-slots to crds table synchronously with
push_epoch_slots. The functions is still not thread-safe as commented in
the code, however currently only one threads is invoking this code.

(cherry picked from commit 60b0a13)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
@carllin carllin closed this as completed Jun 9, 2021
@github-actions
Copy link
Contributor

github-actions bot commented Apr 4, 2022

This issue has been automatically locked since there has not been any activity in past 7 days after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants