
Ensure partition manifest consistency in the presence of concurrent archivers #3273

Closed
ztlpn opened this issue Dec 15, 2021 · 1 comment · Fixed by #7707
Labels
area/cloud-storage (Shadow indexing subsystem), kind/bug (Something isn't working)

Comments

@ztlpn
Contributor

ztlpn commented Dec 15, 2021

Similarly to #3272, the manifest file for the same ntp can be overwritten by concurrent archivers running on different nodes. External observers will perceive this as the manifest "going back in time". We must determine the consequences of this behavior and mitigate it. The manifest should guarantee that it includes all data from the ntp up to some last_offset, and that this offset only ever grows.
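To make the failure mode concrete, here is a minimal Python sketch of how a last-writer-wins object store lets a stale archiver roll the manifest back. `Manifest`, `ObjectStore`, and the object key are hypothetical simplifications for illustration, not Redpanda's actual types or bucket layout.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Manifest:
    # Hypothetical, simplified manifest: only the field relevant here.
    ntp: str
    last_offset: int


@dataclass
class ObjectStore:
    # Models a bucket with last-writer-wins semantics: a PUT simply
    # replaces whatever object is currently stored under the key.
    objects: dict = field(default_factory=dict)

    def put(self, key: str, manifest: Manifest) -> None:
        self.objects[key] = manifest

    def get(self, key: str) -> Manifest:
        return self.objects[key]


def simulate_race() -> list:
    """Return the last_offset an observer sees after each upload."""
    store = ObjectStore()
    key = "kafka/topic/0/manifest"  # hypothetical object key
    observed = []
    # The current leader uploads a manifest covering offsets up to 200.
    store.put(key, Manifest("kafka/topic/0", last_offset=200))
    observed.append(store.get(key).last_offset)
    # A stale leader on another node, unaware it lost leadership,
    # finishes a slower upload of its older view of the partition.
    store.put(key, Manifest("kafka/topic/0", last_offset=150))
    observed.append(store.get(key).last_offset)
    return observed


print(simulate_race())  # [200, 150]: the manifest "went back in time"
```

Nothing in the store itself rejects the second PUT, which is why the monotonic last_offset guarantee has to be enforced (or at least checked) somewhere else.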

ztlpn self-assigned this Dec 15, 2021
jcsp added the area/cloud-storage (Shadow indexing subsystem) label and removed the area/archival label Nov 21, 2022
jcsp added a commit to jcsp/redpanda that referenced this issue Dec 12, 2022
The consistency rules for manifests are just that the true
leader eventually wins: we do not have a way to fence
stale leaders. We can improve the behavior of read replicas
by having them ignore apparently time-travelling manifests.

Fixes redpanda-data#3273
@jcsp
Contributor

jcsp commented Dec 12, 2022

We do have the last_offset and insync_offset these days, which is handy.

We definitely still run the risk of old leaders racing with the current leader and overwriting a manifest, which causes time travel from the point of view of read replica clusters.

I think we can cheaply improve how we handle that by explicitly checking whether the last offset goes backwards: #7707
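A minimal Python sketch of that kind of check on the read replica side, assuming the replica remembers the last manifest it accepted. `ReadReplicaView` and `maybe_apply` are hypothetical names, and this is not the code from #7707. It does not fence stale leaders; it only stops a reader from adopting a regressed manifest until the true leader's next upload supersedes it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Manifest:
    # Hypothetical, simplified manifest carrying only last_offset.
    last_offset: int


class ReadReplicaView:
    """Tracks the newest manifest a read replica has accepted (hypothetical)."""

    def __init__(self) -> None:
        self.current: Optional[Manifest] = None

    def maybe_apply(self, downloaded: Manifest) -> bool:
        """Accept the downloaded manifest unless its last_offset went backwards.

        Returns True if accepted, False if ignored as time-travelling.
        """
        if (self.current is not None
                and downloaded.last_offset < self.current.last_offset):
            # Probably overwritten by a stale leader: keep the newer view
            # we already have instead of going back in time.
            return False
        self.current = downloaded
        return True


view = ReadReplicaView()
print(view.maybe_apply(Manifest(200)))  # True
print(view.maybe_apply(Manifest(150)))  # False: ignored, view stays at 200
print(view.maybe_apply(Manifest(250)))  # True
```

A manifest with an equal last_offset is still accepted here, since only a strict decrease indicates time travel.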

jcsp added the kind/bug (Something isn't working) label Dec 12, 2022
jcsp closed this as completed in 5d91b12 Dec 19, 2022