
Prevent segment overwrite in the cloud storage by archivers running on different nodes #3272

Closed
ztlpn opened this issue Dec 15, 2021 · 0 comments · Fixed by #3365
Assignees
Labels
area/cloud-storage Shadow indexing subsystem kind/bug Something isn't working

Comments

@ztlpn
Contributor

ztlpn commented Dec 15, 2021

It is possible that more than one archiver for the same NTP runs concurrently (say, if leadership is quickly transferred to another node and back again). If the NTP log is split into segments differently on these nodes, we can end up in a situation where a segment is overwritten by another segment with the same start offset (and thus the same name) but a different last offset. This leads to a discrepancy between the offset range recorded in the manifest and the actual offset range of the segment data, and thus to data loss.

To solve that, we can append a unique suffix to each segment name. A good candidate for this suffix is the term in which the archiver acquired leadership (note that it will in general differ from the segment term). Raft guarantees that there is only one leader in each term, so this suffix is unique. It is also a small, incrementally assigned integer that identifies the node that initiated the upload (in contrast to e.g. a random suffix, which would also guarantee uniqueness but would not point to the uploader). The suffix is then included in the segment metadata in the partition manifest. This way the unique segment key can be computed from the manifest, and the manifest and the segment data found at keys derived from it are always consistent.
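A minimal C++ sketch of the idea, with hypothetical names and path layout (not the actual redpanda code): the archiver term is appended to the segment key, so archivers elected in different terms can never produce the same key and overwrite each other's uploads.

```cpp
#include <cstdint>
#include <string>

// Hypothetical segment identity; the real manifest schema may differ.
struct segment_name_components {
    int64_t base_offset;   // first offset stored in the segment
    int64_t segment_term;  // raft term in which the segment was written
};

// Build the cloud storage key for a segment. Appending archiver_term
// (the term in which the uploading archiver became leader) makes keys
// from concurrent archivers distinct even when base_offset and
// segment_term coincide.
std::string remote_segment_path(
  const std::string& ntp_prefix,           // e.g. "kafka/topic/0_12"
  const segment_name_components& name,
  int64_t archiver_term) {
    return ntp_prefix + "/" + std::to_string(name.base_offset) + "-"
           + std::to_string(name.segment_term) + "-v1.log."
           + std::to_string(archiver_term);
}
```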

Backwards compatibility: there won't be archiver term ids in the manifests for segments uploaded by old redpanda versions, so for those segments the segment key must be generated the old way.
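The compatibility rule could look like the following sketch (again with hypothetical names): when the manifest entry carries no archiver term, the key is produced exactly as before, without the suffix.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Sketch only: segments recorded by older versions have no archiver term
// in the manifest, so their keys keep the legacy (suffix-free) format.
std::string remote_segment_path_compat(
  const std::string& ntp_prefix,
  int64_t base_offset,
  int64_t segment_term,
  std::optional<int64_t> archiver_term) {  // absent for old uploads
    std::string path = ntp_prefix + "/" + std::to_string(base_offset) + "-"
                       + std::to_string(segment_term) + "-v1.log";
    if (archiver_term) {
        path += "." + std::to_string(*archiver_term);
    }
    return path;
}
```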

@ztlpn ztlpn added kind/bug Something isn't working area/archival labels Dec 15, 2021
@ztlpn ztlpn self-assigned this Dec 15, 2021
ztlpn added a commit to ztlpn/redpanda that referenced this issue Dec 28, 2021
See redpanda-data#3272. To ensure that
segments in the cloud storage are not overwritten by concurrent archivers
running on different nodes, we append archiver term id as a suffix for
segment paths. As raft guarantees that in each term there will be only
one leader, this ensures segment path uniqueness.
ztlpn added a commit to ztlpn/redpanda that referenced this issue Dec 29, 2021
See redpanda-data#3272. To ensure that
segments in the cloud storage are not overwritten by concurrent archivers
running on different nodes, we append archiver term id as a suffix for
segment paths. As raft guarantees that in each term there will be only
one leader, this ensures segment path uniqueness.
ztlpn added a commit that referenced this issue Dec 29, 2021
See #3272. To ensure that
segments in the cloud storage are not overwritten by concurrent archivers
running on different nodes, we append archiver term id as a suffix for
segment paths. As raft guarantees that in each term there will be only
one leader, this ensures segment path uniqueness.
ztlpn added a commit to ztlpn/redpanda that referenced this issue Jan 11, 2022
See redpanda-data#3272. To ensure that
segments in the cloud storage are not overwritten by concurrent archivers
running on different nodes, we append archiver term id as a suffix for
segment paths. As raft guarantees that in each term there will be only
one leader, this ensures segment path uniqueness.

(cherry picked from commit 88a9935)
@jcsp jcsp added area/cloud-storage Shadow indexing subsystem and removed area/archival labels Nov 23, 2022