@lhoguin lhoguin commented Apr 24, 2024

Crashes could happen because compaction would wrongly write over, or truncate, valid messages. When scanning a file for messages, it could encounter leftover data that looked like a message, and this led compaction to skip the real messages hidden within that data.

To avoid this, we ensure that compaction cannot leave leftover data behind. We get this guarantee by blanking the holes in the file before we start copying messages closer to the start of the file. This requires a few more writes, but it means that the only data in the files at any point is valid messages.
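
To illustrate the idea, here is a minimal Python sketch of "blank the holes, then move messages". It is a toy model and not the actual Erlang code: the in-memory `bytearray`, the `(offset, size)` tuples, and the `compact` function are all assumptions made for the example. The sketch also blanks the bytes a message vacates when it is moved, which is a simplification of this model, not something stated above.

```python
# Toy model only: the real message store is written in Erlang and its
# on-disk format differs. All names here are hypothetical.

def compact(data: bytearray, live: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Compact `data` in place.

    `live` holds the (offset, size) of each valid message, sorted by
    offset and non-overlapping. Returns the new (offset, size) list.
    """
    # Step 1: blank every hole before moving anything, so that a later
    # scan can never mistake leftover bytes for a message.
    cursor = 0
    for offset, size in live:
        data[cursor:offset] = bytes(offset - cursor)
        cursor = offset + size
    data[cursor:] = bytes(len(data) - cursor)

    # Step 2: copy each message closer to the start of the file.
    new_live = []
    write_at = 0
    for offset, size in live:
        if offset != write_at:
            payload = bytes(data[offset:offset + size])
            # Assumption of this sketch: blank the vacated bytes too.
            data[offset:offset + size] = bytes(size)
            data[write_at:write_at + size] = payload
        new_live.append((write_at, size))
        write_at += size

    # Step 3: drop the tail, which now only contains blanked bytes.
    del data[write_at:]
    return new_live
```

For example, calling `compact` on a buffer holding two valid messages surrounded by stale bytes leaves only the two messages, back to back, at the start of the buffer.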

Related discussion: #10902

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)

Note that it's possible that some of the messages in the files
are no longer referenced; that's OK. We filter them out after
scanning the file.

This was also a good time to merge two almost identical scan
functions, and to be more explicit about which messages should be
dropped after scanning the file: the messages no longer in the
ets index, and the fan-out messages that ended up rewritten in
a more recent file.
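
The post-scan filtering described above could look roughly like the following Python sketch. It is illustrative only: the `keep_after_scan` function, the tuple layout, and the dictionary standing in for the ets index are hypothetical and do not mirror the real Erlang data structures.

```python
# Toy model only: `index` stands in for the ets index and maps a message id
# to the file that currently holds that message. All names are hypothetical.

def keep_after_scan(scanned, index, current_file):
    """Return the scanned (msg_id, offset, size) entries that are still valid."""
    kept = []
    for msg_id, offset, size in scanned:
        entry = index.get(msg_id)
        if entry is None:
            # The message is no longer referenced by the index: drop it.
            continue
        if entry["file"] != current_file:
            # Fan-out message that was rewritten in a more recent file: drop it.
            continue
        kept.append((msg_id, offset, size))
    return kept
```

Keeping both drop conditions as separate, explicit branches is what makes the filter easy to audit: every message found by the scan is either kept or dropped for a named reason.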
@essen essen force-pushed the loic-fix-cq-shared-store-crashes branch from 4d39cd7 to fcd011f Compare April 26, 2024 11:08
@lhoguin lhoguin changed the title WIP: CQ: Fix shared store crashes DO NOT MERGE CQ: Fix shared store crashes Apr 26, 2024
@lhoguin lhoguin marked this pull request as ready for review April 26, 2024 15:47
@lhoguin lhoguin merged commit 9fbc0fb into main Apr 29, 2024
@lhoguin lhoguin deleted the loic-fix-cq-shared-store-crashes branch April 29, 2024 07:59
lhoguin added a commit that referenced this pull request Apr 29, 2024