Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/inconsistent flushed offset #16105

Merged
merged 5 commits into from
Feb 6, 2024

Conversation

Lazin
Copy link
Contributor

@Lazin Lazin commented Jan 15, 2024

Fixes #15201

This PR improves logging to provide more information. It also fixes the problem in the consensus::flush_log method. The method is trying to detect the situation when the truncation happens concurrently. But the existing check fails in situation when there is not just concurrent truncation but truncation + log append.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

Bug Fixes

  • Fix assertion triggered by interleaving of log flush and log truncation followed by append

@emaxerrno
Copy link
Contributor

@Lazin - should we add this to the opfuzz class, that was the whole point of that class, to drive sequence of events in append/truncate/flush/etc - though i suppose maybe not concurrent? curious on your take.

@Lazin
Copy link
Contributor Author

Lazin commented Jan 16, 2024

@Lazin - should we add this to the opfuzz class, that was the whole point of that class, to drive sequence of events in append/truncate/flush/etc - though i suppose maybe not concurrent? curious on your take.

The append/truncate operations are often happen on the follower when the follower receives the message with higher term and offset which is smaller than committed offset. The problem is that opfuzz runs storage code. And storage works correctly in this case. The problem is that the consensus assumes something about the storage which is not always the case. I think that without the assertion the code would work just fine.

if (flushed_up_to > lstats.dirty_offset) {
if (
flushed_up_to > lstats.dirty_offset
|| flushed_offset_term < lstats.dirty_offset_term) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still possible for the term to have stayed the same, but we still see the issue? I'm thinking something like this?

                    v committed  v dirty
... term 8 offset 31] t9: [32, 35]

... term 8 offset 31] t9: [32, 35]
flush called, flushed_up_to=35, flushed_offset_term=9
here we have a scheduling point

... term 8 offset 31]
truncate after 31

                    v committed               v dirty
... term 8 offset 31] t8: [32, 32] t9: [33, 36]
append records from term 8 and term 9

                    v committed               v dirty
... term 8 offset 31] t8: [32, 32] t9: [33, 36]
flush continues, lstats.dirty_offset=36, lstats.dirty_offset_term=9

flushed_up_to(35) <= lstats.dirty_offset(36)
flushed_offset_term(9) >= lstats.dirty_offset_term(9)
... so we make it past the early returns

committed offset(31) < _flushed_offset(35)
... so we hit the assertion

I'm wondering what happens if we change the condition to if flushed_up_to > lstats.committed_offset: co_return flushed::yes (assuming the storage layer flush() works as expected)? Sure it's a naive approach to avoiding the assertion, but it's not obvious to me why it wouldn't Just Work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is possible. Why would Raft truncate mid-term?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this more, agreed this shouldn't be possible with Raft: when a record is replaced via truncation, it should only be replaced with records of a higher term.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

truncation in the same term is still possible if the request is redelivered/reordered

@Lazin Lazin requested a review from andrwng January 24, 2024 13:04
add extra logging to flush and truncate methods
Method returns the counter which is guaranteed to be incremented after
suffix truncation. It's not guaranteed to be incremented exactly once
per logical truncation. It can only be used to detect that truncation
actually happened.
Use get_log_truncation_counter method in read_write_truncate test.
Use get_truncation_counter method in write_truncate_compact method.
The 'flush_log' method checks the dirty offset before and after flush.
If the truncation happened concurrently it detects this by comparing
offsets (the offset after flush is smaller than the offset before the
flush in this case). Then it uses assertion to check if the flushed
offset is not smaller than the committed offset.

This may not work as expected in case if there is a concurrent
truncation and append. For instance, before the truncation the dirty
offset is 32. The 'flush_log' method records flushed offset 32 and calls
log->flush. Before 'flush_log' is resumed the log gets truncated and
then new value gets added. So the committed offset of the log is now 31
but dirty offset is 32. Because of that the method won't be able to
detect the concurrent flush operation will trigger an assertion.

This commit fixes this problem by checking not only dirty offset but
also a truncation counter. If the counter was incremented then the
log was truncated and new data was added to the end of the log.
@Lazin Lazin force-pushed the fix/inconsistent-flushed-offset branch from 5ef73e3 to 8a4b612 Compare January 26, 2024 11:06
@Lazin
Copy link
Contributor Author

Lazin commented Jan 26, 2024

Update: added log counter and used it to detect concurrent truncation.

@Lazin
Copy link
Contributor Author

Lazin commented Jan 29, 2024

CI failures: #16308 and #15679

@piyushredpanda
Copy link
Contributor

/ci-repeat

@Lazin Lazin merged commit 23c0ce6 into redpanda-data:dev Feb 6, 2024
17 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v23.3.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

assertion in controller log flush
6 participants