cloud_storage: protect atime writes with a mutex #16648
Merged
Fixes a potential cloud storage cache access time tracker file corruption which can happen during redpanda shutdown.
This corruption is negligible because the file doesn't store any important data, and we ignore parsing failures during startup. Only the file header contains variable-sized fields, and corruption in that section of the file is impossible. The remaining contents of the file are read using fixed-size buffers, so at most we would read a bogus file hash or a bogus timestamp, neither of which could do harm.
cache::save_access_time_tracker can run concurrently during the redpanda shutdown sequence. cache::start starts a timer which calls cache::maybe_save_access_time_tracker, then cache::stop calls cache::save_access_time_tracker unconditionally.
It is possible that stop invokes cache::save_access_time_tracker while this method is running in a fiber started by the timer. If that's the case, in the save_access_time_tracker method we didn't have any mutual exclusion mechanism when writing to the temporary file and the subsequent rename.
During a particular ordering of events, the following error gets logged:
The best explanation is the following ordering of events:
1: f1, f2: calls save_access_time_tracker
2: f1, f2: opens accesstime.tmp
3: f1, f2: calls _save_access_time_tracker
4: f1, f2: co_await ss::rename_file(tmp_path.string(), final_path.string())
In step 4, only one fiber can succeed; the other fails with the exception.
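The failure in step 4 can be replayed without any concurrency. The sketch below is an illustration in standard C++ (std::filesystem), not the Seastar code from this PR; the helper name replay_double_rename and the final file name accesstime are assumptions made for the example.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical helper: replays step 4 for two callers that share one temp file.
// Returns {first rename succeeded, second rename succeeded}.
std::pair<bool, bool> replay_double_rename(const fs::path& dir) {
    const fs::path tmp_path = dir / "accesstime.tmp";
    const fs::path final_path = dir / "accesstime";  // assumed final name

    fs::create_directories(dir);
    {
        // Steps 2-3: both callers ended up writing to the same temp file.
        std::ofstream out(tmp_path, std::ios::binary | std::ios::trunc);
        out << "tracker contents";
    }
    assert(fs::exists(tmp_path));

    std::error_code ec1, ec2;
    fs::rename(tmp_path, final_path, ec1);  // f1: moves the temp file away
    fs::rename(tmp_path, final_path, ec2);  // f2: the source no longer exists

    return {!ec1, !ec2};
}
```

Calling replay_double_rename on a scratch directory should report success for the first rename and failure for the second, which mirrors the one exception seen in the log.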
This is a benign race condition. However, another, less pleasant race condition is also possible:
1: f1, f2: calls save_access_time_tracker
2: f1, f2: opens accesstime.tmp
3: f1, f2: calls _save_access_time_tracker
4: f1, f2: auto out = co_await ss::make_file_output_stream(std::move(f))
note: make_file_output_stream creates a buffered stream with an 8K buffer
5: f1: co_await _access_time_tracker.write(out) (note: write does acquire an exclusive lock, so nothing to worry about there)
6: f2: co_await _access_time_tracker.write(out);
7: f2: co_await out.flush();
8: f1: co_await out.flush();
Notice that f1 wrote to its stream first, then f2 wrote and flushed its in-memory buffer, and only then (!) did f1 flush its own in-memory buffer.
It is possible that the flush invoked by f1 overwrites up to 8K bytes at an arbitrary position in the contents initially written by f2. Thus, the file written by f2 becomes corrupted.
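The flush ordering above can be imitated with two ordinary buffered streams opened on the same file. This is a minimal sketch in standard C++, assuming a POSIX-like system and the stream's default buffering; it is not the Seastar implementation. The names interleaved_save, serialized_save and guarded_saver are hypothetical, and std::mutex only stands in for whatever fiber-aware mutex the real code uses.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <mutex>
#include <string>

namespace fs = std::filesystem;

// Reads the whole file back so the outcome can be inspected.
std::string read_all(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Replays steps 4-8: two buffered streams on one file, flushed out of order.
std::string interleaved_save(const fs::path& tmp_path) {
    std::ofstream f1(tmp_path, std::ios::binary | std::ios::trunc);
    std::ofstream f2(tmp_path, std::ios::binary | std::ios::trunc);
    assert(f1.is_open() && f2.is_open());
    f1 << "AAAA";      // step 5: data sits in f1's private buffer
    f2 << "BBBBBBBB";  // step 6: data sits in f2's private buffer
    f2.flush();        // step 7: the file now holds f2's bytes
    f1.flush();        // step 8: f1's stale buffer is written at offset 0
    return read_all(tmp_path);
}

// Hypothetical guard: the whole open/write/flush sequence runs under one lock.
// std::mutex is a stand-in; fiber-based code would need a fiber-aware mutex.
struct guarded_saver {
    std::mutex mu;
    void save(const fs::path& tmp_path, const std::string& payload) {
        std::lock_guard<std::mutex> lock(mu);
        std::ofstream out(tmp_path, std::ios::binary | std::ios::trunc);
        out << payload;
        out.flush();
    }
};

// With the guard, f1 finishes completely before f2 starts.
std::string serialized_save(const fs::path& tmp_path) {
    guarded_saver saver;
    saver.save(tmp_path, "AAAA");
    saver.save(tmp_path, "BBBBBBBB");
    return read_all(tmp_path);
}
```

Under those assumptions, interleaved_save should return "AAAABBBB" (f1's late flush clobbers the start of f2's data), while serialized_save should return "BBBBBBBB", the intact output of the last writer.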
Backports Required
Release Notes
Bug Fixes