StorageWriter may lose in-memory operations if unable to initialize a SegmentAggregator. #5810

andreipaduroiu · 2021-03-05T22:27:41Z

Describe the bug
Each StorageWriter iteration works along these lines:

1. Fetch a sequence of operations from the DurableLog.
2. Process each operation in sequence.
3. When processing them, determine which Segment they belong to and initialize an appropriate SegmentAggregator for that segment if necessary.
4. Flush anything if needed.
5. Ack.

The problem is that if SegmentAggregator.initialize fails in step 3 (it needs to call Storage.getStreamSegmentInfo on LTS), the entire iteration is aborted and all unprocessed operations are lost. They have already been fetched from the DurableLog, and there is nowhere to get them from again.
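The failure mode can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyLog`, `Op`, and `runIteration` are hypothetical names invented for illustration and are not Pravega classes, and the log is simplified to a read cursor that only moves forward.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the failure mode. All names are hypothetical, not Pravega's API. */
final class LostOperationsSketch {
    /** An operation with a sequence number and the segment it targets. */
    static final class Op {
        final long seqNo;
        final String segment;
        Op(long seqNo, String segment) { this.seqNo = seqNo; this.segment = segment; }
    }

    /** Stand-in for the log: each read returns operations past a cursor and advances it. */
    static final class ToyLog {
        private final List<Op> ops;
        private int cursor = 0;
        ToyLog(List<Op> ops) { this.ops = new ArrayList<>(ops); }

        List<Op> read(int max) {
            int end = Math.min(ops.size(), cursor + max);
            List<Op> batch = new ArrayList<>(ops.subList(cursor, end));
            cursor = end; // The returned batch will not be served again.
            return batch;
        }
    }

    /** Runs one iteration and returns the sequence numbers processed before any failure. */
    static List<Long> runIteration(ToyLog log, String failingSegment) {
        List<Long> processed = new ArrayList<>();
        try {
            for (Op op : log.read(10)) {
                if (op.segment.equals(failingSegment)) {
                    // Stand-in for SegmentAggregator.initialize failing on its LTS call.
                    throw new IllegalStateException("initialize failed for " + op.segment);
                }
                processed.add(op.seqNo);
            }
        } catch (IllegalStateException e) {
            // Iteration aborted: the rest of the already-fetched batch is dropped.
        }
        return processed;
    }

    public static void main(String[] args) {
        ToyLog log = new ToyLog(Arrays.asList(new Op(1, "a"), new Op(2, "b"), new Op(3, "a")));
        System.out.println("first iteration processed:  " + runIteration(log, "b"));
        System.out.println("second iteration processed: " + runIteration(log, "none"));
    }
}
```

In this model, operations 2 and 3 are never processed by any iteration: the first one aborts on operation 2, and the second one finds nothing left to read.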

Fortunately there are plenty of safeguards and sanity checks in the StorageWriter and SegmentAggregator classes to detect situations like these before any actual data loss or corruption happens (the data itself will still be in Tier 1 since it cannot be truncated out).

To Reproduce
See above.

Expected behavior
StorageWriter should be resilient in the face of SegmentAggregator.initialize errors and not "lose" in-flight operations.
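One possible way to get this behavior is to keep fetched operations in a pending queue and remove each one only after it has been processed, so a failed iteration leaves the remainder available for the next attempt. The sketch below is an assumption made for illustration, not necessarily the approach taken in the actual fix; `RetainingWriterSketch` and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: operations stay queued until they are successfully processed. */
final class RetainingWriterSketch {
    private final Deque<Long> pending = new ArrayDeque<>();

    /** Adds newly fetched operations (by sequence number) behind any retained ones. */
    void enqueue(List<Long> fetched) {
        pending.addAll(fetched);
    }

    /**
     * Processes pending operations in order. An operation leaves the queue only after
     * tryProcess accepts it; on the first failure the loop stops and the rest stay queued.
     */
    List<Long> runIteration(Predicate<Long> tryProcess) {
        List<Long> processed = new ArrayList<>();
        while (!pending.isEmpty()) {
            Long next = pending.peekFirst();
            if (!tryProcess.test(next)) {
                break; // Stand-in for an initialize failure; nothing is discarded.
            }
            processed.add(pending.pollFirst());
        }
        return processed;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        RetainingWriterSketch writer = new RetainingWriterSketch();
        writer.enqueue(Arrays.asList(1L, 2L, 3L));
        // The first iteration fails on operation 2; the second one succeeds for everything.
        System.out.println("first:  " + writer.runIteration(seq -> seq != 2L));
        System.out.println("second: " + writer.runIteration(seq -> true));
    }
}
```

Processing order is preserved because the failed operation stays at the head of the queue and is retried before anything fetched later.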

Cherry-picking these PRs:
- #5841: Issue #5840: (SegmentStore) Fixed a deadlock in SegmentKeyCache.
- #5851: Issue #5850: (SegmentStore) Fixed a bug in WriterTableProcessor where it would attempt to flush to a deleted segment.
- #5586: Issue #5581: (SegmentStore) Disabling non-essential cache inserts if cache utilization is high
- #5804: Issue #5789: (SegmentStore) Improving stability during Segment Container Recoveries
- #5811: Issue #5810: (SegmentStore) Fixed a StorageWriter bug that could lead to data loss
- #5783: Issue #5771: (SegmentStore) Reducing the amount of heap memory used when doing Table Segment Reads.

Signed-off-by: Andrei Paduroiu <andrei.paduroiu@emc.com>

andreipaduroiu added kind/bug Correctness issue area/segmentstore version/0.9.1 labels Mar 5, 2021

andreipaduroiu self-assigned this Mar 5, 2021

andreipaduroiu mentioned this issue Mar 5, 2021

Issue 5810: (SegmentStore) Fixed a StorageWriter bug that could lead to data loss #5811

Merged

sachin-j-joshi closed this as completed in #5811 Mar 8, 2021

This was referenced Mar 17, 2021

Issue 5859: (v0.9.1) Cherry-pick various SegmentStore bug fixes #5860

Merged

Cherry-pick Segment Store bug fixes into 0.9.1 #5859

Closed
