StorageWriter may lose in-memory operations if unable to initialize a SegmentAggregator. #5810

Closed
andreipaduroiu opened this issue Mar 5, 2021 · 0 comments · Fixed by #5811

@andreipaduroiu

Describe the bug
Each StorageWriter iteration works along these lines:

  1. Fetch a sequence of operations from the DurableLog.
  2. Process each operation in sequence.
  3. While processing them, determine which Segment each operation belongs to and initialize an appropriate SegmentAggregator for that Segment if necessary.
  4. Flush anything if needed.
  5. Ack.

The problem is that if SegmentAggregator.initialize fails in step 3 (it needs to do a Storage.getStreamSegmentInfo on LTS), the entire iteration is aborted and all unprocessed operations are lost. They have already been fetched out of the DurableLog, so there is nowhere to get them from again.
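The failure mode can be illustrated with a minimal, self-contained sketch. This is not the StorageWriter code: `LossyIterationSketch`, `Op`, `initializeAggregator`, and `sampleLog` are hypothetical stand-ins, and a plain `Queue` plays the role of the DurableLog. It only shows why draining the log into a local list and then aborting mid-way leaves the remaining operations unreachable.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch; none of these names are the real Pravega classes.
final class LossyIterationSketch {
    /** Stand-in for an operation: the segment it targets and its sequence number. */
    static final class Op {
        final long segmentId;
        final long seqNo;
        Op(long segmentId, long seqNo) { this.segmentId = segmentId; this.seqNo = seqNo; }
    }

    /** Stand-in for the initialize step; fails for one chosen segment. */
    static void initializeAggregator(long segmentId, long failingSegmentId) {
        if (segmentId == failingSegmentId) {
            throw new IllegalStateException("segment info lookup failed for segment " + segmentId);
        }
    }

    /** One iteration: drain the log (step 1), then process in order (steps 2-3). */
    static int runIteration(Queue<Op> durableLog, long failingSegmentId) {
        List<Op> fetched = new ArrayList<>();
        Op op;
        while ((op = durableLog.poll()) != null) {
            fetched.add(op); // the operation has now left the log
        }
        int processed = 0;
        for (Op next : fetched) {
            // If this throws, 'fetched' goes out of scope and the rest is lost.
            initializeAggregator(next.segmentId, failingSegmentId);
            processed++;
        }
        return processed;
    }

    /** Three operations on segments 1, 2 and 3. */
    static Queue<Op> sampleLog() {
        Queue<Op> log = new ArrayDeque<>();
        log.add(new Op(1, 100));
        log.add(new Op(2, 101));
        log.add(new Op(3, 102));
        return log;
    }
}
```

Calling `runIteration(sampleLog(), 2)` throws on the second operation. At that point the queue is already empty, so the operations for segments 2 and 3 cannot be fetched again.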

Fortunately, there are plenty of safeguards and sanity checks in the StorageWriter and SegmentAggregator classes to detect situations like this before any actual data loss or corruption happens (the data itself will still be in Tier 1, since it cannot be truncated out).

To Reproduce
See above.

Expected behavior
StorageWriter should be resilient in the face of SegmentAggregator.initialize errors and not "lose" in-flight operations.
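One way to get this behavior is to hold fetched operations in a buffer that outlives a single iteration and to remove an operation only once it has been processed. The sketch below shows that idea under stated assumptions. It is not the change made in #5811, and `ResilientWriterSketch`, `Op`, `pending`, and the `canInitialize` predicate are all hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;
import java.util.function.LongPredicate;

// Hypothetical sketch; not the actual fix and not the real Pravega classes.
final class ResilientWriterSketch {
    /** Stand-in for an operation: the segment it targets and its sequence number. */
    static final class Op {
        final long segmentId;
        final long seqNo;
        Op(long segmentId, long seqNo) { this.segmentId = segmentId; this.seqNo = seqNo; }
    }

    // Fetched-but-unprocessed operations; survives across iterations.
    private final Deque<Op> pending = new ArrayDeque<>();
    private long lastProcessedSeqNo = -1;

    /**
     * Runs one iteration. Returns true if every buffered operation was processed,
     * false if initialization failed and the remainder was kept for a later retry.
     */
    boolean runIteration(Queue<Op> durableLog, LongPredicate canInitialize) {
        Op op;
        while ((op = durableLog.poll()) != null) {
            pending.addLast(op); // buffer first, so nothing is held only in a local
        }
        while (!pending.isEmpty()) {
            Op next = pending.peekFirst();
            if (!canInitialize.test(next.segmentId)) {
                return false; // 'next' and everything after it stay in 'pending'
            }
            pending.removeFirst(); // remove only after successful processing
            lastProcessedSeqNo = next.seqNo;
        }
        return true;
    }

    int pendingCount() { return pending.size(); }

    long lastProcessedSeqNo() { return lastProcessedSeqNo; }
}
```

With three operations on segments 1, 2 and 3, an iteration in which segment 2 cannot be initialized processes only the first operation and keeps the other two. A later iteration in which initialization succeeds picks them up in their original order.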

@andreipaduroiu andreipaduroiu self-assigned this Mar 5, 2021
sachin-j-joshi pushed a commit that referenced this issue Mar 19, 2021
Cherry-picking these PRs:

#5841: Issue #5840: (SegmentStore) Fixed a deadlock in SegmentKeyCache.
#5851: Issue #5850: (SegmentStore) Fixed a bug in WriterTableProcessor where it would attempt to flush to a deleted segment.
#5586: Issue #5581: (SegmentStore) Disabling non-essential cache inserts if cache utilization is high
#5804: Issue #5789: (SegmentStore) Improving stability during Segment Container Recoveries
#5811: Issue #5810: (SegmentStore) Fixed a StorageWriter bug that could lead to data loss
#5783: Issue #5771: (SegmentStore) Reducing the amount of heap memory used when doing Table Segment Reads.

Signed-off-by: Andrei Paduroiu <andrei.paduroiu@emc.com>