Multiprocess encryption can sometimes read stale values #7743

sync-by-unito · 2024-05-28T18:09:16Z

There are at least two scenarios where multiprocess encryption can incorrectly consider stale values to be up to date. The first is hit by our tests once the shared mapping is removed, which lets them exercise the multiprocess code paths within a single process:

1. Process 1 reads page X.
2. Process 2 writes to one byte range in page X.
3. Process 1 refreshes the reader mapping and marks the page as StaleIV.
4. Process 1 writes to a different byte range in page X.
5. This byte range is copied to the read mapping, and the page is marked as UpToDate.
6. Process 1 reads from the byte range written by process 2 and gets garbage data.
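
To make the failure concrete, here is a toy C++ model of the first scenario. It is a sketch under stated assumptions, not Realm Core code. All names are hypothetical: `Mapping`, `PageState`, `File`, `read_barrier`, `write_barrier` and `replay_scenario_1`. A page is 8 plaintext bytes, "disk" holds the decrypted contents directly, and the write barrier behaves as described in this thread: it copies only the modified range into the read mapping and then clears StaleIV.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Toy model only: these names are hypothetical, not Realm Core's types.
enum class PageState { UpToDate, StaleIV };

constexpr std::size_t kPageSize = 8;
using Page = std::array<char, kPageSize>;

struct Mapping {
    Page data{};
    PageState state = PageState::StaleIV;  // nothing decrypted yet
};

struct File {
    Page disk{};  // decrypted view of page X as currently stored on disk
};

// Read barrier: rereads the whole page only if it is marked stale.
void read_barrier(Mapping& m, const File& f) {
    if (m.state == PageState::StaleIV) {
        m.data = f.disk;
        m.state = PageState::UpToDate;
    }
}

// Write barrier as described in this issue: flushes the write mapping, copies
// only the modified range into the read mapping, then clears StaleIV.
void write_barrier(const Mapping& writer, Mapping& reader, File& f,
                   std::size_t offset, std::size_t len) {
    f.disk = writer.data;
    std::memcpy(reader.data.data() + offset, writer.data.data() + offset, len);
    reader.state = PageState::UpToDate;  // the rest of the page was never reread
}

// Replays scenario 1; returns {byte process 1 reads, byte actually on disk}.
std::pair<char, char> replay_scenario_1() {
    File file;
    file.disk.fill('a');
    Mapping reader, writer;  // process 1's two mappings of page X

    read_barrier(reader, file);                        // 1. process 1 reads page X
    std::memset(file.disk.data(), 'b', 4);             // 2. process 2 writes bytes [0, 4)
    reader.state = writer.state = PageState::StaleIV;  // 3. refresh marks StaleIV
    read_barrier(writer, file);                        // 4. write mapping fully refreshed...
    std::memset(writer.data.data() + 4, 'c', 4);       //    ...then process 1 writes [4, 8)
    write_barrier(writer, reader, file, 4, 4);         // 5. copy [4, 8), mark UpToDate
    read_barrier(reader, file);                        // 6. no-op: page claims to be current
    return {reader.data[0], file.disk[0]};
}
```

In this model `replay_scenario_1()` yields `{'a', 'b'}`: the read mapping still holds the old byte even though the file holds process 2's write, because nothing reread bytes [0, 4) after step 3.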

The second is more theoretical, and it's unlikely anyone has actually hit it:

1. Process 1 reads page X from the Realm on threads A and B.
2. Process 2 writes to page X and also grows the Realm file.
3. Process 1 refreshes the Realm on thread A. Extending the existing reader mapping fails, so it creates a new one. This marks all mappings as StaleIV.
4. Process 1 reads page X on thread B without refreshing. This rereads the IV block and then rereads page X because the IV has changed.
5. Process 1 closes the Realm on thread B.
6. More stuff happens and the old reader mapping gets cleaned up, but importantly no more writes to page X happen.
7. Thread A reads page X. It first checks whether any other mapping has an up-to-date copy of the page, and none does, because the mapping that had one has been discarded. The IV recheck reports no change, since the IV is the same as when thread B checked, so the page is marked up-to-date and Process 2's write is discarded.
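
The second scenario hinges on the IV comparison being made against a value shared by every mapping in the process, so one mapping can "consume" an IV change on behalf of another. The toy C++ model below replays the steps under that assumption. Everything in it is hypothetical and chosen for illustration: `Disk`, `Mapping`, `ProcessState::last_seen_iv`, `live_mappings` and `read_page`. Page contents are a single plaintext byte, and the new mapping is assumed to start out holding the pre-write contents.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only: these names are hypothetical, not Realm Core's types.
enum class PageState { UpToDate, StaleIV };

struct Disk {
    std::uint32_t iv = 1;  // IV currently recorded for page X
    char page = 'a';       // decrypted contents of page X
};

struct Mapping {
    char page = 'a';
    PageState state = PageState::UpToDate;
};

// Per-process state shared by every mapping of the file (assumed).
struct ProcessState {
    std::uint32_t last_seen_iv = 1;
    std::vector<Mapping*> live_mappings;
};

void read_page(Mapping& m, ProcessState& ps, const Disk& disk) {
    if (m.state == PageState::UpToDate) return;
    // First, borrow an up-to-date copy from another live mapping if one exists.
    for (Mapping* other : ps.live_mappings) {
        if (other != &m && other->state == PageState::UpToDate) {
            m.page = other->page;
            m.state = PageState::UpToDate;
            return;
        }
    }
    // Otherwise recheck the IV and reread the page only if the IV changed.
    if (disk.iv != ps.last_seen_iv) {
        ps.last_seen_iv = disk.iv;
        m.page = disk.page;
    }
    // Flaw: another (now discarded) mapping may already have consumed the change.
    m.state = PageState::UpToDate;
}

char replay_scenario_2() {
    Disk disk;
    ProcessState ps;
    Mapping old_mapping;                  // 1. threads A and B read page X
    ps.live_mappings.push_back(&old_mapping);

    disk.iv = 2;                          // 2. process 2 writes page X
    disk.page = 'b';

    Mapping new_mapping;                  // 3. thread A remaps; all mappings StaleIV
    ps.live_mappings.push_back(&new_mapping);
    for (Mapping* m : ps.live_mappings) m->state = PageState::StaleIV;

    read_page(old_mapping, ps, disk);     // 4. thread B sees the new IV, rereads

    // 5-6. thread B closes the Realm and the old mapping is discarded
    ps.live_mappings.erase(std::find(ps.live_mappings.begin(),
                                     ps.live_mappings.end(), &old_mapping));

    read_page(new_mapping, ps, disk);     // 7. thread A: IV looks unchanged
    return new_mapping.page;
}
```

Here `replay_scenario_2()` yields `'a'` while the file holds `'b'`. If the old mapping were still live at step 7, the new mapping would have borrowed its up-to-date copy instead, which is why the bug depends on the cleanup in step 6.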

sync-by-unito · 2024-05-28T18:09:45Z

➤ PM Bot commented:

Jira ticket: RCORE-2141

ironage · 2024-05-28T18:52:08Z

I don't want to undermine the existence of a bug in this complicated code. But could you clarify a few things for the sake of my understanding?

In scenario 1, between steps 3-4, the entire page (all byte ranges) should have been refreshed from disk due to advancing versions and the read barrier that happens before writing.

tgoyne · 2024-05-28T19:35:01Z

Reading and writing is always done via different mappings. Step 3 marks all of the pages as StaleIV, but doesn't reread anything. The read barrier on the write mapping brings that mapping fully up to date, but doesn't update the reader mapping. The write barrier on the write mapping copies just the modified bytes over to the read mapping, but not the rest of the page, and then clears StaleIV.
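
Because the write mapping is fully current after its read barrier, one hypothetical way for the write barrier to avoid scenario 1 is this: copy the whole page into the read mapping whenever that page is StaleIV, and copy only the modified range otherwise. The sketch below uses made-up names (`Mapping`, `PageState`, `write_barrier`) in a toy model with 8-byte plaintext pages. It is not Realm Core's code and not necessarily how #7698 resolved the issue.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy model only: these names are hypothetical, not Realm Core's types.
enum class PageState { UpToDate, StaleIV };

constexpr std::size_t kPageSize = 8;

struct Mapping {
    std::array<char, kPageSize> data{};
    PageState state = PageState::UpToDate;
};

// Propagates a write from the write mapping to the read mapping without
// declaring bytes fresh that were never refreshed.
void write_barrier(const Mapping& writer, Mapping& reader,
                   std::size_t offset, std::size_t len) {
    if (reader.state == PageState::StaleIV) {
        // The writer's read barrier already refreshed the whole page, so
        // copying all of it makes the reader's copy genuinely current.
        reader.data = writer.data;
    } else {
        // Reader page is already current; only the modified range differs.
        std::memcpy(reader.data.data() + offset, writer.data.data() + offset, len);
    }
    reader.state = PageState::UpToDate;
}
```

An alternative with the same effect would be to copy only the modified range and leave StaleIV set, deferring the full reread to the next read barrier on the read mapping.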

ironage · 2024-05-28T20:21:42Z

Got it, thanks for this analysis. Having multiple mappings of the same data doesn't make things simple.

tgoyne mentioned this issue May 28, 2024

RCORE-2141 RCORE-2142 Clean up a bunch of old encryption cruft #7698 (Merged)

tgoyne closed this as completed in #7698 Jun 6, 2024

github-actions bot locked as resolved and limited conversation to collaborators Jul 6, 2024
