Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiprocess encryption can sometimes read stale values #7743

Closed
sync-by-unito bot opened this issue May 28, 2024 · 4 comments · Fixed by #7698
Closed

Multiprocess encryption can sometimes read stale values #7743

sync-by-unito bot opened this issue May 28, 2024 · 4 comments · Fixed by #7698

Comments

@sync-by-unito
Copy link

sync-by-unito bot commented May 28, 2024

There are at least two scenarios where multiprocess encryption can incorrectly consider stale values to be up to date. This one is hit by our tests once the shared mapping is removed and they're able to test the multiprocess code paths within one process:

  1. Process 1 reads page X
  2. Process 2 writes to one byte range in page X
  3. Process 1 refreshes the reader mapping and marks the page as StaleIV
  4. Process 1 writes to a different byte range in page X
  5. This byte range is copied to the read mapping and the page is marked as UpToDate
  6. Process 1 reads from the byte range written by process 2 and gets garbage data

This one is more theoretical and it's unlikely anyone has actually hit it:

  1. Process 1 reads page X from the Realm on threads A and B.
  2. Process 2 writes to page X and also grows the Realm file.
  3. Process 1 refreshes the Realm on thread A. Extending the existing reader mapping fails and it creates a new one. This marks all mappings as StaleIV.
  4. Process 1 reads page X on thread B without refreshing. This rereads the IV block and then rereads page X because the IV has changed.
  5. Process 1 closes the Realm on thread B.
  6. More stuff happens and the old reader mapping gets clean up, but importantly no more writes to page X happen.
  7. Thread A reads page X. It first checks if any other mappings have an up-to-date copy of the page, and none do as the mapping with it has been discarded. The IV recheck reports no change as it's the same as when thread B checked, so the page is marked up-to-date and Process 2's write is discarded.
Copy link
Author

sync-by-unito bot commented May 28, 2024

➤ PM Bot commented:

Jira ticket: RCORE-2141

@ironage
Copy link
Contributor

ironage commented May 28, 2024

I don't want to undermine the existence of a bug in this complicated code. But could you clarify a few things for the sake of my understanding?

In scenario 1, between steps 3-4, the entire page (all byte ranges) should have been refreshed from disk due to advancing versions and the read barrier that happens before writing.

@tgoyne
Copy link
Member

tgoyne commented May 28, 2024

Reading and writing is always done via different mappings. Step 3 marks all of the pages as StaleIV, but doesn't reread anything. The read barrier on the write mapping brings that mapping fully up to date, but doesn't update the reader mapping. The write barrier on the write mapping copies just the modified bytes over to the read mapping, but not the rest of the page, and then clears StaleIV.

@ironage
Copy link
Contributor

ironage commented May 28, 2024

Got it, thanks for this analysis. Having multiple mappings of the same data doesn't make things simple.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants