Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refresh cache before writing contents to bundle #9123

Merged
merged 5 commits into from Jul 19, 2023
Merged

Conversation

lettertwo
Copy link
Member

@lettertwo lettertwo commented Jul 5, 2023

Fixes #9121

Background

As described in #9121 and comments, a race condition exists for our LMDB cache backend when used in a multi threaded way. The condition is:

  1. Thread A performs some task that necessitates an LMDB read transaction
  2. Concurrently, Thread B performs a packaging request and caches the result
  3. Subsequently, Thread A is tasked with using the packaging result, but fails because it has a live read transaction with a stale snapshot that does not include the write from Thread B.

The fix is to manually reset the read transaction in Thread A so that the next read sees a fresh snapshot of the db.

Note that this is not a bug in LMDB, and there is a documented API for dealing with this scenario.

Implementation

Because requests deal with a Cache abstraction rather than dealing directly with the cache backend, this PR opts to add a Cache.refresh() method to the interface. It is a no-op for all of the existing backends except for LMDBCache, which simply calls resetReadTxn() on the backing store.

The cache.refresh() method is currently called right before running a WriteBundleRequest (and only when the preceeding PackageRequest was not run in the main thread), which is the only place we've seen this type of failure so far, but there may be other places where we should add it.

馃毃 Test instructions

Still trying to work out if we can add a test for this, as it is not easy to replicate in the wild.

// This only occurs if the reading thread has a transaction that was created before the writing thread committed,
// and the transaction is still live when the reading thread attempts to get the written value.
// See https://github.com/parcel-bundler/parcel/issues/9121
options.cache.refresh();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a write of the sourcemap down lower (starting ~line 174 in your changed code) - while there are a few more async calls in between here and there, and so it's more likely that LMDB will have reset the transaction by then, should we move this refresh outside of this condition so that even if we're dealing with a large blob we refresh the cache anyway to avoid (potential) failures when (maybe) getting the sourcemap from the cache?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While thinking through this a bit more, i realized that WriteBundleRequest will always run in the main thread (it doesn't ever create a handle with the worker farm like the PackageRequest does). I think this means that we actually can do something closer to the 'correct' approach by moving the reset to WriteBundlesRequest right after we've received the result from the PackageRequest.

I pushed an update that does this. The overview is now (starting at Parcel => run(buildRequest) => run(writeBundlesRequest)):

  • [main thread] for each bundle:
    • if useMainThread is false
      • [worker thread] run(packageRequest)
      • [main thread] cache.refresh()
    • else
      • [main thread] run(packageRequest)
    • [main thread] run(writeBundleRequest)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that makes sense.. I dd consider this but I had confused myself about what was running where.

@marcins
Copy link
Contributor

marcins commented Jul 5, 2023

In terms of testing this - while we have an artificial reproduction with the same shape (https://github.com/marcins/lmdb-testbed/blob/testable-version/index.js), I don't really think there's much benefit to adding that to Parcel since all that does is show that adding a call to resetReadTxn avoids the issue in a similarly shaped scenario.. and we know that's true already.

The ideal reproduction would be using Packaging/WriteBundle but I think that would be overly complicated (and difficult to make reliably fail as a failing test case).

I guess TL;DR - I don't think there's an effective way to test this fix in Parcel - we can demonstrate that it fixes out internal (complex) repro case with visual regression tests, and I think that should be enough?

..only if the packaging request didn't run in the main thread.
@mischnic
Copy link
Member

Would it make sense to call the method something along the lines of flushReaders or invalidateReaders? refresh sounds very general (this is part of the public API) and maybe we need another similar helper method in the future

@marcins
Copy link
Contributor

marcins commented Jul 10, 2023

Naming is hard! We originally thought about calling it sync but were worried about confusion with async/sync.

That's effectively what's happening - ensuring the current cache instance is "synced" with the persistent store. I don't think "flush" is an accurate representation, though the api is undocumented the assumption is any "put" is committed before resolving.

.ensureThisInstanceIsSeeingTheLatestState 馃槄

@marcins
Copy link
Contributor

marcins commented Jul 10, 2023

Maybe another way of thinking about it that might help to stimulate the right name to come to mind is to write what the API docs would look like for this method in the interface. I would expect it's something like:

In a multi-threaded environment, where there are potentially multiple Cache instances writing to the cache, ensure that this instance has the latest view of the changes that may have been written to the cache in other threads.

Unfortunately, I think it's difficult to get away from the fact that this is a bit of a leaky abstraction of how LMDB works, rather than a design decision for the cache API.

@mischnic
Copy link
Member

Oh, the reader has to call this, not the writer. Now I see why it's not "flushing"

And do add that comment to the definition in the Cache interface 馃槈

I think another perspective here is that when reading, lmdb behaves like a two layer cache (the top layer being determined by the read transaction). And the new method brings in changes from the bottom layer into the top layer.

@lettertwo
Copy link
Member Author

I added @marcins suggested documentation verbatim 馃槈

If i understand correctly, we've decided to stick with refresh()?

@lettertwo lettertwo merged commit 7ff54c7 into v2 Jul 19, 2023
15 of 16 checks passed
@lettertwo lettertwo deleted the 9121-fix-cache-io-race branch July 19, 2023 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Under load, the cache can fail to read a key it has just written
4 participants