Fix deadlock in auth cache (solves Tobira freezing problem) #1141
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #1129
See first commit description for the technical details on how this was caused. But the gist is: I incorrectly used
DashMap
, holding locks across await points. This causes a deadlock if the timing is right and two specific tasks are scheduled to run in the same thread. I could have fixed the code around the await point, but since this is a very easy and subtle mistake to make, I decided to use a different concurrent hashmap that can better deal with these scenarios.The second commit also fixes the cache dealing with a 0 cache duration (which is supposed to disable the cache). See commits for more details.
On the ETH test system I deployed v2.6 plus this patch and verified that the freeze is not happening anymore in the only situation where I could reliably reproduce it before: starting Tobira and immediately making an authenticated request. Since the cache_duration was set to 0, the timing somehow worked out most of the time. Doing that does not freeze Tobira anymore with this patch (I tried several times).
Some additional tests/details
The
scc
hashmap has no problem when a lock is held on the same thread thatretain
is called. I tested it like this:This outputs:
Of course a single test doesn't guarantee that this is supported by the library, but all docs indicate as well that the library can deal with these situations. "async" is used everywhere and these kinds of situations regularly occur in async code.
Edit: thinking about it more, I suppose the key feature here is that
retain_async
can yield, whereas theretain
from dashmap could only block when it couldn't make any progress.