Introduce safe_get option which ensures key:value integrity even with socket corruption #959
[Edit] - Note that while the maintainer is focused on errors in my original issue submission, I am focused on the fact that multiple users have reported that Dalli sometimes returns incorrect values, and that the get implementation does not validate against this. That is an unacceptable failure case, so we have pursued an implementation similar to this PR which rules it out; see aha-app@f6da276
--
Thinking more about #956 - I have high confidence that socket corruption explains the incorrect behavior I observed. But I do not have high confidence I found the exact place socket corruption occurred (unfortunately I do not have the stack trace) or that there are no other potential places. It seems in general that connections do not get locked between write ops and reading responses; an error or timeout between any of those could potentially lead to socket corruption. That may be worth fixing and I think my first PR is still worth considering, but I have thought of a more robust approach.
With `safe_get: true`, we will issue `getk` instead of `get` ops to memcached, and ensure that the returned key matches the requested key. This guarantees that even in the case of socket corruption, we cannot return incorrect values for requested keys. The connection is closed if keys do not match, so that the connection manager will eventually re-establish a connection and recover gracefully.

Using `getk` vs `get` comes at the cost of some performance overhead due to key retrieval and comparison. This performance cost would be most significant when caching a large number of small values with comparatively large keys. That is why I lean towards making this an opt-in change - but certainly for our purposes and likely for many other teams, key:value integrity and safety would far outweigh the marginal performance cost.
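
To illustrate the key check described above, here is a minimal standalone sketch. It is not the diff in this PR and does not use Dalli's internals: `FakeConnection`, `KeyMismatchError`, and `checked_get` are hypothetical names, and the connection is a stand-in that replays queued `[key, value]` responses so that a desynchronized socket can be simulated. Modelling a miss as a `nil` response is also a simplifying assumption.

```ruby
# Hypothetical sketch: these names are illustrative, not Dalli's API.
class KeyMismatchError < StandardError; end

# Stand-in for a memcached connection. It replays queued [key, value]
# pairs in order, which lets us simulate a socket that still holds a
# stale response from an earlier request.
class FakeConnection
  def initialize(responses)
    @responses = responses
    @closed = false
  end

  # Mimics a getk round trip: returns the next queued response,
  # regardless of which key was requested.
  def getk(_requested_key)
    @responses.shift
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

# Fetch a value and verify that the response belongs to the requested key.
# On a mismatch, close the connection so it can be re-established later,
# and raise instead of returning another key's value.
def checked_get(conn, key)
  returned_key, value = conn.getk(key)
  return nil if returned_key.nil? # assumption: nil models a cache miss

  if returned_key != key
    conn.close
    raise KeyMismatchError,
          "requested #{key.inspect} but received #{returned_key.inspect}"
  end

  value
end
```

With a healthy connection, `checked_get(conn, "user:1")` returns the stored value. If the next response on the socket belongs to `"user:2"`, the call raises `KeyMismatchError` and the connection is left closed, rather than the caller silently receiving the wrong value.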