Avoid creating an entry whose key is `null` at construction time.
#6198
In the case of an entry whose key is stored in a `WeakReference`, the resulting entry will never be enqueued in the `ReferenceQueue`, so I'm not sure that it will necessarily ever be removed from the cache.

Compare a similar change to `MapMakerInternalMap` that was part of cl/479157599.

I get the impression that there may be a similar situation with weak _values_, but I'm not biting that off right now.

As always, we [recommend](https://guava.dev/CacheBuilder) that you use [Caffeine](https://github.com/ben-manes/caffeine/wiki) instead of Guava's `cache` library.

RELNOTES=n/a
PiperOrigin-RevId: 479318112
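To make the failure mode concrete, here is a minimal Java sketch. `NullKeyEntryDemo`, `WeakKeyEntry`, and `copyEntry` are hypothetical names, not Guava internals, and the copy step is only an assumed example of where a null-key entry could be created. The sketch relies on documented `java.lang.ref` behavior: the garbage collector appends a registered reference to its queue after detecting a reachability change in its referent, and a `WeakReference` constructed with a `null` referent has no such change to detect. It also uses the fact that `clear()` nulls the referent without enqueuing, which lets us simulate a collected key deterministically.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Simplified, hypothetical model of a weak-key hash entry; NOT Guava's LocalCache code.
public class NullKeyEntryDemo {

  // A table entry that holds its key weakly and registers with a queue so that a
  // cleanup pass can find and unlink entries whose keys were collected.
  static final class WeakKeyEntry<K, V> extends WeakReference<K> {
    final V value;

    WeakKeyEntry(K key, V value, ReferenceQueue<? super K> queue) {
      super(key, queue);
      this.value = value;
    }
  }

  // Hypothetical copy step (for example, while rebuilding a bucket). If the key has
  // already been collected, return null so the caller drops the entry instead of
  // creating a new WeakReference with a null referent, which the GC never enqueues.
  static <K, V> WeakKeyEntry<K, V> copyEntry(
      WeakKeyEntry<K, V> original, ReferenceQueue<? super K> queue) {
    K key = original.get();
    if (key == null) {
      return null; // treat the entry as already removed
    }
    return new WeakKeyEntry<>(key, original.value, queue);
  }

  public static void main(String[] args) throws InterruptedException {
    ReferenceQueue<Object> queue = new ReferenceQueue<>();

    // An entry whose key is null from the start: there is no referent whose
    // reachability can change, so the GC never appends it to the queue.
    WeakKeyEntry<Object, String> orphan = new WeakKeyEntry<>(null, "v", queue);
    System.gc();
    System.out.println("orphan enqueued: " + (queue.remove(100) != null));

    // Simulate a collected key: clear() nulls the referent without enqueuing.
    // The guarded copy then declines to create a new, never-enqueued entry.
    Object key = new Object();
    WeakKeyEntry<Object, String> entry = new WeakKeyEntry<>(key, "v", queue);
    entry.clear();
    System.out.println("copy after collection: " + copyEntry(entry, queue));
    System.out.println("orphan key: " + orphan.get());
  }
}
```

Running `main` prints `orphan enqueued: false`, `copy after collection: null`, and `orphan key: null`. Under the assumptions above, the orphan can only be found by walking the table, never by draining the queue, which is why the guard refuses to construct it.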
Force-pushed from cb9bf09 to 51456ae.
@ben-manes FYI, I don't think you have code quite like this in Caffeine, in part because you delegate the work of implementing a hash table to …
Thanks for the ping! This is an interesting and subtle bug! Like you said, it shouldn't impact Caffeine, since we don't have a custom hash table where such mistakes can happen. I understood why Guava decided to fork and disagreed with that decision back then, so I took the easier path of delegation. That is less memory-efficient for reference caching, because it wraps the key/value instead of inlining the fields onto the hash entry. Caffeine optimizes for size-based eviction, so reference caching is second-class because it is rarer, whereas Guava initially took a big bet on soft references as the best policy and got bit hard when that did not pan out. Caffeine's tests include a large number of reference tests, and the suite automatically performs an internal state check after every successful test case. All the tests run against Guava as well, as a compatibility check, so we could try to write a similar internal state checker if helpful. Unfortunately, I'm afraid that any effort we put into finding more bugs here won't translate into anyone having the time to fix them.
Yes, please never invest your time in anything that would rely on us to address cache bugs in response :( The problem here wasn't something that we learned about because a user encountered and reported it; instead, a developer happened to notice the identical bug in the … As a fun twist, the developer was modifying … (The context for all that is weak references and interning. We do seem to be gradually training people to avoid soft references, thankfully.)
If they care more about memory than concurrency, then maybe forking …
Thanks. I think they care about concurrency, too. It's possible that what they'll actually fork in the end is …
Maybe they can look at Spring's ConcurrentReferenceHashMap, which is custom and uses segmented locking. The JBoss implementation is similar to MapMaker's, so there might not be much of a difference (Google Collections had …
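Segmented locking, as mentioned for Spring's ConcurrentReferenceHashMap, splits the table into independently locked segments so that writers touching different segments do not contend. The following is a hypothetical, simplified Java sketch of that idea only: `SegmentedMap` is an illustrative name, it does not reproduce Spring's or Guava's code, and it has no reference-caching support.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch of segmented locking; hypothetical, NOT Spring's or Guava's code.
public class SegmentedMap<K, V> {

  // Each segment is its own lock guarding a private, non-thread-safe table.
  private static final class Segment<K, V> extends ReentrantLock {
    final Map<K, V> table = new HashMap<>();
  }

  private final Segment<K, V>[] segments;

  @SuppressWarnings("unchecked")
  public SegmentedMap(int segmentCount) {
    segments = (Segment<K, V>[]) new Segment<?, ?>[segmentCount];
    for (int i = 0; i < segmentCount; i++) {
      segments[i] = new Segment<>();
    }
  }

  // Keys that land in different segments never contend for the same lock.
  private Segment<K, V> segmentFor(Object key) {
    int h = key.hashCode();
    h ^= (h >>> 16); // mix high bits into the low bits used for indexing
    return segments[Math.floorMod(h, segments.length)];
  }

  public V get(Object key) {
    Segment<K, V> s = segmentFor(key);
    s.lock(); // simplification: many production implementations read without locking
    try {
      return s.table.get(key);
    } finally {
      s.unlock();
    }
  }

  public V putIfAbsent(K key, V value) {
    Segment<K, V> s = segmentFor(key);
    s.lock();
    try {
      return s.table.putIfAbsent(key, value);
    } finally {
      s.unlock();
    }
  }

  public V remove(Object key) {
    Segment<K, V> s = segmentFor(key);
    s.lock();
    try {
      return s.table.remove(key);
    } finally {
      s.unlock();
    }
  }

  // Not an atomic snapshot: each segment is counted under its own lock in turn.
  public int size() {
    int total = 0;
    for (Segment<K, V> s : segments) {
      s.lock();
      try {
        total += s.table.size();
      } finally {
        s.unlock();
      }
    }
    return total;
  }
}
```

The segment count bounds write concurrency: with 16 segments, at most 16 writers can proceed in parallel, in exchange for one lock object and one small table per segment.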
Hi, I'm the mysterious unnamed developer. You may remember me from ben-manes/caffeine#568. The project in question is to remove the memory overhead of interning in https://github.com/bazelbuild/bazel. I didn't bother open-sourcing the design document, but in summary, the plan is to use existing bazel data structures to find canonical instances, and for various reasons (e.g. to support retrieval of map keys), we are considering maintaining our own concurrent map implementation. Thanks for the suggestion of Spring's implementation. I'm still weighing the pros and cons of each.
I had a feeling it was you 😄 I believe NonBlockingHashMap was slimmer than the JDK's, but had footguns. It lacks reference caching, but it is a different hash table design worth considering. I think maintaining a generic hash table as a micro-optimization is cumbersome, but specializing when truly needed is reasonable. Good luck finding a nice fit.
Actually, we don't need weak/soft reference support, so that gives us some more options.
Oh, what do you need exactly?
A concurrent map that supports:
We might also need the ability to synchronize between two of these maps so that a key never exists in both. I haven't yet decided whether this will require anything special or whether we can get away with, e.g., a side effect in …
NBHM offers `getk` and does not store the hash code. It should be fast for traversal, as it does not chain on collision, so traversal might be trivially parallelized. However, a key may not be modified even after removal due to tombstoning, among other gotchas. Cliff Click might be approachable if you find it a good fit.
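For comparison with NBHM's `getk`, which returns the stored key: `ConcurrentHashMap` has no lookup that returns the stored key itself, so one common workaround when building an interner on it is to map each key to itself, paying for a value reference per entry. A minimal Java sketch, with `SimpleInterner` as a hypothetical name (it is not Guava's `Interners` implementation), including the manual removal the thread says is needed:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical strong interner built on ConcurrentHashMap by mapping each key to itself.
public final class SimpleInterner<T> {
  // The value slot holds the canonical instance, because ConcurrentHashMap offers
  // no lookup that returns the stored key.
  private final ConcurrentMap<T, T> map = new ConcurrentHashMap<>();

  // Returns the canonical instance equal to sample, registering sample if none exists.
  public T intern(T sample) {
    T existing = map.putIfAbsent(sample, sample);
    return existing == null ? sample : existing;
  }

  // Manual removal: drops the entry equal to instance, if any.
  public boolean remove(T instance) {
    return map.remove(instance) != null;
  }
}
```

Interning two equal but distinct `String` instances returns the first one both times; after `remove`, the next `intern` call registers whichever instance it is given.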
Thanks, Ben. And ooo, my mistake about weak keys, thanks. That makes NBHM look appealing for another reason (on top of the lack of chaining): if you don't need a …
My mistake, actually. We might need weak keys after all. I was thinking we could just use the Guava interner, but we need to manually remove entries from it, and the synchronization that ensures a key isn't located in two places may also be tricky without some customization. Still up in the air.
Assuming NBHM, you could reimplement those features on top. For weak references, you could resurrect … For synchronizing across data structures, simply use a striped lock (e.g. …). A little more work than an existing off-the-shelf option, but less memory overhead, high concurrency, and less tricky code to maintain.
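The striped-lock suggestion can be sketched in Java as follows, assuming both maps are mutated only through these methods. Every mutation first takes the lock for the key's stripe, so no two threads can insert or move the same key at once. `DisjointMaps`, `putFirstIfAbsentInBoth`, and `moveToSecond` are hypothetical names; Guava's `Striped.lock(int)` provides ready-made striped locks if you would rather not hand-roll the array.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: two maps whose key sets are kept disjoint by a per-key stripe lock.
public final class DisjointMaps<K, V> {
  private final ConcurrentMap<K, V> first = new ConcurrentHashMap<>();
  private final ConcurrentMap<K, V> second = new ConcurrentHashMap<>();
  private final ReentrantLock[] stripes;

  public DisjointMaps(int stripeCount) {
    stripes = new ReentrantLock[stripeCount];
    for (int i = 0; i < stripeCount; i++) {
      stripes[i] = new ReentrantLock();
    }
  }

  // Equal keys always hash to the same stripe, so they share one lock.
  private ReentrantLock lockFor(Object key) {
    int h = key.hashCode();
    h ^= (h >>> 16);
    return stripes[Math.floorMod(h, stripes.length)];
  }

  // Inserts into the first map only if the key is in neither map; returns true if inserted.
  public boolean putFirstIfAbsentInBoth(K key, V value) {
    ReentrantLock lock = lockFor(key);
    lock.lock();
    try {
      if (first.containsKey(key) || second.containsKey(key)) {
        return false;
      }
      first.put(key, value);
      return true;
    } finally {
      lock.unlock();
    }
  }

  // Moves a key from the first map to the second; returns false if it was not in the first.
  public boolean moveToSecond(K key) {
    ReentrantLock lock = lockFor(key);
    lock.lock();
    try {
      V value = first.remove(key); // remove before insert: never visible in both maps
      if (value == null) {
        return false;
      }
      second.put(key, value);
      return true;
    } finally {
      lock.unlock();
    }
  }

  public boolean inFirst(K key) {
    return first.containsKey(key);
  }

  public boolean inSecond(K key) {
    return second.containsKey(key);
  }
}
```

Because the move removes from the first map before inserting into the second, an unlocked reader may briefly see the key in neither map, but never in both; a reader that needs a consistent answer can take the same stripe lock.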