Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LoadingCache.getIfPresent(key) returns null after LoadingCache.getUnchecked(key) returned non null value #2131

Open
alvinlin123 opened this issue Aug 12, 2015 · 15 comments
Labels

Comments

@alvinlin123
Copy link

I am using the LoadingCache in a highly concurrent system and I observed some behavior that I am not sure is intended (to me it looks like a bug :) ). I wrote the following sample code to illustrate the problem :

public static void main(String[] args) {
        final LoadingCache<String, Object> cache = 
            CacheBuilder.newBuilder()
                                 .expireAfterAccess(65, TimeUnit.MINUTES)
                                 .build(new CacheLoader<String, Object>(){

                                                    @Override
                                                     public Object load(String arg0) throws Exception {
                                                                Object obj = new Object();
                                                                    System.out.println("creating object: " + obj);
                                                                    return obj;
                                                     }
                                  });


        int threadCount = 600;
        final String key = "hello/world";
        Runnable task = new Runnable() {

            @Override
            public void run() {
                try {
                    Object valueFromUnchecked = cache.getUnchecked(key);
                    if (valueFromUnchecked == null) {
                        System.out.println(Thread.currentThread().getName() + " valueFromUnchecked is null!!!");
                    }

                    Object value = cache.getIfPresent(key);
                    if (value == null) {
                        System.out.println(Thread.currentThread().getName() + " value is null!!!");
                    }

                    if (value != valueFromUnchecked) {
                        System.out.println(String.format(Thread.currentThread().getName() + "valueFromUnchecked:%s, value:%s", valueFromUnchecked, value));
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        };

        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(task);
            t.setName("thread-" + i);
            t.start();
        }

Sometimes thing are fine, I don't get the output of " value is null!!!" lines and some times I do. I never get the output of "valueFromUnchecked from getUnchecked is null!!!"

My understanding of from LoadingCache java doc, is that if getUnchecked() does not return null then getIfPresent() shouldn't return null as well; given that the cache has not expired yet.

@ben-manes
Copy link
Contributor

A cursory glance...

  • Its surprising that this fails
  • Expiration isn't needed to reproduce this
  • It seems that StrongValueReference returns null, though the value later materializes
  • Inserting a memory fence resolves the issue (Unsafe.fullFence())
  • Caffeine passes

It seems like final field visibility rules aren't working as expected, since that should avoid reordering and a SVR should never return null. Probably needs to be tested on JDK6/7 for good measure. Nothing in the code looks obviously wrong fence-wise.

@ben-manes
Copy link
Contributor

Actually the statement on SVR is wrong and just some confusion when debugging. The problem is related to the if (count != 0) guard clause. If the cache is pre-populated with a different entry then the test passes.

@alvinlin123
Copy link
Author

thanks for looking into this. Just FYI I compiled the above code into jvm 7 class file (because my prof system runs jvm 7). But I did run the code in both jvm7 and jvm8 both produced same result.

@ben-manes
Copy link
Contributor

Okay, so what is happening is that when LoadingValueReference#waitForValue() returns the count = 0. This is because the computing thread doesn't increment it until after its published through the value reference. This lets the reader see the value immediately, but before the count changes. Then when the reader immediately performs a getIfPresent the count is still zero and it receives null due to an early exit. When the computing thread finally increments the count then everything works normally. This means that one could argue that the expected happens-before relationship is being violated.

The best approach would be to simply remove the guard conditions. I don't remember when they were introduced. Most likely Bob added them in the original computing map and, since we inherited that, we never thought about his optimization. I don't think it buys anything and a load barrier shouldn't be required for correctness.

Sorry that this wasn't immediately obvious. Its been a very long time since I've reasoned through this code. Unfortunately I don't know if this will be fixed will be fixed any time soon.

@ben-manes
Copy link
Contributor

Actually the guards might have been okay when Bob added them, and an oversight when we introduced LoadingCache. When we used the Map interface there was no way to peek into the cache, as a get was computing. So that's when it was probably introduced to mirror the style in other methods and we didn't consider that interaction.

@kluever kluever added the type=defect Bug, not working as expected label Oct 1, 2015
@kevinb9n kevinb9n self-assigned this Oct 1, 2015
@kevinb9n kevinb9n added this to the 19.0 milestone Oct 1, 2015
lowasser added a commit that referenced this issue Oct 12, 2015
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=105235264
@cgdecker
Copy link
Member

From what I can tell, the attempt to fix this in commit bb5673f didn't work. I tried the sample program provided and am still seeing the issue. I'm looking into it now.

@cgdecker
Copy link
Member

So, the count != 0 check doesn't seem to have any bearing on the behavior here, because the count being incremented is only an indicator that the loaded value has been stored (that is, that the entry's LoadingValueReference has been replaced with a StrongValueReference, in this case.) Even with the check removed, the code ends up calling LoadingValueReference.get(), which returns null.

The general issue seems to be that get[Unchecked] is allowed to return as soon as it can get the loaded value. Meanwhile, the thread that's actually doing the loading gets the loaded value and then must store the loaded value. Meanwhile, other threads that were waiting on the value loading can call getIfPresent, racing with the storing of the value.

I can see a couple possible ways to fix this:

  1. Don't let threads blocking on the loading of the value see the value until it's been stored. This seems like (potentially?) a fairly invasive change, though I haven't tried it yet.
  2. LoadingValueReference.get() could be changed to check if its futureValue future is done, returning the value from it if so on the assumption that the value, having loaded successfully, is in the process of being stored. This is a very simple change and fixes the issue for the example program given, but I'm not certain it's a correct change.

It's also not entirely clear to me that this is something that should be changed, weird though it may be. What's the value in calling getIfPresent for a key on the same thread that already called get[Unchecked] on it?

I'm going to spend a bit more time looking at this today, but I'm likely going to push any changes back to 20.0 since this isn't a simple matter of removing a count != 0 check.

@ben-manes
Copy link
Contributor

I can't reproduce it with master and the test program. Ran with TestNG @Test(threadPoolSize = 10, invocationCount = 1000). I haven't looked into your analysis though.

@ben-manes
Copy link
Contributor

Sorry, had a typo. reproduced.

@cgdecker
Copy link
Member

I've got a solution in the works... it looks like it's not TOO invasive to make blocking gets only return after the value has been stored.

cgdecker added a commit that referenced this issue Oct 26, 2015
*** Reason for rollback ***

It turns out this doesn't fix the Guava issue. As such, I think it's probably preferable to keep the count != 0 check as it prevents unnecessary work when there are no entries in the segment. I have another fix in progress that seems to work.

*** Original change description ***

Attempt to address #2131 by removing the if (count != 0) check.

***
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=106181279
@cgdecker cgdecker modified the milestones: 20.0, 19.0 Dec 11, 2015
shubhamtagra added a commit to qubole/rubix that referenced this issue Sep 5, 2016
LoadingCache.getIfPresent returns null sometimes even when the entry is being loaded by another thread.
This leads to wrong invalidations and deletion of data. This fix works around the problem till the issue
is fixed in guava
@cgdecker cgdecker removed this from the 20.0 milestone Oct 6, 2016
@alvinlin123
Copy link
Author

so ... when will the fix of the issue be release? looks like it was intended, then removed from 20.0 release?

@uttamtiwari
Copy link

uttamtiwari commented Sep 28, 2018

I am also facing similar issue, can any one tell me which version I should use where the fix has been applied?
However I am not calling getUnchecked I am directly calling getIfPresent function and seeing null value sometimes.

@raghsriniv raghsriniv added the P3 label Jun 24, 2019
@kevinb9n kevinb9n removed their assignment Jun 19, 2021
@su1234rita
Copy link

su1234rita commented Feb 1, 2022

Any update on this fix? I am also facing a similar issue where getIfPresent returns a null sometimes.

@cgdecker
Copy link
Member

cgdecker commented Feb 2, 2022

I think it's unlikely we'll fix this at this point, though if someone wants to contribute a fix we'd be happy to look at that.

Note that we recommend people use Caffeine instead of Guava's cache; it's based on Guava's cache APIs but maintained by @ben-manes who has far more caching expertise than anyone on the Guava team these days.

@ben-manes
Copy link
Contributor

ben-manes commented Feb 2, 2022

Unfortunately I don't have the time to assist in maintaining Guava's Cache, and like the other coauthors, have moved on. I would prefer to have the difference between libraries be primarily performance and features for new code, rather than have users deal with bugs in their existing code. I can help review patches if anyone is interested in sending bug fixes.

I am familiar with the code base but Charles and I inherited it from Bob, where it was already very complex by the time we started adding / rewriting the caching logic. Unfortunately we didn't realize the linearization problems early enough so there were some original sins, likely further mistakes introduced later on, and we were both 20%ers (with Charles going full time for a short period). For example how to handle explicit writes for an in-flight load is not linearized (puts stomp the load whereas removals no-op, iirc). When writing narrow portions from the perspective of a transient best-effort cache the issues seemed benign, but taking a step back it's clearly wrong. The overwhelming amount of code due to forking ConcurrentHashMap makes it hard to wade through. Caffeine benefits from decorating the map, the addition of Java 8's computes, being much more test driven, and learning from these past experiences.

I wrote a simple Lincheck test below that could be extended and drive iterations on fixes for these linearization problems. However as @cgdecker mentioned since this cache is in maintenance mode, if possible then please try Caffeine where I've tried to be more diligent on correctness.

Lincheck Test
import org.jetbrains.kotlinx.lincheck.LinChecker;
import org.jetbrains.kotlinx.lincheck.annotations.Operation;
import org.jetbrains.kotlinx.lincheck.annotations.Param;
import org.jetbrains.kotlinx.lincheck.paramgen.IntGen;
import org.jetbrains.kotlinx.lincheck.strategy.stress.StressOptions;
import org.jetbrains.kotlinx.lincheck.verifier.VerifierState;
import org.testng.annotations.Test;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

@Param(name = "key", gen = IntGen.class, conf = "1:10")
public final class GuavaLincheckTest extends VerifierState {
  private final LoadingCache<Integer, Integer> cache;

  public GuavaLincheckTest() {
    cache = CacheBuilder.newBuilder().build(CacheLoader.from(k -> -k));
  }

  @Operation
  public Integer getIfPresent(@Param(name = "key") int key) {
    return cache.getIfPresent(key);
  }

  @Operation
  public Integer getUnchecked(@Param(name = "key") int key) {
    return cache.getUnchecked(key);
  }

  @Test
  public void stressTest() {
    var options = new StressOptions()
        .iterations(100)
        .invocationsPerIteration(10_000);
    new LinChecker(getClass(), options).check();
  }

  @Override
  protected Object extractState() {
    return cache.asMap();
  }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants