loadAll being called once for each element of getAll #1697

Open
pschmeltzer opened this issue Dec 9, 2016 · 10 comments

Comments

@pschmeltzer

pschmeltzer commented Dec 9, 2016

When using Ehcache 3 (3.1.1) with an on-heap store, calls to getAll with a key set larger than 1 on a cold cache result in loadAll being called for each key separately. The Iterable passed to loadAll in CacheLoaderWriter always appears to have a size of 1, regardless of the number of keys passed to getAll.

Here is my cache configuration:
cache = cacheManager.createCache(name,
    CacheConfigurationBuilder.newCacheConfigurationBuilder(key, value, ResourcePoolsBuilder.heap(entries))
        .withLoaderWriter(loader)
        .build());

See the comments on Stack Overflow.

@spierre99

Exactly the same problem for me. Can you advise us on a possible date for a fix?
It is very difficult for us to optimize database loading by avoiding multiple SQL requests.

@henri-tremblay henri-tremblay self-assigned this Feb 15, 2017
@henri-tremblay
Contributor

henri-tremblay commented Feb 20, 2017

I was able to reproduce this easily with the following code. We will now look for a fix.

// Imports assumed for this repro (Ehcache 3.x and JUnit 4):
import static org.ehcache.config.builders.CacheManagerBuilder.newCacheManagerBuilder;

import java.util.AbstractMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

import org.ehcache.Cache;
import org.ehcache.CacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.spi.loaderwriter.CacheLoaderWriter;
import org.junit.Test;

@Test
public void test() {
    CacheLoaderWriter<Integer, String> loaderWriter = new CacheLoaderWriter<Integer, String>() {
        @Override
        public String load(Integer integer) throws Exception {
            return integer.toString();
        }

        @Override
        public Map<Integer, String> loadAll(Iterable<? extends Integer> iterable) {
            // Expected to be called once with all requested keys; observed to be called once per key.
            System.out.println("loadAll called");
            return StreamSupport.stream(iterable.spliterator(), false)
                .peek(i -> System.out.println("Id: " + i))
                .map(e -> new AbstractMap.SimpleImmutableEntry<>(e, e.toString()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
        }

        @Override
        public void write(Integer integer, String s) {
        }

        @Override
        public void writeAll(Iterable<? extends Map.Entry<? extends Integer, ? extends String>> iterable) {
        }

        @Override
        public void delete(Integer integer) throws Exception {
        }

        @Override
        public void deleteAll(Iterable<? extends Integer> iterable) {
        }
    };
    CacheManagerBuilder<CacheManager> cacheManagerBuilder = newCacheManagerBuilder();
    try(CacheManager cacheManager = cacheManagerBuilder.build(true)) {
        Cache<Integer, String> cache = cacheManager.createCache("test",
            CacheConfigurationBuilder.newCacheConfigurationBuilder(Integer.class, String.class,
                ResourcePoolsBuilder.heap(10)).withLoaderWriter(loaderWriter).build());

        // Bulk read of nine keys on a cold cache: "loadAll called" is printed once per key.
        cache.getAll(IntStream.range(1, 10).mapToObj(i -> Integer.valueOf(i)).collect(Collectors.toSet()));
    }
}

@spierre99

Hi Henri,

Do you have any clue about how to solve this challenge?

@henri-tremblay
Contributor

henri-tremblay commented Mar 16, 2017 via email

@henryptung

@henri-tremblay Any update on this? Would be interested in a workaround too, if available.

@henri-tremblay
Contributor

It is still complicated and was put on ice for now.

The best workaround I can see is to do the loadAll externally: you load everything you need, then putAll, and then do the getAll whenever you need it. I'm assuming here that you want to pre-load some data.
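A minimal sketch of that pre-loading workaround, reusing the cache and key set from the repro above; fetchFromDatabase is a hypothetical bulk query (one round trip to the database for all keys), not part of Ehcache:

Set<Integer> keys = IntStream.range(1, 10).boxed().collect(Collectors.toSet());

// Hypothetical bulk query against the system of record: one round trip for all keys.
Map<Integer, String> preloaded = fetchFromDatabase(keys);

// Populate the cache in one shot instead of relying on the loader.
cache.putAll(preloaded);

// Subsequent bulk reads find the entries already cached, so loadAll is not invoked for them.
Map<Integer, String> values = cache.getAll(keys);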

@alexsnaps
Member

alexsnaps commented Aug 21, 2017

Actually this isn't a bug, but rather a feature… by design. Possibly an undesired feature though... In the meantime, as I share some responsibility in this design, let me try to explain why the behavior is the way it is.

What the CacheLoader API aims at solving

Besides the consistency problems that a cache-through pattern addresses, the loader API means readers do not have to synchronize around populating the cache. The loader will make sure only a single thread populates the cache for a given key, while other threads accessing the same key simply block until the value is installed.
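As a rough illustration of that per-key blocking behaviour (this is not Ehcache code, just an analogy, and loadFromSystemOfRecord is a hypothetical single-key query): the loader acts much like computeIfAbsent on a ConcurrentHashMap, where the first thread asking for a key runs the load while concurrent callers for the same key block until the mapping is installed.

// Analogy only: java.util.concurrent.ConcurrentHashMap, not the Ehcache store.
ConcurrentHashMap<Integer, String> values = new ConcurrentHashMap<>();
// The first caller for key 42 runs the load; other callers for key 42 block until the value is installed.
String value = values.computeIfAbsent(42, key -> loadFromSystemOfRecord(key));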

The idea behind the design

The CacheLoaderWriter.loadAll() takes the side of optimizing the cache, not the underlying SoR. This means that the set of keys passed to the .loadAll method will match a given locked segment of the underlying Cache's Store. In the case of the on-heap store, there are no such "segments"; keys are locked on a per-key basis, not per segment.

That's why, in the example above, you only ever get an Iterable with a single entry. I think this was discussed at length in one of the early dev meetings (not quite sure which one though).

Simplicity

The current design enables entries being loaded to be made available as quickly as possible. It could be that something else currently makes this impossible (I haven't read the code in a long while now), but this enables two threads to each populate one of two keys being loaded:

t1: cache.getAll(a, b);
t2: cache.get(b);

The first thread (t1) populates a, but b is already loaded by the time t1 accesses it, since t2 has already populated key b.
If t1 had locked both a & b prior to loading them, t2 couldn't have parallelized the load of b while a was being loaded.

Ordering

Things can get somewhat trickier to deal with as well; imagine the case below:

t1: cache.getAll(a, b);
t2: cache.getAll(b, a);

These two threads can now deadlock each other. Obviously, ordering the lock acquisitions would be the way to solve this, but there is no constraint on the type of a & b to be ? extends Comparable either.

Now this can all be mitigated with other locking strategies, like installing some lock object in the mapping, so that t1 could see that a is already being loaded. Then the cache could either have each thread load a single key, or transfer the responsibility for loading all keys onto a single thread (and hence onto CacheLoaderWriter.loadAll()), so as to provide more options for the loader to optimize data access against the underlying system of record.
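For what it's worth, the usual shape of that ordered-locking mitigation is sketched below. It assumes the keys are Comparable (which, as noted above, the cache cannot assume for arbitrary key types), and lockFor is a hypothetical per-key lock lookup; the point is that getAll(a, b) and getAll(b, a) then acquire locks in the same global order and cannot deadlock, at the cost of extra coordination before any loading starts.

// Sketch only, under the assumptions stated above (Comparable keys, hypothetical lockFor).
Map<Integer, String> loadAllOrdered(Set<Integer> keys) throws Exception {
    List<Integer> ordered = new ArrayList<>(keys);
    Collections.sort(ordered);               // one global lock-acquisition order shared by all threads
    for (Integer key : ordered) {
        lockFor(key).lock();                 // hypothetical per-key lock lookup
    }
    try {
        // A single bulk call can now hand every key to the loader at once;
        // the loaded values would then be installed into the store.
        return loaderWriter.loadAll(ordered);
    } finally {
        for (Integer key : ordered) {
            lockFor(key).unlock();
        }
    }
}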

So... what?

All this to say that the route chosen was to optimize for latency of the cache. I hope that the examples above show how that approach makes access to the cache somewhat fairer to all... While other options exist, they come at a higher price in terms of runtime because of the additional coordination requirements.

Now maybe there are better ways of going about this, though... I haven't given this any further thought in years, to be honest. Nonetheless, that's a vague attempt at explaining why things are the way they are; what challenges need to be accounted for when trying to "address this issue"; and, most importantly, what tradeoffs they may bring...

@henri-tremblay
Contributor

Highly interesting explanation. I'll take that into consideration. The problem is really the loader writer: doing tons of queries to the database instead of one to load a cache isn't efficient at all. So we will have to find a way.

@alexsnaps
Member

alexsnaps commented Aug 23, 2017 via email

@henri-tremblay henri-tremblay removed their assignment Sep 18, 2017
@dsylenko

Any news on this issue?
