HHH-1775 collection batch fetching #337

Closed

Contributor

pb00068 commented May 14, 2012

Tried to implement collection batch fetching as Steve proposed in HHH-1775.
Very similar to how entity batches are handled,
"batch loadable collections" are now also tracked separately on the BatchFetchQueue.
The problem with "peeking" into the second-level cache is mitigated inasmuch
as collections are removed from the BatchFetchQueue as soon as they are initialized from the shared cache
(CollectionEntry#postInitialize).
Furthermore, the pull request includes some minor optimizations for entity batches as well.
The pull request passed all Hibernate matrix tests successfully (including BatchedManyToManyTest).
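The two-level tracking described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual Hibernate code; class and method names here are hypothetical stand-ins for the real BatchFetchQueue internals:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of 2-level-deep batch-fetch tracking: the outer map has
// one bucket per collection role, and the inner LinkedHashMap preserves
// registration order of the batch-loadable candidates.
class BatchFetchQueueSketch {
    private final Map<String, LinkedHashMap<Object, Object>> batchLoadableCollections =
            new LinkedHashMap<>();

    void addBatchLoadableCollection(String role, Object key, Object collectionEntry) {
        batchLoadableCollections
                .computeIfAbsent(role, r -> new LinkedHashMap<>())
                .put(key, collectionEntry);
    }

    // Removal on initialization (cf. CollectionEntry#postInitialize): the
    // collection leaves the queue as soon as it is initialized, e.g. from the
    // second-level cache, which mitigates the cache-"peeking" problem.
    void removeBatchLoadableCollection(String role, Object key) {
        LinkedHashMap<Object, Object> bucket = batchLoadableCollections.get(role);
        if (bucket != null) {
            bucket.remove(key);
        }
    }

    // Building a batch only iterates the bucket for the requested role,
    // instead of scanning every tracked collection in the persistence context.
    Set<Object> candidatesFor(String role) {
        LinkedHashMap<Object, Object> bucket = batchLoadableCollections.get(role);
        return bucket == null ? Collections.emptySet() : bucket.keySet();
    }
}
```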

I am curious why you decided to move batchLoadableEntityKeys to a 2-level-deep structure? I can see that the iteration in getEntityBatch would be more "targeted", which might translate to faster iteration, especially in larger persistence contexts. But it also introduces more storage overhead. Just wanted to make sure you actually verified that the trade-off between these two concerns warranted the change.

Contributor

pb00068 commented Oct 26, 2012

Yes, for me keeping the iteration fast even in larger contexts was an important aspect,
and in my personal opinion the benefit of the 2-level-deep structure should outweigh the storage overhead.
On the other hand, I must admit that so far I have not performed dedicated benchmark tests,
not least because, as you know, there is no specific test scenario that is representative of all imaginable use cases. In my concrete case, while debugging, it pained me a little to see the batch load iterate over 100,000 collections that did not belong to my target entity class at all, while there were effectively only around 20 candidate collections to retrieve. In that concrete situation the need for a 2-level-deep structure like the one proposed seemed obvious to me.

Contributor

pb00068 commented Oct 26, 2012

.. and I noticed nearly the same phenomenon while debugging batch loading of entities (I have many lazy properties pointing to distributed entity classes, so the pool of uninitialized proxies grows fast in my environment), which is the actual reason why I applied the 2-level-deep structure to EntityKeys as well.

Owner

sebersole commented Oct 26, 2012

It was specifically the collection of entity batchables that I was asking about. I'll look at that specific part some more as I apply this.

I'd also propose a few changes to the structure of the collection batchable map. First, I'd use the "collection role" as the key, rather than the persister. Secondly, the use of a Map as value is not necessary. Really we just need a "unique, ordered collection" (aka LinkedHashSet) of CollectionEntry/PersistentCollection; your code never does map lookups against this nested map. Maybe even an array structure. Again, I'll think about this as I apply the changes.
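The proposed restructuring could look roughly like this. Again a sketch under stated assumptions: the class and method names are illustrative, and the real change may have differed when applied:

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the proposed alternative: key by "collection role" (a String)
// rather than by persister, and keep a unique, ordered collection
// (LinkedHashSet) of entries as the value instead of a nested map, since the
// code never performs map lookups against the nested structure.
class BatchableCollectionsByRole {
    private final Map<String, LinkedHashSet<Object>> byRole = new LinkedHashMap<>();

    void add(String role, Object collectionEntry) {
        // LinkedHashSet gives uniqueness plus insertion order in one structure
        byRole.computeIfAbsent(role, r -> new LinkedHashSet<>()).add(collectionEntry);
    }

    Set<Object> batchCandidates(String role) {
        return byRole.getOrDefault(role, new LinkedHashSet<>());
    }
}
```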

Or if you have ideas about those proposals let me know.

Again, thanks for the work on this.

Contributor

pb00068 commented Oct 26, 2012

Hi Steve,

IMHO substituting the LinkedHashMap with a LinkedHashSet gains nothing in terms of performance, because LinkedHashSet internally relies on a LinkedHashMap anyway.
Please take a look at the source code of LinkedHashSet and HashSet:

http://javasourcecode.org/html/open-source/jdk/jdk-6u23/java.util/LinkedHashSet.java.html
http://javasourcecode.org/html/open-source/jdk/jdk-6u23/java/util/HashSet.java.html

The alternative of using a simple array structure would indeed be problematic for the performance of the removeBatchLoadableCollection method.
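The removal-cost concern can be illustrated with a plain list versus a LinkedHashSet (an illustrative comparison, not code from the patch): removing from an array-like structure requires a linear scan plus shifting, while a LinkedHashSet removes via a hash lookup in O(1) on average, yet still iterates in insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RemovalCostDemo {
    public static void main(String[] args) {
        // ArrayList.remove(Object) scans linearly for the element and then
        // shifts the tail of the backing array: O(n) per removal.
        List<Integer> asList = new ArrayList<>();
        // LinkedHashSet.remove(Object) is a hash lookup plus unlinking from
        // the internal linked list: O(1) on average, order still preserved.
        Set<Integer> asSet = new LinkedHashSet<>();
        for (int i = 0; i < 10; i++) {
            asList.add(i);
            asSet.add(i);
        }
        asList.remove(Integer.valueOf(5)); // remove by value, not by index
        asSet.remove(5);
        System.out.println(asList.size() + " " + asSet.size());
    }
}
```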

Owner

sebersole commented Oct 26, 2012

As far as removeBatchLoadableCollection goes, a linear search through arrays of the sizes we are likely talking about is pretty trivial. Keep in mind that you already group them here by "collection role", so the array size in question is strictly the number of instances of that collection role, which, as you said initially, was very small in your case compared to the number of all collections of all types overall.

@sebersole sebersole closed this Oct 26, 2012
