-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
HHH-9840 Allow custom key implementations for 2nd level cache regions and define a typesafe contract for keys #974
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgotten implementation?
I left a few line comments. Overall I still question defining the factory methods on the access strategies rather than on the RegionFactory. I get why you want it here. I just think its a trade off that needs to be discussed. |
…us key implementations
I've pushed a fix for the method names, missing factory instance and typos but didn't add the overloaded methods yet. |
@Sanne Could you clarify then why is the factory method on access strategy? I wouldn't put that directly into RegionFactory since the type of key should be per-region, but having that on Region would make better sense to me. But maybe (and very likely) I am missing something. I was also trying to figure out in what situations is it necessary to have collectionRole in the key; it seems to me that there is a separate region for each role. |
@Sanne this may be somewhat coming out of left field, but why all these I see that maybe some type safety, to help the developers on ORM itself, is gained by all the types. Not that this is neglectable, on the contrary. But I still fail to see the use of the getters. Doing a quick check, I think these aren't really used anyways (other than a couple of occurrences). Anyways, maybe pushing this too much to the simplest form here... |
@alexsnaps Hi! thanks for looking. Right, I was actually going to propose removal of all methods from the interfaces; they are there as the existing CacheKey implementation did have them and some tests used them; since I'm not familiar with this code area, when switching to interfaces I've been as conservative as possible. Since @sebersole suggested to change some of those methods (from 'getEntityOrRoleName' to the more specific 'getEntityName' for entities, and similarly for collections), I'm guessing that being conservative on these methods isn't on his priorities. My first priority was to allow using alternative implementations, and make sure the ORM 5 API wouldn't need further changes to do many possible smart things. What exactly is not defined, but my point is just that: to make sure the contract doesn't stand in the way of future experimentations and improvements. |
I have only a weak reason: noticed that all points in the ORM code which are using the cache, have primarily a reference to the AccessStrategy, while the Region needs to be retrieved.
As I'm understanding it, you could store multiple roles in the same cache. |
The NaturalIdCacheKey was different already, then we had the CacheKey which was used for both entity and collections. I transformed both to interfaces, and split CacheKey in two contracts.
It could! The original idea was to add a method on the CacheKey interface which would conceptually do "toEntityKey" to allow a transformation into the kind of keys which the persistence context uses: to reuse the key instance rather than having to allocate a key. The PersistenceContext needs a different API for its keys though, so rather than reusing the same instance as-is we where thinking of making this "toEntityKey" so that some key implementations in which this reuse was possible could do it. For that trick to fly though I need to explore some more things, and TBH it looks like it's not going to fly.. I don't normally work on ORM though: my time on this experiment is limited to some evening scraps, so talking with Steve we thought the change to interface (from implementation) was a good first step. I'm actually torn on the idea of using Object as a key directly for another reason: while I like the typesafety of the current proposal - and the possibility to add smarter methods on it in future - it would also be interesting for some configurations in which all of the tenantId and also the type and the role names are redundant, to use straight away the entity id w/o any wrapping. I was also considering a refactoring of the PersistenceContext container, so that entries would be organized in per-type containers rather than a single map; if we were to do that, the EntityKey instances would become totally redundant, and so also the option to reuse keys from the cache. |
I have tried to build something on top on that in https://github.com/rvansa/hibernate-orm/tree/HHH-9843 - though I am not using any special key, rather just not filling the entityOrRoleName. Creating a matrix/hierarchy with some fields ommitted would lead to class explosion, and I am not sure that the 4-8 bytes spared is worth it. |
@rvansa the 4-8 saved bytes mean 33% shaved off from one of the largest allocation points. On the type: yes I guess it's still needed for the current cache implementations. The point is that APIs in Hibernate ORM are maintained with a very strong sense of stability, this is one of the opportunities to change them. And time is ticking, so we need to focus on getting the API requirements out of the way to allow future improvements in all implementations. |
Sanne, in our IRC discussion I said that the specific methods on CacheKey are not used and could go away. In fact in our "pair design" of this I thought we had agreed that these contracts would have just one method... for conversion. I was simply pointing out that if you are going to (1) split out specific CacheKey types and (2) expose their state that the state ought to be specific. E.g. I like the added type-safety of splitting the interfaces. Implementors should still have a single impl if they wish, implementing all interfaces. |
While type-safety is definitely a good think, the ability to use plain Object and the external equality function sounds more and more appealing, memory-consumption-wise. And if we want only converter methods, it sounds like a way to go. |
+1 @sebersole may I change the keys back to Object? After discussing with the cache implementors, it imposes a restriction which we'd prefer to avoid. The added typesafety was nice to work with but we can think of an alternative compile-time only check in the future, something like findbugs's 'Nonnull' annotations: that would give the same safety against completely broken refactoring steps but without taking away flexibility from caches. |
That's fine. In retrospect having specific contracts essentially breaks the whole point here of allowing providers to reuse their underlying key impls. As for additional checks later during the build, meh. If you have time and an itch, go for it. |
I don't have time for adding additional checks at compile time now, it was just to say what we could this about doing that in the future, in case someone would really want the typesafety. Happy to just change to Object! |
@Sanne So, in the end the region / region factory methods will continue to take Object as key? I understand the changes suggested to ORM internals to reduce memory, but if we can carry on with keys as Object, I'd rather that too. Not sure I understood the entire discussion about externally provided comparison functions, but this is something Infinispan's had for a while, and indeed reduces the requirements on the keys since some keys do not have good equals impls, e.g. arrays. |
Replaced by #976 |
closing this, as it's replaced by #976 |
From JIRA:
I would like to allow second level cache providers the option to use custom key implementations.
The current default key implementation is very generic and is quite a large object to allocate in large quantities at runtime.
In some extreme cases, for example when the hit ratio is very low, this was making the efficiency penalty vs its benefits tradeoff questionable.
Depending on configuration settings there might be opportunities to use simpler key implementations, for example when multi-tenancy is not being used to avoid the tenant identifier, or when a cache instance is entirely dedicated to a single type to use the primary id only, skipping the role or entity name.
Even with multiple types sharing the same cache, their identifiers could be of the same org.hibernate.type.Type; in this case the cache container could use a single Type reference to implement a custom equality function without having to look up the Type on each equality check: that's a small optimisation but the equality function is invoked extremely frequently in many practical configurations.
Another reason is to make it more convenient to implement custom serialization protocols when the implementation supports clustering.
It might open the doors to more complex improvements, like HHH-9780.