Skip to content

Conversation

Sanne
Copy link
Member

@Sanne Sanne commented Jun 1, 2015

From JIRA:

I would like to allow second level cache providers the option to use custom key implementations.
The current default key implementation is very generic and is quite a large object to allocate in large quantities at runtime.

In some extreme cases, for example when the hit ratio is very low, this was making the efficiency penalty vs its benefits tradeoff questionable.

Depending on configuration settings there might be opportunities to use simpler key implementations, for example when multi-tenancy is not being used to avoid the tenant identifier, or when a cache instance is entirely dedicated to a single type to use the primary id only, skipping the role or entity name.

Even with multiple types sharing the same cache, their identifiers could be of the same org.hibernate.type.Type; in this case the cache container could use a single Type reference to implement a custom equality function without having to look up the Type on each equality check: that's a small optimisation but the equality function is invoked extremely frequently in many practical configurations.

Another reason is to make it more convenient to implement custom serialization protocols when the implementation supports clustering.

It might open the doors to more complex improvements, like HHH-9780.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgotten implementation?

@sebersole
Copy link
Member

I left a few line comments. Overall I still question defining the factory methods on the access strategies rather than on the RegionFactory. I get why you want it here. I just think its a trade off that needs to be discussed.

@Sanne
Copy link
Member Author

Sanne commented Jun 3, 2015

I've pushed a fix for the method names, missing factory instance and typos but didn't add the overloaded methods yet.

@rvansa
Copy link
Contributor

rvansa commented Jun 4, 2015

@Sanne Could you clarify then why is the factory method on access strategy? I wouldn't put that directly into RegionFactory since the type of key should be per-region, but having that on Region would make better sense to me. But maybe (and very likely) I am missing something.

I was also trying to figure out in what situations is it necessary to have collectionRole in the key; it seems to me that there is a separate region for each role.

@alexsnaps
Copy link
Contributor

@Sanne this may be somewhat coming out of left field, but why all these CacheKey subtypes? More so, why the getters on each of these? If the implementation can do whatever, why would it need to "store" all that info in there? Pushing it to the extreme, couldn't the CacheKey just be Object ?

I see that maybe some type safety, to help the developers on ORM itself, is gained by all the types. Not that this is neglectable, on the contrary. But I still fail to see the use of the getters. Doing a quick check, I think these aren't really used anyways (other than a couple of occurrences). Anyways, maybe pushing this too much to the simplest form here...

@Sanne
Copy link
Member Author

Sanne commented Jun 4, 2015

@alexsnaps Hi! thanks for looking.

Right, I was actually going to propose removal of all methods from the interfaces; they are there as the existing CacheKey implementation did have them and some tests used them; since I'm not familiar with this code area, when switching to interfaces I've been as conservative as possible.

Since @sebersole suggested to change some of those methods (from 'getEntityOrRoleName' to the more specific 'getEntityName' for entities, and similarly for collections), I'm guessing that being conservative on these methods isn't on his priorities.
@sebersole could we agree then on completely removing all these methods?

My first priority was to allow using alternative implementations, and make sure the ORM 5 API wouldn't need further changes to do many possible smart things. What exactly is not defined, but my point is just that: to make sure the contract doesn't stand in the way of future experimentations and improvements.

@Sanne
Copy link
Member Author

Sanne commented Jun 4, 2015

@rvansa

Could you clarify then why is the factory method on access strategy? I wouldn't put that directly into RegionFactory since the type of key should be per-region, but having that on Region would make better sense to me. But maybe (and very likely) I am missing something.

I have only a weak reason: noticed that all points in the ORM code which are using the cache, have primarily a reference to the AccessStrategy, while the Region needs to be retrieved.
It's possible that they fit better on Region, but it just seems that all relevant code which us using this is dealing with the AccessStrategy.
I'll give the alternative a shot if I find some free time.. definitely a good question!

I was also trying to figure out in what situations is it necessary to have collectionRole in the key; it seems to me that there is a separate region for each role.

As I'm understanding it, you could store multiple roles in the same cache.
And indeed, making the collectionRole unneccessary is one of the things I'm aiming at. Currently since you have to use the CacheKey implementation, it's always considered: so always a memory cost and always incorporated in both equals and hashcode computations.

@Sanne
Copy link
Member Author

Sanne commented Jun 4, 2015

@alexsnaps

this may be somewhat coming out of left field, but why all these CacheKey subtypes?

The NaturalIdCacheKey was different already, then we had the CacheKey which was used for both entity and collections. I transformed both to interfaces, and split CacheKey in two contracts.

Pushing it to the extreme, couldn't the CacheKey just be Object ?

It could! The original idea was to add a method on the CacheKey interface which would conceptually do "toEntityKey" to allow a transformation into the kind of keys which the persistence context uses: to reuse the key instance rather than having to allocate a key. The PersistenceContext needs a different API for its keys though, so rather than reusing the same instance as-is we where thinking of making this "toEntityKey" so that some key implementations in which this reuse was possible could do it.

For that trick to fly though I need to explore some more things, and TBH it looks like it's not going to fly.. I don't normally work on ORM though: my time on this experiment is limited to some evening scraps, so talking with Steve we thought the change to interface (from implementation) was a good first step.

I'm actually torn on the idea of using Object as a key directly for another reason: while I like the typesafety of the current proposal - and the possibility to add smarter methods on it in future - it would also be interesting for some configurations in which all of the tenantId and also the type and the role names are redundant, to use straight away the entity id w/o any wrapping.
That would only be feasible by accepting Object of course, but would that be worth it? Doing that to avoiding a single wrapper with a single field seems quite aggressive.

I was also considering a refactoring of the PersistenceContext container, so that entries would be organized in per-type containers rather than a single map; if we were to do that, the EntityKey instances would become totally redundant, and so also the option to reuse keys from the cache.

@rvansa
Copy link
Contributor

rvansa commented Jun 5, 2015

I have tried to build something on top on that in https://github.com/rvansa/hibernate-orm/tree/HHH-9843 - though I am not using any special key, rather just not filling the entityOrRoleName. Creating a matrix/hierarchy with some fields ommitted would lead to class explosion, and I am not sure that the 4-8 bytes spared is worth it.
Regrettably, Type is still required, I'll think about ways to work around that.

@Sanne
Copy link
Member Author

Sanne commented Jun 5, 2015

@rvansa the 4-8 saved bytes mean 33% shaved off from one of the largest allocation points.

On the type: yes I guess it's still needed for the current cache implementations.
When I mentioned using an externally defined Equality function plugged into the Cache container, I'm not referring to any known / existing feature of Infinispan or Ehcache.
Just making sure that such Cache implementors could take advantage of such a thing in the future.

The point is that APIs in Hibernate ORM are maintained with a very strong sense of stability, this is one of the opportunities to change them. And time is ticking, so we need to focus on getting the API requirements out of the way to allow future improvements in all implementations.

@sebersole
Copy link
Member

Sanne, in our IRC discussion I said that the specific methods on CacheKey are not used and could go away. In fact in our "pair design" of this I thought we had agreed that these contracts would have just one method... for conversion. I was simply pointing out that if you are going to (1) split out specific CacheKey types and (2) expose their state that the state ought to be specific. E.g. EntityCacheKey#getEntityOrRole makes no sense imo. So if you are going to expose that state, then it should be EntityCacheKey#getEntityName. That said, however, I still think the methods do not really add much.

I like the added type-safety of splitting the interfaces. Implementors should still have a single impl if they wish, implementing all interfaces.

@rvansa
Copy link
Contributor

rvansa commented Jun 5, 2015

While type-safety is definitely a good think, the ability to use plain Object and the external equality function sounds more and more appealing, memory-consumption-wise. And if we want only converter methods, it sounds like a way to go.

@Sanne
Copy link
Member Author

Sanne commented Jun 5, 2015

While type-safety is definitely a good think, the ability to use plain Object and the external equality function sounds more and more appealing, memory-consumption-wise. And if we want only converter methods, it sounds like a way to go.

+1
Alex proposed this initially so I think he would prefer that too.

@sebersole may I change the keys back to Object? After discussing with the cache implementors, it imposes a restriction which we'd prefer to avoid.

The added typesafety was nice to work with but we can think of an alternative compile-time only check in the future, something like findbugs's 'Nonnull' annotations: that would give the same safety against completely broken refactoring steps but without taking away flexibility from caches.

@sebersole
Copy link
Member

That's fine. In retrospect having specific contracts essentially breaks the whole point here of allowing providers to reuse their underlying key impls.

As for additional checks later during the build, meh. If you have time and an itch, go for it.

@Sanne
Copy link
Member Author

Sanne commented Jun 5, 2015

I don't have time for adding additional checks at compile time now, it was just to say what we could this about doing that in the future, in case someone would really want the typesafety.

Happy to just change to Object!

@galderz
Copy link
Member

galderz commented Jun 5, 2015

@Sanne So, in the end the region / region factory methods will continue to take Object as key? I understand the changes suggested to ORM internals to reduce memory, but if we can carry on with keys as Object, I'd rather that too.

Not sure I understood the entire discussion about externally provided comparison functions, but this is something Infinispan's had for a while, and indeed reduces the requirements on the keys since some keys do not have good equals impls, e.g. arrays.

@rvansa
Copy link
Contributor

rvansa commented Jun 8, 2015

Replaced by #976

@Sanne
Copy link
Member Author

Sanne commented Jun 23, 2015

closing this, as it's replaced by #976

@Sanne Sanne closed this Jun 23, 2015
@Sanne Sanne deleted the HHH-9840 branch July 1, 2015 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants