Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UpdateTimestampsCache non cached entities concurrency issue #2735

Closed
gokhanabatay opened this issue Apr 21, 2021 · 29 comments
Closed

UpdateTimestampsCache non cached entities concurrency issue #2735

gokhanabatay opened this issue Apr 21, 2021 · 29 comments

Comments

@gokhanabatay
Copy link
Contributor

gokhanabatay commented Apr 21, 2021

Hi @hazzik @maca88,
As I mentioned earlier current UpdateTimestampsCache has still concurrency issue. #2129

In real life scenario some of entities uses second level cache but most of entities does not uses second level cache but still try to update UpdateTimestampsCache. At least it should be configurable in entity level or session factory level, default may be true for backward compability issue.

Non cached entities could be related(foreign key) with cached entity may be better solution StandardQueryCache property spaces and cached entity property spaces only triggers UpdateTimestampsCache?

Current UpdateTimestampsCache uses AsyncReaderWriterLock if one writer grants lock other writers of different entities or readers wait for it. If this is true you could realize the concurrency issue that we are facing under high concurrency (400-500tps).

If still not acceptable please consider a way to implement Custom UpdateTimestampsCache via UpdateTimestampsFactory like cache.provider_class attribute in configuration. Could be named cache.update_timestamps_class.

Currently we are trying to replace current UpdateTimestampsCache via reflection but its not cleanest solution.

var cacheablePersisters = ISessionFactory.GetAllClassMetadata()
                .Where(x => x.Value.GetType() == typeof(SingleTableEntityPersister))
                .Select(x => x.Value)
                .Cast<SingleTableEntityPersister>()
                .Where(x => x.HasCache)
                .ToList();

var sessionFactoryImpl = (SessionFactoryImpl)ISessionFactory;
var standardQueryCache = (StandardQueryCache)sessionFactoryImpl.QueryCache;


var updateTimestamps = (CacheBase)typeof(UpdateTimestampsCache).GetField("_updateTimestamps", BindingFlags.Instance | BindingFlags.NonPublic).GetValue(sessionFactoryImpl.UpdateTimestampsCache);

var updateTimestampsCache = new DistributedUpdateTimestampsCache(updateTimestamps, cacheablePersisters.SelectMany(x => x.PropertySpaces).ToHashSet());

typeof(SessionFactoryImpl).GetField("updateTimestampsCache", BindingFlags.Instance | BindingFlags.NonPublic).SetValue(sessionFactoryImpl, updateTimestampsCache);

if (standardQueryCache != null)
{
    typeof(StandardQueryCache).GetField("_updateTimestampsCache", BindingFlags.Instance | BindingFlags.NonPublic).SetValue(standardQueryCache, updateTimestampsCache);
}
public class DistributedUpdateTimestampsCache : UpdateTimestampsCache
{
    protected readonly ISet<string> CacheableSpaces;

    public DistributedUpdateTimestampsCache(CacheBase updateTimestamps, ISet<string> cacheableSpaces) : base(updateTimestamps)
    {
        CacheableSpaces = cacheableSpaces;
    }

    public override void Invalidate(IReadOnlyCollection<string> spaces)
    {
        spaces = spaces.Where(x => CacheableSpaces.Contains(x)).ToList();

        if (spaces.Any())
        {
            base.Invalidate(spaces);
        }
    }

    public override void PreInvalidate(IReadOnlyCollection<string> spaces)
    {
        spaces = spaces.Where(x => CacheableSpaces.Contains(x)).ToList();

        if (spaces.Any())
        {
            base.PreInvalidate(spaces);
        }
    }

    public override Task InvalidateAsync(IReadOnlyCollection<string> spaces, CancellationToken cancellationToken)
    {
        if (cancellationToken.IsCancellationRequested)
        {
            return Task.FromCanceled<object>(cancellationToken);
        }

        spaces = spaces.Where(x => CacheableSpaces.Contains(x)).ToList();

        if (!spaces.Any())
        {
            return Task.CompletedTask;
        }

        return base.InvalidateAsync(spaces, cancellationToken);
    }

    public override Task PreInvalidateAsync(IReadOnlyCollection<string> spaces, CancellationToken cancellationToken)
    {
        if (cancellationToken.IsCancellationRequested)
        {
            return Task.FromCanceled<object>(cancellationToken);
        }

        spaces = spaces.Where(x => CacheableSpaces.Contains(x)).ToList();

        if (!spaces.Any())
        {
            return Task.CompletedTask;
        }

        return base.PreInvalidateAsync(spaces, cancellationToken);
    }
}
@maca88
Copy link
Contributor

maca88 commented Apr 24, 2021

The simplest solution would be to remove the locking entierly from UpdateTimestampsCache, which is what Hibernate has done a long time ago to improve performance. In theory the read/write locking prevents that IsUpToDate would return stale data, but that holds true only when one instance of the application is running. In case multiple application instances are running then IsUpToDate may return stale data due to another application instance writing data at the same time. So we could remove the locking entirely as Hiberanate did or add a configuration setting for disabling it.

@gokhanabatay
Copy link
Contributor Author

I agree @maca88 best solution remove locking entierly from UpdateTimestampsCache also configuration setting needed for non cached entities. If we are not interested in caching for an entity, in classs mapping Cache.Never() mapping confuguration could be implemented. Also configuration level ignore would be great.

Why an entity dml blocks another entity read, it's not occurred when we are using Cache.NonStrictReadWrite();

And you are right our application is running on openshift, 25 application (minimum 2 pods every app) rely on same local distributed cache. UpdateTimestampsCache is important for clients to detect stale data but not important as locking whole application.

@gokhanabatay
Copy link
Contributor Author

@maca88 I am willing to implement this feature if you can give a little guidance.

I agree @maca88 best solution remove locking entierly from UpdateTimestampsCache also configuration setting needed for non cached entities. If we are not interested in caching for an entity, in classs mapping Cache.Never() mapping confuguration could be implemented. Also configuration level ignore would be great.

Why an entity dml blocks another entity read, it's not occurred when we are using Cache.NonStrictReadWrite();

And you are right our application is running on openshift, 25 application (minimum 2 pods every app) rely on same local distributed cache. UpdateTimestampsCache is important for clients to detect stale data but not important as locking whole application.

What do you think about the offer and how should we proceed, which branch should I use as a base?

@hazzik
Copy link
Member

hazzik commented Apr 28, 2021

which branch should I use as a base?

master.

@maca88
Copy link
Contributor

maca88 commented Apr 28, 2021

Non cached entities could be related(foreign key) with cached entity may be better solution StandardQueryCache property spaces and cached entity property spaces only triggers UpdateTimestampsCache?

Updating timestamps for only cached entities is not a valid approach. Consider this query:

session.Query<User>()
.WithOptions(o => o.SetCacheable(true))
.Where(o => o.Organization.Name == "Name1")
.Select(o => new
{
  o.Id,
  o.Name,
  OrganizationName = o.Organization.Name
})
.ToList();

where User is cached but Organization is not. In case the Organization name is changed, your DistributedUpdateTimestampsCache will not update the Organization timestamp and the query will still return the same result even after the Organization name was changed. I made a working example to show that. By commenting these lines the test then passes.

In short, with your DistributedUpdateTimestampsCache, every change to a non cached entity may produce an outdated result for a cached query that references them. In case the cached entities are rarely changed the outdated cached result may be used for days, which can cause several bugs. I think that by removing the read/write locking (#2742), the performance should be much better and you won't need to use a custom UpdateTimestampsCache anymore.

@gokhanabatay
Copy link
Contributor Author

gokhanabatay commented Apr 28, 2021

@maca88 When we use redis pub/sub distributed cache in your implementation non cached entities causes too much synchronization events. Think about a configuration in entity mapping Cache.Never() that tells this entity never cached with query cache or entity cache ever.

If this approach is ok I could implement this in NHibernate and FluentNHibernate libraries.
For performance needs this is absolutely necessary.

There is also locks in NHibernate.Caches.StackExchangeRedis.DistributedLocalCacheRegionStrategy

@maca88
Copy link
Contributor

maca88 commented Apr 29, 2021

non cached entities causes too much syncronization events

The DistributedLocalCacheRegionStrategy was never meant to be used for caches that have a lot for writing, you would probably have a better performance by using either TwoLayerCacheRegionStrategy or DefaultRegionStrategy for them.

Think about a configuration in entity mapping Cache.Never() that tells this entity never cached with query cache or entity cache ever.

How would Cache.Never() work in case the non cached entity is used in a cached query like in the example above. Should Cache.Never() override the o.SetCacheable(true) set on the query?

For performance needs this is absolutly necessary.

In order to further improve the UpdateTimestampsCache performance, you could create a special region strategy for it, which would combine approaches used in DistributedLocalCacheRegionStrategy and TwoLayerCacheRegionStrategy. The strategy could have the following functionalities:

  • A local cache that would be used for retrieving/storing the values, similar to DistributedLocalCacheRegionStrategy
  • Every X seconds the local cache would synchronize its data with the Redis cache. The Redis cache would be used to synchronize the data among local caches similar to TwoLayerCacheRegionStrategy
  • As UpdateTimestampsCache stores only timestamps and does not use the Remove method, the synchronization logic can be simplified by only checking the timestamps and storing always the latest one

By synchronizing the local cache data every X seconds, you gain performance at a cost of potentially having stale data that is up to X seconds old.

@gokhanabatay
Copy link
Contributor Author

How would Cache.Never() work in case the non cached entity is used in a cached query like in the example above. Should Cache.Never() override the o.SetCacheable(true) set on the query?

Current Entity Mapping Strategies

namespace FluentNHibernate.Mapping
{
    public class CachePart : ICacheMappingProvider
    {
        public CachePart(Type entityType);
        public CachePart CustomInclude(string custom);
        public CachePart CustomUsage(string custom);
        public CachePart IncludeAll();
        public CachePart IncludeNonLazy();
        public CachePart NonStrictReadWrite();
        public CachePart ReadOnly();
        public CachePart ReadWrite();
        public CachePart Region(string name);
        public CachePart Transactional();
    }
}

I thought that we could add new strategy Cache.Never() as a new feature.
If an entity mapped as Cache.Never() we could throw an exception that tells entity is not cacheable, so you cannot use in query cache with o.SetCacheable(true). StandardQueryCache holds QuerySpaces so we could easily check it before execution.
Then don't update for Cache.Never() entities UpdateTimestampsCache ever.

As its a new feature so it would not be a breaking change.

Don't get me wrong removing lock at UpdateTimestampsCache(#2742) is a good start point, as you mentioned I could implement in custom region strategy. Just trying to improve NHibernate if it's make sense.

@maca88
Copy link
Contributor

maca88 commented Apr 30, 2021

If an entity mapped as Cache.Never() we could throw an exception that tells entity is not cacheable, so you cannot use in query cache with o.SetCacheable(true)

I am fine with that, feel free to create a PR for it.

@gliljas
Copy link
Member

gliljas commented May 4, 2021

But UpdateTimestampsCache and entity cachability are still orthogonal things and shouldn't be connected. Being able to easily circumvent the behavior with a custom UpdateTimestampsCache implementation seems like a feasible solution.

@gliljas
Copy link
Member

gliljas commented May 4, 2021

If an entity mapped as Cache.Never() we could throw an exception that tells entity is not cacheable, so you cannot use in query cache with o.SetCacheable(true)

I am fine with that, feel free to create a PR for it.

I disagree. SetCacheable shouldn't be the cause an exception, IMO. It's a hint, not a request.

Having Cache.Never as a setting to prevent certain entities (or columns/properties) from ever entering a cache could be useful.

@bahusoid
Copy link
Member

bahusoid commented May 4, 2021

SetCacheable shouldn't be the cause an exception, IMO. It's a hint, not a request.

@gliljas Exception is thrown to avoid issue described here.

@gliljas
Copy link
Member

gliljas commented May 4, 2021

SetCacheable shouldn't be the cause an exception, IMO. It's a hint, not a request.

@gliljas Exception is thrown to avoid issue described here.

Yes, but I'm not sure throwing an exception is a good thing. It should just deem the query uncacheable, perhaps log it, and move on. IMO, SetCacheable just means "cache it if you can". Nothing else. It doesn't throw an exception if query caching is entirely disabled.

@bahusoid
Copy link
Member

bahusoid commented May 4, 2021

It should just deem the query uncacheable

IMO such behavior can lead to unexpected hard to notice performance issues. For me it's better to throw and make user mark affected queries not cacheable explicitly or remove Cache.Never from entity. It's not the same to me as cache enabled/disabled example.

@maca88
Copy link
Contributor

maca88 commented May 4, 2021

But UpdateTimestampsCache and entity cachability are still orthogonal things and shouldn't be connected.

I do agree, we should probably move the configuration to a separate option that could be configured on the entities and collections tables. For example:

<class name="Order" query-cache="false">
  <cache usage="read-write" />
  ...

this way an entity can still be cached with the defined cache strategy, but cannot be used in a cached query.

Being able to easily circumvent the behavior with a custom UpdateTimestampsCache implementation seems like a feasible solution.

By just replacing the UpdateTimestampsCache, you cannot solve the issue that I described a few posts earlier. In order to avoid returning stale results for cached queries that contain spaces that are not updated with the custom UpdateTimestampsCache, we have to either throw an exception or ignore the SetCacheable(true) option, which is something that cannot be done inside a custom UpdateTimestampsCache.

It should just deem the query uncacheable, perhaps log it, and move on. IMO, SetCacheable just means "cache it if you can". Nothing else. It doesn't throw an exception if query caching is entirely disabled.

For me both approaches are valid. In this case I am leaning more towards using the fail-fast principle rather than fail-silently. I tend to agree with @bahusoid that it is not the same as globally enable/disable the cache. It is nice that we can globally enable/disable the cache based on a specific environment without modifying the code, but it is not nice to have a query that will never behave the way we defined it.

@gliljas
Copy link
Member

gliljas commented May 4, 2021

For me both approaches are valid. In this case I am leaning more towards using the fail-fast principle rather than fail-silently. I tend to agree with @bahusoid that it is not the same as globally enable/disable the cache. It is nice that we can globally enable/disable the cache based on a specific environment without modifying the code, but it is not nice to have a query that will never behave the way we defined it.

I understand that sentiment, but coming from a scenario where query composition is very common, the "query" in question is not defined up front, and may well include entities or properties that could have the proposed "Never" setting. I see a possible use of the Never setting as a security thing, i.e "this value should never be allowed to be stored in the cache layer".

Separating query cache settings from entity settings seems like a very good idea, and possibly it could be something that is bound to properties, rather that classes. Specifying it on class level would just be a convenience thing, to set a default.

While there's certainly such a thing as too many configuration settings, perhaps the "to throw or not to throw an exception" thing could warrant a setting. But I still say that SetCacheable is only a hint.

@maca88
Copy link
Contributor

maca88 commented May 4, 2021

Separating query cache settings from entity settings seems like a very good idea, and possibly it could be something that is bound to properties, rather that classes.

I do agree that it would be nice to have a setting bound to properties, but I think this is a separate issue and outside the scope of this issue. For example the following query:

session.Query<User>()
.WithOptions(o => o.SetCacheable(true))
.Where(o => o.FirstName == "Name1")
.ToList();

will put only the User identifiers in the query cache, so having a property setting does not make much sense for it. But it would make sense for the following query:

session.Query<User>()
.WithOptions(o => o.SetCacheable(true))
.Select(o => new { o.FirstName, o.LastName })
.ToList();

For this feature I think we should use a separate setting:

  <class name="User" query-cache="true">
    <cache usage="read-write" />
    <id name="Id">
      <generator class="native"/>
    </id>
    <property name="FirstName" />
    <property name="LastName" cache="false" />
  </class>

in the above example cache="false" setting would prevent the LastName property to be cached by the cache strategy and by the query cache.

@gokhanabatay
Copy link
Contributor Author

Separating query cache settings from entity settings seems like a very good idea, and possibly it could be something that is bound to properties, rather that classes.

I do agree that it would be nice to have a setting bound to properties, but I think this is a separate issue and outside the scope of this issue. For example the following query:

session.Query<User>()
.WithOptions(o => o.SetCacheable(true))
.Where(o => o.FirstName == "Name1")
.ToList();

will put only the User identifiers in the query cache, so having a property setting does not make much sense for it. But it would make sense for the following query:

session.Query<User>()
.WithOptions(o => o.SetCacheable(true))
.Select(o => new { o.FirstName, o.LastName })
.ToList();

For this feature I think we should use a separate setting:

  <class name="User" query-cache="true">
    <cache usage="read-write" />
    <id name="Id">
      <generator class="native"/>
    </id>
    <property name="FirstName" />
    <property name="LastName" cache="false" />
  </class>

in the above example cache="false" setting would prevent the LastName property to be cached by the cache strategy and by the query cache.

Main issue is in real life scenario most of entities does not use cache and we should give a configuration that entity never uses cache so don’t track UpdateTimeStamps.

I disagree seperate configuration on entity level, it could lead conflicts and again main issue is tracking UpdateTimestampsCache on never cached entities.

@gliljas
Copy link
Member

gliljas commented May 5, 2021

I disagree seperate configuration on entity level, it could lead conflicts and again main issue is tracking UpdateTimestampsCache on never cached entities.

And better configuration options is a way to solve that. Given the current setup, using UpdateTimestampsCache on never cached entities is the only correct thing to do.

@gokhanabatay
Copy link
Contributor Author

And better configuration options is a way to solve that. Given the current setup, using UpdateTimestampsCache on never cached entities is the only correct thing to do.

There is open pr for this issue and I am willing to improve, could you explain how to resolve concurrency issue on never cached entities?
Why using UpdateTimestampsCache is the only way if it’s not needed ever with never cached entities.

@gliljas
Copy link
Member

gliljas commented May 6, 2021

Why using UpdateTimestampsCache is the only way if it’s not needed ever with never cached entities.

It is. It's used for the query cache.

@gokhanabatay
Copy link
Contributor Author

Why using UpdateTimestampsCache is the only way if it’s not needed ever with never cached entities.

It is. It's used for the query cache.

@hazzik you have a pull request #2130 adressing this issue that I have been reported 2 years ago at 2019.

@maca88 @bahusoid What dou you think, how dou we proceed.

If proposal not accepted, after removing lock in UpdateTimestampsCache I could move non cached entity logic into UpdateTimestamps region cache custom implementation. Not preferred but at least it could be customized on our side. There is one Bottleneck Timestamper.Next() which has lock in it, but it could be ignored. I also think this lock could be entity persister level.

public static long Next()
{
	lock (lockObject)
	{
		// Ticks is accurate down to 100 nanoseconds - hibernate uses milliseconds
		// to help calculate next time so drop the nanoseconds portion.(1ms==1000000ns)
		long newTime = ((DateTime.UtcNow.Ticks / 10000) - baseDateMs) << BinDigits;
		if (time < newTime)
		{
			time = newTime;
			counter = 0;
		}
		else if (counter < OneMs - 1)
		{
			counter++;
		}
		return time + counter;
	}
}

@bahusoid
Copy link
Member

bahusoid commented May 7, 2021

I'm fine with both original plan (Cache.Never() setting to throw when used in cached queries) and with separate query-cache setting suggested by @maca88 here.

But I still say that SetCacheable is only a hint.

@gliljas I disagree. Ability to globally enable/disable cache doesn't make it a hint. When cache is enabled SetCacheable should either cache or throw (at least by default) - and I think it currently works exactly this way.

I see a possible use of the Never setting as a security thing, i.e "this value should never be allowed to be stored in the cache layer"

@gliljas So suggested PR seems already implements this use case - disables entity caching and forbids usages in other cache layers (query cache)

@gliljas
Copy link
Member

gliljas commented May 7, 2021

@bahusoid
Hibernate explicitly calls it a hint. Now, that in itself doesn't mean that it shouldn't be enforced, but together with its naming "cacheable" ("you can/may cache this") and the fact that non-compliance with the hint is most likely a non-fatal issue, I think that throwing an exception would be a non-desirable default. Unless you can opt out from it.

@bahusoid
Copy link
Member

bahusoid commented May 7, 2021

Hibernate explicitly calls it a hint

The Java Persistence API defines the notion of query hint, which unlike what its name might suggest, it has nothing to do with database query hints. The JPA query hint is a Java Persistence provider customization option.

JPA just uses word hint for any query customization option. It has nothing to do with "do it if you can" hint meaning (at least from hibernate standpoint) You can check other hibernate "hints" to see it. Hibernate just provides setHint method to support JPA. But it also provides native setCacheable method which doesn't mention it's a hint in any way.

@maca88
Copy link
Contributor

maca88 commented May 7, 2021

But it also provides native setCacheable method which doesn't mention it's a hint in any way.

By looking at the Hibernate code, it seems like that setCacheable is indeed used as a hint:
https://github.com/hibernate/hibernate-orm/blob/c29b2d27ee66b0da2af2a640b5c2179c6454a983/hibernate-core/src/main/java/org/hibernate/query/internal/AbstractProducedQuery.java#L1050

As I mentioned before, I am also fine that it is implemented as a hint, but at least we should log a warning so that the developer could see that the query won't be cached. We could optionally add a setting for throwing an exception instead.

I disagree separate configuration on entity level, it could lead conflicts and again main issue is tracking UpdateTimestampsCache on never cached entities.

query cache and cache strategy on an entity/collection level are two separate things so I don't see how it would lead to conflicts. From an end user standpoint, I do understand that <cache usage="none" /> is the most logical option, but theoretically if someone would like to disable query cache for entities that are cached, that won't be possible by using the existing setting.

@bahusoid
Copy link
Member

bahusoid commented May 7, 2021

it seems like that setCacheable is indeed used as a hint:

@maca88 Again as I already said all those "hints" come from JPA (Java Persistence API). JPA uses this term for query configuration options specific for JPA implementations (and hibernates implements JPA) So yes it's a hint in a way that if NHIbernate doesn't support provided hint via JPA - it will be ignored (though didn't check it) . But it doesn't mean that supported by NHIbernate hints (query options like readonly, comment, timeout, cacheable ... ) should be considered optional when applied to query. Also check Hibernate docs (see examples "Caching query using JPA" and "Caching query using Hibernate native API") - word hint is used only for JPA examples.

It seems my point is clear here - I vote for throwing vs silent behavior change (I see it as a potential source of hard to identify bugs and perf issues) . No more noise from me on this one - do whatever you decide is better.

@gokhanabatay
Copy link
Contributor Author

gokhanabatay commented May 9, 2021

query cache and cache strategy on an entity/collection level are two separate things so I don't see how it would lead to conflicts. From an end user standpoint, I do understand that is the most logical option, but theoretically if someone would like to disable query cache for entities that are cached, that won't be possible by using the existing setting.

From end user perspective I think this could lead conflict, when we talk about second level cache if its enabled then you can use query cache or entity cache. If we change configuration as you suggest there would be too much configuration.

Query Cache         Entity Cache
true                ReadWrite
true                ReadOnly
true                NonStrictReadWrite
true                Transactional
false               ReadWrite
false               ReadOnly
false               NonStrictReadWrite
false               Transactional
true                None(Not configured)
false               None(Not configured)

When Query Cache true and Entity Cache is not configured somethink, then just unique keys are stored in QueryCache, always gets entity from database when requested.
When Query Cache(default true) is false then nh does not updates(tracks changes) UpdateTimestapsCache

On the other hand I think Cache Never has a simpler approach?

I also vote for throwing exception if its counts :)

@fredericDelaporte
Copy link
Member

Obsoleted by #2744, which allows to exclude entities from all second level cache features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants