Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IMap.put(String,Object,long,TimeUnit) broken starting with 3.9.2 #13272

Closed
carstenartur opened this issue Jun 7, 2018 · 20 comments · Fixed by #14867
Closed

IMap.put(String,Object,long,TimeUnit) broken starting with 3.9.2 #13272

carstenartur opened this issue Jun 7, 2018 · 20 comments · Fixed by #14867

Comments

@carstenartur
Copy link
Contributor

@carstenartur carstenartur commented Jun 7, 2018

After ttl exceeded the map entry is still there with all hazelcast versions since 3.9.2.
3.9.1 and earlier are ok.

@ahmetmircik

This comment has been minimized.

Copy link
Member

@ahmetmircik ahmetmircik commented Jun 8, 2018

@carstenartur can you please share a reproducer? i tried but didn't manage to reproduce the failure.

carstenartur added a commit to carstenartur/hazelcast that referenced this issue Jun 8, 2018
@carstenartur

This comment has been minimized.

Copy link
Contributor Author

@carstenartur carstenartur commented Jun 8, 2018

added a test as #13278

@jerrinot jerrinot self-assigned this Jun 8, 2018
@jerrinot jerrinot added this to the 3.11 milestone Jun 8, 2018
@jerrinot

This comment has been minimized.

Copy link
Contributor

@jerrinot jerrinot commented Jun 8, 2018

@carstenartur: thanks for the reproducer, I'll have a look shortly!

@jerrinot

This comment has been minimized.

Copy link
Contributor

@jerrinot jerrinot commented Jun 8, 2018

@carstenartur: ok, that's a good reproducer, thanks!

so apparently Hazelcast not expire locked entries. When I remove the locking from the test case then it's all OK.

the question is whether this is desirable. @ahmetmircik @vbekiaris ?

@carstenartur

This comment has been minimized.

Copy link
Contributor Author

@carstenartur carstenartur commented Jun 8, 2018

Thanks, and keep in mind that it expires using 3.9.1 and earlier. The change in the semantic is not documented.

@jerrinot

This comment has been minimized.

Copy link
Contributor

@jerrinot jerrinot commented Jun 8, 2018

I just tried your test-case against 3.9.1 and 3.9 and it fails with these versions too. I see no behaviour change you are reporting. In the fact I see the logic for not expiring locked keys has been in the codebase for quite a long time.

This is from the 3.5.x serie

Can you please validate your setup? Thanks!

carstenartur added a commit to carstenartur/hazelcast that referenced this issue Jun 10, 2018
@carstenartur

This comment has been minimized.

Copy link
Contributor Author

@carstenartur carstenartur commented Jun 10, 2018

Hi Jaromir,
I created another copy of the test using a three times longer waittime. (3600 instead of 1200 milliseconds) I commited it for 2.7.5 (did not find a 3.9.1 branch) as well. Sorry if that is not the best way to show this difference. I still have to learn git, not my daily business.
If you compare the results of these two variants on 2.7.5 and versions newer than 3.9.1 you should find that the test variant using a 3600 milliseconds waittime from map entry creation to map entry eviction is working up to hazelcast 3.9.1 and all later versions do not work.
Best regards,
Carsten

@jerrinot

This comment has been minimized.

Copy link
Contributor

@jerrinot jerrinot commented Jun 11, 2018

hello @carstenartur,

I am still confused. I tried your newer version of the test:

@Test
    public void afterLongerExpirationEntryShouldBeAway() throws InterruptedException {
        final String KEY = "key";

        Config config = new Config()
                .setInstanceName("instance")
                .addMapConfig(new MapConfig().setName("test").setMaxSizeConfig(new MaxSizeConfig(200,MaxSizeConfig.MaxSizePolicy.FREE_HEAP_SIZE)).setTimeToLiveSeconds(20));
        final HazelcastInstance node = createHazelcastInstance(config);
        try {
            IMap map = node.getMap("test");
            /**
             * after 1 second entry should be evicted
             */
            map.put(KEY, "value", 1, TimeUnit.SECONDS);
            /**
             * short time after adding it to the map, all ok
             */
            map.lock( KEY );
            Object object = map.get( KEY );
            map.unlock( KEY );
            assertNotNull( object );

            Thread.sleep( 3600 );
            /**
             * More than one second after adding it, now it should be away
             */
            map.lock( KEY );
            object = map.get( KEY );
            map.unlock( KEY );
            assertNull( object );
        }finally {
            node.shutdown();
        }
    }

and it's still failing on 3.9.1 and after exploring this a bit more I am now convinced this is an intended behaviour: We do not want to remove entries while somehow is holding a lock on them.

Also I am not quite sure what SW you are referring to by version 2.7.5. We have never released this version.

@carstenartur

This comment has been minimized.

Copy link
Contributor Author

@carstenartur carstenartur commented Jun 11, 2018

You find this version after you fork the hazelcast git repository to you local eclipse:
grafik
There are no newer tags available for whatever reason.
Here is the commit that adds tests to 2.7.5 where then the test with the longer waittime does not fail while it fails on current hazelcast:
carstenartur@5b3c446
I am not sure what this all means. Is locking really meant to hold on eviction? Is the test code wrong because it causes a race condition? Is the documentation really reflecting reality?

@jerrinot

This comment has been minimized.

Copy link
Contributor

@jerrinot jerrinot commented Jun 11, 2018

I am sorry, I still do not understand what you mean by version 2.7.5. I don't even see it on the screenshot. Do you mean 3.7.5?

In Git you have to explicitly use the --tags to fetch tags from a remote repository.

@carstenartur

This comment has been minimized.

Copy link
Contributor Author

@carstenartur carstenartur commented Jun 11, 2018

ah, yes, I mean 3.7.5 :)
carstenartur@5b3c446 is applied to 3.7.5.

@carstenartur

This comment has been minimized.

Copy link
Contributor Author

@carstenartur carstenartur commented Jun 11, 2018

Just tried as advised to fetch the tags from remote repository and checked with the 3.9.1 checkout. As reported the test using the longer waittime does not fail using this version. So after 3.6 seconds the map entry is evicted (as it should imho). As the lock only is active for the time pulling the map entry I think the lock should not have an impact anyway. I think the eviction is the right thing in this situation and the question is why it is not evicted earlier. And of course why behavior changed and eviction does not any longer happen at all in newer hazelcast versions. Seems to be a bug imho.

@jerrinot

This comment has been minimized.

Copy link
Contributor

@jerrinot jerrinot commented Jun 11, 2018

ok, I can finally see the behaviour change. Let me have a look what has changed.

@jerrinot

This comment has been minimized.

Copy link
Contributor

@jerrinot jerrinot commented Jun 11, 2018

ok, I can see what is going on: There is a periodic task to remove expired keys.

In 3.9.1 and older the task is scheduled as soon as the ExpirationManager instance is created. In 3.9.2+ the task is scheduled only when a member state is set as STARTED.

There is some delay between the time when the ExpirationManager instance is created and member state is STARTED. Hence in 3.9.1 the expiration task kicks in while the test is sleeping and it removed the entry. In 3.9.2 the task kicks in later - and at this time the entry is already locked so it's not removed.

A long story short: There is a race in both 3.9.2 and older versions. However in 3.9.2 we change the timing a bit.

I think we should do this: Change the lock operation to check whether an entry is already expired. If it is then it should eagerly evict it before acquiring a lock. With this it won't rely on (racy) background task. It's the same algorithm the Get operation uses. @ahmetmircik, @vbekiaris: WDYT?

@carstenartur: thanks again for a reproducer!

@carstenartur

This comment has been minimized.

Copy link
Contributor Author

@carstenartur carstenartur commented Jun 11, 2018

Thanks for your help. regarding the IMap.put() I suggest to add a remark to the javadoc about accuracy of the timeout value. I was not aware that this is not using a scheduled event for eviction that has the potential to be very precise. A periodic task with fixed time slots to remove expired keys limits the accuracy of the eviction time, right?

@jerrinot

This comment has been minimized.

Copy link
Contributor

@jerrinot jerrinot commented Jun 11, 2018

@carstenartur: it's actually more complicated:
When you call map.get() then Hazelcast always checks whether an item is expired. If an item is actually expired then Hazelcast will remove it from a record store and the map.get() returns null. If you think about it then this would have worked even without any periodic task at all. The periodic task is just an optimization to reclaim memory faster - even when there is no map.get() call.

There is a caveat: We never expire an item of a locked key. Once you lock an entry then it shouldn't change before you unlock it.

You can see the behaviour yourself: When you remove the last lock()/unlock() pair from your test then it's passing - because the entry is eagerly expired during map.get()

So here is my proposal: map.lock() operation should behave similarly to map.get() operation: It should eagerly remove expired item just before acquiring a lock.

@vbekiaris

This comment has been minimized.

Copy link
Contributor

@vbekiaris vbekiaris commented Jun 12, 2018

I think we should do this: Change the lock operation to check whether an entry is already expired.

This makes sense to me, so if the key is already expired before the lock is acquired, map.get(k) will return null while the lock is being held.

@ahmetmircik

This comment has been minimized.

Copy link
Member

@ahmetmircik ahmetmircik commented Jun 27, 2018

@jerrinot it makes sense, only worry is where to do this check. LockOperation is a shared operation between services not a map-specific one, may be we can try to do this check inside DefaultRecordStore

@jerrinot jerrinot removed this from the 3.11 milestone Aug 6, 2018
@carstenartur

This comment has been minimized.

Copy link
Contributor Author

@carstenartur carstenartur commented Aug 27, 2018

Any chances that this will be fixed with other changes in this area currently being worked on?

@mmedenjak mmedenjak added this to the 3.12 milestone Sep 7, 2018
@johanhammar

This comment has been minimized.

Copy link

@johanhammar johanhammar commented Nov 20, 2018

Hi!

I think we're affected by this bug. We are using Spring Boot and when upgrading to 2.0.x and then later 2.1.0 we started to see that atleast one of the caches stopped evicting entries.

Problem is that we can't go back to 3.9.1 or earlier due to Spring Boot, we must back to Spring Boot 1.5.x

Funny thing is that I implemented a MapListener and while "observing" the cache, it seems like cache eviction started working. Timing issue?

So, is there a workaround for this problem?

@mmedenjak mmedenjak removed this from the 3.12 milestone Jan 25, 2019
@vbekiaris vbekiaris self-assigned this Apr 11, 2019
vbekiaris added a commit to vbekiaris/hazelcast that referenced this issue Apr 11, 2019
vbekiaris added a commit to vbekiaris/hazelcast that referenced this issue Apr 12, 2019
Introduces SPI for services that need to
execute some code before locking
a key.

Fixes hazelcast#13272
carstenartur added a commit to carstenartur/hazelcast that referenced this issue Apr 12, 2019
Signed-off-by: Carsten Hammer <carsten.hammer@t-online.de>
carstenartur added a commit to carstenartur/hazelcast that referenced this issue Apr 12, 2019
Signed-off-by: Carsten Hammer <carsten.hammer@t-online.de>
vbekiaris added a commit to vbekiaris/hazelcast that referenced this issue Apr 15, 2019
Introduces SPI for services that need to
execute some code before locking
a key.

Fixes hazelcast#13272
vbekiaris added a commit to vbekiaris/hazelcast that referenced this issue Apr 15, 2019
Introduces SPI for services that need to
execute some code before locking
a key.

Fixes hazelcast#13272
@mmedenjak mmedenjak added this to the 4.0 milestone Apr 17, 2019
vbekiaris added a commit that referenced this issue Apr 19, 2019
Signed-off-by: Carsten Hammer <carsten.hammer@t-online.de>
vbekiaris added a commit to vbekiaris/hazelcast that referenced this issue Apr 19, 2019
Introduces SPI for services that need to
execute some code before locking
a key.

Fixes hazelcast#13272
vbekiaris added a commit that referenced this issue Apr 23, 2019
Introduces SPI for services that need to
execute some code before locking
a key.

Fixes #13272
blazember added a commit to blazember/hazelcast that referenced this issue Jul 15, 2019
Introduces SPI for services that need to
execute some code before locking
a key.

Fixes hazelcast#13272

(cherry picked from commit 0a2d959)
blazember added a commit to blazember/hazelcast that referenced this issue Jul 15, 2019
Introduces SPI for services that need to
execute some code before locking
a key.

Fixes hazelcast#13272

(cherry picked from commit 0a2d959)
blazember added a commit that referenced this issue Jul 22, 2019
Introduces SPI for services that need to
execute some code before locking
a key.

Fixes #13272

(cherry picked from commit 0a2d959)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.