Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some updates to entries get lost from write behind queue #15060

Closed
gkrs opened this issue May 20, 2019 · 8 comments · Fixed by #15398

Comments

@gkrs
Copy link

commented May 20, 2019

There's a batch of entries, let say in number of 2000.
Big picture is to put initial versions of them the cache (with status field value 'N') and then put updated versions with status set to 'U'.

Coalesced write behind is being used here. The entries have custom equals and hashCode.

The story is as follows:

  • put all entries in the cache

  • they are registered in write behind queue with store times 1558097498625 - 1558097498766

  • they are stored by write behind, because they all fit in StoreWorker's lastHighestStoreTime: 1558097499038

  • after selecting entries to store, finished store operations are removed from queues

  • putting updated versions can be split into two parts:

    • updated entries with store times 1558097498634 - 1558097498650, which are eventually not written to DB
    • updated entries with store times 1558097500278 - 1558097501941, which are stored to DB

What happens in between is additional removeFinishedStoreOperationsFromQueues invocation which apparently removes the first part of updated entries from being stored. I think that happens because the entries got "old" store times, from the initial batch, which were not timely removed from queue and are now removed during updated batch, because of the custom equals, since using the generic one would indicate different objects.

Could you please take a look? This problem appears in our production code and some entries stored in DB are not current ones. We were unable to write test case which reproduces this issue.

We switched the write-coalescing off, as a workaround. It solved the issue, but affected the overall performance.

@ahmetmircik

This comment has been minimized.

Copy link
Member

commented May 22, 2019

Write behind queue compares values by relying on equals method. Does updated value equal to previous value after status update?

@gkrs

This comment has been minimized.

Copy link
Author

commented May 27, 2019

In short: yes. It is equal.

@gkrs

This comment has been minimized.

Copy link
Author

commented Jun 5, 2019

Similar topis was created on google groups https://groups.google.com/d/topic/hazelcast/arCXp_6j6lw/discussion
The question is if this is an expected behavior and where is it described.

@vojtechtoman vojtechtoman self-assigned this Jun 5, 2019
@mmedenjak

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2019

Hi @gkrs ! Thank you for the update. We'll take a look if we can clarify the behaviour or if there is some issue that needs to be fixed.

@vojtechtoman

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2019

It looks like a concurrency issue when there are updates to CoalescedWriteBehindQueue while StoreWorker is running. It only occurs if the original and updated versions of entries are equal().
A short summary of what happened in the reported scenario:

USER                                     | StoreWorker
----------------------------------------------------------------------------------
add E (store time T) to writeBehindQueue |
                                         | select E from writeBehindQueue
                                         | process E
update E' (coalesced writeBehindQueue    |
           replaces E, keeps time T)     |
                                         | remove E from writeBehindQueue
                                         |          (removes E' which is equal()) 

If there are updates to entries in CoalescedWriteBehindQueue after StoreWorker selects entries to process and before it removes the (processed) entries from the queue, these updates will get lost (because they are equal()).

@vojtechtoman

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2019

Hi @gkrs, the scenario that I outlined above works only if E and its updated version E' are stored in the write behind queue as plain objects (that is, not serialized), so that E.equals() is used and not byte array comparison (which is the case for serialized representations). The latter would never see E and E' as equal (unless the version field is transient or you use custom serialization).

Are you, by any chance, using the MapInterceptor API? One of the side-effects of applying MapInterceptor.interceptPut() is that the objects will be added to the write behind queue in deserialized form (which would make the above scenario work).

@vojtechtoman

This comment has been minimized.

Copy link
Contributor

commented Jun 17, 2019

Hi @gkrs, can you please confirm if you are using the MapInterceptor API? We have a fix for the scenario that I described, but we would like to be sure it works for you, too.

@gkrs

This comment has been minimized.

Copy link
Author

commented Jun 18, 2019

Hi @vojtechtoman,
Yes, we are using MapInterceptor and interceptPut() specifically. Is there any hope to have it fixed?

vojtechtoman pushed a commit to vojtechtoman/hazelcast that referenced this issue Jun 18, 2019
…lcast#15060)

Changed the logic in StoreWorker.runInternal() so that it no longer
removes entries from the write behind queue after processing the
entries. Instead, it removes them from the queue already when
selecting the entries. For a coalescing queue, this ensures that
if there are concurrent updates to the queue in the meantime
(with newer entries replacing the old ones), the new entries do
not get removed by accident after processing. This could happen
when the entries were stored as plain objects (i.e. not serialized)
and the equals() implementation would consider the old and new
entries as equal (for instance, only properties that do not
participate in equals() changed).

Fixes hazelcast#15060
vojtechtoman pushed a commit to vojtechtoman/hazelcast that referenced this issue Jun 20, 2019
…rker is running (hazelcast#15060)

CoalescedWriteBehindQueue.removeFirstOccurence(e) now
does not remove the existing entry if its sequence
number is higher than e's. This means that the current
entry was inserted into the map after e and therefore
StoreWorker.removeFinishedStoreOperationsFromQueues()
should not remove it.

Fixes hazelcast#15060
vojtechtoman added a commit that referenced this issue Jul 30, 2019
…rker is running (#15060) (#15191)

* Coalesced WBQ: do not accidentally remove entries added while StoreWorker is running (#15060)

CoalescedWriteBehindQueue.removeFirstOccurence(e) now
does not remove the existing entry if its sequence
number is higher than e's. This means that the current
entry was inserted into the map after e and therefore
StoreWorker.removeFinishedStoreOperationsFromQueues()
should not remove it.

Fixes #15060
vojtechtoman pushed a commit to vojtechtoman/hazelcast that referenced this issue Jul 30, 2019
…rker is running (hazelcast#15060)

CoalescedWriteBehindQueue.removeFirstOccurence(e) now
does not remove the existing entry if its sequence
number is higher than e's. This means that the current
entry was inserted into the map after e and therefore
StoreWorker.removeFinishedStoreOperationsFromQueues()
should not remove it.

Forward port from maintenance-3.x.

Fixes hazelcast#15060

(cherry picked from commit f6378d4)
vojtechtoman added a commit that referenced this issue Aug 1, 2019
…rker is running (#15060) (#15398)

CoalescedWriteBehindQueue.removeFirstOccurence(e) now
does not remove the existing entry if its sequence
number is higher than e's. This means that the current
entry was inserted into the map after e and therefore
StoreWorker.removeFinishedStoreOperationsFromQueues()
should not remove it.

Forward port from maintenance-3.x.

Fixes #15060

(cherry picked from commit f6378d4)
@mmedenjak mmedenjak modified the milestones: 3.12.2, 3.12.3 Aug 1, 2019
Holmistr added a commit that referenced this issue Sep 4, 2019
…rker is running (#15060) (#15191)

* Coalesced WBQ: do not accidentally remove entries added while StoreWorker is running (#15060)

CoalescedWriteBehindQueue.removeFirstOccurence(e) now
does not remove the existing entry if its sequence
number is higher than e's. This means that the current
entry was inserted into the map after e and therefore
StoreWorker.removeFinishedStoreOperationsFromQueues()
should not remove it.

Fixes #15060
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.