
Multimap becomes inconsistent when remove before shutdown #5220

Closed
purplefox opened this issue May 1, 2015 · 67 comments

@purplefox commented May 1, 2015

Hello folks,

We're experiencing a rather serious issue in Vert.x which results in inconsistent/corrupt multimap entries in the Hazelcast cluster.

I have managed to reproduce this using pure Hazelcast APIs (no Vert.x involved).

Reproducer is here https://github.com/purplefox/hz-bug

The README contains a full explanation and instructions for reproducing.

There is also a thread from the Vert.x google group discussing this here:

https://groups.google.com/forum/?fromgroups#!topic/vertx/Im4UumOG8Ts

This issue is currently affecting Vert.x users in production, so if you could take a look / provide a workaround, that would be most helpful.
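
For readers who don't want to open the repo, the pattern being exercised is roughly the following (an illustrative sketch only; the map name, entry counts and ids are mine, the linked README has the real code): each member puts entries into a clustered MultiMap and, before shutting down, removes exactly what it put in, counting any removes that return false.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.MultiMap;

public class MultiMapShutdownRepro {
    public static void main(String[] args) {
        // Start a member that joins the existing cluster.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        MultiMap<String, String> subs = hz.getMultiMap("subs");

        String nodeId = hz.getCluster().getLocalMember().getUuid();

        // Each member registers some entries under shared keys.
        for (int i = 0; i < 100; i++) {
            subs.put("address-" + i, nodeId);
        }

        // ... later, before shutdown, remove exactly what we put in.
        int failed = 0;
        for (int i = 0; i < 100; i++) {
            if (!subs.remove("address-" + i, nodeId)) {
                failed++; // should never happen: no other node removed this entry
            }
        }
        System.out.println("removes that returned false: " + failed);

        hz.shutdown();
    }
}
```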

@gurbuzali (Member) commented May 1, 2015

Thank you for the detailed bug report; we'll take a look at it.

gurbuzali added this to the 3.4.3 milestone on May 1, 2015

@gurbuzali (Member) commented May 1, 2015

Are you sure that all remove calls return true? In my tests I counted the false returns, and the count was equal to the remaining entry count
(node1: 21, node2: 31, node3: 0, node4: 0, remaining entries: 52).

@purplefox (Author) commented May 1, 2015

Yes, you're right - sometimes the remove fails.

However, why should the remove fail? It's removing exactly the same key/value pair that it previously put into the map, and no other node has removed it.

@gurbuzali (Member) commented May 1, 2015

If you shut down more than one node concurrently, operations (in this case the remove operation) may time out due to migration, but in that case we should have seen some timeout exceptions, I guess.
That's another issue I will look at in more detail. Let's first confirm that the remaining entry count is equal to the "not removed" entry count. Do you see a case where these do not match? If so, we have two different issues, I presume.

@purplefox (Author) commented May 1, 2015

I think the issue here is that it's failing to remove entries which are clearly in the map.

IMHO it's a fundamental characteristic of any clustered/distributed map solution that the map should remain consistent when nodes are shut down, as long as there is at least one node left in the cluster.

If a node that owns certain keys shuts down, I'd expect it to ensure those keys are migrated safely to another node before the shutdown completes. I'm guessing that isn't happening properly in this case.

I'm not seeing any timeouts, everything happens pretty quickly.

@gurbuzali (Member) commented May 1, 2015

I'm not arguing against your point; I'm just trying to pinpoint the issue here.
Can you confirm that the number of "false returns" is equal to the number of remaining entries?

@purplefox (Author) commented May 1, 2015

Ah ok :)

I thought your statement "may get timeout due to migration" implied that this was to be expected, and I assume it would leave the cluster in an inconsistent state, which would clearly be a major issue: users can shut down nodes at any time, and that's not something we can control.

I don't know if the number of false returns is equal to the number of remaining entries, but that could well be true. If you run it a few times, maybe you can gain some degree of confidence that that is the case?

@purplefox (Author) commented May 1, 2015

Btw.. thanks for looking at this so quickly, it's much appreciated :)

@bwzhang2011 commented May 4, 2015

@purplefox, thanks for raising this issue. I wonder whether this is an actual bug that needs to be fixed, or a situation that only arises under certain circumstances and that we just need to be careful to avoid.

@AlexThurston commented May 5, 2015

@bwzhang2011 This is an actual bug I encountered when trying to use Vert.x (which in turn uses Hazelcast); it effectively rendered our clustered system non-functional.

Others have noted similar behaviour.

@AlexThurston commented May 5, 2015

Is there any movement/decision on this issue? Just trying to plan and mitigate accordingly.

gurbuzali self-assigned this on May 6, 2015

@purplefox (Author) commented May 6, 2015

+1 This is very important for our users. If there is any sensible workaround that can be implemented before a full solution is provided, that would be useful too :)

@ajermakovics (Contributor) commented May 6, 2015

One workaround is to use an IMap instead of a MultiMap. I ran a modified test and all the removes succeeded. Currently the IMap implementation is more mature than the MultiMap.

Of course IMap doesn't support multiple values per key but you could use multiple maps. If the names of these IMaps share a common prefix then they can also share configuration.
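
To make the shape of that workaround concrete, here is a rough sketch under my own assumptions (the `subs.` prefix and all names are invented, and the actual Vert.x workaround may differ): each logical multimap key gets its own IMap, the values are stored as that map's keys, and the shared name prefix lets every per-key map pick up a single wildcard configuration entry (e.g. `subs.*`).

```java
import java.util.Set;

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class IMapMultiMapWorkaround {
    // Shared prefix so one wildcard config ("subs.*") covers all per-key maps.
    private static final String PREFIX = "subs.";

    private final HazelcastInstance hz;

    public IMapMultiMapWorkaround(HazelcastInstance hz) {
        this.hz = hz;
    }

    private IMap<String, Boolean> mapFor(String key) {
        return hz.getMap(PREFIX + key);
    }

    // multiMap.put(key, value) becomes:
    public void put(String key, String value) {
        mapFor(key).put(value, Boolean.TRUE);
    }

    // multiMap.get(key) becomes:
    public Set<String> get(String key) {
        return mapFor(key).keySet();
    }

    // multiMap.remove(key, value) becomes:
    public boolean remove(String key, String value) {
        return mapFor(key).remove(value) != null;
    }
}
```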

@purplefox (Author) commented May 6, 2015

Thanks @ajermakovics for that observation.

Perhaps it will provide a clue to the Hazelcast team about where the bug is...

@jensenmo commented May 8, 2015

+1 from us; we just hit the issue in production again yesterday. The only known workaround is to bounce the whole Vert.x/Hazelcast cluster, which causes an outage.

@purplefox (Author) commented May 11, 2015

Hi Hazelcast folks

Just wondering if there has been any progress on this, or if you know of any workaround?

We have a code freeze for Vert.x 3.0 in the not-too-distant future, so we would need some kind of solution before then.

I know these things are hard to predict, but if you could give some kind of ETA it would be much appreciated, as we will need to take mitigating action if it looks like it is not going to be fixed in time for Vert.x 3.0.

Thanks :)

@lukjel commented May 11, 2015

+1 from my team. We have the same problems as others (Vert.x cluster). From our perspective this is a critical error.

@gurbuzali (Member) commented May 12, 2015

Hi, I've sent PR #5284. Once it's merged, can you please try it and confirm whether it fixes the issue?

@purplefox (Author) commented May 12, 2015

Thanks Ali,

I have built hz from your fix branch and I can confirm that my reproducer seems to work as expected now.

I wonder if you could confirm one thing for me? Looking at your fix: when the multimap values are merged, you set the new id sequence to the max of the merged ids plus an offset (1,000,000). I'm guessing this offset exists to account for any puts that might occur during the merge process(?). Is there a possibility that, if many puts (more than 1,000,000) occurred during the merge, we would still have problems?

@purplefox (Author) commented May 12, 2015

@AlexThurston Could you also try with your reproducer and Hz built from Ali's fix branch here: https://github.com/gurbuzali/hazelcast/tree/multimap-migration-fix ?

@purplefox (Author) commented May 12, 2015

@gurbuzali Could you also confirm when the next Hazelcast release is due on the 3.4 branch? (I.e. the one containing this fix).

@gurbuzali (Member) commented May 12, 2015

Hi @purplefox, while merging, all mutating operations are retried, so there should not be a put during the merge. The problem is with transactions: a transaction gets the nextRecordId but may not yet be committed at the time of migration. So if you had more than 1,000,000 transactions alive on the multimap while migrating, you could still have problems. We believe this is a very unlikely situation.
We use the same approach in IQueue too.

PS: By transaction I mean using our TransactionalMultiMap interface.
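
To spell out the arithmetic as described above (illustrative only, not the actual Hazelcast merge code; the names are mine):

```java
public final class MergeIdSequence {

    static final long ID_OFFSET = 1_000_000L;

    // After migration the new owner seeds its id generator past everything it saw.
    static long nextIdSequence(long maxMergedRecordId) {
        return maxMergedRecordId + ID_OFFSET;
    }

    // Record ids already reserved by live (uncommitted) TransactionalMultiMap
    // transactions lie at most N above maxMergedRecordId, where N is the number
    // of such transactions. A collision therefore requires N > ID_OFFSET, i.e.
    // more than a million open transactions during a single migration.
}
```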

@gurbuzali (Member) commented May 12, 2015

For the next maintenance release (3.4.3) I don't have an exact date; I will ask our release coordinator and get back to you.

@AlexThurston commented May 12, 2015

@purplefox I have good news and bad news. The good news is that the behaviour has changed, and for the better. I am testing through Vert.x, which was the original impetus for this investigation.

I'll speak mostly in Vert.x terms, and perhaps @purplefox can determine whether this is still purely a Hazelcast issue.

If I have 3 Vert.x processes running verticles that register 100 handlers each, the map's size is correctly 300 (3 * 100) when everything is running.

I then kill 2 of the three Vert.x processes in parallel. The resulting map correctly contains entries only for the first Vert.x process. However, there are handlers missing: I should be left with a map of size 100, but I have a map of size 73 instead. Basically, I've lost 27 handlers by shutting down Vert.x processes.

Starting the two Vert.x processes that were previously killed results in a map that does not correctly reflect what is actually running in the cluster.

This is still a pretty serious, breaking issue (although slightly different), but I can't say whether Vert.x or Hazelcast is the cause.

@purplefox (Author) commented May 28, 2015

Can you quantify "tiny chance"?

Some of our users can easily reproduce this in a real setup by shutting down servers concurrently, resulting in an inconsistent state. So it doesn't appear to be a tiny chance; there appears to be a very significant probability of it happening. This can be seen in the reproducer I provided, where the cluster gets into an inconsistent state in the majority of runs, so I'd say around a 75% chance of it happening in this setup.

@jerrinot (Contributor) commented May 28, 2015

Here is what's going on:

Let's assume we have a cluster of 3 members (Member1, Member2, Member3) and just 3 partitions in the system.

Partition Table:

           / Member1 | Member2 | Member 3
Primary of |   0     |    1    |    2
Backup  of |   2     |    0    |    1

During shutdown, Member1 checks whether Member2 has an up-to-date replica of partition0 and exits only if the outcome is positive. Otherwise Member1 waits until the backup is up-to-date.

This is the usual flow:

- Member1 -> Member2: Is your replica of `partition0` at version X
- Member2 -> Member1: Yes
- Member1 exits
- Member2 is promoted to be a new owner of `partition0`
- Member3 becomes a backup of `partition0`

However, when Member2 calls shutdown after Member1 has exited but before Member2 has been promoted to the new owner of `partition0`, then partition0 is lost.

I was considering a quick workaround: if a member is in the process of shutting down, it always replies NO to the "is your replica up-to-date?" question, but:

  1. it could cause deadlocks.
  2. it doesn't solve all cases anyway.

The shutdown process has to be redesigned. That is very likely to be part of the 3.6 release.
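
A possible mitigation while the redesign is pending (an assumption on my part, not an official recommendation from this thread) is to never shut members down concurrently: stop one member at a time and wait until the cluster reports all partition backups as in sync before stopping the next one. A rough sketch using `PartitionService.isClusterSafe()`:

```java
import java.util.concurrent.TimeUnit;

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.PartitionService;

public final class SafeShutdown {

    // Shut down one member, but only after every partition has an
    // up-to-date backup somewhere in the cluster. This narrows the race
    // window described above; it does not redesign the shutdown itself.
    public static void shutdownWhenSafe(HazelcastInstance hz) throws InterruptedException {
        PartitionService partitionService = hz.getPartitionService();
        while (!partitionService.isClusterSafe()) {
            TimeUnit.MILLISECONDS.sleep(500); // wait for migrations/backup sync to finish
        }
        hz.shutdown();
    }
}
```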

@purplefox (Author) commented May 28, 2015

OK, thanks for the update.

@jerrinot (Contributor) commented May 28, 2015

No problem at all. I understand the pain caused by the current situation. I'll do my best to have a proper fix in 3.6.

Feel free to contact me if you have any questions, workarounds to discuss, etc.

@bwzhang2011 commented May 28, 2015

@jerrinot, one thing puzzles me: the 3.5 final has not been released yet, so why would the redesigned shutdown be postponed to 3.6? Could it be backported into 3.4.4 or 3.5.1?

@jerrinot (Contributor) commented May 28, 2015

@bwzhang2011: It's hard to answer that right now, as I don't know what the final fix will look like at this point. We tend to backport things only when we are pretty sure they don't have any side effects.

We'll update this ticket once we start designing the new shutdown process.

@apatrida commented Jun 8, 2015

Any update, @jerrinot?

@apatrida commented Jun 8, 2015

or @purplefox; this looks deadly for Vert.x 3 clustering.

@jerrinot (Contributor) commented Jun 8, 2015

Hi @jaysonminard, not really. It's something we'll likely address in Hazelcast 3.6. I believe I saw a workaround somewhere in the Vert.x codebase.

@purplefox (Author) commented Jun 8, 2015

Yes, it's all a bit of a mess. Luckily we have a workaround for this one which seems reasonable, but we also need #5284 (comment) fixed in a release in the next few days.

@purplefox (Author) commented Jun 8, 2015

Yes, we have a workaround for this, but we also need #5284 fixed (see above), and we don't have a workaround for that.

@purplefox (Author) commented Jun 8, 2015

If the worst comes to the worst, we will have to fork and make our own Hazelcast release that provides the fixes, which isn't ideal, but it's not the end of the world.

Clearly it would be preferable if Hazelcast implemented the fixes themselves!

@jerrinot (Contributor) commented Jun 8, 2015

Yeah, I understand the pain.
I'm not sure what the status of 3.4.3 is; most of our devs, including myself, are busy with the last changes for 3.5 these days.

I'll try to get a timeline for 3.4.3 and I'll update this ticket once I get some info.

@jerrinot (Contributor) commented Jun 8, 2015

Hi, I talked to my colleagues and we are aiming to release 3.4.3 by the end of this week. This assumes all our internal tests will pass.

@purplefox (Author) commented Jun 8, 2015

Ok, thanks @jerrinot please keep us posted.

@apatrida commented Jun 12, 2015

yay @jerrinot

@jerrinot (Contributor) commented Jun 15, 2015

Hi guys,
I apologize for not being very responsive. Our internal tests for 3.4.3 just passed. You can expect the release in no more than a few days.

@jerrinot (Contributor) commented Jun 16, 2015

Hazelcast 3.4.3 was just released.

@apatrida commented Jun 16, 2015

yay!


@jerrinot (Contributor) commented Jun 16, 2015

yay indeed!

@apatrida commented Jun 17, 2015

@purplefox can you please bump the Hazelcast version for vert.x-3 and do a milestone7?

@jensenmo commented Jun 17, 2015

@purplefox I would second that - but for 2.1.x? Thanks!

@apatrida commented Jun 17, 2015

(It's a good sign that people are fighting over wanting Vert.x releases, because Vert.x is so loved, haha.)

@purplefox (Author) commented Jun 18, 2015

@jerrinot @gurbuzali Unfortunately the 3.4.3 release is broken and not usable by Vert.x:

#5538

This is becoming extremely critical for us. We have a major release of Vert.x due next Monday, and right now we don't have a stable version of Hazelcast that we can use for it.

I hope the Hazelcast team will look at this very urgently.
