-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FISH-7640 Improved Hazelcast functionality as it relates to CP subsystem #6318
FISH-7640 Improved Hazelcast functionality as it relates to CP subsystem #6318
Conversation
jenkins test |
b7f1846
to
c5ee720
Compare
jenkins test |
@Pandrex247 ready to go |
Do you have a simple test scenario for this? I'm not particularly familiar with the CP subsystem (I can't even find what "CP" stands for!) I started a domain with 2 instances, set |
Did you set the system properties for the instances as well (not just the DAS)? You need both |
What you are doing is what I did :). Tested by restarting nodes |
D'oh! I'd misconfigured it - thought I'd set it at the domain level but I'd set it specifically on the DAS. I still seem to be able to get it to start spamming the logs, does this behaviour sound right?
|
@Pandrex247 Good job! you found a bug :) I had set that variable to zero or one (depending on my hazelcast version) and never tested without it. |
c27f594
to
f73003f
Compare
jenkins test |
@Pandrex247 Fixed and I tested on a clean install of Payara with no parameters. The warnings when restarting instances with CP enabled is normal, even repeating messages trying to re-establish CP continuity when less than 3 nodes are available. See https://docs.hazelcast.com/hazelcast/5.3/cp-subsystem/management for more info |
Argh... found another bug or two... stay tuned |
f73003f
to
a735848
Compare
@Pandrex247 Ok, fixed now, for real this time :) |
- handles distributed lock exceptions well when CP subsystem is enabled - performs auto-cp-subsystem-reset when CP gets into unusable state
a735848
to
1bd5c57
Compare
jenkins test |
The DAS doesn't seem to be able to restart with this option turned on? Essentially the same setup as above, but when you run the Does there need to be some sort of shutdown hook override? |
CP is finicky about data loss and will try to prevent it. |
On second thought let me do some more tests. This shouldn’t happen with 3 instances but it does |
update I did the same test without CP enabled (without the PR) and I get the same error. Ok, did some more tests. Even with the "new code" (that's in this PR) disabled by using My conclusion is that Payara works better with this PR currently than without, so it's worth merging even if the issue remains. The problem is that I can't reproduce this via my standalone reproducer app so I can't really report this bug to Hazelcast in any useful way :( |
Furthermore, this test fails with no apps deployed either :) Yay! Bug is Not in this PR :) |
Found it. Looks like a regression somewhere in my code somewhere. Not this PR functionality but I’ll work on a fix here. About 50/50 it’s a big in hazelcast but not quite sure yet. It’s in the discovery SPI and interaction with it |
Or, better yet, feel free to merge this one, the above error has nothing to do with this one, I can fix the above in a separate PR. |
Issue (with PR) filed with Hazelcast: hazelcast/hazelcast#25100 |
I can't fix the issue here. This PR can be merged with no regressions. |
jenkins test |
@Pandrex247 I have disabled auto partition groups by default, which was triggering the Hazelcast bug. It works now... for realz :) |
if (Boolean.parseBoolean(System.getProperty("hazelcast.auto-partition-group", "false"))) { | ||
PartitionGroupConfig partitionGroupConfig = config.getPartitionGroupConfig(); | ||
partitionGroupConfig.setEnabled(true); | ||
partitionGroupConfig.setGroupType(PartitionGroupConfig.MemberGroupType.SPI); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this changing the default behaviour even with CP subsytem turned off?
Now the partitions will default to "member" with each instance being in its own partition group instead of being determined by the DomainDiscoveryStrategy which utilises the instance group attribute and is optionally host-aware.
Without this it won't be host aware and the instance groups will be ignored for partition purposes - I'm not convinced this is an improvement.
Yes. The partitioning SPI bug is in Hazelcast and is present in the current Payra. Has nothing to do with CP.
Unless "host aware" is checked, everything will be in one partition group, which is the correct default behavior. Of course, once Hazelcast merges my fix, we can put this back to "turn on by default" in Payara |
Looks like my fix for hazelcast/hazelcast#25100 has been merged by hazelcast |
Looks like it is marked for Hazelcast 5.4.0, I'll try to keep an eye out. I still haven't got around to looking into what you've said, I'll try to make some time over the coming days. Unless you're now intending to change the behaviour back? I don't know how far away Hazelcast 5.4.0 is. |
that's the big questions isn't it? I am inclined to change it back now. What do you think? |
I think it makes sense to change it back - keeps this PR a bit cleaner with respects to not changing default behaviour |
ok, I'll change it back |
jenkins test |
At this stage, I think this is safe to merge now IMHO. Thank you! |
Description
Probably more of a bug fix than enhancement
See hazelcast/hazelcast#24897 for context
Documentation updates needed
hazelcast.cp-subsystem.auto-promote
(boolean, defaults to true) - Enable / Disable auto-promote functionalityhazelcast.auto-partition-group
(boolean, temporarily defaults to false) - Enable / Disable auto-partition-group functionalityhazelcast.cp-subsystem.cp-member-count
(int, defaults to zero) to enable CP subsystem without additional Hazelcast configurationTemporary workaround for Hazelcast issue
Auto partition groups are now disabled and system property has been added to control whether they are enabled or not.
Currently, disabled by default since issues are caused due to hazelcast/hazelcast#25100
Should be turned back on by default when Hazelcast is upgraded with the fix.