Hazelcast does not clean up lock after member restart #14215

Closed
heliheli opened this Issue Dec 4, 2018 · 1 comment

heliheli commented Dec 4, 2018

We have a Hazelcast cluster with 3 members (an arbiter and 2 nodes) and a quorum rule requiring two members that guards a lock. The nodes try to acquire the lock; the arbiter does nothing.
Sometimes both nodes can end up waiting for the lock indefinitely, for example:

  1. arbiter, node1 and node2 are started
  2. node1 gets the lock, node2 waits
  3. node1 shuts down
  4. node2 gets the lock
  5. node2 shuts down
  6. node2 starts again
  7. node2 waits for the lock indefinitely; lock owner uuid = {previous instance of node2}, which has already been shut down (gracefully or not, both were tested)

Tested on Hazelcast 3.10.1, 3.10.6 and 3.11, with jdk1.8.0_181.

Example:

package hazelcast;

import com.hazelcast.config.Config;
import com.hazelcast.config.LockConfig;
import com.hazelcast.config.QuorumConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ILock;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HazelcastLockTest {

	private static final Logger LOGGER = LoggerFactory.getLogger(HazelcastLockTest.class);

	private static final String LOCK = "MY_LOCK";

	@Test
	public void testLock() {
		final Config config = new Config();
		config.getGroupConfig().setName("test");
		config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
		config.getNetworkConfig().getJoin().getTcpIpConfig()
				.setEnabled(true)
				.addMember("127.0.0.1:13703")
				.addMember("127.0.0.1:13704")
				.addMember("127.0.0.1:13705");
		config.getQuorumConfigs()
				.put("quorumRuleWithTwoMembers", new QuorumConfig("quorumRuleWithTwoMembers", true, 2));
		config.addLockConfig(new LockConfig(LOCK).setQuorumName("quorumRuleWithTwoMembers"));

		// Start arbiter and two nodes: primary and secondary
		config.getNetworkConfig().setPort(13703);
		final HazelcastInstance arbiter = Hazelcast.newHazelcastInstance(config);

		config.getNetworkConfig().setPort(13704);
		final HazelcastInstance primary = Hazelcast.newHazelcastInstance(config);

		config.getNetworkConfig().setPort(13705);
		final HazelcastInstance secondary = Hazelcast.newHazelcastInstance(config);

		{
			final ILock primaryLock = primary.getLock(LOCK);
			LOGGER.info("PRIMARY READY TO GET LOCK: {} is {}, {}", primaryLock.getName(),
					primaryLock.isLocked(), primaryLock.getLockCount());
			primaryLock.lock();
			LOGGER.info("PRIMARY LOCKED");
		}

		// Shutdown primary
		primary.shutdown();

		{
			final ILock secondaryLock = secondary.getLock(LOCK);
			LOGGER.info("SECONDARY READY TO GET LOCK: {} is {}, {}", secondaryLock.getName(),
					secondaryLock.isLocked(), secondaryLock.getLockCount());
			secondaryLock.lock();
			LOGGER.info("SECONDARY LOCKED");
		}

		// Shutdown secondary
		secondary.shutdown();

		// Start secondary again
		config.getNetworkConfig().setPort(13705);
		final HazelcastInstance secondaryRestarted = Hazelcast.newHazelcastInstance(config);

		{
			final ILock secondaryLock = secondaryRestarted.getLock(LOCK);
			LOGGER.info("SECONDARY READY TO GET LOCK again: {} is {}, {}", secondaryLock.getName(),
					secondaryLock.isLocked(), secondaryLock.getLockCount());
			secondaryLock.lock(); // Blocks here indefinitely: the lock is still owned by the previous secondary instance
		}

		// Shutdown arbiter and secondary
		arbiter.shutdown();
		secondaryRestarted.shutdown();
	}

}

@mmedenjak mmedenjak added this to the 3.12 milestone Dec 4, 2018

mmedenjak commented Dec 21, 2018

Hi @heliheli !

Thank you for your report. I've managed to reproduce the issue. The lock cleanup operation actually fails because there is no quorum after the member count drops to 1. We'll fix this in time for 3.12 and possibly 3.11.2.
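
For readers following along, the failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `QuorumLockModel` and its methods are hypothetical names and are not Hazelcast APIs, and the model assumes that a restarted member joins with a new uuid (here `node2-b` replaces `node2-a`) and that cleanup is simply skipped when the quorum is not met.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: hypothetical class, not a Hazelcast API.
class QuorumLockModel {
    private final int quorumSize;
    private final Set<String> members = new HashSet<>();
    private String lockOwner; // uuid of the member holding the lock, or null

    QuorumLockModel(int quorumSize) {
        this.quorumSize = quorumSize;
    }

    void join(String uuid) {
        members.add(uuid);
    }

    boolean hasQuorum() {
        return members.size() >= quorumSize;
    }

    // Removes the member, then tries to release a lock it held.
    // The cleanup is subject to the quorum check, as in the reported behaviour.
    void leave(String uuid) {
        members.remove(uuid);
        if (uuid.equals(lockOwner) && hasQuorum()) {
            lockOwner = null;
        }
    }

    // Non-blocking acquire; returns false where a blocking lock() would wait.
    boolean tryLock(String uuid) {
        if (!hasQuorum() || !members.contains(uuid)) {
            return false;
        }
        if (lockOwner == null) {
            lockOwner = uuid;
            return true;
        }
        return lockOwner.equals(uuid);
    }

    String lockOwner() {
        return lockOwner;
    }

    public static void main(String[] args) {
        QuorumLockModel cluster = new QuorumLockModel(2);
        cluster.join("arbiter");
        cluster.join("node1");
        cluster.join("node2-a");

        System.out.println(cluster.tryLock("node1"));   // node1 acquires the lock
        cluster.leave("node1");                         // 2 members remain, cleanup runs
        System.out.println(cluster.tryLock("node2-a")); // node2 acquires the lock
        cluster.leave("node2-a");                       // 1 member remains, cleanup skipped
        cluster.join("node2-b");                        // restarted node2, new uuid
        System.out.println(cluster.tryLock("node2-b")); // cannot acquire
        System.out.println(cluster.lockOwner());        // stale owner is still recorded
    }
}
```

In this model the quorum is restored once `node2-b` joins, but the owner recorded for the lock is still the departed `node2-a`, so nothing ever releases it.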

@mmedenjak mmedenjak self-assigned this Dec 21, 2018

mmedenjak pushed a commit to mmedenjak/hazelcast that referenced this issue Dec 21, 2018

Matko Medenjak
Lock cleanup operations should not check for quorum
Lock cleanup operations when the lock owner is removed should not
check for quorum. Introduced a new interface for operations -
QuorumCheckAwareOperation which allows each operation to control whether
the quorum check is performed, regardless of its other interfaces which
might be inherited.

Fixes: hazelcast#14215

mmedenjak pushed a commit to mmedenjak/hazelcast that referenced this issue Jan 2, 2019

mmedenjak pushed a commit to mmedenjak/hazelcast that referenced this issue Jan 3, 2019

mmedenjak added a commit that referenced this issue Jan 3, 2019

Lock cleanup operations should not check for quorum (#14318)
Lock cleanup operations should not check for quorum

Lock cleanup operations when the lock owner is removed should not
check for quorum. Introduced a new interface for operations -
QuorumCheckAwareOperation which allows each operation to control whether
the quorum check is performed, regardless of its other interfaces which
might be inherited.

Fixes: #14215
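
The idea in the commit message, letting each operation decide whether the quorum check applies to it, might look roughly like the sketch below. It is not the Hazelcast source: `Operation`, `QuorumCheckAware`, `QuorumGate`, `LockState` and `ReleaseDeadOwnerOp` are hypothetical names chosen for illustration, and the method name `shouldCheckQuorum` is an assumption. Operations that do not implement the opt-out interface are checked by default.

```java
// Hypothetical sketch of a per-operation quorum opt-out; not the Hazelcast source.
interface Operation {
    void run();
}

// An operation implementing this interface decides itself whether the check applies.
interface QuorumCheckAware {
    boolean shouldCheckQuorum();
}

class LockState {
    String ownerUuid; // uuid of the current lock owner, or null
}

class QuorumGate {
    private final int quorumSize;
    private final int memberCount;

    QuorumGate(int quorumSize, int memberCount) {
        this.quorumSize = quorumSize;
        this.memberCount = memberCount;
    }

    // Runs the operation unless a quorum check applies and fails.
    boolean execute(Operation op) {
        boolean check = !(op instanceof QuorumCheckAware)
                || ((QuorumCheckAware) op).shouldCheckQuorum();
        if (check && memberCount < quorumSize) {
            return false; // rejected: quorum not present
        }
        op.run();
        return true;
    }
}

// Cleanup for a departed lock owner: opts out of the quorum check.
class ReleaseDeadOwnerOp implements Operation, QuorumCheckAware {
    private final LockState lock;
    private final String deadUuid;

    ReleaseDeadOwnerOp(LockState lock, String deadUuid) {
        this.lock = lock;
        this.deadUuid = deadUuid;
    }

    @Override
    public void run() {
        if (deadUuid.equals(lock.ownerUuid)) {
            lock.ownerUuid = null;
        }
    }

    @Override
    public boolean shouldCheckQuorum() {
        return false;
    }
}
```

With a quorum size of 2 and a single remaining member, `new QuorumGate(2, 1).execute(...)` rejects an ordinary operation but still runs a `ReleaseDeadOwnerOp`, so the stale owner is cleared even though the quorum is gone.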