PNCounter not replicating #13793

Closed
neilstevenson opened this issue Sep 18, 2018 · 17 comments

@neilstevenson (Contributor) commented Sep 18, 2018

On 3.10.5, PNCounter is not replicating using the default configuration.

Steps to recreate
(1) start a member with no PNCounter or CRDT configuration
(2) create and increment some PNCounters from a client (see the sketch below)
(3) wait a few minutes to be sure replication-period-millis has passed
(4) start a 2nd member
(5) wait a few minutes to be sure replication-period-millis has passed
(6) kill the 1st member

PNCounters are lost.
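For step (2), the client-side create-and-increment would be something like this (a minimal sketch, assuming a default client config and an arbitrary counter name):

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.crdt.pncounter.PNCounter;

public class IncrementFromClient {
    public static void main(String[] args) {
        // Sketch of step (2): create and increment a PNCounter from a client.
        HazelcastInstance client = HazelcastClient.newHazelcastClient();
        PNCounter counter = client.getPNCounter("myCounter"); // "myCounter" is an arbitrary name
        counter.incrementAndGet();
        client.shutdown();
    }
}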

@mmedenjak (Contributor) commented Sep 18, 2018

Can you post a reproducer with the config here?

@mmedenjak (Contributor) commented Sep 18, 2018

Also, can you post the logs?

@mmedenjak added this to the 3.11 milestone Sep 18, 2018
@neilstevenson (Contributor, Author) commented Sep 18, 2018

Talked through this with @mmedenjak via webex.
ManCenter shows "No of Replicas = 1" after step 3 above, and this doesn't change after step 5.

@mmedenjak (Contributor) commented Sep 18, 2018

The default config specifies a replica count of Integer.MAX_VALUE. I suspect something is setting the replica count to 1. I will need the code to determine this.

If the replica count is 1, this scenario is expected.
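If the replica count were being set explicitly somewhere, it would look roughly like this (a minimal sketch; the counter name and the setting itself are hypothetical, not taken from the reporter's code):

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;

public class ReplicaCountExample {
    public static void main(String[] args) {
        Config config = new Config();
        // Hypothetical: pins the counter to a single replica; without this,
        // the default replica count is Integer.MAX_VALUE.
        config.getPNCounterConfig("myCounter").setReplicaCount(1);
        Hazelcast.newHazelcastInstance(config);
    }
}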

@neilstevenson (Contributor, Author) commented Sep 18, 2018

Hazelcast XML has no PNCounter or CRDT sections.
Java code boils down to

public void run(String... args) throws Exception {
    Member member = this.hazelcastInstance.getCluster().getLocalMember();

    String counterName = LocalTime.now().toString()
            + " for "
            + member.getAddress().getHost() + ":" + member.getAddress().getPort();

    PNCounter pnCounter = this.hazelcastInstance.getPNCounter(counterName);
    pnCounter.incrementAndGet();

    System.out.println("===================");
    this.hazelcastInstance.getDistributedObjects()
            .stream()
            .filter(distributedObject -> distributedObject instanceof PNCounter)
            .forEach(distributedObject ->
                    System.out.println("Distributed Object : " + distributedObject.getName()));
    System.out.println("===================");
}
@mmedenjak (Contributor) commented Sep 18, 2018

@emre-aydin where is the "replica count" value in mancenter coming from?

@mmedenjak (Contributor) commented Sep 18, 2018

The hazelcast configuration is:

<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xmlns="http://www.hazelcast.com/schema/config"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.hazelcast.com/schema/config
           http://www.hazelcast.com/schema/config/hazelcast-config-3.10.xsd">

	<management-center enabled="true">http://localhost:8080/hazelcast-mancenter</management-center>

	<network>
		<join>
			<multicast enabled="false" />
			<tcp-ip enabled="true">
				<member-list>
					<member>127.0.0.1:5701</member>
				</member-list>
			</tcp-ip>
		</join>
	</network>

</hazelcast>

and the steps to reproduce are:

# Reproduce for [13793](https://github.com/hazelcast/hazelcast/issues/13793)

Instructions

* Run `java -jar target/issue13793-0.0.1-SNAPSHOT.jar`.
    * Lists one distributed object
* Wait one minute
* Run `java -jar target/issue13793-0.0.1-SNAPSHOT.jar` again to start 2nd member
    * Lists two distributed objects
* Wait one minute
* Kill off member 1
* Restart member 1
    * Lists one distributed object (should be two)

When we start one member, it starts its own cluster ([127.0.0.1]:5701). When we start the second member ([127.0.0.1]:5702), it finds the first one, since that address is in the hazelcast config, and they form a cluster.
Then we kill the first member and restart it. Member [127.0.0.1]:5701 will now start alone, since it does not know about member [127.0.0.1]:5702 (that address is not in the configuration).

Since the members don't form a cluster, no replication is done and the state from the second member is not replicated to the first one.
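For completeness, a join configuration that avoids the restart problem would list both well-known addresses, e.g. (a sketch assuming fixed ports; adding both addresses to the XML member-list works the same way):

import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.core.Hazelcast;

public class BothMembersListed {
    public static void main(String[] args) {
        Config config = new Config();
        JoinConfig join = config.getNetworkConfig().getJoin();
        join.getMulticastConfig().setEnabled(false);
        // Listing both addresses lets a restarted member rediscover the survivor.
        join.getTcpIpConfig()
            .setEnabled(true)
            .addMember("127.0.0.1:5701")
            .addMember("127.0.0.1:5702");
        Hazelcast.newHazelcastInstance(config);
    }
}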

@emre-aydin (Member) commented Sep 18, 2018

@jbee should know better.

@neilstevenson (Contributor, Author) commented Sep 18, 2018

The discovery list in the reproducer just adds to the noise, so forget that last step.

Start one server.
Wait.
Start a second server.
Kill the first server.

With this reproducer, for me on 3.10.5 the PNCounter from the first server is lost.

@jbee (Contributor) commented Sep 18, 2018

The number of replicas is the number of members that send a LocalPNCounterStatsImpl for the particular counter.

@mmedenjak (Contributor) commented Sep 19, 2018

@jbee thanks!

@neilstevenson I suspect your members are not forming a cluster. I've modified the code to periodically print out the PNCounter proxies. When I run the steps you've provided, I get these logs:

First server

2018-09-19 08:55:27.703  INFO 17231 --- [           main] com.hazelcast.github.Issue13793          : No active profile set, falling back to default profiles: default
2018-09-19 08:55:27.755  INFO 17231 --- [           main] s.c.a.AnnotationConfigApplicationContext : Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@551bdc27: startup date [Wed Sep 19 08:55:27 CEST 2018]; root of context hierarchy
2018-09-19 08:55:28.202  INFO 17231 --- [           main] com.hazelcast.config.XmlConfigLocator    : Loading 'hazelcast.xml' from classpath.
2018-09-19 08:55:28.264  WARN 17231 --- [           main] c.h.config.AbstractXmlConfigHelper       : Name of the hazelcast schema location is incorrect, using default
2018-09-19 08:55:28.472  INFO 17231 --- [           main] com.hazelcast.instance.AddressPicker     : [LOCAL] [dev] [3.11-BETA-1] Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [127.0.0.1]
2018-09-19 08:55:28.492  INFO 17231 --- [           main] com.hazelcast.instance.AddressPicker     : [LOCAL] [dev] [3.11-BETA-1] Picked [127.0.0.1]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
2018-09-19 08:55:28.504  INFO 17231 --- [           main] com.hazelcast.system                     : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Hazelcast 3.11-BETA-1 (20180828 - 8bdc7f9) starting at [127.0.0.1]:5701
2018-09-19 08:55:28.506  INFO 17231 --- [           main] com.hazelcast.system                     : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Copyright (c) 2008-2018, Hazelcast, Inc. All Rights Reserved.
2018-09-19 08:55:28.506  INFO 17231 --- [           main] com.hazelcast.system                     : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Configured Hazelcast Serialization version: 1
2018-09-19 08:55:28.507  INFO 17231 --- [           main] com.hazelcast.instance.Node              : [127.0.0.1]:5701 [dev] [3.11-BETA-1] A non-empty group password is configured for the Hazelcast member. Starting with Hazelcast version 3.8.2, members with the same group name, but with different group passwords (that do not use authentication) form a cluster. The group password configuration will be removed completely in a future release.
2018-09-19 08:55:28.673  INFO 17231 --- [           main] c.h.s.i.o.impl.BackpressureRegulator     : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Backpressure is disabled
2018-09-19 08:55:28.683  INFO 17231 --- [           main] h.s.i.o.i.InboundResponseHandlerSupplier : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Running with 2 response threads
2018-09-19 08:55:29.041  INFO 17231 --- [           main] com.hazelcast.instance.Node              : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Creating TcpIpJoiner
2018-09-19 08:55:29.178  INFO 17231 --- [           main] c.h.s.i.o.impl.OperationExecutorImpl     : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Starting 8 partition threads and 5 generic threads (1 dedicated for priority tasks)
2018-09-19 08:55:29.179  INFO 17231 --- [           main] c.h.internal.diagnostics.Diagnostics     : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
2018-09-19 08:55:29.184  INFO 17231 --- [           main] com.hazelcast.core.LifecycleService      : [127.0.0.1]:5701 [dev] [3.11-BETA-1] [127.0.0.1]:5701 is STARTING
2018-09-19 08:55:29.198  INFO 17231 --- [           main] com.hazelcast.system                     : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Cluster version set to 3.11
2018-09-19 08:55:29.199  INFO 17231 --- [           main] c.h.internal.cluster.ClusterService      : [127.0.0.1]:5701 [dev] [3.11-BETA-1] 

Members {size:1, ver:1} [
	Member [127.0.0.1]:5701 - e7cc9f5d-9a09-4a4e-a263-4df22ae187c8 this
]

2018-09-19 08:55:29.217  INFO 17231 --- [           main] c.h.i.m.ManagementCenterService          : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Hazelcast will connect to Hazelcast Management Center on address: 
http://localhost:8080/hazelcast-mancenter
2018-09-19 08:55:29.242  INFO 17231 --- [           main] com.hazelcast.core.LifecycleService      : [127.0.0.1]:5701 [dev] [3.11-BETA-1] [127.0.0.1]:5701 is STARTED
2018-09-19 08:55:29.251  INFO 17231 --- [.MC.Task.Poller] c.h.i.m.ManagementCenterService          : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Failed to pull tasks from management center
2018-09-19 08:55:29.252  INFO 17231 --- [MC.State.Sender] c.h.i.m.ManagementCenterService          : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Failed to connect to: http://localhost:8080/hazelcast-mancenter/collector.do
2018-09-19 08:55:29.378  INFO 17231 --- [           main] o.s.j.e.a.AnnotationMBeanExporter        : Registering beans for JMX exposure on startup
2018-09-19 08:55:29.389  INFO 17231 --- [           main] com.hazelcast.github.Issue13793          : Started Issue13793 in 1.962 seconds (JVM running for 2.426)
===================
===================
Distributed Object : 08:55:29.395 for 127.0.0.1:5701
2018-09-19 08:55:38.321  INFO 17231 --- [thread-Acceptor] com.hazelcast.nio.tcp.TcpIpAcceptor      : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Accepting socket connection from /127.0.0.1:56900
2018-09-19 08:55:38.322  INFO 17231 --- [thread-Acceptor] c.h.nio.tcp.TcpIpConnectionManager       : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Established socket connection between /127.0.0.1:5701 and /127.0.0.1:56900
===================
Distributed Object : 08:55:29.395 for 127.0.0.1:5701
2018-09-19 08:55:44.332  INFO 17231 --- [ration.thread-0] c.h.internal.cluster.ClusterService      : [127.0.0.1]:5701 [dev] [3.11-BETA-1] 

Members {size:2, ver:2} [
	Member [127.0.0.1]:5701 - e7cc9f5d-9a09-4a4e-a263-4df22ae187c8 this
	Member [127.0.0.1]:5702 - 6db0b99c-ed78-44d3-b55d-0acff3a53a47
]

===================
Distributed Object : 08:55:45.516 for 127.0.0.1:5702
Distributed Object : 08:55:29.395 for 127.0.0.1:5701
2018-09-19 08:55:53.022  INFO 17231 --- [.ShutdownThread] com.hazelcast.instance.Node              : [127.0.0.1]:5701 [dev] [3.11-BETA-1] Running shutdown hook... Current state: ACTIVE
2018-09-19 08:55:53.022  INFO 17231 --- [       Thread-2] s.c.a.AnnotationConfigApplicationContext : Closing org.springframework.context.annotation.AnnotationConfigApplicationContext@551bdc27: startup date [Wed Sep 19 08:55:27 CEST 2018]; root of context hierarchy
2018-09-19 08:55:53.023  INFO 17231 --- [       Thread-2] o.s.j.e.a.AnnotationMBeanExporter        : Unregistering JMX-exposed beans on shutdown

Second server

2018-09-19 08:55:36.747  INFO 17233 --- [           main] com.hazelcast.github.Issue13793          : No active profile set, falling back to default profiles: default
2018-09-19 08:55:36.798  INFO 17233 --- [           main] s.c.a.AnnotationConfigApplicationContext : Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@226a82c4: startup date [Wed Sep 19 08:55:36 CEST 2018]; root of context hierarchy
2018-09-19 08:55:37.252  INFO 17233 --- [           main] com.hazelcast.config.XmlConfigLocator    : Loading 'hazelcast.xml' from classpath.
2018-09-19 08:55:37.322  WARN 17233 --- [           main] c.h.config.AbstractXmlConfigHelper       : Name of the hazelcast schema location is incorrect, using default
2018-09-19 08:55:37.558  INFO 17233 --- [           main] com.hazelcast.instance.AddressPicker     : [LOCAL] [dev] [3.11-BETA-1] Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [127.0.0.1]
2018-09-19 08:55:37.580  INFO 17233 --- [           main] com.hazelcast.instance.AddressPicker     : [LOCAL] [dev] [3.11-BETA-1] Picked [127.0.0.1]:5702, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5702], bind any local is true
2018-09-19 08:55:37.594  INFO 17233 --- [           main] com.hazelcast.system                     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Hazelcast 3.11-BETA-1 (20180828 - 8bdc7f9) starting at [127.0.0.1]:5702
2018-09-19 08:55:37.594  INFO 17233 --- [           main] com.hazelcast.system                     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Copyright (c) 2008-2018, Hazelcast, Inc. All Rights Reserved.
2018-09-19 08:55:37.594  INFO 17233 --- [           main] com.hazelcast.system                     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Configured Hazelcast Serialization version: 1
2018-09-19 08:55:37.594  INFO 17233 --- [           main] com.hazelcast.instance.Node              : [127.0.0.1]:5702 [dev] [3.11-BETA-1] A non-empty group password is configured for the Hazelcast member. Starting with Hazelcast version 3.8.2, members with the same group name, but with different group passwords (that do not use authentication) form a cluster. The group password configuration will be removed completely in a future release.
2018-09-19 08:55:37.764  INFO 17233 --- [           main] c.h.s.i.o.impl.BackpressureRegulator     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Backpressure is disabled
2018-09-19 08:55:37.777  INFO 17233 --- [           main] h.s.i.o.i.InboundResponseHandlerSupplier : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Running with 2 response threads
2018-09-19 08:55:38.144  INFO 17233 --- [           main] com.hazelcast.instance.Node              : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Creating TcpIpJoiner
2018-09-19 08:55:38.277  INFO 17233 --- [           main] c.h.s.i.o.impl.OperationExecutorImpl     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Starting 8 partition threads and 5 generic threads (1 dedicated for priority tasks)
2018-09-19 08:55:38.278  INFO 17233 --- [           main] c.h.internal.diagnostics.Diagnostics     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
2018-09-19 08:55:38.283  INFO 17233 --- [           main] com.hazelcast.core.LifecycleService      : [127.0.0.1]:5702 [dev] [3.11-BETA-1] [127.0.0.1]:5702 is STARTING
2018-09-19 08:55:38.310  INFO 17233 --- [cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Connecting to /127.0.0.1:5701, timeout: 0, bind-any: true
2018-09-19 08:55:38.314  INFO 17233 --- [cached.thread-3] c.h.nio.tcp.TcpIpConnectionManager       : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Established socket connection between /127.0.0.1:56900 and /127.0.0.1:5701
2018-09-19 08:55:44.337  INFO 17233 --- [ration.thread-0] com.hazelcast.system                     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Cluster version set to 3.11
2018-09-19 08:55:44.339  INFO 17233 --- [ration.thread-0] c.h.internal.cluster.ClusterService      : [127.0.0.1]:5702 [dev] [3.11-BETA-1] 

Members {size:2, ver:2} [
	Member [127.0.0.1]:5701 - e7cc9f5d-9a09-4a4e-a263-4df22ae187c8
	Member [127.0.0.1]:5702 - 6db0b99c-ed78-44d3-b55d-0acff3a53a47 this
]

2018-09-19 08:55:45.349  INFO 17233 --- [           main] c.h.i.m.ManagementCenterService          : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Hazelcast will connect to Hazelcast Management Center on address: 
http://localhost:8080/hazelcast-mancenter
2018-09-19 08:55:45.355  INFO 17233 --- [           main] com.hazelcast.core.LifecycleService      : [127.0.0.1]:5702 [dev] [3.11-BETA-1] [127.0.0.1]:5702 is STARTED
2018-09-19 08:55:45.359  INFO 17233 --- [.MC.Task.Poller] c.h.i.m.ManagementCenterService          : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Failed to pull tasks from management center
2018-09-19 08:55:45.359  INFO 17233 --- [MC.State.Sender] c.h.i.m.ManagementCenterService          : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Failed to connect to: http://localhost:8080/hazelcast-mancenter/collector.do
2018-09-19 08:55:45.498  INFO 17233 --- [           main] o.s.j.e.a.AnnotationMBeanExporter        : Registering beans for JMX exposure on startup
2018-09-19 08:55:45.510  INFO 17233 --- [           main] com.hazelcast.github.Issue13793          : Started Issue13793 in 9.037 seconds (JVM running for 9.489)
===================
===================
Distributed Object : 08:55:45.516 for 127.0.0.1:5702
Distributed Object : 08:55:29.395 for 127.0.0.1:5701
2018-09-19 08:55:53.173  INFO 17233 --- [.IO.thread-in-0] com.hazelcast.nio.tcp.TcpIpConnection    : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Connection[id=1, /127.0.0.1:56900->/127.0.0.1:5701, endpoint=[127.0.0.1]:5701, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2018-09-19 08:55:53.175  INFO 17233 --- [cached.thread-2] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Connecting to /127.0.0.1:5701, timeout: 0, bind-any: true
2018-09-19 08:55:53.175  INFO 17233 --- [cached.thread-2] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Could not connect to: /127.0.0.1:5701. Reason: SocketException[Connection refused to address /127.0.0.1:5701]
2018-09-19 08:55:53.272  INFO 17233 --- [cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Connecting to /127.0.0.1:5701, timeout: 0, bind-any: true
2018-09-19 08:55:53.273  INFO 17233 --- [cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Could not connect to: /127.0.0.1:5701. Reason: SocketException[Connection refused to address /127.0.0.1:5701]
2018-09-19 08:55:53.276  INFO 17233 --- [cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Connecting to /127.0.0.1:5701, timeout: 0, bind-any: true
2018-09-19 08:55:53.277  INFO 17233 --- [cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Could not connect to: /127.0.0.1:5701. Reason: SocketException[Connection refused to address /127.0.0.1:5701]
2018-09-19 08:55:53.376  INFO 17233 --- [cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Connecting to /127.0.0.1:5701, timeout: 0, bind-any: true
2018-09-19 08:55:53.376  INFO 17233 --- [cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Could not connect to: /127.0.0.1:5701. Reason: SocketException[Connection refused to address /127.0.0.1:5701]
2018-09-19 08:55:53.378  INFO 17233 --- [cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Connecting to /127.0.0.1:5701, timeout: 0, bind-any: true
2018-09-19 08:55:53.379  INFO 17233 --- [cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Could not connect to: /127.0.0.1:5701. Reason: SocketException[Connection refused to address /127.0.0.1:5701]
2018-09-19 08:55:53.479  INFO 17233 --- [cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Connecting to /127.0.0.1:5701, timeout: 0, bind-any: true
2018-09-19 08:55:53.480  INFO 17233 --- [cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Could not connect to: /127.0.0.1:5701. Reason: SocketException[Connection refused to address /127.0.0.1:5701]
2018-09-19 08:55:53.480  WARN 17233 --- [cached.thread-3] c.h.nio.tcp.TcpIpConnectionErrorHandler  : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Removing connection to endpoint [127.0.0.1]:5701 Cause => java.net.SocketException {Connection refused to address /127.0.0.1:5701}, Error-Count: 5
2018-09-19 08:55:53.481  WARN 17233 --- [cached.thread-3] c.h.i.cluster.impl.MembershipManager     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Member [127.0.0.1]:5701 - e7cc9f5d-9a09-4a4e-a263-4df22ae187c8 is suspected to be dead for reason: No connection
2018-09-19 08:55:53.482  INFO 17233 --- [cached.thread-3] c.h.i.cluster.impl.MembershipManager     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Starting mastership claim process...
2018-09-19 08:55:53.482  INFO 17233 --- [cached.thread-2] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Connecting to /127.0.0.1:5701, timeout: 0, bind-any: true
2018-09-19 08:55:53.482  INFO 17233 --- [cached.thread-3] c.h.i.cluster.impl.MembershipManager     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Local MembersView{version=2, members=[MemberInfo{address=[127.0.0.1]:5701, uuid=e7cc9f5d-9a09-4a4e-a263-4df22ae187c8, liteMember=false, memberListJoinVersion=1}, MemberInfo{address=[127.0.0.1]:5702, uuid=6db0b99c-ed78-44d3-b55d-0acff3a53a47, liteMember=false, memberListJoinVersion=2}]} with suspected members: [[127.0.0.1]:5701] and initial addresses to ask: []
2018-09-19 08:55:53.483  INFO 17233 --- [cached.thread-2] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Could not connect to: /127.0.0.1:5701. Reason: SocketException[Connection refused to address /127.0.0.1:5701]
2018-09-19 08:55:53.486  INFO 17233 --- [cached.thread-2] c.h.internal.cluster.ClusterService      : [127.0.0.1]:5702 [dev] [3.11-BETA-1] 

Members {size:1, ver:3} [
	Member [127.0.0.1]:5702 - 6db0b99c-ed78-44d3-b55d-0acff3a53a47 this
]

2018-09-19 08:55:53.487  INFO 17233 --- [cached.thread-2] c.h.i.cluster.impl.MembershipManager     : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Mastership is claimed with: MembersView{version=3, members=[MemberInfo{address=[127.0.0.1]:5702, uuid=6db0b99c-ed78-44d3-b55d-0acff3a53a47, liteMember=false, memberListJoinVersion=2}]}
2018-09-19 08:55:53.489  INFO 17233 --- [1_dev.migration] c.h.i.p.InternalPartitionService         : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Fetching most recent partition table! my version: 0
2018-09-19 08:55:53.490  INFO 17233 --- [1_dev.migration] c.h.i.p.InternalPartitionService         : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Most recent partition table version: 0
2018-09-19 08:55:53.490  INFO 17233 --- [cached.thread-3] c.h.t.TransactionManagerService          : [127.0.0.1]:5702 [dev] [3.11-BETA-1] Committing/rolling-back live transactions of [127.0.0.1]:5701, UUID: e7cc9f5d-9a09-4a4e-a263-4df22ae187c8
===================
Distributed Object : 08:55:45.516 for 127.0.0.1:5702
Distributed Object : 08:55:29.395 for 127.0.0.1:5701
===================
Distributed Object : 08:55:45.516 for 127.0.0.1:5702
Distributed Object : 08:55:29.395 for 127.0.0.1:5701
===================
Distributed Object : 08:55:45.516 for 127.0.0.1:5702
Distributed Object : 08:55:29.395 for 127.0.0.1:5701

We can see that even after the first server is killed, the second server still contains both PNCounters.
I will need the member logs from the time you reproduced the issue; they can confirm this.

@neilstevenson (Contributor, Author) commented Sep 19, 2018

This may be a reporting issue.

Try this,
(1) Start one server
(2) Wait
(3) Check PNCounter on MC, shows the counter from server 1
(4) Start a second server
(5) Wait
(6) Check PNCounter on MC, shows the counters from server 1 and server 2
... all good so far, although both report on MC with "1 replica"

(7) Kill server 1
(8) on MC, the PNCounter from server 1 disappears

Tried on HZ 3.10.5 and MC 3.10.3, the latest versions as of today.

@mmedenjak (Contributor) commented Sep 19, 2018

Can you send me the member logs?

@neilstevenson (Contributor, Author) commented Sep 19, 2018

Server one

$ java -jar target/issue13793-0.0.1-SNAPSHOT.jar 

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v2.0.5.RELEASE)

2018-09-19 09:21:24.090  INFO 40712 --- [           main] com.hazelcast.github.Issue13793          : Starting Issue13793 v0.0.1-SNAPSHOT on yourhostname.home with PID 40712 (/Users/neil/EclipseProjects/tickets2/github13793/target/issue13793-0.0.1-SNAPSHOT.jar started by neil in /Users/neil/EclipseProjects/tickets2/github13793)
2018-09-19 09:21:24.094  INFO 40712 --- [           main] com.hazelcast.github.Issue13793          : No active profile set, falling back to default profiles: default
2018-09-19 09:21:24.157  INFO 40712 --- [           main] s.c.a.AnnotationConfigApplicationContext : Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@7cef4e59: startup date [Wed Sep 19 09:21:24 BST 2018]; root of context hierarchy
2018-09-19 09:21:24.807  INFO 40712 --- [           main] com.hazelcast.config.XmlConfigLocator    : Loading 'hazelcast.xml' from classpath.
2018-09-19 09:21:25.132  INFO 40712 --- [           main] com.hazelcast.instance.AddressPicker     : [LOCAL] [dev] [3.10.5] Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [127.0.0.1]
2018-09-19 09:21:25.152  INFO 40712 --- [           main] com.hazelcast.instance.AddressPicker     : [LOCAL] [dev] [3.10.5] Picked [127.0.0.1]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
2018-09-19 09:21:25.171  INFO 40712 --- [           main] com.hazelcast.system                     : [127.0.0.1]:5701 [dev] [3.10.5] Hazelcast 3.10.5 (20180913 - 6ffa2ee) starting at [127.0.0.1]:5701
2018-09-19 09:21:25.171  INFO 40712 --- [           main] com.hazelcast.system                     : [127.0.0.1]:5701 [dev] [3.10.5] Copyright (c) 2008-2018, Hazelcast, Inc. All Rights Reserved.
2018-09-19 09:21:25.171  INFO 40712 --- [           main] com.hazelcast.system                     : [127.0.0.1]:5701 [dev] [3.10.5] Configured Hazelcast Serialization version: 1
2018-09-19 09:21:25.171  INFO 40712 --- [           main] com.hazelcast.instance.Node              : [127.0.0.1]:5701 [dev] [3.10.5] A non-empty group password is configured for the Hazelcast member. Starting with Hazelcast version 3.8.2, members with the same group name, but with different group passwords (that do not use authentication) form a cluster. The group password configuration will be removed completely in a future release.
2018-09-19 09:21:25.429  INFO 40712 --- [           main] c.h.s.i.o.impl.BackpressureRegulator     : [127.0.0.1]:5701 [dev] [3.10.5] Backpressure is disabled
2018-09-19 09:21:25.445  INFO 40712 --- [           main] h.s.i.o.i.InboundResponseHandlerSupplier : [127.0.0.1]:5701 [dev] [3.10.5] Running with 2 response threads
2018-09-19 09:21:25.973  INFO 40712 --- [           main] com.hazelcast.instance.Node              : [127.0.0.1]:5701 [dev] [3.10.5] Creating TcpIpJoiner
2018-09-19 09:21:26.238  INFO 40712 --- [           main] c.h.s.i.o.impl.OperationExecutorImpl     : [127.0.0.1]:5701 [dev] [3.10.5] Starting 4 partition threads and 3 generic threads (1 dedicated for priority tasks)
2018-09-19 09:21:26.239  INFO 40712 --- [           main] c.h.internal.diagnostics.Diagnostics     : [127.0.0.1]:5701 [dev] [3.10.5] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
2018-09-19 09:21:26.244  INFO 40712 --- [           main] com.hazelcast.core.LifecycleService      : [127.0.0.1]:5701 [dev] [3.10.5] [127.0.0.1]:5701 is STARTING
2018-09-19 09:21:26.265  INFO 40712 --- [cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5701 [dev] [3.10.5] Connecting to /127.0.0.1:5702, timeout: 0, bind-any: true
2018-09-19 09:21:26.272  INFO 40712 --- [cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5701 [dev] [3.10.5] Could not connect to: /127.0.0.1:5702. Reason: SocketException[Connection refused to address /127.0.0.1:5702]
2018-09-19 09:21:26.272  INFO 40712 --- [cached.thread-3] com.hazelcast.cluster.impl.TcpIpJoiner   : [127.0.0.1]:5701 [dev] [3.10.5] [127.0.0.1]:5702 is added to the blacklist.
2018-09-19 09:21:27.268  INFO 40712 --- [           main] com.hazelcast.system                     : [127.0.0.1]:5701 [dev] [3.10.5] Cluster version set to 3.10
2018-09-19 09:21:27.270  INFO 40712 --- [           main] c.h.internal.cluster.ClusterService      : [127.0.0.1]:5701 [dev] [3.10.5] 

Members {size:1, ver:1} [
	Member [127.0.0.1]:5701 - d30e8c3d-3266-46f7-ab41-e8cb439ab8d5 this
]

2018-09-19 09:21:27.296  INFO 40712 --- [           main] c.h.i.m.ManagementCenterService          : [127.0.0.1]:5701 [dev] [3.10.5] Hazelcast will connect to Hazelcast Management Center on address: 
http://localhost:8080/hazelcast-mancenter
2018-09-19 09:21:27.322  INFO 40712 --- [           main] com.hazelcast.core.LifecycleService      : [127.0.0.1]:5701 [dev] [3.10.5] [127.0.0.1]:5701 is STARTED
2018-09-19 09:21:27.341  INFO 40712 --- [MC.State.Sender] c.h.i.m.ManagementCenterService          : [127.0.0.1]:5701 [dev] [3.10.5] Failed to connect to:http://localhost:8080/hazelcast-mancenter/collector.do
2018-09-19 09:21:27.342  INFO 40712 --- [.MC.Task.Poller] c.h.i.m.ManagementCenterService          : [127.0.0.1]:5701 [dev] [3.10.5] Failed to pull tasks from management center
2018-09-19 09:21:27.513  INFO 40712 --- [           main] o.s.j.e.a.AnnotationMBeanExporter        : Registering beans for JMX exposure on startup
2018-09-19 09:21:27.530  INFO 40712 --- [           main] com.hazelcast.github.Issue13793          : Started Issue13793 in 4.013 seconds (JVM running for 4.547)
===================
Distributed Object : 09:21:27.531 for 127.0.0.1:5701
===================
2018-09-19 09:21:47.307  INFO 40712 --- [v.HealthMonitor] c.h.internal.diagnostics.HealthMonitor   : [127.0.0.1]:5701 [dev] [3.10.5] processors=4, physical.memory.total=16.0G, physical.memory.free=5.4G, swap.space.total=1024.0M, swap.space.free=984.3M, heap.memory.used=22.1M, heap.memory.free=278.9M, heap.memory.total=301.0M, heap.memory.max=3.6G, heap.memory.used/total=7.33%, heap.memory.used/max=0.61%, minor.gc.count=7, minor.gc.time=36ms, major.gc.count=2, major.gc.time=71ms, load.process=45.78%, load.system=NaN%, load.systemAverage=2.48, thread.count=40, thread.peakCount=41, cluster.timeDiff=0, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operations.size=0, executor.q.priorityOperation.size=0, operations.completed.count=2, executor.q.mapLoad.size=0, executor.q.mapLoadAllKeys.size=0, executor.q.cluster.size=0, executor.q.response.size=0, operations.running.count=0, operations.pending.invocations.percentage=0.00%, operations.pending.invocations.count=0, proxy.count=0, clientEndpoint.count=0, connection.active.count=0, client.connection.count=0, connection.count=0
2018-09-19 09:21:51.370  INFO 40712 --- [MC.State.Sender] c.h.i.m.ManagementCenterService          : [127.0.0.1]:5701 [dev] [3.10.5] Connection to management center restored.
2018-09-19 09:22:16.794  INFO 40712 --- [.MC.Task.Poller] c.h.i.m.ManagementCenterService          : [127.0.0.1]:5701 [dev] [3.10.5] Management center task polling successful.
2018-09-19 09:24:14.876  INFO 40712 --- [thread-Acceptor] com.hazelcast.nio.tcp.TcpIpAcceptor      : [127.0.0.1]:5701 [dev] [3.10.5] Accepting socket connection from /127.0.0.1:59386
2018-09-19 09:24:14.877  INFO 40712 --- [thread-Acceptor] c.h.nio.tcp.TcpIpConnectionManager       : [127.0.0.1]:5701 [dev] [3.10.5] Established socket connection between /127.0.0.1:5701 and /127.0.0.1:59386
2018-09-19 09:24:20.905  INFO 40712 --- [ration.thread-0] c.h.internal.cluster.ClusterService      : [127.0.0.1]:5701 [dev] [3.10.5] 

Members {size:2, ver:2} [
	Member [127.0.0.1]:5701 - d30e8c3d-3266-46f7-ab41-e8cb439ab8d5 this
	Member [127.0.0.1]:5702 - 77bd8d5c-85ab-4b9d-9f8d-3aa85859bf8e
]

Killed: 9
$ 

Server two

$ java -jar target/issue13793-0.0.1-SNAPSHOT.jar

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v2.0.5.RELEASE)

2018-09-19 09:24:12.855  INFO 40891 --- [           main] com.hazelcast.github.Issue13793          : Starting Issue13793 v0.0.1-SNAPSHOT on yourhostname.home with PID 40891 (/Users/neil/EclipseProjects/tickets2/github13793/target/issue13793-0.0.1-SNAPSHOT.jar started by neil in /Users/neil/EclipseProjects/tickets2/github13793)
2018-09-19 09:24:12.858  INFO 40891 --- [           main] com.hazelcast.github.Issue13793          : No active profile set, falling back to default profiles: default
2018-09-19 09:24:12.921  INFO 40891 --- [           main] s.c.a.AnnotationConfigApplicationContext : Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@7cef4e59: startup date [Wed Sep 19 09:24:12 BST 2018]; root of context hierarchy
2018-09-19 09:24:13.425  INFO 40891 --- [           main] com.hazelcast.config.XmlConfigLocator    : Loading 'hazelcast.xml' from classpath.
2018-09-19 09:24:13.808  INFO 40891 --- [           main] com.hazelcast.instance.AddressPicker     : [LOCAL] [dev] [3.10.5] Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [127.0.0.1]
2018-09-19 09:24:13.831  INFO 40891 --- [           main] com.hazelcast.instance.AddressPicker     : [LOCAL] [dev] [3.10.5] Picked [127.0.0.1]:5702, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5702], bind any local is true
2018-09-19 09:24:13.850  INFO 40891 --- [           main] com.hazelcast.system                     : [127.0.0.1]:5702 [dev] [3.10.5] Hazelcast 3.10.5 (20180913 - 6ffa2ee) starting at [127.0.0.1]:5702
2018-09-19 09:24:13.850  INFO 40891 --- [           main] com.hazelcast.system                     : [127.0.0.1]:5702 [dev] [3.10.5] Copyright (c) 2008-2018, Hazelcast, Inc. All Rights Reserved.
2018-09-19 09:24:13.850  INFO 40891 --- [           main] com.hazelcast.system                     : [127.0.0.1]:5702 [dev] [3.10.5] Configured Hazelcast Serialization version: 1
2018-09-19 09:24:13.850  INFO 40891 --- [           main] com.hazelcast.instance.Node              : [127.0.0.1]:5702 [dev] [3.10.5] A non-empty group password is configured for the Hazelcast member. Starting with Hazelcast version 3.8.2, members with the same group name, but with different group passwords (that do not use authentication) form a cluster. The group password configuration will be removed completely in a future release.
2018-09-19 09:24:14.124  INFO 40891 --- [           main] c.h.s.i.o.impl.BackpressureRegulator     : [127.0.0.1]:5702 [dev] [3.10.5] Backpressure is disabled
2018-09-19 09:24:14.141  INFO 40891 --- [           main] h.s.i.o.i.InboundResponseHandlerSupplier : [127.0.0.1]:5702 [dev] [3.10.5] Running with 2 response threads
2018-09-19 09:24:14.701  INFO 40891 --- [           main] com.hazelcast.instance.Node              : [127.0.0.1]:5702 [dev] [3.10.5] Creating TcpIpJoiner
2018-09-19 09:24:14.845  INFO 40891 --- [           main] c.h.s.i.o.impl.OperationExecutorImpl     : [127.0.0.1]:5702 [dev] [3.10.5] Starting 4 partition threads and 3 generic threads (1 dedicated for priority tasks)
2018-09-19 09:24:14.847  INFO 40891 --- [           main] c.h.internal.diagnostics.Diagnostics     : [127.0.0.1]:5702 [dev] [3.10.5] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
2018-09-19 09:24:14.852  INFO 40891 --- [           main] com.hazelcast.core.LifecycleService      : [127.0.0.1]:5702 [dev] [3.10.5] [127.0.0.1]:5702 is STARTING
2018-09-19 09:24:14.872  INFO 40891 --- [cached.thread-3] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.10.5] Connecting to /127.0.0.1:5701, timeout: 0, bind-any: true
2018-09-19 09:24:14.876  INFO 40891 --- [cached.thread-3] c.h.nio.tcp.TcpIpConnectionManager       : [127.0.0.1]:5702 [dev] [3.10.5] Established socket connection between /127.0.0.1:59386 and /127.0.0.1:5701
2018-09-19 09:24:20.915  INFO 40891 --- [ration.thread-0] com.hazelcast.system                     : [127.0.0.1]:5702 [dev] [3.10.5] Cluster version set to 3.10
2018-09-19 09:24:20.917  INFO 40891 --- [ration.thread-0] c.h.internal.cluster.ClusterService      : [127.0.0.1]:5702 [dev] [3.10.5] 

Members {size:2, ver:2} [
	Member [127.0.0.1]:5701 - d30e8c3d-3266-46f7-ab41-e8cb439ab8d5
	Member [127.0.0.1]:5702 - 77bd8d5c-85ab-4b9d-9f8d-3aa85859bf8e this
]

2018-09-19 09:24:21.955  INFO 40891 --- [           main] c.h.i.m.ManagementCenterService          : [127.0.0.1]:5702 [dev] [3.10.5] Hazelcast will connect to Hazelcast Management Center on address: 
http://localhost:8080/hazelcast-mancenter
2018-09-19 09:24:21.968  INFO 40891 --- [           main] com.hazelcast.core.LifecycleService      : [127.0.0.1]:5702 [dev] [3.10.5] [127.0.0.1]:5702 is STARTED
2018-09-19 09:24:22.211  INFO 40891 --- [           main] o.s.j.e.a.AnnotationMBeanExporter        : Registering beans for JMX exposure on startup
2018-09-19 09:24:22.226  INFO 40891 --- [           main] com.hazelcast.github.Issue13793          : Started Issue13793 in 9.724 seconds (JVM running for 10.289)
===================
Distributed Object : 09:21:27.531 for 127.0.0.1:5701
Distributed Object : 09:24:22.229 for 127.0.0.1:5702
===================
2018-09-19 09:28:50.558  INFO 40891 --- [.IO.thread-in-0] com.hazelcast.nio.tcp.TcpIpConnection    : [127.0.0.1]:5702 [dev] [3.10.5] Connection[id=1, /127.0.0.1:59386->/127.0.0.1:5701, endpoint=[127.0.0.1]:5701, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
2018-09-19 09:28:50.563  INFO 40891 --- [cached.thread-2] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.10.5] Connecting to /127.0.0.1:5701, timeout: 0, bind-any: true
2018-09-19 09:28:50.564  INFO 40891 --- [cached.thread-2] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.10.5] Could not connect to: /127.0.0.1:5701. Reason: SocketException[Connection refused to address /127.0.0.1:5701]
2018-09-19 09:28:50.665  INFO 40891 --- [cached.thread-2] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.10.5] Connecting to /127.0.0.1:5701, timeout: 0, bind-any: true
2018-09-19 09:28:50.666  INFO 40891 --- [cached.thread-2] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.10.5] Could not connect to: /127.0.0.1:5701. Reason: SocketException[Connection refused to address /127.0.0.1:5701]
2018-09-19 09:28:50.770  INFO 40891 --- [cached.thread-2] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.10.5] Connecting to /127.0.0.1:5701, timeout: 0, bind-any: true
2018-09-19 09:28:50.770  INFO 40891 --- [cached.thread-2] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.10.5] Could not connect to: /127.0.0.1:5701. Reason: SocketException[Connection refused to address /127.0.0.1:5701]
2018-09-19 09:28:50.871  INFO 40891 --- [cached.thread-2] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.10.5] Connecting to /127.0.0.1:5701, timeout: 0, bind-any: true
2018-09-19 09:28:50.872  INFO 40891 --- [cached.thread-2] com.hazelcast.nio.tcp.TcpIpConnector     : [127.0.0.1]:5702 [dev] [3.10.5] Could not connect to: /127.0.0.1:5701. Reason: SocketException[Connection refused to address /127.0.0.1:5701]
2018-09-19 09:28:50.872  WARN 40891 --- [cached.thread-2] c.h.nio.tcp.TcpIpConnectionErrorHandler  : [127.0.0.1]:5702 [dev] [3.10.5] Removing connection to endpoint [127.0.0.1]:5701 Cause => java.net.SocketException {Connection refused to address /127.0.0.1:5701}, Error-Count: 5
2018-09-19 09:28:50.873  WARN 40891 --- [cached.thread-3] c.h.i.cluster.impl.MembershipManager     : [127.0.0.1]:5702 [dev] [3.10.5] Member [127.0.0.1]:5701 - d30e8c3d-3266-46f7-ab41-e8cb439ab8d5 is suspected to be dead for reason: No connection
2018-09-19 09:28:50.874  INFO 40891 --- [cached.thread-3] c.h.i.cluster.impl.MembershipManager     : [127.0.0.1]:5702 [dev] [3.10.5] Starting mastership claim process...
2018-09-19 09:28:50.874  INFO 40891 --- [cached.thread-3] c.h.i.cluster.impl.MembershipManager     : [127.0.0.1]:5702 [dev] [3.10.5] Local MembersView{version=2, members=[MemberInfo{address=[127.0.0.1]:5701, uuid=d30e8c3d-3266-46f7-ab41-e8cb439ab8d5, liteMember=false, memberListJoinVersion=1}, MemberInfo{address=[127.0.0.1]:5702, uuid=77bd8d5c-85ab-4b9d-9f8d-3aa85859bf8e, liteMember=false, memberListJoinVersion=2}]} with suspected members: [[127.0.0.1]:5701] and initial addresses to ask: []
2018-09-19 09:28:50.878  INFO 40891 --- [cached.thread-5] c.h.internal.cluster.ClusterService      : [127.0.0.1]:5702 [dev] [3.10.5] 

Members {size:1, ver:3} [
	Member [127.0.0.1]:5702 - 77bd8d5c-85ab-4b9d-9f8d-3aa85859bf8e this
]

2018-09-19 09:28:50.878  INFO 40891 --- [cached.thread-5] c.h.i.cluster.impl.MembershipManager     : [127.0.0.1]:5702 [dev] [3.10.5] Mastership is claimed with: MembersView{version=3, members=[MemberInfo{address=[127.0.0.1]:5702, uuid=77bd8d5c-85ab-4b9d-9f8d-3aa85859bf8e, liteMember=false, memberListJoinVersion=2}]}
2018-09-19 09:28:50.880  INFO 40891 --- [cached.thread-2] c.h.t.TransactionManagerService          : [127.0.0.1]:5702 [dev] [3.10.5] Committing/rolling-back live transactions of [127.0.0.1]:5701, UUID: d30e8c3d-3266-46f7-ab41-e8cb439ab8d5
2018-09-19 09:28:51.063  INFO 40891 --- [1_dev.migration] c.h.i.p.InternalPartitionService         : [127.0.0.1]:5702 [dev] [3.10.5] Fetching most recent partition table! my version: 0
2018-09-19 09:28:51.063  INFO 40891 --- [1_dev.migration] c.h.i.p.InternalPartitionService         : [127.0.0.1]:5702 [dev] [3.10.5] Most recent partition table version: 0
@mmedenjak (Contributor) commented Sep 19, 2018

I see that member 2 receives the PN counter immediately after startup. So I don't see any problem here. The replica count is probably > 1 since the other member receives the counter data.

Can you add this code and periodically print out the PN counters on both members?

import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
executor.scheduleAtFixedRate(() -> {
    System.out.println("===================");
    this.hazelcastInstance.getDistributedObjects()
            .stream()
            .filter(distributedObject -> distributedObject instanceof PNCounter)
            .forEach(distributedObject ->
                    System.out.println("Distributed Object : " + distributedObject.getName()));
}, 0, 10, TimeUnit.SECONDS);

I'd like to see if the PN counter data remains on member 2 after member 1 is killed. If this is so, then the problem is probably with reporting on mancenter and not in replication itself.

@neilstevenson (Contributor, Author) commented Sep 19, 2018

Thanks, there's no need for that; I tried with a client and the PNCounters are both there.

So either the surviving server isn't sending the info to MC for both counters, or MC isn't showing it.
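(For reference, such a client check can be sketched as follows; the exact code used is not shown in the thread:)

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.crdt.pncounter.PNCounter;

public class ListCountersFromClient {
    public static void main(String[] args) {
        // Sketch: connect a client and print every PNCounter the cluster holds.
        HazelcastInstance client = HazelcastClient.newHazelcastClient();
        client.getDistributedObjects().stream()
              .filter(o -> o instanceof PNCounter)
              .map(o -> (PNCounter) o)
              .forEach(c -> System.out.println(c.getName() + " = " + c.get()));
        client.shutdown();
    }
}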

@mmedenjak (Contributor) commented Sep 19, 2018

I think I see the problem. We report the "local" PN counter statistics for each member to mancenter. These local stats are created when there is an add operation on that member, but not when we replicate the state from another replica. Since the first PNCounter was only updated on the first member, the second member will contain the data but not report it to mancenter.

That's also why mancenter reports only 1 replica while the state is actually stored on two. This is a core issue that should be easy to fix.
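In other words, the stats entry needs to track the counter state's full lifecycle, not just local mutation. A hypothetical sketch of the idea (names and types are illustrative, not the actual service code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; illustrates the lifecycle, not the real Hazelcast internals.
class PNCounterStatsRegistry {
    private final Map<String, Object> statsByName = new ConcurrentHashMap<>();

    void onLocalMutation(String name)   { statsByName.computeIfAbsent(name, n -> new Object()); }
    // Previously missing: stats must also be created when state arrives via replication.
    void onReplicatedState(String name) { statsByName.computeIfAbsent(name, n -> new Object()); }
    // And removed when the state leaves, e.g. after migration.
    void onStateRemoved(String name)    { statsByName.remove(name); }
}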

mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Sep 19, 2018
PNCounter statistics were previously created only when the counter was
being mutated. The counter state may also be created because of
replication and the counter state may be removed because of migration.
PNCounter statistics should follow this lifecycle.

Fixes: hazelcast#13793
mmedenjak added a commit that referenced this issue Sep 20, 2018
…13804)

PNCounter statistics were previously created only when the counter was
being mutated. The counter state may also be created because of
replication and the counter state may be removed because of migration.
PNCounter statistics should follow this lifecycle.

Fixes: #13793