Cannot shut down a Neo4j instance that forms the first member of a cluster #12530

Open
luanne opened this issue Jun 6, 2020 · 3 comments

Comments

@luanne

luanne commented Jun 6, 2020

I configured the first instance of a causal cluster and started it up.
As expected, it reports:

2020-06-06 10:19:12.124+0000 INFO  ======== Neo4j 4.0.5 ========
2020-06-06 10:19:12.127+0000 INFO  Starting...
2020-06-06 10:19:15.931+0000 INFO  Database 'system' is waiting for a total of 3 core members...

I tried to shut it down to change some config, but I could not. I tried neo4j stop, and I also tried killing the process when it was started with neo4j console. In both cases I was forced to kill the process by PID.

The logs report:

2020-06-06 10:19:49.948+0000 INFO  Neo4j Server shutdown initiated by request
2020-06-06 10:19:56.186+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:20:06.245+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:20:16.280+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:20:26.345+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:20:36.351+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:20:46.357+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:20:56.436+0000 INFO  Database 'system' is waiting for a total of 3 core members...
  • Neo4j version: 4.0.5
  • Operating system: OSX
  • API/Driver: N/A
  • Steps to reproduce
  1. Start up the first instance configured to be a core member of a cluster. Do not start any other instances.
  2. Try to shut it down.
  • Expected behavior
    It shuts down gracefully.

  • Actual behavior
    It does not shut down; the Java process has to be killed.

luanne added the bug label Jun 6, 2020
@martinfurmanski
Contributor

@luanne Could you get us a stack trace of it after you have called stop?

@luanne
Author

luanne commented Jun 8, 2020

Sure

2020-06-06 10:19:12.124+0000 INFO  ======== Neo4j 4.0.5 ========
2020-06-06 10:19:12.127+0000 INFO  Starting...
2020-06-06 10:19:15.931+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:19:26.012+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:19:36.061+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:19:46.161+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:19:49.948+0000 INFO  Neo4j Server shutdown initiated by request
2020-06-06 10:19:56.186+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:20:06.245+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:20:16.280+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:20:26.345+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:20:36.351+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:20:46.357+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:20:56.436+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:21:06.493+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:21:16.573+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:21:26.652+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:21:36.725+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:21:46.775+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:21:56.799+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:22:06.840+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:22:16.868+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:22:26.918+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:22:36.962+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:22:46.967+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:22:57.024+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:23:07.112+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:23:17.194+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:23:27.244+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:23:37.265+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:23:47.318+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:23:57.392+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:24:07.400+0000 INFO  Database 'system' is waiting for a total of 3 core members...
2020-06-06 10:24:16.046+0000 ERROR Clustering components for database 'system' have encountered a critical error Encountered error when attempting to reconcile database system from state 'EnterpriseDatabaseState{databaseId=DatabaseId{00000000[system]}, operatorState=STOPPED, failed=false}' to state 'online'
java.lang.IllegalStateException: Encountered error when attempting to reconcile database system from state 'EnterpriseDatabaseState{databaseId=DatabaseId{00000000[system]}, operatorState=STOPPED, failed=false}' to state 'online'
	at com.neo4j.dbms.DbmsReconciler.reportErrorAndPanicDatabase(DbmsReconciler.java:447)
	at com.neo4j.dbms.DbmsReconciler.handleReconciliationErrors(DbmsReconciler.java:432)
	at com.neo4j.dbms.DbmsReconciler.lambda$postReconcile$15(DbmsReconciler.java:381)
	at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1908)
	at com.neo4j.dbms.DbmsReconciler.postReconcile(DbmsReconciler.java:379)
	at com.neo4j.dbms.DbmsReconciler.lambda$scheduleReconciliationJob$8(DbmsReconciler.java:246)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.neo4j.dbms.api.DatabaseManagementException: Unable to start database `DatabaseId{00000000[system]}`
	at com.neo4j.dbms.database.ClusteredMultiDatabaseManager.startDatabase(ClusteredMultiDatabaseManager.java:71)
	at com.neo4j.dbms.database.ClusteredMultiDatabaseManager.startDatabase(ClusteredMultiDatabaseManager.java:31)
	at com.neo4j.dbms.database.MultiDatabaseManager.forSingleDatabase(MultiDatabaseManager.java:112)
	at com.neo4j.dbms.database.MultiDatabaseManager.startDatabase(MultiDatabaseManager.java:98)
	at com.neo4j.dbms.DbmsReconciler.start(DbmsReconciler.java:549)
	at com.neo4j.dbms.Transitions$TransitionFunction.lambda$prepare$0(Transitions.java:219)
	at com.neo4j.dbms.DbmsReconciler.doTransitionStep(DbmsReconciler.java:347)
	at com.neo4j.dbms.DbmsReconciler.doTransitionStep(DbmsReconciler.java:348)
	at com.neo4j.dbms.DbmsReconciler.doTransitions(DbmsReconciler.java:330)
	at com.neo4j.dbms.DbmsReconciler.lambda$doTransitions$10(DbmsReconciler.java:320)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
	... 3 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.lifecycle.LifecycleAdapter$3@565fe93d' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join or bootstrap a raft group with id RaftId{00000000} and members DatabaseCoreTopology{DatabaseId{00000000} [MemberId{8c86fbbc}]} in time. Please restart the cluster.".
	at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:465)
	at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
	at com.neo4j.causalclustering.common.ClusteredDatabase.start(ClusteredDatabase.java:39)
	at com.neo4j.dbms.database.ClusteredMultiDatabaseManager.startDatabase(ClusteredMultiDatabaseManager.java:67)
	... 13 more
Caused by: java.util.concurrent.TimeoutException: Failed to join or bootstrap a raft group with id RaftId{00000000} and members DatabaseCoreTopology{DatabaseId{00000000} [MemberId{8c86fbbc}]} in time. Please restart the cluster.
	at com.neo4j.causalclustering.identity.RaftBinder$BindingConditions.allowContinue(RaftBinder.java:402)
	at com.neo4j.causalclustering.identity.RaftBinder$BindingConditions.allowContinueBinding(RaftBinder.java:382)
	at com.neo4j.causalclustering.identity.RaftBinder.bindToInitialRaftGroup(RaftBinder.java:206)
	at com.neo4j.causalclustering.identity.RaftBinder.getBoundState(RaftBinder.java:147)
	at com.neo4j.causalclustering.identity.RaftBinder.bindToRaft(RaftBinder.java:139)
	at com.neo4j.causalclustering.core.CoreBootstrap.bindAndStartMessageHandler(CoreBootstrap.java:77)
	at com.neo4j.causalclustering.core.CoreBootstrap.perform(CoreBootstrap.java:62)
	at org.neo4j.kernel.lifecycle.LifecycleAdapter$3.start(LifecycleAdapter.java:86)
	at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:444)
	... 16 more

Let me know if you want me to send you the debug.log somehow.
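
For reference, the timing in the log above can be read off the timestamps. The following is a small sketch (the class name LogTiming and its helper are made up for illustration, not part of Neo4j) that parses two log timestamps and returns the time between them. Applied to the first "waiting for a total of 3 core members" line, the "shutdown initiated by request" line, and the final ERROR line, it shows how long the instance kept waiting after the shutdown request.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for comparing timestamps copied from the log above.
final class LogTiming {
    // Matches timestamps of the form "2020-06-06 10:19:15.931+0000".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSZ");

    // Returns the elapsed time from one log timestamp to another.
    static Duration between(String from, String to) {
        return Duration.between(
                OffsetDateTime.parse(from, LOG_TS),
                OffsetDateTime.parse(to, LOG_TS));
    }

    public static void main(String[] args) {
        String firstWait = "2020-06-06 10:19:15.931+0000";
        String shutdownRequested = "2020-06-06 10:19:49.948+0000";
        String bindingError = "2020-06-06 10:24:16.046+0000";

        // Time from the first "waiting" message to the TimeoutException.
        System.out.println(between(firstWait, bindingError));
        // Time the instance kept waiting after shutdown was requested.
        System.out.println(between(shutdownRequested, bindingError));
    }
}
```

The first interval is just over five minutes and the second is roughly four and a half minutes, which matches the observation that the shutdown request did not interrupt the wait.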

@umuzammil

umuzammil commented Aug 3, 2020

Hello, are there any updates on this, please? A quick note: this is not limited to the first core that joins (or makes a failed attempt to join) a cluster. Once any or all members enter a state of waiting for others to join, shutdown times out (perhaps indefinitely) for each of them.
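
The symptom reported in this thread (the "waiting for ... core members" message keeps being logged after shutdown has been requested) is what one would see if a startup wait loop never checks for a stop request. The following is a minimal sketch of a wait that does respond to one. It is not Neo4j's code: StoppableWait, requestStop, awaitMembers and the Outcome values are all hypothetical names chosen for illustration, and the member count is supplied by the caller rather than by any real discovery service.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Hypothetical illustration only; none of these names come from Neo4j.
final class StoppableWait {
    enum Outcome { FORMED, STOPPED, TIMED_OUT }

    // Released once when shutdown is requested.
    private final CountDownLatch stopRequested = new CountDownLatch(1);

    // Called from the shutdown path; wakes up any thread in awaitMembers.
    void requestStop() {
        stopRequested.countDown();
    }

    // Waits until enough members are visible, a stop is requested,
    // or the overall timeout expires, whichever happens first.
    Outcome awaitMembers(IntSupplier visibleMembers, int required,
                         long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (visibleMembers.getAsInt() < required) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return Outcome.TIMED_OUT;
            }
            long wait = Math.min(TimeUnit.MILLISECONDS.toNanos(pollMillis), remaining);
            try {
                // Sleeps for one poll interval, but returns true immediately
                // if requestStop() has been or is called in the meantime.
                if (stopRequested.await(wait, TimeUnit.NANOSECONDS)) {
                    return Outcome.STOPPED;
                }
            } catch (InterruptedException e) {
                // Treat an interrupt as a stop request and keep the flag set.
                Thread.currentThread().interrupt();
                return Outcome.STOPPED;
            }
        }
        return Outcome.FORMED;
    }

    public static void main(String[] args) throws InterruptedException {
        StoppableWait wait = new StoppableWait();
        // Simulate a shutdown request arriving shortly after startup.
        Thread stopper = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException ignored) {
                // Fall through and request the stop anyway.
            }
            wait.requestStop();
        });
        stopper.start();
        // Only one member is ever visible, but three are required.
        System.out.println(wait.awaitMembers(() -> 1, 3, 10_000, 300_000));
        stopper.join();
    }
}
```

Waiting on the latch instead of sleeping means the poll interval does not delay shutdown: the waiting thread is released as soon as the stop is requested, rather than at the next poll or at the overall timeout.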
