FATAL errors when accessing Curator objects before they are fully initialized. #41

akiani · 2012-03-07T02:14:23Z

Hi JZ,
In my tests, I often run into these errors:

INFO : org.apache.zookeeper.server.PrepRequestProcessor.run - PrepRequestProcessor exited loop!
FATAL: org.apache.zookeeper.server.SyncRequestProcessor.run - Severe unrecoverable error, exiting
java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:88)
at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:243)
at org.apache.zookeeper.server.persistence.Util.padLogFile(Util.java:214)
at org.apache.zookeeper.server.persistence.FileTxnLog.padFile(FileTxnLog.java:237)
at org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:215)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:315)
at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:468)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:107)

These errors block the execution of my tests. I seem to be able to get around them by adding waits after I create the Curator objects, but I was wondering if there is a better way of handling this.

I'm using Curator's test client as well.

Many thanks,
Amir
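
Editorial note: a fixed wait after creating the objects can be replaced by a bounded poll on a readiness condition. The sketch below is not part of Curator and is not the fix discussed in this thread; `PollingWait` and `waitUntil` are hypothetical names, and the readiness check itself is left to the caller because the thread does not say which one applies.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical test helper (not a Curator class): poll a condition
// with a deadline instead of sleeping for a fixed amount of time.
final class PollingWait {
    private PollingWait() {}

    /**
     * Re-checks the condition every pollMillis until it returns true or the
     * timeout elapses. Returns true if the condition held, false on timeout
     * or if the waiting thread was interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeout, TimeUnit unit, long pollMillis) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            long remainingMillis = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
            try {
                Thread.sleep(Math.min(pollMillis, remainingMillis));
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In a test, the condition would be whatever "ready" check the client or server under test exposes, and a `false` return would fail the test with a clear timeout message instead of letting it run against a half-started server.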

Randgalt · 2012-03-07T03:26:39Z

I'd need to see a sample. I don't get these in my own tests.

Randgalt · 2012-03-07T20:40:08Z

Actually - I've started seeing them now myself. I'll see what I can do.

akiani · 2012-03-07T20:56:23Z

Great :) maybe it's something in the new ZK?

ntolia · 2012-03-08T08:51:25Z

@akiani Thank you so much for filing this bug and, FWIW, I run into this very frequently too. I have been playing with Java7 on Mac OS X for a few days and my local tests kept on failing without anything obvious in my logs. For some reason, this bug doesn't trigger with Java 6. Now I finally know why: SyncRequestProcessor calls System.exit().

akiani · 2012-03-08T18:45:57Z

@ntolia That's exactly the configuration that I'm using as well. Sorry I should have pointed that out...

akiani · 2012-03-08T18:47:08Z

Actually, let me correct that, I do have Java 1.6 on my Lion:

$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03-383-11A511)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-383, mixed mode)

ntolia · 2012-03-13T17:05:40Z

@Randgalt Is there any update on this bug? If you tell me where to go look, I would be happy to submit a patch.

Randgalt · 2012-03-13T17:08:11Z

I haven't had a chance. However, my suspicion is that the problem is somewhere in the shutdown code of TestingServer and/or TestingCluster. Those classes are pretty hacked up.

Randgalt · 2012-03-14T00:18:58Z

I just pushed a change to TestingServer and TestingCluster that should make this issue better (see issue 46). Please re-try with these changes.

ntolia · 2012-03-14T00:40:59Z

Unfortunately, I still get the same failure as earlier with 1.1.5-SNAPSHOT. However, if I disable tests that use TestingServer, things work fine.

akiani · 2012-03-14T00:42:49Z

Thanks Jordan, but I wasn't even using TestingCluster :D I was using TestingServer. I'm very surprised that fixing TestingCluster fixed my test for you... It's still failing for me with the same error.

akiani · 2012-03-14T00:44:11Z

Seems like you forgot to git add TestingServer?

Randgalt · 2012-03-14T04:52:01Z

> Unfortunately, I still get the same failure as earlier with 1.1.5-SNAPSHOT. However, if I disable tests that use TestingServer, things work fine.

I didn't build new JARs. You'd have to take it from source.

Randgalt · 2012-03-14T04:52:35Z

It's here: c50087d

ntolia · 2012-03-14T04:55:38Z

> I didn't build new JARs. You'd have to take it from source.

I should have been clearer. I cloned master, built, installed (it showed up as 1.1.5-SNAPSHOT locally), and then tested. Things still failed.

Randgalt · 2012-03-14T05:02:37Z

OK - can you put together a sample that fails? Or - are you referring to some of my tests? I realize a few tests are failing.

ntolia · 2012-03-14T07:05:07Z

Trying to create a simple test but I need to remove a bunch of internal code to get at the simplest possible repro. This is proving to be harder than expected but I should hopefully be able to have something soon. In the meantime, with master, I do get this stacktrace via a debugger but then again, that is nothing new.

Breakpoint hit: "thread=SyncThread:0", java.lang.System.exit(), line=960 bci=0

SyncThread:0[1] where
  [1] java.lang.System.exit (System.java:960)
  [2] org.apache.zookeeper.server.SyncRequestProcessor.run (SyncRequestProcessor.java:153)

However, one of the other threads was captured in this state:

  .... (whole bunch of logback stack traces)
  [23] org.apache.zookeeper.server.PrepRequestProcessor.shutdown (PrepRequestProcessor.java:733)
  [24] org.apache.zookeeper.server.ZooKeeperServer.shutdown (ZooKeeperServer.java:439)
  [25] com.netflix.curator.test.TestingServer.stop (TestingServer.java:152)
  [26] com.netflix.curator.test.TestingServer.close (TestingServer.java:170)

Stack traces for all threads are available if it would be helpful.
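
Editorial note: when attaching a debugger is impractical, a JVM shutdown hook that dumps every thread's stack is another way to see who triggered the exit. This is a minimal sketch using only the JDK; `ExitTracer` is a hypothetical name and nothing here comes from Curator or ZooKeeper. It relies on the thread that called `System.exit()` still being alive while shutdown hooks run, so its frames should appear in the dump.

```java
import java.util.Map;

// Hypothetical diagnostic helper: print all thread stacks when the JVM shuts down.
final class ExitTracer {
    private ExitTracer() {}

    /** Formats the current stack trace of every live thread. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            sb.append("Thread \"").append(entry.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Registers the dump as a shutdown hook and returns the hook so it can be removed. */
    static Thread install() {
        Thread hook = new Thread(() -> System.err.print(dumpAllThreads()), "exit-tracer");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Calling `ExitTracer.install()` once in test setup is enough; searching the output for `System.exit` then points at the calling thread and its caller.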

Also, I don't know if it makes a difference, but when I call CuratorFramework.close() on a client that had been connected to a TestingServer that has been close()d, I sometimes get a stack trace along the lines of:

Error while calling watcher
java.lang.IllegalStateException: null
  at com.google.common.base.Preconditions.checkState(Preconditions.java:129) ~[guava-11.0.2.jar:na]
  at com.netflix.curator.framework.state.ConnectionStateManager.addStateChange(ConnectionStateManager.java:130) ~[curator-framework-1.1.5-SNAPSHOT.jar:na]
   ...

ntolia · 2012-03-26T22:59:55Z

Just wanted to provide an update on this bug report. I still get the failure but I am having trouble creating a standalone test. My test does the same thing our internal code does but things work in the standalone unit test. Points to timing issues but I have nothing further than that right now. I am going to keep on digging though.

I also wonder if this is related to #47.

Randgalt · 2012-04-04T18:24:57Z

I just pushed a total rewrite of TestingServer and TestingCluster based on work by Jérémie BORDIER (ahfeel). Let me know if it behaves any better.

ntolia · 2012-04-04T18:46:56Z

This is a good news/bad news story.

Good news: I no longer get a System.exit(). I had retested a couple of days ago with your zkDb.commit() patch and I was still hitting the System.exit() failure case.

Bad news: My TestingServer-based tests now hang but I need to figure out if that is just because of some thread that isn't cleaning up after itself, an internal problem that could be my fault, or something else with the TestingServer code (TestingCluster is not used in this particular suite of tests). I am going to spend some time this week looking into the hang and will let you know if I can narrow the problem down.

Randgalt · 2012-04-04T18:52:27Z

:(

ntolia · 2012-04-04T19:01:50Z

I am not sure if there are timing issues there but, on a second mvn test run, TestingServer-based tests didn't hang but a few did fail (that do not fail with Java6 + Curator 1.1.5). Will keep on digging into this but am swamped with a couple of other things.

Randgalt · 2012-04-04T23:35:42Z

OK - I just pushed a new version of TestingCluster that resurrects some really ugly bytecode manipulation that I didn't think was still needed. Deep inside of ZooKeeper you can get an Assertion that screws things up. With this change, my tests all run fine now - even with Gradle.

Randgalt · 2012-04-13T02:49:35Z

Has anyone tried with the latest JARs?

Randgalt · 2012-04-16T20:15:07Z

As I haven't heard back on this I am closing it.

ntolia · 2012-04-16T23:39:15Z

FWIW, it didn't work with master as of a week ago and doesn't work with Curator 1.1.7. Same symptoms as earlier with hung tests. I will request a reopen when I get a chance to provide more information or an easy repro.

Randgalt · 2012-04-16T23:41:10Z

Sorry :( Thanks for your patience on this.

Randgalt closed this as completed Apr 16, 2012
