FATAL errors when accessing Curator objects before they are fully initialized. #41
I'd need to see a sample. I don't get these in my own tests.
Actually, I've started seeing them myself now. I'll see what I can do.
Great :) Maybe it's something in the new ZK?
@akiani Thank you so much for filing this bug. FWIW, I run into it very frequently too. I have been playing with Java 7 on Mac OS X for a few days, and my local tests kept failing without anything obvious in my logs. For some reason, this bug doesn't trigger with Java 6. Now I finally know why: SyncRequestProcessor calls System.exit().
@ntolia That's exactly the configuration I'm using as well. Sorry, I should have pointed that out.
Actually, let me correct that: I do have Java 1.6 on my Lion machine:
$ java -version
@Randgalt Is there any update on this bug? If you tell me where to look, I would be happy to submit a patch.
I haven't had a chance. However, I suspect the problem is somewhere in the shutdown code of TestingServer and/or TestingCluster. Those classes are pretty hacked up.
I just pushed a change to TestingServer and TestingCluster that should improve this (see issue 46). Please retry with these changes.
Unfortunately, I still get the same failure as before with 1.1.5-SNAPSHOT. However, if I disable the tests that use TestingServer, everything works fine.
Thanks, Jordan, but I wasn't even using TestingCluster :D I was using TestingServer. I'm very surprised that fixing TestingCluster fixed my test for you... It's still failing for me with the same error.
I didn't build new JARs. You'd have to build it from source.
It's here: c50087d
I should have been clearer. I cloned master, built and installed it (it showed up locally as 1.1.5-SNAPSHOT), and then tested. Things still failed.
OK, can you put together a sample that fails? Or are you referring to some of my tests? I realize a few of them are failing.
I'm trying to create a simple test, but I need to strip out a lot of internal code to get to the simplest possible repro. This is proving harder than expected, but I should have something soon. In the meantime, with master, I do get this stack trace via a debugger, but then again, that is nothing new.
However, one of the other threads was captured in this state:
Stack traces for all threads are available if that would help. Also, I don't know if it makes a difference, but when I call CuratorFramework.close() on a client that had been connected to a TestingServer that has already been closed with close(), I sometimes get a stack trace along the lines of:
Just wanted to provide an update on this bug report. I still get the failure, but I am having trouble creating a standalone test. My test does the same thing our internal code does, yet it passes as a standalone unit test. That points to timing issues, but I have nothing more than that right now. I am going to keep digging, though. I also wonder if this is related to #47.
I just pushed a total rewrite of TestingServer and TestingCluster based on work by Jérémie BORDIER (ahfeel). Let me know if it behaves any better.
This is a good news/bad news story. Good news: I no longer get a System.exit(). I had retested a couple of days ago with your zkDb.commit() patch and was still hitting the System.exit() failure case. Bad news: my TestingServer-based tests now hang, and I need to figure out whether that is just because some thread isn't cleaning up after itself, an internal problem that could be my fault, or something else in the TestingServer code (TestingCluster is not used in this particular suite of tests). I am going to spend some time this week looking into the hang and will let you know if I can narrow the problem down.
:(
I am not sure whether there are timing issues there, but on a second mvn test run the TestingServer-based tests didn't hang. A few did fail, though, and those tests do not fail with Java 6 + Curator 1.1.5. I will keep digging into this, but I am swamped with a couple of other things.
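To check whether a hang comes from some thread that isn't cleaning up after itself, one option is to list the live non-daemon threads (these keep a JVM from exiting) at the end of a test run, along with their stacks. The sketch below uses only the JDK's Thread.getAllStackTraces(); the class name LingeringThreads and the "leaked-worker" thread in main are hypothetical, for illustration only, and nothing here is part of Curator or ZooKeeper.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Curator): reports threads that could keep a test JVM alive.
public class LingeringThreads {

    /** Returns the sorted names of live non-daemon threads other than the calling thread. */
    public static List<String> liveNonDaemonThreads() {
        Thread self = Thread.currentThread();
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t != self && t.isAlive() && !t.isDaemon())
                .map(Thread::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    /** Prints each live non-daemon thread (other than the caller) with its current stack. */
    public static void dump() {
        Thread self = Thread.currentThread();
        Thread.getAllStackTraces().forEach((t, stack) -> {
            if (t != self && t.isAlive() && !t.isDaemon()) {
                System.out.println("Lingering thread: " + t.getName() + " (" + t.getState() + ")");
                for (StackTraceElement frame : stack) {
                    System.out.println("    at " + frame);
                }
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        // Hypothetical stand-in for a worker that a test forgot to shut down.
        Thread leaked = new Thread(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "leaked-worker");
        leaked.start();

        dump();
        System.out.println(liveNonDaemonThreads().contains("leaked-worker"));

        // Once the worker is released and joined, it no longer shows up.
        release.countDown();
        leaked.join();
        System.out.println(liveNonDaemonThreads().contains("leaked-worker"));
    }
}
```

Calling dump() from an after-suite hook (or from a debugger) would show which threads are still running and where they are blocked.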
OK, I just pushed a new version of TestingCluster that resurrects some really ugly bytecode manipulation I didn't think was still needed. Deep inside ZooKeeper you can hit an assertion that screws things up. With this change, all my tests now run fine, even with Gradle.
Has anyone tried with the latest JARs?
As I haven't heard back on this, I am closing it.
FWIW, it didn't work with master as of a week ago, and it doesn't work with Curator 1.1.7. Same symptoms as before: hung tests. I will request a reopen when I get a chance to provide more information or an easy repro.
Sorry :( Thanks for your patience on this.
Hi JZ,
In my tests, I often run into these errors:
INFO : org.apache.zookeeper.server.PrepRequestProcessor.run - PrepRequestProcessor exited loop!
FATAL: org.apache.zookeeper.server.SyncRequestProcessor.run - Severe unrecoverable error, exiting
java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:88)
at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:243)
at org.apache.zookeeper.server.persistence.Util.padLogFile(Util.java:214)
at org.apache.zookeeper.server.persistence.FileTxnLog.padFile(FileTxnLog.java:237)
at org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:215)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:315)
at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:468)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:107)
These errors block the execution of my tests. I seem to be able to work around them by adding waits after creating the Curator objects, but I was wondering if there is a better way of handling this.
I'm using Curator's test client as well.
Many thanks,
Amir
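Instead of fixed waits after creating the Curator objects, a test can block on an explicit "ready" signal with a timeout, so it proceeds as soon as the client is usable and fails loudly instead of hanging. Later Apache Curator releases expose this directly as CuratorFramework.blockUntilConnected(int, TimeUnit); whether the 1.1.x builds discussed in this thread have an equivalent is an assumption to verify. The sketch below shows only the underlying pattern with java.util.concurrent. ReadinessGate and the simulated connector thread are hypothetical stand-ins for a real connection callback. The sketch replaces ad hoc sleeps but does not by itself fix the TestingServer shutdown problem discussed above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical readiness gate: tests wait on it instead of sleeping for a fixed time.
public class ReadinessGate {
    private final CountDownLatch ready = new CountDownLatch(1);

    /** Call once the client reports it is connected (e.g. from a connection-state callback). */
    public void markReady() {
        ready.countDown();
    }

    /** Returns true as soon as markReady() has been called, or false if the timeout elapses first. */
    public boolean awaitReady(long timeout, TimeUnit unit) throws InterruptedException {
        return ready.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        ReadinessGate gate = new ReadinessGate();

        // Simulated asynchronous connect: a background thread signals readiness after a delay.
        Thread connector = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            gate.markReady();
        }, "simulated-connector");
        connector.start();

        // Proceed as soon as the signal arrives; fail with an explicit error rather than hang.
        if (!gate.awaitReady(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("client did not become ready in time");
        }
        System.out.println("ready");
        connector.join();
    }
}
```

The timeout makes a broken setup show up as a clear test failure instead of a silent hang, and nothing waits longer than it has to.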