New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test runner fails with stack overflow on -Djava.security.debug=access,failure #11609
Comments
@dweiss could you please provide a reproducible example for this problem? If I do this in a simple project with a test and configure it like this:
Then it works fine. |
You need a larger project I think. Try adding this to Lucene and you'll see
that it won't work -- security debugging logs are emitted recursively from
gradle agent code itself and then everything spins out of control. The best
way to reproduce would be to modify this in Lucene and try to run tests (-p
lucene/core for example).
I am sorry to be brief here -- I am away from humanity at large this
weekend.
Dawid
…On Fri, Mar 27, 2020 at 5:15 PM Jendrik Johannes ***@***.***> wrote:
@dweiss <https://github.com/dweiss> could you please provide a
reproducible example for this problem.
If I do this in a simple project with a test and configure it like this:
test {
systemProperty('java.security.debug', 'access,failure')
}
Then it works fine.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#11609 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABQWLTYJGS7D2VFVB2F2EDRJTGLZANCNFSM4JWZNLVQ>
.
|
Get Lucene code, master branch:
I'm at commit 5c2011a6fb7d3c833. Then try this - it only runs a single test class and will work:
Now try to enable security logging:
and it all crashes fairly quickly. The exact exception will vary; I received this one on the first call: Exception: java.lang.StackOverflowError thrown from the UncaughtExceptionHandler in thread "main"
|
Let me know if you need anything else, @jjohannes . Another issue we would quite love to see in gradle is better load balancing in the test runner. This has been discussed in other issues (#2669). Thank you for your willingness to help out! If you need any more info please use Lucene dev mailing list or e-mail me directly. |
The team currently does not have the bandwidth even though there is a reproducer. Initial investigations did not show anything obvious. If this is blocking and you get the bandwidth to pinpoint the issue more precisely, this could be reconsidered. |
bq. Initial investigations did not show anything obvious. What's happening is pretty obvious - it's a recursive loop caused by security manager/output stream capture. The fix isn't obvious -- I agree here. The only solution that would work is to redirect sub-process streams to an external file(s) and tail them from the parent process (to detect changes/ appended output). This is a blocker in the sense that there is no way around it. The fraction of people affected is teeny-tiny though (expert JVM users). |
I've created a simple reproduction of this issue: https://github.com/rjernst/gradle-test/tree/d1738c117131170d8fc3bf3d4ec91560e885f091 Also of note is @mark-vieira did a diagnosis of this issue a while ago (elastic/elasticsearch#41213 (comment)). It seems to be related to gradle logging recursively triggering security checks for getting the line ending system property. |
Hi @rjernst ! I've been there when working on randomized testing test runner. I don't think you can reasonably intercept the sysout from a subprocess running with all security logging enabled if you mess with those standard streams on Java side... There was always something that caused a problem - infinite recursion with security manager calls is one thing, local stack-overflows is another (when the test's thread is near the edge of the stack and tries to sysout a message somewhere, then the infrastructure that substitutes System.out/System.err actually does go overboard and throws stack overflow exception). The least intrusive way (and most robust way) of capturing all of subprocess output was to redirect output streams to a file (upon creation) and then tail that file externally from the runner. Then the subprocess is free to do whatever it wants, all of the output is captured (including internal JVM logs) and no code is needed on the forked code side to intercept or redelegate the output. |
I ran Ryan's repro and it worked for me with gradle master, thank you. |
Correction - this is still an issue. While Ryan's repro works, a larger test fails. For example, see my Lucene repro:
This almost works... until it breaks:
|
@dweiss damn it. I will look into this. This is weird, I mean its weird that it seems to work for the first outputs but then starts failing again... |
@breskeby it's not an easy one to solve - I've been there. My experience is that if you do anything as much as touch System.out or System.err to try to wrap/redirect them, it's almost impossible to avoid infinite recursion somewhere - security audits are all over the place. |
@dweiss the latest 7.1 snapshot (gradle master snapshot) is incompatible with the current lucene master. can you tell me which gradle version your snapshot was that you tested this with? The fix will only be shipped with gradle 7.0. |
I think I had a dim moment there and tried with Lucene's wrapper (6.6.1)... No wonder it didn't work, eh? Let me retry with 7.0 once it's out - apologies for the noise. |
Hello @breskeby . I'm sorry for the delay - took a while... Lucene is now on Gradle 7.2 and I can still reproduce the same problem as before. The repro remains the same:
This starts up pumping JVM output but then crashes quickly. You can see the infinite loop it runs into. Like I said in the issue, I don't think there is a reasonable way to fix that from the Java level, other than tailing an independent file stream from the subprocess. The security manager is scheduled for removal, so perhaps it's no longer as much of a pressing issue as it was before? Don't know.
|
@big-guy I think the above comment means the issue can be reopened? I don't think it makes sense to file a new one? |
When you're working with security policies it's often the case that there is a need to debug the policy file by passing:
java.security.debug=access,failure
This emits tons of information to the console. Gradle runner stumbles on this one though. I suspect the problem is that console is redirected somehow (didn't check the code) and the code that emits messages recursively writes more messages (security-stack related).
Now... I know solving this is hard because I implemented a test runner for Lucene (randomizedtesting) and the only solution to dump all kinds of forked JVM console information (including security stuff or the logs potentially dumped by the JVM bootstrap even before classes are initialized) was to redirect the forked process output entirely to a separate temporary file and have a separate communication channel (socket-based or otherwise) for passing messages to the master process. The master tails the output if it needs to display or capture it anyhow. The test runner class can capture the length of the output file so that boundaries of output emitted for each test can be properly captured.
The benefit of doing this is also that writing to the console is asynchronous from anything else and doesn't rely on it.
The text was updated successfully, but these errors were encountered: