Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: timeout -t #26

Closed
floyd-fuh opened this issue May 8, 2018 · 31 comments
Closed

Feature request: timeout -t #26

floyd-fuh opened this issue May 8, 2018 · 31 comments
Assignees
Milestone

Comments

@floyd-fuh
Copy link
Contributor

Would it be possible to get the -t switch to specify a timeout for a testcase? Fuzzing is only efficient when the inputs do not take too long. I'm currently trying to do another run with Tika, but alredy the fourth input dry run is just stuck for several minutes...

@floyd-fuh
Copy link
Contributor Author

floyd-fuh commented May 8, 2018

As you said in another issue:

JQF does not handle hangs at all at this point, which is a limitation (mainly because we are interested in unit tests and not long-running programs). I'm trying to find good ways to kill running threads preemptively without breaking too much (Thread.stop in the JDK is deprecated, and Future.cancel or Thread.interrupt leaks resources because the threads aren't actually killed). If you have any suggestions here, I would love to hear them!

My first suggestion is to try implementing Thread.interrupt as suggested in the docs, for some cases that might help. I would implement this as a first attempt to stop the thread.

My second suggestion is maybe a little uncommon, but I would say let's just ignore the deprecation and call stop(). While not very nice, this is fuzzing-land where we simply need this kind of control, because AFL fuzzing will always find a way to hang Java. As the same doc link suggest, it's probably best to catch ThreadDeath, even if that's not optimal.

There are also suggestions to use the new JDK 9 spinWait(), but I'm not sure that method would help.

A different approach would be using ExecutorService. This is also how Kelinci does it or also explained on stackoverflow.

rohanpadhye added a commit that referenced this issue May 8, 2018
@rohanpadhye
Copy link
Owner

I've added some limited support for the -t option when using jqf-afl-fuzz. The only snag is that some timeouts may get misclassfied as "crashes" instead of "hangs". This is because I'm trying to stick to using stock AFL instead of having a custom AFL for JQF, which means I have to fool AFL into believing that it has killed the target program on timeout by mocking signals. In the worst case, the signal will appear before AFL's timer goes off and it will therefore be classified as a crash instead of a hang.

Note: You will have to re-run the make command in addition to mvn package (or simply setup.sh which does both) because the AFL proxy has also been modified.

The default value for timeout is 10000 (milliseconds) to prevent spurious timeouts, but I've tried fuzzing Tika with 5000 on my machine and it was fine.

The implementation does not use either Thread.stop() nor Thread.interrupt(). Instead, since we are instrumenting program branches for fuzzing, we can simply monitor how much time has elapsed after every few thousand branches (without incurring too much overhead), and then throw a run-time exception if the elapsed time is more than the set timeout. This idea was suggested by Koushik Sen.

This trick means that JQF can detect hangs due to infinite loops fairly efficiently, but not hangs due to deadlocks where no branches are being executed. For now, I will say this is sufficient as most fuzzing targets are not I/O intensive. In the future, we can possible interrupt deadlocks using Thread.interrupt().

rohanpadhye added a commit that referenced this issue May 9, 2018
@rohanpadhye
Copy link
Owner

This is still not working as expected. Although the first hang gets detected, throwing a TimeoutException from within an instrumentation handler seems to corrupt the instrumentation tracking itself and subsequent hangs do not get detected.

I'm still working on it... the issue remains open.

rohanpadhye added a commit that referenced this issue May 20, 2018
This change allows guidances to throw RuntimeException(s) such
as TimeoutException without breaking the instrumentation and
tracing logic.

Addresses #26.
@floyd-fuh
Copy link
Contributor Author

I just removed the -t argument for one of my runs, and while it seems to be running fine it says in the process list /usr/local/bin/afl-fuzz -d -i ./tika-corpus-only-small -o tika-out2 -T edu.berkeley.cs.jqf.examples.tika.TikaParserTest#fuzz -m 2000 -t 50 /opt/jqf/bin/jqf-afl-target -v edu.berkeley.cs.jqf.examples.tika.TikaParserTest fuzz @@. Is that correct that it uses a timeout of 50ms now? Or is that something I shouldn't be concerned about?

@rohanpadhye
Copy link
Owner

@floyd-fuh Don't worry about that. We always give AFL a 50ms timeout regardless of what the actual timeout is. This is so that AFL sets its timeout flag for every single input. In practice, AFL only marks a run as a "hang" if the timeout flag is set AND the return value of waitpid() on the forked fuzz target indicates SIGKILL. In JQF, we always let AFL's timeout flag fire (in 50ms) and handle the real timing measurements in Java. When a TimeoutException is thrown, we lie to AFL and tell it that the process ended with SIGKILL.

@floyd-fuh
Copy link
Contributor Author

Ah, perfect, thanks for the pointers! I think I just thought it wouldn't work correctly, but everything is fine.

Btw. I ran into new problems with the timeouts. JQF classified the Java bug found in #27 as a crash. As you described above not what you intended, but perfectly fine in my use case. So a miss-classification as crash or hang is fine for me. However, finding a similar bug in a Java library JQF actually got stuck. I think the reason is that the endless loop is in a part of the code that was not instrumented, so JQF won't be able to measure time and throw a run-time exception after the elapsed time.

So at the moment I have to recheck on my fuzzers from time to time. Today they were running for 5 days, but got stuck after the first 7 hours. I wasn't too unhappy because that meant a new finding, but it's obviously not optimal having to check on them and restart fuzzers often.

So I guess a Thread.interrupt solution would be really helpful.

And sorry for asking for so many features, I kind of feel bad, but JQF is really improving a lot and I clearly prefer it over Kelinci and java-afl at the moment. Keep up the good work 👍

rohanpadhye added a commit that referenced this issue Jun 26, 2018
Timouts during call events create some problems due to
the bubbling of the TimeoutException up the stack in
the presence of instrumentation that expects all
method throws to be caught and reported, so as to maintain
a valid shadow stack.

We also throw the TimeoutException directly instead of
setting a flag and re-throwing every time, to prevent
exceptions from being thrown in the exception-handling
instructions and events generated from finally-blocks and
so on.

Partially addresses #26.
@rohanpadhye
Copy link
Owner

rohanpadhye commented Sep 18, 2018

Timeouts are working fine in AFL-mode for now. Please re-open if issues persist.

@floyd-fuh
Copy link
Contributor Author

I updated the version of JQF I'm using to the latest github version and started fuzzing Apache Tika again. However, as you said AFL timeout should work fine now, I ran JQF with lower timeouts, such as 1000 or even 300. It runs fine and also the execs/s seem to show that it is going faster. However, stock AFL would usually put a lot of files in the "hangs" directory and would very quickly find hangs. However, JQF never puts any file in the "hangs" directory for me now, which is very weird with these timeout settings.

So when I run with -t 300 I expect every file that takes more than 300ms of processing to be put in the hangs directory. This doesn't seem to be the case. So my guess is at the moment JQF is not able to find endless loops as it doesn't fill the hangs directory. Can you confirm this?

Can you elaborate a little on how the timeout setting is applied?

@rohanpadhye rohanpadhye reopened this Oct 2, 2018
@rohanpadhye
Copy link
Owner

rohanpadhye commented Oct 2, 2018

Hi Floyd,

I think this could be because of Tika's multi-threaded parsers. The last commit changed the behavior to throw TimeoutException(s) only for the first branch-event after the time limit instead of for every event after the time limit, since in some other benchmarks we were seeing timeouts being continuously re-thrown from finally{} blocks that surrounded the (possibly infinite) loops. As much as possible, we would like to execute the finally blocks even if the enclosed loop was infinite.

However, this means that if the timeout occurs in not-the-main-thread, it may not get reported. I think I can attempt a fix that reports a timeout in the first branch-event per-thread after the time limit is reached.

rohanpadhye added a commit that referenced this issue Oct 2, 2018
If a timeout occurrs in a thread other than the thread that has
invoked the test method, then the TimeoutException does not
propagate to the FuzzStatement and can remain uncaught.

Therefore, this commit adds a flag that is set once per test
run whenever a timeout is detected in any thread. If the flag
is set at the end of the run, then the result is forcibly
changed to TIMEOUT.

This commit also bundles a test for verifying that the result is
a TIMEOUT when the exception is actually caught.

Addresses #26.
@rohanpadhye
Copy link
Owner

Pushed an attempted fix. Please let me know if this changes anything. If you still do not see any timeouts (or misclassified crashes), I may have to debug specifically with Tika.

@floyd-fuh
Copy link
Contributor Author

Perfect! Looks much better now... At least I get hangs now, obviously I cannot be sure if it is really detecting all of them :)

Thanks a lot for the fix

@Ahmedfir
Copy link
Contributor

Ahmedfir commented Sep 23, 2019

Hi Rohan,
Is there any similar solution as the described here for AFL (-t) in zest fuzzing?

I am currently using fuzzing with Zest driver and after some iterations it hangs for long time.
Adding a timeout value to the test method i.e.:
@Fuzz @Test(timeout = 1000) public void testNoFilter(...)
will cause an exception inside of JUnitQuickcheck:
Method testNoFilter should have no parameters at com.pholser.junit.quickcheck.runner.JUnitQuickcheck.validateTestMethods(JUnitQuickcheck.java:82).

Many thanks in advance for your time and support.

@Ahmedfir
Copy link
Contributor

Hi Rohan,

Just to keep you updated I have tried to define the timeout with a global Rule and it seems to not hang anymore i.e.
```
@rule
public Timeout globalTimeout = Timeout.millis(500);


But I have then noticed that the total and valid coverage are always at `0 branches (0% of map)` so I guess this timeout rule is breaking the coverage calculation... so it's not a solution or a complete solution.

@rohanpadhye
Copy link
Owner

Hi @Ahmedfir,

In Zest, you can set timeout for individual runs via the system property jqf.ei.TIMEOUT. Perhaps I should add a -t option to the jqf-zest launcher as well to make this easier. It used to be experimental but I believe that it should work just as well as in AFL.

@rohanpadhye
Copy link
Owner

Reopening as jqf-zest needs a -t option.

@rohanpadhye rohanpadhye reopened this Sep 23, 2019
@Ahmedfir
Copy link
Contributor

thank you for the quick answer. It's a nice idea, I think. I didn't use the CLI before. I have only run zest with the maven plugin so-far but I will try CLI with the jqf.ei.TIMEOUT and keep you updated. Thanks again.

@rohanpadhye
Copy link
Owner

rohanpadhye commented Sep 23, 2019 via email

@Ahmedfir
Copy link
Contributor

Hi @rohanpadhye,
I have tried the maven parameter and it doesn't seem to work for me i.e.
-Djqf.ei.timeout=500

@rohanpadhye
Copy link
Owner

@Ahmedfir Do you know what is causing the hangs? In both the AFL and Zest guidances, we check the timeout during each code coverage event. This means that we can detect timeouts due to potentially infinite loops in instrumented app classes. If the app is timing out because it is waiting for I/O or if it is looping in an uninstrumented class, then JQF cannot detect it. In Java, it is not safe to stop a thread while it is waiting on I/O or compute. Since all of this is essentially in-process fuzzing, we cannot send SIGKILL either.

Can you please confirm the source of your app hangs? Maybe try repro with the last generated input (I believe Zest stores the last input in a file called .cur_input in the results directory).

@Ahmedfir
Copy link
Contributor

Hi @rohanpadhye, indeed it is timing out because of a thread running further inside of the test. thank you anyway, I will try to avoid such uses in the future. Cheers

@rohanpadhye rohanpadhye self-assigned this Sep 26, 2019
@rohanpadhye rohanpadhye added this to the 1.4 milestone Sep 26, 2019
rohanpadhye added a commit that referenced this issue Sep 26, 2019
rohanpadhye added a commit that referenced this issue Sep 26, 2019
@rohanpadhye
Copy link
Owner

I've added options for both jqf-zest and the maven plugin. For some benchmarks, very low timeouts are still resulting in unexpected exceptions in the instrumentation, which still needs looking into. Let me know if you also encounter such issues.

@Ahmedfir
Copy link
Contributor

Ahmedfir commented Sep 26, 2019

thank you very much! I am simply trying to not using any extra thread in the test execution.
Nevertheless, I am currently having troubles with the tracing and the coverage calculation (this has nothing to do with the current issue; timeout...).
Basically, I am always getting 0 branches (0.00% of map)
and I suspect it's caused by a static field used (and modified) in both the test and the generator. But I don't think it is the real cause as I had it working before with static fields.
Or may be because I have a very long executing @BeforeClass method. So the ThreadTracer (the instrumentation for the coverage calculation) traces the @BeforeClass instead of the test method... But I don't think it's really the case because it's getting the test method as entry point.
Did you get such issues before?

@rohanpadhye
Copy link
Owner

Perhaps the test is not running in the main thread? Zest currently only traces the main thread.

@Ahmedfir
Copy link
Contributor

It is in the main thread. but now I am doubting about something else. The test is executing code in a jar file that might be invisible to the tracer... hmm I will dive further in this direction and let you know. :) thank you as always for your quick answers!

@rohanpadhye
Copy link
Owner

More undocumented features that may be useful: try setting -Djanala.verbose=true before fuzzing to get a log of all classes that get instrumented (or not). Maybe this needs to be made default when -v option is given.

@Ahmedfir
Copy link
Contributor

Thanks, that' great to know!
I got the tracing and the code coverage to work when downgrading my maven configuration target and source to java 9 instead of 12. I don't know exactly what was the issue with java 12... (it can be an issue related to my local setup :p )
PS: I got the coverage working with jdk 12 for simpler examples.
PS: May be JQF doesn't handle well jdk 12 and I've missed this point in the documentation. Else May be it would be nice to mention that in the documentation...
Please don't hesitate to contact me if you want to investigate this mysterious issue further. :)

@floyd-fuh
Copy link
Contributor Author

I was just wondering why Zest has no "hangs" directory in its output directory? Or is it just by default not using any timeout configuration?

I think for Java security analysis it is as important to get all the hangs as the failures, as infinite loops might be really bad.

@rohanpadhye
Copy link
Owner

rohanpadhye commented Oct 3, 2019 via email

@floyd-fuh
Copy link
Contributor Author

Ah, no, that's perfectly fine then. I was just wondering. I just haven't seen any being stored in the failures directory.

@Ahmedfir
Copy link
Contributor

so-far I am not seeing any timeout scenario happening.
I have also tried changing (locally) the timeout implementation by passing the timeout directly to the org.junit.runners.BlockJUnit4ClassRunner#methodBlock in the edu.berkeley.cs.jqf.fuzz.junit.TrialRunner by calling the org.junit.runners.BlockJUnit4ClassRunner#withPotentialTimeout method. This would normally reproduce the same behaviour as adding a timeout annotation to a normal junit test method.
Even this solution didn't lead the tests to timeout (knowing that the tests are in the same thread...).
So I think it would be fair to close this ticket with the status "won't fix".
Thank you again for your time. :)

@rohanpadhye
Copy link
Owner

Thanks for the update!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants