x/net/html: fuzz this package #27848
Comments
This bug has been fixed.
The current implementation seems to be incomplete; it will be fixed by conforming to the latest spec.
What would you suggest as a fuzzing strategy? I could run domato against this lib and report crashes/hangs if this makes any sense.
The idea is to use go-fuzz.
I would happily use go-fuzz but I'm not sure fuzzing an html parser with just random data would cover all interesting paths. It's hard to produce stuff like Maybe we could use both?
It's not completely random. You can specify initial corpus data. go-fuzz will take it from there.
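A minimal sketch of preparing such an initial corpus, assuming the usual go-fuzz layout in which seed inputs live as individual files in a `corpus` directory under the working directory. The helper name `writeSeeds` and the seed snippets in the usage note are hypothetical, chosen only for illustration.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"os"
	"path/filepath"
)

// writeSeeds stores each seed document as its own file under dir.
// Files are named after the SHA-1 of their contents, so identical
// seeds collapse into one file. It returns the number of distinct
// seeds written.
func writeSeeds(dir string, seeds []string) (int, error) {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return 0, err
	}
	seen := make(map[string]bool)
	for _, s := range seeds {
		sum := sha1.Sum([]byte(s))
		name := hex.EncodeToString(sum[:])
		if seen[name] {
			continue // duplicate seed, already on disk
		}
		seen[name] = true
		path := filepath.Join(dir, name)
		if err := os.WriteFile(path, []byte(s), 0o644); err != nil {
			return len(seen) - 1, err
		}
	}
	return len(seen), nil
}
```

Calling `writeSeeds("workdir/corpus", []string{"<p>a</p>", "<table><tr>"})` would leave two small seed files for the fuzzer to mutate. In practice the existing HTML test cases of the package are probably a better starting point than hand-written snippets.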
It's not just random data, see:
Actually I think I already did this in 2015:
Update: I am running go-fuzz and haven't found anything so far (except for the already reported bugs), but I will leave it running for a while. I also ran domato against the patched html library and found 3 crashes with a sample size of 10K files. Is anyone interested in looking into the cause of the crashes? (The files are big and messy to inspect, so it will probably take me some time to go through them.)
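For triaging a batch of generated files like this, one possible approach is to run each input under a time limit and record whether the parser returned, panicked, or hung. The sketch below is an assumption about how that could be done, not what was actually used here: `classify` and the `outcome` names are hypothetical, and the parser is passed in as a plain function so the example stays independent of any particular library.

```go
package main

import "time"

// outcome describes what happened when the parser was fed one input.
type outcome int

const (
	outcomeOK       outcome = iota // parser returned (with or without an error)
	outcomePanicked                // parser panicked
	outcomeTimedOut                // parser did not return within the limit
)

// classify runs parse(data) in its own goroutine and reports the result.
// A goroutine cannot be killed from outside, so a parser that hangs keeps
// running (and leaks) after classify returns outcomeTimedOut.
func classify(parse func([]byte) error, data []byte, limit time.Duration) outcome {
	done := make(chan outcome, 1) // buffered so a late finisher never blocks
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- outcomePanicked
			}
		}()
		_ = parse(data) // parse errors are expected for malformed input
		done <- outcomeOK
	}()
	select {
	case res := <-done:
		return res
	case <-time.After(limit):
		return outcomeTimedOut
	}
}
```

Because hung goroutines leak, a long batch run is probably better served by one process per input or per small group of inputs. The function above is mainly useful for a quick first pass that sorts files into panics and hangs.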
There are at least a couple of approaches to addressing this.

One is to use a "fuzzing dictionary" and/or "seed corpus", described in https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md

Two is to accept arbitrary random bytes as input and map each byte to a string that is more likely to tickle interesting code paths in the HTML parser. For example: https://play.golang.org/p/3QE4960bHsa

Doing the reverse map from the existing HTML test cases to this "compressed" format is left as an exercise for the reader.

Once you have a dense mapping like this, where each raw input byte is relatively independent, it might be relatively straightforward to minimize the repro case, if go-fuzz doesn't already help you do so: cut out random sub-slices of the "compressed" input, backing off if it no longer crashes.
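The second approach and the minimization step could be sketched roughly as follows. This is not the code behind the playground link: the `tokens` table, `expand`, and `minimize` are hypothetical names, the token list is only a guess at what might be interesting to an HTML parser, and the minimizer removes chunks deterministically with shrinking sizes instead of cutting random sub-slices.

```go
package main

import "strings"

// tokens is a hypothetical table of HTML fragments. Each raw input
// byte selects one entry, so even random bytes expand to tag-like text.
var tokens = []string{
	"<", ">", "</", "/>", "=", "\"", " ",
	"a", "b", "p", "table", "tr", "td", "select", "option",
	"template", "math", "svg", "script", "plaintext", "<!--", "-->",
	"href", "x",
}

// expand maps each byte of the "compressed" input to one token and
// concatenates the results into the document handed to the parser.
func expand(data []byte) string {
	var sb strings.Builder
	for _, b := range data {
		sb.WriteString(tokens[int(b)%len(tokens)])
	}
	return sb.String()
}

// minimize tries to delete contiguous chunks of the compressed input,
// keeping a deletion only if crashes still reports true for the result.
// Chunk sizes start at half the input and halve down to a single byte.
func minimize(data []byte, crashes func([]byte) bool) []byte {
	cur := append([]byte(nil), data...)
	for size := len(cur) / 2; size >= 1; size /= 2 {
		for i := 0; i+size <= len(cur); {
			cand := append(append([]byte(nil), cur[:i]...), cur[i+size:]...)
			if crashes(cand) {
				cur = cand // keep the cut and retry at the same offset
			} else {
				i += size // back off: this chunk is needed for the crash
			}
		}
	}
	return cur
}
```

Here `crashes` would expand the candidate and feed it to the parser, reporting whether the panic or hang still reproduces. Since each byte maps to a whole token, removing bytes from the compressed form removes whole tokens from the document, which is what makes this kind of chunk deletion likely to preserve a crash.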
Nice idea, though that is probably going to take a while longer. I'll add info when I have news. Thanks for this.
Just to give a quick update: I gave this a shot a couple of months ago and didn't find any relevant crashes in a couple of weeks of fuzzing. My plan now is to wait and see how support for oss-fuzz and the discussions about first-class citizenship for fuzzing unfold. If fuzzing becomes part of the testing flow in Go, I'll provide the needed FuzzXyz functions and write the necessary configurations to have it run on some beefy hardware and cover it properly.
Given a couple of bugs reported by @tr3ee from malformed/incomplete tags
like:
whose reproducers are quite simple and have caused runtime panics or infinite hangs, perhaps fuzzing could help us discover such cases and whatever else lurks beyond them.
/cc @namusyaka @dgryski @dvyukov @bradfitz @nigeltao