Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differential fuzzing of pulldown-cmark and commonmark.js #663

Merged
merged 2 commits into from Jun 20, 2023

Conversation

mgeisler
Copy link
Collaborator

@mgeisler mgeisler commented Jun 8, 2023

This adds a new fuzzer which will run random Markdown texts through both pulldown-cmark and commonmark.js. The output from commonmark.js is turned into Events and the two are compared.

The bundled commonmark.js and commonmark.min.js files were generated by running

npm run build

at revision 20b52e5 of https://github.com/commonmark/commonmark.js/. The output differs slightly from what we see on https://spec.commonmark.org/dingus/ so I’m not sure what version is deployed there.

@mgeisler mgeisler marked this pull request as draft June 8, 2023 16:13
@mgeisler
Copy link
Collaborator Author

mgeisler commented Jun 8, 2023

Hey there! This is the fuzzer I've been using to find

I don't think it makes sense to merge this as-is since it keeps finding more corner cases. Instead of me opening 10+ issues, I hope the fuzzer can be a useful tool for the maintainers of this crate.

I run it with this command:

cargo fuzz run commonmark_js -- -only_ascii=1

@Martin1887
Copy link
Collaborator

The dingus deployed version is the latest release (0.30), and so it should be the version used here.

In addition, commonmark.js code should be inside a new directoy inside third_party directory including its license. And, if I'm not wrong, only the minified version is used, right? The another one should be deleted then.

Thanks for your contribution!

@mgeisler
Copy link
Collaborator Author

The dingus deployed version is the latest release (0.30), and so it should be the version used here.

Good point, let me change the JavaScript to a released version to have a stable reference.

I'll also move it to the third_party directory.

@Martin1887
Copy link
Collaborator

Any news here? I also think that creating issues reported by the fuzzer is not bad, though a new tag fuzzed_finding would help to identify them, since usually they are less-priority bugs.

@mgeisler
Copy link
Collaborator Author

Any news here? I also think that creating issues reported by the fuzzer is not bad, though a new tag fuzzed_finding would help to identify them, since usually they are less-priority bugs.

No news, I didn't do the cleanups yet. Let me take a look and do that now.

@mgeisler mgeisler marked this pull request as ready for review June 17, 2023 17:41
@mgeisler
Copy link
Collaborator Author

Hi @Martin1887, I moved things around and cleaned up the PR a little. I found that I could use https://lib.rs/rquickjs to run the JavaScript and only pay the setup cost once per fuzzer run. This brings a ~5x speedup on my machine compared to before (from about 30 exec/s to around 150 exec/s).

When I run the fuzzer, it normally finds a parsing difference after a few thousand attempts.

also think that creating issues reported by the fuzzer is not bad, though a new tag fuzzed_finding would help to identify them, since usually they are less-priority bugs.

I'm glad you like the issues, I'll keep reporting them then 😄 A tag sounds good so that one can look for common patterns.

@mgeisler mgeisler changed the title Fuzz pulldown-cmark against commonmark.js Differential fuzzing of pulldown-cmark and commonmark.js Jun 17, 2023
@Martin1887
Copy link
Collaborator

Great! I will review more carefully the changes and merge the pull request.

The first finding I think can be fixed is the big number of new dependencies: they should be put inside dev-dependencies to avoid compiling them in the final binary and also avoiding the increase of vulnerabilities surface.

Thanks!

This adds a new fuzzer which will run random Markdown texts through
both pulldown-cmark and the commonmark.js reference implementation.
The commonmark.js output is turned into Markdown events and the two
event streams are compared.

The bundled `commonmark.min.js` file is from

    https://unpkg.com/commonmark@0.30.0/dist/commonmark.min.js

This is the same file you get when you `npm install commonmark`.
@mgeisler
Copy link
Collaborator Author

Great! I will review more carefully the changes and merge the pull request.

Thanks for reviewing it!

The first finding I think can be fixed is the big number of new dependencies: they should be put inside dev-dependencies to avoid compiling them in the final binary and also avoiding the increase of vulnerabilities surface.

The dependencies are only for the fuzzers, so in a sense, they're already dev-dependencies 😄

The anyhow and pretty_assertions dependencies could be removed. I only added pretty_assertions since it shows a nice diff when the fuzzer finds a mismatch. It looks like this in my terminal:

image

@Martin1887
Copy link
Collaborator

My bad, in the GitHub's file diff I didn't notice that it was a different Cargo.toml file.

I have reviewed all changes and they seem right, I only have updated quick-xml to the latest version, released the past week.

However, in ARM the fuzzer fails with the following error in rquickjs-sys:
image

I will try to test it tomorrow in x86.

Thanks!

@mgeisler
Copy link
Collaborator Author

However, in ARM the fuzzer fails with the following error in rquickjs-sys: image

I will try to test it tomorrow in x86.

Oh, interesting! I've only tested it on my Linux machine. I don't know the rquickjs dependency well — it was suggested to me as an easy way to get JavaScript code running and indeed it works nicely (on x86 at least).

@Martin1887
Copy link
Collaborator

I have tried it in x86_64 and it works, so I have created the following issue and the pull request can be merged.

DelSkayn/rquickjs#169

Thanks.

@Martin1887 Martin1887 merged commit 34c2bb5 into pulldown-cmark:master Jun 20, 2023
1 check failed
@mgeisler mgeisler deleted the fuzzing branch June 21, 2023 06:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants