Add benchmarking using arbitrary fuzzing #465

juntyr · 2023-07-16T11:13:38Z

This is the start of my very roundabout way to get back to #444, where we really need a benchmark that captures something other than JSON-like-RON to ron::Value. I hope to upgrade our arbitrary fuzzer to use proper typing to generate an arbitrary data structure and its corresponding Serialize and Deserialize implementation. For a new PR, we would then first run the fuzzer, then extract the corpus for the arbitrary target, and then benchmark serialising and deserialising based on these examples. Ideally, the current main branch would also be pulled in again and run on these benchmarks as well to provide an automatic comparison.
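As a rough sketch of the benchmarking step described above, assuming the fuzzer corpus has already been extracted into memory as raw byte strings, the harness below deduplicates the cases and times a caller-supplied closure over them. The names `unique_cases` and `time_over_corpus` are hypothetical and not part of ron or of this PR. In the real suite the closure would be the serialise or deserialise call under test.

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// Drops duplicate corpus entries while keeping the first-seen order.
/// (Hypothetical helper for illustration, not part of ron.)
fn unique_cases(cases: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    let mut seen = HashSet::new();
    cases
        .into_iter()
        // `insert` returns false if the entry was already present
        .filter(|case| seen.insert(case.clone()))
        .collect()
}

/// Runs `work` once per corpus case and returns the total elapsed time.
/// (Hypothetical helper for illustration, not part of ron.)
fn time_over_corpus<F>(cases: &[Vec<u8>], mut work: F) -> Duration
where
    F: FnMut(&[u8]),
{
    let start = Instant::now();
    for case in cases {
        work(case.as_slice());
    }
    start.elapsed()
}
```

Running the same harness once on the PR branch and once on the main branch would give the two timings needed for the comparison mentioned above.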

This will probably take me several weekends to fully implement, but I hope it will finally give us the needed insights to land #444 with the best perf-maintainability tradeoff.

I've included my change in CHANGELOG.md

Add tests to document the following bugs found by fuzzing and now fixed:

struct, enum, and variant names are always validated
unit structs / variants called r can be parsed by ron::Value (which previously thought this was the start of a raw string)
strings containing '\\' are serialised as raw strings when escaping is turned off
a stack of nested Options that is serialised with #![enable(implicit_some)] and contains a None cannot be uniquely deserialised, since there is no way to tell where the None came from. This case has to be tracked, so that Somes can be inserted in case a None is detected inside an unbroken stack of implicit Somes.
deserialising "A('/')" into ron::Value fails as the struct type searcher reads into the char and then finds a weird comment starter there
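The implicit-Some ambiguity from the list above can be illustrated with a toy model that uses only the standard library. This is a sketch of the idea, not ron's serialiser: the function names are hypothetical, and the exact text the fixed serialiser emits is an assumption.

```rust
/// Toy model of `#![enable(implicit_some)]`, where `Some(x)` is written as just `x`.
/// Without tracking, the two distinct values `None` and `Some(None)` collide.
fn write_untracked(value: Option<Option<u8>>) -> String {
    match value {
        None | Some(None) => "None".to_string(),
        Some(Some(n)) => n.to_string(),
    }
}

/// Same model, but a `None` found below an implicit `Some` makes that `Some`
/// explicit again, so the output can be deserialised uniquely.
/// (Assumed output format, for illustration only.)
fn write_tracked(value: Option<Option<u8>>) -> String {
    match value {
        None => "None".to_string(),
        Some(None) => "Some(None)".to_string(),
        Some(Some(n)) => n.to_string(),
    }
}
```

With the untracked writer both `None` and `Some(None)` become the text `None`, so a round trip cannot recover the original value. The tracked writer keeps the short form for `Some(Some(4))` and only falls back to an explicit `Some` where the ambiguity would otherwise arise.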

Problematic bugs which need to be documented, tested, and discussed further:

deserialising Some(...) inside deserialize_any with #![enable(unwrap_variant_newtypes)] cannot work as currently implemented, so it is now properly detected and reported with a new (very specific) error code. Unwrapping variant newtypes currently reaches through Options, and #413 ("[v0.9] Breaking: Treat Some like any newtype variant") makes this more explicit by treating Some like a newtype variant. However, deserialize_any cannot support the newtype variant Some in all cases, since it special-cases Some(...) to look at the ... inside. E.g. Some(a: 4) works great in typed mode and looks very nice, but cannot be supported here. Either we decide to make Some explicitly not a newtype variant (which is a breaking change, since it kind of slipped through as one before, and which loses us the nice syntax), or we keep this very obscure error, which should not be encountered often. The former would definitely be safer. Another alternative is to use #451 ("Add minimal support for internally tagged and untagged enums") to pre-parse the struct type in deserialize_any when unwrap_variant_newtypes is enabled and to handle tuples, structs, and unit structs with special cases.

Future work

can we also fuzz over the serde flatten and enum attributes?
the struct type guesser introduced in #451 ("Add minimal support for internally tagged and untagged enums") is quite expensive and can lead to quadratic parsing times. Perhaps we could cache some information; an AST would also work better here.
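To see why a per-level lookahead can become quadratic, here is a toy model using only the standard library. It is not the struct type guesser from #451: it simply assumes a parser that, at every opening parenthesis, scans ahead to the matching closing parenthesis before descending, and counts the bytes inspected. Both function names are hypothetical.

```rust
/// Counts how many bytes are inspected when, at every `(`, the parser first
/// scans ahead to the matching `)` before descending into the contents.
/// (Toy model for illustration, not ron's actual implementation.)
fn bytes_scanned_with_lookahead(input: &[u8]) -> usize {
    let mut scanned = 0;
    for (start, &byte) in input.iter().enumerate() {
        if byte != b'(' {
            continue;
        }
        // Scan forward from this opening parenthesis to its matching close.
        let mut depth = 0usize;
        for &ahead in &input[start..] {
            scanned += 1;
            match ahead {
                b'(' => depth += 1,
                b')' => {
                    depth -= 1;
                    if depth == 0 {
                        break;
                    }
                }
                _ => {}
            }
        }
    }
    scanned
}

/// Builds fully nested input such as `(((())))` for the given depth.
fn nested(depth: usize) -> Vec<u8> {
    let mut out = vec![b'('; depth];
    out.extend(std::iter::repeat(b')').take(depth));
    out
}
```

For nesting depth d the input is 2d bytes long, but this model inspects d² + d bytes, so doubling the depth roughly quadruples the work. Caching the result of each lookahead, or parsing into an AST once, would avoid the repeated scans.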

codecov-commenter · 2023-07-16T11:18:25Z

Codecov Report

Patch coverage: 86.61% and project coverage change: +0.08% 🎉

Comparison is base (52f282d) 85.19% compared to head (93d06a7) 85.28%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #465      +/-   ##
==========================================
+ Coverage   85.19%   85.28%   +0.08%     
==========================================
  Files          66       72       +6     
  Lines        8513     8850     +337     
==========================================
+ Hits         7253     7548     +295     
- Misses       1260     1302      +42

Files Changed	Coverage Δ
tests/307_stack_overflow.rs	`97.91% <ø> (ø)`
src/value/mod.rs	`47.25% <28.57%> (+0.82%)`	⬆️
src/de/mod.rs	`76.03% <62.22%> (+0.31%)`	⬆️
src/parse.rs	`79.18% <71.79%> (-2.92%)`	⬇️
src/ser/mod.rs	`71.57% <77.02%> (+0.67%)`	⬆️
tests/250_variant_newtypes.rs	`98.66% <89.28%> (-1.06%)`	⬇️
tests/447_compact_maps_structs.rs	`100.00% <100.00%> (ø)`
tests/465_implicit_some_stack.rs	`100.00% <100.00%> (ø)`
tests/465_no_comment_char_value.rs	`100.00% <100.00%> (ø)`
tests/465_r_name_value.rs	`100.00% <100.00%> (ø)`
... and 5 more


juntyr · 2023-08-17T19:17:23Z

?r @torkleyy @manunio This PR became quite big and contains several parts:

upgrade the arbitrary fuzzer to fuzz any serde data types and values (excluding anything requiring attributes)
fix any bugs discovered by the fuzzer so far
small code style improvements along the way
a benchmark suite which is executed on new PRs and runs across the fuzzer corpus

If you have some time, I'd appreciate any feedback you can give on this PR - thanks in advance!

juntyr · 2023-08-17T19:17:59Z

P.S. the benchmarking CI test is expected to still fail since it cannot yet compare against the benchmark on the main branch, which is only added in this PR

torkleyy

Looks good!

* First steps towards a lossless Value::Number
* Allow parsing +unsigned as unsigned int
* Add additional tests for number parsing
* Added CHANGELOG entry
* Improve coverage by running tests across all features
* Refactor number parsing for better readability
* Extend number tests to typed ser+de
* Adjust #465 tests to lossless Value::Number

juntyr mentioned this pull request Jul 16, 2023

[v0.9] Breaking: Treat Some like any newtype variant #413

Closed

1 task

juntyr force-pushed the fuzzy-benchmark branch from 80f0156 to 7a541cf Compare July 17, 2023 12:30

juntyr force-pushed the fuzzy-benchmark branch 3 times, most recently from 09f17fe to 4af5152 Compare August 17, 2023 18:21

juntyr self-assigned this Aug 17, 2023

juntyr marked this pull request as ready for review August 17, 2023 19:13

juntyr requested a review from torkleyy August 17, 2023 19:14

juntyr added 19 commits August 19, 2023 15:31

Early prototyping with a typed arbitrary fuzzer (ser only so far) and…

fe9b493

… reading in the corpus

Also fuzz the ser::PrettyConfig (indentation-excluded)

a0d1672

Start implementing the arbitrary typed data deserialising fuzzing

634f6f8

Fix None inside stack of implicit Some-s

53e032d

Detect problematic Some inside deserialize_any with unwrap_variant_ne…

5715e8f

…wtypes

Alternative to ron-rs#413: Some is explicitly not a newtype variant

513eec4

Fix clippy::useless_conversion lint

bca27b9

Another alternative: allow newtype variant unwrapping in deserialize_…

50ecb5f

…any, also for Some + Fix check_struct_type lookahead

Fix PartialOrd impls for Map and Float

0933897

Implement arbitrary tuple struct (static field names slice FIXME) fuz…

4438244

…zing deserialisation

Fully fix Float comparison with total_ord

dc9f90a

Fix clippy lints

6352107

Finished arbitrary struct and enum deserialisation fuzzing

6208450

Create CI workflow for benchmarking

def15a5

Fix corpus download

2bb6bdf

Fix corpus unzip

9e2448e

Fix corpus unzip to existing extraction directory

028cc7d

Give benchmark the comparison branch name

b1db368

Restrict the benchmark to unique cases (ty, value, ron)

6ff0664

juntyr added 5 commits August 19, 2023 15:31

Add test for the Serialize identifier validation

58c624e

Add tests for further fuzzer-found bugs

5598752

Add the extensive CHANGELOG entry

dbf5e6c

Add the test and changelog entry from the subsumed ron-rs#413

25a52c4

Add an early return + more tests for the expensive newtype or tuple c…

93d06a7

…heck for unwrapping newtype variants

juntyr force-pushed the fuzzy-benchmark branch from c23cb59 to 93d06a7 Compare August 19, 2023 15:32

torkleyy approved these changes Aug 20, 2023

juntyr merged commit dea68fe into ron-rs:master Aug 20, 2023
7 of 8 checks passed

juntyr added a commit to juntyr/ron that referenced this pull request Aug 20, 2023

Adjust ron-rs#465 tests to lossless Value::Number

c3af695

juntyr deleted the fuzzy-benchmark branch August 20, 2023 22:23
