Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New fuzz #97

Merged
merged 17 commits into from Jan 28, 2020
Merged

New fuzz #97

merged 17 commits into from Jan 28, 2020

Conversation

Licenser
Copy link
Member

Update to new fuzzer code for #95

@codecov
Copy link

codecov bot commented Jan 26, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@ad0c959). Click here to learn what that means.
The diff coverage is 81.81%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master      #97   +/-   ##
=========================================
  Coverage          ?   85.68%           
=========================================
  Files             ?       39           
  Lines             ?     5245           
  Branches          ?        0           
=========================================
  Hits              ?     4494           
  Misses            ?      751           
  Partials          ?        0
Flag Coverage Δ
#avx2 80.03% <75%> (?)
#avx2KnownKey 78.81% <75%> (?)
#sse 80.72% <75%> (?)
#sseKnownKey 79.44% <75%> (?)
Impacted Files Coverage Δ
tests/jsonchecker.rs 100% <ø> (ø)
src/sse42/deser.rs 92.64% <100%> (ø)
src/numberparse.rs 87.86% <100%> (ø)
src/avx2/deser.rs 90% <60%> (ø)
src/lib.rs 95.35% <77.77%> (ø)
src/stage2.rs 59.87% <88.23%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ad0c959...cb6ee05. Read the comment docs.

@Licenser
Copy link
Member Author

Initial perf check on changes to fix issues found with the new fuzzer:

note so great

master

======= simd_json ======== parse|stringify ===== parse|stringify ====
data/canada.json         530 MB/s   420 MB/s   710 MB/s   330 MB/s
data/citm_catalog.json  1120 MB/s   480 MB/s  1540 MB/s   600 MB/s
data/twitter.json        920 MB/s   600 MB/s   990 MB/s   650 MB/s
data/log.json           1080 MB/s  1080 MB/s  1080 MB/s  1080 MB/s

===== simd_json_tape ===== parse|stringify ===== parse|stringify ====
data/canada.json         770 MB/s                    
data/citm_catalog.json  1900 MB/s                    
data/twitter.json       1670 MB/s                    
data/log.json           2170 MB/s                    

this

======= simd_json ======== parse|stringify ===== parse|stringify ====
data/canada.json         500 MB/s   430 MB/s   670 MB/s   330 MB/s
data/citm_catalog.json  1050 MB/s   490 MB/s  1460 MB/s   620 MB/s
data/twitter.json        910 MB/s   620 MB/s   990 MB/s   650 MB/s
data/log.json           1080 MB/s  1080 MB/s  1080 MB/s  1080 MB/s

===== simd_json_tape ===== parse|stringify ===== parse|stringify ====
data/canada.json         740 MB/s                    
data/citm_catalog.json  1790 MB/s                    
data/twitter.json       1650 MB/s                    
data/log.json           2170 MB/s                    

@Licenser
Copy link
Member Author

Licenser commented Jan 28, 2020

It looks like the cache takes a serious hit here resulting in the observed slowdown :(

                                    Instructions             Cache.              Cycle/byte      
        Name           Cycles    Normal.     Branch     Misses   References    Best       Avg    
apache_builds            537705    1356915     172392        797       4305      4.190      4.225
apache_builds(+/-)      -9.975%    -9.243%    -5.063%    70.891%    70.407%    -3.516%    -9.975%
canada                 18268717   53939041    7628342      18128      50422      8.073      8.116
canada(+/-)              0.983%    -1.380%    -1.078%    44.583%    58.185%     1.432%     0.983%
citm_catalog            7099823   18328666    2447906       8413      33624      4.073      4.111
citm_catalog(+/-)       -2.014%    -7.403%    -7.837%    77.368%    21.782%    -1.265%    -2.014%
github_events            236860     543599      62042        400       1788      3.587      3.637
github_events(+/-)      -5.775%    -7.210%    -5.820%    67.000%    43.792%    -1.490%    -5.775%
gsoc-2018               8031694   15905348    1443474      32090      87390      2.382      2.413
gsoc-2018(+/-)           6.438%    -5.487%   -16.243%    87.865%    83.404%    10.142%     6.438%
instruments             1352269    2982151     394612       1371      10838      6.080      6.137
instruments(+/-)         3.222%    -2.253%     0.342%    78.848%    62.991%     4.650%     3.222%
log                       12919      24338       2893         11         71      5.608      5.940
log(+/-)                 0.612%    -6.028%    -3.249%    81.818%    47.887%     0.590%     0.612%
marine_ik              29166886   71431965   10320335      51062     134342      9.718      9.776
marine_ik(+/-)           3.925%    -1.378%    -1.309%    62.191%    62.030%     3.769%     3.925%
mesh                    6443221   16584212    2362871       4278      22146      8.863      8.904
mesh(+/-)                1.170%     0.662%    -1.180%    76.858%    75.156%     1.773%     1.170%
mesh.pretty             8430988   22055619    2935841       7571      31026      5.308      5.345
mesh.pretty(+/-)         5.592%     1.710%    -0.840%    74.772%    76.500%     5.520%     5.592%
numbers                  965364    2496051     311521        912       5568      6.398      6.430
numbers(+/-)            -1.476%     0.931%    -2.803%    73.794%    75.952%     2.181%    -1.476%
random                  3402648    8355062    1052801       3066      19965      6.629      6.666
random(+/-)             -6.729%   -10.234%    -4.260%    86.367%    37.145%    -5.993%    -6.729%
twitter                 3212697    6728927     767723       3790      29549      5.047      5.087
twitter(+/-)             3.554%    -4.728%    -2.327%    84.697%    67.830%     4.624%     3.554%
twitterescaped          3910963    8354387    1015571       3307      27423      6.915      6.954
twitterescaped(+/-)     -0.652%    -8.045%    -9.677%    85.455%    64.668%     0.002%    -0.652%
update-center           2762077    6104194     773926       3221      22675      5.144      5.180
update-center(+/-)      -1.921%   -10.788%    -5.789%    85.408%    68.419%    -1.000%    -1.921%

@Licenser Licenser marked this pull request as ready for review January 28, 2020 14:27
@Licenser
Copy link
Member Author

@sunnygleason this adds the new fuzzer which checks for more things. The new memory access checks found some issues with it having the possibility to read a few bytes past the original input data. The solution to that was to move the entire input data into.

The fix degraded performance generally I am very hesitant to merge anything that degrades performance OTOH safety outweighs performance. Lastly, we might be able to gain some of that back with additional tweaks, but I somehow rather have a correct slower version released then a possibly buggy faster one and then we try to make good on the perf in a different PR.

What do you think?

@sunnygleason
Copy link
Member

@Licenser this is excellent work, thank you for making it happen!

I agree that if we have a problematic release, we should issue a patch release and follow up on performance soon after.

@sunnygleason
Copy link
Member

(feel free to merge this so you can get a build out and I'll follow up w/ the review asap)

@Licenser
Copy link
Member Author

Ja, I'm not sure how problematic the issues fuzz found are, they've been around since pretty much the beginning and no one hit them yet - but better safe then sorry I think.

@Licenser
Copy link
Member Author

While I wait for the CI I'll take the chance to elaborate.

The basic issue is that we read slightly behind the input (about as wide as a SIMD register). We protect against this crashing already since if that would fall beyond the page boundary we relocate the data.

The issue still feels somewhat valid non the less, as it might allow reading memory that doesn't belong to the parser. I can't come up with a real example that would trigger that but I'm not an expert in stealing data :P the new approach always relocates the data and pads it with enough memory top ensure this doesn't cause a problem - that makes it slower :(

@Licenser Licenser merged commit 178fef3 into master Jan 28, 2020
@Licenser Licenser added this to In progress in 0.3 via automation Jan 28, 2020
@Licenser Licenser moved this from In progress to Done in 0.3 Mar 4, 2020
@Licenser Licenser deleted the new-fuzz branch June 17, 2020 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
0.3
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

2 participants