
cmd/compile: coverage instrumentation for fuzzing #14565

Open · dvyukov opened this issue Feb 29, 2016 · 40 comments


@dvyukov dvyukov commented Feb 29, 2016

Go-fuzz (https://github.com/dvyukov/go-fuzz) is quite successful at finding bugs in Go code and is reasonably widely used in the Go community. However, there are several problems with the current go-fuzz implementation that hinder wider adoption (in particular, internal adoption at Google):

  1. go-fuzz mimics go tool build logic, which leads to constant breakages.
  2. go-fuzz-build does not handle cgo, and it is hard to implement.
  3. coverage instrumentation is source-to-source, which makes it very difficult to integrate with other build systems.
  4. source-to-source transformation can't handle all cases and has limited transformation capabilities (e.g. instrumenting && is tough). Some code patterns can be mishandled or lead to build failures.
  5. source-to-source transformation produces slow code (lots of closures).

Ideally we would have coverage instrumentation in the compiler, with corresponding support in the go tool: something similar to the -race flag, which triggers compiler instrumentation and adds the race build tag.

@dvyukov dvyukov added this to the Unplanned milestone Feb 29, 2016

@mdempsky mdempsky commented Oct 11, 2019

@dvyukov Do you have any references for how other toolchains implement this?

E.g., I assume for cgo support, we'd want to do something that interoperates well with whatever the C compiler does.


@kcc kcc commented Oct 11, 2019

LLVM does this:
https://clang.llvm.org/docs/SanitizerCoverage.html#inline-8bit-counters
It would be great if Go's instrumentation were compatible with LLVM's so that they can interoperate.

In addition, we also do
https://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow
which provides additional signal for fuzzing. It can be implemented in a second iteration.


@mdempsky mdempsky commented Oct 11, 2019

@kcc Thanks. A bunch of clarifying questions:

What's the expected behavior when an 8-bit counter overflows? Should it saturate or wrap around?

Do the increments need to be atomic? (I seem to recall suggesting LLVM should make them atomic, but then that turning out to be a performance hit.)

Do I understand that the basic idea is the compiler should transform every

func f() {
    if foo { ... } else { ... }
}

into

var f.counters [2]uint8

func f() {
    if foo { f.counters[0]++; ... } else { f.counters[1]++; ... }
}

? (I.e., for every branch point, we just need to increment a counter to indicate whether we took the true or false branch.) Or are there other cases we need to track, or maybe cases where we don't need to track?

Do we need to do anything for goto/labels? (Intuitively I'd think no.)

What about indirect function calls? I suppose we can hash (CALL instruction's PC, target instruction's PC) and increment some large indirect calls table.

What about panics? If there's f(); f(), I'd guess you want to distinguish whether the first or second f() panicked?

Does LLVM do any special linker stuff to ensure all the counters end up consecutively? E.g., put them in some ELF section?

Do users care about knowing which counter maps to which source location? (Intuitively, I'm assuming you really only care that the coverage data is different than you've seen before.)

--

In fact, if that's true (i.e., you only care to identify whether we've triggered new code paths), would it suffice to just keep a running hash for each G of any indirect call or conditional branch it takes? And then deterministically hash these together at the end?

@kcc kcc commented Oct 11, 2019

Should it saturate or wrap around?

Either is fine, but wrapping around is easier to implement.
With wraparound we can lose a bit of information when a counter's value is a multiple of 256 (it reads as zero), but on average that's not a problem.

Do the increments need to be atomic?

No. They will be racy, and for concurrent code the counts will be imprecise. That's tolerable; most of the code we fuzz is single-threaded anyway.

if foo { f.counters[0]++; ... } else { f.counters[1]++; ... }

roughly -- yes.

E.g., put them in some ELF section?

Yes, we put the counters into a special ELF section so that for a given DSO all the counters are consecutive.

Do users care about knowing which counter maps to which source location?

libFuzzer does care about it for various reasons.
Mostly for visualization, but also for figuring out where a function begins, etc.
So yes, we need to map the counters to PCs. Here is how we do it:
https://clang.llvm.org/docs/SanitizerCoverage.html#pc-table

What about indirect function calls?

With an extra flag, indirect-calls, we get a call to __sanitizer_cov_trace_pc_indirect(void *callee)
inserted for every indirect call, and it's the user's responsibility to record the indirect call pair.
It's not exceptionally important; it can go in a second iteration.

Also note the trick we are using to instrument edges and not just basic blocks:
https://clang.llvm.org/docs/SanitizerCoverage.html#edge-coverage


@mdempsky mdempsky commented Oct 11, 2019

Thanks, that was helpful.

Also note the trick we are using to instrument edges and not just basic blocks: https://clang.llvm.org/docs/SanitizerCoverage.html#edge-coverage

I saw that and read it a couple times, but I'm still a little unclear on what it's suggesting.

Do I understand correctly that it's just emphasizing that

if foo { ... }

has to be rewritten into

if foo { counter42++; ... } else { counter43++ }

and not simply into

if foo { counter42++; ... }

?

@kcc kcc commented Oct 12, 2019

if foo { counter42++; ... } else { counter43++ }

Yes. It makes the signal for the fuzzing engine stronger.
Again, not absolutely necessary in the first iteration, but pretty useful.


@mdempsky mdempsky commented Oct 12, 2019

So yes, we need to map the counters to PCs. Here is how we do it:
https://clang.llvm.org/docs/SanitizerCoverage.html#pc-table

What PC should be associated with each counter? The PC for the counter increment instruction?

@kcc kcc commented Oct 12, 2019

In our implementation it is the PC of the first instruction of the basic block (the function address for the entry block). It doesn't matter much, though, as long as you can correctly symbolize the PC.


@mdempsky mdempsky commented Oct 12, 2019

Okay. That sounds like enough info to get started with. Thanks!


@mdempsky mdempsky commented Oct 18, 2019

@kcc One more question: is there any preference for edges that are reported by instrumentation to more closely match the source language or the target language?

For example, we compile:

switch x {
case 0, 7, 8, 9, 10:
    one()
case 2, 11, 20:
    two()
}

as something like:

    if x <= 2 {
        if x == 0 { goto Lone }
        if x == 2 { goto Ltwo }
    } else {
        if uint(x - 7) <= 3 { goto Lone }
        if x == 11 { goto Ltwo }
        if x == 20 { goto Ltwo }
    }
    goto Ldone
Lone:
    one()
    goto Ldone
Ltwo:
    two()
    goto Ldone
Ldone:

go-fuzz currently inserts only three counters (one for each explicit case, and one more for the implicit empty default case). Is that preferred, or should we report edges based on the compiled form (so ~10 counters for the above example), or does it not matter?

@kcc kcc commented Oct 18, 2019

Reporting more counters may increase the overhead a bit, but it may also provide a bit more signal.
So if the compiled form has 10 edges, it's OK to have 10 counters.

@gopherbot gopherbot commented Oct 19, 2019

Change https://golang.org/cl/202117 mentions this issue: cmd/compile, cmd/link: add coverage instrumentation for fuzzing


@mdempsky mdempsky commented Oct 19, 2019

Okay, so here's my progress for the day:

CL 202117 is a very rough initial pass at implementing coverage instrumentation in the compiler and linker. The linker work probably needs some tweaking, but then it's basically done. The compiler could probably be a lot smarter though.

It currently generates counters using __libfuzzer_extra_counters (like go-fuzz does), and it just assigns random names/offsets to each one. (This can be improved.)

Demo:

$ cat go.mod
module github.com/mdempsky/pngfuzz

go 1.14

require github.com/dvyukov/go-fuzz-corpus v0.0.0-20190920191254-c42c1b2914c7
$ cat main.go
package main

// #include <stdint.h>
import "C"

import (
	"unsafe"

	"github.com/dvyukov/go-fuzz-corpus/png"
)

//export LLVMFuzzerTestOneInput
func LLVMFuzzerTestOneInput(data *C.uint8_t, size C.size_t) C.int {
	s := make([]byte, size)
	for i := C.size_t(0); i < size; i++ {
		s[i] = byte(*(*C.uint8_t)(unsafe.Pointer(uintptr(unsafe.Pointer(data)) + uintptr(i))))
	}

	png.Fuzz(s)
	return 0
}

func main() {
}
$ go build -gcflags=all=-d=fuzzing -buildmode=c-archive -o pngfuzz.a .
$ clang -o png.fuzzer pngfuzz.a -fsanitize=fuzzer
$ ./png.fuzzer |& head -4
INFO: Seed: 3185389706
INFO: 6012 Extra Counters
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus

Also, note that main.go is easily generated (go-fuzz-build operates by generating a very similar file). I don't expect most users to have to write it themselves; I expect them to only need to write a fuzz target like png.go (which is the target used above).


@mdempsky mdempsky commented Oct 19, 2019

(Also note that the demo above is using Go modules, which isn't currently supported by go-fuzz-build.)

@thepudds thepudds commented Oct 21, 2019

@mdempsky First, this is very exciting.

Second, a quick FYI: for the "Make fuzzing a first class citizen" proposal (#19109), Dmitry wrote a few paragraphs with some initial thoughts on possible approaches to compiler integration, in the "Coverage instrumentation" section of the proposal Google doc here:

https://docs.google.com/document/u/1/d/1zXR-TFL3BfnceEAWytV8bnzB2Tfp6EPFinWVJ5V4QC8/pub

I'm sure Dmitry will comment here before long, so you should probably mostly ignore my comment; I just wanted to share that quick FYI in case it's useful while you are actively looking at this (sorry in advance if you had already seen that write-up, or if it's not otherwise useful).

CC @josharian


@mdempsky mdempsky commented Oct 21, 2019

@thepudds Thanks!

As for @dvyukov 's proposal, I think it should be relatively easy to vary what instrumentation code we emit as needs evolve. For now, my goal is libFuzzer compatibility, since that seems to be what existing tools expect (e.g., oss-fuzz).


@dvyukov dvyukov commented Oct 22, 2019

The key part of that paragraph is:
"The state of the art with respect to what's the best mode constantly evolves. The proposed instrumentation with function calls should support most of these modes seamlessly"
:)
So if we could keep the exact implementation as an implementation detail (not part of a stable ABI), that would be ideal, at least until and unless we have reasons to expose it.


@mdempsky mdempsky commented Oct 22, 2019

So if we could keep the exact implementation as implementation detail (not part of stable ABI), that would be ideal.

I think that's the case with my current CL. For the most part, Go doesn't guarantee any ABI stability.

@gopherbot gopherbot commented Oct 28, 2019

Change https://golang.org/cl/203887 mentions this issue: cmd/compile, runtime: add comparison tracing for libFuzzer


@mdempsky mdempsky commented Oct 28, 2019

CL 203887 implements basic comparison tracing by calling __sanitizer_cov_trace_cmp{1,2,4,8} before each integer comparison. This allows much more efficient fuzzing of fuzz targets like:

//export LLVMFuzzerTestOneInput
func LLVMFuzzerTestOneInput(data *C.uint8_t, size C.size_t) C.int {
	if size == 4 {
		data := *(*[4]byte)(unsafe.Pointer(data))
		if data == [4]byte{'F', 'U', 'Z', 'Z'} {
			panic("got em")
		}
	}
	return 0
}

Whereas before libFuzzer would have to churn randomly and hope to stumble upon the magic 32-bit constant, now it can find it almost instantaneously.

@timtadh timtadh commented Nov 1, 2019

Quick self-interested question on this. I implemented a precise profiler for collecting dynamic control-flow graphs of basic blocks a few years ago. It works reasonably well, and I have a few research publications from the work (and a few more in progress). However, for all the reasons that make it difficult for go-fuzz to stay up to date, it is tough for Dynagrok to stay up to date as well.

So, here are my questions:

  1. Is it possible to collect Dynamic Control Flow Graphs with this instrumentation?
  2. Is it possible for a user to extend the info collected by this instrumentation?
  3. Is it possible to collect traces?
  4. Is it possible to collect or sample variable values?

My overall goal is to get the Software Engineering community more interested in doing research on Go. They are pretty focused on Java and, to some extent, C++ and JavaScript. The community has invented a few interesting techniques over the years, and Go would actually be a great language for them to experiment on.


@mdempsky mdempsky commented Nov 1, 2019

Chatted with @timtadh about this directly a bit.

This certainly isn't going to happen for Go 1.14; but for Go 1.15, I'll look into supporting something like Clang's -fsanitize-coverage=trace-pc-guard feature (to support fuzzing engines like AFL and Honggfuzz), which @timtadh thinks would be usable for Dynagrok too.

gopherbot pushed a commit that referenced this issue Nov 5, 2019
This CL adds experimental coverage instrumentation similar to what
github.com/dvyukov/go-fuzz produces in its -libfuzzer mode. The
coverage can be enabled by compiling with -d=libfuzzer. It's intended
to be used in conjunction with -buildmode=c-archive to produce an ELF
archive (.a) file that can be linked with libFuzzer. See #14565 for
example usage.

The coverage generates a unique 8-bit counter for each basic block in
the original source code, and emits an increment operation. These
counters are then collected into the __libfuzzer_extra_counters ELF
section for use by libFuzzer.

Updates #14565.

Change-Id: I239758cc0ceb9ca1220f2d9d3d23b9e761db9bf1
Reviewed-on: https://go-review.googlesource.com/c/go/+/202117
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
gopherbot pushed a commit that referenced this issue Nov 5, 2019
This CL extends cmd/compile's experimental libFuzzer support with
calls to __sanitizer_cov_trace_{,const_}cmp{1,2,4,8}. This allows much
more efficient fuzzing of comparisons.

Only supports amd64 and arm64 for now.

Updates #14565.

Change-Id: Ibf82a8d9658f2bc50d955bdb1ae26723a3f0584d
Reviewed-on: https://go-review.googlesource.com/c/go/+/203887
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

@mdempsky mdempsky commented Nov 5, 2019

I've submitted my CLs to add basic libfuzzer instrumentation to cmd/compile.

I've also written a mostly-drop-in compatible tool at github.com/mdempsky/go114-fuzz-build that utilizes it. Make sure to have a recently updated and built Go development snapshot on your $PATH, and then you can try it out:

# Install
$ go get -u github.com/mdempsky/go114-fuzz-build

# Build a fuzzer
$ cd $GOPATH/src/github.com/dvyukov/go-fuzz-corpus/png
$ go114-fuzz-build -o png.a .
$ clang -o png png.a -fsanitize=fuzzer
$ ./png corpus

My expectation is this instrumentation should be comparable to go-fuzz's "native" mode, and it should be better than go-fuzz's "libfuzzer" mode.

However, I don't have that much experience with either dvyukov/go-fuzz or libFuzzer. It would be great if experienced go-fuzz users could give it a spin and let me know their experiences. E.g., does it fuzz faster or slower? Does it find any new failures that dvyukov/go-fuzz did not, or miss cases that dvyukov/go-fuzz can find?

@picatz picatz commented Mar 6, 2020

👋 Hello @mdempsky!

I wanted to share some of my initial results comparing the different fuzzing options on version 1.14 for the image/png package:

  • old-go-fuzz
  • old-libfuzzer
  • new-libfuzzer

Both of the libfuzzer options used similar amounts of resources over extended periods of time, while using half as much CPU as the pure go-fuzz version. The newer libfuzzer option also produced significantly more log output and had a higher exec/s. Interestingly, the native go-fuzz case seemed to have a memory leak that was immediately apparent.

old-go-fuzz

~7000 exec/s ... slowed down to ~700 execs/s

old-libfuzzer

~15500 execs/s

new-libfuzzer

~18500 execs/s

Dockerfiles

To help reproduce some of my findings, these are the containers I deployed to Nomad: png-cases.zip


@mdempsky mdempsky commented Mar 6, 2020

@picatz Thanks! I really appreciate your independent testing.

It sounds like about 20% faster execution with native libfuzzer than previous libfuzzer support (which was already 2x faster than go-fuzz's own mutation engine)?

By increased log output, I assume you mean reports about finding new, interesting inputs? Any concrete numbers to report there? E.g., how many interesting inputs discovered after T time duration. (new-libfuzzer supports comparison tracing, whereas old-libfuzzer does not. I'd expect this would allow new-libfuzzer to discover interesting test cases with fewer executions.)

@picatz picatz commented Mar 6, 2020

Here are the actual logs from that png case for more accurate numbers/analysis: logs.zip

$ cat new-libfuzzer-logs.txt | head -n 30
INFO: Seed: 1855829159
INFO: 6009 Extra Counters
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2      INITED ft: 22 corp: 1/1b lim: 4 exec/s: 0 rss: 26Mb
#3507   NEW    ft: 23 corp: 2/3b lim: 6 exec/s: 0 rss: 30Mb L: 2/2 MS: 5 CopyPart-ChangeBit-ShuffleBytes-ChangeByte-CopyPart-
#5519   NEW    ft: 29 corp: 3/11b lim: 8 exec/s: 0 rss: 30Mb L: 8/8 MS: 2 ChangeByte-InsertRepeatedBytes-
#7628   NEW    ft: 39 corp: 4/19b lim: 8 exec/s: 0 rss: 30Mb L: 8/8 MS: 4 ShuffleBytes-ShuffleBytes-CMP-CMP- DE: "\xff\xff\xff\xff"-"\x89PNG\x0d\x0a\x1a\x0a"-
#10742  NEW    ft: 44 corp: 5/30b lim: 11 exec/s: 0 rss: 30Mb L: 11/11 MS: 4 InsertByte-InsertByte-InsertByte-PersAutoDict- DE: "\x89PNG\x0d\x0a\x1a\x0a"-
#11803  REDUCE ft: 44 corp: 5/28b lim: 11 exec/s: 0 rss: 30Mb L: 9/9 MS: 1 EraseBytes-
#17866  NEW    ft: 69 corp: 6/45b lim: 17 exec/s: 0 rss: 30Mb L: 17/17 MS: 3 CrossOver-CMP-CrossOver- DE: "\x01\x00\x00\x00"-
#18345  NEW    ft: 143 corp: 7/62b lim: 17 exec/s: 0 rss: 30Mb L: 17/17 MS: 4 InsertByte-CrossOver-CMP-PersAutoDict- DE: "\xff\xff\xff\xff\xff\xff\xff\xff"-"\x89PNG\x0d\x0a\x1a\x0a"-
#18358  NEW    ft: 145 corp: 8/79b lim: 17 exec/s: 0 rss: 30Mb L: 17/17 MS: 3 ChangeBit-EraseBytes-InsertRepeatedBytes-
#18434  NEW    ft: 148 corp: 9/96b lim: 17 exec/s: 0 rss: 30Mb L: 17/17 MS: 1 ShuffleBytes-
#18536  REDUCE ft: 148 corp: 9/95b lim: 17 exec/s: 0 rss: 30Mb L: 16/17 MS: 2 ShuffleBytes-CrossOver-
#18577  REDUCE ft: 149 corp: 10/111b lim: 17 exec/s: 0 rss: 30Mb L: 16/17 MS: 1 ChangeByte-
#20063  REDUCE ft: 149 corp: 10/110b lim: 17 exec/s: 20063 rss: 30Mb L: 16/17 MS: 1 EraseBytes-
#20959  NEW    ft: 155 corp: 11/127b lim: 17 exec/s: 20959 rss: 30Mb L: 17/17 MS: 1 ChangeBit-
#21470  NEW    ft: 156 corp: 12/135b lim: 17 exec/s: 21470 rss: 30Mb L: 8/17 MS: 1 ChangeBit-
#21566  REDUCE ft: 167 corp: 13/151b lim: 17 exec/s: 21566 rss: 30Mb L: 16/17 MS: 1 ChangeByte-
#27063  NEW    ft: 171 corp: 14/171b lim: 21 exec/s: 27063 rss: 30Mb L: 20/20 MS: 2 EraseBytes-InsertRepeatedBytes-
#30824  NEW    ft: 186 corp: 15/191b lim: 21 exec/s: 30824 rss: 30Mb L: 20/20 MS: 1 ChangeBit-
#31135  NEW    ft: 187 corp: 16/212b lim: 21 exec/s: 31135 rss: 30Mb L: 21/21 MS: 1 InsertByte-
#35481  NEW    ft: 188 corp: 17/236b lim: 25 exec/s: 35481 rss: 30Mb L: 24/24 MS: 1 InsertRepeatedBytes-
#55228  REDUCE ft: 197 corp: 18/264b lim: 43 exec/s: 27614 rss: 30Mb L: 28/28 MS: 2 PersAutoDict-CMP- DE: "\xff\xff\xff\xff"-"\xff\xff\xff\xff\xff\xff\xff\xff"-
#57324  NEW    ft: 198 corp: 19/292b lim: 43 exec/s: 28662 rss: 30Mb L: 28/28 MS: 1 CMP- DE: "\x1a&\x0a\x1a"-
#62992  NEW    ft: 200 corp: 20/331b lim: 48 exec/s: 20997 rss: 30Mb L: 39/39 MS: 3 PersAutoDict-CopyPart-CMP- DE: "\x1a&\x0a\x1a"-"\x13\x00\x00\x00\x00\x00\x00\x00"-
#63208  REDUCE ft: 200 corp: 20/330b lim: 48 exec/s: 21069 rss: 30Mb L: 38/38 MS: 1 EraseBytes-
#64160  REDUCE ft: 200 corp: 20/329b lim: 48 exec/s: 21386 rss: 30Mb L: 37/37 MS: 2 InsertRepeatedBytes-EraseBytes-
#64786  NEW    ft: 204 corp: 21/357b lim: 48 exec/s: 21595 rss: 30Mb L: 28/37 MS: 1 CMP- DE: "IHDR"-

Compared to the old one:

$ cat old-libfuzzer-logs.txt | head -n 30
INFO: Seed: 4240969293
INFO: 65536 Extra Counters
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2      INITED ft: 28 corp: 1/1b lim: 4 exec/s: 0 rss: 26Mb
#4017   NEW    ft: 35 corp: 2/9b lim: 8 exec/s: 0 rss: 30Mb L: 8/8 MS: 5 ShuffleBytes-InsertByte-CopyPart-ChangeBinInt-InsertRepeatedBytes-
#32768  pulse  ft: 35 corp: 2/9b lim: 33 exec/s: 16384 rss: 31Mb
#65536  pulse  ft: 35 corp: 2/9b lim: 68 exec/s: 16384 rss: 31Mb
#131072 pulse  ft: 35 corp: 2/9b lim: 128 exec/s: 16384 rss: 31Mb
#262144 pulse  ft: 35 corp: 2/9b lim: 261 exec/s: 16384 rss: 31Mb
#524288 pulse  ft: 35 corp: 2/9b lim: 526 exec/s: 15887 rss: 31Mb
#1048576        pulse  ft: 35 corp: 2/9b lim: 1050 exec/s: 15650 rss: 31Mb
#2097152        pulse  ft: 35 corp: 2/9b lim: 2094 exec/s: 15650 rss: 31Mb
#4194304        pulse  ft: 35 corp: 2/9b lim: 4096 exec/s: 15592 rss: 31Mb
#8388608        pulse  ft: 35 corp: 2/9b lim: 4096 exec/s: 15563 rss: 31Mb
#16777216       pulse  ft: 35 corp: 2/9b lim: 4096 exec/s: 15563 rss: 31Mb
#33554432       pulse  ft: 35 corp: 2/9b lim: 4096 exec/s: 15548 rss: 31Mb
#67108864       pulse  ft: 35 corp: 2/9b lim: 4096 exec/s: 15545 rss: 31Mb
#134217728      pulse  ft: 35 corp: 2/9b lim: 4096 exec/s: 15541 rss: 31Mb
#268435456      pulse  ft: 35 corp: 2/9b lim: 4096 exec/s: 15542 rss: 31Mb
#536870912      pulse  ft: 35 corp: 2/9b lim: 4096 exec/s: 15547 rss: 31Mb
#1073741824     pulse  ft: 35 corp: 2/9b lim: 4096 exec/s: 15557 rss: 31Mb

I just realized the existing corpus was only picked up by the go-fuzz case:

$ cat old-go-fuzz-logs.txt | head -n 30
2020/03/05 16:33:34 workers: 2, corpus: 266 (3s ago), crashers: 0, restarts: 1/0, execs: 0 (0/sec), cover: 0, uptime: 3s
2020/03/05 16:33:37 workers: 2, corpus: 266 (6s ago), crashers: 0, restarts: 1/0, execs: 0 (0/sec), cover: 1252, uptime: 6s
2020/03/05 16:33:40 workers: 2, corpus: 266 (9s ago), crashers: 0, restarts: 1/3687, execs: 7375 (816/sec), cover: 1252, uptime: 9s
2020/03/05 16:33:43 workers: 2, corpus: 266 (12s ago), crashers: 0, restarts: 1/6042, execs: 18128 (1507/sec), cover: 1252, uptime: 12s
2020/03/05 16:33:46 workers: 2, corpus: 266 (15s ago), crashers: 0, restarts: 1/7526, execs: 22580 (1502/sec), cover: 1252, uptime: 15s
2020/03/05 16:33:49 workers: 2, corpus: 266 (18s ago), crashers: 0, restarts: 1/7556, execs: 37781 (2095/sec), cover: 1252, uptime: 18s
2020/03/05 16:33:52 workers: 2, corpus: 266 (21s ago), crashers: 0, restarts: 1/9506, execs: 66544 (3164/sec), cover: 1252, uptime: 21s
2020/03/05 16:33:55 workers: 2, corpus: 266 (24s ago), crashers: 0, restarts: 1/9496, execs: 94966 (3952/sec), cover: 1252, uptime: 24s
2020/03/05 16:33:58 workers: 2, corpus: 266 (27s ago), crashers: 0, restarts: 1/9459, execs: 122978 (4549/sec), cover: 1252, uptime: 27s
2020/03/05 16:34:01 workers: 2, corpus: 266 (30s ago), crashers: 0, restarts: 1/8870, execs: 150799 (5021/sec), cover: 1252, uptime: 30s
2020/03/05 16:34:04 workers: 2, corpus: 266 (33s ago), crashers: 0, restarts: 1/9399, execs: 178595 (5407/sec), cover: 1252, uptime: 33s
2020/03/05 16:34:07 workers: 2, corpus: 266 (36s ago), crashers: 0, restarts: 1/9385, execs: 206481 (5730/sec), cover: 1252, uptime: 36s
2020/03/05 16:34:10 workers: 2, corpus: 266 (39s ago), crashers: 0, restarts: 1/9376, execs: 234410 (6005/sec), cover: 1252, uptime: 39s
2020/03/05 16:34:13 workers: 2, corpus: 266 (42s ago), crashers: 0, restarts: 1/9717, execs: 262372 (6242/sec), cover: 1252, uptime: 42s
2020/03/05 16:34:16 workers: 2, corpus: 266 (45s ago), crashers: 0, restarts: 1/9672, execs: 290182 (6444/sec), cover: 1252, uptime: 45s
2020/03/05 16:34:19 workers: 2, corpus: 266 (48s ago), crashers: 0, restarts: 1/9635, execs: 317961 (6620/sec), cover: 1252, uptime: 48s
2020/03/05 16:34:22 workers: 2, corpus: 266 (51s ago), crashers: 0, restarts: 1/9871, execs: 345499 (6770/sec), cover: 1252, uptime: 51s
2020/03/05 16:34:25 workers: 2, corpus: 266 (54s ago), crashers: 0, restarts: 1/9812, execs: 372876 (6901/sec), cover: 1252, uptime: 54s
2020/03/05 16:34:28 workers: 2, corpus: 266 (57s ago), crashers: 0, restarts: 1/9753, execs: 399892 (7012/sec), cover: 1252, uptime: 57s
2020/03/05 16:34:31 workers: 2, corpus: 266 (1m0s ago), crashers: 0, restarts: 1/9925, execs: 426793 (7109/sec), cover: 1252, uptime: 1m0s
2020/03/05 16:34:34 workers: 2, corpus: 266 (1m3s ago), crashers: 0, restarts: 1/9992, execs: 449681 (7134/sec), cover: 1252, uptime: 1m3s
2020/03/05 16:34:37 workers: 2, corpus: 266 (1m6s ago), crashers: 0, restarts: 1/9626, execs: 471711 (7144/sec), cover: 1523, uptime: 1m6s
2020/03/05 16:34:40 workers: 2, corpus: 266 (1m9s ago), crashers: 0, restarts: 1/9849, execs: 482609 (6991/sec), cover: 1523, uptime: 1m9s
2020/03/05 16:34:43 workers: 2, corpus: 266 (1m12s ago), crashers: 0, restarts: 1/9665, execs: 512249 (7111/sec), cover: 1523, uptime: 1m12s
2020/03/05 16:34:46 workers: 2, corpus: 266 (1m15s ago), crashers: 0, restarts: 1/9845, execs: 541514 (7217/sec), cover: 1523, uptime: 1m15s
2020/03/05 16:34:49 workers: 2, corpus: 266 (1m18s ago), crashers: 0, restarts: 1/9823, execs: 569790 (7302/sec), cover: 1523, uptime: 1m18s
2020/03/05 16:34:52 workers: 2, corpus: 266 (1m21s ago), crashers: 0, restarts: 1/9921, execs: 595270 (7346/sec), cover: 1523, uptime: 1m21s
2020/03/05 16:34:55 workers: 2, corpus: 266 (1m24s ago), crashers: 0, restarts: 1/9825, execs: 618999 (7366/sec), cover: 1523, uptime: 1m24s
2020/03/05 16:34:58 workers: 2, corpus: 266 (1m27s ago), crashers: 0, restarts: 1/9733, execs: 642418 (7381/sec), cover: 1523, uptime: 1m27s
2020/03/05 16:35:01 workers: 2, corpus: 266 (1m30s ago), crashers: 0, restarts: 1/9787, execs: 665572 (7393/sec), cover: 1523, uptime: 1m30s

And I figured out why the libfuzzer cases were using half as much CPU by default:

These jobs will be run across a set of worker processes, by default using half of the available CPU cores; the count of worker processes can be overridden by the -workers=N option.

@picatz picatz commented Mar 10, 2020

Modifications to my setup for the same png case:

  • Smaller containers using the builder pattern.
  • Using lscpu to dynamically figure out the total number of CPUs for the libfuzzer variants to use N workers/jobs where N is the total number of CPUs.
  • Pass in the corpus directory to the libfuzzer variants so all of them have the same 266 png images to start with.

By doing that, I expected all the fuzzers to use roughly the same amount of CPU/RAM, and that does seem to be the case for the most part. More interestingly, the number of exec/s changed drastically for the libfuzzer variants. Some extra interesting things show up when you start to visualize the exec/s log output over time/iterations.

📦 All of the Dockerfiles and logs are available here.zip

Exec/s Summary

| Fuzzer | Execs/s (Min) | Execs/s (Max) | Execs/s (Average) |
| --- | --- | --- | --- |
| new-libfuzzer | 4 | 537 | 344 |
| old-libfuzzer | 24 | 1619 | 1172 |
| old-go-fuzz | 0 | 6061 | 1221 |
Commands I used to determine the approximate exec/s, to help double-check my findings:

Max

$ cat new-libfuzzer.txt | grep "exec/s" | awk '{print $10}' | sort -n | tail -n 1
537
$ cat old-libfuzzer.txt | grep "exec/s" | awk '{print $10}' | sort -n | tail -n 1
1619
$ cat old-go-fuzz.txt | grep "execs:" | awk '{print $15}' | grep -oh '[0-9]\+' | sort -n | tail -n 1
6061

Min

$ cat new-libfuzzer.txt | grep "exec/s" | awk '{print $10}' | sort -n | head -n 1
4
$ cat old-libfuzzer.txt | grep "exec/s" | awk '{print $10}' | sort -n | head -n 1
24
$ cat old-go-fuzz.txt | grep "execs:" | awk '{print $15}' | grep -oh '[0-9]\+' | sort -n | head -n 1
0

Average

$ cat new-libfuzzer.txt | grep "exec/s" | awk '{print $10}' |  ruby -e "numbers=[]; STDIN.each_line { |l| numbers << l.strip.to_i }; puts numbers.sum.fdiv(numbers.count).round(0)"
344
$ cat old-libfuzzer.txt | grep "exec/s" | awk '{print $10}' | ruby -e "numbers=[]; STDIN.each_line { |l| numbers << l.strip.to_i }; puts numbers.sum.fdiv(numbers.count).round(0)"
1172
$ cat old-go-fuzz.txt | grep "execs:" | awk '{print $15}' | grep -oh '[0-9]\+' | ruby -e "numbers=[]; STDIN.each_line { |l| numbers << l.strip.to_i }; puts numbers.sum.fdiv(numbers.count).round(0)"
1221

Note: I haven't added the -fsanitize-coverage=trace-cmp compiler option yet. That is up next to test for sure!

Update: Oh, that instrumentation seems to be on by default; I just need to enable it with a runtime flag... 🤦‍♂

With -fsanitize-coverage=trace-cmp (default with -fsanitize=fuzzer) and extra run-time flag -use_value_profile=1 the fuzzer will collect value profiles for the parameters of compare instructions and treat some new values as new coverage.
-- https://llvm.org/docs/LibFuzzer.html#tracing-cmp-instructions

old-go-fuzz


When we visualize the log output, we can actually see the number of exec/s (y-axis) gradually slowing down with each iteration (x-axis).

cat old-go-fuzz.txt | grep "execs:" | awk '{print $15}' | grep -oh '[0-9]\+' |  gnuplot -p -e 'plot "/dev/stdin" using 0:1 with lines'


old-libfuzzer


When we visualize the log output, we can see the number of exec/s going up with each iteration.

cat old-libfuzzer.txt | grep "exec/s" | awk '{print $10}'  | gnuplot -p -e 'plot "/dev/stdin" using 0:1 with lines'


new-libfuzzer


Similar to the old-libfuzzer setup, we also see the number of exec/s going up with each iteration, with ~1000 fewer iterations overall. It also has significantly lower exec/s, and maybe it looks a bit more linear?

cat new-libfuzzer.txt | grep "exec/s" | awk '{print $10}'  | gnuplot -p -e 'plot "/dev/stdin" using 0:1 with lines'


Obviously, more testing is required!


@mdempsky mdempsky commented Mar 10, 2020

@picatz Thanks. Any insight into why adding more CPU caused the libfuzzer performance to drop so significantly?

It would also be interesting if you could plot the corpus size over time for each fuzzer.

@picatz

@picatz picatz commented Mar 10, 2020

@mdempsky My hunch right now (based on the following corpus plots) is that it probably has to do with doing more "work" per execution. 🤔

The libfuzzer variants from before I added the corpus directories were really just "spamming" bytes, since they had no corpus to work from. Because of this, they also built up a much smaller pool of bytes/corpus over time as they discovered interesting inputs. These smaller inputs would subsequently be much faster to handle/parse, minimize, mutate, etc.

To further show that this might be the case, here are the numbers for the corpora at the "end" of two runs, one with a corpus and one without:

1. Before corpus was given to libfuzzer variants:

| Fuzzer | Corpus Inputs | Total Corpus Size |
| --- | --- | --- |
| old-libfuzzer | 2 | 9b |
| new-libfuzzer | 148 | 20Kb |

2. After corpus was given to libfuzzer variants:

| Fuzzer | Corpus Inputs | Total Corpus Size |
| --- | --- | --- |
| old-libfuzzer | 454 | 3382Kb |
| new-libfuzzer | 494 | 3787Kb |

Both of the libfuzzer variants also seem to be doing some sort of minimization. When I pass in a corpus with 266 inputs, the corpus log output will start at 210 for both of them.

$ cat old-libfuzzer.txt | grep "corp:" | awk '{print $6}' | awk -F "/" '{print $1}' | head -n 1
210
$ cat new-libfuzzer.txt | grep "corp:" | awk '{print $6}' | awk -F "/" '{print $1}' | head -n 1
210
More Cluster Setup / Client CPU Info

The cluster is running three clients in a cloud provider with identical disk, CPU (2), and RAM (4 GB) configurations. Each fuzzer is running on a unique client.

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping:            4
CPU MHz:             3399.963
BogoMIPS:            5999.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            25344K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke

Here are the plots for the corpus size:

old-go-fuzz

cat old-go-fuzz.txt | grep "corpus:" | awk '{print $6}' | gnuplot -p -e 'plot "/dev/stdin" using 0:1 with lines'

[plot: old-go-fuzz corpus size]

old-libfuzzer

cat old-libfuzzer.txt | grep "corp:" | awk '{print $6}' | awk -F "/" '{print $1}' | gnuplot -p -e 'plot "/dev/stdin" using 0:1 with lines'

[plot: old-libfuzzer corpus size]

new-libfuzzer

cat new-libfuzzer.txt | grep "corp:" | awk '{print $6}' | awk -F "/" '{print $1}' | gnuplot -p -e 'plot "/dev/stdin" using 0:1 with lines'

[plot: new-libfuzzer corpus size]

@smasher164
Member

@smasher164 smasher164 commented Mar 11, 2020

@mdempsky Just wondering, does the coverage instrumentation preserve directionality of edges? That is, for the example in the AFL whitepaper, are the following traces distinguished?

  A -> B -> C -> D -> E (tuples: AB, BC, CD, DE)
  A -> B -> D -> C -> E (tuples: AB, BD, DC, CE)
@mdempsky
Member

@mdempsky mdempsky commented Mar 11, 2020

@smasher164 It inserts a counter after every control flow branch. I believe this is equivalent to directed location pairs within a function, but it does miss out on indirect cross-function tuples (e.g., calling a function via a variable or interface, or returning from a function).

@dvyukov
Member Author

@dvyukov dvyukov commented Mar 11, 2020

I don't know whether the Go instrumentation already does this, but FWIW both go-fuzz and libFuzzer split critical CFG edges (go-fuzz by a very naive approach of adding missing elses to ifs and default cases to all switches). This gives quite an important signal too. It is not necessary if the fuzzer converts BBs to edges itself (e.g. AFL with limited binary instrumentation), but libFuzzer does not do this and relies on the compiler to do good instrumentation.

@mdempsky
Member

@mdempsky mdempsky commented Mar 11, 2020

@dvyukov cmd/compile's libfuzzer mode does roughly the same approach as go-fuzz: it adds missing elses and default cases.

@picatz

@picatz picatz commented Mar 13, 2020

Comparing LibFuzzers

With the previous new-libfuzzer container, I created a new one using the -use_value_profile=1 runtime option with some interesting results. From the documentation for value profile:

This feature has a potential to discover many interesting inputs, but there are two downsides. First, the extra instrumentation may bring up to 2x additional slowdown. Second, the corpus may grow by several times.

I've had these three containers (not including old-go-fuzz from before) running for ~2 days:

  • old-libfuzzer
  • new-libfuzzer
  • new-libfuzzer-trace-cmp-with-profile

Execs/s

The old-libfuzzer has significantly more execs/s (and iterations) compared to either new-libfuzzer variant.

[plot: execs per second]

After the weekend (~2 more days)

[plot: execs per second, after the weekend]

Features

libFuzzer uses different signals to evaluate the code coverage: edge coverage, edge counters, value profiles, indirect caller/callee pairs, etc. These signals combined are called features (ft:).

Interestingly, old-libfuzzer found significantly more features. Between the new-libfuzzer* variants, the value profile seemed to lead to more features, but still far behind old-libfuzzer for some reason.

[plot: features]

After the weekend (~2 more days)

[plot: features, after the weekend]

Corpus Size ( Kb )

The new-libfuzzer* variants had a larger corpus size overall.

[plot: corpus size]

After the weekend (~2 more days).

[plot: corpus size, after the weekend]

Corpus Files

The new-libfuzzer* variants had more files being added to the corpus.

[plot: corpus files]

After the weekend (~2 more days)

[plot: corpus files, after the weekend]

Next Steps

  • I'm going to continue to let these cases run over the weekend and report back. Just updated with drop-downs for the plots from after the weekend.
  • I'm then going to try out a new (perhaps more interesting?) fuzzing target than just image/png.

Interpreting Results

@mdempsky Unless I'm misinterpreting, or messing up my graphs... old-libfuzzer seems to be performing better than the new-libfuzzer* variants? With a smaller corpus size and fewer files (starting with the same), it seems the older tooling is able to discover more features faster than the newer instrumentation, with far more exec/s. Super curious to hear your (or others') thoughts on this!

@dvyukov
Member Author

@dvyukov dvyukov commented Mar 14, 2020

What's on the X axis (iterations)?

@picatz

@picatz picatz commented Mar 14, 2020

One "iteration" would be one line from a fuzzer's worker log output. Each variant has two workers/jobs, so I'm joining the log output (with tail -f) as a single line per fuzzer on the chart.

@dvyukov
Member Author

@dvyukov dvyukov commented Mar 15, 2020

I assume it prints a line every 10 secs or so. What I was getting at is that in my experience fuzzing behavior over a short period of time does not matter much. Whether a bug is found within seconds, minutes or hours does not matter much. There may be some difference between a day and days for some use cases. But if we are talking about finding the longer tail of bugs, this involves continuous fuzzing that runs for weeks/months/years. What matters then is the ceiling after months of fuzzing on a fully saturated corpus. And this usually conflicts with raw speed: "fast and dumb" is important during the first hours of fuzzing, but then it becomes more important how smart the fuzzer is. And being smarter generally means being slower.
Unfortunately there is no easy way to measure this. One trick I used is starting fuzzers with a fully saturated corpus and then checking whether they can progress any further.

@picatz

@picatz picatz commented Mar 15, 2020

I really appreciate the insight @dvyukov!

To determine whether a corpus has become fully saturated, would I monitor the features/coverage for a fuzzer and wait for it to plateau for a prolonged period of time?

@dvyukov
Member Author

@dvyukov dvyukov commented Mar 16, 2020

There is no strict procedure, and "fully saturated" is not strictly defined anyway. But I would at least run a fuzzer overnight and, yes, check that it is not getting new coverage for some time.
For benchmarking purposes it can make sense to not "over-saturate" it (e.g. by running for a month on a cluster), because then no variant may discover any new coverage, and benchmarking won't give any useful signal. Benchmarking fuzzers is a bit of black magic...
