docs: Add libFuzzer Implementation Documents #38

Merged
74 changes: 74 additions & 0 deletions docs/algorithms/libfuzzer/algorithm_en.md
@@ -29,3 +29,77 @@ To support 1., libFuzzer will link the target built with `-fsanitize-coverage=ed

To achieve 2., libFuzzer rewrites the information held by the sanitizer before and after executing the harness, if necessary.
The sanitizer's internal implementation is not a stable API, and LLVM does not guarantee that it stays compatible across versions. Because libFuzzer is itself part of LLVM, it is implemented in lockstep with the sanitizer of the same LLVM version.

As mentioned in the previous section on feature behavior, the behavior of the original libFuzzer differs slightly between versions. For example, up to LLVM 8 it uses `mt19937` as its random number generator, and from LLVM 9 onward it uses `minstd_rand`.
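One practical consequence is that the same seed value does not reproduce the same random sequence across those versions. The following is a minimal sketch that uses only the two standard-library engines named above; the helpers `first_outputs` and `engines_diverge` are hypothetical names introduced for illustration and are not part of libFuzzer or fuzzuf.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: the first `n` raw outputs of `Engine` for a given seed.
template <class Engine>
std::vector<std::uint64_t> first_outputs(std::uint32_t seed, std::size_t n) {
  Engine engine(seed);
  std::vector<std::uint64_t> out;
  out.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    out.push_back(engine());
  }
  return out;
}

// True if the two engines disagree on their first output for the same seed.
inline bool engines_diverge(std::uint32_t seed) {
  return first_outputs<std::mt19937>(seed, 1) !=
         first_outputs<std::minstd_rand>(seed, 1);
}
```

When comparing runs against the original libFuzzer, it is therefore worth checking which engine the reference build uses before expecting identical mutation sequences.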

## How libFuzzer works

libFuzzer performs the process shown in the following pseudo-code.

Here, `initial_inputs` are the initial seeds, `target` is the fuzzing target, `total_count` is the number of times to execute the target, and `mutation_depth` is the number of times to perform mutation on the same input value.

```cpp
count = 0;
// IDs of "unique" (rare) features: those that have been observed, but only infrequently.
unique_feature_set = {};
// A map holding the number of times each feature has appeared.
global_feature_freqs = {};
corpus = {};
// For all initial seeds:
for( input in initial_inputs ) {
  // Execute the target once.
  exec_result = execute( target, input );
  // Add the execution result to the corpus.
  add_to_corpus( corpus, exec_result, input );
}
// Update the probability distribution used to select input values.
dist = update_distribution( corpus );
// Until the number of attempts reaches total_count:
while( count < total_count ) {
  // Until `i` reaches mutation_depth (defaults to 5 in libFuzzer):
  for( i = 0; i < mutation_depth; ++i ) {
    // Select an input value from the corpus.
    [old_exec_result, input] = corpus.select_seed();
    // Perform mutation on the input value.
    mut_input = mutate( dist, input );
    // Execute the target.
    exec_result = execute( target, mut_input );
    // Collect the features from the execution result.
    features = collect_features( old_exec_result, exec_result, unique_feature_set, global_feature_freqs );
    // Every execution counts as one attempt, whether or not it is interesting.
    ++count;
    // If a new feature has been discovered:
    if( is_interesting( features ) ) {
      // Add the execution result and the input value to the corpus.
      corpus.add( exec_result, mut_input );
      // Update the probability distribution used to select input values.
      dist = update_distribution( corpus );
      // Exit the loop once the result is added to the corpus, even if `i` has not reached mutation_depth.
      break;
    }
  }
}
```

libFuzzer extracts "features" of interest from each execution result of the target and assigns each of them an ID.
A feature combines the index of an edge in the edge coverage with the number of times that edge was hit.
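To make the idea of a feature ID concrete, here is a minimal sketch that packs an edge index and a bucketed hit count into one integer. The bucket boundaries are modeled on the 8-bucket scheme that upstream libFuzzer applies to 8-bit counters; treat the exact boundaries and the names `counter_to_bucket` and `feature_id` as assumptions made for illustration, not as fuzzuf's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Map a non-zero hit count to one of 8 buckets (assumed boundaries):
//   count:  1  2  3  4-7  8-15  16-31  32-127  128+
//   bucket: 0  1  2   3     4      5       6     7
inline unsigned counter_to_bucket(std::uint8_t counter) {
  assert(counter != 0);
  if (counter >= 128) return 7;
  if (counter >= 32) return 6;
  if (counter >= 16) return 5;
  if (counter >= 8) return 4;
  if (counter >= 4) return 3;
  return counter - 1u;  // 1, 2, 3 -> 0, 1, 2
}

// Hypothetical encoding: reserve 8 consecutive feature IDs per edge.
inline std::size_t feature_id(std::size_t edge_index, std::uint8_t counter) {
  return edge_index * 8 + counter_to_bucket(counter);
}
```

Bucketing keeps the number of distinct features per edge small: hitting an edge 4 times or 7 times yields the same feature, while hitting it 3 times yields a different one.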

In the pseudo-code above, `collect_features()` collects features from `exec_result`, the result of executing the current input, and updates `unique_feature_set` and `global_feature_freqs` each time it finds a feature.

`is_interesting()` checks whether `features`, as collected by `collect_features()`, contains at least one "rare" feature from `unique_feature_set`. If it does, `corpus.add()` adds the execution result `exec_result` and the mutated input value `mut_input` to the corpus. Finally, `update_distribution()` updates `dist`, the probability distribution used to choose the next input value.
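The bookkeeping described above can be sketched as follows. This is a deliberate simplification: it treats a feature as interesting only when it has never been seen before, whereas the text describes a check against a set of rare features. `FeatureState` and `collect_new_features` are hypothetical names, and the signatures differ from the pseudo-code.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical state: how often each feature ID has been observed so far.
struct FeatureState {
  std::unordered_map<std::size_t, std::uint32_t> global_feature_freqs;
};

// Record every feature of one execution and return those seen for the first time.
inline std::vector<std::size_t> collect_new_features(
    FeatureState& state, const std::vector<std::size_t>& features) {
  std::vector<std::size_t> fresh;
  for (std::size_t feature : features) {
    // operator[] default-initializes the counter to 0 on first access.
    if (state.global_feature_freqs[feature]++ == 0) {
      fresh.push_back(feature);
    }
  }
  return fresh;
}

// Simplified check: an execution is interesting if it produced a new feature.
inline bool is_interesting(const std::vector<std::size_t>& fresh) {
  return !fresh.empty();
}
```

Keeping the frequency map even for features that are no longer new is what later allows a scheduler to tell rare features from common ones.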

`update_distribution()` has two modes, vanilla scheduling and entropic scheduling, and the libFuzzer implementation in fuzzuf uses the former by default. Vanilla scheduling follows a simple policy: the more recently an input was found, the more likely it is to be selected. Entropic scheduling instead evaluates the result of each seed's execution and updates the probability distribution `dist` so that seeds with a higher evaluation are more likely to be selected by the next `select_seed()`. This evaluation value is called energy, and the fuzzer calculates it while collecting features with `collect_features()`, based on the following:

* The number of "rare" features found.
* The number of mutations performed from the initial seed to reach this input value.
* How far the execution time deviates from the average.
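As a rough illustration of how such terms could combine, the sketch below computes a Shannon entropy over per-seed rare-feature hit counts and scales it by relative execution time. It covers only the first and third terms in the list, the scaling rule is an assumption, and the names `entropy` and `energy` are hypothetical; the actual formula in libFuzzer and fuzzuf is more involved.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Shannon entropy (in nats) of the hit counts of rare features for one seed.
inline double entropy(const std::vector<std::uint32_t>& rare_feature_counts) {
  double total = 0.0;
  for (std::uint32_t count : rare_feature_counts) total += count;
  if (total == 0.0) return 0.0;
  double h = 0.0;
  for (std::uint32_t count : rare_feature_counts) {
    if (count == 0) continue;  // 0 * log(0) is taken as 0
    const double p = count / total;
    h -= p * std::log(p);
  }
  return h;
}

// Assumed scaling: seeds faster than average get proportionally more energy.
inline double energy(const std::vector<std::uint32_t>& rare_feature_counts,
                     double exec_time_us, double average_time_us) {
  assert(exec_time_us > 0.0 && average_time_us > 0.0);
  return entropy(rare_feature_counts) * (average_time_us / exec_time_us);
}
```

A seed that spreads its executions evenly over many rare features scores higher than one that keeps hitting a single feature, which matches the intuition that the former still has more to reveal.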

In entropic scheduling, libFuzzer uses energy to select seeds according to the following policy:

* Focus on the inputs that produced the rarest features.
* Prefer inputs whose mutations have repeatedly found new features.
* Prefer inputs that run faster when they find the same number of features.
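Both scheduling modes come down to choosing a weight per corpus entry and sampling an index from the resulting distribution. The sketch below assumes linearly increasing weights for vanilla scheduling and energy-as-weight for entropic scheduling; these weighting rules and the names `vanilla_weights`, `entropic_weights`, and `select_seed` are illustrative assumptions, not the exact formulas used by libFuzzer or fuzzuf.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Vanilla (assumed): weight grows with insertion order, favoring recent seeds.
inline std::vector<double> vanilla_weights(std::size_t corpus_size) {
  std::vector<double> weights(corpus_size);
  for (std::size_t i = 0; i < corpus_size; ++i) {
    weights[i] = static_cast<double>(i + 1);
  }
  return weights;
}

// Entropic (assumed): use each seed's non-negative energy as its weight.
// Falls back to uniform weights when no seed has positive energy.
inline std::vector<double> entropic_weights(const std::vector<double>& energies) {
  const bool any_positive = std::any_of(
      energies.begin(), energies.end(), [](double e) { return e > 0.0; });
  if (any_positive) return energies;
  return std::vector<double>(energies.size(), 1.0);
}

// Sample a corpus index with probability proportional to its weight.
template <class Rng>
std::size_t select_seed(const std::vector<double>& weights, Rng& rng) {
  assert(!weights.empty());
  std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
  return dist(rng);
}
```

Separating the weight computation from the sampling step mirrors the split between `update_distribution()` and `select_seed()` in the pseudo-code: only the former needs to know which scheduling mode is active.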
120 changes: 120 additions & 0 deletions docs/algorithms/libfuzzer/implementation_en.md
@@ -0,0 +1,120 @@
# libFuzzer Implementation in fuzzuf

This section describes the implementation of libFuzzer in fuzzuf.

## How to Construct the Standard libFuzzer

`createRunone()` in [include/fuzzuf/algorithms/libfuzzer/create.hpp](/include/fuzzuf/algorithms/libfuzzer/create.hpp) builds the fuzzer that corresponds to `Fuzzer::Loop()` in the original libFuzzer. Use this function when you want the libFuzzer implementation in fuzzuf to behave as the standard libFuzzer.

## HierarFlow Nodes for libFuzzer Implementation

fuzzuf has the following nodes to represent libFuzzer in HierarFlow. For more details about each node, refer to the comments in the source code or the documentation generated by Doxygen.

### Mutator Nodes

* [EraseBytes](/include/fuzzuf/algorithms/libfuzzer/mutation/erase_bytes.hpp)
* [InsertByte](/include/fuzzuf/algorithms/libfuzzer/mutation/insert_byte.hpp)
* [InsertRepeatedBytes](/include/fuzzuf/algorithms/libfuzzer/mutation/insert_repeated_bytes.hpp)
* [ChangeByte](/include/fuzzuf/algorithms/libfuzzer/mutation/change_byte.hpp)
* [ChangeBit](/include/fuzzuf/algorithms/libfuzzer/mutation/change_bit.hpp)
* [ShuffleBytes](/include/fuzzuf/algorithms/libfuzzer/mutation/shuffle_bytes.hpp)
* [ChangeASCIIInteger](/include/fuzzuf/algorithms/libfuzzer/mutation/change_ascii_integer.hpp)
* [ChangeBinaryInteger](/include/fuzzuf/algorithms/libfuzzer/mutation/change_binary_integer.hpp)
* [CopyPart](/include/fuzzuf/algorithms/libfuzzer/mutation/copy_part.hpp)
  * CopyPartOf
  * InsertPartOf
* [CrossOver](/include/fuzzuf/algorithms/libfuzzer/mutation/crossover.hpp)
  * CrossOver
  * CopyPartOf
  * InsertPartOf
* [Dictionary](/include/fuzzuf/algorithms/libfuzzer/mutation/dictionary.hpp)
  * Dictionary
  * UpdateDictionary
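To give a feel for what the simplest of these mutators do, here is a minimal standalone sketch of three of them operating on a byte vector. The function names and signatures are hypothetical and do not match the fuzzuf nodes; in particular, positions are passed explicitly here so the behavior is deterministic, whereas a real mutator would choose them at random.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// EraseBytes-like: remove `n` bytes starting at `pos`.
inline void erase_bytes(Bytes& data, std::size_t pos, std::size_t n) {
  assert(pos + n <= data.size());
  const auto first = data.begin() + static_cast<std::ptrdiff_t>(pos);
  data.erase(first, first + static_cast<std::ptrdiff_t>(n));
}

// InsertByte-like: insert one byte before position `pos`.
inline void insert_byte(Bytes& data, std::size_t pos, std::uint8_t value) {
  assert(pos <= data.size());
  data.insert(data.begin() + static_cast<std::ptrdiff_t>(pos), value);
}

// ChangeBit-like: flip a single bit, addressed across the whole buffer.
inline void change_bit(Bytes& data, std::size_t bit_index) {
  assert(bit_index < data.size() * 8);
  data[bit_index / 8] ^= static_cast<std::uint8_t>(1u << (bit_index % 8));
}
```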

### Control Nodes

Control nodes provide the control flow needed to construct libFuzzer in HierarFlow.

* [ForEach](/include/fuzzuf/algorithms/libfuzzer/hierarflow/for_each.hpp)
* [IfNewCoverage](/include/fuzzuf/algorithms/libfuzzer/hierarflow/if_new_coverage.hpp)
* [RandomCall](/include/fuzzuf/algorithms/libfuzzer/hierarflow/random_call.hpp)
* [Repeat](/include/fuzzuf/algorithms/libfuzzer/hierarflow/repeat.hpp)
* [RepeatUntilNewCoverage](/include/fuzzuf/algorithms/libfuzzer/hierarflow/repeat_until_new_coverage.hpp)
* [RepeatUntilMutated](/include/fuzzuf/algorithms/libfuzzer/hierarflow/repeat_until_mutated.hpp)
* [DoNothing](/include/fuzzuf/algorithms/libfuzzer/do_nothing.hpp)
* [Assign](/include/fuzzuf/algorithms/libfuzzer/hierarflow/assign.hpp)
* [Append](/include/fuzzuf/algorithms/libfuzzer/hierarflow/append.hpp)

### Execute Node

The Execute node has an Executor, which executes the target with input values and gets coverage, standard output, and execution results.

* [Execute](/include/fuzzuf/algorithms/libfuzzer/hierarflow/execute.hpp)

### Feedback Nodes

Feedback nodes are responsible for selecting execution results to be added to the corpus.

* [CollectFeatures](/include/fuzzuf/algorithms/libfuzzer/hierarflow/collect_features.hpp)
* [AddToCorpus](/include/fuzzuf/algorithms/libfuzzer/hierarflow/add_to_corpus.hpp)
* [AddToSolutions](/include/fuzzuf/algorithms/libfuzzer/hierarflow/add_to_solution.hpp)
* [UpdateDistribution](/include/fuzzuf/algorithms/libfuzzer/hierarflow/update_distribution.hpp)
* [ChooseRandomSeed](/include/fuzzuf/algorithms/libfuzzer/hierarflow/choose_random_seed.hpp)

### Debug Nodes

Debug nodes make it easy to debug fuzzers in HierarFlow.

* [Dump](/include/fuzzuf/algorithms/libfuzzer/hierarflow/dump.hpp)
* [PrintStatusForNewUnit](/include/fuzzuf/algorithms/libfuzzer/hierarflow/print_status_for_new_unit.hpp)

## Unimplemented Features

Some features of libFuzzer are unimplemented in fuzzuf:

### Mutator

#### Mutate_AddWordFromTORC (CMP)

The original libFuzzer adds this mutator when the CMP option is enabled. It records information about branches that depend on comparison operations and uses it for mutation.
When the target is compiled with `-fsanitize-coverage=trace-cmp`, the compiler instruments the executable so that, whenever it branches on a comparison operation, the compared values and the operator type are recorded at runtime. CMP then looks in the input for a value equal to one used in a comparison and rewrites it so that the conditional branch goes the other way. The idea is heuristic: a matching value in the input is not guaranteed to be the one that was actually compared, but rewriting it is expected to change the branch direction with some probability.
fuzzuf does not implement this mutator because it does not support `trace-cmp` and therefore cannot obtain information about comparison operations.
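The core of the CMP idea can be sketched in a few lines: given one recorded comparison, find the bytes of one operand in the input and overwrite them with the other operand. This is only an illustration under simplifying assumptions (32-bit operands, host byte order, first match only); `CmpRecord` and `apply_cmp_mutation` are hypothetical names and not part of libFuzzer or fuzzuf.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <iterator>
#include <vector>

// Hypothetical record of one comparison observed at runtime.
struct CmpRecord {
  std::uint32_t lhs;  // value that appeared in the running target
  std::uint32_t rhs;  // value it was compared against
};

// Search the input for the bytes of `lhs` and overwrite the first match with
// the bytes of `rhs`. Returns true if a replacement was made.
inline bool apply_cmp_mutation(std::vector<std::uint8_t>& input,
                               const CmpRecord& cmp) {
  std::uint8_t from[sizeof cmp.lhs];
  std::uint8_t to[sizeof cmp.rhs];
  std::memcpy(from, &cmp.lhs, sizeof from);
  std::memcpy(to, &cmp.rhs, sizeof to);
  const auto match = std::search(input.begin(), input.end(),
                                 std::begin(from), std::end(from));
  if (match == input.end()) return false;
  std::copy(std::begin(to), std::end(to), match);
  return true;
}
```

The sketch also shows why the approach is probabilistic: the matched bytes may be an unrelated occurrence of the same value, in which case the rewrite does not affect the comparison at all.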

#### Mutate_AddWordFromPersistentAutoDictionary

libFuzzer keeps a dictionary of the values inserted by `Mutate_AddWordFromManualDictionary` and `Mutate_AddWordFromTORC` that led to the discovery of new coverage. Since CMP is unimplemented in fuzzuf, this dictionary only stores values that come from `Mutate_AddWordFromManualDictionary`.

#### Custom Mutator

libFuzzer provides CustomMutator and CustomCrossOver so that users can add their own mutators. fuzzuf does not provide nodes for these because it is easier to add your own node directly than to go through special-purpose nodes.

### Corpus

libFuzzer can persist the complete state of the corpus and resume fuzzing from it. fuzzuf, however, persists only the input values, so it cannot fully restore the previous state when it resumes from the persisted data.

### Feature

#### Data Flow Trace

Data Flow Trace uses a record of data movement, obtained with LLVM's DataFlowSanitizer, to determine which parts of the input affect a branch. By masking the range of mutation based on this result, a fuzzer can concentrate on finding inputs that get past a particular branch. However, the original libFuzzer implementation does not use this information effectively when generating the mask.
As with CMP, this feature is not implemented in fuzzuf because fuzzuf has no way to obtain information equivalent to what DataFlowSanitizer provides.
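The masking step itself is independent of how the mask is obtained, and can be sketched as a mutation that only touches bytes flagged in a mask. The mask format (one non-zero byte per mutable position) and the name `mutate_masked_byte` are assumptions made for this illustration; they do not describe libFuzzer's internal representation.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Change exactly one byte of `input`, chosen among positions whose mask entry
// is non-zero. Returns false if the mask allows no position.
template <class Rng>
bool mutate_masked_byte(std::vector<std::uint8_t>& input,
                        const std::vector<std::uint8_t>& mask, Rng& rng) {
  std::vector<std::size_t> candidates;
  for (std::size_t i = 0; i < input.size() && i < mask.size(); ++i) {
    if (mask[i] != 0) candidates.push_back(i);
  }
  if (candidates.empty()) return false;
  std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
  // XOR with a non-zero value guarantees that the byte actually changes.
  std::uniform_int_distribution<int> delta(1, 255);
  input[candidates[pick(rng)]] ^= static_cast<std::uint8_t>(delta(rng));
  return true;
}
```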

### Executor

#### Avoiding Child Process Creation

libFuzzer avoids the cost of creating child processes by linking the fuzzing target and the fuzzer into the same binary. fuzzuf has no equivalent executor, so its implementation creates child processes.
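For reference, in-process fuzzing with the original libFuzzer relies on the target exposing the `LLVMFuzzerTestOneInput` entry point, which the fuzzer calls repeatedly inside one process. A minimal harness looks like the sketch below; `starts_with_magic` is a hypothetical stand-in for the code under test.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical code under test: checks for a 4-byte magic prefix.
inline bool starts_with_magic(const std::uint8_t* data, std::size_t size) {
  return size >= 4 && std::memcmp(data, "FUZZ", 4) == 0;
}

// Entry point that libFuzzer calls once per generated input, in-process.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data,
                                      std::size_t size) {
  starts_with_magic(data, size);
  return 0;  // values other than 0 are reserved by libFuzzer
}
```

Because fuzzuf runs the target as a separate process, the same target would instead need to read its input from a file or standard input.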

#### Support for Shared Libraries

libFuzzer has a mechanism to collect and combine the edge coverage of both an executable binary and the shared libraries linked to it. This feature is not implemented in fuzzuf because fuzzuf has no way to get coverage from shared libraries.

### Feedback

#### Leak Sanitizer

The original libFuzzer detects memory leaks. fuzzuf can treat a leak as a failure if the target is compiled with LeakSanitizer. However, because the failure is reported as an abort, fuzzuf cannot tell whether it was caused by a memory leak, and so it cannot use the leak detection information.

#### Stack Depth Tracing

The original libFuzzer uses the `stack-depth` instrumentation of LLVM's SanitizerCoverage to obtain the stack depth used by the target. fuzzuf has no way to obtain this value, so the feature is unimplemented.