docs: Add IJON Documents #40

Merged
merged 30 commits into from
Mar 7, 2022
Changes from 27 commits
d7fe565 fix broken listing (potetisensei, Mar 1, 2022)
6dac187 format TODO.md and add To-Dos of IJON (potetisensei, Mar 2, 2022)
ed3fe07 add Japanese documents of IJON (potetisensei, Mar 2, 2022)
90350d6 format TODO.md more (potetisensei, Mar 2, 2022)
bd12b32 lint IJON documents (potetisensei, Mar 2, 2022)
0991d3e add English IJON documents and modify Japanese ones accordingly (potetisensei, Mar 2, 2022)
de87b71 minor fixes (potetisensei, Mar 2, 2022)
3c8f532 Reflect review1 (potetisensei, Mar 2, 2022)
c691dd7 Reflect review2 (potetisensei, Mar 2, 2022)
9836bd6 Reflect review3 (potetisensei, Mar 2, 2022)
6b1878a Reflect review4 (potetisensei, Mar 2, 2022)
a0cdab6 Reflect review5 (potetisensei, Mar 2, 2022)
f00f93b Reflect review6 (potetisensei, Mar 2, 2022)
559a212 Reflect review7 (potetisensei, Mar 2, 2022)
8a15b43 Reflect review7 (potetisensei, Mar 2, 2022)
68f9663 Reflect review8 (potetisensei, Mar 2, 2022)
53ffa8d Reflect review9 (potetisensei, Mar 3, 2022)
92230fa Reflect review10 (potetisensei, Mar 3, 2022)
ab66ec9 Reflect review11 (potetisensei, Mar 3, 2022)
b4af63e Reflect review12 (potetisensei, Mar 3, 2022)
81bd189 Reflect review13 (potetisensei, Mar 3, 2022)
9cb1f0f add example usage (potetisensei, Mar 3, 2022)
546141d Reflect review14 (potetisensei, Mar 3, 2022)
da3558e Reflect review15 (potetisensei, Mar 3, 2022)
f6f46fa add IJON in README (potetisensei, Mar 3, 2022)
be58343 remove a redundant line from IJON description in README (potetisensei, Mar 3, 2022)
162e4cb fix diff (potetisensei, Mar 3, 2022)
98087a3 Reflect review15 (potetisensei, Mar 7, 2022)
8574475 Reflect review16 (potetisensei, Mar 7, 2022)
8aa518c add outcome (potetisensei, Mar 7, 2022)
1 change: 1 addition & 0 deletions README.md
@@ -82,6 +82,7 @@ Note, when using fuzzuf from CLI, you have to separate global options (options a
|---|---|---|---|---|---
|AFL|Greybox|A re-implementation of general purpose fuzzer, representing a CGF. Also available as a template for its derivatives.|[How to use fuzzuf's AFL CLI](/docs/algorithms/afl/algorithm_en.md#how-to-use-fuzzufs-afl-cli)|[Algorithm Overview](/docs/algorithms/afl/algorithm_en.md#algorithm-overview)|:white_check_mark:
|AFLFast|Greybox|An implementation of AFLFast, utilizing an AFL template.<br/>The algorithm tries to increase its performance by manipulating the power schedule.|[CLI Usage](/docs/algorithms/aflfast/algorithm_en.md#cli-usage)|[Algorithm Overview](/docs/algorithms/aflfast/algorithm_en.md#algorithm-overview)|:white_check_mark:
|IJON|Greybox|A fuzzer that can fuzz PUTs in an *internal-state-aware* manner with manual annotations to PUTs.|[CLI Usage](/docs/algorithms/ijon/algorithm_en.md#how-to-use-fuzzufs-ijon-cli)|[Algorithm Overview](/docs/algorithms/ijon/algorithm_en.md#algorithm-overview)
|VUzzer|Greybox|A mutation-based fuzzer that guesses data structures by analyzing the control flow and the data flow of the PUT.|Read [Prerequisite](/docs/algorithms/vuzzer/algorithm_en.md#prerequisite) first, then [Usage on CLI](docs/algorithms/vuzzer/algorithm_en.md#usage-on-cli)|[Algorithm Overview](/docs/algorithms/vuzzer/algorithm_en.md#algorithm-overview)
|libFuzzer|Greybox|CGF included in the LLVM project's compiler-rt libraries.|[How to use libFuzzer on fuzzuf](/docs/algorithms/libfuzzer/manual.md#how-to-use-libfuzzer-on-fuzzuf)|[What is libFuzzer?](/docs/algorithms/libfuzzer/algorithm_en.md#what-is-libfuzzer)
|Nezha|Greybox|A fuzzer derived from libFuzzer that tries to find defects by running different implementations of a program on the same input and comparing their execution results (differential fuzzing).|[How to use Nezha on fuzzuf](/docs/algorithms/nezha/manual.md#how-to-use-nezha-on-fuzzuf)|TBD
45 changes: 34 additions & 11 deletions TODO.md
@@ -130,27 +130,37 @@ With this mode, the performance of fuzzuf's libFuzzer would be comparable to tha

It's too bad `Mutator` has some raw pointers as its members, such as `u8 *Mutator::outbuf` and `u8 *Mutator::tmpbuf`. These members can be smart pointers or `std::vector`. We just want to replace them.
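
As a rough illustration of the kind of replacement we have in mind, the sketch below holds the output buffer in a `std::vector<u8>` rather than a raw pointer. `MutatorSketch`, `FlipBit`, and `GetBuf` are hypothetical names used only for this example and do not reflect the real `Mutator` interface; only the member name `outbuf` and the type `u8` are taken from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using u8 = std::uint8_t;

// Hypothetical stand-in for a mutator; not fuzzuf's real Mutator class.
class MutatorSketch {
 public:
  // Copy the seed into an owned buffer, so no manual new[]/delete[] is needed.
  MutatorSketch(const u8* seed, std::size_t len) : outbuf(seed, seed + len) {}

  // Flip one bit, counting from the most significant bit of byte 0.
  void FlipBit(std::size_t bit_pos) {
    assert(bit_pos / 8 < outbuf.size());
    outbuf[bit_pos / 8] ^= static_cast<u8>(0x80u >> (bit_pos % 8));
  }

  const std::vector<u8>& GetBuf() const { return outbuf; }

 private:
  std::vector<u8> outbuf;  // was a raw pointer: u8* outbuf
};
```

With an owning container, copying and destroying the object needs no hand-written cleanup code, which is the main motivation for this To-Do.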

### Add CODING\_RULE.md and refactor the code in accordance with CODING\_RULE.md

In the past, we didn't have any explicit coding rules. Nevertheless, we kept developing fuzzuf simultaneously and almost independently of each other. As a result, the code base doesn't look well-organized. This would confuse contributors and users, so we must fix it. We have almost finished creating CODING\_RULE.md internally and will release it after review and formatting are complete. In particular, because we started implementing libFuzzer at a very early stage, a large part of the libFuzzer implementation doesn't conform to those rules. We will resolve this issue gradually, simply because the affected code is too large to fix immediately.


## To-Dos in each fuzzing algorithm (most of which don't require careful consideration)

### AFL
This section documents To-Dos of AFL.

#### Implement resume mode and parallel fuzzing in AFL

They are just unimplemented.

#### Implement SIGUSR1 Handling on AFL

This feature is just unimplemented.

#### Remove careless templates from AFL

In the implementation of AFL, we use many templates to allow users to define derived classes of `AFLTestcase` and `AFLState`. But this is just cutting corners. Let us explain what we've done with an example. Say we want to define a function that takes a reference to some struct as an argument. The struct has a member named "x". The function would look like the following:

```cpp
void SomeFunc(const SomeStruct& stru) {
  std::cout << stru.x << std::endl;
}
```

Next, we would like to generalize this function so that it can accept similar struct types. Specifically, we should be able to pass to the function instances of other structs that have the member "x". Obviously, we can do that in the following way:

```cpp
template<class Struct>
void SomeFunc(const Struct& stru) {
  std::cout << stru.x << std::endl;
}
```

But another possible solution would be to define the virtual member function `SomeStruct::GetX()` and make the other structs derive from `SomeStruct`, like this:

```cpp
// Define SomeStruct::GetX() in advance
void SomeFunc(const SomeStruct& stru) {
  std::cout << stru.GetX() << std::endl;
}
```

We should rewrite the classes of AFL in the same way eventually.
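
For reference, a self-contained version of the virtual-function approach might look like the sketch below. `OtherStruct` and `SumX` are names invented for this example and do not correspond to any class in fuzzuf's AFL implementation.

```cpp
#include <cassert>
#include <iostream>
#include <vector>

struct SomeStruct {
  virtual ~SomeStruct() = default;
  virtual int GetX() const { return x; }
  int x = 0;
};

// Hypothetical derived struct that computes "x" instead of storing it directly.
struct OtherStruct : SomeStruct {
  int GetX() const override { return base * 2; }
  int base = 21;
};

// No template needed: any struct derived from SomeStruct is accepted.
void SomeFunc(const SomeStruct& stru) {
  std::cout << stru.GetX() << std::endl;
}

// The same interface also works on a collection that mixes derived types.
int SumX(const std::vector<const SomeStruct*>& items) {
  int sum = 0;
  for (const SomeStruct* p : items) {
    assert(p != nullptr);
    sum += p->GetX();
  }
  return sum;
}
```

The trade-off is a virtual call per access in exchange for ordinary, non-template function signatures.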

### IJON

This section documents To-Dos of IJON.

#### Implement annotations

What IJON proposed is not just a fuzzer, but a set of a fuzzer and an annotation mechanism in PUTs.
Unfortunately, the annotation mechanism is not implemented in fuzzuf because fuzzuf doesn't have its own instrumentation tool yet.
This should be implemented immediately after fuzzuf-cc becomes ready.

#### Test with Super Mario Bros.

To show that our IJON fuzzer works well to some extent, one of the most comprehensible tests would be to check whether the fuzzer can play Super Mario Bros. well, as done in the IJON paper.

### Nautilus

This section documents To-Dos of Nautilus.

#### Use vector instead of string

The current parser/unparser of the grammar and rules uses `std::string` as its data pool instead of `std::vector<u8>`.
This should be changed to `std::vector<u8>` because `std::string` is originally not meant to hold unprintable strings.

#### Improve queue

The implementation of the seed queue in the original Nautilus has a lot of room for optimization.
The current implementation of fuzzuf is similar to the original one and should be improved.

[^mopt]: Chenyang Lyu, Shouling Ji, Chao Zhang, Yuwei Li, Wei-Han Lee, Yu Song, and Raheem Beyah. 2019. MOpt: Optimized Mutation Scheduling for Fuzzers. In Proceedings of the 28th USENIX Security Symposium (Security'19).
[^eclipser]: Jaeseung Choi, Joonun Jang, Choongwoo Han, and Sang K. Cha. 2019. Grey-box Concolic Testing on Binary Code. In Proceedings of the 41st ACM/IEEE International Conference on Software Engineering (ICSE'19).
[^qsym]: Insu Yun, Sangho Lee, Meng Xu, Yeongjin Jang, and Taesoo Kim. 2018. QSYM : A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing. In Proceedings of the 27th USENIX Security Symposium (Security'18).
2 changes: 1 addition & 1 deletion docs/algorithms/afl/algorithm_ja.md
@@ -31,7 +31,7 @@ fuzzuf afl --in_dir=path/to/initial/seeds/ -- path/to/PUT @@
  - `--log_file=path/to/log/file`
    - Specifies the file that records log output and, when built in debug mode, debug output.
    - If not specified, output goes to standard output.
- Local options (valid only for AFL)
  - `--dict_file=path/to/dict/file`
    - Specifies the path to an additional dictionary file.

99 changes: 99 additions & 0 deletions docs/algorithms/ijon/algorithm_en.md
@@ -0,0 +1,99 @@
# IJON

## What is IJON?

[IJON](https://github.com/RUB-SysSec/ijon/)[^ijon], proposed by [SysSec](https://informatik.rub.de/syssec/), consists of an annotation mechanism that allows PUTs to return new types of feedback to fuzzers and a fuzzer that supports that feedback. Many well-known fuzzers are coverage-guided fuzzers, which try to find new behavior in a program by receiving code coverage as feedback from PUTs. A typical coverage-guided fuzzer has the following weaknesses:

- It does not care about the order in which it obtained the code coverage. For example, suppose there is a bug whose triggering condition is "executing function B immediately after executing function A". Because the fuzzer cannot distinguish between the input that causes the bug and the one that causes the execution of function A after function B, it may try only the latter and overlook the bug. More to the point, many algorithms cannot distinguish between "the input that causes both function A and function B to be called" and "two inputs such that one of them executes only function A and the other executes only function B." If the latter two inputs are tested first, the fuzzer will not test the former one.
- Among the types of code coverage, path coverage can deal with this problem to some extent. However, there is a trade-off in that if a fuzzer over-stores inputs with different execution paths, it is more likely to retain similar inputs that yield the same fuzzing result, which will eventually reduce the overall efficiency of a fuzzing campaign. It is difficult to adjust this trade-off automatically.
- There may be some internal state changes that the code coverage cannot reveal. For example, as described in IJON's paper, consider the coordinates of the player in a game. The position of the player on the screen is likely to be important in discovering new states of the game. If the player is in the upper-left corner of the screen, the player may be closer to the coordinates that trigger a new event than in the lower-right corner. However, if the fuzzer uses only the code coverage, both positions produce the same feedback regardless of where the player is.

IJON proposes a simple solution to these problems: human annotation on PUTs. When building a PUT from source code and instrumenting it to obtain coverage, humans can add annotations to the source code to customize the feedback that the PUT gives to a fuzzer. There are various annotations provided by IJON that humans can use to specify what they consider to be important internal states. For example, annotations such as "record the maximum value of a variable in the feedback" or "record the minimum difference between two variables" are possible.
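
To make the "maximum value" idea concrete, here is a minimal sketch of how such feedback could be tracked. `MaxFeedback`, `Record`, and the slot count are hypothetical names and values chosen for this example; IJON's real annotations are the macros in its `afl-rt.h`, which this sketch does not reproduce.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of "record the maximum value of a variable" feedback.
constexpr std::size_t kNumSlots = 16;  // assumed size, for illustration only

struct MaxFeedback {
  std::array<std::uint64_t, kNumSlots> max_values{};  // zero-initialized

  // Returns true if `value` is a new maximum for `slot`, meaning the input
  // that produced it reached an internal state not seen before.
  bool Record(std::size_t slot, std::uint64_t value) {
    assert(slot < kNumSlots);
    if (value <= max_values[slot]) return false;
    max_values[slot] = value;
    return true;
  }
};
```

In the player-coordinate example, recording the x coordinate in one slot would let a fuzzer notice that reaching x = 120 is progress over x = 80, even when both runs cover the same edges.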

In practice, because IJON's fuzzer is implemented based on AFL, the feedback returned by a PUT is (hashed) edge coverage, which is passed to the fuzzer via shared memory. IJON therefore implements the annotations that users write in the source code as functions and macros that write values to the shared memory. These macros and functions are compiled in when the instrumentation tool instruments the PUT for edge coverage.
Thus, because IJON has an AFL-based fuzzer and an interface for harness description required in practical fuzzing, it has been implemented on fuzzuf to improve the applicability of fuzzuf.
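
The idea of an annotation as "a function that writes to the feedback map" can be sketched as follows. This is a simplified model, not IJON's code: the map is an ordinary `std::vector` standing in for the shared memory region, and `MapSet`, `MapInc`, `site_hash`, and the modulo indexing are assumptions made for this example. It loosely follows the shape of the `IJON_SET` and `IJON_INC` macros, which XOR a per-call-site hash with the annotated value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical annotation helper: mark the map entry selected by mixing a
// per-call-site hash with the annotated value. `map` stands in for the
// region shared between the PUT and the fuzzer.
void MapSet(std::vector<std::uint8_t>& map, std::uint32_t site_hash,
            std::uint32_t value) {
  assert(!map.empty());
  map[(site_hash ^ value) % map.size()] = 1;
}

// Hypothetical counterpart that counts occurrences instead of marking them.
void MapInc(std::vector<std::uint8_t>& map, std::uint32_t site_hash,
            std::uint32_t value) {
  assert(!map.empty());
  map[(site_hash ^ value) % map.size()] += 1;  // 8-bit counter, wraps at 256
}
```

Because different values of the annotated variable land in different entries, a fuzzer that only looks for newly touched entries treats a new value as new feedback, with no change to its coverage logic.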

## How to use fuzzuf's IJON CLI

To use IJON's fuzzer, first you need to prepare annotated PUTs with instrumentation tools.
Because fuzzuf doesn't have its own instrumentation tool, please visit [IJON's repo](https://github.com/RUB-SysSec/ijon/) and build the original instrumentation tool.

After you create a PUT and install `fuzzuf`, run

```bash
fuzzuf ijon -i path/to/initial/seeds/ path/to/PUT @@
```

to start IJON's fuzzer. The global options available are the same as for AFL.
For AFL options, see [AFL/algorithm_en.md](/docs/algorithms/afl/algorithm_en.md).

The local option for IJON is:

- `--forksrv 0|1`
  - If 1 is specified, fork server mode is enabled. It is enabled by default.

## Example Usage

You can test the original instrumentation tool and IJON's fuzzer in fuzzuf by building and fuzzing [test.c](https://github.com/RUB-SysSec/ijon/blob/master/test.c) and [test2.c](https://github.com/RUB-SysSec/ijon/blob/master/test2.c) found in IJON's repo. Note that test.c, as included in the latest commit (56ebfe34), may fail to compile; in that case, apply the following changes:

```diff
diff --git a/llvm_mode/afl-rt.h b/llvm_mode/afl-rt.h
index 616cbd8..28d5f9d 100644
--- a/llvm_mode/afl-rt.h
+++ b/llvm_mode/afl-rt.h
@@ -45,14 +45,14 @@ void ijon_enable_feedback();
void ijon_disable_feedback();

#define _IJON_CONCAT(x, y) x##y
-#define _IJON_UNIQ_NAME() IJON_CONCAT(temp,__LINE__)
+#define _IJON_UNIQ_NAME IJON_CONCAT(temp,__LINE__)
#define _IJON_ABS_DIST(x,y) ((x)<(y) ? (y)-(x) : (x)-(y))

#define IJON_BITS(x) ((x==0)?{0}:__builtin_clz(x))
#define IJON_INC(x) ijon_map_inc(ijon_hashstr(__LINE__,__FILE__)^(x))
#define IJON_SET(x) ijon_map_set(ijon_hashstr(__LINE__,__FILE__)^(x))

-#define IJON_CTX(x) ({ uint32_t hash = hashstr(__LINE__,__FILE__); ijon_xor_state(hash); __typeof__(x) IJON_UNIQ_NAME() = (x); ijon_xor_state(hash); IJON_UNIQ_NAME(); })
+#define IJON_CTX(x) ({ uint32_t hash = ijon_hashstr(__LINE__,__FILE__); ijon_xor_state(hash); __typeof__(x) IJON_UNIQ_NAME = (x); ijon_xor_state(hash); IJON_UNIQ_NAME; })

#define IJON_MAX(x) ijon_max(ijon_hashstr(__LINE__,__FILE__),(x))
#define IJON_MIN(x) ijon_max(ijon_hashstr(__LINE__,__FILE__),0xffffffffffffffff-(x))
diff --git a/test.c b/test.c
index 50b1b05..aa022f6 100644
--- a/test.c
+++ b/test.c
@@ -3,6 +3,7 @@
#include<assert.h>
#include<stdbool.h>
#include <stdlib.h>
+#include <stdint.h>

#define compare(x,y) IJON_CTX(compare_w((x),(y)))
bool compare_w(int x, int y){
```

For example, you can build test.c and fuzz the produced binary with the following commands:

```bash
$ (path_to_ijon)/llvm_mode/afl-clang-fast (path_to_ijon)/test.c -o test
$ mkdir /tmp/ijon_test_indir/ && echo hello > /tmp/ijon_test_indir/hello
$ fuzzuf ijon -i /tmp/ijon_test_indir/ ./test
```

Here, you don't need to specify `@@` in the last command because the binary receives inputs via stdin.

While test.c and test2.c give you an idea of how to use annotations, see the README and source code in IJON's repo for further usage.

## Algorithm Overview

IJON is implemented in a way that retains most of the functions of AFL, and adds additional functions. Roughly speaking, the differences from AFL are as follows:

- Some cases of havoc mutation are modified.
- IJON has its own seed queue, separate from the AFL seed queue.
  - For each element of a 64-bit non-negative integer array in shared memory, the IJON seed queue saves the seed that made the program record the maximum value in that element.
- At the beginning of the fuzzing loop, the procedure branches randomly.
  - 80% of the time, a seed is selected from the IJON seed queue. In this case, the fuzzer immediately moves to the havoc stage, and returns to the beginning of the fuzzing loop after a certain number of havoc mutations.
  - 20% of the time, a seed is selected from the AFL seed queue. In this case, mutation is performed in the same flow as the original AFL.
- After a PUT exits, the IJON seed queue is updated based on the feedback obtained from the PUT.
  - The IJON seed queue is updated even when the seed came from the AFL seed queue (the 20% case).
- Some of the constants are changed.
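
The queue update and the 80/20 branch described above can be sketched roughly as follows. This is an illustrative model, not fuzzuf's implementation: `IjonSlot`, `IjonQueueSketch`, `Update`, `ChooseSource`, and the fallback to the AFL queue when the IJON queue is still empty are all assumptions made for this example, and the havoc stage itself is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <random>
#include <string>
#include <vector>

// One slot per element of the 64-bit array in shared memory.
struct IjonSlot {
  std::uint64_t max_value = 0;
  std::optional<std::string> seed;  // input that produced max_value
};

// Hypothetical model of the IJON seed queue.
struct IjonQueueSketch {
  std::vector<IjonSlot> slots;

  explicit IjonQueueSketch(std::size_t n) : slots(n) {}

  // Called after every PUT execution, whichever queue the seed came from.
  // Returns how many slots this input improved.
  std::size_t Update(const std::vector<std::uint64_t>& feedback,
                     const std::string& input) {
    assert(feedback.size() == slots.size());
    std::size_t improved = 0;
    for (std::size_t i = 0; i < slots.size(); ++i) {
      if (feedback[i] > slots[i].max_value) {
        slots[i].max_value = feedback[i];
        slots[i].seed = input;
        ++improved;
      }
    }
    return improved;
  }

  bool HasSeeds() const {
    for (const IjonSlot& s : slots) {
      if (s.seed) return true;
    }
    return false;
  }
};

enum class Source { kIjonQueue, kAflQueue };

// Branch at the top of the fuzzing loop: IJON queue 80% of the time.
// Falling back to AFL while the IJON queue is empty is an assumption.
template <class Rng>
Source ChooseSource(const IjonQueueSketch& queue, Rng& rng) {
  if (!queue.HasSeeds()) return Source::kAflQueue;
  std::bernoulli_distribution use_ijon(0.8);
  return use_ijon(rng) ? Source::kIjonQueue : Source::kAflQueue;
}
```

A fuzzing loop built on this would call `ChooseSource` once per iteration, mutate and run the chosen seed, and then call `Update` with the array read back from shared memory.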