Merge pull request #40 from fuzzuf/docs/fuzzuf-ijon

docs: Add IJON Documents
fuzzuf · Mar 7, 2022 · a40c0e2
2 parents 78ada7d + 8aa518c
commit a40c0e2
Showing 7 changed files with 289 additions and 12 deletions.
diff --git a/README.md b/README.md
@@ -82,6 +82,7 @@ Note, when using fuzzuf from CLI, you have to separate global options (options a
 |---|---|---|---|---|---
 |AFL|Greybox|A re-implementation of general purpose fuzzer, representing a CGF. Also available as a template for its derivatives.|[How to use fuzzuf's AFL CLI](/docs/algorithms/afl/algorithm_en.md#how-to-use-fuzzufs-afl-cli)|[Algorithm Overview](/docs/algorithms/afl/algorithm_en.md#algorithm-overview)|:white_check_mark:
 |AFLFast|Greybox|An implementation of AFLFast, utilizing an AFL template.<br/>The algorithm tries to increase its performance by manipulating the power schedule.|[CLI Usage](/docs/algorithms/aflfast/algorithm_en.md#cli-usage)|[Algorithm Overview](/docs/algorithms/aflfast/algorithm_en.md#algorithm-overview)|:white_check_mark:
+|IJON|Greybox|A fuzzer that fuzzes PUTs in an *internal-state-aware* manner, guided by manual annotations added to the PUTs.|[CLI Usage](/docs/algorithms/ijon/algorithm_en.md#how-to-use-fuzzufs-ijon-cli)|[Algorithm Overview](/docs/algorithms/ijon/algorithm_en.md#algorithm-overview)
 |VUzzer|Greybox|A mutation-based fuzzer that guesses data structures by analyzing the control flow and the data flow of the PUT.|Read [Prerequisite](/docs/algorithms/vuzzer/algorithm_en.md#prerequisite) first, then [Usage on CLI](docs/algorithms/vuzzer/algorithm_en.md#usage-on-cli)|[Algorithm Overview](/docs/algorithms/vuzzer/algorithm_en.md#algorithm-overview)
 |libFuzzer|Greybox|CGF included in the LLVM project's compiler-rt libraries.|[How to use libFuzzer on fuzzuf](/docs/algorithms/libfuzzer/manual.md#how-to-use-libfuzzer-on-fuzzuf)|[What is libFuzzer?](/docs/algorithms/libfuzzer/algorithm_en.md#what-is-libfuzzer)
 |Nezha|Greybox|A fuzzer derived from libFuzzer that tries to find defects by running programs with different implementations on the same input and comparing their execution results (differential fuzzing).|[How to use Nezha on fuzzuf](/docs/algorithms/nezha/manual.md#how-to-use-nezha-on-fuzzuf)|TBD

diff --git a/TODO.md b/TODO.md
@@ -130,27 +130,37 @@ With this mode, the performance of fuzzuf's libFuzzer would be comparable to tha
 
 It's too bad `Mutator` has some raw pointers as its members, such as `u8 *Mutator::outbuf` and `u8 *Mutator::tmpbuf`. These members can be smart pointers or `std::vector`. We just want to replace them.
 
-### Implement resume mode and parallel fuzzing in AFL
+### Add CODING\_RULE.md and refactor the code in accordance with CODING\_RULE.md
+
+In the past, we had no explicit coding rules. Nevertheless, we continued developing fuzzuf simultaneously and almost independently of each other. As a result, the code base doesn't look well organized. This would confuse contributors and users, so we must fix it. We have almost finished creating CODING\_RULE.md internally and will release it once review and formatting are complete. In particular, because we started implementing libFuzzer at a very early stage, a large part of the libFuzzer implementation doesn't conform to those rules. We will resolve this issue gradually, simply because the affected code is too large to fix all at once.
+
+
+## To-Dos in each fuzzing algorithm (most of which don't require careful consideration)
+
+### AFL
+This section documents To-Dos of AFL.
+
+#### Implement resume mode and parallel fuzzing in AFL
 
 They are just unimplemented.
 
-### Implement SIGUSR1 Handling on AFL
+#### Implement SIGUSR1 Handling on AFL
 
 This feature is just unimplemented.
 
-### Remove careless templates from AFL
+#### Remove careless templates from AFL
 
 In the implementation of AFL, we use a lot of `template`s to allow users to define derived classes of `AFLTestcase` and `AFLState`. But this is just cutting corners. Let us explain what we've done with an example. Say we want to define a function that takes a reference to some struct as an argument. The struct has a member named "x". The function would look like the following:
 
-```
+```cpp
 void SomeFunc(const SomeStruct& stru) {
   std::cout << stru.x << std::endl;
 }
 ```
 
 Next, we would like to generalize this function so that it can accept similar struct types. Specifically, we should be able to pass the function instances of other structs that have the member "x". Obviously, we can do that in the following way:
 
-```
+```cpp
 template<class Struct>
 void SomeFunc(const Struct& stru) {
   std::cout << stru.x << std::endl;
@@ -159,7 +169,7 @@ void SomeFunc(const Struct& stru) {
 
 But another possible solution would be to define a virtual member function `SomeStruct::GetX()` and make the other structs derive from `SomeStruct`, like this:
 
-```
+```cpp
 // Define SomeStruct::GetX() in advance
 void SomeFunc(const SomeStruct& stru) {
   std::cout << stru.GetX() << std::endl;
@@ -168,21 +178,34 @@ void SomeFunc(const SomeStruct& stru) {
 
 We should rewrite the classes of AFL in the same way eventually.
 
+### IJON
+
+This section documents To-Dos of IJON.
+
+#### Implement annotations
+
+IJON proposes not just a fuzzer but the combination of a fuzzer and an annotation mechanism for PUTs.
+Unfortunately, the annotation mechanism is not implemented in fuzzuf because fuzzuf doesn't have its own instrumentation tool yet.
+This should be implemented immediately after fuzzuf-cc becomes ready.
+
+#### Test with Super Mario Bros.
+
+To show that our IJON fuzzer works reasonably well, one of the easiest tests to understand would be to check whether the fuzzer can play Super Mario Bros., as done in the IJON paper.
+
 ### Nautilus
-This section documents To-Dos of the Nautilus mode.
+
+This section documents To-Dos of Nautilus.
 
 #### Use vector instead of string
+
 The current parser/unparser of the grammar and rules uses `std::string` as its data pool instead of `std::vector<u8>`.
 This should be changed to `std::vector<u8>` because `std::string` is originally not meant to hold unprintable strings.
 
 #### Improve queue
+
 The implementation of the seed queue in the original Nautilus has a lot of room for optimization.
 The current implementation of fuzzuf is similar to the original one and should be improved.
 
-### Add CODING\_RULE.md and refactor the code in accordance with CODING\_RULE.md
-
-In the past, we didn't have no explicit coding rules. Nevertheless, we have continued developping fuzzuf simultaneously and almost independently of each other. As a result, the code base doesn't look well-organized. This would make the contributors and users confusing, so we must fix it. We have already almost finished creating CODING\_RULE.md internally. We will release it after review and formatting is complete. After Especially, because we started implementing libFuzzer at a very early stage, the large part of the implementation of libFuzzer doesn't conform to that rules. We will resolve this issue gradually simply because they are too large to fix immediately.
-
 [^mopt]: Chenyang Lyu, Shouling Ji, Chao Zhang, Yuwei Li, Wei-Han Lee, Yu Song, and Raheem Beyah. 2019. MOpt: Optimized Mutation Scheduling for Fuzzers. In Proceedings of the 28th USENIX Security Symposium (Security'19).
 [^eclipser]: Jaeseung Choi, Joonun Jang, Choongwoo Han, and Sang K. Cha. 2019. Grey-box Concolic Testing on Binary Code. In Proceedings of the 41st ACM/IEEE International Conference on Software Engineering (ICSE'19).
 [^qsym]: Insu Yun, Sangho Lee, Meng Xu, Yeongjin Jang, and Taesoo Kim. 2018. QSYM : A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing. In Proceedings of the 27th USENIX Security Symposium (Security'18).

diff --git a/docs/algorithms/afl/algorithm_ja.md b/docs/algorithms/afl/algorithm_ja.md
@@ -31,7 +31,7 @@ fuzzuf afl --in_dir=path/to/initial/seeds/ -- path/to/PUT @@
    - `--log_file=path/to/log/file`
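      - Specifies the file that records log output and, for debug builds, debug output.
      - If not specified, output goes to standard output.
- - Local options (valid only for AFL)
+- Local options (valid only for AFL)
   - `--dict_file=path/to/dict/file`
     - Specifies the path to an additional dictionary file.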
 

diff --git a/docs/algorithms/ijon/algorithm_en.md b/docs/algorithms/ijon/algorithm_en.md
@@ -0,0 +1,99 @@
+# IJON
+
+## What is IJON?
+
+[IJON](https://github.com/RUB-SysSec/ijon/)[^ijon], proposed by [SysSec](https://informatik.rub.de/syssec/), combines an annotation mechanism that allows PUTs to return new types of feedback to fuzzers with a fuzzer that supports that feedback. Many well-known fuzzers are coverage-guided fuzzers, which try to find new behavior in a program by receiving code coverage as feedback from PUTs. A typical coverage-guided fuzzer has the following weaknesses:
+
+- It does not care about the order in which it obtained the code coverage. For example, suppose there is a bug whose triggering condition is "executing function B immediately after executing function A". Because the fuzzer cannot distinguish between the input that causes the bug and the one that causes the execution of function A after function B, it may try only the latter and overlook the bug. More to the point, many algorithms cannot distinguish between "the input that causes both function A and function B to be called" and "two inputs such that one of them executes only function A and the other executes only function B." If the latter two inputs are tested first, the fuzzer will not test the former one.
+  - Among the types of code coverage, path coverage can deal with this problem to some extent. However, there is a trade-off in that if a fuzzer over-stores inputs with different execution paths, it is more likely to retain similar inputs that yield the same fuzzing result, which will eventually reduce the overall efficiency of a fuzzing campaign. It is difficult to adjust this trade-off automatically.
+- There may be some internal state changes that code coverage cannot reveal. For example, as described in IJON's paper, consider the coordinates of the player in a game. The position of the player on the screen is likely to be important in discovering new states of the game. A player in the upper-left corner of the screen may be closer to the coordinates of a new event than one in the lower-right corner. However, if the fuzzer uses only code coverage, both positions produce the same feedback.
+
+IJON proposes a simple solution to these problems: human annotation on PUTs. When building a PUT from source code and instrumenting it to obtain coverage, humans can add annotations to the source code to customize the feedback that the PUT gives to a fuzzer. There are various annotations provided by IJON that humans can use to specify what they consider to be important internal states. For example, annotations such as "record the maximum value of a variable in the feedback" or "record the minimum difference between two variables" are possible.
+
+In practice, because IJON's fuzzer is based on AFL, the feedback returned by the PUT is (hashed) edge coverage, which is passed to the fuzzer via shared memory. IJON therefore implements the annotations that users write in the source code as functions and macros that write values to that shared memory. These macros and functions are compiled in when the instrumentation tool instruments the PUT for edge coverage.
+Because IJON offers both an AFL-based fuzzer and an interface for describing harnesses, which practical fuzzing requires, it has been implemented in fuzzuf to broaden fuzzuf's applicability.
+
+## How to use fuzzuf's IJON CLI
+
+To use IJON's fuzzer, first you need to prepare annotated PUTs with instrumentation tools.
+Because fuzzuf doesn't have its own instrumentation tool, please visit [IJON's repo](https://github.com/RUB-SysSec/ijon/) and build the original instrumentation tool.
+
+After you create a PUT and install `fuzzuf`, run
+
+```bash
+fuzzuf ijon -i path/to/initial/seeds/ path/to/PUT @@
+```
+
+to start IJON's fuzzer. The global options available are the same as for AFL.
+For AFL options, see [AFL/algorithm_en.md](/docs/algorithms/afl/algorithm_en.md).
+
+The local option for IJON is:
+
+- `--forksrv 0|1`
+  - If 1 is specified, then fork server mode is enabled. It is enabled by default.
+
+## Example Usage
+
+You can test the original instrumentation tool and IJON's fuzzer in fuzzuf by building and fuzzing [test.c](https://github.com/RUB-SysSec/ijon/blob/master/test.c) and [test2.c](https://github.com/RUB-SysSec/ijon/blob/master/test2.c) found in IJON's repo. Note that test.c as of the latest commit (56ebfe34) may fail to compile; in that case, apply the following changes:
+
+```diff
+diff --git a/llvm_mode/afl-rt.h b/llvm_mode/afl-rt.h
+index 616cbd8..28d5f9d 100644
+--- a/llvm_mode/afl-rt.h
++++ b/llvm_mode/afl-rt.h
+@@ -45,14 +45,14 @@ void ijon_enable_feedback();
+ void ijon_disable_feedback();
+
+ #define _IJON_CONCAT(x, y) x##y
+-#define _IJON_UNIQ_NAME() IJON_CONCAT(temp,__LINE__)
++#define _IJON_UNIQ_NAME IJON_CONCAT(temp,__LINE__)
+ #define _IJON_ABS_DIST(x,y) ((x)<(y) ? (y)-(x) : (x)-(y))
+
+ #define IJON_BITS(x) ((x==0)?{0}:__builtin_clz(x))
+ #define IJON_INC(x) ijon_map_inc(ijon_hashstr(__LINE__,__FILE__)^(x))
+ #define IJON_SET(x) ijon_map_set(ijon_hashstr(__LINE__,__FILE__)^(x))
+
+-#define IJON_CTX(x) ({ uint32_t hash = hashstr(__LINE__,__FILE__); ijon_xor_state(hash); __typeof__(x) IJON_UNIQ_NAME() = (x); ijon_xor_state(hash); IJON_UNIQ_NAME(); })
++#define IJON_CTX(x) ({ uint32_t hash = ijon_hashstr(__LINE__,__FILE__); ijon_xor_state(hash); __typeof__(x) IJON_UNIQ_NAME = (x); ijon_xor_state(hash); IJON_UNIQ_NAME; })
+
+ #define IJON_MAX(x) ijon_max(ijon_hashstr(__LINE__,__FILE__),(x))
+ #define IJON_MIN(x) ijon_max(ijon_hashstr(__LINE__,__FILE__),0xffffffffffffffff-(x))
+diff --git a/test.c b/test.c
+index 50b1b05..aa022f6 100644
+--- a/test.c
++++ b/test.c
+@@ -3,6 +3,7 @@
+ #include<assert.h>
+ #include<stdbool.h>
+ #include <stdlib.h>
++#include <stdint.h>
+
+ #define compare(x,y) IJON_CTX(compare_w((x),(y)))
+ bool compare_w(int x, int y){
+```
+
+For example, you can build test.c and fuzz the produced binary with the following commands:
+
+```bash
+$ (path_to_ijon)/afl-clang-fast (path_to_ijon)/test.c -o test
+$ mkdir /tmp/ijon_test_indir/ && echo hello > /tmp/ijon_test_indir/hello
+$ fuzzuf ijon -i /tmp/ijon_test_indir/ ./test
+```
+
+Here, you don't need to specify `@@` in the last command because the binary receives inputs via stdin. If IJON's instrumentation tool and fuzzer operate properly, the fuzzer will detect a crash within 3 to 5 minutes with a high probability.
+
+While test.c and test2.c give you an idea of how to use annotations, see the README and source code in IJON's repo for further usage.
+
+## Algorithm Overview
+
+IJON is implemented in a way that retains most of the functions of AFL, and adds additional functions. Roughly speaking, the differences from AFL are as follows:
+
+- Some cases of havoc mutation are modified.
+- IJON has its own seed queue, apart from the AFL seed queue.
+  - For each element of a 64-bit non-negative integer array in shared memory, the IJON seed queue saves the seed that made a program record the maximum value in the element.
+- At the beginning of the fuzzing loop, the procedure branches randomly.
+  - 80% of the time, a seed is selected from the IJON seed queue. In this case, the fuzzer immediately moves to the havoc stage, and returns to the beginning of the fuzzing loop after a certain number of havoc mutations.
+  - 20% of the time, a seed is selected from the AFL seed queue. In this case, mutation is performed in the same flow as the original AFL.
+- After a PUT exits, the IJON seed queue is updated based on the feedback obtained from the PUT.
+  - The IJON seed queue is updated even when the seed was selected from the AFL seed queue (the 20% case).
+- Some of the constants are changed.
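
The scheduling and queue-update rules listed in the Algorithm Overview can be sketched roughly as follows. This is a reading aid, not part of the commit and not fuzzuf's actual code: `IjonQueueSketch`, `Update`, `Empty`, `ChooseSource`, and `Source` are hypothetical names, and falling back to the AFL queue while the IJON queue is still empty is an assumption that the overview does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch of the IJON seed queue: one slot per element of the
// 64-bit integer array that the PUT writes to shared memory.
struct IjonQueueSketch {
  std::vector<std::uint64_t> best;               // maximum value seen per slot
  std::vector<std::optional<std::string>> seed;  // input that produced it

  explicit IjonQueueSketch(std::size_t slots) : best(slots, 0), seed(slots) {}

  // Called after every PUT run, regardless of which queue the seed came from.
  // Stores the input in each slot whose previous maximum it exceeded and
  // returns the number of slots that were updated.
  std::size_t Update(const std::vector<std::uint64_t>& feedback,
                     const std::string& input) {
    std::size_t updated = 0;
    for (std::size_t i = 0; i < best.size() && i < feedback.size(); ++i) {
      if (feedback[i] > best[i]) {
        best[i] = feedback[i];
        seed[i] = input;
        ++updated;
      }
    }
    return updated;
  }

  // True while no slot holds a seed yet.
  bool Empty() const {
    for (const auto& s : seed) {
      if (s.has_value()) return false;
    }
    return true;
  }
};

enum class Source { kIjonQueue, kAflQueue };

// `roll` is expected to be drawn uniformly from [0, 100).
// 80% of the time the IJON queue is used (havoc stage only); otherwise the
// usual AFL flow runs. Assumption: an empty IJON queue always yields AFL.
Source ChooseSource(unsigned roll, bool ijon_queue_empty) {
  if (!ijon_queue_empty && roll < 80) return Source::kIjonQueue;
  return Source::kAflQueue;
}
```

For example, calling `ChooseSource(roll, queue.Empty())` with `roll` drawn uniformly from 0 to 99 would pick the IJON queue about 80% of the time once at least one slot holds a seed.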