Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rejecting unwanted inputs? #214

Open
msyrota opened this issue Nov 27, 2022 · 9 comments
Open

Rejecting unwanted inputs? #214

msyrota opened this issue Nov 27, 2022 · 9 comments

Comments

@msyrota
Copy link

msyrota commented Nov 27, 2022

I use repeated messages in my fuzzing harness and length 0 (which it can supply) is unfortunately invalid for me. In a "classic" libFuzzer harness, whenever I get invalid input, I can reject it by returning -1, however for some reason libprotobuf-mutator ignores TestOneInput return code and I get invalid protobufs in my corpus. I've tried the following to propagate the return code, but to no avail:

diff --git a/src/libfuzzer/libfuzzer_macro.h b/src/libfuzzer/libfuzzer_macro.h
index b5cb201..70c3ff4 100644
--- a/src/libfuzzer/libfuzzer_macro.h
+++ b/src/libfuzzer/libfuzzer_macro.h
@@ -74,8 +74,8 @@
     using protobuf_mutator::libfuzzer::LoadProtoInput;                      \
     Proto input;                                                            \
     if (LoadProtoInput(use_binary, data, size, &input))                     \
-      TestOneProtoInput(input);                                             \
-    return 0;                                                               \
+      return TestOneProtoInput(input);                                      \
+    return -1;                                                              \
   }
 
 #define DEFINE_POST_PROCESS_PROTO_MUTATION_IMPL(Proto) \
@@ -83,7 +83,7 @@
       protobuf_mutator::libfuzzer::PostProcessorRegistration<Proto>;
 
 #define DEFINE_PROTO_FUZZER_IMPL(use_binary, arg)                 \
-  static void TestOneProtoInput(arg);                             \
+  static int TestOneProtoInput(arg);                              \
   using FuzzerProtoType =                                         \
       protobuf_mutator::libfuzzer::macro_internal::GetFirstParam< \
           decltype(&TestOneProtoInput)>::type;                    \
@@ -91,7 +91,7 @@
   DEFINE_CUSTOM_PROTO_CROSSOVER_IMPL(use_binary, FuzzerProtoType) \
   DEFINE_TEST_ONE_PROTO_INPUT_IMPL(use_binary, FuzzerProtoType)   \
   DEFINE_POST_PROCESS_PROTO_MUTATION_IMPL(FuzzerProtoType)        \
-  static void TestOneProtoInput(arg)
+  static int TestOneProtoInput(arg)
@ligurio
Copy link
Contributor

ligurio commented Jul 11, 2023

@vitalybuka Vitaly, in Tarantool we are using an LPM-based fuzzer for Lua runtime and want to reject unwanted inputs too. It is possible to implement this?

@vitalybuka
Copy link
Collaborator

When LPM was implemented, it was not possible.
And I didn't notice when it was implemented, (even I was on review https://reviews.llvm.org/D128749).

Yes, we should do that.

@KanishAnand
Copy link

We also need to reject unwanted inputs and avoid them from polluting the corpus. Are there any updates on this feature?
We would be happy to contribute if we can agree on a design @vitalybuka

@vitalybuka
Copy link
Collaborator

Yes, we need to provide backward compatibility with existing fuzzers.

BTW why PostProcessorRegistration is not enough?

@KanishAnand
Copy link

Modifying the input via PostProcessRegistration seems infeasible, as correcting invalid inputs in our case is non-trivial. The input proto message resembles a repeated field of operations where values of each operation depends on the previous ones. Thus we believe rejecting invalid inputs to avoid polluting our corpus would be a more suitable approach.

Regarding supporting backward compatibility, we also have a few ideas it could be done:

  1. Adding another macro like #DEFINE_TEST_ONE_PROTO_INPUT_IMPL_WITH_RETURN which declares TestOneProtoInput(arg) with return type and this can be enabled via some preprocessor directive. So developers can use existing DEFINE_PROTO_FUZZER but define a macro if they want to use the implementation supporting return type.
#define LIBPROTOBUFMUTATOR_ENABLE_RETURN_CODE

#define DEFINE_PROTO_FUZZER_IMPL(use_binary, arg)
  ...
  #ifdef LIBPROTOBUFMUTATOR_ENABLE_RETURN_CODE
      DEFINE_TEST_ONE_PROTO_INPUT_IMPL_WITH_RETURN(use_binary, FuzzerProtoType)   
      static int TestOneProtoInput(arg)
  #else 
      DEFINE_TEST_ONE_PROTO_INPUT_IMPL(use_binary, FuzzerProtoType)   
      static void TestOneProtoInput(arg)
  ...
  1. Adding an optional output argument depicting the return value to TestOneProtoInput which can be set by developers in
    cases of invalid inputs.
    static void TestOneProtoInput(arg, outputArg=0) 
    DEFINE_PROTO_FUZZER (...) {
         if(invalidInput()) {
             outputArg = -1;
         }
    }
  1. Adding a whole new macro like #DEFINE_PROTO_FUZZER_WITH_RETURN, which declares TestOneProtoInput with return value.

Would be happy to contribute/discuss further.

@vitalybuka
Copy link
Collaborator

Modifying the input via PostProcessRegistration seems infeasible, as correcting invalid inputs in our case is non-trivial. The input proto message resembles a repeated field of operations where values of each operation depends on the previous ones. Thus we believe rejecting invalid inputs to avoid polluting our corpus would be a more suitable approach.

You can modify to some unique constant value, e.g. empty proto.

Regarding supporting backward compatibility, we also have a few ideas it could be done:

  1. Adding another macro like #DEFINE_TEST_ONE_PROTO_INPUT_IMPL_WITH_RETURN which declares TestOneProtoInput(arg) with return type and this can be enabled via some preprocessor directive. So developers can use existing DEFINE_PROTO_FUZZER but define a macro if they want to use the implementation supporting return type.
#define LIBPROTOBUFMUTATOR_ENABLE_RETURN_CODE

#define DEFINE_PROTO_FUZZER_IMPL(use_binary, arg)
  ...
  #ifdef LIBPROTOBUFMUTATOR_ENABLE_RETURN_CODE
      DEFINE_TEST_ONE_PROTO_INPUT_IMPL_WITH_RETURN(use_binary, FuzzerProtoType)   
      static int TestOneProtoInput(arg)
  #else 
      DEFINE_TEST_ONE_PROTO_INPUT_IMPL(use_binary, FuzzerProtoType)   
      static void TestOneProtoInput(arg)
  ...
  1. Adding an optional output argument depicting the return value to TestOneProtoInput which can be set by developers in
    cases of invalid inputs.
    static void TestOneProtoInput(arg, outputArg=0) 
    DEFINE_PROTO_FUZZER (...) {
         if(invalidInput()) {
             outputArg = -1;
         }
    }
  1. Adding a whole new macro like #DEFINE_PROTO_FUZZER_WITH_RETURN, which declares TestOneProtoInput with return value.

Would be happy to contribute/discuss further.

what if we do
static void TestOneProtoInput(arg) - > static auto TestOneProtoInput(arg)

and then some template magic in DEFINE_TEST_ONE_PROTO_INPUT_IMPL to return 0, or value, depending on TestOneProtoInput type?

@KanishAnand
Copy link

You can modify to some unique constant value, e.g. empty proto.

We wouldn't prefer doing this because it would require adding all input validation checks under PostProcessRegistration. This would result in duplicating all the pre-conditions checks that are currently performed separately under public API calls for each operation. Additionally, this approach would be quite expensive, as we would have to execute a lot of commands before determining they are invalid, and if they are valid we have to redo all that work.

what if we do static void TestOneProtoInput(arg) - > static auto TestOneProtoInput(arg)

and then some template magic in DEFINE_TEST_ONE_PROTO_INPUT_IMPL to return 0, or value, depending on TestOneProtoInput type?

We tried working with this suggestion but this might not be possible without adding an extra #define/ some structure change.
I have a basic code here which fails compilation because auto type needs to be deduced before defining DEFINE_TEST_ONE_PROTO_INPUT_IMPL which could have been corrected via two ways (as far as I know):

  1. Defining TestOneProtoInput before DEFINE_TEST_ONE_PROTO_INPUT_IMPL, which is not possible with current DEFINE_PROTO_FUZZER's macro structure as the auto function has to be the last thing for user to define.
  2. Templatizing LLVMFuzzerTestOneInput function which would delay it's compilation until it's instantiated which would be after TestOneProtoInput is already defined and would work as shown here. However this can't be done as LLVMFuzzerTestOneInput is a C-type function.

@KanishAnand
Copy link

KanishAnand commented Sep 21, 2023

I don't think it would be possible to deduce the return type as code will always appear at the end after macro. Would you be happy with any of these suggestions.

@KanishAnand
Copy link

KanishAnand commented Mar 7, 2024

Drafted a prototype PR with above solutions, happy to discuss further.
@vitalybuka

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants