
better heuristics for detecting un-instrumented code than raw edge count #2469

Closed
inferno-chromium opened this issue May 28, 2019 · 8 comments


@inferno-chromium
Collaborator

See WizardMac/ReadStat#181 (comment).
Also, we have a bunch of hacks accumulating in https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-runner/test_all#L121

@evanmiller
Contributor

Can you use sancov to count the number of covered functions?

https://clang.llvm.org/docs/SanitizerCoverage.html#default-implementation
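
For context, that same SanitizerCoverage document also describes custom trace-pc-guard callbacks. Below is a minimal sketch of counting covered functions that way, assuming the target is built with -fsanitize-coverage=func,trace-pc-guard and run without libFuzzer (which supplies its own callbacks); the guard bookkeeping follows the example in the docs, and the "covered functions" framing relies on the func level placing one guard per function entry. This is an illustration, not something anyone in the thread has implemented.

// Counts distinct coverage guards hit. With -fsanitize-coverage=func,trace-pc-guard
// each instrumented function gets a single guard at its entry, so the final
// count approximates the number of covered functions.
#include <cstdint>
#include <cstdio>
#include <cstdlib>

static uint64_t NumGuards = 0;   // guards across all instrumented modules
static uint64_t NumCovered = 0;  // guards hit at least once

// Called once per instrumented module with its [start, stop) guard range.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint32_t N = 0;  // running guard id, as in the SanitizerCoverage docs
  if (start == stop || *start) return;  // empty range or already initialized
  if (N == 0)
    std::atexit([] {
      std::fprintf(stderr, "covered %llu of %llu instrumented functions\n",
                   (unsigned long long)NumCovered,
                   (unsigned long long)NumGuards);
    });
  for (uint32_t *x = start; x < stop; x++) {
    *x = ++N;
    ++NumGuards;
  }
}

// Called at every instrumented coverage point; zero the guard so each point
// is counted only once.
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;
  *guard = 0;
  ++NumCovered;
}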

@jonathanmetzman
Contributor

Can you use sancov to count the number of covered functions?

Wouldn't this suffer from a similar problem? If we pick an arbitrary threshold, some legitimate targets will have fewer than that number. Maybe we could be smarter about the threshold by looking at the binary size?

@evanmiller
Contributor

I was thinking that, in the case of counting functions, the threshold would be 1 (or 2). But I don't understand coverage internals very well, and I'm grasping at straws here.

@Dor1s
Contributor

Dor1s commented May 29, 2019

The real problem here with small fuzz targets is resource allocation. To give you an example:

#include <cstddef>  // size_t
#include <cstdint>  // uint8_t
#include <string>
#include <vector>

// Do some computations with 'str', return the result.
// This function contains a bug. Can you spot it?
size_t DoStuff(const std::string &str) {
  std::vector<int> Vec({0, 1, 2, 3, 4});
  size_t Idx = 0;
  if (str.size() > 5)
    Idx++;
  if (str.find("foo") != std::string::npos)
    Idx++;
  if (str.find("bar") != std::string::npos)
    Idx++;
  if (str.find("ouch") != std::string::npos)
    Idx++;
  if (str.find("omg") != std::string::npos)
    Idx++;
  return Vec[Idx];
}

// Simple fuzz target for DoStuff().
// See http://libfuzzer.info for details.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  std::string str(reinterpret_cast<const char *>(data), size);
  DoStuff(str);  // Disregard the output.
  return 0;
}

This fuzz target has ~73 coverage edges (no matter whether we grep or obtain that number by any other means, as @jonathanmetzman pointed out). The current threshold is 100.
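
For reference, the instrumented edge count itself is easy to observe; this is an illustrative sketch rather than how test_all necessarily measures it. Assuming a build with -fsanitize-coverage=inline-8bit-counters and no fuzzing engine linked in (libFuzzer provides this callback itself), the documented __sanitizer_cov_8bit_counters_init hook receives exactly that many counters:

#include <cstdio>

// Documented SanitizerCoverage callback: called once per module built with
// -fsanitize-coverage=inline-8bit-counters, with that module's counter range.
// The size of the range is the number of instrumented edges.
extern "C" void __sanitizer_cov_8bit_counters_init(char *start, char *stop) {
  std::fprintf(stderr, "module instrumented with %ld edges\n",
               (long)(stop - start));
}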

I don't think we should allocate even one CPU-day for such a trivial fuzz target. If we could allocate fractional CPUs (e.g. run such a small fuzz target for just one hour a day), we might let them through, but even then it would be fuzzing done wrong.

Regarding the hacky whitelisting in https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-runner/test_all#L121, I totally agree that we should get rid of it. I'm fine with removing or disabling those fuzz targets altogether. At the current scale of OSS-Fuzz, I don't think these minor, almost useless targets should distract us.

@evanmiller
Contributor

I agree that the marginal value of running a small fuzzing target for the 24th hour is very small. However, in terms of catching regressions, the marginal value of running the target on a new commit for even the first 60 seconds is quite high. These small targets may be sub-grammars buried deep in a complicated file format, and otherwise awkward to fuzz. Perhaps the scheduler should be more intelligent about prioritizing targets based on the marginal likelihood of finding a bug.

@Dor1s
Contributor

Dor1s commented May 29, 2019

Absolutely. That's one of the key elements of our Ideal Integration guidance. Regression testing should happen upstream and/or be embedded into the developers' workflow; from fuzzing, you just need a good corpus. For a trivial fuzz target, a good corpus can be generated very quickly, even without an infrastructure like ClusterFuzz.
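
To make the "corpus as a regression test" part concrete, here is a minimal sketch, not ClusterFuzz's or any project's actual harness: the fuzz target is compiled together with a small standalone driver (the file name here is hypothetical) that replays saved corpus files through LLVMFuzzerTestOneInput under the project's usual sanitizers, so no fuzzing engine is needed in CI.

// standalone_replay.cc (hypothetical name): feeds each file passed on the
// command line to the fuzz target entry point, so a saved corpus can run as a
// plain regression test.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);

int main(int argc, char **argv) {
  for (int i = 1; i < argc; i++) {
    std::FILE *f = std::fopen(argv[i], "rb");
    if (!f) {
      std::perror(argv[i]);
      return 1;
    }
    std::fseek(f, 0, SEEK_END);
    long len = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    std::vector<uint8_t> buf(len > 0 ? (size_t)len : 0);
    if (!buf.empty() && std::fread(buf.data(), 1, buf.size(), f) != buf.size()) {
      std::fprintf(stderr, "failed to read %s\n", argv[i]);
      std::fclose(f);
      return 1;
    }
    std::fclose(f);
    LLVMFuzzerTestOneInput(buf.data(), buf.size());
    std::printf("ran %s (%zu bytes)\n", argv[i], buf.size());
  }
  return 0;
}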

@evanmiller
Contributor

That makes sense.

It would help me to have more clarity on what belongs in ClusterFuzz vs what belongs upstream. I was happy to include only the large, "full format" fuzzers until @Google-Autofuzz opened WizardMac/ReadStat#181 requesting the smaller fuzzers be included. Now that I have spent time on those smaller fuzzers, I am being told they are a potential waste of resources.

As a project owner, it would help me if Google-Autofuzz and OSS-Fuzz came up with a coherent policy on acceptable fuzzing targets before making these sorts of requests of project owners. I don't really care which fuzzers are ultimately included, but I do find the current lack of clarity (and the unexplained urgency for including more fuzzing targets) a bit frustrating.

@Dor1s
Contributor

Dor1s commented May 29, 2019

@evanmiller, I'm really sorry for that confusion. We're still adjusting some elements of collaboration with @Google-Autofuzz, so this situation we've put you in is definitely unfortunate.

Also, note that the point about a potential waste of resources is just my opinion; we still need to discuss it with the team, and maybe we'll end up accepting small fuzz targets and lowering the requirements. I'd say stay tuned :)
