Soliciting feedback on a (currently failing) attempt to improve "stability" by tweaking the compiler #2

gasche · 2017-12-08T18:00:07Z

I noticed that afl-persistent and crowbar have run into issues with
"stability" that are to be handled at the level of the compiler code
generation strategy:

ocaml/ocaml#1345
(disabling class caching in afl-instrument mode)

and

stedolan/crowbar#14
(yet-unfixed stability issue with `Lazy.force)

are instances of this.

Energized by my use of Crowbar during the excellent Mirage retreat
this year, I have wondered if it was something I could help with, and
tried to implement pieces of a solution at the level of the OCaml
compiler. Specifically I would like to make it easy to say, in the
compiler, "don't instrument this branch" or something like that, and
then this opt-out mechanism could be used to solve those stability
issues. This is the dream; what I have right now is a branch that does
what I wanted it to do, but does not solve stability issues in any way
that I was able to measure (so maybe I wanted the wrong thing). So,
very Work In Progress. I'm posting here to solicit feedback and
comments (cc @stedolan and @yomimono); in particular, I'm curious if
someone think that this approach has a chance to ever work.

The branch is afl on my personal fork:

https://github.com/gasche/ocaml/tree/afl

there are commits that describe what it does

https://github.com/gasche/ocaml/commits/afl
https://github.com/gasche/ocaml/compare/trunk...gasche:afl?expand=1

Understanding of the problem

The stability issues, if I understand correctly (but then I don't
actually understand what "stability" means in afl-fuzz parlance), come
from code that will run into different control path if they are
executed several times from the same input, when the change-of-control
mechanism are outside the view of the tool. (In the classes case, it
is caching; in the Lazy.force case, if @stedolan's analysis of it is
correct, it is the non-deterministic effect of a GC removing a Forward
tag before or after the user tries to force the already-evaluated
thunk for a second time).

So if we don't want to change how the compiler produces code (I would
rather not), we have to find how to selectively disable the
instrumentation point that are involved in these un-stable
conditionals. For class caching, I had the impression that if we could
make sure that neither the "has the method table already been
computed" test nor any of the method-table-computation code was
instrumented, we would recover stability. For the lazy forcing, I had
the impression that if the only instrumentation points to run were the
ones involved in forcing the trunk, and one when the final value is
returned, then that would be correct (I'm not convinced anymore).

Design of a solution attempt

It seems impossible or at least extremely fragile to try to annotate
individual control-flow expressions at the level of OCaml or Lambda
code, and hope to robustly control instrumentation with that, because
instrumentation happens at the Cmm level and further control-flow
constructs may be inserted by the compiler in the middle.

So I think one would need a way to disable all instrumentation within
one portion (defined as static code zone or dynamic fragment of
execution) of the program. (It would also be useful to be able to
share instrumentation points across two program locations, but that
sounded more difficult to implement so I haven't worked on it.)

My solution attempt is to introduce two primitives that would look
essentially as follows:

val suspend_afl : unit -> unit
val restore_afl : unit -> unit

Between a call to suspend_afl and the nest call to restore_afl, no
instrumentation happens.

I implement these two primitives in the commits

"add {suspend,restore}afl primitives to selectively silence path collection"
(this propagates the primitives from the frontend to cmm,
where I expect that they will be given a semantics in afl_instrument.ml)
"selectgen: interpret {suspend,restore}afl as no-ops"
(this ignores then in instruction selection, that is after cmm)

(I'm using commit names instead of hashes as I expect the hashes to
change as I keep rebasing the PR.)

Attempt 1: disabling instrumentation dynamically

The dumbest possible implementation for this is to have a piece of
global state, a boolean, to indicate whether instrumentation is
currently active or suspended:

the {suspend,restore} operations are compiled into writes that
change that global state
the instrumentation code is changed to first check this global flag
before performing any action

I implemented this approach in

"semantics of {suspend,restore}afl"

(I realize as I'm writing this report that this implementation does
not bracket well, calling "suspend" twice and then "restore" only once
restores execution, and that incrementing and decrementing an integer
would work better. Easy to implement.)

Attempt 2: disabling instrumentation statically

When I started thinking about {suspend,restore}_afl, I had in mind
something closer to static annotations that would direct the
non-emission of afl instrumentation code. Basically the idea is that
after you encounter suspend_afl, you stop emitting insturmentation
code, and you start again on the next restore_afl.

The naive implementation of this assumes that function calls always
start with instrumentation enabled, so that instrumentation always
restarts during inner function calls (so it is not equivalent to the
static semantics above). This is not good for the class caching
scenario, when we want the no-insturmentation zone to cross function
calls. But it would be easy to add a blacklist of specific functions
from CamlinternalOO to start in suspended mode, or to just add
well-placed (suspendafl) instructions at the beginning of those
functions (as a ppx extension or what not).

The way this "static" policy is implemented may qualify as a "cool
hack", as in: I think it's cool, but it is also a horrible
hack. Anyway, this was just for experiment, and I checked the output
with -dcmm and it seems to be doing what I want:

"afl: make the afl_status a purely static property"

Negative results

I haven't tried to measure stability of object method table
caching. I tried to measure stability of thunk forcing, and the
results are terrible: none of what I have done improves stability, in
fact it decreases it.

I made two attempts at artfully placing {suspend,restore}_afl
instructions in the Lazy.force generated code:

"try to use {suspend,restore}afl primitives to stabilize Lazy forcing"
"{suspend,restore}afl in lazy_force: try to better bracket the instructions"

The code emitted by the compiler for Lazy.force looks as follows:

match Obj.tag obj with
| Lazy_tag -> force_the_thunk_of obj
| Forward_tag -> obj.(0)
| _ -> obj

and I tried to change it as follows (in the first commit):

suspend_afl ();
match Obj.tag obj with
| Lazy_tag -> restore_afl (); force_the_thunk_of obj
| Forward_tag -> obj.(0)
| _ -> obj

or as follows (in the second commit)

suspend_afl ();
let result =
  match Obj.tag obj with
  | Lazy_tag -> restore_afl (); force_the_thunk_of obj
  | Forward_tag -> obj.(0)
  | _ -> obj
in
restore_afl ();
result

Finally, I decided that what I really wanted was to have both the
Forward_tag and the _ branches assigned the same instrumentation
point, so I wrote a different version of the lazy-forcing code
generation strategy that looks like this:

if Obj.tag obj = Lazy_tag
then begin
  force_the_thunk of obj
end else begin
  suspend_afl ();
  let result =
    if Obj.tag obj = Forward_tag then obj.(0) else obj
  in
  restore_afl ();
  result
end

None of that seems to work: without my changes, my reproduction code
(below) sports a stability of 57.14%, and some of what I've done bring
stability down to 0%, or 33%, or at best 52.94%.

It was not easy for me to find code to reproduce the stability issue
for laziness, below is my repro-case. Is it sensible?

let test = lazy (print_endline "foo")

let f () =
  ignore (List.init 10 (fun _ -> Lazy.force test))

let _ = AflPersistent.run f

The text was updated successfully, but these errors were encountered:

gasche · 2017-12-09T09:49:39Z

I realized that I had more questions while writing this. @stedolan, why did you choose to perform the afl-fuzz instrumentation at the cmm level rather than earlier in the pipeline? Here is my current understanding:

advantage: it is easier to generate the low-level instrumentation code at the Cmm level
disadvantage: you encounter control-flow branches that do not come from the user-written code, but are the result of compilation

Or is the latter point actually an advantage, that you cover more of the "implicit" control-flow of the program? Intuitively I would rather expect that instrumentation should correspond to user-visible control flow, and indeed all stability examples come from compiler-inserted control flow being instrumented "too much", but maybe instrumenting at the Lambda level would instrument "not enough"? (do we have an example?) (Also, if it's important to instrument more, then why isn't the C runtime instrumented as well? Would different allocation behaviors generate way too many paths within the GC for usable fuzzing?)

One implementation approach that I could see would be to introduce an instrument_afl primitive at the Lambda level, insert it on the Lambda code, propagate it to the backend and turn it into proper afl instrumentation code in Cmmgen. Tweaking the instrumentation could then be done by changing the way those primitives are placed in the lambda code.

(I think it would actually be even nicer to separate the instrumentation action from the instrumentation "location", so that we have the ability to share the instrumentation location across several distinct source positions. So it could be

val new_instrumentation_point : unit -> instrumentation_point
val instrument_afl : instrumentation_point -> unit

)

stedolan · 2017-12-11T13:59:15Z

Thanks for diving into this tricky problem, and thanks for the excellent write-up!

Your understanding of the problem matches mine. I do prefer the static to the dynamic approach here, (although maybe with a simpler implementation!). For testing, it's worth exploring the afl-whatsup script, which dumps the instrumentation output (in a bizarre format, sadly). There are some stability unit tests in testsuite/tests/afl-instrumentation on trunk which might be some help.

I think that your repro case is right, although it's dependent on when the GC runs. Here is a reproduction case that's a bit easier to reproduce, as a patch against the unit tests in trunk.

@stedolan, why did you choose to perform the afl-fuzz instrumentation at the cmm level rather than earlier in the pipeline?

This is a very good question. The honest answer is that this matches most closely what afl itself does (when instrumenting C++), and I didn't really understand the system all that well at the time. I think adding instrumentation at the Lambda level might well be a better approach. I think your instrument_afl primitive sounds promising.

gasche · 2017-12-11T14:34:26Z

Thanks for the repro case. I played with your testsuite a bit (it certainly served as inspiration for my own reproduction attempts), but I'm mildly suspicious of it because, unless I'm mistaken, "classes" is currently failing in trunk.

stedolan · 2017-12-11T14:43:15Z

"classes" is currently failing in trunk.

That's news to me! It works on my machine. Do you have some details of a configuration where it fails?

gasche · 2017-12-11T15:06:20Z

It fails when I configure the compiler with ./configure -afl-instrument (otherwise it works).

testsuite/tests/afl-instrumentation$
  OCAMLOPT="../../../ocamlopt.opt -nostdlib -I ../../../stdlib" sh test.sh | grep '\(passed\|failed\)'
         lists: passed.
     manylists: passed.
    exceptions: passed.
       objects: passed.
       classes: failed:
  obj_ordering: passed.

(current trunk state is 7a7bf0580d8ad84a44e2b495c400ce0e1e3a095c)

stedolan · 2017-12-11T15:56:46Z

Ah, so it does! I was compiling without configure -afl-instrument. The difference is in whether the standard library is instrumented. I suspect in this case it's instrumentation of camlInternalOo that's breaking things.

stedolan · 2018-05-03T09:19:23Z

I was thinking about the aflInstrument primitive idea again, and I'm no longer convinced it's the right place to put instrumentation, because of deep pattern matching. By putting instrumentation at the Cmm (or Lambda) level, the instrumentation picks up information about patterns that partially matched, but before Lambda deep patterns are atomic.

The ideal place to put instrumentation would be somewhere after pattern matches are expanded to branches, but before objects, Lazy, etc. are compiled away. Unfortunately, those are all done at the same time in the compiler.

gasche · 2018-05-03T09:30:41Z

We could have a primitive available at several places in the pipeline, with the social contract that producing a new atomic control-flow branch (if or plain switch) requires thinking about whether the destination of the branch should be instrumented or not.

The difficulty is that I don't know how to think of instrumentation in a modular way. As a code producer, how do I know whether I was called in a context where instrumentation should or should not happen? Should this intent (coming from my caller) be expressed as additional configuration parameters in the compiler codebase, or as special instructions to scope instrumentation in the emitted code? In either case, can/should this information cross function calls or other abstraction boundaries?

stedolan · 2023-05-22T21:12:05Z

(while I'm still open to continuing this discussion, I'm closing the issue because I don't think it's a thing to be fixed in this repo)

gasche changed the title ~~Soliciting feedback on a (currently failing attempt) to improve "stability" by tweaking the compiler~~ Soliciting feedback on a (currently failing) attempt to improve "stability" by tweaking the compiler Dec 9, 2017

gasche mentioned this issue Feb 13, 2018

afl-fuzz instrumentation fix for classes ocaml/ocaml#1345

Merged

stedolan mentioned this issue May 3, 2018

AFL stability fixes for objects (MPR#7725) and lazy values. ocaml/ocaml#1754

Merged

stedolan closed this as completed May 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Soliciting feedback on a (currently failing) attempt to improve "stability" by tweaking the compiler #2

Soliciting feedback on a (currently failing) attempt to improve "stability" by tweaking the compiler #2

gasche commented Dec 8, 2017

gasche commented Dec 9, 2017

stedolan commented Dec 11, 2017

gasche commented Dec 11, 2017

stedolan commented Dec 11, 2017

gasche commented Dec 11, 2017

stedolan commented Dec 11, 2017

stedolan commented May 3, 2018

gasche commented May 3, 2018

stedolan commented May 22, 2023

Soliciting feedback on a (currently failing) attempt to improve "stability" by tweaking the compiler #2

Soliciting feedback on a (currently failing) attempt to improve "stability" by tweaking the compiler #2

Comments

gasche commented Dec 8, 2017

Understanding of the problem

Design of a solution attempt

Attempt 1: disabling instrumentation dynamically

Attempt 2: disabling instrumentation statically

Negative results

gasche commented Dec 9, 2017

stedolan commented Dec 11, 2017

gasche commented Dec 11, 2017

stedolan commented Dec 11, 2017

gasche commented Dec 11, 2017

stedolan commented Dec 11, 2017

stedolan commented May 3, 2018

gasche commented May 3, 2018

stedolan commented May 22, 2023