Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upMultiple changes; python 2 support, dictionary, mutator refactor #26
Conversation
The tool is now able to run on Python 2 with a suitable LRU Cache. Obviously Python 2 isn't as useful to many people, but when working with libraries that are Python 2-only, it's necessary to make the tools work with it.
Because the 'NEW' state was being logged before the coverage count was updated, the log line would not include the new coverage count. This change ensures that the log line reports the new coverage count.
The mutators were supplied inline with the corpus mutation loop. This makes it tedious to extend, and difficult to filter out mutations which are not interesting. The code has been reworked here so that... * Each mutator is its own class. * Each class can provide information about what it does, such as its name and the types of mutations it performs. * Each class is registered into a list of classes that are available. * The Corpus instantiates these classes when it is intialised, and could (but does not at this time) filter the list as necessary. The name isn't even used yet. * Mutators can return None to say that they're not appropriate. This means that adding a new mutator is a matter of creating a new class, in the same style as the existing ones, and giving information on what the mutator does. Mutators could be based on one another - so for example the 'swap' mutator could be reworked to exchange variable lengths of values, rather than only bytes, and then subclassed to produce short, long and longlong variants. This has not been done here. Previously, the code attempted to retry applying mutators if they were not deemed appropriate; this was ineffective because they merely tried to decrement the iteration count, which did not affect the iterations at all - it looks like the code was originally using C-style for loops where the variable controls the termination, whilst in Python the range controls the iteration of this loop. This has been replaced by the mutator returning None to signal that it is inappropriate, and a loop in the caller repeats the selection of a new mutator.
Mutators can now be listed as part of a help command, and may then be filtered by the user supplying a filter specification to disable or enable only certain mutators.
Previously there was always a likelihood that we would terminate our retries if the mutator said that it was unable to be used, because we had a number of mutators that were unconditional. However, now that the mutators are able to be filtered, it is possible to select a set of mutators which may always claim they are inappropriate. In such a case, we would loop forever. This change bounds the retries on looking for a mutator to 20 attempts - an arbitrary number I picked from the air as seeming reasonable.
This commits adds the support for dictionaries (https://llvm.org/docs/LibFuzzer.html#dictionaries), to help fuzzers increase their coverage faster. It seems that there is a bug in the _copy function, because the word is correctly inserted, but it seems that the padding after it is wrong, and I couldn't understand why. Although to be honest, I didn't spent much time on it, since I'd like to have feedback on this PR before investing more debug time. The implementation is pretty crude, it silently ignore invalid lines in the dictionary file, and is likely using words in the corpus a bit too often.
A small typo in the word dictionary, fixed.
The dictionary reader can now handle escaped strings in its vales, as given in the AFL examples.
Raw binary files are supported by the AFL dictionaries, and matter most for the cases where we're dealing with binary chunks that would otherwise be tedious to insert into the token file.
The dictionary mutator wasn't actually returning the correct values, so was always claiming to fail.
Knowing which type of mutators are in use helps to favour certain kinds of operations. In particular, if you start with a small dictionary of words and wish to generate longer strings, using just an option that appends to the dictionary is useful. The dictionary append operation has been added which allows the values from the dictionary to be strung together in increasingly longer sequences. It doesn't offer the option of appending multiple values from the dictionary so the operation my end up in a local minima, but such mutators could be added in the future.
The kill operation is never a good choice for stopping a subprocess - it does not give the subprocess any chance to clean up. It's more usual to try a terminate and later kill if the process did not stop. More importantly to me, the kill method isn't present in the python 2 multiprocessing module.
| @@ -134,9 +156,9 @@ def start(self): | |||
| break | |||
|
|
|||
| buf = self._corpus.generate_input() | |||
| parent_conn.send_bytes(buf) | |||
| parent_conn.send_bytes(bytes(buf)) | |||
This comment has been minimized.
This comment has been minimized.
yevgenypats
Jan 8, 2020
Contributor
Isn't it bytes already? I think this might slow down the fuzzer if it's an unnecessary copy.
This comment has been minimized.
This comment has been minimized.
gerph
Jan 8, 2020
Author
Contributor
There's a significant difference in how Python 2 and Python 3 handle 'bytes'.
In python 3, bytes is a list of vaules which when iterated over, or indexed, are just integers.
In python 2, bytes is a str, which when interated over, or indexed, contains single character str.
This difference means that working in the mutators with bytes is impractical - the code would need to handle both the cases of the elements of the iteration being integers and being str objects. That would be... tiresome and difficult to work with... but, if instead the elements are handled as bytearrays, this situation is simplified.
In both python 2 and python 3, bytearray is able to be initialised just like a bytes object, and iterates as a set of integers. It can be converted to a bytes object easily without encoding issues, and manipulated just like we were handling the bytes object. Because we're actually working with a list of bytes which we want to mutate, a bytearray is actually the ideal container to use. The python 3 documentation describes it as:
The bytearray class is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has...
(https://docs.python.org/3/library/functions.html#func-bytearray)
I originally moved to bytearray because 'it just worked', but as I looked at what it was, it became more obvious that it's actually the right tool for the job. There is, obviously, the fact that the conversion may incur a little cost, but compared to the other operations and the transfers between processes that are performed, I suspect that it's entirely negligable.
|
Thank you for this huge contribution! I'll try to review this shortly. |
| # Select a mutator from those we can apply | ||
| # We'll try up to 20 times, but if we don't find a | ||
| # suitable mutator after that, we'll just give up. | ||
| for n in range(20): |
This comment has been minimized.
This comment has been minimized.
yevgenypats
Jan 11, 2020
Contributor
why change to 20 instead of the nm? nm = self._rand_exp() was a magic number used both in go-fuzz and afl/libfuzzer and it worked well. I tried other numbers and it usually affected the speed of the fuzzer significantly
This comment has been minimized.
This comment has been minimized.
gerph
Jan 11, 2020
Author
Contributor
You're confusing the two loops, I think.
The outer loop (line 495 in the new changes, just above those change) selects the number of mutations to run. Before this change, it was actually the 'number of allocators we'll try to run', because if the mutator said 'sorry, I'm not appropriate' (by decrementing the iteration count) it wouldn't actually affect the number of mutations run.
In this new version, when the mutator says it's not appropriate (by returning None at line 504), we want to try the mutation sequence again. This is what the inner loop of 20 attempts does. 20 was just picked from the air as seeming like a good number.
This means that the number of mutators that we plan to add, as determined in by nm, will be the number we generate (unless we are exceptionally unlucky and repeatedly select inapprorpriate mutations within the inner loop - which can happen if you've filtered the mutations that are used to those that are more likely to complain).
This is a change in behaviour from the original - instead of just ignoring mutations that are unavailable, the code now tries to meet the number that were requested. I chose to follow what appeared to be the intent of the code (that the number of mutators we plan to apply will be the number that we aim for, and that we retry rather than drop those attempts when they fail).
This comment has been minimized.
This comment has been minimized.
gerph
Jan 11, 2020
Author
Contributor
In case it's not clear, the inner loop is bounded so that we don't get stuck continually retrying. I originally was lazy and used while True and then when testing the filtering, generated a filter that left only a mutator that worked on a non-empty input, but the string always started out as empty when no dictionary or corpus was supplied, so it ran forever.
|
I'm still reviewing but all in all it looks good. Due to this being a pretty big change will you able to run the fuzzer (at least manually) on the examples in the repo and provide some benchmarks to see that after this change the fuzzer still finds the bugs + the speed stays the same more or less? |
|
@jvoisin do you have any comments on PR? feel free to take a look as well. |
|
I was intending on writing some tests to do that, but there wasn't a framework in place to check the behaviour at the time I started, and I wanted to get something fed back as I thought it useful. Then I rebased and a couple of tests turned up... so yeah, I think I can add in some tests and run the examples. I'm not sure that the performance matters hugely on a fuzz test, but so long as this hasn't introduced an order of magnitude worse a set of results, I'll consider it a win - you might feel differently though, I understand :-) [aside, it'd be really cool if there were CI to trigger the tests, but my experience is with non-github CI, so I won't be able to do that] |
|
I understand there is no framework in place yet so I don't expect any
automated tests just a few manual runs and benchmarks on the examples:)
just like you said as long as it's not slow by order of magnitude this is
acceptable.
…On Sat, Jan 11, 2020, 1:55 PM Charles Ferguson ***@***.***> wrote:
I was intending on writing some tests to do that, but there wasn't a
framework in place to check the behaviour at the time I started, and I
wanted to get something fed back as I thought it useful. Then I rebased and
a couple of tests turned up... so yeah, I think I can add in some tests and
run the examples.
I'm not sure that the performance matters hugely on a fuzz test, but so
long as this hasn't introduced an order of magnitude worse a set of
results, I'll consider it a win - you might feel differently though, I
understand :-)
[aside, it'd be really cool if there were CI to trigger the tests, but my
experience is with non-github CI, so I won't be able to do that]
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#26?email_source=notifications&email_token=AD52CDUMMWOANSQPHOEEA33Q5GXSLA5CNFSM4KEA5B32YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIWAKMI#issuecomment-573310257>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AD52CDQPU77O3TGWFRBYYOTQ5GXSLANCNFSM4KEA5B3Q>
.
|
The mutator filters were being treated as a string, even when they were using the request for the defaults.
|
Quick sampling tests for performance using aifc: Prior to my change (Python 3 only):
After this change, using Python 3:
After this change, using Python 2:
So Python 2 is ~2x the speed of python 3 within these tests. That's interesting. The change of a few seconds for Python3 is entirely acceptable to me, and actually quite surprising - I was expecting a greater slowdown. Test system for completeness:
This also makes me think I could do with a little wrapper that automates this form of performance test so that I don't have to faff around with it. I might do something with that too. |
|
I've pushed a group of changes to clean up the existing tests, put a makefile to let me test it with Python 2 and 3 in use, and then fixed a small bug that the existing tests showed up. I'd like to introduce some more tests for the bits that are there and try running through all of the examples. I've only run the aif one so far as part of the above test, so I'd like to know that it's safe on the others. I'll come back to that later today. The newly added commits...
[edited to remove the Makefile and tests, which have been moved to a separate PR] |
|
Just starting to write some tests for the mutators... I'm pretty certain that the copy function doesn't do what users of it expect, or it's always being used wrongly. At the very least that's the confusion that @jvoisin had with dictionary, and MutatorRemoveRange definitely wasn't doing what it expected. I would like to split the unit tests into a separate PR, because it's quite likely that in adding the tests and fixing up the behaviour I don't expect I'm going to do something wrong, independant of the other changes here. Would this be ok? |
|
Sure, this sounds reasonable.
…On Sun, Jan 12, 2020, 1:16 AM Charles Ferguson ***@***.***> wrote:
Just starting to write some tests for the mutators... I'm pretty certain
that the copy function doesn't do what users of it expect, or it's always
being used wrongly. At the very least that's the confusion that @jvoisin
<https://github.com/jvoisin> had with dictionary, and MutatorRemoveRange
definitely wasn't doing what it expected.
I would like to split the unit tests into a separate PR, because it's
quite likely that in adding the tests and fixing up the behaviour I don't
expect I'm going to do something wrong, independant of the other changes
here. Would this be ok?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#26?email_source=notifications&email_token=AD52CDTH35HTSTGVIAZIKVTQ5JHM3A5CNFSM4KEA5B32YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIWNH2I#issuecomment-573363177>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AD52CDQROEZLEYRGNDXFN43Q5JHM3ANCNFSM4KEA5B3Q>
.
|
|
Since this PR is already quite large ( |
|
Ok, I'll split up what I can; I think some of these bits can be moved to entirely distinct PRs if they don't depend on what went before. |
The setup.py and requirements.txt had been stripped to make them work on both python2 and 3, without regard for keeping explicit statements of what it works with. This has been updated to specify the enviroment checks, so that we only install the requirements if they're needed on a given python version. Similarly, the python_requires has been updated to say "I'll take any 2.x version over 2.7, OR any 3.x version over 3.5.3" which better matches with the original requirements.
The dictionary text insert wasn't working properly, despite looking just fine. I think the reason is that the parameters on 'copy' are transposed. The parameter called 'src' is actually the destination, (and vice-versa). This caused much confusion when trying to work out what was happening there. This change fixes the behaviour of the dictionary insert, but defers fixing the parameter names until it can be confirmed what was meant and that the strings are being manipulated correctly. The change also corrects a mistaken format character in the mutator __repr__ method.
|
Ok that's reduced to |
|
PR #27, #28, #29 have been split off from this PR, because on checking them I found that they're entirely independant, and can be committed without reference to the changes in here. PR #30 and #31 are dependant on this and/or the mentioned PRs to come in, but allowing for that are independant and do separate things. Those two are marked as draft PRs, as they merge to master with the inclusion of all the changes in the dependant PRs - once their dependants have been committed, their diffs reduce to just the commits that they add. With that, I shall do nothing further on this PR unless changes are requested. I realise this has been a bit of a flurry of new things and then reworking the PR, which is disruptive. I hope that the result here, and in the other changes is acceptable, but if not, I won't be offended :-D |
|
I'm closing this PR as it was split to different PRs. |
|
It wasn't split into other PRs; the parts of it were split off which could be, leaving the parts that could not easily be split away. I cannot see a button to reopen the PR, so I can re-push this as a different PR if you'd prefer. |
|
Sorry, I'll reopen this.
…On Sat, Jan 18, 2020, 3:17 PM Charles Ferguson ***@***.***> wrote:
It wasn't split into other PRs; the parts of it were split off which could
be, leaving the parts that could not easily be split away. I cannot see a
button to reopen the PR, so I can re-push this as a different PR if you'd
prefer.
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<#26?email_source=notifications&email_token=AD52CDQTROFYSZJKZRNQNDTQ6L6ODA5CNFSM4KEA5B32YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJJYAYI#issuecomment-575897697>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AD52CDVSONOWH4SK2UT43GDQ6L6ODANCNFSM4KEA5B3Q>
.
|
gerph commentedJan 8, 2020
This change incorporates a number of changes which I made to test a project which is stuck on Python 2 - the project uses a Python 2 focused library which has not yet been upgraded to Python 3. The changes here add python 2 support, refactor and add in features that I needed to test the tool I was working on. The individual commits are more descriptive and step through the changes, but the following summarises what was done and why:
Support for Python 2. This has been somewhat rough and ready, in just dropping the python 3 version requirements and adding in support where it was needed for python 2-specific things:
functools32python module to give its functionality.The 'NEW' report now reports after it has updated the coverage value, so that you see what the new coverage value is immediately, not on the new NEW or PULSE report.
Because I want to be able to add new mutators to be able to test a string-based expression evaluator, I didn't want to have to deal with the inline conditional list of mutators that are fixed in the corpus module.
Noneto indicate that they are not appropriate for the mutation. This addresses a bug in the original where it would attempt to retry by decrementing the iteration counter, which was ineffective as the counter is just a value from the range generator function.bytewill select just thebytemutators, whilstbyte -asciiwould select only the byte mutators that aren't alsoasciimutators.Integrated @jvoisin dictionary support, and enhanced. Because I'm testing an expression evaluator, having a collection of strings which are operands and operators in a dictionary and being able to get the fuzzer to use those as sequences to introduce seemed like an excellent idea. And I saw that jvoisin had already done it, and didn't see any point in reinventing it.
Tested with a bunch of my code and found a few bugs with it - thank you.
I wanted to add some tests for the new bits, to make sure that things were working, but I was more keen to continue my testing and pass this upstream in case it's useful to you. If not, then I have managed to solve a number of my own problems and have a really useful tool. I'm not sure that the python 2 changes are as useful to others, but suspect that the refactored mutators are more so. Dictionary support is all a credit to the original author as it made the testing I was doing MUCH easier.