Python: Add example for how to write your own sanitizer #2889
ec1854d to 083dd43
The biggest thing I think is needed here is a lot more written word. There's plenty of code but very little explanation. If this is to be an example for how to write a sanitizer, then a person who's never written one before will need a lot of framing context and explanation, both for the overall picture of what's going on and for the reasons behind each part of the custom sanitizer.
Agreed, that's why I said "Hopefully with little effort this can be added to our documentation at some point" 😉
I'm slightly worried about the performance of this. Apart from that, everything looks good.
Argh. GitHub ate my comment, it seems. Welp. Let me redo all of that again.
I have made (through a bunch of suggestions, which probably look completely incomprehensible) the changes I think would be necessary in order to remove the final_test argument, as I suggested in my previous review.
Note that even if you accept the suggestions, you'll probably need to autoformat the file again.
Too messy with all the separate suggestions. I think I've figured out a better way of doing it.
Okay, take two, wherein I put all of the suggestions in the same comment. Much better.
python/ql/test/library-tests/examples/custom-sanitizer/Taint.qll
rewrote the qldoc to explain it as well.
otherwise the helper predicate can (and sometimes will) be evaluated once _per_ instance of that class.
As far as I understand, after applying your suggestion, we will not change the number of tuples in the result of Anyway, to get back to the number of tuples: For a deeply nested chain of
In the new version we will get:
For the case where we consider only a call to This is because the "magic" context
Right. So in the general case, including However, in our specific case, since we always use
I think this is wrong. The base case doesn't know anything about the recursive case, because the base case precedes the recursive case when we're evaluating bottom-up. The right way to approach this is to look at each disjunct in isolation. So, for just the first disjunct, the body of the predicate is as follows:

    edge_refinement.getTest().getNode().(Expr).getASubExpression*() = test.getNode() and
    test.getNode().(Expr).getASubExpression*() = final_test.getNode() and
    final_test = test and
    final_test = Value::named("test.is_safe").getACall() and
    edge_refinement.getInput().getAUse() = final_test.getAnArg() and
    sense = true

So what does this tell us? Let's rewrite it a bit to remove some redundancies:

    test.getNode().(Expr).getASubExpression*() = final_test.getNode()

becomes (since

    final_test.getNode().(Expr).getASubExpression*() = final_test.getNode()

But this we would expect to hold for any

    edge_refinement.getTest().getNode().(Expr).getASubExpression*() = final_test.getNode() and
    edge_refinement.getInput().getAUse() = final_test.getAnArg() and
    final_test = Value::named("test.is_safe").getACall() and
    final_test = test and
    sense = true

We can also (basically) ignore the last two lines, as these only serve to fix some of the parts of the tuples we're producing, and cannot result in further tuples themselves. So the final form is

    edge_refinement.getTest().getNode().(Expr).getASubExpression*() = final_test.getNode() and
    edge_refinement.getInput().getAUse() = final_test.getAnArg() and
    final_test = Value::named("test.is_safe").getACall() and

Because this is (essentially) one of the disjuncts of the original predicate, whatever tuples the above produces will be part of the final predicate. And there's nothing that limits (In fact, this now makes me think that your version had a bug in it: it would accept, say,
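To make the "look at each disjunct in isolation" argument concrete, here is a toy model in Python (not QL, and not the library's implementation). It assumes a hypothetical `Expr` tree and models `getASubExpression*()` as a reflexive, transitive walk over children. The function `first_disjunct` enumerates the tuples that the simplified conjunction would admit for one edge refinement; all names here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class Expr:
    """Hypothetical expression node: kind is 'name', 'call', or 'not'."""
    kind: str
    name: str = ""                      # variable name or callee name
    children: Tuple["Expr", ...] = ()   # arguments or operands


def sub_expressions_star(e: Expr) -> Iterator[Expr]:
    """Model of getASubExpression*(): the node itself plus all descendants."""
    yield e
    for child in e.children:
        yield from sub_expressions_star(child)


def first_disjunct(test_expr: Expr, input_var: str) -> Iterator[Tuple[Expr, bool]]:
    """Yield (final_test, sense) tuples admitted by the simplified conjunction."""
    for final_test in sub_expressions_star(test_expr):
        is_safe_call = final_test.kind == "call" and final_test.name == "is_safe"
        uses_input = any(
            arg.kind == "name" and arg.name == input_var
            for arg in final_test.children
        )
        if is_safe_call and uses_input:
            yield (final_test, True)  # this disjunct always fixes sense = true


x = Expr("name", "x")
call = Expr("call", "is_safe", (x,))
negated = Expr("not", "", (call,))

direct_result = list(first_disjunct(call, "x"))
nested_result = list(first_disjunct(negated, "x"))
```

In this model the call is found both when it is the whole test and when it is nested inside another expression, because the sub-expression walk is the only constraint tying `final_test` to the edge refinement's test.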
Thanks for the detailed explanation. Glad to slightly improve my intuitive understanding of QL evaluation a bit more. I need to confess that you had me at
but the detailed explanation was very thoughtful ❤️ My version had a bug? 🐛
I don't think so. The original In the fixed version,
I hope that we can gather a set of examples for how to achieve common things with the Python Library. Hopefully with little effort this can be added to our documentation at some point 😉
I think maybe this example went a little overboard, but the idea is to show how to achieve something common (in this case writing your own sanitizer), and which cases our analysis can and cannot handle. Let me know your thoughts 😊
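For readers new to the idea, the kind of Python code a custom sanitizer is meant to recognise looks roughly like the sketch below. Only the name `is_safe` comes from this PR (it is referenced as `test.is_safe`); the body of `is_safe` and the `render` function are hypothetical stand-ins chosen for illustration.

```python
def is_safe(value: str) -> bool:
    """Hypothetical stand-in for test.is_safe: accept only alphanumeric input."""
    return value.isalnum()


def render(user_input: str) -> str:
    """Hypothetical sink that embeds user input in HTML."""
    if is_safe(user_input):
        # On the true branch, a sanitizer for is_safe would let the analysis
        # treat user_input as no longer tainted.
        return "<p>" + user_input + "</p>"
    # On the false branch user_input is still tainted, so it is not used.
    return "<p>rejected</p>"
```

The sanitizer tells the analysis that taint is cleared only on the branch where the check succeeded, which is why the sense of the test matters.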
This PR
I tried to improve the testing of taint even more, so now it will show if the actual result matches the expected. It only works for the binary choice of has taint vs. doesn't have taint, but so far I'm happy (again, makes it much easier to eyeball if everything is correct).
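The binary expected-versus-actual check described above can be pictured with a small Python sketch. This is not the test code from the PR; `compare_taint` and its output labels are hypothetical and only illustrate the idea of printing whether the observed taint status matches the expectation.

```python
def compare_taint(expected_tainted: bool, actual_tainted: bool) -> str:
    """Return a label that is easy to eyeball in test output (hypothetical format)."""
    def describe(tainted: bool) -> str:
        return "taint" if tainted else "no taint"

    if expected_tainted == actual_tainted:
        return "ok"
    return "fail: expected " + describe(expected_tainted) + ", got " + describe(actual_tainted)
```

Because the comparison is a plain boolean match, it cannot distinguish between different kinds of taint, which is the limitation noted above.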
SanitizedEdges
EDIT: Looking back I have absolutely no clue what this is about 😕
I'm a bit confused by the fact that the SanitizedEdges contain
is there some CFG-splitting going on here? How do you even see the resulting ESSA?
showflow
I guess? (will investigate a bit more tomorrow)