Python: Add example for how to write your own sanitizer #2889

RasmusWL · 2020-02-20T16:06:56Z

I hope that we can gather a set of examples for how to achieve common things with the Python Library. Hopefully with little effort this can be added to our documentation at some point 😉

I think maybe this example went a little overboard, but the idea is to show how to achieve something common (in this case writing your own sanitizer), and which cases our analysis can and cannot handle. Let me know your thoughts 😊

This PR

I tried to improve the testing of taint even more, so now it will show if the actual result matches the expected. It only works for the binary choice of has taint vs. doesn't have taint, but so far I'm happy (again, makes it much easier to eyeball if everything is correct).

SanitizedEdges

EDIT: Looking back I have absolutely no clue what this is about 😕

I'm a bit confused by the fact that the SanitizedEdges contain

| MySanitizerHandlingNot | externally controlled string | test.py:40 | Pi(s_12) [false] |

is there some CFG-splitting going on here? How do you even see the resulting ESSA? showflow I guess? (will investigate a bit more tomorrow)

BekaValentine · 2020-03-12T05:25:34Z

The biggest thing I thing is needed here is a lot more written word. There's plenty of code but very little explanation. If this is to be an example for how to write a sanitizer, then a person who's never written one before will need to have a lot of framing context, lots of explanations, etc. for the overall picture of what's going on, and the reasons behind each part of the custom sanitizer.

RasmusWL · 2020-03-12T09:15:29Z

The biggest thing I thing is needed here is a lot more written word. There's plenty of code but very little explanation. If this is to be an example for how to write a sanitizer, then a person who's never written one before will need to have a lot of framing context, lots of explanations, etc. for the overall picture of what's going on, and the reasons behind each part of the custom sanitizer.

Agreed, that why I said Hopefully with little effort this can be added to our documentation at some point 😉

tausbn

I'm slightly worried about the performance of this. Apart from that, everything looks good.

tausbn · 2020-03-24T14:38:29Z

Argh. Github ate my comment, it seems. Welp. Let me redo all of that again.

python/ql/test/library-tests/examples/custom-sanitizer/Taint.qll

tausbn

I have made (through a bunch of suggestions, which probably look completely incomprehensible) the changes I think would be necessary in order to remove the final_test argument, as I suggested in my previous review.

Note that even if you accept the suggestions, you'll probably need to autoformat the file again.

Too messy with all the separate suggestions. I think I've figured out a better way of doing it.

tausbn

Okay, take two, wherein I put all of the suggestions in the same comment. Much better.

python/ql/test/library-tests/examples/custom-sanitizer/Taint.qll

rewrote the qldoc to explain it as well.

otherwise the helper predicate can (and sometimes will) be evaluated once _per_ instance of that class.

RasmusWL · 2020-03-30T16:40:40Z

As far as I understand, after applying your suggestion, we will not change the number of tuples in the result of clears_taint_on_true. We will reduce the number of elements in the tuple (which I guess is good), and I like your suggestion better, so it's an improvement overall 👍

Anyway, to get back to the number of tuples:

For a deeply nested chain of not( not( ... not( not is_safe(s) ) ... ) ) that is used in a edge_refinement in my version we would get the result tuples:

(CallNode for is_safe, UnaryExprNode for 'not' at depth 0, false, edge_refinement)
(CallNode for is_safe, UnaryExprNode for 'not' at depth 1, true, edge_refinement)
...
(CallNode for is_safe, UnaryExprNode for 'not' at depth n, true, edge_refinement)
(CallNode for is_safe, CallNode for is_safe, false, edge_refinement)

In the new version we will get:

(UnaryExprNode for 'not' at depth 0, false, edge_refinement)
(UnaryExprNode for 'not' at depth 1, true, edge_refinement)
...
(UnaryExprNode for 'not' at depth n, true, edge_refinement)
(CallNode for is_safe, false, edge_refinement)

tausbn · 2020-03-30T20:58:37Z

For the case where we consider only a call to is_safe nested inside a bunch of nots, I agree that the number of tuples should be the same. More generally, I think the version with final_test could have more tuples. For instance, if we had not(not(...not(E)...)) where E is some expression involving many occurrences of is_safe, then I think you would end up with more tuples.

This is because the "magic" context test.getNode().(Expr).getASubExpression*() = final_test.getNode() captures more than just "nested sequence of nots" — getASubExpression* is too general.

RasmusWL · 2020-03-31T09:28:10Z

For instance, if we had not(not(...not(E)...)) where E is some expression involving many occurrences of is_safe, then I think you would end up with more tuples.

Right. So in the general case, including final_test is not a good approach. And since this example should (hopefully) be used as a recipe, we should handle the general case properly.

However, in our specific case, since we always use
nested_test = test.(UnaryExprNode).getOperand() in the recursive-case, and
test = Value::named("test.is_safe").getACall() in the base-case, we would never match on not(not(...not(E)...)) where E is some expression involving many occurrences of is_safe.

tausbn · 2020-03-31T09:56:35Z

However, in our specific case, since we always use
nested_test = test.(UnaryExprNode).getOperand() in the recursive-case, and
test = Value::named("test.is_safe").getACall() in the base-case, we would never match on not(not(...not(E)...)) where E is some expression involving many occurrences of is_safe.

I think this is wrong. The base case doesn't know anything about the recursive case, because the base case precedes the recursive case when we're evaluating bottom-up. The right way to approach this is to look at each disjunct in isolation. So, for just the first disjunct, the body of the predicate is as follows:

edge_refinement.getTest().getNode().(Expr).getASubExpression*() = test.getNode() and
test.getNode().(Expr).getASubExpression*() = final_test.getNode() and
final_test = test and
final_test = Value::named("test.is_safe").getACall() and
edge_refinement.getInput().getAUse() = final_test.getAnArg() and
sense = true

So what does this tell us? Let's rewrite it a bit to remove some redundancies:

test.getNode().(Expr).getASubExpression*() = final_test.getNode()

becomes (since final_test = test):

final_test.getNode().(Expr).getASubExpression*() = final_test.getNode()

But this we would expect to hold for any final_test that happens to be an Expr (and that's all of them as far as we're concerned), so we can just elide this.
So we have (after a bit of reordering)

edge_refinement.getTest().getNode().(Expr).getASubExpression*() = final_test.getNode() and
edge_refinement.getInput().getAUse() = final_test.getAnArg() and
final_test = Value::named("test.is_safe").getACall() and
final_test = test and
sense = true

We can also (basically) ignore the last two lines, as these only serve to fix some of the parts of the tuples we're producing, and cannot result in further tuples themselves. So the final form is

edge_refinement.getTest().getNode().(Expr).getASubExpression*() = final_test.getNode() and
edge_refinement.getInput().getAUse() = final_test.getAnArg() and
final_test = Value::named("test.is_safe").getACall() and

Because this is (essentially) one of the disjuncts of the original predicate, whatever tuples the above produces will be part of the final predicate. And there's nothing that limits final_test in any way other than requiring it to be inside the Expr corresponding to the edge_refinement.

(In fact, this now makes me think that your version had a bug in it — it would accept, say, not(not(is_safe(x) or True)) as sanitizing x, even though it doesn't matter what is_safe(x) returns in this case. Though this might also have been caught by pruning.)

RasmusWL · 2020-03-31T11:24:09Z

Thanks for the detailed explanation. Glad to slightly improve my intuitive understanding of QL evaluation a bit more.

I need to confess that you had me at

I think this is wrong. The base case doesn't know anything about the recursive case, because the base case precedes the recursive case when we're evaluating bottom-up.

but the detailed explanation was very thoughtful ❤️

My version had a bug? 🐛

(In fact, this now makes me think that your version had a bug in it — it would accept, say, not(not(is_safe(x) or True)) as sanitizing x, even though it doesn't matter what is_safe(x) returns in this case. Though this might also have been caught by pruning.)

I don't think so. The original clears_taint_on_true would hold when using
not(not(is_safe(x) or True)) for test, giving us is_safe(x) or True as final_test. However, sanitizingEdge required that final_test must be a CallNode, so sanitizingEdge wouldn't hold.

In the fixed version, clears_taint_on_true would not hold when using not(not(is_safe(x) or True)) for test, since final_test is required to be a CallNode (and it is is_safe(x) or True).

RasmusWL added the Python label Feb 20, 2020

RasmusWL requested a review from a team as a code owner February 20, 2020 16:06

RasmusWL assigned tausbn Feb 20, 2020

Python: Add example for how to write your own sanitizer

083dd43

RasmusWL force-pushed the python-add-custom-sanitizer-example branch from ec1854d to 083dd43 Compare February 21, 2020 13:29

RasmusWL requested review from BekaValentine and tausbn and removed request for a team March 11, 2020 14:40

tausbn requested changes Mar 24, 2020

View reviewed changes

tausbn reviewed Mar 24, 2020

View reviewed changes

python/ql/test/library-tests/examples/custom-sanitizer/Taint.qll Show resolved Hide resolved

RasmusWL added 2 commits March 25, 2020 11:55

Python: Reduce result set in custom taint sanitizer

12c6997

Python: Autoformat

dc9dbf3

RasmusWL requested a review from tausbn March 30, 2020 09:25

tausbn previously requested changes Mar 30, 2020

View reviewed changes

tausbn requested changes Mar 30, 2020

View reviewed changes

python/ql/test/library-tests/examples/custom-sanitizer/Taint.qll Outdated Show resolved Hide resolved

RasmusWL added 3 commits March 30, 2020 18:29

Python: Apply suggestion from Taus

663dc24

rewrote the qldoc to explain it as well.

Python: Move helper predicate outside of class

fad03e7

otherwise the helper predicate can (and sometimes will) be evaluated once _per_ instance of that class.

Python: Fixup comment alignment

6127d8b

tausbn approved these changes Mar 30, 2020

View reviewed changes

tausbn merged commit e31143c into github:master Mar 30, 2020

RasmusWL deleted the python-add-custom-sanitizer-example branch March 31, 2020 09:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Python: Add example for how to write your own sanitizer #2889

Python: Add example for how to write your own sanitizer #2889

Uh oh!

RasmusWL commented Feb 20, 2020 •

edited

Loading

Uh oh!

BekaValentine commented Mar 12, 2020

Uh oh!

RasmusWL commented Mar 12, 2020

Uh oh!

tausbn left a comment

Uh oh!

tausbn commented Mar 24, 2020

Uh oh!

Uh oh!

tausbn left a comment

Uh oh!

tausbn left a comment

Uh oh!

Uh oh!

RasmusWL commented Mar 30, 2020

Uh oh!

tausbn commented Mar 30, 2020

Uh oh!

RasmusWL commented Mar 31, 2020

Uh oh!

tausbn commented Mar 31, 2020

Uh oh!

RasmusWL commented Mar 31, 2020

Uh oh!

Uh oh!

Python: Add example for how to write your own sanitizer #2889

Python: Add example for how to write your own sanitizer #2889

Uh oh!

Conversation

RasmusWL commented Feb 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

This PR

SanitizedEdges

Uh oh!

BekaValentine commented Mar 12, 2020

Uh oh!

RasmusWL commented Mar 12, 2020

Uh oh!

tausbn left a comment

Choose a reason for hiding this comment

Uh oh!

tausbn commented Mar 24, 2020

Uh oh!

Uh oh!

tausbn left a comment

Choose a reason for hiding this comment

Uh oh!

tausbn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

RasmusWL commented Mar 30, 2020

Anyway, to get back to the number of tuples:

Uh oh!

tausbn commented Mar 30, 2020

Uh oh!

RasmusWL commented Mar 31, 2020

Uh oh!

tausbn commented Mar 31, 2020

Uh oh!

RasmusWL commented Mar 31, 2020

My version had a bug? 🐛

Uh oh!

Uh oh!

RasmusWL commented Feb 20, 2020 •

edited

Loading