Skip to content

Python: Add example for how to write your own sanitizer #2889

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 30, 2020

Conversation

RasmusWL
Copy link
Member

@RasmusWL RasmusWL commented Feb 20, 2020

I hope that we can gather a set of examples for how to achieve common things with the Python Library. Hopefully with little effort this can be added to our documentation at some point 😉

I think maybe this example went a little overboard, but the idea is to show how to achieve something common (in this case writing your own sanitizer), and which cases our analysis can and cannot handle. Let me know your thoughts 😊


This PR

I tried to improve the testing of taint even more, so now it will show if the actual result matches the expected. It only works for the binary choice of has taint vs. doesn't have taint, but so far I'm happy (again, makes it much easier to eyeball if everything is correct).

SanitizedEdges

EDIT: Looking back I have absolutely no clue what this is about 😕

I'm a bit confused by the fact that the SanitizedEdges contain

| MySanitizerHandlingNot | externally controlled string | test.py:40 | Pi(s_12) [false] |

is there some CFG-splitting going on here? How do you even see the resulting ESSA? showflow I guess? (will investigate a bit more tomorrow)

@RasmusWL RasmusWL requested a review from a team as a code owner February 20, 2020 16:06
@RasmusWL RasmusWL force-pushed the python-add-custom-sanitizer-example branch from ec1854d to 083dd43 Compare February 21, 2020 13:29
@RasmusWL RasmusWL requested review from BekaValentine and tausbn and removed request for a team March 11, 2020 14:40
@BekaValentine
Copy link
Contributor

The biggest thing I thing is needed here is a lot more written word. There's plenty of code but very little explanation. If this is to be an example for how to write a sanitizer, then a person who's never written one before will need to have a lot of framing context, lots of explanations, etc. for the overall picture of what's going on, and the reasons behind each part of the custom sanitizer.

@RasmusWL
Copy link
Member Author

The biggest thing I thing is needed here is a lot more written word. There's plenty of code but very little explanation. If this is to be an example for how to write a sanitizer, then a person who's never written one before will need to have a lot of framing context, lots of explanations, etc. for the overall picture of what's going on, and the reasons behind each part of the custom sanitizer.

Agreed, that why I said Hopefully with little effort this can be added to our documentation at some point 😉

Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly worried about the performance of this. Apart from that, everything looks good.

@tausbn
Copy link
Contributor

tausbn commented Mar 24, 2020

Argh. Github ate my comment, it seems. Welp. Let me redo all of that again.

@RasmusWL RasmusWL requested a review from tausbn March 30, 2020 09:25
tausbn
tausbn previously requested changes Mar 30, 2020
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made (through a bunch of suggestions, which probably look completely incomprehensible) the changes I think would be necessary in order to remove the final_test argument, as I suggested in my previous review.

Note that even if you accept the suggestions, you'll probably need to autoformat the file again.

@tausbn tausbn dismissed their stale review March 30, 2020 13:05

Too messy with all the separate suggestions. I think I've figured out a better way of doing it.

Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, take two, wherein I put all of the suggestions in the same comment. Much better.

rewrote the qldoc to explain it as well.
otherwise the helper predicate can (and sometimes will) be evaluated once _per_
instance of that class.
@RasmusWL
Copy link
Member Author

As far as I understand, after applying your suggestion, we will not change the number of tuples in the result of clears_taint_on_true. We will reduce the number of elements in the tuple (which I guess is good), and I like your suggestion better, so it's an improvement overall 👍

Anyway, to get back to the number of tuples:

For a deeply nested chain of not( not( ... not( not is_safe(s) ) ... ) ) that is used in a edge_refinement in my version we would get the result tuples:

  • (CallNode for is_safe, UnaryExprNode for 'not' at depth 0, false, edge_refinement)
  • (CallNode for is_safe, UnaryExprNode for 'not' at depth 1, true, edge_refinement)
  • ...
  • (CallNode for is_safe, UnaryExprNode for 'not' at depth n, true, edge_refinement)
  • (CallNode for is_safe, CallNode for is_safe, false, edge_refinement)

In the new version we will get:

  • (UnaryExprNode for 'not' at depth 0, false, edge_refinement)
  • (UnaryExprNode for 'not' at depth 1, true, edge_refinement)
  • ...
  • (UnaryExprNode for 'not' at depth n, true, edge_refinement)
  • (CallNode for is_safe, false, edge_refinement)

@tausbn
Copy link
Contributor

tausbn commented Mar 30, 2020

For the case where we consider only a call to is_safe nested inside a bunch of nots, I agree that the number of tuples should be the same. More generally, I think the version with final_test could have more tuples. For instance, if we had not(not(...not(E)...)) where E is some expression involving many occurrences of is_safe, then I think you would end up with more tuples.

This is because the "magic" context test.getNode().(Expr).getASubExpression*() = final_test.getNode() captures more than just "nested sequence of nots" — getASubExpression* is too general.

@tausbn tausbn merged commit e31143c into github:master Mar 30, 2020
@RasmusWL RasmusWL deleted the python-add-custom-sanitizer-example branch March 31, 2020 09:20
@RasmusWL
Copy link
Member Author

For instance, if we had not(not(...not(E)...)) where E is some expression involving many occurrences of is_safe, then I think you would end up with more tuples.

Right. So in the general case, including final_test is not a good approach. And since this example should (hopefully) be used as a recipe, we should handle the general case properly.

However, in our specific case, since we always use
nested_test = test.(UnaryExprNode).getOperand() in the recursive-case, and
test = Value::named("test.is_safe").getACall() in the base-case, we would never match on not(not(...not(E)...)) where E is some expression involving many occurrences of is_safe.

@tausbn
Copy link
Contributor

tausbn commented Mar 31, 2020

However, in our specific case, since we always use
nested_test = test.(UnaryExprNode).getOperand() in the recursive-case, and
test = Value::named("test.is_safe").getACall() in the base-case, we would never match on not(not(...not(E)...)) where E is some expression involving many occurrences of is_safe.

I think this is wrong. The base case doesn't know anything about the recursive case, because the base case precedes the recursive case when we're evaluating bottom-up. The right way to approach this is to look at each disjunct in isolation. So, for just the first disjunct, the body of the predicate is as follows:

edge_refinement.getTest().getNode().(Expr).getASubExpression*() = test.getNode() and
test.getNode().(Expr).getASubExpression*() = final_test.getNode() and
final_test = test and
final_test = Value::named("test.is_safe").getACall() and
edge_refinement.getInput().getAUse() = final_test.getAnArg() and
sense = true

So what does this tell us? Let's rewrite it a bit to remove some redundancies:

test.getNode().(Expr).getASubExpression*() = final_test.getNode()

becomes (since final_test = test):

final_test.getNode().(Expr).getASubExpression*() = final_test.getNode()

But this we would expect to hold for any final_test that happens to be an Expr (and that's all of them as far as we're concerned), so we can just elide this.
So we have (after a bit of reordering)

edge_refinement.getTest().getNode().(Expr).getASubExpression*() = final_test.getNode() and
edge_refinement.getInput().getAUse() = final_test.getAnArg() and
final_test = Value::named("test.is_safe").getACall() and
final_test = test and
sense = true

We can also (basically) ignore the last two lines, as these only serve to fix some of the parts of the tuples we're producing, and cannot result in further tuples themselves. So the final form is

edge_refinement.getTest().getNode().(Expr).getASubExpression*() = final_test.getNode() and
edge_refinement.getInput().getAUse() = final_test.getAnArg() and
final_test = Value::named("test.is_safe").getACall() and

Because this is (essentially) one of the disjuncts of the original predicate, whatever tuples the above produces will be part of the final predicate. And there's nothing that limits final_test in any way other than requiring it to be inside the Expr corresponding to the edge_refinement.

(In fact, this now makes me think that your version had a bug in it — it would accept, say, not(not(is_safe(x) or True)) as sanitizing x, even though it doesn't matter what is_safe(x) returns in this case. Though this might also have been caught by pruning.)

@RasmusWL
Copy link
Member Author

Thanks for the detailed explanation. Glad to slightly improve my intuitive understanding of QL evaluation a bit more.

I need to confess that you had me at

I think this is wrong. The base case doesn't know anything about the recursive case, because the base case precedes the recursive case when we're evaluating bottom-up.

but the detailed explanation was very thoughtful ❤️

My version had a bug? 🐛

(In fact, this now makes me think that your version had a bug in it — it would accept, say, not(not(is_safe(x) or True)) as sanitizing x, even though it doesn't matter what is_safe(x) returns in this case. Though this might also have been caught by pruning.)

I don't think so. The original clears_taint_on_true would hold when using
not(not(is_safe(x) or True)) for test, giving us is_safe(x) or True as final_test. However, sanitizingEdge required that final_test must be a CallNode, so sanitizingEdge wouldn't hold.

In the fixed version, clears_taint_on_true would not hold when using not(not(is_safe(x) or True)) for test, since final_test is required to be a CallNode (and it is is_safe(x) or True).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants