New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(Not an issue right now) Handle multiple returns #53
Comments
Okay, so, I was going to suggest we do something to sort of OR all the different return values, such that if one was tainted then we would say it returned a tainted value. (Maybe this could be accomplished with e.g. But, it turns out there's another issue that we would need to fix for this to even be an issue, because we currently do the ORing above even when we should not, on all variables e.g. def maybe_return_the_arg(foo):
whatup = foo
whatup = 'no vuln'
return whatup says the return value is tainted. Where it should only say it's tainted when e.g. def maybe_return_the_arg(foo):
if foo.startswith('h'): # Ignore the actual condition, we can statically SMT solve a year from now.
return 'huh' + foo
else:
return 'apple' if we decide to fix the 1st example in this comment, then I'm pretty sure we'll introduce a false-negative in the 2nd example, because it will see Overall, finding out that the 2nd example works right now, and thinking the 1st example is contrived and people don't normally write code like that, I care a lot less about any of this. But I'm glad I doc'd it ;) If we were going to fix the first example, it would probably be in https://github.com/python-security/pyt/blob/master/pyt/reaching_definitions_taint.py but I don't plan on working on this any time soon. |
To do the ORing ( e.g. |
Disclaimer: I may be saying things here that are known by the contributors to pyt. I'm sorry if this is just too obvious at this point. I think this is an example of flow/path insensitivity, right? The first example says that pyt is flow insensitive, and the second one that pyt is path insensitive. Most commercial analyzers are flow sensitive, but not path sensitive due to the exponential explosion of path constraints involved. I just started looking into In the example: def maybe_return_the_arg(foo):
whatup = foo
whatup = 'no vuln'
return whatup we would need to assign each assignment to Although I agree with @KevinHock that code like the above is rare, this issue can materialize in different ways that are more plausible. value = request.form.get("value")
while(some_criteria(value)):
r = sink(value)
value = r name = request.form.get('name')
name = sanitize(name)
sink(name) I'd argue those examples, specially the second one warrant trying to address this issue. On the other hand, in the example: def maybe_return_the_arg(foo):
if foo.startswith('h'): # Ignore the actual condition, we can statically SMT solve a year from now.
return 'huh' + foo
else:
return 'apple' we would need to add a path condition to each constraint. That will allow pyt to solve for those constraints that have compatible path conditions (e.g. Again sorry if I'm being captain obvious or too naive. I'm just trying to understand how pyt works and finding areas I can contribute to and I got to this open issue. |
You're definitely right about everything you just said @adrianbn! Thank you for contributing to and being interested in PyT :) It turns out that PyT currently is flow-sensitive, and I didn't update this GitHub issue, my bad! 🤐 Here is where the vulnerability finding happens, I initially thought I fixed this when I added def-use chains, in this PR, but I put a print statement after Repro: I re-read what I had written in this issue and thought, "That seems weird, we should kill the previous definition? Maybe it is specific to return values of functions." I went to that part of the code, and figured I'd run it on an example to sanity check. Before changing, we run: (code) Kevins-MacBook-Pro:pyt kevin$ python -m pyt examples/vulnerable_code/command_injection.py
1 vulnerability found:
Vulnerability 1:
File: examples/vulnerable_code/command_injection.py
> User input at line 15, source "form[":
param = request.form['suggestion']
Reassigned in:
File: examples/vulnerable_code/command_injection.py
> Line 16: command = 'echo ' + param + ' >> ' + 'menu.txt'
File: examples/vulnerable_code/command_injection.py
> reaches line 18, sink "subprocess.call(":
~call_1 = ret_subprocess.call(command, shell=True) After changing, let's run: (code) Kevins-MacBook-Pro:pyt kevin$ git diff
diff --git a/examples/vulnerable_code/command_injection.py b/examples/vulnerable_code/command_injection.py
index 44c45d9..8025f84 100644
--- a/examples/vulnerable_code/command_injection.py
+++ b/examples/vulnerable_code/command_injection.py
@@ -13,6 +13,7 @@ def index():
@app.route('/menu', methods=['POST'])
def menu():
param = request.form['suggestion']
+ param = 'kill that def!'
command = 'echo ' + param + ' >> ' + 'menu.txt'
subprocess.call(command, shell=True)
(code) Kevins-MacBook-Pro:pyt kevin$ python -m pyt examples/vulnerable_code/command_injection.py
No vulnerabilities found. So that's good, maybe this is specific to return values, let me try my example. (code) Kevins-MacBook-Pro:pyt kevin$ git diff
diff --git a/examples/vulnerable_code/command_injection.py b/examples/vulnerable_code/command_injection.py
index 44c45d9..7745b0a 100644
--- a/examples/vulnerable_code/command_injection.py
+++ b/examples/vulnerable_code/command_injection.py
@@ -10,10 +10,16 @@ def index():
return render_template('command_injection.html', menu=menu)
+def is_it_just_return_values(tainted_param):
+ ret_me = tainted_param
+ ret_me = 'no vuln'
+ return ret_me
+
@app.route('/menu', methods=['POST'])
def menu():
param = request.form['suggestion']
- command = 'echo ' + param + ' >> ' + 'menu.txt'
+ hey = is_it_just_return_values(param)
+ command = 'echo ' + hey + ' >> ' + 'menu.txt'
subprocess.call(command, shell=True)
(code) Kevins-MacBook-Pro:pyt kevin$ python -m pyt examples/vulnerable_code/command_injection.py
No vulnerabilities found. Okay, so that's good, I should have updated this Github issue. Comment out the overwriting assignment: (code) Kevins-MacBook-Pro:pyt kevin$ clear;git diff
diff --git a/examples/vulnerable_code/command_injection.py b/examples/vulnerable_code/command_injection.py
index 44c45d9..02d8ba8 100644
--- a/examples/vulnerable_code/command_injection.py
+++ b/examples/vulnerable_code/command_injection.py
@@ -10,10 +10,16 @@ def index():
return render_template('command_injection.html', menu=menu)
+def is_it_just_return_values(tainted_param):
+ ret_me = tainted_param
+ # ret_me = 'no vuln'
+ return ret_me
+
@app.route('/menu', methods=['POST'])
def menu():
param = request.form['suggestion']
- command = 'echo ' + param + ' >> ' + 'menu.txt'
+ hey = is_it_just_return_values(param)
+ command = 'echo ' + hey + ' >> ' + 'menu.txt'
subprocess.call(command, shell=True) and we get (code) Kevins-MacBook-Pro:pyt kevin$ python -m pyt examples/vulnerable_code/command_injection.py
1 vulnerability found:
Vulnerability 1:
File: examples/vulnerable_code/command_injection.py
> User input at line 20, source "form[":
param = request.form['suggestion']
Reassigned in:
File: examples/vulnerable_code/command_injection.py
> Line 13: save_1_param = param
File: examples/vulnerable_code/command_injection.py
> Line 21: temp_1_tainted_param = param
File: examples/vulnerable_code/command_injection.py
> Line 13: tainted_param = temp_1_tainted_param
File: examples/vulnerable_code/command_injection.py
> Line 14: ret_me = tainted_param
File: examples/vulnerable_code/command_injection.py
> Line 16: ret_is_it_just_return_values = ret_me
File: examples/vulnerable_code/command_injection.py
> Line 21: ~call_1 = ret_is_it_just_return_values
File: examples/vulnerable_code/command_injection.py
> Line 21: hey = ~call_1
File: examples/vulnerable_code/command_injection.py
> Line 22: command = 'echo ' + hey + ' >> ' + 'menu.txt'
File: examples/vulnerable_code/command_injection.py
> reaches line 24, sink "subprocess.call(":
~call_2 = ret_subprocess.call(command, shell=True) 💥 |
I'm not certain what constraints/qualifiers mean, but the we mostly do reaching definitions + fixed point. I wrote some documentation in this readme, but let me know if you'd like me to clarify anything :D
We sort of kind of do that with the following: pyt/pyt/vulnerabilities/vulnerabilities.py Lines 431 to 432 in 4b495ad
pyt/pyt/vulnerabilities/vulnerabilities.py Lines 311 to 312 in 4b495ad
No worries! I'm glad you are interested @adrianbn 😁 One area you could contribute to, although it's rather difficult, is that we currently use inlining instead of summaries, for inter-procedural analysis, which makes PyT slower than it needs to be. Here are some videos, specifically the last one, explains function summaries well: Feel free to ping me with any questions. My current open PR tries to address certain expressions, so we can handle |
We could try to turn Python code into SSA, and prune it via liveness, and add predicates so it's PSSA, and try to solve those predicates with e.g. |
I'm gonna go ahead and close this issue 👍 b/c it was solved before. |
I'll try to work on this relatively soon, butto think out loud..In interprocedural_cfg.py, we have
which cleaned is
Firstly, the for loop and the if statement seem to just serve the purpose of "Is there a node of type Return or Raise in the function?" But I think all functions should have at least one return node, right? I'm not sure if I understand the original intention that well e.g. what was going to be in the else?
Secondly, here is an example to illustrate the problem/need to handle multiple returns:
TODO
The text was updated successfully, but these errors were encountered: