-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Python: Aiohttp improvements #13731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: Aiohttp improvements #13731
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me 👍
The one major thing about the addition of remote-flow-sources based off type-annotations is that we will introduce new shorter paths. In the example below, it's much more interesting to see the whole path starting at handler
with a bad sanitizer, than seeing the path that only starts at the req
parameter in foo
. Would like to think a little about this aspect 🤔
@routes.post(...)
async def handler(request: aiotthp.web.Request):
if request.host == "127.0.0.1":
foo(request)
...
async def foo(req: aiohttp.web.Request):
db.cursor().execute(await req.content.read())
Can you please add tests? I expect in these two files
- python/ql/test/library-tests/frameworks/aiohttp/client_request.py
- pyhton/ql/test/library-tests/frameworks/aiohttp/response_test.py
- python/ql/test/library-tests/frameworks/aiohttp/taint_test.py (for the new remote-flow-source)
Actually, we should also add a test in python/ql/test/library-tests/frameworks/aiohttp/taint_test.py (for the new remote-flow-source) |
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
Thanks @RasmusWL, I added the tests, but not sure how the inline tests expect. Can you please take a look? |
But notice that keyword argument is not handled yet
However, notice that the concepts tests use the HttpResponse location for the `responseBody` tag, which seems a little odd in this situation, where they are actually separate. Will fix in next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. I fixed up tests a bit (checking if we handle keyword arguments as well), and while I had the code checked out locally, I improved the modeling a little more as well.
Overall this LGTM, but let us just think a little more about the potential problem with what paths are shwon before merging. (will also run a performance test, just to be sure)
Thanks for fixing the tests @RasmusWL! As for the heuristic sources, in JS they have in a different file that you can enable to opt-in for more results. Perhaps thats a way to implement it until the subflow filtering is implemented |
I think calling it a "heuristic sources" is not doing enough justice to what you're doing here. For me, a heuristic source would be something like: "we see that a parameter is called request, so it's probably an incoming HTTP request, and we'll model all attributes as tainted". So I think 100% we should include the new sources, it's just a matter of question how to do it. |
Evaluation was fine 👍 I've made a minor naming update to highlight that this is not a heuristic source, and changed the char-pred so it's not overlapping with |
I am a bit worried about the ability to jump into the middle of a path; not just for display reasons, but also for missing upstream sanitisers. If there is no easy way to fix it now, we should probably merge as is, though... |
I understand the concern, but this is something we already have problems with when the sanitizer is a middleware/filter function |
But that is mainly because we do not have a model of middleware 🙂 I would love a list of examples of this, though, since we might look into this soon... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The requested changes are buried in comments. To be clear, they are
- adding a test showing that we can now skip sanitisers by jumping past them to a type-annotated node.
(We will accept this for now and merge the PR once we have a test documenting the behaviour.) - simplifying using
getSubscript
exists(DataFlow::Node headers, Dict d | | ||
headers = this.getArgByName("headers").getALocalSource() | ||
| | ||
headers.asExpr() = d and | ||
d.getAKey().(StrConst).getText().toLowerCase() = "content-type" and | ||
d.getAValue() = result.asExpr() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless the toLowerCase
is crucial, you can simply do
exists(DataFlow::Node headers, Dict d | | |
headers = this.getArgByName("headers").getALocalSource() | |
| | |
headers.asExpr() = d and | |
d.getAKey().(StrConst).getText().toLowerCase() = "content-type" and | |
d.getAValue() = result.asExpr() | |
) | |
result = | |
this.getKeywordParameter("headers").getSubscript("content-type").getAValueReachingSink() |
If toLowerCase
is crucial, we should add it either as default or as an option to getSubscript
(say getSubscriptCaseInsensitive
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HTTP headers are case insensitive, so we would need to implement the getSubscriptCaseInsensitive
predicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now you can perhaps do
exists(string key | key.toLowerCase() = "content-type" |
result =
this.getKeywordParameter("headers").getSubscript(key).getAValueReachingSink()
)
DataFlow::ParameterNode, RemoteFlowSource::Range | ||
{ | ||
AiohttpRequestParamFromTypeAnnotation() { | ||
not this instanceof AiohttpRequestHandlerRequestParam and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the API graph overhaul has been merged, we should be able to change like this:
not this instanceof AiohttpRequestHandlerRequestParam and | |
not this = any(AiohttpRequestHandlerRequestParam p).track() and |
This should remove the problem of jumping into the middle of a path.
Until then, it would be good to have a test illustrating the problem (similar to the example given, but with a proper sanitiser so it is actually an FP); then we can see the FP disappear when we make this change.
(And with the overhaul, we should also be able to make these InstanceSource
s a fair bit nicer, I imagine.)
This is an example of a security filter middleware used in Home Assistant and therefore this code is called before dispatching the request to any request handler |
Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, given the tests all pass..
Few improvements for the aiohttp framework support:
FileResponse
ClientSession.ws_connect