Skip to content

Conversation

RasmusWL
Copy link
Member

Encountered this in Gruyere. It was not essential to anything, but we should handle it none the less.

reference: https://docs.python.org/3.8/library/urllib.parse.html#urllib.parse.parse_qs

@RasmusWL RasmusWL requested a review from a team as a code owner April 22, 2020 06:57
@RasmusWL RasmusWL requested review from BekaValentine and removed request for a team April 22, 2020 06:57
@BekaValentine
Copy link
Contributor

BekaValentine commented Apr 22, 2020

This all looks good. I take it that we were unable to track taint through the definitions of any of these things, or if we could, it wasn't enough b/c of not being able to analyze certain imports, etc.?

Could be useful to add a node why it's necessary to special case this as a taint source instead of just the URL itself and propagating from there.

BekaValentine
BekaValentine previously approved these changes Apr 22, 2020
@RasmusWL
Copy link
Member Author

What an excellent question! I did not think about it, simply found that we didn't propagate taint properly, and made a fix.

So I looked over the urllib.parse code more in depth. Currently the taint stops at the call the _coerce_args -- to handle that case, we would need to track taint through star args (tracked in https://github.com/github/codeql-python-team/issues/49)

If that was solved the taint would stop because we currently don't propagate taint through str.split (fix in #3302).

Once these two problems have been fixed, it would be interesting to see if we can handle this case out-of-the-box without any special handling.

@RasmusWL
Copy link
Member Author

No matter what, we should also handle parse_qsl 👍

@RasmusWL RasmusWL marked this pull request as draft April 23, 2020 10:06
Since this test inheriently has `--max-import-depth=1`, by using six, we would
never look at the actual source-code of urllib.parse/urlparse and therefore the
test would never show if we understood the library code good enough that we
could propagate taint out-of-the-box.

All tests moved by one line... that is why the diff is so big
@RasmusWL
Copy link
Member Author

RasmusWL commented Apr 23, 2020

After adding support for parse_qsl I thought that this might be enough to get the taint for parse_qs working, but alas; although we figure out that pairs is tainted, neither name nor value gets tainted (source code) 😞 I'll try to investigate this a bit more in a different issue/PR.

In conclusion, for now, we need both of these special cases.

@RasmusWL RasmusWL marked this pull request as ready for review April 23, 2020 11:01
@RasmusWL RasmusWL requested a review from tausbn April 27, 2020 08:40
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment, otherwise LGTM.

@RasmusWL RasmusWL requested a review from tausbn May 1, 2020 08:22
@tausbn tausbn merged commit 40def2a into github:master May 1, 2020
@RasmusWL RasmusWL deleted the python-parse_qs branch May 1, 2020 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants