-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Python: Improve various library modeling #6349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Intended for use with dca
Before, results from `dca` would look something like ## + py/meta/alerts/remote-flow-sources-reach - django/django@c2250cf_cb8f: tests/messages_tests/urls.py:38:16:38:48 reachable with taint-tracking from RemoteFlowSource - django/django@c2250cf_cb8f: tests/messages_tests/urls.py:38:9:38:12 reachable with taint-tracking from RemoteFlowSource now it should make it easier to spot _what_ it is that actually changed, since we pretty-print the node.
So it matches the new style we're using in aiohttp/twisted/...
Such that the result of `request.FILES["key"].file.read()` is tainted
since UploadedFile is the abstract base class, all real usage would be of one of the subclasses, so removing this to not provide a false hope that it actually works. I don't think investing the time into making this work would give any value, so that's why I didn't do it ;)
Like before, omitted ClassInstantiation
Having the additional taint step just next to the other definitions, so everything is together.
InstanceSourceApiNode is a really good idea, but it just happened too soon. I can't do what I need if I have to supply an API-node. So to avoid confusion between deprecating to/from InstanceSource in those classes, I opted to do some major reorganizing as well 👍 Due to aliasing restrictions, I had to use a little trick with the `WerkzeugOld` module.
Also removed a misleading comment link to method on wrong class :D
I know that the TODO about not having the tools to handling `meth = obj.meth; meth()` is outdated now that we `DataFlow::MethodCallNode`, but I'm planning to deal with that later on ;)
It would probably have been easier to do this as the _first_ thing... but that's too late now 😓
Such that it should be next to the other class-related predicates (such as `instance()`), the class is called `AdditionalTaintStep`, and it marked private. I also moved any modeling of attributes as well, while I was at it.
A few stragglers that did not have the same TODO comments as the others
These were written way before the ones in DataFlowPrivate, but apparently didn't cover quite as much :|
I realized that if you ever wanted to the way taint-steps works again, you would have to go to all the 117 places it has been implemented, and change EVERY ONE OF THEM :( so trying to solve that problem here. Not super happy with the name, but that was just the best I could come up with :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is excellent, thanks for shaping up our libraries!
I suppose a by now sort-of-standard name for InstanceTaintStepHelper
would be InstanceTaintStepConfiguration
, but I am not sure it sends a better signal.
|
||
/** A direct instantiation of `django.utils.datastructures.MultiValueDict`. */ | ||
private class ClassInstantiation extends InstanceSource, DataFlow::CallCfgNode { | ||
override CallNode node; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This override should be unnecessary, given extends DataFlow::CallCfgNode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
/** An attribute read on an django request that is a `MultiValueDict` instance. */ | ||
class DjangoHttpRequestMultiValueDictInstances extends Django::MultiValueDict::InstanceSource { | ||
DjangoHttpRequestMultiValueDictInstances() { | ||
this.(DataFlow::AttrRead).getObject() = django::http::request::HttpRequest::instance() and | ||
this.(DataFlow::AttrRead).getAttributeName() in ["GET", "POST", "FILES"] | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to extend DataFlow::AttrRead
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be nice, but DataFlow::AttrRead
is an abstract class, so it doesn't work that well in practice 😞 we could have applied the ::Range
pattern, but that seems a bit late now 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use the new-fangled extends ... instanceof ...
instead? (I've still not fully internalised how that affects things.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, fair enough, then I do not require a change here, although the suggestion by @tausbn is interesting.
/** | ||
* Holds if taint can flow from `nodeFrom` to `nodeTo` with a step related `await`. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps show the syntax here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
Hm, reviewing commit-by-commit might not have been the best choice here... |
As pointed out in review, we don't need this override any more!
Seeing |
Co-authored-by: Rasmus Wriedt Larsen <rasmuswl@github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
It got rather involved in the end, but I think the end result is both a more in-depth modeling, and a better way to model things moving forwards.
Performance test in https://github.com/dsp-testing/RasmusWL-dca/issues/29