Skip to content

Conversation

RasmusWL
Copy link
Member

It got rather involved in the end, but I think the end result is both a more in-depth modeling, and a better way to model things moving forwards.

Performance test in https://github.com/dsp-testing/RasmusWL-dca/issues/29

RasmusWL added 30 commits July 21, 2021 14:53
Intended for use with dca
Before, results from `dca` would look something like

    ## + py/meta/alerts/remote-flow-sources-reach

    - django/django@c2250cf_cb8f: tests/messages_tests/urls.py:38:16:38:48
        reachable with taint-tracking from RemoteFlowSource
    - django/django@c2250cf_cb8f: tests/messages_tests/urls.py:38:9:38:12
        reachable with taint-tracking from RemoteFlowSource

now it should make it easier to spot _what_ it is that actually changed,
since we pretty-print the node.
So it matches the new style we're using in aiohttp/twisted/...
Such that the result of `request.FILES["key"].file.read()` is tainted
since UploadedFile is the abstract base class, all real usage would be
of one of the subclasses, so removing this to not provide a false hope
that it actually works.

I don't think investing the time into making this work would give any
value, so that's why I didn't do it ;)
Like before, omitted ClassInstantiation
Having the additional taint step just next to the other definitions, so
everything is together.
InstanceSourceApiNode is a really good idea, but it just happened too
soon. I can't do what I need if I have to supply an API-node. So to
avoid confusion between deprecating to/from InstanceSource in those
classes, I opted to do some major reorganizing as well 👍

Due to aliasing restrictions, I had to use a little trick with the
`WerkzeugOld` module.
Also removed a misleading comment link to method on wrong class :D
I know that the TODO about not having the tools to handling
`meth = obj.meth; meth()` is outdated now that we `DataFlow::MethodCallNode`,
but I'm planning to deal with that later on ;)
It would probably have been easier to do this as the _first_ thing...
but that's too late now 😓
Such that it should be next to the other class-related predicates (such
as `instance()`), the class is called `AdditionalTaintStep`, and it
marked private.

I also moved any modeling of attributes as well, while I was at it.
A few stragglers that did not have the same TODO comments as the others
These were written way before the ones in DataFlowPrivate, but
apparently didn't cover quite as much :|
I realized that if you ever wanted to the way taint-steps works again,
you would have to go to all the 117 places it has been implemented, and
change EVERY ONE OF THEM :( so trying to solve that problem here.

Not super happy with the name, but that was just the best I could come up with :D
@RasmusWL RasmusWL added the no-change-note-required This PR does not need a change note label Jul 22, 2021
@RasmusWL RasmusWL requested a review from a team as a code owner July 22, 2021 12:27
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent, thanks for shaping up our libraries!
I suppose a by now sort-of-standard name for InstanceTaintStepHelper would be InstanceTaintStepConfiguration, but I am not sure it sends a better signal.


/** A direct instantiation of `django.utils.datastructures.MultiValueDict`. */
private class ClassInstantiation extends InstanceSource, DataFlow::CallCfgNode {
override CallNode node;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This override should be unnecessary, given extends DataFlow::CallCfgNode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

Comment on lines 2015 to 2021
/** An attribute read on an django request that is a `MultiValueDict` instance. */
class DjangoHttpRequestMultiValueDictInstances extends Django::MultiValueDict::InstanceSource {
DjangoHttpRequestMultiValueDictInstances() {
this.(DataFlow::AttrRead).getObject() = django::http::request::HttpRequest::instance() and
this.(DataFlow::AttrRead).getAttributeName() in ["GET", "POST", "FILES"]
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to extend DataFlow::AttrRead here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be nice, but DataFlow::AttrRead is an abstract class, so it doesn't work that well in practice 😞 we could have applied the ::Range pattern, but that seems a bit late now 😞

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the new-fangled extends ... instanceof ... instead? (I've still not fully internalised how that affects things.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, fair enough, then I do not require a change here, although the suggestion by @tausbn is interesting.

Comment on lines 207 to 209
/**
* Holds if taint can flow from `nodeFrom` to `nodeTo` with a step related `await`.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps show the syntax here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@yoff
Copy link
Contributor

yoff commented Sep 2, 2021

Hm, reviewing commit-by-commit might not have been the best choice here...

As pointed out in review, we don't need this override any more!
@RasmusWL
Copy link
Member Author

RasmusWL commented Sep 2, 2021

I suppose a by now sort-of-standard name for InstanceTaintStepHelper would be InstanceTaintStepConfiguration, but I am not sure it sends a better signal.

Seeing ...Configuration instinctively makes me think of data-flow/taint tracking, so I'm not convinced it would be better.

@RasmusWL RasmusWL requested a review from yoff September 2, 2021 14:00
@RasmusWL RasmusWL requested a review from yoff September 3, 2021 14:02
Co-authored-by: Rasmus Wriedt Larsen <rasmuswl@github.com>
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yoff yoff merged commit 138a7ae into github:main Sep 6, 2021
@RasmusWL RasmusWL deleted the more-modeling branch September 6, 2021 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-change-note-required This PR does not need a change note Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants