Skip to content

Conversation

@RasmusWL
Copy link
Member

@RasmusWL RasmusWL commented Dec 8, 2020

I manually tested this against a few projects, and it does bring back missing results for Code Injection 🎉

Just a port of the old tests, except for the fact that I learned
`cgi.FieldStorage()` _should_ be tainted when not specifying any arguments. (and
moved taint-test to own function)

Also clarified how imports of all the .*HTTPRequestHandler works in Python2
Comment on lines +1070 to +1076
* A source of an instance of `cgi.FieldStorage`.
*
* This can include instantiation of the class, return value from function
* calls, or a special parameter that will be set when functions are call by external
* library.
*
* Use `FieldStorage::instance()` predicate to get references to instances of `cgi.FieldStorage`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* A source of an instance of `cgi.FieldStorage`.
*
* This can include instantiation of the class, return value from function
* calls, or a special parameter that will be set when functions are call by external
* library.
*
* Use `FieldStorage::instance()` predicate to get references to instances of `cgi.FieldStorage`.
* An instance of `cgi.FieldStorage`, extend this class to model new instances.
*
* This can include instantiations of the class, return values from function
* calls, or a special parameter that will be set when functions are called by an external
* library.
*
* Use the predicate `FieldStorage::instance()` to get references to instances of `cgi.FieldStorage`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you agree with changing this, you probably want to update your snippet...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-A source of an instance of `cgi.FieldStorage`.
+An instance of `cgi.FieldStorage`

I don't agree with this part. This pattern of having a source of an instance doesn't make too much sense here, but really helps out in other places, for example for the MultiDict in werkzeug. I would really like us to not change this pattern, so it's easy to recognize it's just this one pattern being used all over the place.

The other suggestions looks like an improvement. Let's do that in an other PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to relegate these to a separate PR if we are planning to change multiple instances anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps "A source of instances of...", then? I wanted to make it clear, that if you extend this class, you should extend it with things that can be considered instances since the code below does so. The "source of" part was confusing this a bit for me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A source of instances of...

sounds good to me 👍

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some extra edges that I hope we can avoid.
Apart from this, I wonder if we do want the instances themselves to be the sources. Since we identify the reads that actually pull out the user controlled data, should those points not be the sources?
(I guess if someone were to print the instance, that would also expose the data.)

So there isn't flow from *any* instance to *any* access of the methods,
but only from the _actual_ instance where the method is accessed.
Co-authored-by: yoff <lerchedahl@gmail.com>
@RasmusWL RasmusWL requested a review from yoff December 15, 2020 11:03
@yoff
Copy link
Contributor

yoff commented Dec 15, 2020

Ok, no more details from me. The high-level question still stands: Is there an obvious downside to only mark the actual data extraction points as sinks? It seems it would allow us to skip adding extra taint steps altogether?

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
As discussed verbally, the path from the initial instance to the field extractions can be useful both in path explanations and to facilitate sanitizers.

@yoff yoff merged commit be5dbf2 into github:main Dec 15, 2020
@RasmusWL RasmusWL deleted the stdlib-http-source-modeling branch December 15, 2020 13:55
RasmusWL added a commit to RasmusWL/codeql that referenced this pull request Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants