-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Python: support pathlib
#5681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: support pathlib
#5681
Conversation
6cc8a5b
to
52a9040
Compare
This because `resolve` accesses the file system, I am open to not include that fact in the modelling.
that were only listed as file sytem accesses.
// Special handling of the `/` operator | ||
exists(BinaryExprNode slash, DataFlow::Node left | | ||
slash.getOp() instanceof Div and | ||
left.asCfgNode() = slash.getLeft() and | ||
left.getALocalSource() = pathlibPath() | ||
| | ||
nodeTo.asCfgNode() = slash and | ||
( | ||
// type-preserving call | ||
nodeFrom = left | ||
or | ||
// data injection | ||
nodeFrom.asCfgNode() = slash.getRight() | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extend this to also handle if the path is the right operator?
A demonstration:
In [1]: from pathlib import Path
In [2]: "bar" / Path("foo")
Out[2]: PosixPath('bar/foo')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, thanks for catching this!
Will also support both operands being paths
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few suggestions for improvements. Apart from that I think this looks good. 👍
// Type-preserving call | ||
exists(DataFlow::AttrRead returnsPath | | ||
returnsPath.getAttributeName() = pathlibPathMethod() | ||
| | ||
nodeTo.(DataFlow::CallCfgNode).getFunction() = returnsPath and | ||
nodeFrom = returnsPath.getObject() | ||
) | ||
or | ||
// Type-preserving attribute | ||
exists(DataFlow::AttrRead isPath | isPath.getAttributeName() = pathlibPathAttribute() | | ||
nodeTo = isPath and | ||
nodeFrom = isPath.getObject() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like there's a lot of overlap between this and the type tracker. Could we perhaps factor out these steps appropriately, and avoid repeating ourselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this, see if you like...it could potentially also be done for the injection part.
) | ||
) | ||
or | ||
// Data injection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit, it's not immediately obvious to me what "Data injection" means here. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means that we should (taint-)track data from a non-Path-object (in)to a Path-object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the refactoring helped a lot, but I still think it could be improved further. I have added a few suggestions.
/** | ||
* Flow for type presering mehtods. | ||
*/ | ||
private predicate typePreservingCall(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very generic-sounding name (and QLDoc comment) for something that's specific to pathlib
. Perhaps something with pathlib
in the name would be better?
(Also, two typos: preserving methods
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think I had the generalisation of this in my head...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created an issue for this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alas, the typo survived. (And the name, given that it lives immediately inside the Stdlib
module could really use some disambiguation.)
/** | ||
* Flow for type presering attributes. | ||
*/ | ||
private predicate typePreservingAttribute(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as my other comment (and also another typo).
slash.getOp() instanceof Div and | ||
( | ||
pathOperand.asCfgNode() = slash.getLeft() and | ||
dataOperand.asCfgNode() = slash.getRight() | ||
or | ||
pathOperand.asCfgNode() = slash.getRight() and | ||
dataOperand.asCfgNode() = slash.getLeft() | ||
) and | ||
pathOperand.getALocalSource() = pathlibPath() | ||
| | ||
nodeTo.asCfgNode() = slash and | ||
nodeFrom in [ | ||
// type-preserving call | ||
pathOperand, | ||
// data injection | ||
dataOperand | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this not just boil down to the following?
slash.getOp() instanceof Div and
nodeFrom.asCfgNode() = slash.getAnOperand() and
nodeFrom.getALocalSource() = pathlibPath() and
nodeTo.asCfgNode() = slash
(In which case, more sharing with the pathlibPath
typetracker might be possible.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost, except the argument of type path lib.Path
does not have to be nodeFrom
. It still does become much shorter this way, though 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sharing is a bit tricky for data injection, because the taint step would prefer to refer to the type tracker to limit the set of nodes.
Co-authored-by: Taus <tausbn@github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went through the PR again. There are still some points I would like to see addressed, but I don't want this to turn into an eternal PR, so feel free to ignore my suggestions if you disagree with them. We can always clean this up later.
I believe the code is functionally correct.
With that in mind, I've marked this as approved.
/** | ||
* Flow for type presering mehtods. | ||
*/ | ||
private predicate typePreservingCall(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alas, the typo survived. (And the name, given that it lives immediately inside the Stdlib
module could really use some disambiguation.)
Co-authored-by: Taus <tausbn@github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, let's see if it floats. 🚢
Adds support for
pathlib.Path
s by tracking these objects with a type tracker and adding appropriate taint steps.This enables a new extension of
FileSystemAccess
base onpathlib.Path
objects.I included most of the API since the QLDoc for
FileSystemAccess
is quite broad, but I am open to dropping calls such asresolve
oris_fifo
which may be harder to exploit.Currently missing:
cwd
andhome