Skip to content

Python: support pathlib #5681

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
May 4, 2021
Merged

Python: support pathlib #5681

merged 18 commits into from
May 4, 2021

Conversation

yoff
Copy link
Contributor

@yoff yoff commented Apr 15, 2021

Adds support for pathlib.Paths by tracking these objects with a type tracker and adding appropriate taint steps.
This enables a new extension of FileSystemAccess base on pathlib.Path objects.

I included most of the API since the QLDoc for FileSystemAccess is quite broad, but I am open to dropping calls such as resolve or is_fifo which may be harder to exploit.

Currently missing:

  • class methods cwd and home
  • methods returning lists

@yoff yoff force-pushed the python-support-pathlib branch from 6cc8a5b to 52a9040 Compare April 15, 2021 07:47
@yoff yoff marked this pull request as ready for review April 15, 2021 07:51
@yoff yoff requested a review from a team as a code owner April 15, 2021 07:51
yoff added 3 commits April 15, 2021 10:47
This because `resolve` accesses the file system,
I am open to not include that fact in the modelling.
that were only listed as file sytem accesses.
Comment on lines 1032 to 1046
// Special handling of the `/` operator
exists(BinaryExprNode slash, DataFlow::Node left |
slash.getOp() instanceof Div and
left.asCfgNode() = slash.getLeft() and
left.getALocalSource() = pathlibPath()
|
nodeTo.asCfgNode() = slash and
(
// type-preserving call
nodeFrom = left
or
// data injection
nodeFrom.asCfgNode() = slash.getRight()
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extend this to also handle if the path is the right operator?

A demonstration:

In [1]: from pathlib import Path

In [2]: "bar" / Path("foo")
Out[2]: PosixPath('bar/foo')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, thanks for catching this!

Will also support both operands being paths
@yoff yoff requested a review from RasmusWL April 15, 2021 14:08
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions for improvements. Apart from that I think this looks good. 👍

Comment on lines 1018 to 1030
// Type-preserving call
exists(DataFlow::AttrRead returnsPath |
returnsPath.getAttributeName() = pathlibPathMethod()
|
nodeTo.(DataFlow::CallCfgNode).getFunction() = returnsPath and
nodeFrom = returnsPath.getObject()
)
or
// Type-preserving attribute
exists(DataFlow::AttrRead isPath | isPath.getAttributeName() = pathlibPathAttribute() |
nodeTo = isPath and
nodeFrom = isPath.getObject()
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like there's a lot of overlap between this and the type tracker. Could we perhaps factor out these steps appropriately, and avoid repeating ourselves?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this, see if you like...it could potentially also be done for the injection part.

)
)
or
// Data injection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit, it's not immediately obvious to me what "Data injection" means here. 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that we should (taint-)track data from a non-Path-object (in)to a Path-object.

@yoff yoff requested a review from tausbn April 16, 2021 15:02
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the refactoring helped a lot, but I still think it could be improved further. I have added a few suggestions.

/**
* Flow for type presering mehtods.
*/
private predicate typePreservingCall(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very generic-sounding name (and QLDoc comment) for something that's specific to pathlib. Perhaps something with pathlib in the name would be better?
(Also, two typos: preserving methods)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think I had the generalisation of this in my head...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created an issue for this now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alas, the typo survived. (And the name, given that it lives immediately inside the Stdlib module could really use some disambiguation.)

/**
* Flow for type presering attributes.
*/
private predicate typePreservingAttribute(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as my other comment (and also another typo).

Comment on lines 1030 to 1046
slash.getOp() instanceof Div and
(
pathOperand.asCfgNode() = slash.getLeft() and
dataOperand.asCfgNode() = slash.getRight()
or
pathOperand.asCfgNode() = slash.getRight() and
dataOperand.asCfgNode() = slash.getLeft()
) and
pathOperand.getALocalSource() = pathlibPath()
|
nodeTo.asCfgNode() = slash and
nodeFrom in [
// type-preserving call
pathOperand,
// data injection
dataOperand
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not just boil down to the following?

          slash.getOp() instanceof Div and
          nodeFrom.asCfgNode() = slash.getAnOperand() and
          nodeFrom.getALocalSource() = pathlibPath() and
          nodeTo.asCfgNode() = slash

(In which case, more sharing with the pathlibPath typetracker might be possible.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost, except the argument of type path lib.Path does not have to be nodeFrom. It still does become much shorter this way, though 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sharing is a bit tricky for data injection, because the taint step would prefer to refer to the type tracker to limit the set of nodes.

@yoff yoff requested a review from tausbn April 22, 2021 09:12
tausbn
tausbn previously approved these changes Apr 22, 2021
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through the PR again. There are still some points I would like to see addressed, but I don't want this to turn into an eternal PR, so feel free to ignore my suggestions if you disagree with them. We can always clean this up later.

I believe the code is functionally correct.

With that in mind, I've marked this as approved.

/**
* Flow for type presering mehtods.
*/
private predicate typePreservingCall(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alas, the typo survived. (And the name, given that it lives immediately inside the Stdlib module could really use some disambiguation.)

Co-authored-by: Taus <tausbn@github.com>
@yoff yoff dismissed stale reviews from tausbn via 1954c0b April 23, 2021 08:20
@yoff yoff requested a review from tausbn April 23, 2021 18:27
@RasmusWL RasmusWL removed their request for review April 29, 2021 12:42
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, let's see if it floats. 🚢

@codeql-ci codeql-ci merged commit 95f26aa into github:main May 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants