Python: Attribute access API #4423
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,256 @@ | ||
| /** This module provides an API for attribute reads and writes. */ | ||
|
|
||
| import DataFlowUtil | ||
| import DataFlowPublic | ||
| private import DataFlowPrivate | ||
|
|
||
| /** | ||
| * A data flow node that reads or writes an attribute of an object. | ||
| * | ||
| * This abstract base class only knows about the base object on which the attribute is being | ||
| * accessed, and the attribute itself, if it is statically inferrable. | ||
| */ | ||
| abstract class AttrRef extends Node { | ||
| /** | ||
| * Gets the data flow node corresponding to the object whose attribute is being read or written. | ||
| */ | ||
| abstract Node getObject(); | ||
|
|
||
| /** | ||
| * Gets the expression node that defines the attribute being accessed, if any. This is | ||
| * usually an identifier or literal. | ||
| */ | ||
| abstract ExprNode getAttributeNameExpr(); | ||
|
|
||
| /** | ||
| * Holds if this attribute reference may access an attribute named `attrName`. | ||
| * Uses local data flow to track potential attribute names, which may lead to imprecision. If more | ||
| * precision is needed, consider using `getAttributeName` instead. | ||
| */ | ||
| predicate mayHaveAttributeName(string attrName) { | ||
| attrName = this.getAttributeName() | ||
| or | ||
| exists(Node nodeFrom | | ||
| localFlow(nodeFrom, this.getAttributeNameExpr()) and | ||
| attrName = nodeFrom.asExpr().(StrConst).getText() | ||
| ) | ||
| } | ||
|
|
||
| /** | ||
| * Gets the name of the attribute being read or written. For dynamic attribute accesses, this | ||
| * method is not guaranteed to return a result. For such cases, using `mayHaveAttributeName` may yield | ||
| * better results. | ||
| */ | ||
| abstract string getAttributeName(); | ||
| } | ||
|
|
||
| /** | ||
| * A data flow node that writes an attribute of an object. This includes | ||
| * - Simple attribute writes: `object.attr = value` | ||
| * - Dynamic attribute writes: `setattr(object, attr, value)` | ||
| * - Fields written during class initialization: `class MyClass: attr = value` | ||
| */ | ||
| abstract class AttrWrite extends AttrRef { | ||
| /** Gets the data flow node corresponding to the value that is written to the attribute. */ | ||
| abstract Node getValue(); | ||
| } | ||
|
|
||
| /** | ||
| * Represents a control flow node for a simple attribute assignment. That is, | ||
| * ```python | ||
| * object.attr = value | ||
| * ``` | ||
| * Also gives access to the `value` being written, by extending `DefinitionNode`. | ||
| */ | ||
| private class AttributeAssignmentNode extends DefinitionNode, AttrNode, DataFlowCfgNode { | ||
| override ControlFlowNode getValue() { result = DefinitionNode.super.getValue() } | ||
| } | ||
|
|
||
| /** A simple attribute assignment: `object.attr = value`. */ | ||
| private class AttributeAssignmentAsAttrWrite extends AttrWrite, CfgNode { | ||
|
Member: That's a very long class name exposing information the type-system also knows. Would something simpler work? I initially suggested
Contributor (Author): Regarding the long name, yeah, I agree it's a bit of a mouthful. The JS libraries have a similar affliction (e.g.
|
||
| override AttributeAssignmentNode node; | ||
|
|
||
| override Node getValue() { result.asCfgNode() = node.getValue() } | ||
|
|
||
| override Node getObject() { result.asCfgNode() = node.getObject() } | ||
|
|
||
| override ExprNode getAttributeNameExpr() { | ||
| // Attribute names don't exist as `Node`s in the control flow graph, as they can only ever be | ||
| // identifiers, and are therefore represented directly as strings. | ||
| // Use `getAttributeName` to access the name of the attribute. | ||
| none() | ||
| } | ||
|
|
||
| override string getAttributeName() { result = node.getName() } | ||
| } | ||
|
|
||
| import semmle.python.types.Builtins | ||
|
|
||
| /** Represents `CallNode`s that may refer to calls to built-in functions or classes. */ | ||
| private class BuiltInCallNode extends CallNode, DataFlowCfgNode { | ||
| string name; | ||
|
|
||
| BuiltInCallNode() { | ||
| // TODO disallow instances where the name of the built-in may refer to an in-scope variable of that name. | ||
| exists(NameNode id | this.getFunction() = id and id.getId() = name and id.isGlobal()) and | ||
| name = any(Builtin b).getName() | ||
| } | ||
|
|
||
| /** Gets the name of the built-in function that is called at this `CallNode` */ | ||
| string getBuiltinName() { result = name } | ||
| } | ||
|
|
||
| /** | ||
| * Represents a call to the built-ins that handle dynamic inspection and modification of | ||
| * attributes: `getattr`, `setattr`, `hasattr`, and `delattr`. | ||
| */ | ||
| private class BuiltinAttrCallNode extends BuiltInCallNode { | ||
| BuiltinAttrCallNode() { name in ["setattr", "getattr", "hasattr", "delattr"] } | ||
|
|
||
| /** Gets the control flow node for the object on which the attribute is accessed. */ | ||
| ControlFlowNode getObject() { result in [this.getArg(0), this.getArgByName("object")] } | ||
|
|
||
| /** | ||
| * Gets the control flow node for the value that is being written to the attribute. | ||
| * Only relevant for `setattr` calls. | ||
| */ | ||
| ControlFlowNode getValue() { | ||
| // only valid for `setattr` | ||
| name = "setattr" and | ||
| result in [this.getArg(2), this.getArgByName("value")] | ||
| } | ||
|
|
||
| /** Gets the control flow node that defines the name of the attribute being accessed. */ | ||
| ControlFlowNode getName() { result in [this.getArg(1), this.getArgByName("name")] } | ||
| } | ||
|
|
||
| /** Represents calls to the built-in `setattr`. */ | ||
| private class SetAttrCallNode extends BuiltinAttrCallNode { | ||
| SetAttrCallNode() { name = "setattr" } | ||
| } | ||
|
|
||
| /** Represents calls to the built-in `getattr`. */ | ||
| private class GetAttrCallNode extends BuiltinAttrCallNode { | ||
| GetAttrCallNode() { name = "getattr" } | ||
| } | ||
|
|
||
| /** An attribute assignment using `setattr`, e.g. `setattr(object, attr, value)` */ | ||
| private class SetAttrCallAsAttrWrite extends AttrWrite, CfgNode { | ||
| override SetAttrCallNode node; | ||
|
|
||
| override Node getValue() { result.asCfgNode() = node.getValue() } | ||
|
|
||
| override Node getObject() { result.asCfgNode() = node.getObject() } | ||
|
|
||
| override ExprNode getAttributeNameExpr() { result.asCfgNode() = node.getName() } | ||
|
|
||
| override string getAttributeName() { | ||
| result = this.getAttributeNameExpr().asExpr().(StrConst).getText() | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Represents an attribute of a class that is assigned statically during class definition. For instance | ||
| * ```python | ||
| * class MyClass: | ||
| * attr = value | ||
| * ... | ||
| * ``` | ||
| * Instances of this class correspond to the `NameNode` for `attr`, and also give access to `value` by | ||
| * virtue of being a `DefinitionNode`. | ||
| */ | ||
| private class ClassAttributeAssignmentNode extends DefinitionNode, NameNode, DataFlowCfgNode { } | ||
|
|
||
| /** | ||
| * An attribute assignment via a class field, e.g. | ||
| * ```python | ||
| * class MyClass: | ||
| * attr = value | ||
| * ``` | ||
| * is treated as equivalent to `MyClass.attr = value`. | ||
| */ | ||
| private class ClassDefinitionAsAttrWrite extends AttrWrite, CfgNode { | ||
| ClassExpr cls; | ||
| override ClassAttributeAssignmentNode node; | ||
|
|
||
| ClassDefinitionAsAttrWrite() { node.getScope() = cls.getInnerScope() } | ||
|
|
||
| override Node getValue() { result.asCfgNode() = node.getValue() } | ||
|
|
||
| override Node getObject() { result.asCfgNode() = cls.getAFlowNode() } | ||
|
|
||
| override ExprNode getAttributeNameExpr() { none() } | ||
|
|
||
| override string getAttributeName() { result = node.getId() } | ||
| } | ||
|
|
||
| /** | ||
| * A read of an attribute on an object. This includes | ||
| * - Simple attribute reads: `object.attr` | ||
| * - Dynamic attribute reads using `getattr`: `getattr(object, attr)` | ||
| * - Qualified imports: `from module import attr as name` | ||
|
||
| */ | ||
| abstract class AttrRead extends AttrRef, Node { } | ||
|
|
||
| /** | ||
| * A convenience class for embedding `AttrNode` into `DataFlowCfgNode`, as the former is not | ||
| * obviously a subtype of the latter. | ||
| */ | ||
| private class DataFlowAttrNode extends AttrNode, DataFlowCfgNode { } | ||
|
|
||
| /** A simple attribute read, e.g. `object.attr` */ | ||
| private class AttributeReadAsAttrRead extends AttrRead, CfgNode { | ||
| override DataFlowAttrNode node; | ||
|
|
||
| override Node getObject() { result.asCfgNode() = node.getObject() } | ||
|
|
||
| override ExprNode getAttributeNameExpr() { | ||
| // Attribute names don't exist as `Node`s in the control flow graph, as they can only ever be | ||
| // identifiers, and are therefore represented directly as strings. | ||
| // Use `getAttributeName` to access the name of the attribute. | ||
| none() | ||
| } | ||
|
|
||
| override string getAttributeName() { result = node.getName() } | ||
| } | ||
|
|
||
| /** An attribute read using `getattr`: `getattr(object, attr)` */ | ||
| private class GetAttrCallAsAttrRead extends AttrRead, CfgNode { | ||
| override GetAttrCallNode node; | ||
|
|
||
| override Node getObject() { result.asCfgNode() = node.getObject() } | ||
|
|
||
| override ExprNode getAttributeNameExpr() { result.asCfgNode() = node.getName() } | ||
|
|
||
| override string getAttributeName() { | ||
| result = this.getAttributeNameExpr().asExpr().(StrConst).getText() | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * A convenience class for embedding `ImportMemberNode` into `DataFlowCfgNode`, as the former is not | ||
| * obviously a subtype of the latter. | ||
| */ | ||
| private class DataFlowImportMemberNode extends ImportMemberNode, DataFlowCfgNode { } | ||
|
|
||
| /** | ||
| * Represents a named import as an attribute read. That is, | ||
| * ```python | ||
| * from module import attr as attr_ref | ||
| * ``` | ||
| * is treated as if it is a read of the attribute `module.attr`, even if `module` is not imported directly. | ||
| */ | ||
| private class ModuleAttributeImportAsAttrRead extends AttrRead, CfgNode { | ||
| override DataFlowImportMemberNode node; | ||
|
|
||
| override Node getObject() { result.asCfgNode() = node.getModule(_) } | ||
|
|
||
| override ExprNode getAttributeNameExpr() { | ||
| // The name of an imported attribute doesn't exist as a `Node` in the control flow graph, as it | ||
| // can only ever be an identifier, and is therefore represented directly as a string. | ||
| // Use `getAttributeName` to access the name of the attribute. | ||
| none() | ||
| } | ||
|
|
||
| override string getAttributeName() { exists(node.getModule(result)) } | ||
| } | ||
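
For orientation, here is a small Python sketch of the source-level constructs that the doc comments in this diff say `AttrWrite` and `AttrRead` are meant to cover. The names `Config`, `cfg`, and `pi_ref` are hypothetical and chosen only for illustration; the mapping in the comments paraphrases the doc comments above and is not output from the library.

```python
import math
from math import pi as pi_ref  # qualified import: described as a read of `math.pi`


class Config:
    retries = 3  # class field: described as equivalent to `Config.retries = 3`


cfg = Config()
cfg.timeout = 10            # simple attribute write: `object.attr = value`
setattr(cfg, "host", "db")  # dynamic attribute write via `setattr`

simple_read = cfg.timeout            # simple attribute read: `object.attr`
dynamic_read = getattr(cfg, "host")  # dynamic attribute read via `getattr`
class_read = Config.retries          # simple read of the class field
```

In the first five cases the attribute name is a literal or identifier in the source, which is the situation where the doc comments say `getAttributeName` can be expected to yield a result.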

Is there a reason to not call this `hasAttributeName`? Are we any more uncertain of the values reported here than in `getAttributeName` or `getAttributeNameExpr`?
I am asking because I think this is the surface predicate, and we want to encourage users to use this one for normal use. Perhaps we should also mention this in the comment?

Main reason is: to mimic the JavaScript API. I think the `may` is useful in this case (at least with how the library functions currently) in that `getAttributeName` will only yield a result if the name of the attribute is fixed, but `mayHaveAttributeName` can hold for several attribute names. Thus, I would expect something like [code snippet not captured] to have (at least) two values for `mayHaveAttributeName`.
Actually, this has me wondering. Perhaps `getAttributeName` should be rewritten to do the same kind of local flow as `mayHaveAttributeName`, but only yield a value if the result is unique. 🤔
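
To make the distinction concrete, here is a hypothetical Python sketch of a dynamic write whose attribute name is not fixed. It is not the snippet from the original comment, and `Target` and `write_dynamic` are invented for illustration. Under the expectation stated in the comment above, `mayHaveAttributeName` would hold for both `"foo"` and `"bar"` at the `setattr` call, while `getAttributeName` would have no result because the name argument is not a string literal.

```python
class Target:
    pass


def write_dynamic(obj, use_foo, value):
    # The attribute name is picked on one of two branches, so no single
    # constant name is visible at the `setattr` call itself.
    if use_foo:
        attr = "foo"
    else:
        attr = "bar"
    setattr(obj, attr, value)
    return attr


t = Target()
first = write_dynamic(t, True, 1)
second = write_dynamic(t, False, 2)
```

At run time both branches are reachable, which is why reporting only one of the two names would be an under-approximation.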

In CodeQL, I have come to expect that a predicate named `getAttributeName` may yield more than one value. Is the case where the name of the attribute is fixed important?

Ultimately, I think this will depend on whether we end up seeing false positives because of attribute confusion. I can certainly see an argument for using `mayHaveAttributeName` in something like type tracking, where we want to propagate types as much as possible (even at the cost of a bit of imprecision), but I can also imagine a situation where conflating two attribute names leads to an erroneous flow of taint. (So in particular, the data flow library itself should probably use `getAttributeName` for precision.)

So, I was curious, and added a test case to see if we were getting the intended behaviour for `mayHaveAttributeName`, and it seems we're not. [code snippet not captured]
Note the `f-` annotations above. It seems we only consider local flow from `attr = foo` and not `attr = bar`. This is true even if I negate the conditional, so I expect it's simply always picking the first branch. This feels like it might be a bug in the implementation of local flow.

Good catch 👍

Hm, yes, we will have to sort that out...

If you try the same with a conditional expression, do you get the expected behaviour?

The following test code passes (with `f-` annotations): [code snippet not captured]
So, no. Same behaviour. ☹️
(Also, this was after manually applying 0f077f5, since that commit is not present on this branch.)
I'm wondering if the problem is elsewhere, though. I'll have to debug this a bit.
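
For readers without the original test file, a hypothetical Python sketch of the conditional-expression variant asked about above might look like this. `Target` and `write_with_conditional_expression` are invented names, and this is not the test code referred to in the comment.

```python
class Target:
    pass


def write_with_conditional_expression(obj, use_foo, value):
    # Same two candidate names as an if/else assignment would give, but
    # selected by a single conditional expression.
    attr = "foo" if use_foo else "bar"
    setattr(obj, attr, value)
    return attr


target = Target()
name_when_true = write_with_conditional_expression(target, True, "a")
name_when_false = write_with_conditional_expression(target, False, "b")
```

Both names can reach the `setattr` call at run time, so the intended result discussed in the thread would be for `mayHaveAttributeName` to hold for each of them.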