-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pysrc2cpg] Model Field-like Behaviour of Module Variables #3750
Conversation
The motivation for this PR is that module-level variables in Python can be accessed by other files by some implicit global scope. This behaviour is more akin to fields rather than identifiers, and should be modelled closer to that direction. Given the caveat that the data-flow engine sees fields a bit differently, especially if the flow goes across closures, this aims to be the least intrusive change to achieve this result. ### Main changes * Create a module-level identifier assigned to the module's type ref to be used as the base of module-level member field accesses. * At module-level identifier STOREs, create a block that writes to the module field, then aliases the result to the identifier, e.g., `x = 1` becomes ```python x = { <module>.x = 1 tmp = <module>.x tmp } ``` * Added helper methods, e.g., `isModuleContext` to `ContextStack` * Separated `VariableReference` class into `Reference` trait shared between `VariableReference` and `FieldReference` * During `createMemberLinks`, `FieldReference` calls will have a `REF` edge to their referencing member. TODO: Recover these `REF` edges during the type recovery pass. * ## Misc Changes * Fix `pysrc2cpg/ImportResolver` bug when an `__init__.py` entity is imported from a non-sibling module. * Added shortcut in `PythonTypeRecovery.visitStatementsInBlock` to detect these strong-update blocks and "re-sugar" them to skip propagating the type via member * Removed Python slicer tests, I would rather be intentional in how I support Python usage slicing at a later stage
joern-cli/frontends/pysrc2cpg/src/test/scala/io/joern/pysrc2cpg/dataflow/DataFlowTests.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this contain a test for a flow we can now find which we could not find previously? It looks to me like a few test expectations were edited, but there is no new test. Also, can you comment on why the number of flows changes? Are we ok with suddenly getting 2 flows in a place where we previously only had one?
@@ -214,6 +215,23 @@ class TypeRecoveryPassTests extends PySrc2CpgFixture(withOssDataflow = false) { | |||
"bar.py" | |||
).cpg | |||
|
|||
"be able to traverse from `foo.[x|y|db]` to its members" in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One additional (non-data-flow) test to show recovered member references that were previously not there
…e handled for members
joern-cli/frontends/pysrc2cpg/src/test/scala/io/joern/pysrc2cpg/dataflow/DataFlowTests.scala
Outdated
Show resolved
Hide resolved
|
||
val source = cpg.literal("42").l | ||
val sink = cpg.call.code("print.*").argument.l | ||
sink.reachableByFlows(source).size shouldBe 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of this PR, we can now obtain the following flow:
/*
____________________________________________________________
| nodeType| tracked | lineNumber| method | file |
|===========================================================|
| Literal | <module>.a = 42 | 2 | <module> | foo.py |
| Call | print(foo.a) | 4 | <module> | bar.py |
*/
_.method.typeDecl | ||
_.method.typeDecl, | ||
// for Python, we have moved to replacing strong updates of module-level variables with their members | ||
_.target.isCall.nameExact(Operators.fieldAccess).argument(1).isIdentifier.name("<module>") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is added to compensate for the de-sugared blocks to reason between the module-level variable identifier and its field access
expr.method.typeDecl.map { typeDecl => (typeDecl, expr) } ++ | ||
expr.inCall.fieldAccess.referencedMember.flatMap { member => | ||
member.typeDecl.map { typeDecl => (typeDecl, member) } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extends on the Expression
case to access the referenced members, effectively combining the behaviour of both cases below this one
…same as before and give no fluctuations
@@ -37,6 +37,18 @@ class ExtendedCfgNode(val traversal: Iterator[CfgNode]) extends AnyVal { | |||
reachedSources.cast[NodeType] | |||
} | |||
|
|||
/** Trims out path elements that represent module-level variable assignment desugaring | |||
*/ | |||
private def trimModuleVariableDesugarBlock(path: Vector[PathElement]): Vector[PathElement] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fabsx00 I've added this to remove the visibility of the desugaring, now the existing Python data-flows no longer need to be +1'd in certain places since they were duplicates in any case
This query package, `moduelvariable`, adds the necessary complexity to reason and handle the changes introduced by #3750 See the README.md for additional context. One can now navigate the module members, the block references, the locals and identifiers referencing these variables, and go navigate back concisely.
This query package, `modulevariable`, adds the necessary complexity to reason and handle the changes introduced by #3750 See the README.md for additional context. One can now navigate the module members, the block references, the locals and identifiers referencing these variables, and go navigate back concisely.
…Steps * Reverts #3750's module block structure in Python and performs more on-the-fly interprocedural usage resolution * Adds `MODULE` modifier on module-defining methods in JS and Ruby (Python already has this). Note: Other frontends don't seem to have a module-defining method, though I may be wrong. * Transferred import resolution query steps and classes to `semanticcpg` under `importresolver` to perform on-the-fly CPG entity retrieval of resolved imports. * Modified existing `modulevariable` query steps to now reflect `LOCAL` as the base of the `ModuleVariable` node extension and created a bunch of queries around that. * Modified `SourcesToStartingPoints` to handle module variables similarly to fields for interprocedural flows.
…Steps (#3839) In an ongoing effort to consolidate the interprocedural behaviour of "global" or "module-level" variables, this PR allows frontends to define module-defining methods the same as other modules, while treating the `LOCAL` variables on these methods closer to `TypeDecl.Member` nodes in the data-flow engine. This change allows all frontends that specify the `MODULE` modifier to benefit from inter-module flows from module-level local variables, i.e., a module importing another module will now benefit from data-flow tracking across importing modules. ### Changelist * Reverts #3750 module block structure in Python and performs more on-the-fly interprocedural usage resolution * Adds `MODULE` modifier on module-defining methods in JS and Ruby (Python already has this). Note: Other frontends don't seem to have a module-defining method, though I may be wrong. * Transferred import resolution query steps and classes to `semanticcpg` under `importresolver` to perform on-the-fly CPG entity retrieval of resolved imports. * Modified existing `modulevariable` query steps to now reflect `LOCAL` as the base of the `ModuleVariable` node extension and created a bunch of queries around that. * Modified `SourcesToStartingPoints` to handle module variables similarly to fields for interprocedural flows. * Query steps of node: * `isModule` added to `Method` nodes * `method` step added to `Declaration` nodes * `isModuleVariable` added to `Identifier` and `Local` nodes * `Expression` nodes have `moduleVariables` step which filters in nodes that relate to some module variable, and casts these nodes to `OpNodes.ModuleVariable` (extends `Local`) * `references` added to `OpNodes.ModuleVariable` which behaves like `Local.referencingIdentifiers` but also includes `fieldIdentifier` nodes whenever a module variable is referenced off an imported module field access. ### Things to Note * In "lowered" code blocks, including lambdas, there were a few flow increases due to method variable scoping behaving more like field accesses: * For Ruby, in the deprecated frontend, there was 1 fewer in some tests. In the new frontend, the assignment flows doubled. * JavaScript flows also increased here and there in 1 lowering example and 2 lambdas.
The motivation for this PR is that module-level variables in Python can be accessed by other files by some implicit global scope. This behaviour is more akin to fields rather than identifiers, and should be modelled closer to that direction.
Given the caveat that the data-flow engine sees fields a bit differently, especially if the flow goes across closures, this aims to be the least intrusive change to achieve this result. The benefits of this would be:
<module>
type declaration's members.Main changes
x = 1
becomesisModuleContext
toContextStack
VariableReference
class intoReference
trait shared betweenVariableReference
andFieldReference
createMemberLinks
,FieldReference
calls will have aREF
edge to their referencing member.FieldAccess
had an untyped base that is recovered during the type recovery, theREF
edge will be recovered to the target type and member, if either are present.reachableByFlows
ignore the desugaring when returning path elements to keep the result as if there was no desugaring occuring.Misc Changes
pysrc2cpg/ImportResolver
bug when an__init__.py
entity is imported from a non-sibling module.PythonTypeRecovery.visitStatementsInBlock
to detect these strong-update blocks and "re-sugar" them to skip propagating the type via member