Dataflow: Refactor access paths to split TypedContent into an explicit pair #12930

aschackmull · 2023-04-26T08:58:03Z

In data flow we track both a type and an access path. The two are somewhat intertwined, because we want to be able to re-establish the tracked type after a store-to-read step sequence.
We used to achieve this by merging the type into the front of the access path, resulting in a list of the form (typ1, content1) :: (typ2, content2) :: ... :: nil(typn). This refactor pulls the front type out of the access path, so that it becomes an explicit pair rather than being merged with Content. We still need the types nested inside the access path tails, so instead of being a list of TypedContents, an access path is now a list of Contents in which each nested tail is paired with a DataFlowType, i.e. (typ1, content1 :: (typ2, content2 :: ... :: (typn, nil) ... )).
In the automaton view of access paths, we're switching from a graph with access-path nodes and type-content-pair-labelled edges to a graph where the nodes are type-access-path pairs and the edges are merely content-labelled.
This refactor is intended to pave the way for a future enhancement where we'll be able to update the tracked type independently from the access path to improve precision.
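
For readers who want to see the two shapes side by side, here is a minimal Python sketch of the old and new access path representations described above. It is only an illustration, not the QL implementation: the class names (OldNil, OldCons, Nil, Cons, Typed), the helper functions (split, store, read), and the use of plain strings for types and contents are all hypothetical and do not appear in the library.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


# Old shape: (typ1, content1) :: (typ2, content2) :: ... :: nil(typn)
@dataclass(frozen=True)
class OldNil:
    typ: str


@dataclass(frozen=True)
class OldCons:
    head: Tuple[str, str]  # (type, content), i.e. a "typed content"
    tail: Union["OldCons", OldNil]


# New shape: (typ1, content1 :: (typ2, content2 :: ... :: (typn, nil) ... ))
@dataclass(frozen=True)
class Nil:
    pass


@dataclass(frozen=True)
class Cons:
    content: str   # the edge label is now just a content
    tail: "Typed"  # each nested tail carries its own type


@dataclass(frozen=True)
class Typed:
    typ: str
    path: Union[Cons, Nil]


def split(old: Union[OldCons, OldNil]) -> Typed:
    """Convert the old merged list into the new explicit (type, path) pair."""
    if isinstance(old, OldNil):
        return Typed(old.typ, Nil())
    typ, content = old.head
    return Typed(typ, Cons(content, split(old.tail)))


def store(tracked: Typed, content: str, container_typ: str) -> Typed:
    """Store step: push a content; the old pair becomes the nested tail."""
    return Typed(container_typ, Cons(content, tracked))


def read(tracked: Typed, content: str) -> Optional[Typed]:
    """Read step: pop a matching content and re-establish the nested type."""
    if isinstance(tracked.path, Cons) and tracked.path.content == content:
        return tracked.path.tail
    return None


# Hypothetical example: a String stored in Box.value, stored in List.elem.
old_path = OldCons(("List", "elem"), OldCons(("Box", "value"), OldNil("String")))
new_path = split(old_path)
```

In this sketch a read that follows a matching store returns exactly the pair that was tracked before the store, which is the "re-establish the tracked type" property the description refers to. Because the front type sits outside the path, it could in principle be replaced without touching the contents, which is the kind of independent update the future enhancement is aiming for.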

This PR is intended to be reviewed commit-by-commit. The commits progress by first duplicating the type information from the access paths where needed before replacing TypedContent with Content.

aschackmull · 2023-04-28T09:21:41Z

DCA is looking good. The few additional results for Java appear to be caused by a loss of precise type tracking for data stored in MapValueContent, and can be fixed by adding MapValueContent to forceHighPrecision.

hvitved

Looks really great; thanks a lot for splitting the work up into individual commits, that helped a lot. I only have one small question, but happy to merge as-is.

hvitved · 2023-05-01T11:46:06Z

java/ql/lib/semmle/code/java/dataflow/internal/DataFlowImpl.qll

@@ -2138,6 +2163,8 @@ module Impl<FullStateConfigSig Config> {
      PrevStage::revFlow(node, _) and result = TApproxFrontNil(node.getDataFlowType())
    }

+    Typ getTyp(DataFlowType t) { result = t }


And same further down.

I expect it to be extremely likely that the optimiser will in fact inline these, so no need to be explicit about that, I think.

aschackmull requested review from a team as code owners April 26, 2023 08:58

github-actions bot added C# C++ DataFlow Library Go Java Python Ruby Swift labels Apr 26, 2023

aschackmull force-pushed the dataflow/split-typedcontent branch from fd8e96d to 6f21781 Compare April 26, 2023 09:16

aschackmull mentioned this pull request Apr 27, 2023

Dataflow: Add type to PathNode.toString. #12948

Merged

aschackmull added 13 commits April 27, 2023 14:33

Dataflow: Split TypedContent in store relation.

cda26ba

Dataflow: Duplicate accesspath type info as separate column.

b84b1a4

Dataflow: Duplicate accesspath type info of the tail in cons relations.

c79daf0

Dataflow: Add type column to filter predicate

209d914

Dataflow: Duplicate accesspath type info in PathNode and pathStep.

5a027b9

Dataflow: Add type to PathNode.toString

fd36304

Dataflow: Duplicate accesspath type info in partial flow.

11c0525

Dataflow: Add type to partial flow summary context

77b09f3

Dataflow: Add type to stage 2-5 summary ctx.

e5d36ff

Dataflow: Remove type from PartialAccessPath.

2cf58fc

Dataflow: Replace RevPartialAccessPath with the now identical Partial…

933d2fb

…AccessPath.

Dataflow: Include type in post-stage-5 tail relation.

69202d2

Dataflow: Duplicate type info for AccessPath tails.

142479e

aschackmull added 13 commits April 27, 2023 14:52

Dataflow: Replace AccessPath push/pop with isCons.

52f50b8

Dataflow: Duplicate type info for AccessPathApprox tails.

95b95e5

Dataflow: Eliminate now-redundant type in nil accesspath approximations.

748bcba

Dataflow: Eliminate TypedContentApprox.

ff3e45e

Dataflow: Eliminate front type in AccessPathFront.

123534a

Dataflow: Replace TypedContent with Content in access paths.

a2fa97a

Dataflow: Remove superfluous columns

b534e7b

Dataflow: Remove superfluous predicates.

5373b4d

Dataflow: Eliminate TypedContent.

4f2d236

Java: Fix reference to TypedContent.

9ad2da6

Dataflow: Autoformat

a761eea

Dataflow: Sync.

9140cbe

Dataflow: Enforce type pruning in all forward stages.

71ae090

aschackmull force-pushed the dataflow/split-typedcontent branch from 6f21781 to 71ae090 Compare April 27, 2023 12:55

aschackmull added the no-change-note-required This PR does not need a change note label May 1, 2023

hvitved approved these changes May 1, 2023

aschackmull merged commit 6c8cb0d into github:main May 1, 2023

aschackmull deleted the dataflow/split-typedcontent branch May 1, 2023 12:58

aschackmull May 1, 2023
