Data flow: Track precise types during field flow #3456

hvitved · 2020-05-12T14:11:00Z

Overview

This PR factors type information out of the language-specific Content entities, and instead uses the actual stores to record type information. That is, when tracking flow through a field-write, x.f = y, we use the type of x during the subsequent flow calculations, rather than the result of getContainerType() on the content entity corresponding to f.

This means that we can now eliminate spurious flow, such as the b.CallM() case in the example below:

class FieldA
{
    public object Field;

    public virtual void M() { }

    public void CallM() => this.M();

    static void M1(FieldB b, FieldC c)
    {
        b.Field = new object();
        b.CallM(); // no flow            <- NEW
        c.Field = new object();
        c.CallM(); // flow
    }
}

class FieldB : FieldA { }

class FieldC : FieldA
{
    public override void M() => Sink(this.Field);
}
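To make the pruning in the example concrete, here is a minimal sketch in Python (not QL, and not the library's actual implementation). It assumes a simplified compatibility rule, namely that two class types can describe the same object only if one is a subtype of the other. All names (`PARENT`, `compatible`, `container_type_old`, `container_type_new`, `flows_to_sink`) are hypothetical and exist only for illustration.

```python
# Hypothetical model of the class hierarchy from the example above.
PARENT = {"object": None, "FieldA": "object", "FieldB": "FieldA", "FieldC": "FieldA"}

# Declaring type of each field; this stands in for getContainerType() on the content.
DECLARING_TYPE = {"Field": "FieldA"}


def supertypes(t):
    """Return t together with all of its ancestors."""
    result = set()
    while t is not None:
        result.add(t)
        t = PARENT[t]
    return result


def compatible(t1, t2):
    """Simplified rule (an assumption): one type must be a subtype of the other."""
    return t1 in supertypes(t2) or t2 in supertypes(t1)


def container_type_old(field, qualifier_type):
    """Before: the container type comes from the content entity (the declaring type)."""
    return DECLARING_TYPE[field]


def container_type_new(field, qualifier_type):
    """After: the container type is the type of x in the store x.f = y."""
    return qualifier_type


def flows_to_sink(qualifier_type, container_type):
    """Can a value stored into `Field` reach Sink(this.Field) in FieldC.M?

    The read happens on `this` of type FieldC, so the tracked container
    type must be compatible with FieldC for the flow to survive.
    """
    tracked = container_type("Field", qualifier_type)
    return compatible(tracked, "FieldC")


for name, qualifier in [("b", "FieldB"), ("c", "FieldC")]:
    print(name,
          "old:", flows_to_sink(qualifier, container_type_old),
          "new:", flows_to_sink(qualifier, container_type_new))
```

Under these assumptions, the old scheme tracks FieldA for both stores, which is compatible with FieldC, so both calls are reported. The new scheme tracks FieldB for the first store, which is unrelated to FieldC, so only the c.CallM() flow remains.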

While this change may not have a big impact on field flow, it is crucial in the implementation of (typed) collection flow: If we kept type information in the Content class:

newtype TContent = ... or TCollectionContent(DataFlowType elementType)

then a generic element read such as

T GetFirstElement<T>(IList<T> l) => l[0];

would have to match all TCollectionContent(t) entities, which does not scale.
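The scaling concern can be illustrated with a rough Python sketch (again a toy model, not the QL implementation). It compares how many content entities a generic read has to consider when the element type is embedded in the content versus when a single collection content is used and the type is recorded on the store. The `Store` tuple, the function names, and the list of element types are hypothetical.

```python
from typing import NamedTuple


class Store(NamedTuple):
    """A store step that records the stored type itself (hypothetical shape)."""
    content: tuple
    stored_type: str


def typed_contents(element_types):
    """Type embedded in the content: one TCollectionContent(t) per element type."""
    return {("TCollectionContent", t) for t in element_types}


def generic_read_candidates_typed(element_types):
    """For l[0] in GetFirstElement<T>, T is unknown, so every typed content matches."""
    return typed_contents(element_types)


def generic_read_candidates_untyped():
    """With types kept out of the content, the read matches a single entity."""
    return {("TCollectionContent",)}


# Hypothetical set of element types found in a database.
element_types = [f"T{i}" for i in range(1000)]

print(len(generic_read_candidates_typed(element_types)))  # grows with the number of types
print(len(generic_read_candidates_untyped()))             # stays constant

# The type information is not lost: it travels with the store instead.
store = Store(content=("TCollectionContent",), stored_type="string")
print(store.stored_type)
```

The point of the sketch is only the growth rate: the first count is linear in the number of element types for every generic read, while the second is constant.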

Performance

I have started Difference jobs for

I also ran @aschackmull's tuple counting benchmark on JDK, which revealed that most relations had similar sizes. The most notable change is flowCandFwdConsCand, which increased by 56%, but only from 10711 tuples to 16710 tuples.
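As a quick check of the quoted figure, the relative increase can be recomputed from the two tuple counts. The helper name `relative_increase` is just for illustration.

```python
def relative_increase(before, after):
    """Return the increase from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100


# Tuple counts for flowCandFwdConsCand quoted above.
growth = relative_increase(10711, 16710)
print(round(growth))  # rounds to 56
```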

Notes for review

Commit-by-commit review is encouraged. The actual work happens in f1cd535; the other commits deal with syncing, renaming, and follow-up changes.

jbj · 2020-05-15T13:58:33Z

I haven't looked at the code, but I note that the CPP-Differences job shows only one result change, which is caused by build-system wobble, and mostly unchanged performance. The slowdown on Wireshark may be significant, but it's not large enough to block this PR.

hvitved · 2020-05-18T06:34:19Z

> I haven't looked at the code, but I note that the CPP-Differences job shows only one result change, which is caused by build-system wobble, and mostly unchanged performance. The slowdown on Wireshark may be significant, but it's not large enough to block this PR.

👍 FYI, the C++ specific parts are in 2d7470f.

hvitved · 2020-06-03T08:29:56Z

Ping @aschackmull.

hvitved · 2020-06-15T13:39:34Z

Ping.

aschackmull · 2020-06-16T07:44:50Z

I'm looking at this, but it's slow going, sorry. Found a bug in flowCandFwdConsCand to possibly explain the tuple explosion though, but I'll investigate further before reviewing in full.

aschackmull · 2020-06-17T13:44:00Z

PR against this PR: hvitved#2

Dataflow: Record content types

hvitved · 2020-06-18T06:43:09Z

Updated Differences jobs:
C#: https://jenkins.internal.semmle.com/job/Changes/job/CSharp-Differences/222/
C++: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1200/
Java: https://jenkins.internal.semmle.com/job/Changes/job/Java-Differences/785/

C++ and Java have completed, and they still show no change in either performance or results, as expected.

hvitved · 2020-06-18T10:31:58Z

I had to restart the C# job (https://jenkins.internal.semmle.com/job/Changes/job/CSharp-Differences/224/), which also shows no change in performance or results.

hvitved force-pushed the dataflow/precise-field-types branch 5 times, most recently from fd10536 to 7b2796b Compare May 14, 2020 12:13

hvitved added Java C# C++ labels May 14, 2020

hvitved marked this pull request as ready for review May 14, 2020 12:48

hvitved requested review from a team as code owners May 14, 2020 12:48

hvitved assigned aschackmull May 14, 2020

hvitved added 6 commits May 14, 2020 15:58

Data flow: Track precise types during field flow

f1cd535

Data flow: Rename Content variables from f to c

a0d1004

Data flow: Sync files

aa83cc1

Java: Follow-up changes

e608c53

C++: Follow-up changes

2d7470f

C#: Add data-flow test

2c243ad

hvitved force-pushed the dataflow/precise-field-types branch from 7b2796b to 2c243ad Compare May 14, 2020 13:59

Merge remote-tracking branch 'upstream/master' into dataflow/precise-field-types

cd9538d

hvitved mentioned this pull request May 15, 2020

C#: Precise data-flow for collections #3366

Merged

hvitved mentioned this pull request Jun 16, 2020

Data flow: Use accessPathLimit() in partial flow as well #3494

Merged

aschackmull added 2 commits June 17, 2020 15:40

Dataflow: Record content type for stores.

10b64fc

Dataflow: Sync.

d28b5ac

aschackmull and others added 4 commits June 17, 2020 17:03

Dataflow: minor review fixes.

543ab71

Dataflow: autoformat

cedfaf6

Dataflow: Fix qltest.

74eab3c

Merge pull request #2 from aschackmull/dataflow/content-type-tracking

ad56f17

Dataflow: Record content types

aschackmull approved these changes Jun 19, 2020

aschackmull merged commit 8107fba into github:master Jun 19, 2020

hvitved deleted the dataflow/precise-field-types branch June 19, 2020 09:50

max-schaefer pushed a commit to max-schaefer/codeql-go that referenced this pull request Jun 22, 2020

Data flow: Track precise types during field flow

ff842ca

cf github/codeql#3456

max-schaefer pushed a commit to max-schaefer/codeql-go that referenced this pull request Jun 22, 2020

Data flow: Track precise types during field flow

d3e6e5c

cf github/codeql#3456

max-schaefer mentioned this pull request Jun 22, 2020

Data flow: Track precise types during field flow github/codeql-go#223

Merged
