Skip to content

Conversation

hvitved
Copy link
Contributor

@hvitved hvitved commented May 12, 2020

Overview

This PR factors type information out of the language-specific Content entities, and instead uses the actual stores to record type information. That is, when tracking flow through a field-write, x.f = y, we use the type of x during the subsequent flow calculations, rather than the result of getContainerType() on the content entity corresponding to f.

This means that we are now able to eliminate flow like in the example below:

class FieldA
{
    public object Field;

    public virtual void M() { }

    public void CallM() => this.M();

    static void M1(FieldB b, FieldC c)
    {
        b.Field = new object();
        b.CallM(); // no flow            <- NEW
        c.Field = new object();
        c.CallM(); // flow
    }
}

class FieldB : FieldA { }

class FieldC : FieldA
{
    public override void M() => Sink(this.Field);
}

While this change may not have a big impact on field flow, it is crucial in the implementation of (typed) collection flow: If we kept type information in the Content class:

newtype TContent = ... or TCollectionContent(DataFlowType elementType)

then a generic element read such as

T GetFirstElement<T>(IList<T> l) => l[0];

would have to match all TCollectionContent(t) entities, which does not scale.

Performance

I have started Difference jobs for

I also ran @aschackmull 's tuple counting benchmark on JDK, which revealed that most relations had similar sizes; most notably, flowCandFwdConsCand increased by 56%, but from a mere 10711 tuples to 16710 tuples.

Notes for review

Commit-by-commit review is encouraged. The actual work happens in f1cd535; the other PRs deal with syncing, renaming, and follow-up changes.

@hvitved hvitved force-pushed the dataflow/precise-field-types branch 5 times, most recently from fd10536 to 7b2796b Compare May 14, 2020 12:13
@hvitved hvitved marked this pull request as ready for review May 14, 2020 12:48
@hvitved hvitved requested review from a team as code owners May 14, 2020 12:48
@hvitved hvitved force-pushed the dataflow/precise-field-types branch from 7b2796b to 2c243ad Compare May 14, 2020 13:59
@jbj
Copy link
Contributor

jbj commented May 15, 2020

I haven't looked at the code, but I note that the CPP-Differences job shows only one result change, which is caused by build-system wobble, and mostly unchanged performance. The slowdown on Wireshark may be significant, but it's not large enough to block this PR.

@hvitved
Copy link
Contributor Author

hvitved commented May 18, 2020

I haven't looked at the code, but I note that the CPP-Differences job shows only one result change, which is caused by build-system wobble, and mostly unchanged performance. The slowdown on Wireshark may be significant, but it's not large enough to block this PR.

👍 FYI, the C++ specific parts are in 2d7470f.

@hvitved
Copy link
Contributor Author

hvitved commented Jun 3, 2020

Ping @aschackmull.

@hvitved
Copy link
Contributor Author

hvitved commented Jun 15, 2020

Ping.

@aschackmull
Copy link
Contributor

I'm looking at this, but it's slow going, sorry. Found a bug in flowCandFwdConsCand to possibly explain the tuple explosion though, but I'll investigate further before reviewing in full.

@aschackmull
Copy link
Contributor

PR against this PR: hvitved#2

@hvitved
Copy link
Contributor Author

hvitved commented Jun 18, 2020

Updated Differences jobs:
C#: https://jenkins.internal.semmle.com/job/Changes/job/CSharp-Differences/222/
C++: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1200/
Java: https://jenkins.internal.semmle.com/job/Changes/job/Java-Differences/785/

C++ and Java have completed, and they still show now change in neither performance nor results, as expected.

@hvitved
Copy link
Contributor Author

hvitved commented Jun 18, 2020

I had to restart the C# job (https://jenkins.internal.semmle.com/job/Changes/job/CSharp-Differences/224/), which also shows no change in performance nor results.

@aschackmull aschackmull merged commit 8107fba into github:master Jun 19, 2020
@hvitved hvitved deleted the dataflow/precise-field-types branch June 19, 2020 09:50
max-schaefer pushed a commit to max-schaefer/codeql-go that referenced this pull request Jun 22, 2020
max-schaefer pushed a commit to max-schaefer/codeql-go that referenced this pull request Jun 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants