
@dave-bartolomeo commented Mar 1, 2019

The diffs in SSAConstruction.qll are pretty extensive. While the overall approach is similar to what was there before, you're probably better off just reading the new implementation rather than worrying about line-by-line differences from the old one.

If you look at the diffs to the test expectations commit by commit, you'll see how the newly generated SSA differs from the previous version. Primarily, you'll see uses hooked up to exact definitions where possible, instead of always being hooked up to the result of a Chi instruction. You'll also see Phi instructions inserted for individual fields, rather than only for entire virtual variables.
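
As a rough C++ illustration of what that means (a hypothetical example, not taken from the tests):

    struct Point { int x; int y; };

    int f(bool b) {
      Point p;
      p.x = 1;        // definition of the x field only
      p.y = 2;        // definition of the y field only
      if (b) {
        p.x = 3;      // another definition of x; y is untouched
      }
      // With the new SSA, the load of p.y below can be connected directly to the
      // store `p.y = 2` (an exact definition), rather than to a Chi that merges
      // all writes to `p`. The load of p.x is fed by a Phi over the two stores
      // to x, inserted for that field rather than for the whole of `p`.
      return p.x + p.y;
    }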

Most of the changes are described in the markdown files that I've added alongside the code.

@dave-bartolomeo dave-bartolomeo added this to the 1.20 milestone Mar 1, 2019
@dave-bartolomeo dave-bartolomeo requested review from jbj and rdmarsh2 March 1, 2019 01:05
@jbj commented Mar 1, 2019

I looked at the changes to Operand and test cases, and they LGTM. Some of the changes don't seem to show up in test results, so maybe we need more tests. The new def-use implementation is going to be hard to review.

@aibaars aibaars changed the base branch from master to rc/1.20 March 1, 2019 17:54
@dave-bartolomeo

I've pushed new changes, mostly to SSAConstruction.qll. SSA construction is now grouped into separate modules for Phi Insertion and Use/Def connection, which will hopefully make it easier to review. Each module has an overview comment that outlines the general approach used. Also, I believe that all predicates in SSA construction now have QLDoc.

@jbj left a comment

Here are a few more superficial comments. Let's arrange to review this with screen sharing. I'm also interested in performance.

  IntValue endBitOffset) {
    resultPointsTo(instr.getResultAddressOperand().getDefinitionInstruction(), var, startBitOffset) and
    type = instr.getResultType() and
    if exists(instr.getResultSize())
    then endBitOffset = Ints::add(startBitOffset, Ints::mul(instr.getResultSize(), 8))
Contributor:

Why Ints::mul and not just *? Also below.

Contributor Author:

I think these should probably remain. The startBitOffset bound by resultPointsTo() could be the result of arbitrary pointer arithmetic. Overflow is unlikely, but possible.

Contributor:

I agree that Ints::add should remain, but my question was about Ints::mul.

Contributor Author:

Overflow in the mul calculation is even less likely, but still theoretically possible if the result of the instruction is a very large array type. Is there likely to be a significant performance cost from this use of mul? If so, it's probably OK to remove it, but if there's not likely to be much performance difference, I'd prefer to keep it for (pedantic) correctness.

Contributor:

If hasResultMemoryAccess is fast, then it should be fine.
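
(For concreteness, the overflow being guarded against is of this general shape. This is a hypothetical C/C++ sketch of the arithmetic, using the GCC/Clang checked-arithmetic builtins rather than the QL code, and the 32-bit limit is an assumption about how IntValue behaves.)

    #include <stdio.h>

    // Checked multiply/add in the spirit of Ints::mul and Ints::add: if the
    // 32-bit result would overflow, report failure instead of wrapping.
    static int checkedMul(int a, int b, int *out) {
      return !__builtin_mul_overflow(a, b, out);
    }
    static int checkedAdd(int a, int b, int *out) {
      return !__builtin_add_overflow(a, b, out);
    }

    int main(void) {
      // A ~300 MB object: its size in bytes fits in 32 bits, but its size in
      // bits (size * 8) does not, so a plain `*` would overflow.
      int sizeBytes = 300 * 1000 * 1000;
      int startBitOffset = 64;  // e.g. the result of earlier pointer arithmetic
      int sizeBits, endBitOffset;
      if (checkedMul(sizeBytes, 8, &sizeBits) &&
          checkedAdd(startBitOffset, sizeBits, &endBitOffset)) {
        printf("endBitOffset = %d\n", endBitOffset);
      } else {
        printf("offset is unknown (overflow)\n");
      }
      return 0;
    }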

@dave-bartolomeo

I've added a .md file that gives an overview of the whole SSA construction process, from alias analysis to the memory model to SSA itself. That doc is missing the details on the actual connection of defs to uses, but I wanted to get the other parts of it into the PR to give the necessary background to reviewers.

    definitionReachesRank(vvar, defBlock, defRank, useRank)

  /**
   * Holds if the specified `useLocation` is live on entry to `block`. This holds if there is a use of `useLocation`
   * that is reachable from the start of `block` without passing through a definition that overlaps `useLocation`.
@rdmarsh2 commented Mar 8, 2019

Should this be "without passing through a definition that totally overlaps useLocation"? If not, can you add a clarification about how this interacts with Chi nodes?

Contributor Author:

Clarification added
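
As a rough C++ illustration of the intended liveness semantics (a hypothetical fragment; the subtlety around Chi nodes is what the added clarification covers):

    struct S { int x; int y; };
    void use(int);

    void g(S s, bool b) {
      // On entry here, the location s.x is live: the use of s.x at the bottom is
      // reachable (along the !b path) without passing through a definition that
      // totally overlaps s.x.
      s.y = 1;       // does not overlap s.x; liveness of s.x is unaffected
      if (b) {
        s = S{};     // totally overlaps s.x; kills its liveness on this path only
      }
      use(s.x);
    }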

@dave-bartolomeo

Initial perf info on ChakraCore: running constant_func.ql, which consumes aliased SSA, the total slowdown was <3%, which is much better than I expected. However, the change doesn't seem to be improving the precision of the use/def connections as much as I expected. I'm investigating that now.

Wireshark numbers to come.

@jbj commented Mar 12, 2019

We decided yesterday that this is not ready for 1.20, so I'll remove that milestone. We can retarget the PR when there's been a mergeback.

@jbj jbj removed this from the 1.20 milestone Mar 12, 2019
@jbj commented Mar 12, 2019

The mergeback #1082 just went in, so I'll try to retarget this PR to master.

@jbj jbj changed the base branch from rc/1.20 to master March 12, 2019 14:01
@dave-bartolomeo dave-bartolomeo force-pushed the dave/PreciseDefs branch 2 times, most recently from 794e1f6 to 7879a53 Compare March 13, 2019 21:27
@dave-bartolomeo dave-bartolomeo force-pushed the dave/PreciseDefs branch 2 times, most recently from 6618503 to 80ad11b Compare April 9, 2019 18:06
@dave-bartolomeo dave-bartolomeo marked this pull request as ready for review April 9, 2019 18:15
@dave-bartolomeo dave-bartolomeo requested a review from a team as a code owner April 9, 2019 18:15
@dave-bartolomeo dave-bartolomeo requested review from jbj and rdmarsh2 April 9, 2019 18:15
@jbj commented Apr 9, 2019

Can you summarise what has changed since I last looked at this?

@dave-bartolomeo

There have been few real changes. Mostly, it's a rebase, plus updated test expectations for IR construction improvements that have been merged since the previous review. The dataflow test expectation updates do show a couple of concrete examples where we get better results than the AST-based dataflow because we can model field accesses more precisely.
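
The improvement is of roughly this shape (a hypothetical example; source, sink, and the field names are made up):

    struct Config { int untrusted; int trusted; };

    int source();     // returns attacker-controlled data (hypothetical)
    void sink(int);   // must not receive attacker-controlled data (hypothetical)

    void h() {
      Config c;
      c.untrusted = source();  // tainted value flows into one field
      c.trusted = 42;          // the other field holds a constant
      // Dataflow that tracks `c` as a whole may report a flow here; with
      // field-precise SSA, the load of c.trusted connects to the constant
      // store, so no flow from source() to sink() is reported.
      sink(c.trusted);
    }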

@dave-bartolomeo

OK, now there has been one significant change: in an IR dump, an operand that is not an exact use of its definition instruction (e.g. a load of a field defined by a store of the entire struct) now has a "~" prefix. I had implemented this earlier while working on this PR, but only enabled it just now so that it wouldn't pollute the IR diffs caused by the real SSA changes.

@jbj commented Apr 10, 2019

There are still two unaddressed comments from earlier reviews by @rdmarsh2 and me. Also, you promised performance testing on Wireshark. I think this kind of change ought to be benchmarked on a handful of big snapshots since it may be sensitive to specific patterns that confuse the alias analysis in very large functions.

@dave-bartolomeo

Latest run on Wireshark shows an 8% overall slowdown to evaluate all the way to aliased SSA. This is more than the 2.5% slowdown in ChakraCore, but still consistent with Wireshark's reputation as a stress test for this sort of change.

@dave-bartolomeo

@rdmarsh2 I believe this is ready for final approval and merge, if the 8% slowdown on Wireshark seems reasonable.

@rdmarsh2

Have you looked at the change in the most expensive predicates with and without this PR? I think we can merge with 8%, but we shouldn't introduce slowdowns that would be low-hanging fruit for later performance improvements.

@jbj commented May 2, 2019

I think I remember Dave saying that no single predicate stands out as being slow(er). Is that right?

@dave-bartolomeo

There are a couple of SSA predicates that show up near the top on Wireshark. SSAConstruction::DefUse::definitionReachesRank#ffff in particular takes ~3x as long as the next predicate. However, that same predicate isn't even the top predicate on ChakraCore, where the timing distribution across the top 10 or so predicates is much flatter. Based on our discussion at the most recent stream meeting, I think this PR is ready to go, and we'll do a more in-depth performance investigation on Wireshark once we've fixed or worked around enough of the invalid AST/IR problems to believe that our perf numbers are representative.

@jbj jbj merged commit 82a6629 into github:master May 3, 2019
@jbj commented May 8, 2019

@dave-bartolomeo I've investigated the slowness in definitionReachesRank and filed https://jira.semmle.com/browse/CPP-390.
