Skip to content

Java: Implement union type flow and replace ad-hoc variable tracking in dispatch #10334

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Sep 14, 2022

Conversation

aschackmull
Copy link
Contributor

@aschackmull aschackmull commented Sep 7, 2022

This provides a proper replacement for the ad-hoc pre-SSA variable tracking that dispatch used to rely on.

Union types arising from disjunctive instanceof checks are left as future work also supported.

@aschackmull aschackmull requested a review from a team as a code owner September 7, 2022 13:11
@github-actions github-actions bot added the Java label Sep 7, 2022
@aschackmull aschackmull added the no-change-note-required This PR does not need a change note label Sep 13, 2022
Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome! I have some questions.

/**
* Holds if `n` has type `t` and this information is discarded, such that `t`
* might be a better type bound for nodes where `n` flows to. This only includes
* the best such bound for each node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment suggests that there is at most one best bound; could there be multiple (two types where neither is a sub type of the other)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to have multiple best bounds, yes. But it is fairly rare, to the point where we mostly ignore the possibility.

te = t.getErasure() and
not exists(RefType better |
typeFlowBaseCand(n, better) and
better != t and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this implied from the two disjuncts after the |?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trickiness is that there can be loops in the subtyping relation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱

}

/**
* Holds if `ioe` checks `v`, its true-successor is `bb`, and `bb` has 2 or more
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps "2 or more" -> "multiple"

*/
private predicate instanceofDisjunct(InstanceOfExpr ioe, BasicBlock bb, BaseSsaVariable v) {
ioe.getExpr() = v.getAUse() and
strictcount(bb.getABBPredecessor()) > 1 and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget, is something like not exists(unique(...)) more efficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the additional negation, I'd guess not. And strictcount is certainly efficient enough that I think this sort of tweak would lie firmly in the realm of micro-optimisations that's best handled by the optimiser.

exists(ConditionBlock cb | cb.getCondition() = ioe and cb.getTestSuccessor(true) = bb)
}

/** Holds if `bb` is disjunctively guarded by two or more `instanceof` tests on `v`. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, multiple

| UnionTypes.java:11:5:11:5 | m | 2 | ConcurrentHashMap<String,String> | false |
| UnionTypes.java:11:5:11:5 | m | 2 | LinkedHashMap<String,String> | false |
| UnionTypes.java:26:10:26:10 | x | 2 | A2 | true |
| UnionTypes.java:26:10:26:10 | x | 2 | A3 | false |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because A3 has no subtypes - in those cases we just report upper bounds and don't bother with the lower bounds.

exprTypeFlow(arg, srctype, exact)
or
not exprTypeFlow(arg, _, _) and
exprUnionTypeFlow(arg, srctype, exact)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice with a data flow test that exhibits this; i.e., a call context that provides a disjunctive type bound, which is better than the disjunctive type bound inside the callee.

or
exists(TypeFlowNode mid | step(mid, n) and hasUnionTypeFlow(mid))
or
instanceofDisjunctionGuarded(n, _)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this rather be part of unionTypeFlowBaseCand?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because unionTypeFlowBaseCand consists of nodes that have a single bound that may downstream be combined to disjunctive bounds, whereas instanceofDisjunctionGuarded is already a disjunctive bound.

unionTypeFlow(n, weaker, false) and
t.getASupertype*() = weaker
|
exact = true or not weaker.getASupertype*() = t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand why this restriction is there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're trying to determine whether we can reduce A | B | ... to just B | .... We can do this either if B is a strictly weaker type than A or if A and B are similar types only distinguished by A being exact and B being an upper bound. Note that we have to be careful not to drop both A and B at the same time if they are equivalent types (in that case we could drop one of them, but that requires an arbitrary choice, which is tricky, so we don't bother with that case).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I guess this is again because sub typing is not acyclic.

@aschackmull
Copy link
Contributor Author

Looks awesome! I have some questions.

I think I've addressed everything.

@aschackmull aschackmull merged commit d713910 into github:main Sep 14, 2022
@aschackmull aschackmull deleted the java/uniontypeflow branch September 14, 2022 11:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants