Java: Implement union type flow and replace ad-hoc variable tracking in dispatch #10334
73363c1 to b8a1818
Looks awesome! I have some questions.
/**
 * Holds if `n` has type `t` and this information is discarded, such that `t`
 * might be a better type bound for nodes where `n` flows to. This only includes
 * the best such bound for each node.
This comment suggests that there is at most one best bound; could there be multiple (two types where neither is a subtype of the other)?
It is possible to have multiple best bounds, yes. But it is fairly rare, to the point where we mostly ignore the possibility.
te = t.getErasure() and
not exists(RefType better |
  typeFlowBaseCand(n, better) and
  better != t and
Isn't this implied by the two disjuncts after the `|`?
The trickiness is that there can be loops in the subtyping relation.
😱
}

/**
 * Holds if `ioe` checks `v`, its true-successor is `bb`, and `bb` has 2 or more
Perhaps "2 or more" -> "multiple"
 */
private predicate instanceofDisjunct(InstanceOfExpr ioe, BasicBlock bb, BaseSsaVariable v) {
  ioe.getExpr() = v.getAUse() and
  strictcount(bb.getABBPredecessor()) > 1 and
I forget, is something like `not exists(unique(...))` more efficient?
With the additional negation, I'd guess not. And `strictcount` is certainly efficient enough that I think this sort of tweak lies firmly in the realm of micro-optimisations that are best handled by the optimiser.
  exists(ConditionBlock cb | cb.getCondition() = ioe and cb.getTestSuccessor(true) = bb)
}

/** Holds if `bb` is disjunctively guarded by two or more `instanceof` tests on `v`. */
Again, multiple
| UnionTypes.java:11:5:11:5 | m | 2 | ConcurrentHashMap<String,String> | false |
| UnionTypes.java:11:5:11:5 | m | 2 | LinkedHashMap<String,String> | false |
| UnionTypes.java:26:10:26:10 | x | 2 | A2 | true |
| UnionTypes.java:26:10:26:10 | x | 2 | A3 | false |
Why is this not `true`?
Because `A3` has no subtypes; in those cases we just report upper bounds and don't bother with the lower bounds.
exprTypeFlow(arg, srctype, exact)
or
not exprTypeFlow(arg, _, _) and
exprUnionTypeFlow(arg, srctype, exact)
It would be nice to have a data flow test that exhibits this; i.e., a call context that provides a disjunctive type bound that is better than the disjunctive type bound inside the callee.
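A rough Java sketch of the shape such a test might take. All class and method names here are hypothetical and not taken from the PR's test files, and whether the analysis derives exactly these bounds for this code is an assumption that would need to be confirmed by running it.

```java
// Hypothetical test shape: the callee sees three possible argument types over
// all call sites, while one particular call site only ever supplies two.
class CallContextUnionSketch {
    static class Base { String id() { return "Base"; } }
    static class A1 extends Base { @Override String id() { return "A1"; } }
    static class A2 extends Base { @Override String id() { return "A2"; } }
    static class A3 extends Base { @Override String id() { return "A3"; } }

    // Looking only at the callee, `x` may be an A1, A2 or A3 (the union over
    // all call sites), so `x.id()` has three candidate dispatch targets.
    private static String callee(Base x) {
        return x.id();
    }

    // This call site only passes an A1 or an A2, so in this call context the
    // disjunctive bound A1 | A2 would be better than the callee's own bound.
    static String narrowCaller(boolean flag) {
        Base x = flag ? new A1() : new A2();
        return callee(x);
    }

    // A second call site that widens the callee-local bound to include A3.
    static String otherCaller() {
        return callee(new A3());
    }
}
```

In a test along these lines, the interesting check would be that dispatch for `x.id()` reached via `narrowCaller` excludes `A3.id`.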
or
exists(TypeFlowNode mid | step(mid, n) and hasUnionTypeFlow(mid))
or
instanceofDisjunctionGuarded(n, _)
Should this rather be part of `unionTypeFlowBaseCand`?
No, because `unionTypeFlowBaseCand` consists of nodes that have a single bound that may be combined downstream into disjunctive bounds, whereas `instanceofDisjunctionGuarded` is already a disjunctive bound.
unionTypeFlow(n, weaker, false) and
t.getASupertype*() = weaker
|
exact = true or not weaker.getASupertype*() = t
I don't quite understand why this restriction is there.
We're trying to determine whether we can reduce `A | B | ...` to just `B | ...`. We can do this either if `B` is a strictly weaker type than `A` or if `A` and `B` are similar types only distinguished by `A` being exact and `B` being an upper bound. Note that we have to be careful not to drop both `A` and `B` at the same time if they are equivalent types (in that case we could drop one of them, but that requires an arbitrary choice, which is tricky, so we don't bother with that case).
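For illustration, a minimal Java model of this reduction rule. It is not the CodeQL implementation: `Bound`, `isRedundant` and `prune` are hypothetical names, subtyping is approximated with `Class.isAssignableFrom`, and it assumes the quoted condition sits inside a negated `exists`. Java's own `Class` hierarchy has no subtyping loops, so the equivalent-types case can only show up here as two identical entries.

```java
import java.util.ArrayList;
import java.util.List;

class UnionBoundPruning {
    /** One disjunct of a union bound: a type plus whether the bound is exact. */
    static final class Bound {
        final Class<?> type;
        final boolean exact;

        Bound(Class<?> type, boolean exact) {
            this.type = type;
            this.exact = exact;
        }

        @Override
        public String toString() {
            return type.getSimpleName() + (exact ? " (exact)" : " (upper bound)");
        }
    }

    /** Holds if some non-exact disjunct `weaker` already covers `b`. */
    static boolean isRedundant(Bound b, List<Bound> all) {
        for (Bound weaker : all) {
            if (weaker.exact) continue; // only upper bounds can subsume others
            boolean bBelowWeaker = weaker.type.isAssignableFrom(b.type);
            boolean weakerBelowB = b.type.isAssignableFrom(weaker.type);
            // Drop `b` if it is exact, or if `weaker` is strictly weaker.
            // Equivalent non-exact bounds never drop each other.
            if (bBelowWeaker && (b.exact || !weakerBelowB)) return true;
        }
        return false;
    }

    /** Returns the disjuncts that are not covered by another disjunct. */
    static List<Bound> prune(List<Bound> all) {
        List<Bound> kept = new ArrayList<>();
        for (Bound b : all) {
            if (!isRedundant(b, all)) kept.add(b);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Bound> bounds = List.of(
            new Bound(Integer.class, true),
            new Bound(Number.class, false),
            new Bound(String.class, false));
        // prints [Number (upper bound), String (upper bound)]
        System.out.println(prune(bounds));
    }
}
```

In this model, two equivalent upper bounds both survive, which mirrors the choice described above of not picking one arbitrarily.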
Ok, I guess this is again because subtyping is not acyclic.
I think I've addressed everything.
This provides a proper replacement for the ad-hoc pre-SSA variable tracking that dispatch used to rely on.
Union types arising from disjunctive instanceof checks are also supported.
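As an illustration of a disjunctive instanceof check of the kind described here, a small Java sketch. The class and method names are hypothetical and not taken from the PR's `UnionTypes.java`; the types mirror the ones that appear in the expected test output.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.LinkedHashMap;

class DisjunctiveGuardSketch {
    static String describe(Map<String, String> m) {
        if (m instanceof ConcurrentHashMap || m instanceof LinkedHashMap) {
            // This block is reached only via one of the two instanceof tests,
            // so `m` has the union bound ConcurrentHashMap | LinkedHashMap here
            // and calls on `m` can only dispatch to those types or subtypes.
            return m.getClass().getSimpleName();
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(describe(new LinkedHashMap<>())); // prints LinkedHashMap
        System.out.println(describe(new java.util.TreeMap<>())); // prints other
    }
}
```

The guarded block has two predecessors in the control-flow graph, one per disjunct, which is the situation the `strictcount(bb.getABBPredecessor()) > 1` condition in `instanceofDisjunct` looks for.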