-
Notifications
You must be signed in to change notification settings - Fork 251
[compiler] rewrite ExtractIntervalFilters to be more robust #13355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
def meet(l: Value, r: Value): Value | ||
} | ||
|
||
object AbstactLattice { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing an r
: AbstractLattice
# Conflicts: # hail/python/test/hail/extract_intervals/test_key_prefix.py # hail/src/main/scala/is/hail/expr/ir/ExtractIntervalFilters.scala # hail/testng.xml
ad01276
to
4248416
Compare
# Conflicts: # hail/src/main/scala/is/hail/expr/ir/lowering/LowerDistributedSort.scala # hail/src/test/scala/is/hail/expr/ir/BlockMatrixIRSuite.scala # hail/src/test/scala/is/hail/expr/ir/ExtractIntervalFiltersSuite.scala # hail/src/test/scala/is/hail/expr/ir/IRSuite.scala
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really cool and like this change a lot. Sorry it's taken a while to post a review.
Most of my comments are on implementation details. I have no real gripes with the design.
hail/src/main/scala/is/hail/expr/ir/ExtractIntervalFilters.scala
Outdated
Show resolved
Hide resolved
hail/src/main/scala/is/hail/expr/ir/ExtractIntervalFilters.scala
Outdated
Show resolved
Hide resolved
hail/src/main/scala/is/hail/expr/ir/ExtractIntervalFilters.scala
Outdated
Show resolved
Hide resolved
hail/src/main/scala/is/hail/expr/ir/ExtractIntervalFilters.scala
Outdated
Show resolved
Hide resolved
hail/src/main/scala/is/hail/expr/ir/ExtractIntervalFilters.scala
Outdated
Show resolved
Hide resolved
hail/src/main/scala/is/hail/expr/ir/ExtractIntervalFilters.scala
Outdated
Show resolved
Hide resolved
hail/src/main/scala/is/hail/expr/ir/ExtractIntervalFilters.scala
Outdated
Show resolved
Hide resolved
hail/src/main/scala/is/hail/expr/ir/ExtractIntervalFilters.scala
Outdated
Show resolved
Hide resolved
hail/src/main/scala/is/hail/expr/ir/ExtractIntervalFilters.scala
Outdated
Show resolved
Hide resolved
eba3393
to
de36c88
Compare
KeySetLattice.meet(x.naBound, acc) | ||
} | ||
BoolValue(trueBound, falseBound, naBound) | ||
aVals.asInstanceOf[Seq[BoolValue]].reduce(BoolValue.coalesce) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
beautiful
CHANGELOG: make hail's optimization rewriting filters to interval-filters smarter and more robust
Completely rewrites ExtractIntervalFilters. Instead of matching against very specific patterns, and failing completely for things that don't quite match (e.g. an input is let bound, or the fold implementing "locus is contained in a set of intervals" is written slightly differently), this uses a standard abstract interpretation framework, which is almost completely insensitive to the form of the IR, only depending on the semantics. It also correctly handles missing key fields, where the previous implementation often produced an unsound transformation of the IR.
Also adds a much more thorough test suite than we had before.
At the top level, the analysis takes a boolean typed IR
cond
in an environment where there is a reference to somekey
, and produces a setintervals
, such thatcond
is equivalent tocond & intervals.contains(key)
(in other wordscond
impliesintervals.contains(key)
, orintervals
contains all rows wherecond
is true). This means for instance it is safe to replaceTableFilter(t, cond)
withTableFilter(TableFilterIntervals(t, intervals), cond)
.Then in a second pass it rewrites
cond
tocond2
, such thatcond & (intervals.contains(key))
is equivalent tocond2 & intervals.contains(key)
(in other wordscond
impliescond2
, andcond2 & intervals.contains(key)
impliescond
). This means it is safe to replace theTableFilter(t, cond)
withTableFilter(TableFilterIntervals(t, intervals), cond2)
. A common example is whencond
can be completely captured by the interval filter, i.e.cond
is equivant tointervals.contains(key)
, in which case we can takecond2 = True
, and theTableFilter
can be optimized away.This all happens in the function
trueSet
is the set of intervals which contains all rows wherecond
is true. This set is passed back intoanalyze
in a second pass, which asks it to rewritecond
to something equivalent, under the assumption that all keys are contained intrueSet
.The abstraction of runtime values tracks two types of information:
k
, thenk
must be contained in the "true" set of intervals. But it's completely fine if the set of intervals contains keys of rows where the bool is not true. In particular, a boolean about which we know nothing (e.g. it's just some non-key boolean field in the dataset) is represented by an abstract boolean value where all three sets are the set of all keys.