-
Notifications
You must be signed in to change notification settings - Fork 250
[hail] Add ArrayZip IR node. #7830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Needs a bit more infrastructure before this can help deforest several more lowered IRs. timings: ``` $ hail-bench compare /tmp/pre.json /tmp/post.json Name Ratio Time 1 Time 2 ---- ----- ------ ------ matrix_table_array_arithmetic 81.2% 9.967 8.095 matrix_table_nested_annotate_rows_annotate_entries 23.6% 31.287 7.369 ```
So cool |
hail/python/hail/ir/ir.py
Outdated
return ArrayZip(children[:-1], self.names, children[-1], self.behavior) | ||
|
||
def head_str(self): | ||
return escape_id(self.behavior) + ' ({})'.format(' '.join(map(escape_id, self.names))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably wouldn't make a noticeable difference, but I think f-strings are significantly faster than any alternative like format
or +
, so as a matter of style we should maybe get in the habit of doing string building using f-strings wherever possible? (So in this case f'{ escape_id(self.behavior) } ({ ' '.join(map(escape_id, self.names)) })
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a big fan of f-strings, just not sure what to think about having 2 different styles in the same file. I'll change it.
case Yield(elt, s) => | ||
ab += ((elt, s)) | ||
loop(i + 1, ab) | ||
case EOS => k(EOS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pull k(EOS)
out into a join point to avoid code duplication.
val eltm = fb.newField[Boolean](name + "_missing") | ||
val eltv = fb.newField(name)(ti) | ||
(t, ti, eltm, eltv) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be clearer with the definition of mv
moved outside the EmitStream.zip
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got rid of mv
per your other comment.
EmitStream.zip[Any](streams, behavior, { (xs, k) => | ||
val mv = names.zip(childEltTypes).map { case (name, t) => | ||
val ti = typeToTypeInfo(t) | ||
val eltm = fb.newField[Boolean](name + "_missing") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me that we need the eltm
and eltv
fields. Can we just pass the missingness and values from the emit triplets generated by the stream to the body env?
GetField(Ref("global", gt), colsFieldName)), | ||
FastIndexedSeq("g", "sa"), | ||
Subst(lower(newEntries, ab), BindingEnv(Env( | ||
"global" -> SelectFields(Ref("global", gt), child.typ.globalType.fieldNames), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reasoning for doing a substitution (requiring a complete traversal of the subtree) rather than just a let?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used to do this, but it caused problems, possibly due to some weird interaction between shadowing and optimization. I'd have to dig in to remember exactly why.
@@ -141,6 +141,8 @@ object Simplify { | |||
|
|||
case x@ArrayMap(NA(_), _, _) => NA(x.typ) | |||
|
|||
case ArrayZip(as, names, body, _) if as.length == 1 => ArrayMap(as.head, names.head, body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might also do
case ArrayMap(ArrayZip(as, names, body1, b), name, body2) =>
ArrayZip(as, names, Let(name, body1, body2), b)
to allow optimizations across body1
and body2
.
If one of the as
is an ArrayZip
, we can fuse them, but I'm not sure how to write that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call. Fusing zips doesn't seem hard but will leave that for another PR.
@@ -251,6 +251,9 @@ object TypeCheck { | |||
case x@ArrayMap(a, name, body) => | |||
assert(a.typ.isInstanceOf[TStreamable]) | |||
assert(x.elementTyp == body.typ) | |||
case x@ArrayZip(as, names, body, _) => | |||
assert(as.length == names.length) | |||
assert(as.forall(_.typ.isInstanceOf[TStreamable])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this check x.elementTyp == body.typ
like ArrayMap
does? Or should TypeCheck
not overlap with InferType
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, good thing to check here I guess. It's a bit unclear what's the right code for typecheck vs infer.
I got a bad local variable compilation error without the fields for the bindings. I suspect this is a locals/fields problem elsewhere but don't want to debug right now, so reverted. |
Needs a bit more infrastructure before this can help deforest several
more lowered IRs.
timings: