Skip to content

[hail] Add ArrayZip IR node. #7830

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 15, 2020
Merged

[hail] Add ArrayZip IR node. #7830

merged 6 commits into from
Jan 15, 2020

Conversation

tpoterba
Copy link
Contributor

@tpoterba tpoterba commented Jan 9, 2020

Needs a bit more infrastructure before this can help deforest several
more lowered IRs.

timings:

$ hail-bench compare /tmp/pre.json /tmp/post.json
                                              Name      Ratio    Time 1    Time 2
                                              ----      -----    ------    ------
                     matrix_table_array_arithmetic      81.2%     9.967     8.095
matrix_table_nested_annotate_rows_annotate_entries      23.6%    31.287     7.369

Needs a bit more infrastructure before this can help deforest several
more lowered IRs.

timings:
```
$ hail-bench compare /tmp/pre.json /tmp/post.json
                                              Name      Ratio    Time 1    Time 2
                                              ----      -----    ------    ------
                     matrix_table_array_arithmetic      81.2%     9.967     8.095
matrix_table_nested_annotate_rows_annotate_entries      23.6%    31.287     7.369
```
@akotlar
Copy link
Contributor

akotlar commented Jan 9, 2020

So cool

return ArrayZip(children[:-1], self.names, children[-1], self.behavior)

def head_str(self):
return escape_id(self.behavior) + ' ({})'.format(' '.join(map(escape_id, self.names)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably wouldn't make a noticeable difference, but I think f-strings are significantly faster than any alternative like format or +, so as a matter of style we should maybe get in the habit of doing string building using f-strings wherever possible? (So in this case f'{ escape_id(self.behavior) } ({ ' '.join(map(escape_id, self.names)) }))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a big fan of f-strings, just not sure what to think about having 2 different styles in the same file. I'll change it.

case Yield(elt, s) =>
ab += ((elt, s))
loop(i + 1, ab)
case EOS => k(EOS)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pull k(EOS) out into a join point to avoid code duplication.

val eltm = fb.newField[Boolean](name + "_missing")
val eltv = fb.newField(name)(ti)
(t, ti, eltm, eltv)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be clearer with the definition of mv moved outside the EmitStream.zip.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got rid of mv per your other comment.

EmitStream.zip[Any](streams, behavior, { (xs, k) =>
val mv = names.zip(childEltTypes).map { case (name, t) =>
val ti = typeToTypeInfo(t)
val eltm = fb.newField[Boolean](name + "_missing")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me that we need the eltm and eltv fields. Can we just pass the missingness and values from the emit triplets generated by the stream to the body env?

GetField(Ref("global", gt), colsFieldName)),
FastIndexedSeq("g", "sa"),
Subst(lower(newEntries, ab), BindingEnv(Env(
"global" -> SelectFields(Ref("global", gt), child.typ.globalType.fieldNames),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reasoning for doing a substitution (requiring a complete traversal of the subtree) rather than just a let?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to do this, but it caused problems, possibly due to some weird interaction between shadowing and optimization. I'd have to dig in to remember exactly why.

@@ -141,6 +141,8 @@ object Simplify {

case x@ArrayMap(NA(_), _, _) => NA(x.typ)

case ArrayZip(as, names, body, _) if as.length == 1 => ArrayMap(as.head, names.head, body)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might also do

case ArrayMap(ArrayZip(as, names, body1, b), name, body2) =>
  ArrayZip(as, names, Let(name, body1, body2), b)

to allow optimizations across body1 and body2.

If one of the as is an ArrayZip, we can fuse them, but I'm not sure how to write that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call. Fusing zips doesn't seem hard but will leave that for another PR.

@@ -251,6 +251,9 @@ object TypeCheck {
case x@ArrayMap(a, name, body) =>
assert(a.typ.isInstanceOf[TStreamable])
assert(x.elementTyp == body.typ)
case x@ArrayZip(as, names, body, _) =>
assert(as.length == names.length)
assert(as.forall(_.typ.isInstanceOf[TStreamable]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check x.elementTyp == body.typ like ArrayMap does? Or should TypeCheck not overlap with InferType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, good thing to check here I guess. It's a bit unclear what's the right code for typecheck vs infer.

@tpoterba
Copy link
Contributor Author

I got a bad local variable compilation error without the fields for the bindings. I suspect this is a locals/fields problem elsewhere but don't want to debug right now, so reverted.

@danking danking merged commit 1ee3209 into hail-is:master Jan 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants