Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hail] optimize flatten #7719

Merged
merged 2 commits into from Dec 13, 2019
Merged

[hail] optimize flatten #7719

merged 2 commits into from Dec 13, 2019

Conversation

@catoverdrive
Copy link
Collaborator

catoverdrive commented Dec 12, 2019

Looking at the IR generated by table.flatten, this snippet:

>>> import hail as hl
>>> t = hl.utils.range_table(10)
>>> t2 = t.annotate(**{f'f{i}': i for i in range(5)})
>>> t2.flatten().collect()

generates the following IR:

(GetField rows
  (TableCollect
    (TableMapRows
      (TableOrderBy (Aidx)
        (TableMapRows
          (TableRange 10 8)
          (InsertFields
            (SelectFields (idx)
              (Ref row))
            None
            (f0
              (I32 0))
            (f1
              (I32 1))
            (f2
              (I32 2))
            (f3
              (I32 3))
            (f4
              (I32 4)))))
      (Let __uid_3
        (Ref row)
        (InsertFields
          (SelectFields ()
            (SelectFields (idx f0 f1 f2 f3 f4)
              (Ref row)))
          None
          (idx
            (GetField idx
              (Ref __uid_3)))
          (f0
            (GetField f0
              (Ref __uid_3)))
          (f1
            (GetField f1
              (Ref __uid_3)))
          (f2
            (GetField f2
              (Ref __uid_3)))
          (f3
            (GetField f3
              (Ref __uid_3)))
          (f4
            (GetField f4
              (Ref __uid_3))))))))

If we look at the last TableMapRows IR, the entire thing (Let __uid_3 …) is entirely a no-op, but we're still compiling and generating code for the (post-optimization) IR:

(InsertFields
  (SelectFields ()
    (Ref row))
  None
  (idx
    (GetField idx
      (Ref row)))
  (f0
    (GetField f0
      (Ref row)))
  (f1
    (GetField f1
      (Ref row)))
  (f2
    (GetField f2
      (Ref row)))
  (f3
    (GetField f3
      (Ref row)))
  (f4
    (GetField f4
      (Ref row))))

(cc @tpoterba I added a second ForwardLets in Optimize before the Simplify, although I'm not sure that's actually the correct place to put it; in this case, I think it may eventually come out in the wash given how many passes we make through any given pipeline, but I've noticed that currently our python tends to generate IR of the form:

(TableMapRows
  (Let __uid_n
    (Ref row)
    <mapped value, sometimes using (Ref __uid_n) and sometimes (Ref row)>

and that redundant binding at the top level means that the first Simplify pass misses quite a few optimizations! I'm not super attached to leaving it there, but I do think we might want to consider forwarding Lets on any IRs from python before optimization.)

@@ -22,6 +22,7 @@ object Optimize {
last = ir
runOpt(FoldConstants(_), iter, "FoldConstants")
runOpt(ExtractIntervalFilters(_), iter, "ExtractIntervalFilters")
runOpt(ForwardLets(_), iter, "ForwardLets")

This comment has been minimized.

Copy link
@tpoterba

tpoterba Dec 12, 2019

Collaborator

would prefer to remove this, then good to go

@danking danking merged commit 9b7031c into hail-is:master Dec 13, 2019
1 check passed
1 check passed
ci-test success
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.