Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hail] optimize flatten #7719

Merged
merged 2 commits into from Dec 13, 2019
Merged

[hail] optimize flatten #7719

merged 2 commits into from Dec 13, 2019

Conversation

catoverdrive
Copy link
Contributor

@catoverdrive catoverdrive commented Dec 12, 2019

Looking at the IR generated by table.flatten, this snippet:

>>> import hail as hl
>>> t = hl.utils.range_table(10)
>>> t2 = t.annotate(**{f'f{i}': i for i in range(5)})
>>> t2.flatten().collect()

generates the following IR:

(GetField rows
  (TableCollect
    (TableMapRows
      (TableOrderBy (Aidx)
        (TableMapRows
          (TableRange 10 8)
          (InsertFields
            (SelectFields (idx)
              (Ref row))
            None
            (f0
              (I32 0))
            (f1
              (I32 1))
            (f2
              (I32 2))
            (f3
              (I32 3))
            (f4
              (I32 4)))))
      (Let __uid_3
        (Ref row)
        (InsertFields
          (SelectFields ()
            (SelectFields (idx f0 f1 f2 f3 f4)
              (Ref row)))
          None
          (idx
            (GetField idx
              (Ref __uid_3)))
          (f0
            (GetField f0
              (Ref __uid_3)))
          (f1
            (GetField f1
              (Ref __uid_3)))
          (f2
            (GetField f2
              (Ref __uid_3)))
          (f3
            (GetField f3
              (Ref __uid_3)))
          (f4
            (GetField f4
              (Ref __uid_3))))))))

If we look at the last TableMapRows IR, the entire thing (Let __uid_3 …) is entirely a no-op, but we're still compiling and generating code for the (post-optimization) IR:

(InsertFields
  (SelectFields ()
    (Ref row))
  None
  (idx
    (GetField idx
      (Ref row)))
  (f0
    (GetField f0
      (Ref row)))
  (f1
    (GetField f1
      (Ref row)))
  (f2
    (GetField f2
      (Ref row)))
  (f3
    (GetField f3
      (Ref row)))
  (f4
    (GetField f4
      (Ref row))))

(cc @tpoterba I added a second ForwardLets in Optimize before the Simplify, although I'm not sure that's actually the correct place to put it; in this case, I think it may eventually come out in the wash given how many passes we make through any given pipeline, but I've noticed that currently our python tends to generate IR of the form:

(TableMapRows
  (Let __uid_n
    (Ref row)
    <mapped value, sometimes using (Ref __uid_n) and sometimes (Ref row)>

and that redundant binding at the top level means that the first Simplify pass misses quite a few optimizations! I'm not super attached to leaving it there, but I do think we might want to consider forwarding Lets on any IRs from python before optimization.)

tpoterba
tpoterba previously requested changes Dec 12, 2019
@@ -22,6 +22,7 @@ object Optimize {
last = ir
runOpt(FoldConstants(_), iter, "FoldConstants")
runOpt(ExtractIntervalFilters(_), iter, "ExtractIntervalFilters")
runOpt(ForwardLets(_), iter, "ForwardLets")
Copy link
Collaborator

@tpoterba tpoterba Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer to remove this, then good to go

@danking danking merged commit 9b7031c into hail-is:master Dec 13, 2019
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants