Fix show problem by changing TableOrderBy to accept keyed tables. #5172

tpoterba · 2019-01-18T18:09:26Z

Fix deoptimization in Simplify.

patrick-schultz · 2019-01-18T20:26:08Z

Can you explain what the problem was?

tpoterba · 2019-01-18T21:47:39Z

It wasn't scanning the full dataset anymore, but:

table.head().flatten() was generating a TableOrderBy(TableKeyBy(TableHead)).

There was no way to remove this node, even if the table was already keyed by the sort fields, so we ended up doing an extra scan and possibly a shuffle.

This change simplifies the whole thing and emits the correct IR from the beginning.

cseed · 2019-01-19T01:15:52Z

hail/src/main/scala/is/hail/expr/ir/Simplify.scala

@@ -429,7 +428,9 @@ object Simplify {
      TableMapGlobals(TableHead(child, n), newGlobals)

    case TableHead(TableOrderBy(child, sortFields), n)
-      if sortFields.forall(_.sortOrder == Ascending) && n < 256 && canRepartition =>
+      if sortFields.forall(_.sortOrder == Ascending)
+        && child.typ.key != sortFields.map(_.field)


This is too strict. It should match the condition in TableOrderBy: that the sort fields are a prefix of the key.

Maybe break that out as a separate function on object TableOrderBy and call it in both places.
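A minimal sketch of such a shared predicate, using the name isAlreadyOrdered that appears later in this thread. The SortField and SortOrder types below are simplified stand-ins rather than Hail's actual definitions, and the sketch assumes a keyed table's rows are sorted ascending by key.

```scala
// Simplified stand-ins for Hail's sort types (assumption, not Hail's real definitions).
sealed trait SortOrder
case object Ascending extends SortOrder
case object Descending extends SortOrder

final case class SortField(field: String, sortOrder: SortOrder)

object TableOrderBy {
  // True when rows keyed by `key` (assumed sorted ascending by key) already
  // satisfy `sortFields`: every sort field is ascending, and the sort fields,
  // in order, form a prefix of the key.
  def isAlreadyOrdered(sortFields: IndexedSeq[SortField], key: IndexedSeq[String]): Boolean =
    sortFields.length <= key.length &&
      sortFields.zip(key).forall { case (sf, k) => sf.sortOrder == Ascending && sf.field == k }
}
```

Calling one predicate from both the TableOrderBy execution path and the Simplify rule keeps the two conditions from drifting apart, which is the mismatch flagged above.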

Fix deoptimization in Simplify.

addressed

patrick-schultz · 2019-01-22T20:30:17Z

hail/src/main/scala/is/hail/expr/ir/Simplify.scala

@@ -319,8 +319,7 @@ object Simplify {
      TableFilter(t,
        ApplySpecial("&&", Array(p1, p2)))

-    case TableOrderBy(child, sortFields) if sortFields.isEmpty =>
-      child
+    case TableOrderBy(TableKeyBy(child, _, _), sortFields) => TableOrderBy(child, sortFields)


I don't think this is general enough. What about adding:

case TableOrderBy(child, sortFields) if TableOrderBy.isAlreadyOrdered(sortFields, child.rvdType.key) =>
  TableKeyBy(child, Array(), false)
case TableKeyBy(TableKeyBy(child, sortFields, false), IndexedSeq(), _) =>
  TableOrderBy(child, sortFields)

The spec in the Google doc wouldn't allow either of these rewrites. Since we can rewrite TableKeyBy(TableKeyBy(child, _), newKey) as TableKeyBy(child, newKey), the first would let optimization blow away the order entirely. We can't remove TableOrderBy nodes, even if substituting a KeyBy in place might have the same semantics.

The latter is also a deoptimization: keying by an empty key doesn't guarantee a stable sort, so we don't actually have to do the inner keyBy at all.
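A toy sketch of the rule in the diff above, TableOrderBy(TableKeyBy(child, _, _), sortFields) => TableOrderBy(child, sortFields). The node classes, including the leaf TableLeaf, are hypothetical simplifications of Hail's TableIR, and simplify here just reapplies the rewrite until nothing changes. It only drops the TableKeyBy beneath the sort; the TableOrderBy itself stays, consistent with the point that TableOrderBy nodes can't be removed.

```scala
// Hypothetical, heavily simplified stand-ins for Hail's TableIR nodes.
sealed trait SortOrder
case object Ascending extends SortOrder
case object Descending extends SortOrder
final case class SortField(field: String, sortOrder: SortOrder)

sealed trait TableIR
final case class TableLeaf(name: String) extends TableIR // hypothetical source table
final case class TableHead(child: TableIR, n: Long) extends TableIR
final case class TableKeyBy(child: TableIR, keys: IndexedSeq[String], isSorted: Boolean) extends TableIR
final case class TableOrderBy(child: TableIR, sortFields: IndexedSeq[SortField]) extends TableIR

object ToySimplify {
  def rewrite(ir: TableIR): TableIR = ir match {
    // The rule under discussion: the sort fixes row order itself, and the
    // inner key-by gives it no stable-sort guarantee, so drop the key-by.
    case TableOrderBy(TableKeyBy(child, _, _), sortFields) => TableOrderBy(rewrite(child), sortFields)
    // Otherwise rebuild each node with rewritten children.
    case TableOrderBy(child, sortFields)   => TableOrderBy(rewrite(child), sortFields)
    case TableKeyBy(child, keys, isSorted) => TableKeyBy(rewrite(child), keys, isSorted)
    case TableHead(child, n)               => TableHead(rewrite(child), n)
    case leaf: TableLeaf                   => leaf
  }

  // Reapply the rewrite until the tree stops changing (each change removes a node).
  def simplify(ir: TableIR): TableIR = {
    val next = rewrite(ir)
    if (next == ir) ir else simplify(next)
  }
}
```

Applied to the TableOrderBy(TableKeyBy(TableHead(...))) shape from the start of the thread, the rewrite yields TableOrderBy(TableHead(...)), so the sort's child carries the head's own key rather than an empty one.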

Yup, you're right. This change makes me a little uncomfortable, because the interaction between TableOrderBy and TableKeyBy is now more complicated. I was trying to find a normalizing set of rewrite rules to handle that interaction (the second rule was only there to make it confluent). I'll approve for now, but I'll keep thinking about it.

addressed

tpoterba assigned patrick-schultz Jan 18, 2019

cseed previously requested changes Jan 19, 2019

tpoterba added 3 commits January 20, 2019 00:26

Fix show problem by changing TableOrderBy to accept keyed tables.

86507af

Fix deoptimization in Simplify.

fix prune

c9aa8a8

address

4807127

tpoterba force-pushed the fix-show-optimization branch from 3bc87a5 to 4807127 Compare January 20, 2019 05:26

patrick-schultz previously requested changes Jan 22, 2019

patrick-schultz approved these changes Jan 23, 2019

danking merged commit 814d2aa into hail-is:master Jan 23, 2019

tpoterba deleted the fix-show-optimization branch January 23, 2019 13:56

tpoterba restored the fix-show-optimization branch November 7, 2019 17:05

This was referenced Apr 29, 2024

Hail 0.2.10 patch notes iris-garden/test-process#1662 (Closed)

Hail 0.2.10 patch notes iris-garden/test-process#2248 (Open)
