Fix show problem by changing TableOrderBy to accept keyed tables. #5172

Merged
merged 3 commits into from Jan 23, 2019

4 participants
@tpoterba
Collaborator

commented Jan 18, 2019

Fix deoptimization in Simplify.

@patrick-schultz

Collaborator

commented Jan 18, 2019

Can you explain what the problem was?

@tpoterba

Collaborator Author

commented Jan 18, 2019

It wasn't scanning the full dataset anymore, but:

table.head().flatten() was generating a TableOrderBy(TableKeyBy(TableHead)).

There was no way to remove this node, even if the table was already keyed by the sort fields, so we ended up doing an extra scan and possibly a shuffle.

This change simplifies the whole thing and emits the correct IR from the beginning.

@@ -429,7 +428,9 @@ object Simplify {
TableMapGlobals(TableHead(child, n), newGlobals)

  case TableHead(TableOrderBy(child, sortFields), n)
-   if sortFields.forall(_.sortOrder == Ascending) && n < 256 && canRepartition =>
+   if sortFields.forall(_.sortOrder == Ascending)
+     && child.typ.key != sortFields.map(_.field)

@cseed

cseed Jan 19, 2019

Collaborator

This is too strict. It should match the condition in table order by: that the sort fields are a prefix of the key.

@cseed

cseed Jan 19, 2019

Collaborator

Which maybe you should break out as a separate function on object TableOrderBy and call in both places.
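
For readers following along, the shared check suggested here might look roughly like the sketch below. It is only an illustration: `SortField` and `SortOrder` are simplified stand-ins defined locally, and the body of `isAlreadyOrdered` is an assumption based on the "ascending, and a prefix of the key" condition discussed in this thread, not the code that was merged.

```scala
object OrderingCheck {
  sealed trait SortOrder
  case object Ascending extends SortOrder
  case object Descending extends SortOrder

  // Simplified stand-in for the SortField used in the diff (hypothetical definition).
  final case class SortField(field: String, sortOrder: SortOrder)

  // Assumed condition: a table keyed by `key` is already in the requested order
  // when every sort field is ascending and the sort field names form a prefix of the key.
  def isAlreadyOrdered(sortFields: IndexedSeq[SortField], key: IndexedSeq[String]): Boolean =
    sortFields.forall(_.sortOrder == Ascending) &&
      key.startsWith(sortFields.map(_.field))
}
```

With a single helper like this, both the `TableHead(TableOrderBy(...))` rule and the order-by lowering could ask the same question instead of comparing the key and the sort fields for strict equality. Note that under this definition an empty list of sort fields counts as already ordered.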

@tpoterba tpoterba force-pushed the tpoterba:fix-show-optimization branch from 3bc87a5 to 4807127 Jan 20, 2019

addressed

@@ -319,8 +319,7 @@ object Simplify {
TableFilter(t,
ApplySpecial("&&", Array(p1, p2)))

case TableOrderBy(child, sortFields) if sortFields.isEmpty =>
child
case TableOrderBy(TableKeyBy(child, _, _), sortFields) => TableOrderBy(child, sortFields)

@patrick-schultz

patrick-schultz Jan 22, 2019

Collaborator

I don't think this is general enough. What about adding:

case TableOrderBy(child, sortFields)
  if TableOrderBy.isAlreadyOrdered(sortFields, child.rvdType.key) =>
  TableKeyBy(child, Array(), false)
case TableKeyBy(TableKeyBy(child, sortFields, false), IndexedSeq(), _) =>
  TableOrderBy(child, sortFields)

@tpoterba

tpoterba Jan 22, 2019

Author Collaborator

The spec in the Google doc wouldn't allow for either of these rewrites. Since we can rewrite a TableKeyBy(TableKeyBy(child, _), newKey) as TableKeyBy(child, newKey), the first would lead to optimization totally blowing away the order. We can't remove TableOrderBy nodes, even if a KeyBy substitution in-place may have the same semantics.

The latter is also a deoptimization - keying by an empty key doesn't guarantee a stable sort, so we don't actually have to do the inner keyBy at all.
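
The first objection can be reproduced on a toy model. The sketch below is not Hail's IR: `Literal`, the two-argument `TableKeyBy`, `eval`, and `key` are hypothetical simplifications, and it assumes that keying by a non-empty key sorts by that key while keying by an empty key leaves rows in whatever order they arrive. Under those assumptions, applying the proposed rule and then the existing key-by collapse loses the sort.

```scala
object RewriteOrderDemo {
  type Row = Map[String, Int]

  // Toy IR nodes (hypothetical, much simpler than Hail's real definitions).
  sealed trait TableIR
  final case class Literal(rows: Vector[Row]) extends TableIR
  final case class TableKeyBy(child: TableIR, keys: Vector[String]) extends TableIR
  final case class TableOrderBy(child: TableIR, sortFields: Vector[String]) extends TableIR

  // Stable lexicographic sort: sort by the last field first, the primary field last.
  private def sortRows(rows: Vector[Row], fields: Vector[String]): Vector[Row] =
    fields.foldRight(rows)((f, acc) => acc.sortBy(row => row(f)))

  def eval(ir: TableIR): Vector[Row] = ir match {
    case Literal(rows)               => rows
    case TableKeyBy(child, keys)     => sortRows(eval(child), keys) // empty key: no reordering
    case TableOrderBy(child, fields) => sortRows(eval(child), fields)
  }

  def key(ir: TableIR): Vector[String] = ir match {
    case TableKeyBy(_, keys) => keys
    case _                   => Vector.empty
  }

  // Rule proposed in review: a redundant order-by becomes "drop the key".
  val proposed: PartialFunction[TableIR, TableIR] = {
    case TableOrderBy(child, fields) if key(child).startsWith(fields) =>
      TableKeyBy(child, Vector.empty)
  }

  // Existing rule: nested key-bys collapse to the outer one.
  val collapse: PartialFunction[TableIR, TableIR] = {
    case TableKeyBy(TableKeyBy(child, _), newKey) => TableKeyBy(child, newKey)
  }

  val base: TableIR = Literal(Vector(Map("a" -> 3), Map("a" -> 1), Map("a" -> 2)))
  val original: TableIR = TableOrderBy(TableKeyBy(base, Vector("a")), Vector("a"))
  val afterProposed: TableIR = proposed(original)
  val afterCollapse: TableIR = collapse(afterProposed)

  def column(ir: TableIR): Vector[Int] = eval(ir).map(_("a"))
}
```

In this model `column(original)` and `column(afterProposed)` both give 1, 2, 3, but `column(afterCollapse)` gives 3, 1, 2: once the order-by has been turned into a key-by, the collapse rule is free to discard the inner key-by that was providing the order.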

@patrick-schultz

patrick-schultz Jan 23, 2019

Collaborator

Yup, you're right. This change makes me a little uncomfortable, because the interaction between TableOrderBy and TableKeyBy is now more complicated. I was trying to find a normalizing set of rewrite rules to handle that interaction (the second rule was only to make it confluent). I'll approve for now but I'll keep thinking about it.

@danking danking merged commit 814d2aa into hail-is:master Jan 23, 2019

1 check passed

hail-ci-0-1 successful build

@tpoterba tpoterba deleted the tpoterba:fix-show-optimization branch Jan 23, 2019
