Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix show problem by changing TableOrderBy to accept keyed tables. #5172

Merged
merged 3 commits into from Jan 23, 2019

Conversation

@tpoterba
Copy link
Collaborator

@tpoterba tpoterba commented Jan 18, 2019

Fix deoptimization in Simplify.

@patrick-schultz
Copy link
Collaborator

@patrick-schultz patrick-schultz commented Jan 18, 2019

Can you explain what the problem was?

@tpoterba
Copy link
Collaborator Author

@tpoterba tpoterba commented Jan 18, 2019

It wasn't scanning the full dataset anymore, but:

table.head().flatten() was generating a TableOrderBy(TableKeyBy(TableHead)).

There was no way to remove this node, even if the table was already keyed by the sort fields, so we ended up doing an extra scan and possibly shuffle.

This change simplifies the whole thing, and emits the correct IR from the beginning

@@ -429,7 +428,9 @@ object Simplify {
TableMapGlobals(TableHead(child, n), newGlobals)

case TableHead(TableOrderBy(child, sortFields), n)
if sortFields.forall(_.sortOrder == Ascending) && n < 256 && canRepartition =>
if sortFields.forall(_.sortOrder == Ascending)
&& child.typ.key != sortFields.map(_.field)
Copy link
Collaborator

@cseed cseed Jan 19, 2019

This is too strict. It should match the condition in table order by: that the sort fields are an prefix of the key.

Copy link
Collaborator

@cseed cseed Jan 19, 2019

Which maybe you should break out as a separate function on object TableOrderBy and call in both places.

@tpoterba tpoterba force-pushed the fix-show-optimization branch from 3bc87a5 to 4807127 Jan 20, 2019
@tpoterba tpoterba dismissed cseed’s stale review Jan 22, 2019

addressed

@@ -319,8 +319,7 @@ object Simplify {
TableFilter(t,
ApplySpecial("&&", Array(p1, p2)))

case TableOrderBy(child, sortFields) if sortFields.isEmpty =>
child
case TableOrderBy(TableKeyBy(child, _, _), sortFields) => TableOrderBy(child, sortFields)
Copy link
Collaborator

@patrick-schultz patrick-schultz Jan 22, 2019

I don't think this is general enough. What about adding:

case TableOrderBy(child, sortFields)
  if TableOrderBy.isAlreadyOrdered(sortFields, child.rvdType.key) =>
  TableKeyBy(child, Array(), false)
case TableKeyBy(TableKeyBy(child, sortFields, false), IndexedSeq(), _) =>
  TableOrderBy(child, sortFields)

Copy link
Collaborator Author

@tpoterba tpoterba Jan 22, 2019

The spec in the google doc wouldn't allow for either of these rewrites. Since we can rewrite a TableKeyBy(TableKeyBy(child, _), newKey) as TableKeyBy(child, newKey), the first would lead to optimization totally blowing away the order. We can't remove TableOrderBy nodes, even if a KeyBy substitution in-place may have the same semantics.

The latter is also a deoptimization - keying by an empty key doesn't guarantee a stable sort, so we don't actually have to do the inner keyBy at all.

Copy link
Collaborator

@patrick-schultz patrick-schultz Jan 23, 2019

Yup, you're right. This change makes me a little uncomfortable, because the interaction between TableOrdeBy and TableKeyBy is now more complicated. I was trying to find a normalizing set of rewrite rules to handle that interaction (the second rule was only to make it confluent). I'll approve for now but I'll keep thinking about it.

@danking danking merged commit 814d2aa into hail-is:master Jan 23, 2019
1 check passed
@tpoterba tpoterba deleted the fix-show-optimization branch Jan 23, 2019
@tpoterba tpoterba restored the fix-show-optimization branch Nov 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants