Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix show problem by changing TableOrderBy to accept keyed tables. #5172

Merged
merged 3 commits into from
Jan 23, 2019

Conversation

tpoterba
Copy link
Contributor

Fix deoptimization in Simplify.

@patrick-schultz
Copy link
Collaborator

Can you explain what the problem was?

@tpoterba
Copy link
Contributor Author

It wasn't scanning the full dataset anymore, but:

table.head().flatten() was generating a TableOrderBy(TableKeyBy(TableHead)).

There was no way to remove this node, even if the table was already keyed by the sort fields, so we ended up doing an extra scan and possibly shuffle.

This change simplifies the whole thing, and emits the correct IR from the beginning

cseed
cseed previously requested changes Jan 19, 2019
@@ -429,7 +428,9 @@ object Simplify {
TableMapGlobals(TableHead(child, n), newGlobals)

case TableHead(TableOrderBy(child, sortFields), n)
if sortFields.forall(_.sortOrder == Ascending) && n < 256 && canRepartition =>
if sortFields.forall(_.sortOrder == Ascending)
&& child.typ.key != sortFields.map(_.field)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too strict. It should match the condition in table order by: that the sort fields are an prefix of the key.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which maybe you should break out as a separate function on object TableOrderBy and call in both places.

@@ -319,8 +319,7 @@ object Simplify {
TableFilter(t,
ApplySpecial("&&", Array(p1, p2)))

case TableOrderBy(child, sortFields) if sortFields.isEmpty =>
child
case TableOrderBy(TableKeyBy(child, _, _), sortFields) => TableOrderBy(child, sortFields)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is general enough. What about adding:

case TableOrderBy(child, sortFields)
  if TableOrderBy.isAlreadyOrdered(sortFields, child.rvdType.key) =>
  TableKeyBy(child, Array(), false)
case TableKeyBy(TableKeyBy(child, sortFields, false), IndexedSeq(), _) =>
  TableOrderBy(child, sortFields)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec in the google doc wouldn't allow for either of these rewrites. Since we can rewrite a TableKeyBy(TableKeyBy(child, _), newKey) as TableKeyBy(child, newKey), the first would lead to optimization totally blowing away the order. We can't remove TableOrderBy nodes, even if a KeyBy substitution in-place may have the same semantics.

The latter is also a deoptimization - keying by an empty key doesn't guarantee a stable sort, so we don't actually have to do the inner keyBy at all.

Copy link
Collaborator

@patrick-schultz patrick-schultz Jan 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, you're right. This change makes me a little uncomfortable, because the interaction between TableOrdeBy and TableKeyBy is now more complicated. I was trying to find a normalizing set of rewrite rules to handle that interaction (the second rule was only to make it confluent). I'll approve for now but I'll keep thinking about it.

@danking danking merged commit 814d2aa into hail-is:master Jan 23, 2019
@tpoterba tpoterba deleted the fix-show-optimization branch January 23, 2019 13:56
@tpoterba tpoterba restored the fix-show-optimization branch November 7, 2019 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants