Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable PushdownSubfields rule #13082

Merged
merged 1 commit into from
Jul 11, 2019

Conversation

mbasmanova
Copy link
Contributor

Turns out PushdownSubfields doesn't interoperate with CBO yet. PushdownSubfields
changes ColumnHandles, but TableScanStatsRule uses them to look up statistics in
a map keyed by unmodified ColumnHandles.

Introduced pushdown_subfields_enabled session property with default value 'false'
to control whether PushdownSubfields rule runs or not.

booleanProperty(
PUSHDOWN_SUBFIELDS_ENABLED,
"Experimental: enable subfield pruning",
false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It might be worth adding a configuration property as well for completeness.

@mbasmanova
Copy link
Contributor Author

@arhimondr Andrii, thank you for review. Added configuration property and updated the PR.

Turns out PushdownSubfields doesn't interoperate with CBO yet. PushdownSubfields
changes ColumnHandles, but TableScanStatsRule uses them to look up statistics in
a map keyed by unmodified ColumnHandles.

Introduced pushdown_subfields_enabled session property and
experimental.pushdown-subfields-enabled configuration property with default value
'false' to control whether PushdownSubfields rule runs or not.
Copy link
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mbasmanova

@mbasmanova mbasmanova merged commit 27e21e1 into prestodb:master Jul 11, 2019
@mbasmanova mbasmanova deleted the disable-pushdown-subfields branch July 11, 2019 20:04
@tooptoop4
Copy link

@mbasmanova do u have example sql that would face issue in presto 0.220 before this PR fix?

@mbasmanova
Copy link
Contributor Author

@tooptoop4 I don't have an example, but I can explain when would this happen.

  • CBO is enabled.
  • The query has a join with right side exceeding global memory limit per query, but left side not E.g. unmodified query hits memory limit, but a query with join sides swapped doesn't.
  • The source tables use complex types like lists, maps or structs and have values significantly larger than 50 bytes per value per row. These tables have accurate stats in the Metastore.
  • Complex types are passing through the join.
  • Only some of the subfields of the complex types are used in the query.

In this case, PushdownSubfields rule collects a list of subfields used in a query and generates new ColumnHandles by calling ColumnHandle#withRequiredSubfields. TableScanStatsRule then calls metadata.getTableStatistics and receives a map from ColumnHandle to stats, where keys are the original ColumnHandles, not modified by calling ColumnHandle#withRequiredSubfields. TableScanStatsRule looks up stats using modified ColumnHandles, come up empty and uses the default of 50 bytes per complex type value per row estimate. If this estimation is significantly off, e.g. complex type is used in the right side of the join and with 50 bytes per value that side appears to be smaller than probe side, then CBO will not swap join sides, and the query would run as is and hit the memory limit.

A proper fix, would be to make stats estimation smarter by taking requiredSubfields into account. A short term fix might be to call withRequiredSubfields(ImmutableList.of()) before looking up the stats. CC: @arhimondr @oerling

        for (Map.Entry<VariableReferenceExpression, ColumnHandle> entry : node.getAssignments().entrySet()) {
            Optional<ColumnStatistics> columnStatistics = Optional.ofNullable(tableStatistics.getColumnStatistics().get(entry.getValue().withRequiredSubfields(ImmutableList.of())));
            outputVariableStats.put(entry.getKey(), columnStatistics.map(statistics -> toSymbolStatistics(tableStatistics, statistics)).orElse(VariableStatsEstimate.unknown()));
        }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants