
Distinct value statistics for partitioned Iceberg tables with Hive #22168

Open
ZacBlanco opened this issue Mar 12, 2024 · 0 comments
Labels
feature request iceberg Apache Iceberg related


ZacBlanco (Contributor) commented Mar 12, 2024

Expected Behavior or Use Case

Currently, when the Iceberg connector is used with the Hive metastore, it does not return table-level distinct-value statistics for partitioned tables. Statistics are critical for generating good query plans, and the connector can do a better job of providing them.

For partitioned tables, the current implementation returns only the statistics stored directly on the Iceberg table, without merging in the metastore statistics. See:

```java
if (mergeStrategy.equals(NONE) || spec.isPartitioned()) {
    return icebergStatistics;
}
```

I think we can improve this by merging the statistics even when the table is partitioned. If we verify that table-level statistics exist in the metastore and that no constraint is passed to getTableStatistics, we should be able to merge them safely.
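A minimal sketch of the proposed guard, written as a standalone Java method rather than against the connector's real classes. The `MergeStrategy` enum, its `USE_NDV` value, and the boolean parameters are hypothetical stand-ins for the merge strategy setting, `spec.isPartitioned()`, whether `getTableStatistics` received a constraint, and whether the metastore holds table-level statistics.

```java
import java.util.Objects;

class StatisticsMergeSketch {
    // Hypothetical stand-in for the connector's statistics merge strategy setting.
    enum MergeStrategy { NONE, USE_NDV }

    /**
     * Decides whether metastore (Hive) table-level statistics may be merged
     * into the statistics read from the Iceberg table.
     */
    static boolean shouldMergeMetastoreStatistics(
            MergeStrategy mergeStrategy,
            boolean isPartitioned,
            boolean hasConstraint,
            boolean metastoreHasTableStatistics) {
        Objects.requireNonNull(mergeStrategy, "mergeStrategy is null");
        if (mergeStrategy == MergeStrategy.NONE) {
            // Merging disabled: return the Iceberg statistics as-is.
            return false;
        }
        if (!isPartitioned) {
            // Unpartitioned tables keep today's behavior and are merged.
            return true;
        }
        // Proposed: partitioned tables qualify only when the request is
        // unconstrained and the metastore actually holds table-level statistics.
        return !hasConstraint && metastoreHasTableStatistics;
    }

    public static void main(String[] args) {
        System.out.println(shouldMergeMetastoreStatistics(MergeStrategy.USE_NDV, true, false, true)); // true
        System.out.println(shouldMergeMetastoreStatistics(MergeStrategy.USE_NDV, true, true, true));  // false
        System.out.println(shouldMergeMetastoreStatistics(MergeStrategy.NONE, false, false, true));   // false
    }
}
```

Compared with the existing check, only the partitioned branch changes: instead of always returning the Iceberg statistics, it falls through to the merge when both conditions hold.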

Presto Component, Service, or Connector

Iceberg Connector

Possible Implementation

If a constraint is passed, we shouldn't return the table-level statistics, because they describe the whole table rather than the subset of partitions the constraint selects. Without a constraint, however, we should still be able to return them.
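Merging the per-column distinct-value counts could look roughly like this sketch. It assumes, hypothetically, that the Iceberg-side statistics may lack distinct-value counts and that the metastore exposes one table-level count per column. The `mergeDistinctValues` helper, its map-based inputs, and the column names are illustrative only, not the connector's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;

class DistinctValueMergeSketch {
    /**
     * Fills missing distinct-value counts in the Iceberg statistics from the
     * metastore's table-level statistics, but only for unconstrained requests.
     */
    static Map<String, OptionalLong> mergeDistinctValues(
            Map<String, OptionalLong> icebergNdvs,
            Map<String, Long> metastoreNdvs,
            boolean hasConstraint) {
        Map<String, OptionalLong> merged = new LinkedHashMap<>(icebergNdvs);
        if (hasConstraint) {
            // Table-level counts cover every partition and may not match the
            // filtered subset, so keep only what Iceberg reported.
            return merged;
        }
        for (Map.Entry<String, OptionalLong> entry : icebergNdvs.entrySet()) {
            Long fromMetastore = metastoreNdvs.get(entry.getKey());
            // Existing Iceberg values win; metastore values only fill gaps.
            if (!entry.getValue().isPresent() && fromMetastore != null) {
                merged.put(entry.getKey(), OptionalLong.of(fromMetastore));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, OptionalLong> iceberg = new LinkedHashMap<>();
        iceberg.put("orderkey", OptionalLong.empty());
        iceberg.put("comment", OptionalLong.empty());
        Map<String, Long> metastore = Map.of("orderkey", 1500000L);
        // prints {orderkey=OptionalLong[1500000], comment=OptionalLong.empty}
        System.out.println(mergeDistinctValues(iceberg, metastore, false));
    }
}
```

Columns the metastore has no count for are left empty rather than guessed, so the planner falls back to its default estimates for them.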

Context

Better query plans for partitioned Iceberg datasets.

Projects
Status: 🆕 Unprioritized
Development

No branches or pull requests

1 participant