Implement Iceberg OPTIMIZE #10497

findepi · 2022-01-07T09:03:29Z

No description provided.

alexjo2144 · 2022-01-07T15:43:40Z

Just clarifying before I start reading this. This is specifically compaction of V1 tables which cannot contain positional or equality based delete markers?

alexjo2144 · 2022-01-07T15:47:38Z

The SparkSQL procedure is called `rewrite_data_files`. Should we name this procedure to match? https://github.com/apache/iceberg/blob/master/site/docs/spark-procedures.md?plain=1#L247

findepi · 2022-01-10T08:05:12Z

> This is specifically compaction of V1 tables which cannot contain positional or equality based delete markers?

Yes, but only because the reader doesn't support positional or equality based delete markers today.

Once the reader supports them, this should work with v2 tables.

> The SparkSQL procedure is called `rewrite_data_files`. Should we name this procedure to match?

Thanks for the pointer. "Rewrite files" feels like a low-level description of what the operation does (today), whereas "optimize" describes (or hints at) the intent.

losipiuk · 2022-01-11T12:29:26Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java

+            newFiles.add(builder.build());
+        }
+
+        if (scannedFiles.isEmpty() && newFiles.isEmpty()) {


Should we assert that we never get one list empty and the other not? That feels like a bug situation.

The scanned file list may be non-empty while the resulting data is empty, if the input files were empty.
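To make the cases in this thread concrete, here is a minimal sketch of the decision being discussed. It is not the code from this PR: the class `OptimizeCommitSketch`, the `Action` enum, and the use of plain strings in place of Iceberg data file objects are all hypothetical. Treating "new files without scanned files" as a bug is an assumption based on the reviewer's question, not something the thread confirms.

```java
import java.util.List;

public class OptimizeCommitSketch {
    // Hypothetical outcome of the commit step; not a Trino or Iceberg type.
    enum Action { SKIP_COMMIT, REWRITE }

    // File paths stand in for Iceberg data file objects to keep the sketch self-contained.
    static Action decide(List<String> scannedFiles, List<String> newFiles) {
        if (scannedFiles.isEmpty() && newFiles.isEmpty()) {
            // Nothing was read and nothing was written, so there is nothing to commit.
            return Action.SKIP_COMMIT;
        }
        if (scannedFiles.isEmpty()) {
            // Assumption: output without any input would indicate a bug.
            throw new IllegalStateException("New files were written but no files were scanned");
        }
        // Scanned files exist. The new file list may still be empty when the
        // scanned files held no rows; the rewrite then only removes the old files.
        return Action.REWRITE;
    }

    public static void main(String[] args) {
        System.out.println(decide(List.of(), List.of()));
        System.out.println(decide(List.of("a.orc", "b.orc"), List.of("c.orc")));
        System.out.println(decide(List.of("empty.orc"), List.of()));
    }
}
```

The third call is the case raised in the reply: the scanned list is non-empty, the output is empty, and the rewrite still has to go ahead, which is why a symmetric "both empty or both non-empty" assertion would be too strict.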

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/procedure/IcebergOptimizeHandle.java

findepi · 2022-01-13T08:09:16Z

CI #10583

findepi requested review from homar, losipiuk, phd3 and alexjo2144 January 7, 2022 09:03

cla-bot bot added the cla-signed label Jan 7, 2022

findepi force-pushed the findepi/iceberg-optimize branch from fdd6530 to c817478 January 7, 2022 13:13

findepi added 3 commits January 10, 2022 16:52

Remove unused method

f607480

Qualify Hadoop Path in integration test

19aa89f

Integration tests rarely interact with Hadoop FS directly, so `org.apache.hadoop.fs.Path` is uncommon. Qualifying it leaves the simple name free for importing `java.nio.file.Path`.

Fix indentation

91522fa

findepi force-pushed the findepi/iceberg-optimize branch from c817478 to 55599a0 January 10, 2022 15:54

losipiuk reviewed Jan 11, 2022

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java

losipiuk reviewed Jan 11, 2022

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/procedure/IcebergOptimizeHandle.java (outdated)

losipiuk approved these changes Jan 11, 2022

Implement Iceberg OPTIMIZE

410e4fb

findepi force-pushed the findepi/iceberg-optimize branch from 55599a0 to 410e4fb January 12, 2022 16:43

findepi merged commit f0c67f0 into trinodb:master Jan 13, 2022

findepi deleted the findepi/iceberg-optimize branch January 13, 2022 08:09

findepi mentioned this pull request Jan 13, 2022

Release notes for 369 #10552

Closed

github-actions bot added this to the 369 milestone Jan 13, 2022

mosabua mentioned this pull request Jan 13, 2022

Add Trino 369 release notes #10553

Merged
