Auto Compaction and OPTIMIZE command
jaceklaskowski committed Mar 2, 2024
1 parent afc268d commit e778f4d
Showing 2 changed files with 22 additions and 4 deletions.
15 changes: 13 additions & 2 deletions docs/auto-compaction/AutoCompactBase.md
@@ -106,9 +106,20 @@ compact(
maxDeletedRowsRatio: Option[Double] = None): Seq[OptimizeMetrics]
```

`compact` takes the value of the following configuration properties:
`compact` [starts a transaction](../DeltaLog.md#startTransaction) on the [delta table](../DeltaLog.md) and performs [optimization](../commands/optimize/OptimizeExecutor.md#optimize).

---

`compact` requests the given [DeltaLog](../DeltaLog.md) to [start a transaction](../DeltaLog.md#startTransaction).

`compact` creates a [DeltaOptimizeContext](../commands/optimize/DeltaOptimizeContext.md) with the value of the following configuration properties:

* [spark.databricks.delta.autoCompact.maxFileSize](../configuration-properties/index.md#spark.databricks.delta.autoCompact.maxFileSize)
* [spark.databricks.delta.autoCompact.minFileSize](../configuration-properties/index.md#spark.databricks.delta.autoCompact.minFileSize)

`compact`...FIXME
`compact` requests a new [OptimizeExecutor](../commands/optimize/OptimizeExecutor.md) (with no [zOrderByColumns](../commands/optimize/OptimizeExecutor.md#zOrderByColumns) and the [isAutoCompact](../commands/optimize/OptimizeExecutor.md#isAutoCompact) flag enabled) to [optimize](../commands/optimize/OptimizeExecutor.md#optimize).

!!! note
    The delta table to run [optimize](../commands/optimize/OptimizeExecutor.md#optimize) on is passed indirectly, as the [DeltaLog](../OptimisticTransaction.md#deltaLog) via the [OptimisticTransaction](../OptimisticTransaction.md).

In the end, `compact` returns the [OptimizeMetrics](../commands/optimize/OptimizeStats.md#toOptimizeMetrics) (from the [optimize](../commands/optimize/OptimizeExecutor.md#optimize) stats).
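
The order of the steps above can be pictured with a small, dependency-free sketch. Everything in it is a hypothetical stand-in (`ContextSketch`, `ExecutorSketch`, `StatsSketch`, `MetricsSketch`, and the toy "merge all small files into one" rule); it mirrors only the sequence described here, not Delta Lake's actual classes, signatures, or file-selection logic.

```scala
object CompactFlowSketch {
  // Hypothetical stand-ins that only borrow names from the text above.
  final case class ContextSketch(maxFileSize: Long, minFileSize: Long)
  final case class StatsSketch(numFilesRemoved: Long, numFilesAdded: Long)
  final case class MetricsSketch(numFilesRemoved: Long, numFilesAdded: Long)

  // Stand-in executor. A real executor rewrites data files; this one applies
  // a toy rule: all files smaller than minFileSize are merged into one file.
  final class ExecutorSketch(
      fileSizes: Seq[Long],
      context: ContextSketch,
      zOrderByColumns: Seq[String],
      isAutoCompact: Boolean) {
    def optimize(): StatsSketch = {
      val small = fileSizes.count(_ < context.minFileSize)
      if (small > 1) StatsSketch(small.toLong, 1L) else StatsSketch(0L, 0L)
    }
  }

  def compact(conf: Map[String, Long], fileSizes: Seq[Long]): Seq[MetricsSketch] = {
    // Starting a transaction is not modeled; fileSizes stands in for the
    // table state that the transaction would expose.
    // Build the context from the two configuration properties.
    val context = ContextSketch(
      maxFileSize = conf("spark.databricks.delta.autoCompact.maxFileSize"),
      minFileSize = conf("spark.databricks.delta.autoCompact.minFileSize"))
    // No Z-order columns, auto-compact flag enabled.
    val executor = new ExecutorSketch(
      fileSizes, context, zOrderByColumns = Seq.empty, isAutoCompact = true)
    val stats = executor.optimize()
    // Convert the stats into the metrics that are returned.
    Seq(MetricsSketch(stats.numFilesRemoved, stats.numFilesAdded))
  }

  def main(args: Array[String]): Unit = {
    val conf = Map(
      "spark.databricks.delta.autoCompact.maxFileSize" -> 128L,
      "spark.databricks.delta.autoCompact.minFileSize" -> 64L)
    println(compact(conf, Seq(10L, 20L, 200L)))
  }
}
```

The sizes in `main` are arbitrary numbers chosen to exercise the toy rule, not realistic file sizes.
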
11 changes: 9 additions & 2 deletions docs/auto-compaction/index.md
@@ -1,10 +1,17 @@
# Auto Compaction

**Auto Compaction** feature in Delta Lake is responsible for [compacting files](AutoCompactBase.md#compact) upon a [successful write](../OptimisticTransactionImpl.md#registerPostCommitHook) into a delta table.
The **Auto Compaction** feature runs the [OPTIMIZE](../commands/index.md) command at the end of a transaction (that is, it [compacts files](AutoCompactBase.md#compact) upon a [successful write](../OptimisticTransactionImpl.md#registerPostCommitHook) into a delta table).

Auto Compaction can be enabled system-wide or per table using [spark.databricks.delta.autoCompact.enabled](../configuration-properties/index.md#spark.databricks.delta.autoCompact.enabled) configuration property or [delta.autoOptimize.autoCompact](../table-properties/DeltaConfigs.md#autoOptimize.autoCompact) table property, respectively.

??? warning "delta.autoOptimize Table Property Deprecated"
    [delta.autoOptimize](../table-properties/DeltaConfigs.md#delta.autoOptimize) table property is deprecated.
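
As a minimal sketch of the two ways to enable Auto Compaction, the following assumes a local Spark session with the Delta Lake library on the classpath. The table name `demo_events` and the application name are hypothetical, and the `"true"` values are an assumption about what the properties accept.

```scala
import org.apache.spark.sql.SparkSession

object EnableAutoCompaction {
  def main(args: Array[String]): Unit = {
    // Local session with the Delta SQL extension and catalog registered.
    val spark = SparkSession.builder()
      .appName("auto-compaction-demo") // hypothetical name
      .master("local[*]")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    // Session-wide: applies to writes made through this session.
    spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

    // Per table: demo_events is a hypothetical table used for illustration.
    spark.sql("CREATE TABLE IF NOT EXISTS demo_events (id BIGINT) USING delta")
    spark.sql(
      "ALTER TABLE demo_events " +
        "SET TBLPROPERTIES ('delta.autoOptimize.autoCompact' = 'true')")

    spark.stop()
  }
}
```

Setting the property on the `SparkSession` affects only that session; a cluster-wide default would normally go into the Spark configuration used to start the application.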

Auto Compaction uses [AutoCompact](AutoCompact.md) post-commit hook to be [executed](AutoCompactBase.md#run) at a [successful transaction commit](../OptimisticTransactionImpl.md#registerPostCommitHook) if there are files written to a delta table that can leverage compaction after a commit.
Auto Compaction uses the [AutoCompact](AutoCompact.md) post-commit hook, which is [executed](AutoCompactBase.md#run) at a [successful transaction commit](../OptimisticTransactionImpl.md#registerPostCommitHook) if files were written to a delta table and running such a heavy file-rewriting job is worthwhile.

Eventually, Auto Compaction uses [OptimizeExecutor](../commands/optimize/OptimizeExecutor.md) (with no [zOrderByColumns](../commands/optimize/OptimizeExecutor.md#zOrderByColumns) and the [isAutoCompact](../commands/optimize/OptimizeExecutor.md#isAutoCompact) flag enabled) to run [optimization](../commands/optimize/OptimizeExecutor.md#optimize).

Auto Compaction uses the following configuration properties:

* [spark.databricks.delta.autoCompact.maxFileSize](../configuration-properties/index.md#spark.databricks.delta.autoCompact.maxFileSize)
* [spark.databricks.delta.autoCompact.minFileSize](../configuration-properties/index.md#spark.databricks.delta.autoCompact.minFileSize)
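
A hedged sketch of tuning the two properties listed above on a session. It assumes the values are byte counts passed as strings; the 64 MB and 16 MB figures are purely illustrative and are not defaults or recommendations taken from this page.

```scala
import org.apache.spark.sql.SparkSession

object TuneAutoCompaction {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("auto-compaction-tuning") // hypothetical name
      .master("local[*]")
      .getOrCreate()

    val mb = 1024L * 1024L

    // Illustrative values only (assumed to be sizes in bytes).
    spark.conf.set(
      "spark.databricks.delta.autoCompact.maxFileSize", (64 * mb).toString)
    spark.conf.set(
      "spark.databricks.delta.autoCompact.minFileSize", (16 * mb).toString)

    // Read the values back to confirm what the session will hand to compact.
    println(spark.conf.get("spark.databricks.delta.autoCompact.maxFileSize"))
    println(spark.conf.get("spark.databricks.delta.autoCompact.minFileSize"))

    spark.stop()
  }
}
```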
