Skip to content

Commit

Permalink
Dynamic Partition Overwrite
Browse files Browse the repository at this point in the history
  • Loading branch information
jaceklaskowski committed Feb 11, 2024
1 parent 155bf40 commit ba48bbf
Show file tree
Hide file tree
Showing 12 changed files with 67 additions and 18 deletions.
2 changes: 1 addition & 1 deletion docs/commands/DeltaCommand.md
Original file line number Diff line number Diff line change
Expand Up @@ -247,7 +247,7 @@ createSetTransaction(

* `DeleteCommand` is requested to [performDelete](delete/DeleteCommand.md#performDelete)
* `MergeIntoCommand` is requested to [commitAndRecordStats](merge/MergeIntoCommand.md#commitAndRecordStats)
* `UpdateCommand` is requested to [performUpdate](update/UpdateCommand.md#performUpdate)
* [Update](update/index.md) command is executed
* `WriteIntoDelta` is requested to [write data out](WriteIntoDelta.md#write)

## Logging
Expand Down
2 changes: 1 addition & 1 deletion docs/commands/WriteIntoDelta.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@

`WriteIntoDelta` is created when:

* [DeltaDynamicPartitionOverwriteCommand](DeltaDynamicPartitionOverwriteCommand.md) is executed
* [DeltaDynamicPartitionOverwriteCommand](../dynamic-partition-overwrite/DeltaDynamicPartitionOverwriteCommand.md) is executed
* `DeltaLog` is requested to [create an insertable HadoopFsRelation](../DeltaLog.md#createRelation) (when `DeltaDataSource` is requested to create a relation as a [CreatableRelationProvider](../spark-connector/DeltaDataSource.md#CreatableRelationProvider) or a [RelationProvider](../spark-connector/DeltaDataSource.md#RelationProvider))
* `DeltaCatalog` is requested to [create a delta table](../DeltaCatalog.md#createDeltaTable)
* `WriteIntoDeltaBuilder` is requested to [build a V1Write](../WriteIntoDeltaBuilder.md#build)
Expand Down
4 changes: 2 additions & 2 deletions docs/commands/delete/DeleteCommand.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,15 +37,15 @@
sparkSession: SparkSession): Seq[Row]
```

`run` is part of the `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/)) abstraction.
`run` is part of the `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/#run)) abstraction.

`run` requests the [TahoeFileIndex](#tahoeFileIndex) for the [DeltaLog](../../TahoeFileIndex.md#deltaLog) (and [asserts that the table is removable](../../DeltaLog.md#assertRemovable)).

`run` requests the `DeltaLog` to [start a new transaction](../../DeltaLog.md#withNewTransaction) for [performDelete](#performDelete).

In the end, `run` re-caches all cached plans (incl. this relation itself) by requesting the `CacheManager` ([Spark SQL]({{ book.spark_sql }}/CacheManager)) to recache the [target](#target).

## <span id="performDelete"> performDelete
## performDelete { #performDelete }

```scala
performDelete(
Expand Down
2 changes: 1 addition & 1 deletion docs/commands/merge/MergeIntoCommandBase.md
Original file line number Diff line number Diff line change
Expand Up @@ -238,7 +238,7 @@ In the end, `buildTargetPlanWithIndex` creates a `Project` logical operator with
spark: SparkSession): Seq[Row]
```

`run` is part of the `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/)) abstraction.
`run` is part of the `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/#run)) abstraction.

`run` is a transactional operation that is made up of the following steps:

Expand Down
20 changes: 11 additions & 9 deletions docs/commands/update/UpdateCommand.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,18 +28,20 @@ Name | web UI
`scanTimeMs` | time taken to scan the files for matches
`rewriteTimeMs` | time taken to rewrite the matched files

## <span id="run"> Executing Command
## <span id="performUpdate"><span id="rewriteFiles"><span id="buildUpdatedColumns"> Executing Command { #run }

```scala
run(
sparkSession: SparkSession): Seq[Row]
```
??? note "RunnableCommand"

```scala
run(
sparkSession: SparkSession): Seq[Row]
```

`run` is part of the `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/)) abstraction.
`run` is part of the `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/#run)) abstraction.

`run`...FIXME

### <span id="performUpdate"> performUpdate
### performUpdate

```scala
performUpdate(
Expand All @@ -50,7 +52,7 @@ performUpdate(

`performUpdate`...FIXME

### <span id="rewriteFiles"> rewriteFiles
### rewriteFiles

```scala
rewriteFiles(
Expand All @@ -64,7 +66,7 @@ rewriteFiles(

`rewriteFiles`...FIXME

### <span id="buildUpdatedColumns"> buildUpdatedColumns
### buildUpdatedColumns

```scala
buildUpdatedColumns(
Expand Down
13 changes: 13 additions & 0 deletions docs/configuration-properties/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -184,6 +184,19 @@ Default: `3`

Default: `.s3-optimization-`

### <span id="dynamicPartitionOverwrite.enabled"><span id="DYNAMIC_PARTITION_OVERWRITE_ENABLED"> dynamicPartitionOverwrite.enabled { #spark.databricks.delta.dynamicPartitionOverwrite.enabled }

**spark.databricks.delta.dynamicPartitionOverwrite.enabled**

**(internal)** Enables [Dynamic Partition Overwrite](../dynamic-partition-overwrite/index.md) (whether to overwrite partitions dynamically when `partitionOverwriteMode` is `dynamic` in either the SQL configuration or a `DataFrameWriter` option).
When disabled, `partitionOverwriteMode` will be ignored.

Default: `true`

Used when:

* `DeltaWriteOptionsImpl` is requested to [isDynamicPartitionOverwriteMode](../spark-connector/DeltaWriteOptionsImpl.md#isDynamicPartitionOverwriteMode)

### <span id="spark.databricks.delta.history.maxKeysPerList"><span id="DELTA_HISTORY_PAR_SEARCH_THRESHOLD"> history.maxKeysPerList { #history.maxKeysPerList }

**spark.databricks.delta.history.maxKeysPerList**
Expand Down
4 changes: 4 additions & 0 deletions docs/dynamic-partition-overwrite/.pages
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
title: Dynamic Partition Overwrite
nav:
- index.md
- ...
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# DeltaDynamicPartitionOverwriteCommand

`DeltaDynamicPartitionOverwriteCommand` is a `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/)) and a `V2WriteCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/V2WriteCommand/)) for dynamic partition overwrite using [WriteIntoDelta](WriteIntoDelta.md).
`DeltaDynamicPartitionOverwriteCommand` is a `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/)) and a `V2WriteCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/V2WriteCommand/)) for dynamic partition overwrite using [WriteIntoDelta](../commands/WriteIntoDelta.md).

`DeltaDynamicPartitionOverwriteCommand` sets [partitionOverwriteMode](../spark-connector/options.md#partitionOverwriteMode) option as `DYNAMIC` before [write](#run).

Expand Down
23 changes: 23 additions & 0 deletions docs/dynamic-partition-overwrite/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
---
hide:
- toc
---

# Dynamic Partition Overwrite

**Dynamic Partition Overwrite** is a `DataFrameWriter` ([Spark SQL]({{ book.spark_sql }}/DataFrameWriter)) feature that allows to only overwrite partitions (of a partitioned delta table) that have data written into it.

Dynamic Partition Overwrite can be enabled system-wide using [spark.databricks.delta.dynamicPartitionOverwrite.enabled](../configuration-properties/index.md#dynamicPartitionOverwrite.enabled) configuration property.

??? note "DeltaIllegalArgumentException"
It is a [DeltaIllegalArgumentException](../spark-connector/DeltaWriteOptionsImpl.md#isDynamicPartitionOverwriteMode) for [spark.databricks.delta.dynamicPartitionOverwrite.enabled](../configuration-properties/index.md#dynamicPartitionOverwrite.enabled) configuration property disabled yet the Dynamic Partition Overwrite Mode is [dynamic](../spark-connector/DeltaOptions.md#DYNAMIC).

!!! note "Conflicts with `replaceWhere`"
Dynamic Partition Overwrite cannot be used with [replaceWhere](../spark-connector/options.md#replaceWhere) option as they both specify which data to overwrite.

## Partition Overwrite Mode

**Partition Overwrite Mode** can be one of the following values (case-insensitive):

* [dynamic](../spark-connector/DeltaOptions.md#DYNAMIC)
* [static](../spark-connector/DeltaOptions.md#STATIC)
9 changes: 8 additions & 1 deletion docs/spark-connector/DeltaWriteOptionsImpl.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,14 @@ rearrangeOnly: Boolean
isDynamicPartitionOverwriteMode: Boolean
```

`isDynamicPartitionOverwriteMode`...FIXME
`isDynamicPartitionOverwriteMode` determines the **partition overwrite mode** based on the value of [partitionOverwriteMode](#partitionOverwriteMode) option (in the [options](DeltaOptionParser.md#options)), if specified, or defaults to the value of `spark.sql.sources.partitionOverwriteMode` ([Spark SQL]({{ book.spark_sql }}/configuration-properties/#spark.sql.sources.partitionOverwriteMode)).

??? note "DeltaIllegalArgumentException: `DYNAMIC` mode with Dynamic Partition Overwrite disabled"
For `DYNAMIC` partition overwrite mode and [dynamicPartitionOverwrite.enabled](../configuration-properties/index.md#dynamicPartitionOverwrite.enabled) disabled, `isDynamicPartitionOverwriteMode` reports a [DeltaIllegalArgumentException](../DeltaErrors.md#deltaDynamicPartitionOverwriteDisabled).

With [dynamicPartitionOverwrite.enabled](../configuration-properties/index.md#dynamicPartitionOverwrite.enabled) disabled and the partition overwrite mode is anything but [DYNAMIC](../spark-connector/DeltaOptions.md#DYNAMIC), `isDynamicPartitionOverwriteMode` is off (returns `false`).

Otherwise, `isDynamicPartitionOverwriteMode` is whether the partition overwrite mode is dynamic or not (static).

---

Expand Down
2 changes: 1 addition & 1 deletion docs/spark-connector/options.md
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ Mutually exclusive with [replaceWhere](#replaceWhere)

Used when:

* `DeltaDynamicPartitionOverwriteCommand` is executed
* [DeltaDynamicPartitionOverwriteCommand](../dynamic-partition-overwrite/DeltaDynamicPartitionOverwriteCommand.md) is executed (and sets `partitionOverwriteMode` to `DYNAMIC`)
* `DeltaWriteOptionsImpl` is requested to [isDynamicPartitionOverwriteMode](DeltaWriteOptionsImpl.md#isDynamicPartitionOverwriteMode) and for [partitionOverwriteModeInOptions](DeltaWriteOptionsImpl.md#partitionOverwriteModeInOptions)
* `WriteIntoDeltaBuilder` is requested to [overwriteDynamicPartitions](../WriteIntoDeltaBuilder.md#overwriteDynamicPartitions)

Expand Down
2 changes: 1 addition & 1 deletion mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,7 @@ nav:
- ... | data-skipping/**.md
- ... | deletion-vectors/**.md
- developer-api.md
- ... | dynamic-partition-overwrite/**.md
- ... | generated-columns/**.md
- installation.md
- ... | liquid-clustering/**.md
Expand Down Expand Up @@ -282,7 +283,6 @@ nav:
- create-table-like/index.md
- Delete:
- ... | flat | commands/delete/**.md
- DeltaDynamicPartitionOverwriteCommand: commands/DeltaDynamicPartitionOverwriteCommand.md
- DESCRIBE DETAIL:
- ... | flat | commands/describe-detail/**.md
- DESCRIBE HISTORY:
Expand Down

0 comments on commit ba48bbf

Please sign in to comment.