Commit

WIP: enable liquid clustering for delta lakes
mikix committed May 30, 2024
1 parent fa64d81 commit b55a263
Showing 2 changed files with 2 additions and 1 deletion.
1 change: 1 addition & 0 deletions cumulus_etl/formats/deltalake.py
@@ -98,6 +98,7 @@ def update_delta_table(self, updates: pyspark.sql.DataFrame, groups: set[str]) -
         table = (
             delta.DeltaTable.createIfNotExists(self.spark)
             .addColumns(updates.schema)
+            .clusterBy(*self.uniqueness_fields)
             .location(self._table_path(self.dbname))
             .execute()
         )
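For readers unfamiliar with the builder chain in this hunk, here is a minimal sketch of the same call order, written against any object that exposes the `DeltaTableBuilder` methods used above (`addColumns`, `clusterBy`, `location`, `execute`). The helper name `create_clustered_table` and its parameters are hypothetical and not part of cumulus_etl. In real use, `builder` would presumably be `delta.DeltaTable.createIfNotExists(spark)`. Skipping `clusterBy` when no fields are given is an assumption of this sketch, not something the commit does.

```python
from typing import Any, Iterable


def create_clustered_table(builder: Any, schema: Any, cluster_fields: Iterable[str], path: str) -> Any:
    """Run the builder chain from the diff, clustering only when fields are given.

    `builder` is assumed to behave like delta.tables.DeltaTableBuilder
    (delta-spark >= 3.2). This helper is illustrative, not cumulus_etl code.
    """
    fields = list(cluster_fields)
    builder = builder.addColumns(schema)
    if fields:
        # clusterBy() takes column names as varargs, hence the unpacking
        builder = builder.clusterBy(*fields)
    return builder.location(path).execute()
```

Taking the builder as a parameter keeps the chain easy to exercise with a stub, without starting a Spark session.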
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -13,7 +13,7 @@ requires-python = ">= 3.10"
 dependencies = [
     "ctakesclient >= 5.1, < 6",
     "cumulus-fhir-support >= 1, < 2",
-    "delta-spark >= 3, < 4",
+    "delta-spark >= 3.2, < 4",
     "httpx < 1",
     "inscriptis < 3",
     "jwcrypto < 2",
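The lower bound moves from 3 to 3.2 in the same commit that introduces `.clusterBy(...)`, which suggests the builder method is not available in earlier 3.x releases. A rough sketch of a runtime guard that mirrors that bound follows. The names `MIN_CLUSTER_BY` and `supports_cluster_by` are hypothetical, and the 3.2 threshold is taken from the diff above rather than verified independently.

```python
import re

MIN_CLUSTER_BY = (3, 2)  # assumed from the new lower bound in pyproject.toml


def supports_cluster_by(version: str) -> bool:
    """Return True if a delta-spark version string is at least 3.2."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    # Compare (major, minor) as integers so that "3.10" sorts after "3.2"
    return (int(match.group(1)), int(match.group(2))) >= MIN_CLUSTER_BY
```

The installed version string could be obtained with `importlib.metadata.version("delta-spark")` and passed to this function before calling `clusterBy`.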