Support WAP workflows in Iceberg connector #22025

Open
5 tasks
tdcmeehan opened this issue Feb 27, 2024 · 0 comments
Labels
feature request iceberg Apache Iceberg related

WAP is short for Write, Audit, Publish. Iceberg keeps track of snapshots through named snapshot references, which may be either branches or tags. A tag is an immutable name for a particular snapshot. A branch is mutable: each update to a branch creates a new snapshot based on the branch's current snapshot. With branches and tags you can carry out a WAP workflow: write to a branch, audit the data, and publish it by fast-forwarding the table's main branch to the audited branch.

For more information, see the Iceberg documentation.
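
To make the write, audit, publish sequence concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not the Iceberg library or the Presto connector: `ToyTable`, `Snapshot`, and every method name are hypothetical, and each snapshot stores the full row set purely to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    rows: Tuple[int, ...]  # toy simplification: full table contents


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a table with branches and tags."""
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    branches: Dict[str, int] = field(default_factory=dict)  # mutable refs
    tags: Dict[str, int] = field(default_factory=dict)      # immutable refs
    _next_id: int = 1

    def __post_init__(self):
        # Every table starts with an empty root snapshot on "main".
        self.branches["main"] = self._commit(None, ()).snapshot_id

    def _commit(self, parent_id, rows):
        snap = Snapshot(self._next_id, parent_id, tuple(rows))
        self.snapshots[snap.snapshot_id] = snap
        self._next_id += 1
        return snap

    def create_branch(self, name, source="main"):
        self.branches[name] = self.branches[source]

    def create_tag(self, name, source="main"):
        if name in self.tags:
            raise ValueError(f"tag {name!r} already exists and is immutable")
        self.tags[name] = self.branches[source]

    def append(self, branch, new_rows):
        # Writing to a branch creates a new snapshot and moves the branch.
        parent = self.snapshots[self.branches[branch]]
        snap = self._commit(parent.snapshot_id, parent.rows + tuple(new_rows))
        self.branches[branch] = snap.snapshot_id

    def read(self, ref):
        if ref in self.branches:
            return self.snapshots[self.branches[ref]].rows
        if ref in self.tags:
            return self.snapshots[self.tags[ref]].rows
        raise KeyError(f"unknown branch or tag: {ref!r}")

    def _is_ancestor(self, ancestor_id, descendant_id):
        current = descendant_id
        while current is not None:
            if current == ancestor_id:
                return True
            current = self.snapshots[current].parent_id
        return False

    def fast_forward(self, target, source):
        # Only legal when target's head is an ancestor of source's head.
        if not self._is_ancestor(self.branches[target], self.branches[source]):
            raise ValueError("histories diverged; cannot fast-forward")
        self.branches[target] = self.branches[source]


table = ToyTable()
table.append("main", [1, 2])
table.create_tag("before-load")          # checkpoint to roll back to
table.create_branch("audit")
table.append("audit", [3, 4])            # WRITE: main is unaffected
staged = table.read("audit")
audit_passed = all(row > 0 for row in staged)   # AUDIT: toy check
if audit_passed:
    table.fast_forward("main", "audit")  # PUBLISH
```

Until the fast-forward runs, readers of `main` see only the original rows, and the `before-load` tag keeps pointing at the pre-load snapshot afterwards.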

Expected Behavior or Use Case

Adding the ability to manage, read from, and write to branches, and to read from tags, will open up new use cases for users of Presto on Iceberg.

  • For auditing purposes (e.g. GDPR), users can retain specific tagged snapshots.
  • Tags provide a convenient way to roll back data to known checkpoints in time.
  • Users can make updates on a branch and test them before applying them back onto the parent snapshot. In some cases, this is similar to a long-open transaction in a traditional warehouse appliance.
  • For testing pipelines or queries, one can make changes on a branch without affecting the production data in the table. This is more efficient than making a copy of the data for testing.

Presto Component, Service, or Connector

  • Iceberg connector
  • Minor parser changes so that SYSTEM_VERSION accepts string values
  • New syntax to support the creation and deletion of branches, and respective integration into connector metadata
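
One possible reading of the parser item above is that a version value could then be either a numeric snapshot ID or a string naming a branch or tag. The sketch below illustrates that resolution step in Python under that assumption; `resolve_version` and its parameters are hypothetical and do not reflect Presto's actual parser or connector code.

```python
from typing import Dict, Set, Union


def resolve_version(version: Union[int, str],
                    snapshot_ids: Set[int],
                    refs: Dict[str, int]) -> int:
    """Map a version value to a snapshot ID (hypothetical helper).

    An integer is treated as a snapshot ID; a string is treated as the
    name of a branch or tag and looked up in `refs`.
    """
    if isinstance(version, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("version must be an int or a str")
    if isinstance(version, int):
        if version not in snapshot_ids:
            raise KeyError(f"unknown snapshot ID: {version}")
        return version
    if isinstance(version, str):
        if version not in refs:
            raise KeyError(f"unknown branch or tag: {version!r}")
        return refs[version]
    raise TypeError("version must be an int or a str")


# Illustrative data only: three snapshots and two named references.
known_snapshots = {1, 2, 3}
known_refs = {"main": 3, "before-load": 2}
by_id = resolve_version(2, known_snapshots, known_refs)
by_name = resolve_version("before-load", known_snapshots, known_refs)
```

Both calls resolve to the same snapshot here, which is the behavior a reader would expect when a tag is simply a name for an existing snapshot.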

Possible Implementation

List of issues to support full WAP:

Example Screenshots (if appropriate):

Context

This functionality is defined by the Iceberg spec but is not yet supported by the Presto Iceberg connector. Implementing it will provide some of the capabilities of warehousing tools that traditionally relied on long-open transactions, e.g. BEGIN TRANSACTION... TEST ... ROLLBACK. While this doesn't provide the exact same experience, it does accomplish many of the same goals. Additionally, it is important for Presto's Iceberg implementation to be complete, so that it can take on workloads from other engines. Finally, some users of the Iceberg connector may need this functionality to meet regulatory requirements for an audited history of data.
