[Catalog] Move GC functionality into Nessie Catalog #8733

Open
snazy opened this issue Jun 5, 2024 · 2 comments
Labels
catalog Nessie Catalog / Iceberg REST GC
Milestone

Comments

@snazy
Member

snazy commented Jun 5, 2024

Having to configure all the Iceberg and potentially Hadoop configuration options for Nessie GC is not particularly convenient. Nessie Catalog has all the object storage configurations and has access to the credentials.

Nessie GC is not especially memory-hungry; it is rather "just" a time-consuming process that requires a lot of object storage I/O.

Moving Nessie GC into Nessie Catalog feels like a natural follow-up, which eliminates a lot of configuration headaches.

It needs to be explored whether this change is a feasible option in multi-tenant scenarios.

@snazy snazy added GC catalog Nessie Catalog / Iceberg REST labels Jun 5, 2024
@snazy snazy added this to the 1.0.0 milestone Jun 5, 2024
@adutra
Contributor

adutra commented Jul 15, 2024

For the record, I've been playing with a different approach using the Kubernetes Operator for Nessie: a new CRD called NessieGc that is reconciled into a CronJob (if recurring) or a Job (if one-shot).

Creating a NessieGc CRD manually creates a standalone GC job, either recurring or one-shot.

But more importantly, the main Nessie CRD has two new fields: gc.enabled and gc.schedule. If enabled, GC is then automatically started following the cron schedule, using the properties already defined in the Nessie CRD to configure the GC invocation. In this scenario, a NessieGc CRD is generated by the reconciler, and is a dependent resource whose lifecycle is tied to the parent Nessie CRD lifecycle.
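
To illustrate the reconciliation flow described above, here is a rough Python sketch. It is not the operator's actual code. Only the names NessieGc, gc.enabled and gc.schedule come from this comment; the dictionary layout, the `ownerReferences` shape, the `-gc` name suffix, and the helpers `derive_gc_resource` and `workload_kind` are assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def derive_gc_resource(nessie: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Derive a hypothetical NessieGc resource from a parent Nessie resource.

    Returns None when gc.enabled is false or absent.
    """
    gc = nessie.get("spec", {}).get("gc", {})
    if not gc.get("enabled", False):
        return None
    parent_name = nessie["metadata"]["name"]
    return {
        "kind": "NessieGc",
        "metadata": {
            # Assumed naming scheme; the real operator may differ.
            "name": f"{parent_name}-gc",
            # Dependent resource: lifecycle tied to the parent Nessie resource.
            "ownerReferences": [{"kind": "Nessie", "name": parent_name}],
        },
        "spec": {"schedule": gc.get("schedule")},
    }


def workload_kind(nessie_gc: Dict[str, Any]) -> str:
    """Recurring GC (a schedule is set) maps to a CronJob, one-shot GC to a Job."""
    return "CronJob" if nessie_gc["spec"].get("schedule") else "Job"


if __name__ == "__main__":
    parent = {
        "kind": "Nessie",
        "metadata": {"name": "my-nessie"},
        "spec": {"gc": {"enabled": True, "schedule": "0 3 * * *"}},
    }
    generated = derive_gc_resource(parent)
    print(generated["metadata"]["name"], workload_kind(generated))
```

A manually created NessieGc resource without a schedule would go through `workload_kind` the same way and come out as a one-shot Job.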

@nqvuong1998

Hi @snazy,
After moving GC into the Nessie Catalog, we should support SQL syntax for GC. For example: VACUUM
