Add BigLake REST Catalog doc #1443
|
|
||
| === Topic names | ||
|
|
||
| BigLake does not support Iceberg table names that contain dots (`.`). When creating Iceberg topics in Redpanda that you plan to access through BigLake, ensure that the topic names do not include dots. |
Direct them to the config property where they can configure a dot replacement? We can also note here that `~` (used in the default DLQ naming pattern `<topic>~dlq`) is also not supported, so it should be configured as well.
We should also note that these configs must be chosen carefully to avoid table name collisions. For example, given topics named `foo-bar` and `foo.bar`, choosing `-` as the dot replacement causes a collision.
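The collision described above can be sketched in a few lines of Python. This is only an illustration: `table_name` and `find_collisions` are hypothetical helpers, and the mapping assumes that Redpanda derives the table name by replacing every dot in the topic name with the configured string, which this thread implies but does not spell out.

```python
from collections import defaultdict


def table_name(topic: str, dot_replacement: str) -> str:
    """Assumed mapping: every dot in the topic name becomes the replacement."""
    return topic.replace(".", dot_replacement)


def find_collisions(topics, dot_replacement):
    """Return {table_name: [topics]} for table names claimed by more than one topic."""
    by_table = defaultdict(list)
    for topic in topics:
        by_table[table_name(topic, dot_replacement)].append(topic)
    return {name: names for name, names in by_table.items() if len(names) > 1}


if __name__ == "__main__":
    # "foo.bar" becomes "foo-bar", which is already taken by the topic "foo-bar".
    print(find_collisions(["foo-bar", "foo.bar", "orders"], "-"))
    # Prints: {'foo-bar': ['foo-bar', 'foo.bar']}
```

Running the same check with `_` as the replacement reports no collision for these three topics, which is the kind of check worth doing before picking a value.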
@nvartolomei Added the property (we have not updated the cluster config reference yet, so I can't link to the reference page for now, but I can add it later). I also kept "Ensure that topic names don't include dots" as a bullet point, but I can remove it if it doesn't read well.
nvartolomei
left a comment
looks good and mergeable
2 nits that can be fixed later
160014c to
72f7fa4
Compare
|
|
||
| For general information about Iceberg catalog integrations in Redpanda, see xref:manage:iceberg/use-iceberg-catalogs.adoc[]. | ||
|
|
||
| NOTE: BigLake support for the Iceberg REST Catalog API is currently in preview. See the https://cloud.google.com/biglake[BigLake product page] for the latest updates. |
| NOTE: BigLake support for the Iceberg REST Catalog API is currently in preview. See the https://cloud.google.com/biglake[BigLake product page] for the latest updates. | |
| NOTE: BigLake support for the Iceberg REST Catalog API is currently in preview. See the https://cloud.google.com/biglake[BigLake product page^] for the latest updates. |
Suggest rephrasing this note, since you or someone else will need to be responsible for checking when it's out of preview. Something like: "Check the BigLake product page for the latest availability or support."
Co-authored-by: Michele Cyran <michele@redpanda.com>
micheleRP
left a comment
LGTM
mattschumpert
left a comment
Ideally we should just delete all the 'create service account/bucket setup redpanda' sections.
| - Use the `iceberg_topic_name_dot_replacement` cluster property to set a replacement string for dots in topic names. Ensure that the replacement value does not cause table name collisions. For example, `current.orders` and `current_orders` would both map to the same table name if you set the replacement to an underscore (`_`). | ||
| - Ensure that the new topic names do not include dots. | ||
|
|
||
| You must also set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). See <<configure-topic-for-iceberg>> for the list of cluster properties to set when enabling the BigLake REST catalog integration. |
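As a rough illustration of the constraint in the line above, the following Python sketch rejects any resulting DLQ table name that still contains a dot or a tilde. The helper names are hypothetical, and the composition (dot-replaced topic name followed by the suffix) is an assumption based on the default `<topic>~dlq` pattern mentioned in this thread, not a statement of how Redpanda builds the name internally.

```python
UNSUPPORTED = (".", "~")  # characters this thread says BigLake does not accept


def unsupported_chars(value: str) -> list:
    """Return the unsupported characters that appear in value."""
    return [ch for ch in UNSUPPORTED if ch in value]


def dlq_table_name(topic: str, dot_replacement: str, dlq_suffix: str) -> str:
    """Assumed composition: dot-replaced topic name followed by the DLQ suffix."""
    name = topic.replace(".", dot_replacement) + dlq_suffix
    bad = unsupported_chars(name)
    if bad:
        raise ValueError(f"{name!r} contains unsupported characters: {bad}")
    return name


if __name__ == "__main__":
    # "_" and "_dlq" are example values only, not recommended defaults.
    print(dlq_table_name("current.orders", "_", "_dlq"))
```

With the default-style suffix `~dlq`, the same call raises `ValueError`, which mirrors why the suffix has to be changed for BigLake.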
FYI @kbatuigas this property is undocumented in the cluster config reference (when I checked yesterday)
@mattschumpert for the new Iceberg properties, we'll get them into the reference once we run the import -- I believe we have to add a fix so it works with this beta.
|
|
||
| BigLake does not support Iceberg table names that contain dots (`.`). When creating Iceberg topics in Redpanda that you plan to access through BigLake, either: | ||
|
|
||
| - Use the `iceberg_topic_name_dot_replacement` cluster property to set a replacement string for dots in topic names. Ensure that the replacement value does not cause table name collisions. For example, `current.orders` and `current_orders` would both map to the same table name if you set the replacement to an underscore (`_`). |
@kbatuigas let's also make sure this is documented
| ---- | ||
| SELECT | ||
| * | ||
| FROM `<bucket-name>>redpanda`.transactions |
@kbatuigas is this the right syntax? is it not a '.' before redpanda?
@mattschumpert Yes. This works, but <bucket-name>.redpanda.transactions does not. I haven't found a good reference doc for this. Perhaps @nvartolomei knows.
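To make the shape of the working reference explicit, here is a small Python sketch that builds the same string as the diff line under discussion. It only reproduces the form reported to work in this thread (bucket name, then `>`, then the namespace inside backticks, followed by `.` and the table name). The function names and the `my-bucket` value are hypothetical, and nothing here is taken from BigQuery reference documentation.

```python
def biglake_table_ref(bucket: str, namespace: str, table: str) -> str:
    """Build a reference of the form `<bucket>>namespace`.table, as in the diff above."""
    return f"`{bucket}>{namespace}`.{table}"


def select_all(bucket: str, table: str, namespace: str = "redpanda") -> str:
    """Return a SELECT * statement over the reference built above."""
    return f"SELECT *\nFROM {biglake_table_ref(bucket, namespace, table)}"


if __name__ == "__main__":
    # "my-bucket" is a placeholder; substitute the real bucket name.
    print(select_all("my-bucket", "transactions"))
```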
| rpk topic alter-config transactions --set redpanda.iceberg.mode=value_schema_latest:subject=transactions | ||
| ---- | ||
|
|
||
| == Query Iceberg tables in BigQuery |
@kbatuigas is there absolutely no configuration required on the BigLake/BQ side? Do we not have to grant BigLake itself access to the cluster's bucket with some kind of policy or role? This is a bit surprising. There must be some setup, no? I thought that, at least on the BigQuery side, you must create a 'Connection' in BigQuery to BigLake itself (previously, with object storage mode, it was a connection to the cloud storage location). But maybe not. If there are zero steps on the GCP side, that's impressive.
Also, here the hierarchy seems to be projectName.warehouseName.namespace.table: https://docs.cloud.google.com/biglake/docs/blms-rest-catalog#query_a_table
@mattschumpert updated to include a catalog creation step in BigLake.
@mattschumpert I updated this so that Set Up Google Cloud resources still includes the service account creation step, but added "If you don’t already have a Google Cloud service account to use." I took out the bucket creation step and placed it in the install and deploy Redpanda section, which I have also marked as optional: http://localhost:5002/25.3/manage/iceberg/iceberg-topics-gcp-biglake/#optional-deploy-redpanda-quickstart-on-gcp The idea is that they can try the quickstart if they want to set this up and test quickly on a fresh deploy of Redpanda on GCP. If they already have a running cluster, they can skip the quickstart part. Some feedback that I got from CS is that the Redpanda install and deploy steps could go into a lab instead. If we're able to create a lab for this, then we can take out the optional steps and include a link to the lab.
Co-authored-by: Michele Cyran <michele@redpanda.com>
Description
Enable BigLake REST Catalog integration and query Iceberg topics in BigQuery. This guide describes deploying Redpanda using Docker on GCP Linux VMs and integrating a "prefilled" Iceberg topic in BigLake.
Resolves https://redpandadata.atlassian.net/browse/
Review deadline:
Page previews
Use Iceberg Topics using GCP BigLake
Checks