This repository has been archived by the owner on Nov 19, 2023. It is now read-only.

Alternative Backends | Support open table projects like Apache Iceberg #25

Open
ramkumarkb opened this issue Dec 31, 2022 · 6 comments

Comments

@ramkumarkb

Hi,

First of all, thank you for the great project !

I was wondering whether, under "Alternative Backends", integrations with open table formats like Apache Iceberg could be considered or added to the roadmap?

@aljazerzen
Member

I've checked out the project, but I don't understand how we would integrate with it.

As I understand, the project defines a data format, not a query language.

@ramkumarkb
Author

@aljazerzen Yes, indeed. Iceberg defines the table data format and, for now, recommends using engines like Apache Spark or Apache Flink to read and write data, as described in their Engine Support document.

So one thought here would be that when PRQL integrates with dataframes (as mentioned in the PRQL roadmap), one of the potential candidates would be Spark DataFrames.

@aljazerzen
Member

I see.

So in terms of code, Iceberg has connectors that allow different engines to access tables from other engines/storage locations.

It doesn't specify anything about the query language and leaves that to your query engine. So if you are using any engine that accepts SQL, you should be able to use PRQL with Iceberg :D
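
To illustrate why this works (the engine only ever sees SQL), here is a minimal sketch under loud assumptions: the PRQL query is kept as a plain string and not compiled, the SQL next to it is a hand-written equivalent (not actual compiler output), and an in-memory SQLite database with hypothetical sample rows stands in for an Iceberg-capable engine such as Spark. With a real setup, you would compile the PRQL to SQL and hand that SQL to your engine instead.

```python
import sqlite3

# Illustrative PRQL query, kept as a string only; it is NOT compiled here.
PRQL_QUERY = """
from orders
filter amount > 100
sort amount
select amount
"""

# Hand-written SQL equivalent of the PRQL above (an assumption: a PRQL
# compiler would emit comparable SQL, but its exact output may differ).
EQUIVALENT_SQL = """
SELECT amount
FROM orders
WHERE amount > 100
ORDER BY amount
"""


def run_sql(conn: sqlite3.Connection, sql: str) -> list:
    """Execute SQL on whatever engine `conn` points at and return all rows."""
    return conn.execute(sql).fetchall()


def main() -> list:
    # In-memory SQLite stands in for an Iceberg-capable SQL engine.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 50), (2, 150), (3, 300)],  # hypothetical sample rows
    )
    rows = run_sql(conn, EQUIVALENT_SQL)
    conn.close()
    return rows


if __name__ == "__main__":
    print(main())  # [(150,), (300,)]
```

The table format (Iceberg or otherwise) only matters to the engine behind `conn`; the query side stays plain SQL.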

But I hear you, a guide on how to do this would be nice to have :D

Do you have a specific query engine in mind or are you asking just in general?

@max-sixty
Member

My sense is that we've tightened our focus in this repo since that version of the Roadmap, and so will focus on the language here (updated roadmap is PRQL/prql#1374), and leave the execution to other tools, possibly https://github.com/PRQL/prql-query

@snth shall we transfer this issue there?

@snth
Member

snth commented Jan 2, 2023

Sure, I am happy with that.

I am quite interested in Apache Iceberg myself and motivated to support it in pq.

@max-sixty max-sixty transferred this issue from PRQL/prql Jan 2, 2023
@snth
Member

snth commented Jan 23, 2023

Hi @ramkumarkb,

The next integration for prql-query will probably be Polars. I would like to support Apache Iceberg, and initially that will probably come through the DuckDB backend (whose team I believe is working on this) and possibly also DataFusion.
