Add deltalake backend #3865

Open
ion-elgreco opened this issue Dec 11, 2023 · 8 comments
Labels
kind/feature New feature or request

Comments

@ion-elgreco

Is your feature request related to a problem? Please describe.
I am considering using Feast, but the main backend we use, deltalake, is not supported. Deltalake is the only data lake implementation that has read and write support without a JVM. This makes it fairly easy to build large data lakes with only Python, but without a feature store wrapper there is no easy way to make features easily accessible.

Describe the solution you'd like
Add deltalake as an officially supported back-end.

Describe alternatives you've considered
There aren't really any.

@ion-elgreco ion-elgreco added the kind/feature New feature or request label Dec 11, 2023
@sudohainguyen
Collaborator

As I understand it, you want to query a feature table stored in Delta format. Spark and Trino can help with that;
Feast supports both of them.

@ion-elgreco
Author

No, I would like to do this without a JVM application. The delta-rs Python bindings (deltalake) can be used to achieve this: https://github.com/delta-io/delta-rs
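
For readers unfamiliar with the bindings, here is a minimal sketch of a JVM-free write and read. It assumes the `deltalake` and `pandas` packages are installed; the function name, column names, and values are made up for illustration.

```python
import os
import tempfile


def write_then_read_delta(table_path):
    """Write a tiny Delta table and read it back, using only Python libraries."""
    # Imports are kept local so the sketch can be loaded without the packages.
    import pandas as pd
    from deltalake import DeltaTable, write_deltalake

    # Illustrative data only; the column names are assumptions.
    df = pd.DataFrame({"driver_id": [1, 2], "trips_today": [5, 9]})

    # Creates the table at table_path (or replaces its contents if it exists).
    write_deltalake(table_path, df, mode="overwrite")

    # Load the current version of the table into a pandas DataFrame.
    return DeltaTable(table_path).to_pandas()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_then_read_delta(os.path.join(tmp, "drivers")))
```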

@sudohainguyen
Copy link
Collaborator

Cool. We need some changes to extend FileSource to read Delta tables. Do you mind contributing?

@ion-elgreco
Author

Sure, if you can give me some pointers : )

@tokoko
Collaborator

tokoko commented Dec 11, 2023

@ion-elgreco Let me try to give you a quick rundown of how the integration might look. First of all, the concept closest to a backend in Feast is an OfflineStore, but offline store implementations don't just specify the sources and how they should be read; they also implement additional logic on top of that (a point-in-time join between the entity dataframe and the feature tables). That's why it's unlikely that we can have a deltalake offline store implementation, as there's no way to specify data transformations with deltalake. The closest thing to what you're looking for is probably a Polars implementation (it's using delta-rs if I'm not mistaken, right?) or something like DuckDB that can be extended to use delta-rs for working with Delta tables (I already have a draft PR that adds DuckDB minus Delta: #3822).
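
To make the point-in-time join concrete, here is a simplified, pure-Python sketch. It is not Feast's implementation; the function name, the `driver_id` and `trips_today` columns, and the sample rows are all assumptions for illustration. For each entity row it picks the most recent feature value observed at or before that row's timestamp.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows, key="driver_id", feature="trips_today"):
    """Attach to each entity row the latest feature value at or before its timestamp."""
    # Group feature rows by entity key, ordered by event timestamp.
    history_by_key = {}
    for row in sorted(feature_rows, key=lambda r: r["event_timestamp"]):
        history_by_key.setdefault(row[key], []).append(row)

    joined = []
    for entity in entity_rows:
        history = history_by_key.get(entity[key], [])
        timestamps = [r["event_timestamp"] for r in history]
        # Index of the first feature row strictly after the entity timestamp.
        idx = bisect_right(timestamps, entity["event_timestamp"])
        value = history[idx - 1][feature] if idx > 0 else None
        joined.append({**entity, feature: value})
    return joined


features = [
    {"driver_id": 1, "event_timestamp": datetime(2023, 12, 1), "trips_today": 5},
    {"driver_id": 1, "event_timestamp": datetime(2023, 12, 10), "trips_today": 9},
]
entities = [
    {"driver_id": 1, "event_timestamp": datetime(2023, 12, 5)},
    {"driver_id": 1, "event_timestamp": datetime(2023, 12, 11)},
    {"driver_id": 2, "event_timestamp": datetime(2023, 12, 11)},
]
result = point_in_time_join(entities, features)
```

A storage library that only reads and writes tables has no way to express this kind of join by itself, which is why a query engine is needed on top.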

Feast has another concept called DataSource. This is how you specify the sources that offline stores will have to read later on.
The implementation you might be interested in is FileSource, as @sudohainguyen pointed out. It allows users to specify a file format, but currently only the Parquet format is supported. So the first logical step should be to extend FileSource to allow users to specify Delta as a file format. Once we have that, we can teach the various offline store implementations (JVM-based or otherwise) how to read them.
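
As a rough sketch of that idea, an offline store could choose a reader based on the declared file format. The `ParquetFormat` and `DeltaFormat` classes below are hypothetical stand-ins defined only for this example, not Feast's own classes, and `reader_for` is an invented helper. The reader functions assume `pyarrow` and `deltalake` are installed when they are actually called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParquetFormat:
    """Hypothetical marker for plain Parquet files."""


@dataclass(frozen=True)
class DeltaFormat:
    """Hypothetical marker for a Delta table directory."""


def read_parquet(path):
    # Imported lazily so the dispatch logic works without pyarrow installed.
    import pyarrow.parquet as pq

    return pq.read_table(path)


def read_delta(path):
    # Imported lazily so the dispatch logic works without deltalake installed.
    from deltalake import DeltaTable

    return DeltaTable(path).to_pyarrow_table()


# Maps each supported format type to the function that reads it.
_READERS = {ParquetFormat: read_parquet, DeltaFormat: read_delta}


def reader_for(file_format):
    """Return the reader function for a format, or raise for unknown formats."""
    try:
        return _READERS[type(file_format)]
    except KeyError:
        raise ValueError(f"Unsupported file format: {file_format!r}") from None
```

With this shape, supporting a new format means adding one marker class and one reader, and each offline store can decide separately which formats it knows how to read.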

@ion-elgreco
Author

ion-elgreco commented Dec 11, 2023

@tokoko gotcha, that helps! Since I mainly use Polars, I will look into adding that as an offline store and then add delta as an additional file source, using deltalake as a dependency.

Yup, Polars uses deltalake to read and write.

@tokoko
Collaborator

tokoko commented Dec 11, 2023

Glad to be able to help. One more pointer that may help you out, but note that this is my preferred direction that I'm trying to push (without much luck so far :) ). Despite your preference for Polars, you should probably still check out the DuckDB PR I linked above. The actual offline store implementation is written using ibis rather than DuckDB directly. As ibis has a fairly good Polars backend, you could easily reuse the same ibis implementation. In that case, the Polars implementation might be just a single-line code change (probably not, but something close to that).

@sudohainguyen
Collaborator

Great explanation, @tokoko!
Looking forward to seeing the changes.

3 participants