feat(python): Support for reading delta lake tables #5761

Merged: 33 commits, Dec 11, 2022

Contributor

@chitralverma chitralverma commented Dec 9, 2022

Allows users to read and scan Delta Lake tables using the eager and lazy APIs of polars.

Links to issue #2858

Goals

  • Support different storage backends, such as the local filesystem and S3.
  • Support all read options that deltalake allows.
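
As a rough illustration of the first goal, the sketch below shows how a caller might pick a storage backend. It assumes `read_delta` accepts a `storage_options` dictionary that is passed through to deltalake; that parameter name and the `AWS_REGION` key are assumptions for illustration and are not confirmed by this description. `build_read_kwargs` and `read_table` are hypothetical helpers, not part of the PR.

```python
from typing import Any, Dict, Optional


def build_read_kwargs(
    version: Optional[int] = None,
    storage_options: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Collect only the options the caller actually set (hypothetical helper)."""
    kwargs: Dict[str, Any] = {}
    if version is not None:
        kwargs["version"] = version  # `version` appears in the PR example
    if storage_options:
        # Assumed parameter name; check the merged signature before relying on it.
        kwargs["storage_options"] = dict(storage_options)
    return kwargs


def read_table(table_uri: str, **options: Any):
    """Eagerly read a Delta table; polars is imported only when this is called."""
    import polars as pl

    return pl.read_delta(table_uri, **build_read_kwargs(**options))


# Local filesystem: no backend-specific options are needed.
local_kwargs = build_read_kwargs(version=1)

# S3: placeholder value; the key name is an assumption.
s3_kwargs = build_read_kwargs(storage_options={"AWS_REGION": "us-east-1"})
```

Keeping the option handling in a plain function means the backend-specific settings can be inspected without touching any storage.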

Examples

import polars as pl

table_path = "/path/to/delta-table/"

# `scan_delta` example: build a lazy query, then collect it into a DataFrame
df = pl.scan_delta(table_path).collect()
print(df)

# `read_delta` example: eagerly read version 1 of the table
df = pl.read_delta(table_path, version=1)
print(df)

Notes

  • Relies on deltalake>0.6.0
  • Relies on pyarrow
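
A minimal sketch of how the `deltalake>0.6.0` note could be checked at runtime, using only the standard library. The helper names are hypothetical, and the comparison is deliberately simplified: it ignores pre-release suffixes, so a dedicated version parser would be more robust.

```python
from importlib import metadata
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '0.6.1' into (0, 6, 1); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def newer_than(installed: str, minimum: str = "0.6.0") -> bool:
    """Strict comparison, matching the `deltalake>0.6.0` note."""
    return version_tuple(installed) > version_tuple(minimum)


def installed_deltalake_version() -> Optional[str]:
    """Return the installed deltalake version, or None if it is missing."""
    try:
        return metadata.version("deltalake")
    except metadata.PackageNotFoundError:
        return None
```

A caller could combine the two, for example by raising an `ImportError` when `installed_deltalake_version()` returns `None` or a version that `newer_than` rejects.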

Checklist

  • implementation for read_delta
  • implementation for scan_delta
  • docs
  • unit tests
  • examples

Signed-off-by: chitralverma <chitralverma@gmail.com>
chitralverma and others added 28 commits December 10, 2022 19:31
Signed-off-by: chitralverma <chitralverma@gmail.com>
Co-authored-by: Liam Brannigan <l.brannigan@analyticsengines.com>
@chitralverma changed the title from "Support for reading delta lake tables" to "feat(python): Support for reading delta lake tables" on Dec 10, 2022
@github-actions bot added the enhancement (New feature or an improvement of an existing feature) and python (Related to Python Polars) labels on Dec 10, 2022
@chitralverma chitralverma marked this pull request as ready for review December 10, 2022 18:06
Member

@ritchie46 ritchie46 left a comment

This looks great, @chitralverma. I only have a remark about the imports; other than that, it's great. Thanks a lot!

py-polars/polars/io.py (outdated, resolved)
py-polars/polars/io.py (resolved)
@ritchie46 ritchie46 merged commit 8d12127 into pola-rs:master Dec 11, 2022
@chitralverma chitralverma deleted the read-scan-deltalake branch December 11, 2022 19:09
@chitralverma
Contributor Author

Thanks a lot @ritchie46 for accepting my first contribution to the project. 🎊

@ritchie46
Member

> Thanks a lot @ritchie46 for accepting my first contribution to the project. 🎊

And thanks for the PR!

@stinodego stinodego mentioned this pull request Dec 12, 2022
zundertj pushed a commit to zundertj/polars that referenced this pull request Jan 7, 2023
Labels: enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars)
6 participants