feat(python): Write dataframes as delta tables #7616

Merged: 19 commits merged into pola-rs:main on May 17, 2023

@chitralverma (Contributor) commented Mar 17, 2023

Allows users to write a Polars DataFrame as a Delta Lake table.

[This is currently a work in progress, as it is blocked by https://github.com/delta-io/delta-rs/pull/1044]

Checklist

  • resolve the table location the same way the scan functions do when a URI is provided
  • delta-polars type validations
  • update the minimum supported deltalake version once deltalake supports large types
  • verify behaviour on cloud stores
  • test cases
  • examples
  • build an isolated wheel for final tests
  • update with the latest master
  • verify for non-large types
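The first checklist item is about resolving the write target the same way the scan functions resolve it. A minimal standard-library sketch of that idea, assuming that any string with a URL scheme (for example `s3://...`) is passed through untouched while a bare path is expanded and made absolute. `resolve_table_uri` is a hypothetical name, not the helper Polars actually uses.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_table_uri(table_uri: str) -> str:
    """Hypothetical helper: normalise a Delta table location before writing.

    Remote URIs (s3://, az://, gs://, ...) are returned unchanged; plain
    filesystem paths are expanded (~) and resolved to an absolute path.
    """
    scheme = urlparse(table_uri).scheme
    # A one-letter "scheme" is most likely a Windows drive letter such as C:\,
    # so treat it as a local path rather than a remote URI.
    if len(scheme) > 1:
        return table_uri
    # strict=False: the table directory may not exist yet when writing.
    return str(Path(table_uri).expanduser().resolve(strict=False))
```

For example, `resolve_table_uri("s3://bucket/my_table")` returns the URI unchanged, while `resolve_table_uri("~/data/my_table")` returns an absolute local path.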

Closes #2858 and #7574

@github-actions bot added the enhancement and python labels Mar 17, 2023
@stinodego (Member)

Hey @chitralverma, just a quick check: I see you have a PR open at delta-rs for supporting LargeBinary/LargeString. Have you also thought about supporting LargeList? This is another 'blocker' I had identified.

@chitralverma (Contributor Author)

Let me add that to the existing PR as well.

@stinodego (Member)

Maybe casting unsigned -> signed integers should also be added on the delta-rs side, then?

@chitralverma (Contributor Author) commented Mar 26, 2023

> Maybe casting unsigned -> signed integers should also be added on the delta-rs side, then?

I think we will have to do this on our side, but I can check with the delta team.

@stinodego (Member) commented Mar 26, 2023

> I think we will have to do this on our side, but I can check with the delta team.

We can definitely do that on the Polars side, but if you're mapping Arrow types to Delta supported types in delta-rs anyway, it would make sense to include casting unsigned integers as well.

I'd suggest trying to cast to a signed integer of the same bit size, and raising an error when this is not possible.
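The suggested rule (reinterpret an unsigned column as the signed integer type with the same bit width, and fail when a value does not fit) can be sketched in plain Python. This is a hypothetical, value-by-value illustration with an invented function name; an actual implementation in delta-rs or Polars would operate on Arrow arrays with a checked cast.

```python
from __future__ import annotations

SUPPORTED_WIDTHS = (8, 16, 32, 64)


def cast_unsigned_to_signed(values: list[int], bits: int) -> list[int]:
    """Hypothetical sketch: cast u{bits} values to i{bits}, raising on overflow."""
    if bits not in SUPPORTED_WIDTHS:
        raise ValueError(f"unsupported bit width: {bits}")
    unsigned_max = 2**bits - 1
    signed_max = 2 ** (bits - 1) - 1
    for value in values:
        if not 0 <= value <= unsigned_max:
            raise ValueError(f"{value} is not a valid u{bits} value")
        if value > signed_max:
            # The value only fits in the unsigned type, so the cast would be lossy.
            raise OverflowError(
                f"u{bits} value {value} does not fit in i{bits} (max {signed_max})"
            )
    # Every value is within the signed range, so the numbers are unchanged.
    return list(values)
```

For example, `cast_unsigned_to_signed([0, 42, 127], 8)` returns `[0, 42, 127]`, while `cast_unsigned_to_signed([200], 8)` raises `OverflowError` because the i8 maximum is 127. Keeping the bit width fixed, as suggested, avoids silently changing the schema; the trade-off is that large unsigned values cause an error.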

@chitralverma (Contributor Author) commented Mar 27, 2023

> I'd suggest trying to cast to a signed integer of the same bit size, and raising an error when this is not possible.

Done! Added uint* and float16.

Now the following Arrow types remain unmapped:

    DataType::Null => {}
    DataType::Time32(_) => {}
    DataType::Time64(_) => {}
    DataType::Duration(_) => {}
    DataType::Interval(_) => {}
    DataType::Union(_, _, _) => {}
    DataType::Dictionary(_, _) => {} // Polars Categorical gets converted to this
    DataType::RunEndEncoded(_, _) => {}
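A schema-level view of the same mapping can be sketched as a lookup table plus a validation step. This is a hypothetical illustration: Arrow types are plain strings named after the arrow-rs `DataType` variants, the casts are the ones discussed in this thread (large types, unsigned integers, float16), and mapping Float16 to Float32 is an assumption rather than something stated above.

```python
from __future__ import annotations

# Casts discussed in this thread. Type names are plain strings, not real
# pyarrow / arrow-rs type objects.
DELTA_COMPATIBLE_CASTS = {
    "LargeUtf8": "Utf8",  # LargeString
    "LargeBinary": "Binary",
    "LargeList": "List",
    "UInt8": "Int8",
    "UInt16": "Int16",
    "UInt32": "Int32",
    "UInt64": "Int64",
    "Float16": "Float32",  # assumption: widen to Delta's 32-bit "float"
}

# Arrow types listed above as still having no Delta Lake mapping.
UNMAPPED_TYPES = {
    "Null", "Time32", "Time64", "Duration", "Interval",
    "Union", "Dictionary", "RunEndEncoded",
}


def to_delta_type(arrow_type: str) -> str:
    """Return the Delta-compatible type name, or raise if none exists."""
    if arrow_type in UNMAPPED_TYPES:
        raise TypeError(f"Arrow type {arrow_type!r} cannot be written to a Delta table")
    # Types that need no cast are passed through unchanged.
    return DELTA_COMPATIBLE_CASTS.get(arrow_type, arrow_type)


def to_delta_schema(schema: dict[str, str]) -> dict[str, str]:
    """Validate and convert a whole {column name: Arrow type} schema."""
    return {name: to_delta_type(dtype) for name, dtype in schema.items()}
```

For example, `to_delta_schema({"id": "UInt32", "name": "LargeUtf8", "score": "Float64"})` returns `{"id": "Int32", "name": "Utf8", "score": "Float64"}`, while a column of type `"Dictionary"` (such as a converted Polars Categorical) raises `TypeError`.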

@chitralverma (Contributor Author)

Long time!

We now have a new Rust release of delta-rs. A Python release should follow soon, so we can actually have this feature in Polars as well.

@chitralverma (Contributor Author)

Finally, the delta-rs release is in place and I can start working on this again.

@stinodego changed the title from "feat(python) Write dataframes as delta tables" to "feat(python): Write dataframes as delta tables" May 7, 2023
Signed-off-by: Chitral Verma <chitralverma@gmail.com>
@chitralverma marked this pull request as ready for review May 16, 2023 12:03
@chitralverma (Contributor Author)

@stinodego @ritchie46 finally marked this as ready for review!
Please check it out at your convenience.

@stinodego (Member)

Amazing! Reviewing this later today.

@stinodego self-requested a review May 16, 2023 12:58
@ritchie46 (Member)

I read the code. I'm not that familiar with the delta tables API, so I will leave this one to @stinodego.

@stinodego (Member) left a comment

Looks great! Just a few remarks.

Review threads on py-polars/polars/dataframe/frame.py: 7, all resolved (5 marked outdated)
@chitralverma (Contributor Author)

> Looks great! Just a few remarks.

@stinodego thanks for the review. I made the requested changes.

@stinodego (Member) left a comment

Great! Looks like all is in order.

@stinodego added the highlight label (Highlight this PR in the changelog) May 17, 2023
@stinodego merged commit 5fc3bc8 into pola-rs:main May 17, 2023
9 checks passed
@chitralverma deleted the write-delta branch May 18, 2023 12:07
ritchie46 pushed a commit that referenced this pull request May 20, 2023
Signed-off-by: Chitral Verma <chitralverma@gmail.com>
Co-authored-by: Stijn de Gooijer <stijn@degooijer.io>
alexander-beedie pushed a commit to alexander-beedie/polars that referenced this pull request May 20, 2023
@stinodego mentioned this pull request May 26, 2023
c-peters pushed a commit to c-peters/polars that referenced this pull request Jul 14, 2023
Linked issue: Are there plans to support delta reader/writer?
3 participants