Match polars.from_arrow() schema parameter description to the actual behavior #14848

Open
ilyalukibanov opened this issue Mar 5, 2024 · 0 comments
Labels
documentation Improvements or additions to documentation

Description

I've encountered an issue where from_arrow() throws a ComputeError when I pass a dictionary to the schema parameter whose keys are in a different order than the columns of the original dataframe. I would expect to be able to list column names and types in any order, because a dictionary is conceptually an unordered mapping (even though Python dictionaries do preserve insertion order). Here is an MRE:

import polars as pl
import numpy as np

N = 1_000

def create_data():
    data = pl.DataFrame({"id":pl.Series(np.random.randint(N, size=(N,)))})
    data = data.with_columns(
        ("1:" + pl.col('id').cast(pl.Utf8)).alias('idl'),
        (N*10 + pl.col('id')*1.0).alias('idl_num')
    )
    return data
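d = create_data()
# The data's column order is id, idl, idl_num; the schema below lists them in a different order.
pl.from_arrow(d.to_arrow(), schema={'id': pl.String, 'idl_num': pl.Int64, 'idl': pl.String})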

I found a similar issue, #11718, where the resolution was that the dictionary overwrites the schema of the data. However, this behavior is likely to surprise users who rely on the documentation. Could the docs state it explicitly? For example, change "If you supply a list of column names that does not match the names in the underlying data" to "If you supply a list (or a dictionary) of column names that does not match the names and the order of the columns in the underlying data".

Link

https://docs.pola.rs/py-polars/html/reference/api/polars.from_arrow.html

ilyalukibanov added the documentation label Mar 5, 2024