Exception thrown when converting a chunked Arrow Table with struct and dictionary columns to a Polars DataFrame #16040

Open
2 tasks done
reductionnist opened this issue May 3, 2024 · 0 comments
Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

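import polars as pl
import pyarrow as pa

table = pa.table({'col1': [1, 2, 3], 'col2': [{'a': 1, 'b': 2}, None, {'a': 3, 'b': 4}], 'col3': pa.array(['A', 'B', 'A'], pa.string()).dictionary_encode()})
table2 = pa.concat_tables([table.slice(0, 1), table.slice(0, 2)])  # each column now has two chunks
pl.from_arrow(table2)  # throws on Polars 0.20.10 (per this report)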

Log output

No response

Issue description

Hi,
pl.from_arrow() throws an exception if a chunked Arrow table contains both dictionary and struct columns. This appears to be caused by the logic in arrow_to_pydf, which omits the dictionary columns from the DataFrame being constructed whenever there are any struct columns. The following replacement code appears to fix the issue:

    # Needs `import itertools` if the module does not already import it.
    if len(dictionary_cols) > 0 or len(struct_cols) > 0:
        df = wrap_df(pydf)
        held_back = itertools.chain(dictionary_cols.values(), struct_cols.values())
        df = df.with_columns([F.lit(s).alias(s.name) for s in held_back])
        reset_order = True
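
To illustrate why merging the two cases into one branch matters, here is a toy model in plain Python. It is only a sketch: dicts stand in for DataFrames, and the helper names (`with_columns`, `separate_branches`, `combined_branch`, `select`) are hypothetical. The "separate branches" control flow is my assumption about how the columns get dropped, not code quoted from Polars.

```python
# Toy model only: plain dicts stand in for DataFrames. The control flow of
# `separate_branches` is an assumption about the failure mode, not Polars code.


def with_columns(frame: dict, cols: dict) -> dict:
    """Return a copy of `frame` with `cols` added."""
    return {**frame, **cols}


def separate_branches(base: dict, dictionary_cols: dict, struct_cols: dict) -> dict:
    """Handle each kind of held-back column in its own branch."""
    df = base
    if dictionary_cols:
        df = with_columns(base, dictionary_cols)
    if struct_cols:
        # Restarts from `base`, so the dictionary columns added above are lost.
        df = with_columns(base, struct_cols)
    return df


def combined_branch(base: dict, dictionary_cols: dict, struct_cols: dict) -> dict:
    """Re-attach both kinds of held-back columns in a single step."""
    df = base
    if dictionary_cols or struct_cols:
        df = with_columns(base, {**dictionary_cols, **struct_cols})
    return df


def select(frame: dict, names: list) -> dict:
    """Restore the original column order; raises KeyError if a column is missing."""
    return {name: frame[name] for name in names}


# Values mirror table2 from the reproducible example (slice(0, 1) + slice(0, 2)).
names = ["col1", "col2", "col3"]
base = {"col1": [1, 1, 2]}
struct_cols = {"col2": [{"a": 1, "b": 2}, {"a": 1, "b": 2}, None]}
dictionary_cols = {"col3": ["A", "A", "B"]}

broken = separate_branches(base, dictionary_cols, struct_cols)
fixed = combined_branch(base, dictionary_cols, struct_cols)

missing = [n for n in names if n not in broken]
ordered = list(select(fixed, names))
print(missing)
print(ordered)
```

In this model, the separate-branch version loses `col3`, so restoring the column order would fail, while the combined version keeps all three columns.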

Expected behavior

pl.from_arrow() should return a DataFrame instead of throwing.

Installed versions

--------Version info---------
Polars:               0.20.10
Index type:           UInt32
Platform:             Linux-6.8.7-100.fc38.x86_64-x86_64-with-glibc2.37
Python:               3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:53:32) [GCC 12.3.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          3.0.0
connectorx:           <not installed>
deltalake:            <not installed>
fsspec:               2024.2.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.3
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.0
pyarrow:              15.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.27
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>