PanicException: Cannot convert object to arrow when using struct with numpy arrays in an expression #5905

Closed
2 tasks done
Tastaturtaste opened this issue Dec 24, 2022 · 5 comments · Fixed by #5918
Labels: bug (Something isn't working), python (Related to Python Polars)

Comments

@Tastaturtaste

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

Using pl.struct in a selection context to select a column containing numpy arrays results in a

PanicException: cannot convert object to arrow

Using pl.col instead of pl.struct works as intended, but of course it doesn't offer the features available with a pl.struct expression, such as using apply in a selection context, as suggested in this Stack Overflow post.
Below is a minimal reproducible example showing that pl.struct works as intended with integers and pl.col works as intended with numpy arrays, but pl.struct fails with numpy arrays.

Reproducible example

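import numpy as np
import polars as pl

df = pl.DataFrame({"A": [1, 2], "B": [np.array([1, 2, 3]), np.array([4, 5, 6])]})
df.select([pl.struct(["A"])])  # works: struct over an integer column
df.select([pl.col("B")])       # works: plain column holding numpy arrays
df.select([pl.struct(["B"])])  # PanicException: cannot convert object to arrow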

Expected behavior

No PanicException on the last line and the ability to use the struct expression with numpy arrays together with the apply method to build a query in the selection context.

Installed versions

---Version info---
Polars: 0.15.8
Index type: UInt32
Platform: Windows-10-10.0.19044-SP0
Python: 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)]
---Optional dependencies---
pyarrow: <not installed>
pandas: <not installed>
numpy: 1.24.0
fsspec: <not installed>
connectorx: <not installed>
xlsx2csv: <not installed>
matplotlib: 3.6.2

@Tastaturtaste added the bug and python labels on Dec 24, 2022
@Tastaturtaste
Author

Also, merry Christmas and thanks for this amazing piece of software!

@ritchie46
Member

Arrow cannot hold arbitrary Python objects, so you have to convert the numpy arrays to Series so that they are stored as the Arrow list type.

And merry Christmas. ^^

@Tastaturtaste
Author

Ok, thanks for your fast answer.
I think it would be nice if this were made apparent somewhere in the documentation; right now it is not clear why storing and manipulating the data works with pl.col but not with pl.struct.
Also, would it be possible to special-case the basic numpy types to allow storage by pyarrow? In scientific Python, numpy is basically mandatory and ubiquitous, so I think this would greatly facilitate interoperability.

@alexander-beedie
Collaborator

alexander-beedie commented Dec 28, 2022

I think auto-inference into a properly-typed Series on init should be quite straightforward here, given that we already convert 2D numpy arrays; the only thing preventing it from being converted above is that the data is given as a list of 1D numpy arrays. I'll take a look and confirm / create a PR if so 👍

@Tastaturtaste
Author

You are awesome! Any idea how long till this lands on pypi?

3 participants