ENH: Pandas Tensor Data Type #59006

Open
1 of 3 tasks
bionicles opened this issue Jun 13, 2024 · 2 comments
Labels
Enhancement ExtensionArray Extending pandas with custom dtypes or arrays. Needs Discussion Requires discussion from core team before further action

Comments

@bionicles

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

While reviewing the Arrow docs link from @WillAyd, I spotted this:

https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor

Tensor is exactly what I was describing in Additional Context [1]. It would give pandas users a column dtype for large blocks of a single underlying type.

Feature Description

Support Arrow Tensor in Pandas

Python
https://arrow.apache.org/docs/python/generated/pyarrow.Tensor.html#pyarrow.Tensor

Rust
https://github.com/apache/arrow-rs/blob/3715d5447e468a5a4dc631ae9aafec706c57aa20/arrow/src/tensor.rs#L115

Alternative Solutions

just make everything an "object":

>>> import numpy as np
>>> import pandas as pd
>>> x = {'hello': 'world'}
>>> y = np.ones(3)
>>> df = pd.DataFrame({'X': [x], 'Y': [y]})
>>> df
                    X                Y
0  {'hello': 'world'}  [1.0, 1.0, 1.0]
>>> df.dtypes
X    object
Y    object
dtype: object

Additional Context

[1] #58455 (comment) onward

@bionicles bionicles added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 13, 2024
@rhshadrach rhshadrach added Needs Discussion Requires discussion from core team before further action ExtensionArray Extending pandas with custom dtypes or arrays. and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 13, 2024
@mroeschke
Member

mroeschke commented Jun 13, 2024

cc @jbrockmendel: if pyarrow plans to support its compute functions for pyarrow.Tensor, this may be the appropriate 2D EA block backing for ArrowExtensionArray instead of pyarrow.Table

@WillAyd
Member

WillAyd commented Jun 21, 2024

I think the nullability bitmap for the extension array only applies to the entire datum itself, not to individual records within each struct

4 participants