Array.from_numpy() not working with numpy array of strings or categories (possibly user issue) #256

@ATL2001

Hey Kyle, I was working with apply_categorical_cmap in lonboard yesterday (which is great) and couldn't get it to work with a pandas Series, whether the Series held strings or a categorical dtype. If I create a pyarrow array from the Series and lonboard passes that pyarrow array to arro3, it works. But when I pass a pandas Series or a NumPy array of strings to arro3, it reports an unsupported data type. I'm not sure whether I'm doing something wrong or something isn't working as expected.

Any ideas? Thanks!

import sys

from arro3 import core
from arro3.core import Array
import numpy as np
import pyarrow as pa
import pandas as pd

print("python: ", sys.version)
print("arro3:  ", core.__version__)
print("numpy:  ", np.__version__)
print("pyarrow:", pa.__version__)
print("pandas: ", pd.__version__)

## make a pandas series of strings
series = pd.Series(["a", "b", "c"])

## convert pandas series to pyarrow array, then to arro3 array works
Array.from_arrow(pa.Array.from_pandas(series)) # works!

## make arro3 array from pandas series, fails
try:
    Array.from_numpy(series)
except Exception as ex:
    print(f"make arro3 array from pandas series exception: {ex}")

## convert pandas series to numpy (dtype object), make arro3 array from numpy, fails
np_array = pd.Series(["a", "b", "c"]).to_numpy()
try:
    Array.from_numpy(np_array)
except Exception as ex:
    print(f"make arro3 array from numpy (dtype object) exception: {ex}")

## convert pandas series to numpy (dtype U1), make arro3 array from numpy, fails
np_array = pd.Series(["a", "b", "c"]).to_numpy(dtype="U1")
try:
    Array.from_numpy(np_array)
except Exception as ex:
    print(f"make arro3 array from numpy (dtype U1) exception: {ex}")

## convert pandas series of categorical to numpy, make arro3 array from numpy, fails
np_array = pd.Series(["a", "b", "c"]).astype("category").to_numpy()
try:
    Array.from_numpy(np_array)
except Exception as ex:
    print(f"make arro3 array from numpy categorical exception: {ex}")

Output:

python:  3.11.9 (main, Aug 14 2024, 04:18:20) [MSC v.1929 64 bit (AMD64)]
arro3:   0.4.2
numpy:   2.1.2
pyarrow: 18.0.0
pandas:  2.2.3
make arro3 array from pandas series exception: Unsupported data type object
make arro3 array from numpy (dtype object) exception: Unsupported data type object
make arro3 array from numpy (dtype U1) exception: Unsupported data type <U1
make arro3 array from numpy categorical exception: Unsupported data type object
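
The dtype names in these messages can be checked without arro3. The sketch below only inspects what NumPy dtype each conversion from the report produces; it suggests that the categorical case reports "object" because to_numpy() returns the category values as Python objects. The conversions and kinds names are illustrative and not part of the original report.

```python
import numpy as np
import pandas as pd

series = pd.Series(["a", "b", "c"])

# The three NumPy conversions used in the report above.
conversions = {
    "to_numpy()": series.to_numpy(),
    'to_numpy(dtype="U1")': series.to_numpy(dtype="U1"),
    'astype("category").to_numpy()': series.astype("category").to_numpy(),
}

for label, arr in conversions.items():
    # dtype.kind is "O" for Python objects and "U" for fixed-width unicode.
    print(f"{label:32} dtype={arr.dtype} kind={arr.dtype.kind}")

# Collected for inspection: one kind code per conversion.
kinds = {label: arr.dtype.kind for label, arr in conversions.items()}
```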
