Allow conversion to/from pandas without requiring PyArrow #15845

Closed
@adrinjalali

Description

Encountered this while reviewing a PR on the scikit-learn side; xref: scikit-learn/scikit-learn#28804 (comment)

Basically, if the environment doesn't have pyarrow, conversion from pandas seems to require pyarrow even though the pandas.DataFrame isn't using pyarrow.

Minimal reproducible example:

python -m venv /tmp/.venv
source /tmp/.venv/bin/activate
pip install pandas polars

python
>>> import pandas as pd
>>> import polars as pl
>>> pl.DataFrame(pd.DataFrame(['a', 'b']))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/.venv/lib/python3.11/site-packages/polars/dataframe/frame.py", line 406, in __init__
    self._df = pandas_to_pydf(
               ^^^^^^^^^^^^^^^
  File "/tmp/.venv/lib/python3.11/site-packages/polars/_utils/construction/dataframe.py", line 1032, in pandas_to_pydf
    arrow_dict[str(col)] = plc.pandas_series_to_arrow(
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/.venv/lib/python3.11/site-packages/polars/_utils/construction/other.py", line 92, in pandas_series_to_arrow
    return pa.array(values, pa.large_utf8(), from_pandas=nan_to_null)
           ^^^^^^^^
  File "/tmp/.venv/lib/python3.11/site-packages/polars/dependencies.py", line 97, in __getattr__
    raise ModuleNotFoundError(msg) from None
ModuleNotFoundError: pa.array requires 'pyarrow' module to be installed
>>> pd.DataFrame(pl.DataFrame(['a', 'b']))
   0
0  a
1  b

Note that the reverse conversion in the example above (from polars to pandas) works fine.
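To tell which situation a given environment is in before attempting the conversion, a standard-library check along these lines may help. This is only a sketch; `has_module` is a hypothetical helper name, not part of polars or pandas.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    # find_spec returns None when no importable module with that name exists.
    return find_spec(name) is not None


# In the venv from the reproducer (only pandas and polars installed),
# the pyarrow line is expected to report False.
for module in ("pandas", "polars", "pyarrow"):
    print(f"{module} available: {has_module(module)}")
```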

The PR on the scikit-learn side introduced this line:

co2_data = pl.DataFrame({col: co2.frame[col].to_numpy() for col in co2.frame.columns})

This seems very odd: the data has to be moved to numpy first and only then to polars. Also, if the line above is correct, polars could do almost the same thing internally and not require pyarrow for the conversion.
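For illustration, the workaround can be wrapped in a small helper. This is a minimal sketch, not what polars does internally: `columns_as_numpy` and `pandas_to_polars_via_numpy` are hypothetical names, and the `str(col)` conversion of column labels is an assumption mirroring the `arrow_dict[str(col)]` line in the traceback. Whether every pandas dtype survives this route unchanged is not established here.

```python
def columns_as_numpy(frame):
    """Map each column label (as a string) to that column's NumPy array.

    `frame` is expected to behave like a pandas.DataFrame: it exposes
    `.columns`, and `frame[col].to_numpy()` returns the column's values.
    """
    return {str(col): frame[col].to_numpy() for col in frame.columns}


def pandas_to_polars_via_numpy(frame):
    """Build a polars.DataFrame from a pandas.DataFrame by going through NumPy."""
    import polars as pl  # imported lazily so the helper above has no hard dependency

    return pl.DataFrame(columns_as_numpy(frame))
```

With pandas and polars installed, `pandas_to_polars_via_numpy(pd.DataFrame(['a', 'b']))` would be the equivalent of the failing call in the reproducer. The index of the pandas frame is dropped, as it is in the scikit-learn line.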

Metadata

Labels

A-interop-pandas (Area: interoperability with pandas), enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars)
