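Describe the bug
The cudf parquet reader does not reconstruct a MultiIndex when the pyarrow parquet reader is used as the engine (engine="pyarrow"). Only the first index level is kept as the index, and the second level comes back as a regular __index_level_1__ column.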
Steps/Code to reproduce bug
    In [1]: import pandas as pd

    In [2]: import cudf

    In [3]: expected = pd.DataFrame(
       ...:     {"A": [1, 2, 3]},
       ...:     index=pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)]),
       ...: )

    In [4]: expected
    Out[4]:
         A
    a 1  1
      2  2
    b 1  3

    In [5]: expected.to_parquet("a.parquet", engine="pyarrow")

    In [8]: pd.read_parquet("a.parquet", engine="pyarrow")
    Out[8]:
         A
    a 1  1
      2  2
    b 1  3

    In [9]: cudf.read_parquet("a.parquet", engine="pyarrow")
    /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/io/parquet.py:544: UserWarning: Using CPU via PyArrow to read Parquet dataset. This option is both inefficient and unstable!
      warnings.warn(
    Out[9]:
                       A  __index_level_1__
    __index_level_0__
    a                  1                  1
    a                  2                  2
    b                  3                  1
Expected behavior
The result should match pandas: both index levels are restored as a MultiIndex, and A is the only column.
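Until the reader handles this, the expected shape can be recovered after the fact. The following is a minimal sketch, not cudf's implementation: `restore_index_levels` is a hypothetical helper name, it operates on a pandas DataFrame (for example the result of calling `.to_pandas()` on the frame cudf returned), and it assumes the index levels were unnamed, so they were written under the `__index_level_N__` placeholder names that pandas uses.

```python
import re

import pandas as pd

# Placeholder names pandas gives unnamed index levels when writing Parquet.
_INDEX_LEVEL = re.compile(r"^__index_level_(\d+)__$")


def _is_level(name):
    """Return True if ``name`` looks like a serialized index level."""
    return isinstance(name, str) and _INDEX_LEVEL.match(name) is not None


def restore_index_levels(df):
    """Hypothetical helper: rebuild the index from all ``__index_level_N__`` fields."""
    # If a placeholder already sits in the index, move it back to the columns
    # so every level can be set at once, in the right order.
    if any(_is_level(n) for n in df.index.names):
        flat = df.reset_index()
    else:
        flat = df
    # Collect the serialized levels, ordered by their level number.
    levels = sorted(
        (c for c in flat.columns if _is_level(c)),
        key=lambda c: int(_INDEX_LEVEL.match(c).group(1)),
    )
    if not levels:
        return df
    restored = flat.set_index(levels)
    # The placeholders stand for unnamed levels, so clear the names.
    restored.index.names = [None] * len(levels)
    return restored


# Frame shaped like Out[9] above: level 0 is the index, level 1 is a column.
broken = pd.DataFrame(
    {"A": [1, 2, 3], "__index_level_1__": [1, 2, 1]},
    index=pd.Index(["a", "a", "b"], name="__index_level_0__"),
)
fixed = restore_index_levels(broken)
```

Here `fixed` has a two-level index and a single column `A`, which is the layout pandas returns in `Out[8]`. This only post-processes the result and does not address the reader itself.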
Environment overview (please complete the following information)
galipremsagar