[BUG] Unable to read a dataframe with multiIndex properly #14352

Closed
galipremsagar opened this issue Nov 1, 2023 · 0 comments
Labels
bug Something isn't working Python Affects Python cuDF API.

Comments

@galipremsagar
Contributor

Describe the bug
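The cuDF Parquet reader does not reconstruct a MultiIndex correctly when the PyArrow Parquet reader is used as the engine (engine="pyarrow"). Only the first index level is restored as the index; the second level comes back as a regular column named __index_level_1__.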

Steps/Code to reproduce bug

In [1]: import pandas as pd

In [2]: import cudf

In [3]: expected = pd.DataFrame(
   ...:         {"A": [1, 2, 3]},
   ...:         index=pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)]),
   ...:     )

In [4]: expected
Out[4]: 
     A
a 1  1
  2  2
b 1  3

In [5]: expected.to_parquet("a.parquet", engine="pyarrow")


In [8]: pd.read_parquet("a.parquet", engine="pyarrow")
Out[8]: 
     A
a 1  1
  2  2
b 1  3

In [9]: cudf.read_parquet("a.parquet", engine="pyarrow")
/nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/io/parquet.py:544: UserWarning: Using CPU via PyArrow to read Parquet dataset. This option is both inefficient and unstable!
  warnings.warn(
Out[9]: 
                   A  __index_level_1__
__index_level_0__                      
a                  1                  1
a                  2                  2
b                  3                  1

Expected behavior
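cudf.read_parquet(..., engine="pyarrow") should match pandas: the result should carry the original two-level MultiIndex and a single column A.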

Environment overview

  • Environment location: [Bare-metal]
  • Method of cuDF install: [from source]
@galipremsagar galipremsagar added bug Something isn't working Python Affects Python cuDF API. labels Nov 1, 2023
@galipremsagar galipremsagar self-assigned this Nov 1, 2023
@github-project-automation github-project-automation bot moved this to In Progress in cuDF/Dask/Numba/UCX Nov 1, 2023
@rapids-bot rapids-bot bot closed this as completed Nov 1, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF/Dask/Numba/UCX Nov 1, 2023