[BUG] Merging on categorical variables with mismatched ordering is ambiguous #8388
Comments
I believe this should be resolved by #8332. Can you try with the latest nightly?
@quasiben I am not able to try the nightly.
Looking into it.
@jangorecki Thanks for reporting. Are you able to provide a more minimal reproducer? I get a 404 on the linked URL. Testing with the nightly build, I can do a similar groupby on dask_cudf without issue:
import cudf
import dask_cudf

# Build a small random frame with three key columns and one value column
gdf = cudf.datasets.randomdata(
    nrows=100,
    dtypes={'k1': str, 'k2': str, 'k3': int, 'v1': float}
)
dgdf = dask_cudf.from_cudf(gdf, npartitions=2)

# Cast every key column to categorical
dgdf['k1'] = dgdf['k1'].astype('category')
dgdf['k2'] = dgdf['k2'].astype('category')
dgdf['k3'] = dgdf['k3'].astype('category')
dgdf = dgdf.persist()

# Group by the categorical keys and count rows per group
ans = dgdf.groupby(['k1', 'k2', 'k3'], as_index=False, dropna=False).agg({'v1': 'size'}).compute()
print(ans)
Sorry, the link was broken; I have corrected it. Unfortunately, I am tracking down another issue in cudf and don't have much time to simplify this now.
I am getting this issue when following the same procedure described above, with cudf 22.06.01 and RAPIDS 22.06. Does the fix described in #8332 still work with this version of cudf?
This no longer fails for me using the same data generation procedure described above.
Describe the bug
A grouping query fails with the error "Merging on categorical variables with mismatched ordering is ambiguous".
Steps/Code to reproduce bug
wget https://raw.githubusercontent.com/h2oai/db-benchmark/master/_data/groupby-datagen.R
## install data.table if needed, then generate data using script
Rscript groupby-datagen.R 1e7 1e2 0 0
Expected behavior
The query should complete successfully.
Environment overview (please complete the following information)
Environment details
Additional context
The query completes fine when using only cudf, without dask_cudf.
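Since the failure shows up only with dask_cudf, one way to narrow it down might be to compare the categorical dtypes of the key columns across partitions. The sketch below is not from this issue or from the cudf codebase: it assumes the error refers to a mismatch in either the `ordered` flag or the order of the categories, which the thread does not confirm. `FakeCatDtype` and `compare_categorical_dtypes` are hypothetical names introduced for illustration; the helper only reads the `categories` and `ordered` attributes, so the stand-in keeps the example runnable without a GPU.

```python
from collections import namedtuple

# Hypothetical stand-in for a categorical dtype, used only so this sketch
# runs without a GPU. The helper below relies on nothing but the
# `categories` and `ordered` attributes.
FakeCatDtype = namedtuple("FakeCatDtype", ["categories", "ordered"])


def compare_categorical_dtypes(left, right):
    """Return a list of human-readable differences between two categorical dtypes."""
    problems = []

    # Assumption: a differing `ordered` flag is one possible source of the error.
    if bool(left.ordered) != bool(right.ordered):
        problems.append(f"ordered flag differs: {left.ordered} vs {right.ordered}")

    left_cats = list(left.categories)
    right_cats = list(right.categories)
    if left_cats != right_cats:
        if set(left_cats) == set(right_cats):
            problems.append("same categories, different order")
        else:
            problems.append("different category sets")

    return problems


a = FakeCatDtype(categories=["x", "y", "z"], ordered=False)
b = FakeCatDtype(categories=["z", "y", "x"], ordered=True)

report = compare_categorical_dtypes(a, b)
print(report)
```

With real data, the same helper could be given the dtype of a key column (for example `k1`) taken from two different partitions. An empty list would suggest the dtypes agree and the cause lies elsewhere.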