Describe the bug
I expect a categorical dtype to always have dtype.ordered be True or False. It is sometimes None after construction, or converted to None after an operation such as concat.
This can cause dask_cudf to fail when using workloads with categorical dtypes with the following error (the mismatch arises because one side is None, but there should be no mismatch):
  File "/home/nfs/erwelch/miniconda3/envs/cugraph_dev8/lib/python3.9/site-packages/cudf/core/dataframe.py", line 3759, in merge
    return merge_cls(
  File "/home/nfs/erwelch/miniconda3/envs/cugraph_dev8/lib/python3.9/site-packages/cudf/core/join/join.py", line 166, in perform_merge
    lcol_casted, rcol_casted = _match_join_keys(lcol, rcol, self.how)
  File "/home/nfs/erwelch/miniconda3/envs/cugraph_dev8/lib/python3.9/site-packages/cudf/core/join/_join_helpers.py", line 68, in _match_join_keys
    return _match_categorical_dtypes_both(
  File "/home/nfs/erwelch/miniconda3/envs/cugraph_dev8/lib/python3.9/site-packages/cudf/core/join/_join_helpers.py", line 126, in _match_categorical_dtypes_both
    raise TypeError(
TypeError: Merging on categorical variables with mismatched ordering is ambiguous
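The mismatch can be illustrated without cudf. The sketch below is hypothetical and is not cudf's actual implementation: `orderings_match` and `normalize_ordered` are illustrative names, and it assumes the join helper compares the two `ordered` flags for plain equality, so that `None` and `False` count as different.

```python
from typing import Optional


def orderings_match(left: Optional[bool], right: Optional[bool]) -> bool:
    """Hypothetical stand-in for the equality check that precedes the TypeError."""
    return left == right


def normalize_ordered(ordered: Optional[bool]) -> bool:
    """Treat an unset (None) ordering as unordered (False)."""
    return bool(ordered)


# One side was constructed with ordered=False; the other ended up as None.
raw_match = orderings_match(False, None)
normalized_match = orderings_match(
    normalize_ordered(False), normalize_ordered(None)
)
print(raw_match, normalized_match)
```

Under this assumption, normalizing `None` to `False` before the comparison would let two unordered categoricals merge without the error.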
Yes, in a "please defer to other categorical ordered property during astype, but this is weird so users probably shouldn't use ordered=None explicitly and we wanted to deprecate this" kind of way. See details here:
But pandas `ordered` defaults to False, and `ordered` doesn't get ignored and set to None as happens in the code snippets I shared above. It's the latter behavior (the ordered property getting changed when it shouldn't) that is affecting my code. The default should be changed to False as a matter of matching pandas.
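For comparison, the pandas behavior described here can be checked with a short sketch. It assumes pandas 1.0 or later, where the `ordered` default is `False`; the variable names are illustrative.

```python
import pandas as pd

# Construct a categorical dtype without specifying `ordered`.
dtype = pd.CategoricalDtype(categories=["a", "b"])
default_ordered = dtype.ordered

# Concatenate two series that share the dtype and inspect the result.
s1 = pd.Series(["a", "b"], dtype=dtype)
s2 = pd.Series(["b", "a"], dtype=dtype)
combined = pd.concat([s1, s2], ignore_index=True)
concat_ordered = combined.dtype.ordered

print(default_ordered, concat_ordered)
```

In pandas, both values stay `False`. The issue is that the equivalent cudf operations can yield `None` instead.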
Steps/Code to reproduce bug
Expected behavior
I expect a categorical dtype to always have dtype.ordered be True or False.
This behavior should probably be fixed here (default on line 130):
cudf/python/cudf/cudf/core/dtypes.py
Lines 130 to 132 in e1a4e03
and here (default on line 1475):
cudf/python/cudf/cudf/core/column/column.py
Lines 1468 to 1476 in e1a4e03
or fix it directly in DataFrame._concat here:
cudf/python/cudf/cudf/core/dataframe.py
Lines 1692 to 1694 in e1a4e03
via:
cudf/python/cudf/cudf/core/dataframe.py
Lines 7300 to 7311 in e1a4e03
Environment overview (please complete the following information)
Environment details
Additional context
This is necessary to use categorical dtypes in cugraph's PropertyGraph: rapidsai/cugraph#2510