[BUG] categoricals with .ordered is None in cudf and dask_cudf #11487

Closed
eriknw opened this issue Aug 7, 2022 · 3 comments · Fixed by #11604
Labels: bug, Python


eriknw commented Aug 7, 2022

Describe the bug
I expect dtype.ordered on a categorical dtype to always be True or False. It is sometimes None after construction, or becomes None after an operation such as concat.

This can cause dask_cudf workloads that use categorical dtypes to fail with the following error (the mismatch arises because one side is None, but there should be no mismatch):

  File "/home/nfs/erwelch/miniconda3/envs/cugraph_dev8/lib/python3.9/site-packages/cudf/core/dataframe.py", line 3759, in merge
    return merge_cls(
  File "/home/nfs/erwelch/miniconda3/envs/cugraph_dev8/lib/python3.9/site-packages/cudf/core/join/join.py", line 166, in perform_merge
    lcol_casted, rcol_casted = _match_join_keys(lcol, rcol, self.how)
  File "/home/nfs/erwelch/miniconda3/envs/cugraph_dev8/lib/python3.9/site-packages/cudf/core/join/_join_helpers.py", line 68, in _match_join_keys
    return _match_categorical_dtypes_both(
  File "/home/nfs/erwelch/miniconda3/envs/cugraph_dev8/lib/python3.9/site-packages/cudf/core/join/_join_helpers.py", line 126, in _match_categorical_dtypes_both
    raise TypeError(
TypeError: Merging on categorical variables with mismatched ordering is ambiguous
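
The failure mode can be illustrated without a GPU. The following is a minimal pure-Python sketch, not cuDF's actual implementation: `FakeCategoricalDtype` and `check_ordering` are hypothetical stand-ins, and the sketch assumes the join helper compares the two `ordered` attributes directly.

```python
class FakeCategoricalDtype:
    """Hypothetical stand-in for a categorical dtype; not cuDF's class."""

    def __init__(self, categories=None, ordered=None):
        self.categories = list(categories) if categories is not None else []
        self.ordered = ordered  # stored as given, so None survives


def check_ordering(ldtype, rdtype):
    """Assumed shape of the check: raise if the two flags differ."""
    if ldtype.ordered != rdtype.ordered:
        raise TypeError(
            "Merging on categorical variables with mismatched ordering "
            "is ambiguous"
        )


left = FakeCategoricalDtype(["foo"], ordered=False)
right = FakeCategoricalDtype(["foo"])  # ordered was never set, so it is None

try:
    check_ordering(left, right)
    mismatch = False
except TypeError:
    mismatch = True  # False != None, so the check trips
```

Neither side is ordered in any meaningful sense, yet the comparison treats False and None as different orderings.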

Steps/Code to reproduce bug

In [1]: import cudf
In [2]: import dask_cudf
In [3]: # Good; ordered is False
In [4]: print(cudf.CategoricalDtype(['foo'], ordered=False).ordered)
False

In [5]: s = cudf.Series(4*['foo'], dtype='category')
In [6]: print(s.dtype.ordered)
False

In [7]: s2 = dask_cudf.from_cudf(s, npartitions=2)
In [8]: print(s2.dtype.ordered)
False

In [9]: # Bad; ordered is None
In [10]: print(cudf.CategoricalDtype(['foo']).ordered)
None

In [11]: print(s2.compute().dtype.ordered)  # b/c finalize uses concat
None

In [12]: print(dask_cudf.concat([s, s]).dtype.ordered)
None

In [13]: print(dask_cudf.concat([s, s]).compute().dtype.ordered)
None

In [14]: print(dask_cudf.concat([s2, s2]).dtype.ordered)
None

In [15]: print(dask_cudf.concat([s2, s2]).compute().dtype.ordered)
None

Expected behavior
I expect dtype.ordered on a categorical dtype to always be True or False.

This behavior should probably be fixed here (default on line 130):

def __init__(self, categories=None, ordered: bool = None) -> None:
    self._categories = self._init_categories(categories)
    self.ordered = ordered

and here (default on line 1475):

def build_categorical_column(
    categories: ColumnBase,
    codes: ColumnBase,
    mask: Buffer = None,
    size: int = None,
    offset: int = 0,
    null_count: int = None,
    ordered: bool = None,
) -> "cudf.core.column.CategoricalColumn":

or fix it directly in DataFrame._concat here:

_reassign_categories(
    categories, out._data, indices[first_data_column_position:]
)

via:

def _reassign_categories(categories, cols, col_idxs):
    for name, idx in zip(cols, col_idxs):
        if idx in categories:
            cols[name] = build_categorical_column(
                categories=categories[idx],
                codes=build_column(
                    cols[name].base_data, dtype=cols[name].dtype
                ),
                mask=cols[name].base_mask,
                offset=cols[name].offset,
                size=cols[name].size,
            )
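
As a rough illustration of the first suggestion (normalizing the default), here is a pure-Python sketch. `NormalizedCategoricalDtype` and `concat_dtype` are hypothetical names invented for this example and do not reflect how cuDF implements its dtype or how the eventual fix was written. The idea is only that coercing `ordered` to a strict bool at construction keeps None from ever reaching later operations.

```python
class NormalizedCategoricalDtype:
    """Hypothetical dtype model whose `ordered` flag is always a bool."""

    def __init__(self, categories=None, ordered=False):
        self.categories = list(categories) if categories is not None else []
        # Coerce None to False so the attribute is never None.
        self.ordered = bool(ordered)


def concat_dtype(dtypes):
    """Combine dtypes as a concat might: union categories, keep the flag."""
    flags = {d.ordered for d in dtypes}
    if len(flags) != 1:
        raise TypeError("cannot combine categoricals with different ordering")
    merged = []
    for d in dtypes:
        for cat in d.categories:
            if cat not in merged:  # preserve first-seen order
                merged.append(cat)
    return NormalizedCategoricalDtype(merged, ordered=flags.pop())


a = NormalizedCategoricalDtype(["foo"])                       # default
b = NormalizedCategoricalDtype(["foo", "bar"], ordered=None)  # explicit None
combined = concat_dtype([a, b])
```

With this normalization, a dtype built with the default and one built with an explicit None compare as equal in ordering, and the combined dtype carries a real bool forward.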

Environment overview

  • Environment location: [Bare-metal]
  • Method of cuDF install: [conda]

Environment details

 **git***
 commit 3a3e05dccbfac5a3c33632cda29253d0411b3ed8 (HEAD -> pg_categorical_type)
 Merge: 2df3cefb 7d8f0fd6
 Author: Erik Welch <erik.n.welch@gmail.com>
 Date:   Fri Aug 5 13:32:52 2022 -0700

 Merge branch 'branch-22.10' into pg_categorical_type
 **git submodules***

 ***OS Information***
 DGX_NAME="DGX Server"
 DGX_PRETTY_NAME="NVIDIA DGX Server"
 DGX_SWBUILD_DATE="2020-03-04"
 DGX_SWBUILD_VERSION="4.4.0"
 DGX_COMMIT_ID="ee09ebc"
 DGX_PLATFORM="DGX Server for DGX-1"
 DGX_SERIAL_NUMBER="QTFCOU8220028"

 DGX_R418_REPO_ENABLED=20220727-142458

 DGX_OTA_VERSION="4.13.0"
 DGX_OTA_DATE="Wed Jul 27 14:38:05 PDT 2022"
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=18.04
 DISTRIB_CODENAME=bionic
 DISTRIB_DESCRIPTION="Ubuntu 18.04.6 LTS"
 NAME="Ubuntu"
 VERSION="18.04.6 LTS (Bionic Beaver)"
 ID=ubuntu
 ID_LIKE=debian
 PRETTY_NAME="Ubuntu 18.04.6 LTS"
 VERSION_ID="18.04"
 HOME_URL="https://www.ubuntu.com/"
 SUPPORT_URL="https://help.ubuntu.com/"
 BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
 PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
 VERSION_CODENAME=bionic
 UBUNTU_CODENAME=bionic
 Linux dgx12 4.15.0-189-generic #200-Ubuntu SMP Wed Jun 22 19:53:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

 ***GPU Information***
 Sat Aug  6 22:13:57 2022
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 515.48.07    Driver Version: 515.48.07    CUDA Version: 11.7     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |                               |                      |               MIG M. |
 |===============================+======================+======================|
 |   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
 | N/A   34C    P0    55W / 300W |  12425MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
 | N/A   33C    P0    55W / 300W |  11782MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
 | N/A   31C    P0    54W / 300W |  11782MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
 | N/A   31C    P0    55W / 300W |  11782MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
 | N/A   32C    P0    54W / 300W |  11782MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
 | N/A   33C    P0    54W / 300W |  11782MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
 | N/A   36C    P0    56W / 300W |  11782MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
 | N/A   31C    P0    54W / 300W |  11782MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+

 +-----------------------------------------------------------------------------+
 | Processes:                                                                  |
 |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
 |        ID   ID                                                   Usage      |
 |=============================================================================|
 |    0   N/A  N/A     27368      C   ...-groupby-apply/bin/python      643MiB |
 |    0   N/A  N/A     78085      C   ...s/cugraph_dev8/bin/python    11779MiB |
 |    1   N/A  N/A     78074      C   ...s/cugraph_dev8/bin/python    11779MiB |
 |    2   N/A  N/A     78078      C   ...s/cugraph_dev8/bin/python    11779MiB |
 |    3   N/A  N/A     78081      C   ...s/cugraph_dev8/bin/python    11779MiB |
 |    4   N/A  N/A     78089      C   ...s/cugraph_dev8/bin/python    11779MiB |
 |    5   N/A  N/A     78092      C   ...s/cugraph_dev8/bin/python    11779MiB |
 |    6   N/A  N/A     78095      C   ...s/cugraph_dev8/bin/python    11779MiB |
 |    7   N/A  N/A     78098      C   ...s/cugraph_dev8/bin/python    11779MiB |
 +-----------------------------------------------------------------------------+

 ***CPU***
 Architecture:        x86_64
 CPU op-mode(s):      32-bit, 64-bit
 Byte Order:          Little Endian
 CPU(s):              80
 On-line CPU(s) list: 0-79
 Thread(s) per core:  2
 Core(s) per socket:  20
 Socket(s):           2
 NUMA node(s):        2
 Vendor ID:           GenuineIntel
 CPU family:          6
 Model:               79
 Model name:          Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
 Stepping:            1
 CPU MHz:             1745.439
 CPU max MHz:         3600.0000
 CPU min MHz:         1200.0000
 BogoMIPS:            4390.09
 Virtualization:      VT-x
 L1d cache:           32K
 L1i cache:           32K
 L2 cache:            256K
 L3 cache:            51200K
 NUMA node0 CPU(s):   0-19,40-59
 NUMA node1 CPU(s):   20-39,60-79
 Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d

 ***CMake***
 /home/nfs/erwelch/miniconda3/envs/cugraph_dev8/bin/cmake
 cmake version 3.23.3

 CMake suite maintained and supported by Kitware (kitware.com/cmake).

 ***g++***
 /home/nfs/erwelch/miniconda3/envs/cugraph_dev8/bin/g++
 g++ (conda-forge gcc 10.4.0-16) 10.4.0
 Copyright (C) 2020 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


 ***nvcc***
 /home/nfs/erwelch/miniconda3/envs/cugraph_dev8/bin/nvcc

/home/nfs/erwelch/miniconda3/envs/cugraph_dev8/bin/nvcc: line 8: /bin/nvcc: No such file or directory

 ***Python***
 /home/nfs/erwelch/miniconda3/envs/cugraph_dev8/bin/python
 Python 3.9.13

 ***Environment Variables***
 PATH                            : /home/nfs/erwelch/miniconda3/envs/cugraph_dev8/bin:/home/nfs/erwelch/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
 LD_LIBRARY_PATH                 :
 NUMBAPRO_NVVM                   :
 NUMBAPRO_LIBDEVICE              :
 CONDA_PREFIX                    : /home/nfs/erwelch/miniconda3/envs/cugraph_dev8
 PYTHON_PATH                     :

 ***conda packages***
 /home/nfs/erwelch/miniconda3/condabin/conda
 # packages in environment at /home/nfs/erwelch/miniconda3/envs/cugraph_dev8:
 #
 # Name                    Version                   Build  Channel
 _libgcc_mutex             0.1                 conda_forge    conda-forge
 _openmp_mutex             4.5                       2_gnu    conda-forge
 abseil-cpp                20210324.2           h9c3ff4c_0    conda-forge
 alabaster                 0.7.12                     py_0    conda-forge
 argon2-cffi               21.3.0             pyhd8ed1ab_0    conda-forge
 argon2-cffi-bindings      21.2.0           py39hb9d737c_2    conda-forge
 arrow-cpp                 8.0.1           py39h811ffd7_0_cpu    conda-forge
 asttokens                 2.0.5              pyhd8ed1ab_0    conda-forge
 asvdb                     0.4.2               g90e8f2c_40    rapidsai
 atk-1.0                   2.36.0               h3371d22_4    conda-forge
 attrs                     22.1.0             pyh71513ae_1    conda-forge
 aws-c-cal                 0.5.11               h95a6274_0    conda-forge
 aws-c-common              0.6.2                h7f98852_0    conda-forge
 aws-c-event-stream        0.2.7               h3541f99_13    conda-forge
 aws-c-io                  0.10.5               hfb6a706_0    conda-forge
 aws-checksums             0.1.11               ha31a3da_7    conda-forge
 aws-sdk-cpp               1.8.186              hb4091e7_3    conda-forge
 babel                     2.10.3             pyhd8ed1ab_0    conda-forge
 backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
 backports                 1.0                        py_2    conda-forge
 backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
 beautifulsoup4            4.11.1             pyha770c72_0    conda-forge
 binutils                  2.36.1               hdd6e379_2    conda-forge
 binutils_impl_linux-64    2.36.1               h193b22a_2    conda-forge
 binutils_linux-64         2.36                hf3e587d_10    conda-forge
 black                     22.6.0           py39hf3d152e_2    conda-forge
 bleach                    5.0.1              pyhd8ed1ab_0    conda-forge
 bokeh                     2.4.3            py39hf3d152e_0    conda-forge
 boost                     1.78.0           py39hac2352c_0    conda-forge
 boost-cpp                 1.78.0               h75c5d50_1    conda-forge
 boto3                     1.24.45            pyhd8ed1ab_0    conda-forge
 botocore                  1.27.45            pyhd8ed1ab_0    conda-forge
 brotlipy                  0.7.0           py39hb9d737c_1004    conda-forge
 bzip2                     1.0.8                h7f98852_4    conda-forge
 c-ares                    1.18.1               h7f98852_0    conda-forge
 c-compiler                1.4.2                h166bdaf_0    conda-forge
 ca-certificates           2022.6.15            ha878542_0    conda-forge
 cachetools                5.0.0              pyhd8ed1ab_0    conda-forge
 cairo                     1.16.0            ha61ee94_1011    conda-forge
 certifi                   2022.6.15        py39hf3d152e_0    conda-forge
 cffi                      1.15.1           py39he91dace_0    conda-forge
 charset-normalizer        2.1.0              pyhd8ed1ab_0    conda-forge
 clang                     11.1.0               ha770c72_1    conda-forge
 clang-11                  11.1.0          default_ha53f305_1    conda-forge
 clang-tools               11.1.0          default_ha53f305_1    conda-forge
 clangxx                   11.1.0          default_ha53f305_1    conda-forge
 click                     8.1.3            py39hf3d152e_0    conda-forge
 cloudpickle               2.1.0              pyhd8ed1ab_0    conda-forge
 cmake                     3.23.3               h5432695_0    conda-forge
 colorama                  0.4.5              pyhd8ed1ab_0    conda-forge
 commonmark                0.9.1                      py_0    conda-forge
 coverage                  6.4.2            py39hb9d737c_0    conda-forge
 cryptography              37.0.4           py39hd97740a_0    conda-forge
 cuda-python               11.7.0           py39h3fd9d12_0    nvidia
 cudatoolkit               11.5.1               hcf5317a_9    nvidia
 cudf                      22.08.00a220804 cuda_11_py39_gb2ae9a9f9e_301    rapidsai-nightly
 cugraph                   22.8.0a0+127.gcdc563fc          pypi_0    pypi
 cupy                      10.6.0           py39hc3c280e_0    conda-forge
 cxx-compiler              1.4.2                h924138e_0    conda-forge
 cython                    0.29.32          py39h5a03fae_0    conda-forge
 cytoolz                   0.12.0           py39hb9d737c_0    conda-forge
 dask                      2022.7.1           pyhd8ed1ab_0    conda-forge
 dask-core                 2022.7.1           pyhd8ed1ab_0    conda-forge
 dask-cuda                 22.08.00a220804 py39_gad985ce_34    rapidsai-nightly
 dask-cudf                 22.08.00a220804 cuda_11_py39_gb2ae9a9f9e_301    rapidsai-nightly
 debugpy                   1.6.0            py39h5a03fae_0    conda-forge
 decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
 defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
 distributed               2022.7.1           pyhd8ed1ab_0    conda-forge
 distro                    1.6.0              pyhd8ed1ab_0    conda-forge
 dlpack                    0.5                  h9c3ff4c_0    conda-forge
 docutils                  0.19             py39hf3d152e_0    conda-forge
 doxygen                   1.9.3                h583eb01_1    conda-forge
 entrypoints               0.4                pyhd8ed1ab_0    conda-forge
 executing                 0.9.1              pyhd8ed1ab_0    conda-forge
 expat                     2.4.8                h27087fc_0    conda-forge
 fastavro                  1.5.4            py39hb9d737c_0    conda-forge
 fastrlock                 0.8              py39h5a03fae_2    conda-forge
 flake8                    5.0.4              pyhd8ed1ab_0    conda-forge
 flit-core                 3.7.1              pyhd8ed1ab_0    conda-forge
 font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
 font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
 font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
 font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
 fontconfig                2.14.0               h8e229c2_0    conda-forge
 fonts-conda-ecosystem     1                             0    conda-forge
 fonts-conda-forge         1                             0    conda-forge
 freetype                  2.10.4               h0708190_1    conda-forge
 fribidi                   1.0.10               h36c2ea0_0    conda-forge
 fsspec                    2022.7.1           pyhd8ed1ab_0    conda-forge
 future                    0.18.2           py39hf3d152e_5    conda-forge
 gcc                       10.4.0              hb92f740_10    conda-forge
 gcc_impl_linux-64         10.4.0              h7ee1905_16    conda-forge
 gcc_linux-64              10.4.0              h9215b83_10    conda-forge
 gdk-pixbuf                2.42.8               hff1cb4f_0    conda-forge
 gettext                   0.19.8.1          h73d1719_1008    conda-forge
 gflags                    2.2.2             he1b5a44_1004    conda-forge
 gh                        2.14.3               ha8f183a_0    conda-forge
 giflib                    5.2.1                h36c2ea0_2    conda-forge
 glog                      0.6.0                h6f12383_0    conda-forge
 gmock                     1.10.0               h4bd325d_7    conda-forge
 graphite2                 1.3.13            h58526e2_1001    conda-forge
 graphviz                  5.0.0                h5abf519_0    conda-forge
 grpc-cpp                  1.45.2               h3b8df00_4    conda-forge
 gtest                     1.10.0               h4bd325d_7    conda-forge
 gtk2                      2.24.33              h90689f9_2    conda-forge
 gts                       0.7.6                h64030ff_2    conda-forge
 gxx                       10.4.0              hb92f740_10    conda-forge
 gxx_impl_linux-64         10.4.0              h7ee1905_16    conda-forge
 gxx_linux-64              10.4.0              h6e491c6_10    conda-forge
 harfbuzz                  5.1.0                hf9f4e7c_0    conda-forge
 heapdict                  1.0.1                      py_0    conda-forge
 icu                       70.1                 h27087fc_0    conda-forge
 idna                      3.3                pyhd8ed1ab_0    conda-forge
 imagesize                 1.4.1              pyhd8ed1ab_0    conda-forge
 importlib-metadata        4.11.4           py39hf3d152e_0    conda-forge
 importlib_resources       5.9.0              pyhd8ed1ab_0    conda-forge
 iniconfig                 1.1.1              pyh9f0ad1d_0    conda-forge
 ipycytoscape              1.3.3              pyh1d7be83_0    conda-forge
 ipykernel                 6.15.1             pyh210e3f2_0    conda-forge
 ipython                   8.4.0            py39hf3d152e_0    conda-forge
 ipython_genutils          0.2.0                      py_1    conda-forge
 ipywidgets                7.7.1              pyhd8ed1ab_0    conda-forge
 isort                     5.10.1             pyhd8ed1ab_0    conda-forge
 jedi                      0.18.1           py39hf3d152e_1    conda-forge
 jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
 jmespath                  1.0.1              pyhd8ed1ab_0    conda-forge
 joblib                    1.1.0              pyhd8ed1ab_0    conda-forge
 jpeg                      9e                   h166bdaf_2    conda-forge
 jsonschema                4.9.1              pyhd8ed1ab_0    conda-forge
 jupyter_client            7.3.4              pyhd8ed1ab_0    conda-forge
 jupyter_core              4.11.1           py39hf3d152e_0    conda-forge
 jupyterlab_pygments       0.2.2              pyhd8ed1ab_0    conda-forge
 jupyterlab_widgets        1.1.1              pyhd8ed1ab_0    conda-forge
 kernel-headers_linux-64   2.6.32              he073ed8_15    conda-forge
 keyutils                  1.6.1                h166bdaf_0    conda-forge
 krb5                      1.19.3               h3790be6_0    conda-forge
 lcms2                     2.12                 hddcbb42_0    conda-forge
 ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
 lerc                      4.0.0                h27087fc_0    conda-forge
 libblas                   3.9.0           15_linux64_openblas    conda-forge
 libbrotlicommon           1.0.9                h166bdaf_7    conda-forge
 libbrotlidec              1.0.9                h166bdaf_7    conda-forge
 libbrotlienc              1.0.9                h166bdaf_7    conda-forge
 libcblas                  3.9.0           15_linux64_openblas    conda-forge
 libclang-cpp11.1          11.1.0          default_ha53f305_1    conda-forge
 libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
 libcudf                   22.08.00a220804 cuda11_gb2ae9a9f9e_301    rapidsai-nightly
 libcugraphops             22.08.00a220804 cuda11_g52e8b618_26    rapidsai-nightly
 libcurl                   7.83.1               h7bff187_0    conda-forge
 libcusolver               11.4.0.1                      0    nvidia
 libcusparse               11.7.4.91                     0    nvidia
 libdeflate                1.12                 h166bdaf_0    conda-forge
 libedit                   3.1.20191231         he28a2e2_2    conda-forge
 libev                     4.33                 h516909a_1    conda-forge
 libevent                  2.1.10               h9b69904_4    conda-forge
 libffi                    3.4.2                h7f98852_5    conda-forge
 libgcc-devel_linux-64     10.4.0              h74af60c_16    conda-forge
 libgcc-ng                 12.1.0              h8d9b700_16    conda-forge
 libgd                     2.3.3                h18fbbfe_3    conda-forge
 libgfortran-ng            12.1.0              h69a702a_16    conda-forge
 libgfortran5              12.1.0              hdcd56e2_16    conda-forge
 libglib                   2.72.1               h2d90d5f_0    conda-forge
 libgomp                   12.1.0              h8d9b700_16    conda-forge
 libgoogle-cloud           1.40.2               habd0e3a_0    conda-forge
 libiconv                  1.16                 h516909a_0    conda-forge
 liblapack                 3.9.0           15_linux64_openblas    conda-forge
 libllvm11                 11.1.0               hf817b99_3    conda-forge
 libnghttp2                1.47.0               h727a467_0    conda-forge
 libnsl                    2.0.0                h7f98852_0    conda-forge
 libopenblas               0.3.20          pthreads_h78a6416_1    conda-forge
 libpng                    1.6.37               h753d276_3    conda-forge
 libprotobuf               3.20.1               h6239696_0    conda-forge
 libraft-headers           22.08.00a220804 cuda11_g3def727_68    rapidsai-nightly
 librmm                    22.08.00a220804 cuda11_g594070fb_60    rapidsai-nightly
 librsvg                   2.54.4               h7abd40a_0    conda-forge
 libsanitizer              10.4.0              hde28e3b_16    conda-forge
 libsodium                 1.0.18               h36c2ea0_1    conda-forge
 libssh2                   1.10.0               ha56f1ee_2    conda-forge
 libstdcxx-devel_linux-64  10.4.0              h74af60c_16    conda-forge
 libstdcxx-ng              12.1.0              ha89aaad_16    conda-forge
 libthrift                 0.16.0               h519c5ea_1    conda-forge
 libtiff                   4.4.0                h0d92c0b_2    conda-forge
 libtool                   2.4.6             h9c3ff4c_1008    conda-forge
 libutf8proc               2.7.0                h7f98852_0    conda-forge
 libuuid                   2.32.1            h7f98852_1000    conda-forge
 libuv                     1.44.2               h166bdaf_0    conda-forge
 libwebp                   1.2.3                h522a892_1    conda-forge
 libwebp-base              1.2.3                h166bdaf_2    conda-forge
 libxcb                    1.13              h7f98852_1004    conda-forge
 libxml2                   2.9.14               h22db469_3    conda-forge
 libzlib                   1.2.12               h166bdaf_2    conda-forge
 llvmlite                  0.38.1           py39h7d9a04d_0    conda-forge
 locket                    1.0.0              pyhd8ed1ab_0    conda-forge
 lz4                       4.0.0            py39h029007f_2    conda-forge
 lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
 make                      4.3                  hd18ef5c_1    conda-forge
 markdown                  3.4.1              pyhd8ed1ab_0    conda-forge
 markupsafe                2.1.1            py39hb9d737c_1    conda-forge
 matplotlib-inline         0.1.3              pyhd8ed1ab_0    conda-forge
 mccabe                    0.7.0              pyhd8ed1ab_0    conda-forge
 mistune                   0.8.4           py39h3811e60_1005    conda-forge
 msgpack-python            1.0.4            py39hf939315_0    conda-forge
 mypy_extensions           0.4.3            py39hf3d152e_5    conda-forge
 nbclient                  0.6.6              pyhd8ed1ab_0    conda-forge
 nbconvert                 6.5.0              pyhd8ed1ab_0    conda-forge
 nbconvert-core            6.5.0              pyhd8ed1ab_0    conda-forge
 nbconvert-pandoc          6.5.0              pyhd8ed1ab_0    conda-forge
 nbformat                  5.4.0              pyhd8ed1ab_0    conda-forge
 nbsphinx                  0.8.9              pyhd8ed1ab_0    conda-forge
 nccl                      2.12.12.1            h0800d71_0    conda-forge
 ncurses                   6.3                  h27087fc_1    conda-forge
 nest-asyncio              1.5.5              pyhd8ed1ab_0    conda-forge
 networkx                  2.8.5              pyhd8ed1ab_0    conda-forge
 notebook                  6.4.12             pyha770c72_0    conda-forge
 numba                     0.55.2           py39h66db6d7_0    conda-forge
 numpy                     1.22.4           py39hc58783e_0    conda-forge
 numpydoc                  1.4.0              pyhd8ed1ab_1    conda-forge
 nvcc_linux-64             10.1                hcaf9a05_10
 nvtx                      0.2.3            py39h3811e60_1    conda-forge
 openjpeg                  2.4.0                hb52868f_1    conda-forge
 openssl                   1.1.1q               h166bdaf_0    conda-forge
 orc                       1.7.5                h6c59b99_0    conda-forge
 packaging                 21.3               pyhd8ed1ab_0    conda-forge
 pandas                    1.4.3            py39h1832856_0    conda-forge
 pandoc                    2.18                 ha770c72_0    conda-forge
 pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
 pango                     1.50.8               hc4f8a73_1    conda-forge
 parquet-cpp               1.5.1                         2    conda-forge
 parso                     0.8.3              pyhd8ed1ab_0    conda-forge
 partd                     1.2.0              pyhd8ed1ab_0    conda-forge
 pathspec                  0.9.0              pyhd8ed1ab_0    conda-forge
 pcre                      8.45                 h9c3ff4c_0    conda-forge
 pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
 pickleshare               0.7.5                   py_1003    conda-forge
 pillow                    9.2.0            py39hae2aec6_0    conda-forge
 pip                       22.2.2             pyhd8ed1ab_0    conda-forge
 pixman                    0.40.0               h36c2ea0_0    conda-forge
 pkgutil-resolve-name      1.3.10             pyhd8ed1ab_0    conda-forge
 platformdirs              2.5.2              pyhd8ed1ab_1    conda-forge
 pluggy                    1.0.0            py39hf3d152e_3    conda-forge
 prometheus_client         0.14.1             pyhd8ed1ab_0    conda-forge
 prompt-toolkit            3.0.30             pyha770c72_0    conda-forge
 protobuf                  3.20.1           py39h5a03fae_0    conda-forge
 psutil                    5.9.1            py39hb9d737c_0    conda-forge
 pthread-stubs             0.4               h36c2ea0_1001    conda-forge
 ptxcompiler               0.2.0            py39h107f55c_0    rapidsai
 ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
 pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
 py                        1.11.0             pyh6c4a22f_0    conda-forge
 py-cpuinfo                8.0.0              pyhd8ed1ab_0    conda-forge
 pyarrow                   8.0.1           py39h42d110c_0_cpu    conda-forge
 pycodestyle               2.9.0              pyhd8ed1ab_0    conda-forge
 pycparser                 2.21               pyhd8ed1ab_0    conda-forge
 pydata-sphinx-theme       0.9.0              pyhd8ed1ab_1    conda-forge
 pyflakes                  2.5.0              pyhd8ed1ab_0    conda-forge
 pygal                     2.4.0                      py_0    conda-forge
 pygments                  2.12.0             pyhd8ed1ab_0    conda-forge
 pygraphviz                1.9              py39h1f7127a_3    conda-forge
 pylibcugraph              22.8.0a0+127.gcdc563fc           dev_0    <develop>
 pynvml                    11.4.1             pyhd8ed1ab_0    conda-forge
 pyopenssl                 22.0.0             pyhd8ed1ab_0    conda-forge
 pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
 pyraft                    22.08.00a220804 cuda11_py39_g3def727_68    rapidsai-nightly
 pyrsistent                0.18.1           py39hb9d737c_1    conda-forge
 pysocks                   1.7.1            py39hf3d152e_5    conda-forge
 pytest                    7.1.2            py39hf3d152e_0    conda-forge
 pytest-benchmark          3.2.3              pyh9f0ad1d_0    conda-forge
 pytest-cov                3.0.0              pyhd8ed1ab_0    conda-forge
 python                    3.9.13          h9a8a25e_0_cpython    conda-forge
 python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
 python-fastjsonschema     2.16.1             pyhd8ed1ab_0    conda-forge
 python-graphviz           0.20.1             pyh22cad53_0    conda-forge
 python_abi                3.9                      2_cp39    conda-forge
 pytz                      2022.1             pyhd8ed1ab_0    conda-forge
 pyyaml                    6.0              py39hb9d737c_4    conda-forge
 pyzmq                     23.2.0           py39headdf64_0    conda-forge
 rapids-pytest-benchmark   0.0.14                     py_0    rapidsai
 re2                       2022.06.01           h27087fc_0    conda-forge
 readline                  8.1.2                h0f457ee_0    conda-forge
 recommonmark              0.7.1              pyhd8ed1ab_0    conda-forge
 requests                  2.28.1             pyhd8ed1ab_0    conda-forge
 rhash                     1.4.3                h166bdaf_0    conda-forge
 rmm                       22.08.00a220804 cuda11_py39_g594070fb_60    rapidsai-nightly
 s2n                       1.0.10               h9b69904_0    conda-forge
 s3transfer                0.6.0              pyhd8ed1ab_0    conda-forge
 scikit-build              0.15.0             pyhb871ab6_0    conda-forge
 scikit-learn              1.1.1            py39h4037b75_0    conda-forge
 scipy                     1.9.0            py39h8ba3f38_0    conda-forge
 send2trash                1.8.0              pyhd8ed1ab_0    conda-forge
 setuptools                63.3.0           py39hf3d152e_0    conda-forge
 six                       1.16.0             pyh6c4a22f_0    conda-forge
 snappy                    1.1.9                hbd366e4_1    conda-forge
 snowballstemmer           2.2.0              pyhd8ed1ab_0    conda-forge
 sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
 soupsieve                 2.3.2.post1        pyhd8ed1ab_0    conda-forge
 spdlog                    1.8.5                h4bd325d_1    conda-forge
 spectate                  1.0.1              pyhd8ed1ab_0    conda-forge
 sphinx                    5.1.1              pyhd8ed1ab_1    conda-forge
 sphinx-copybutton         0.5.0              pyhd8ed1ab_0    conda-forge
 sphinx-markdown-tables    0.0.17             pyh6c4a22f_0    conda-forge
 sphinxcontrib-applehelp   1.0.2                      py_0    conda-forge
 sphinxcontrib-devhelp     1.0.2                      py_0    conda-forge
 sphinxcontrib-htmlhelp    2.0.0              pyhd8ed1ab_0    conda-forge
 sphinxcontrib-jsmath      1.0.1                      py_0    conda-forge
 sphinxcontrib-qthelp      1.0.3                      py_0    conda-forge
 sphinxcontrib-serializinghtml 1.1.5              pyhd8ed1ab_2    conda-forge
 sphinxcontrib-websupport  1.2.4              pyhd8ed1ab_1    conda-forge
 sqlite                    3.39.2               h4ff8645_0    conda-forge
 stack_data                0.3.0              pyhd8ed1ab_0    conda-forge
 sysroot_linux-64          2.12                he073ed8_15    conda-forge
 tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
 terminado                 0.15.0           py39hf3d152e_0    conda-forge
 threadpoolctl             3.1.0              pyh8a188c0_0    conda-forge
 tinycss2                  1.1.1              pyhd8ed1ab_0    conda-forge
 tk                        8.6.12               h27826a3_0    conda-forge
 toml                      0.10.2             pyhd8ed1ab_0    conda-forge
 tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
 toolz                     0.12.0             pyhd8ed1ab_0    conda-forge
 tornado                   6.1              py39hb9d737c_3    conda-forge
 traitlets                 5.3.0              pyhd8ed1ab_0    conda-forge
 traittypes                0.2.1              pyh9f0ad1d_2    conda-forge
 typing_extensions         4.3.0              pyha770c72_0    conda-forge
 tzdata                    2022a                h191b570_0    conda-forge
 ucx                       1.13.0               h538f049_0    conda-forge
 ucx-proc                  1.0.0                       gpu    rapidsai
 ucx-py                    0.27.00a220714  py39_g44062cd_14    rapidsai-nightly
 urllib3                   1.26.11            pyhd8ed1ab_0    conda-forge
 wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
 webencodings              0.5.1                      py_1    conda-forge
 wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
 widgetsnbextension        3.6.1              pyha770c72_0    conda-forge
 xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
 xorg-libice               1.0.10               h7f98852_0    conda-forge
 xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
 xorg-libx11               1.7.2                h7f98852_0    conda-forge
 xorg-libxau               1.0.9                h7f98852_0    conda-forge
 xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
 xorg-libxext              1.3.4                h7f98852_1    conda-forge
 xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
 xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
 xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
 xorg-xproto               7.0.31            h7f98852_1007    conda-forge
 xz                        5.2.5                h516909a_1    conda-forge
 yaml                      0.2.5                h7f98852_2    conda-forge
 zeromq                    4.3.4                h9c3ff4c_1    conda-forge
 zict                      2.2.0              pyhd8ed1ab_0    conda-forge
 zipp                      3.8.1              pyhd8ed1ab_0    conda-forge
 zlib                      1.2.12               h166bdaf_2    conda-forge
 zstd                      1.5.2                h8a70e8d_2    conda-forge


Additional context
This is necessary to use categorical dtypes in cugraph's PropertyGraph: rapidsai/cugraph#2510

@eriknw eriknw added Needs Triage Need team to review and classify bug Something isn't working labels Aug 7, 2022
@github-actions github-actions bot added this to Needs prioritizing in Bug Squashing Aug 7, 2022
@galipremsagar galipremsagar self-assigned this Aug 8, 2022
@galipremsagar galipremsagar added Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels Aug 8, 2022
@galipremsagar galipremsagar added this to Issue-Needs prioritizing in v22.10 Release via automation Aug 8, 2022
@shwina
Contributor

shwina commented Aug 9, 2022

Doesn't Pandas support None in addition to True and False?

https://pandas.pydata.org/docs/reference/api/pandas.CategoricalDtype.html#pandas.CategoricalDtype

@eriknw
Contributor Author

eriknw commented Aug 9, 2022

Yes, but only in a "please defer to the other categorical's ordered property during astype, but this is weird, so users probably shouldn't use ordered=None explicitly, and we wanted to deprecate this" kind of way. See details here:

pandas-dev/pandas#26336
pandas-dev/pandas#26403
pandas-dev/pandas#29955

But in pandas, ordered defaults to False, and ordered doesn't get ignored and set to None as is done in the code snippets I shared above. It's the latter behavior (the ordered property getting changed when it shouldn't) that is affecting my code. The default should also be changed to False to match pandas.

@eriknw
Contributor Author

eriknw commented Aug 9, 2022

It's the latter behavior (ordered property getting changed when it shouldn't) that is affecting my code.

Let me highlight this from my examples above:

In [1]: import cudf
In [2]: import dask_cudf
In [5]: s = cudf.Series(4*['foo'], dtype='category')
In [6]: print(s.dtype.ordered)
False

In [7]: s2 = dask_cudf.from_cudf(s, npartitions=2)
In [8]: print(s2.dtype.ordered)
False

In [11]: print(s2.compute().dtype.ordered)  # b/c finalize uses concat
None
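
To illustrate why a None showing up after concat leads to the "mismatched ordering" error in the traceback, here is a minimal pure-Python sketch. It assumes the merge check simply compares the two `ordered` values for equality; `orderings_match` and `normalize_ordered` are hypothetical names used only for illustration, not cudf functions.

```python
from typing import Optional


def orderings_match(left: Optional[bool], right: Optional[bool]) -> bool:
    """Strict comparison: None and False count as different orderings."""
    return left == right


def normalize_ordered(ordered: Optional[bool]) -> bool:
    """Hypothetical helper: treat an unset ordering (None) as unordered (False)."""
    return bool(ordered)


# Values taken from the session above: the dtype reports False before
# compute() and None afterwards (because finalize uses concat).
before_concat = False
after_concat = None

# Compared as-is, the two sides disagree, which is the mismatch being reported.
strict = orderings_match(before_concat, after_concat)

# With None normalized to False, both sides agree.
normalized = orderings_match(
    normalize_ordered(before_concat), normalize_ordered(after_concat)
)

print(strict, normalized)
```

This only models the comparison. The underlying problem is still that `ordered` should not change from False to None in the first place.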

@galipremsagar galipremsagar moved this from Issue-Needs prioritizing to Issue-P1 in v22.10 Release Aug 24, 2022
rapids-bot bot pushed a commit that referenced this issue Aug 26, 2022
…11604)

Fixes: #11487 

This PR switches the default value of the `ordered` parameter in `CategoricalDtype` to `False`. This fixes some issues around concat and building categorical columns.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Ashwin Srinath (https://github.com/shwina)

URL: #11604
Bug Squashing automation moved this from Needs prioritizing to Closed Aug 26, 2022
v22.10 Release automation moved this from Issue-P1 to Done Aug 26, 2022