[BUG] Merging on categorical variables with mismatched ordering is ambiguous #8388

Closed
jangorecki opened this issue May 27, 2021 · 8 comments
Labels: bug (Something isn't working), Python (Affects Python cuDF API)

jangorecki commented May 27, 2021

Describe the bug
A dask_cudf groupby query whose keys include categorical columns fails with the error "Merging on categorical variables with mismatched ordering is ambiguous".

Steps/Code to reproduce bug

  • generate data
wget https://raw.githubusercontent.com/h2oai/db-benchmark/master/_data/groupby-datagen.R
## install data.table if needed, then generate data using script
Rscript groupby-datagen.R 1e7 1e2 0 0
  • run code
import cudf as cu
import dask_cudf as dc

x = dc.read_csv("G1_1e7_1e2_0_0.csv", header=0, dtype=['str','str','str','int32','int32','int32','int32','int32','float64'])
x['id1'] = x['id1'].astype('category')
x['id2'] = x['id2'].astype('category')
x['id3'] = x['id3'].astype('category')
x = x.persist()
ans = x.groupby(['id1','id2','id3','id4','id5','id6'], as_index=False, dropna=False).agg({'v3':'sum', 'v1':'size'}).compute()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/dask/base.py", line 284, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/dask/base.py", line 566, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/dask/local.py", line 565, in get_sync
    **kwargs
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/dask/local.py", line 503, in get_async
    for key, res_info, failed in queue_get(queue).result():
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/dask/local.py", line 545, in submit
    fut.set_result(fn(*args, **kwargs))
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/dask/local.py", line 237, in batch_execute_tasks
    return [execute_task(*a) for a in it]
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/dask/local.py", line 237, in <listcomp>
    return [execute_task(*a) for a in it]
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/dask/local.py", line 228, in execute_task
    result = pack_exception(e, dumps)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/dask/local.py", line 223, in execute_task
    result = _execute_task(task, data)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/dask/utils.py", line 34, in apply
    return func(*args, **kwargs)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/dask/dataframe/groupby.py", line 930, in _groupby_apply_funcs
    return type(df)(result)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/dataframe.py", line 301, in __init__
    self._init_from_dict_like(data, index=index, columns=columns)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/dataframe.py", line 459, in _init_from_dict_like
    data, index = self._align_input_series_indices(data, index=index)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/dataframe.py", line 531, in _align_input_series_indices
    input_series
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/series.py", line 7207, in _align_indices
    for sr in series_list
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/series.py", line 7207, in <listcomp>
    for sr in series_list
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/series.py", line 6381, in _align_to_index
    result = lhs.join(rhs, how=how, sort=sort)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/dataframe.py", line 4564, in join
    sort=sort,
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/dataframe.py", line 4514, in merge
    suffixes=suffixes,
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/frame.py", line 3164, in _merge
    suffixes=suffixes,
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/join/join.py", line 54, in merge
    return mergeobj.perform_merge()
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/join/join.py", line 168, in perform_merge
    lhs, rhs = self._match_key_dtypes(self.lhs, self.rhs)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/join/join.py", line 447, in _match_key_dtypes
    lcol, rcol, how=self.how
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/join/_join_helpers.py", line 95, in _match_join_keys
    return _match_categorical_dtypes(lcol, rcol, how)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/join/_join_helpers.py", line 143, in _match_categorical_dtypes
    return _match_categorical_dtypes_both(lcol, rcol, how)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.7/site-packages/cudf/core/join/_join_helpers.py", line 171, in _match_categorical_dtypes_both
    "Merging on categorical variables with mismatched"
TypeError: Merging on categorical variables with mismatched ordering is ambiguous

Expected behavior
The query should complete successfully.

Environment overview (please complete the following information)

  • Environment location: Bare-metal
  • Method of cuDF install: conda

Environment details

 ***git***
 commit de056ca926109569998c99b68213de04b2230977 (HEAD -> cudf-use-dask, upstream/cudf-use-dask)
 Author: jangorecki <j.gorecki@wit.edu.pl>
 Date:   Thu May 27 13:14:08 2021 +0200
 
 median not implemented in dask cudf?
 ***git submodules***
 
 ***OS Information***
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=16.04
 DISTRIB_CODENAME=xenial
 DISTRIB_DESCRIPTION="Ubuntu 16.04.7 LTS"
 NAME="Ubuntu"
 VERSION="16.04.7 LTS (Xenial Xerus)"
 ID=ubuntu
 ID_LIKE=debian
 PRETTY_NAME="Ubuntu 16.04.7 LTS"
 VERSION_ID="16.04"
 HOME_URL="http://www.ubuntu.com/"
 SUPPORT_URL="http://help.ubuntu.com/"
 BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
 VERSION_CODENAME=xenial
 UBUNTU_CODENAME=xenial
 Linux mr-dl11 4.15.0-122-generic #124~16.04.1-Ubuntu SMP Thu Oct 15 16:08:36 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
 
 ***GPU Information***
 Thu May 27 04:23:32 2021
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |                               |                      |               MIG M. |
 |===============================+======================+======================|
 |   0  GeForce GTX 108...  On   | 00000000:02:00.0 Off |                  N/A |
 | 23%   34C    P8    16W / 250W |      1MiB / 11178MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   1  GeForce GTX 108...  On   | 00000000:81:00.0 Off |                  N/A |
 | 23%   38C    P8    12W / 250W |      1MiB / 11178MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 
 +-----------------------------------------------------------------------------+
 | Processes:                                                                  |
 |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
 |        ID   ID                                                   Usage      |
 |=============================================================================|
 |  No running processes found                                                 |
 +-----------------------------------------------------------------------------+
 
 ***CPU***
 Architecture:          x86_64
 CPU op-mode(s):        32-bit, 64-bit
 Byte Order:            Little Endian
 CPU(s):                40
 On-line CPU(s) list:   0-39
 Thread(s) per core:    2
 Core(s) per socket:    10
 Socket(s):             2
 NUMA node(s):          2
 Vendor ID:             GenuineIntel
 CPU family:            6
 Model:                 79
 Model name:            Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
 Stepping:              1
 CPU MHz:               2595.848
 CPU max MHz:           3100.0000
 CPU min MHz:           1200.0000
 BogoMIPS:              4401.79
 Virtualization:        VT-x
 L1d cache:             32K
 L1i cache:             32K
 L2 cache:              256K
 L3 cache:              25600K
 NUMA node0 CPU(s):     0-9,20-29
 NUMA node1 CPU(s):     10-19,30-39
 Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
 
 ***CMake***
 /usr/local/bin/cmake
 cmake version 3.13.2
 
 CMake suite maintained and supported by Kitware (kitware.com/cmake).
 
 ***g++***
 /usr/bin/g++
 g++ (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
 Copyright (C) 2017 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
 
 ***nvcc***
 /usr/local/cuda-9.2/bin/nvcc
 nvcc: NVIDIA (R) Cuda compiler driver
 Copyright (c) 2005-2018 NVIDIA Corporation
 Built on Tue_Jun_12_23:07:04_CDT_2018
 Cuda compilation tools, release 9.2, V9.2.148
 
 ***Python***
 /usr/bin/python
 Python 2.7.12
 
 ***Environment Variables***
 PATH                            : /usr/local/cuda-9.2/bin:/home/jan/bin:/home/jan/.local/bin:/home/jan/bin:/home/jan/.local/bin:/home/jan/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/opt/mapd:/snap/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/db/bin:/usr/lib/jvm/java-8-oracle/jre/bin:/opt/julia-1.5.3/bin:/opt/julia-1.6.0/bin
 LD_LIBRARY_PATH                 : /usr/local/cuda-9.2/lib64:/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server
 NUMBAPRO_NVVM                   :
 NUMBAPRO_LIBDEVICE              :
 CONDA_PREFIX                    :
 PYTHON_PATH                     :
 
 ***conda packages***
 /home/jan/anaconda3/condabin/conda
 # packages in environment at /home/jan/anaconda3:
 #
 # Name                    Version                   Build  Channel
 _ipyw_jlab_nb_ext_conf    0.1.0                    py36_0
 _libgcc_mutex             0.1                        main
 alabaster                 0.7.12                     py_0    conda-forge
 anaconda-client           1.7.1                      py_0    conda-forge
 anaconda-navigator        1.9.2                    py36_0
 anaconda-project          0.8.2                      py_1    conda-forge
 appdirs                   1.4.3                      py_1    conda-forge
 arrow-cpp                 0.10.0           py36h70250a7_0    conda-forge
 asn1crypto                0.24.0                py36_1003    conda-forge
 astroid                   2.0.3                 py36_1000    conda-forge
 astropy                   3.0.5            py36h470a237_0    conda-forge
 atomicwrites              1.2.1                      py_0    conda-forge
 attrs                     18.2.0                     py_0    conda-forge
 automat                   0.7.0                      py_1    conda-forge
 babel                     2.6.0                      py_1    conda-forge
 backcall                  0.1.0                      py_0    conda-forge
 backports                 1.0                        py_2    conda-forge
 backports.os              0.1.1                 py36_1000    conda-forge
 backports.shutil_get_terminal_size 1.0.0                      py_3    conda-forge
 beautifulsoup4            4.6.3                 py36_1000    conda-forge
 bitarray                  0.8.3            py36h470a237_0    conda-forge
 bkcharts                  0.2                      py36_0    conda-forge
 blas                      1.0                         mkl
 blaze                     0.11.3                   py36_0    conda-forge
 bleach                    3.0.2                    pypi_0    pypi
 blinker                   1.4                        py_1    conda-forge
 blosc                     1.14.4               hdbcaa40_0
 bokeh                     1.0.1                 py36_1000    conda-forge
 boost-cpp                 1.67.0               h3a22d5f_0    conda-forge
 boto                      2.49.0                   py36_0
 boto3                     1.9.38                     py_0    conda-forge
 botocore                  1.12.39                    py_0    conda-forge
 bottleneck                1.2.1            py36h7eb728f_1    conda-forge
 bz2file                   0.98                       py_0    conda-forge
 bzip2                     1.0.6                h14c3975_5
 ca-certificates           2020.1.1                      0
 cairo                     1.14.12              h8948797_3
 certifi                   2020.4.5.1               py38_0
 cffi                      1.14.0           py38he30daa8_1
 chardet                   3.0.4                 py38_1003
 click                     7.0                        py_0    conda-forge
 cloudpickle               0.6.1                      py_0    conda-forge
 clyent                    1.2.2                      py_1    conda-forge
 colorama                  0.4.0                      py_0    conda-forge
 conda                     4.8.3                    py38_0
 conda-build               3.16.2                   py36_0    conda-forge
 conda-env                 2.6.0                         1
 conda-package-handling    1.6.1            py38h7b6447c_0
 constantly                15.1.0                     py_0    conda-forge
 contextlib2               0.5.5                      py_2    conda-forge
 cryptography              2.9.2            py38h1ba5d50_0
 cryptography-vectors      2.3.1                 py36_1000    conda-forge
 curl                      7.61.0               h84994c4_0
 cycler                    0.10.0                     py_1    conda-forge
 cython                    0.28.5          py36hf484d3e_1000    conda-forge
 cytoolz                   0.9.0.1          py36h470a237_1    conda-forge
 dask                      0.20.0                     py_0    conda-forge
 dask-core                 0.20.0                     py_0    conda-forge
 datashape                 0.5.4                    py36_0    conda-forge
 dbus                      1.13.2               h714fa37_1
 decorator                 4.3.0                      py_0    conda-forge
 defusedxml                0.5.0                      py_1    conda-forge
 distributed               1.24.0                py36_1000    conda-forge
 docutils                  0.14                  py36_1001    conda-forge
 entrypoints               0.2.3                 py36_1002    conda-forge
 et_xmlfile                1.0.1                    py36_0    conda-forge
 expat                     2.2.6                he6710b0_0
 fastcache                 1.0.2            py36h470a237_1    conda-forge
 filelock                  3.0.10                     py_0    conda-forge
 flask                     1.0.2                      py_2    conda-forge
 flask-cors                3.0.6                      py_0    conda-forge
 fontconfig                2.13.0               h9420a91_0
 freetype                  2.9.1                h8a8886c_1
 fribidi                   1.0.5                h7b6447c_0
 gensim                    3.5.0                    py36_0    conda-forge
 get_terminal_size         1.0.0                haa9412d_0
 gevent                    1.3.7            py36h470a237_0    conda-forge
 glib                      2.56.2               hd408876_0
 glob2                     0.6                        py_0    conda-forge
 gmp                       6.1.2                h6c8ec71_1
 gmpy2                     2.0.8            py36hb705a9b_2    conda-forge
 graphite2                 1.3.12               h23475e2_2
 greenlet                  0.4.13                   py36_0    conda-forge
 gst-plugins-base          1.14.0               hbbd80ab_1
 gstreamer                 1.14.0               hb453b48_1
 h5py                      2.8.0            py36h7eb728f_3    conda-forge
 harfbuzz                  1.8.8                hffaf4a1_0
 hdf5                      1.10.2               hba1933b_1
 heapdict                  1.0.0                 py36_1000    conda-forge
 html5lib                  1.0.1                      py_0    conda-forge
 hyperlink                 17.3.1                     py_0    conda-forge
 icu                       58.2                 h9c2bf20_1
 idna                      2.9                        py_1
 imageio                   2.4.1                      py_0    conda-forge
 imagesize                 1.1.0                      py_0    conda-forge
 importlib_metadata        0.6                      py36_0    conda-forge
 incremental               17.5.0                     py_0    conda-forge
 intel-openmp              2019.0                      118
 ipykernel                 5.1.0              pyh24bf2e0_0    conda-forge
 ipython                   7.1.1           py36h24bf2e0_1000    conda-forge
 ipython_genutils          0.2.0                      py_1    conda-forge
 ipywidgets                7.4.2                      py_0    conda-forge
 isort                     4.3.4                 py36_1000    conda-forge
 itsdangerous              1.1.0                      py_0    conda-forge
 jbig                      2.1                  hdba287a_0
 jdcal                     1.4                        py_1    conda-forge
 jedi                      0.13.1                py36_1000    conda-forge
 jeepney                   0.4                        py_0    conda-forge
 jinja2                    2.10                       py_1    conda-forge
 jmespath                  0.9.3                      py_1    conda-forge
 jpeg                      9b                   h024ee3a_2
 jsonschema                3.0.0a3               py36_1000    conda-forge
 jupyter                   1.0.0                      py_1    conda-forge
 jupyter_client            5.2.3                      py_1    conda-forge
 jupyter_console           6.0.0                      py_0    conda-forge
 jupyter_core              4.4.0                      py_0    conda-forge
 jupyterlab                0.35.4                   py36_0    conda-forge
 jupyterlab_launcher       0.13.1                     py_2    conda-forge
 jupyterlab_server         0.2.0                      py_0    conda-forge
 keyring                   16.0.1                   py36_0    conda-forge
 kiwisolver                1.0.1            py36h2d50403_2    conda-forge
 lazy-object-proxy         1.3.1            py36h470a237_0    conda-forge
 ld_impl_linux-64          2.33.1               h53a641e_7
 libarchive                3.3.3                h823be47_0    conda-forge
 libcudf                   0.4.0                 cuda9.2_0    rapidsai
 libcudf_cffi              0.4.0            cuda9.2_py36_0    rapidsai
 libcurl                   7.61.0               h1ad7b7a_0
 libedit                   3.1.20181209         hc058e9b_0
 libffi                    3.3                  he6710b0_1
 libgcc-ng                 9.1.0                hdf63c60_0
 libgdf                    0.2.0                cuda9.2_95    rapidsai
 libgdf_cffi               0.2.0           cuda9.2_py36_95    rapidsai
 libgfortran-ng            7.3.0                hdf63c60_0
 libiconv                  1.15                 h470a237_3    conda-forge
 libpng                    1.6.34               hb9fc6fc_0
 libsodium                 1.0.16               h1bed415_0
 libssh2                   1.8.0                h9cfc8f7_4
 libstdcxx-ng              8.2.0                hdf63c60_1
 libtiff                   4.0.9                he85c1e1_2
 libtool                   2.4.6                h544aabb_3
 libuuid                   1.0.3                h1bed415_2
 libxcb                    1.13                 h1bed415_1
 libxml2                   2.9.8                h26e45fe_1
 libxslt                   1.1.32               h1312cb7_0
 llvmlite                  0.25.0           py36hf484d3e_0    numba
 locket                    0.2.0                      py_2    conda-forge
 lxml                      4.2.5            py36hc9114bc_0    conda-forge
 lzo                       2.10                 h49e0be7_2
 markupsafe                1.1.0            py36h470a237_0    conda-forge
 matplotlib                3.0.0            py36h5429711_0
 mccabe                    0.6.1                      py_1    conda-forge
 mistune                   0.8.4            py36h470a237_0    conda-forge
 mkl                       2019.0                      118
 mkl-service               1.1.2            py36h90e4bf4_5
 mkl_fft                   1.0.6                    py36_0    conda-forge
 mkl_random                1.0.2                    py36_0    conda-forge
 more-itertools            4.3.0                 py36_1000    conda-forge
 mpc                       1.1.0                h10f8cd9_1
 mpfr                      4.0.1                hdf1c602_3
 mpmath                    1.0.0                      py_1    conda-forge
 msgpack-python            0.5.6            py36h2d50403_3    conda-forge
 multipledispatch          0.6.0                      py_0    conda-forge
 navigator-updater         0.2.1                    py36_0
 nbconvert                 5.4.0                         1    conda-forge
 nbformat                  4.4.0                      py_1    conda-forge
 ncurses                   6.2                  he6710b0_1
 networkx                  2.2                        py_1    conda-forge
 nltk                      3.2.5                      py_0    conda-forge
 nose                      1.3.7                 py36_1002    conda-forge
 notebook                  5.7.0                 py36_1000    conda-forge
 numba                     0.41.0dev0      np114py36hf484d3e_176    numba
 numexpr                   2.6.8            py36hf8a1672_0    conda-forge
 numpy                     1.14.2           py36hdbf6ddf_0
 numpy-base                1.15.4           py36h81de0dd_0
 numpydoc                  0.8.0                      py_1    conda-forge
 nvstrings                 0.2.0            cuda9.2_py36_3    nvidia
 oauthlib                  2.1.0                      py_0    conda-forge
 odo                       0.5.1                      py_1    conda-forge
 olefile                   0.46                       py_0    conda-forge
 openpyxl                  2.5.9                      py_0    conda-forge
 openssl                   1.1.1g               h7b6447c_0
 packaging                 18.0                       py_0    conda-forge
 pandas                    0.20.3                   py36_1    conda-forge
 pandoc                    1.19.2.1             hea2e7c5_1
 pandocfilters             1.4.2                      py_1    conda-forge
 pango                     1.42.4               h049681c_0
 parquet-cpp               1.5.0.pre            h83d4a3d_0    conda-forge
 parso                     0.3.1                      py_0    conda-forge
 partd                     0.3.9                      py_0    conda-forge
 patchelf                  0.9                  hf484d3e_2
 path.py                   11.5.0                     py_0    conda-forge
 pathlib2                  2.3.2                 py36_1000    conda-forge
 patsy                     0.5.1                      py_0    conda-forge
 pcre                      8.42                 h439df22_0
 pep8                      1.7.1                      py_0    conda-forge
 pexpect                   4.6.0                 py36_1000    conda-forge
 pickleshare               0.7.5                 py36_1000    conda-forge
 pillow                    5.3.0            py36h34e0f95_0
 pip                       20.0.2                   py38_3
 pixman                    0.34.0               hceecf20_3
 pkginfo                   1.4.2                      py_1    conda-forge
 pluggy                    0.8.0                      py_0    conda-forge
 ply                       3.11                       py_1    conda-forge
 prometheus_client         0.4.2                      py_0    conda-forge
 prompt_toolkit            2.0.7                      py_0    conda-forge
 psutil                    5.4.8            py36h470a237_0    conda-forge
 ptyprocess                0.6.0                 py36_1000    conda-forge
 py                        1.7.0                      py_0    conda-forge
 pyarrow                   0.10.0           py36hfc679d8_0    conda-forge
 pyasn1                    0.4.4                      py_1    conda-forge
 pyasn1-modules            0.2.1                      py_0    conda-forge
 pycodestyle               2.4.0                      py_1    conda-forge
 pycosat                   0.6.3            py38h7b6447c_1
 pycparser                 2.20                       py_0
 pycrypto                  2.6.1            py36h470a237_2    conda-forge
 pycurl                    7.43.0.2         py36hb7f436b_0
 pyflakes                  2.0.0                      py_0    conda-forge
 pygments                  2.2.0                      py_1    conda-forge
 pyhamcrest                1.9.0                      py_2    conda-forge
 pyjwt                     1.6.4                      py_0    conda-forge
 pylint                    2.1.1                 py36_1000    conda-forge
 pyodbc                    4.0.24           py36hfc679d8_0    conda-forge
 pyopenssl                 19.1.0                   py38_0
 pyparsing                 2.3.0                      py_0    conda-forge
 pyqt                      5.9.2            py36h05f1152_2
 pyrsistent                0.14.5           py36h470a237_1    conda-forge
 pysocks                   1.7.1                    py38_0
 pytables                  3.4.4            py36h4f72b40_1    conda-forge
 pytest                    3.10.0                py36_1000    conda-forge
 pytest-arraydiff          0.2                        py_0    conda-forge
 pytest-astropy            0.4.0                      py_0    conda-forge
 pytest-doctestplus        0.1.3                      py_0    conda-forge
 pytest-openfiles          0.3.0                      py_0    conda-forge
 pytest-remotedata         0.3.1                      py_0    conda-forge
 python                    3.8.2               hcff3b4d_14
 python-crfsuite           0.9.6            py36h2d50403_0    conda-forge
 python-dateutil           2.7.5                      py_0    conda-forge
 python-libarchive-c       2.8                   py36_1004    conda-forge
 pytz                      2018.7                     py_0    conda-forge
 pywavelets                1.0.1            py36h7eb728f_0    conda-forge
 pyyaml                    3.13             py36h470a237_1    conda-forge
 pyzmq                     17.1.2           py36hae99301_1    conda-forge
 qt                        5.9.6                h8703b6f_2
 qtawesome                 0.5.2              pyh8a2030e_0    conda-forge
 qtconsole                 4.4.2                      py_1    conda-forge
 qtpy                      1.5.2              pyh8a2030e_0    conda-forge
 readline                  8.0                  h7b6447c_0
 requests                  2.23.0                   py38_0
 requests-oauthlib         1.0.0                      py_1    conda-forge
 rope                      0.10.7                     py_1    conda-forge
 ruamel_yaml               0.15.87          py38h7b6447c_0
 s3transfer                0.1.13                py36_1001    conda-forge
 scikit-image              0.14.1           py36hfc679d8_0    conda-forge
 scikit-learn              0.20.0           py36h4989274_1
 scipy                     1.1.0            py36hfa4b5c9_1
 seaborn                   0.9.0                      py_0    conda-forge
 secretstorage             3.1.0                 py36_1001    conda-forge
 send2trash                1.5.0                      py_0    conda-forge
 service_identity          17.0.0                     py_0    conda-forge
 setuptools                46.2.0                   py38_0
 simplegeneric             0.8.1                      py_1    conda-forge
 singledispatch            3.4.0.3               py36_1000    conda-forge
 sip                       4.19.8           py36hfc679d8_0    conda-forge
 six                       1.14.0                   py38_0
 smart_open                1.7.1                      py_0    conda-forge
 snappy                    1.1.7                hbae5bb6_3
 snowballstemmer           1.2.1                      py_1    conda-forge
 sortedcollections         1.0.1                      py_1    conda-forge
 sortedcontainers          2.0.5                      py_0    conda-forge
 sphinx                    1.8.1                 py36_1000    conda-forge
 sphinxcontrib             1.0                      py36_1
 sphinxcontrib-websupport  1.1.0                      py_1    conda-forge
 spyder                    3.2.8                    py36_0    conda-forge
 spyder-kernels            1.1.0                      py_0    conda-forge
 sqlalchemy                1.2.13           py36h470a237_0    conda-forge
 sqlite                    3.31.1               h62c20be_1
 statsmodels               0.9.0            py36h7eb728f_0    conda-forge
 sympy                     1.3                   py36_1000    conda-forge
 tblib                     1.3.2                      py_1    conda-forge
 terminado                 0.8.1                 py36_1001    conda-forge
 testpath                  0.4.2                 py36_1000    conda-forge
 tk                        8.6.8                hbc83047_0
 toolz                     0.9.0                      py_1    conda-forge
 tornado                   5.1.1            py36h470a237_0    conda-forge
 tqdm                      4.46.0                     py_0
 traitlets                 4.3.2                 py36_1000    conda-forge
 twisted                   18.9.0           py36h470a237_0    conda-forge
 twython                   3.7.0                      py_0    conda-forge
 typed-ast                 1.1.0                    py36_0    conda-forge
 unicodecsv                0.14.1                     py_1    conda-forge
 unixodbc                  2.3.7                h14c3975_0
 urllib3                   1.25.8                   py38_0
 wcwidth                   0.1.7                      py_1    conda-forge
 webencodings              0.5.1                    pypi_0    pypi
 werkzeug                  0.14.1                     py_0    conda-forge
 wheel                     0.34.2                   py38_0
 widgetsnbextension        3.4.2                 py36_1000    conda-forge
 wrapt                     1.10.11          py36h470a237_1    conda-forge
 xlrd                      1.1.0                      py_2    conda-forge
 xlsxwriter                1.1.2                      py_0    conda-forge
 xlwt                      1.3.0                      py_1    conda-forge
 xz                        5.2.5                h7b6447c_0
 yaml                      0.1.7                had09818_2
 zeromq                    4.2.5                hf484d3e_1
 zict                      0.1.3                      py_0    conda-forge
 zlib                      1.2.11               h7b6447c_3
 zope                      1.0                      py36_1
 zope.interface            4.6.0            py36h470a237_0    conda-forge

Additional context
The query completes fine when using only cudf, without dask_cudf.

@jangorecki jangorecki added Needs Triage Need team to review and classify bug Something isn't working labels May 27, 2021
@quasiben
Member

I believe this should be resolved by #8332. Can you try with the latest nightly?

@jangorecki
Author

@quasiben I am not able to try the nightly.
I hit this error as well, but when doing a merge on a categorical variable, as documented in h2oai/db-benchmark#218.
This report is strictly about grouping, not merging. Also, the error I got during merging was a "not yet implemented" error, which is different from the error during group-by here, so it may have a different root cause.

@shwina
Contributor

shwina commented May 27, 2021

Looking into it.

@shwina shwina added Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels May 27, 2021
@shwina
Contributor

shwina commented May 28, 2021

@jangorecki Thanks for reporting.

Are you able to provide a more minimal reproducer? I get a 404 on the URL for groupby-datagen.R. I also don't have a lot of experience using R.

Testing with the nightly build, I can do a similar groupby on dask_cudf without issue:

import cudf
import dask_cudf

gdf = cudf.datasets.randomdata(
    nrows=100,
    dtypes={'k1': str, 'k2': str, 'k3': int, 'v1': float}
)
dgdf = dask_cudf.from_cudf(gdf, npartitions=2)

dgdf['k1'] = dgdf['k1'].astype('category')
dgdf['k2'] = dgdf['k2'].astype('category')
dgdf['k3'] = dgdf['k3'].astype('category')

dgdf = dgdf.persist()
ans = dgdf.groupby(['k1', 'k2', 'k3'], as_index=False, dropna=False).agg({'v1':'size'}).compute()
print(ans)
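
For anyone narrowing this down without a GPU, here is a minimal sketch that uses pandas as a CPU stand-in. It rests on an assumption, not a confirmed root cause: that casting each partition to `category` separately can leave partitions with different category sets (or a different `ordered` flag), so their dtypes no longer compare equal when partial results are combined. The names `part_a` and `part_b` are hypothetical partitions, not objects from the report.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Two hypothetical partitions of the same key column, each cast to
# category on its own, so each only sees its local values.
part_a = pd.Series(["id001", "id002"]).astype("category")
part_b = pd.Series(["id002", "id003"]).astype("category")

# The per-partition dtypes differ because their category sets differ.
same_categories = part_a.dtype == part_b.dtype

# The ordered flag is also part of the dtype: identical categories with
# a different ordered flag still compare unequal.
cats = ["id001", "id002", "id003"]
ordered_dtype = pd.CategoricalDtype(cats, ordered=True)
unordered_dtype = pd.CategoricalDtype(cats, ordered=False)
same_ordering = ordered_dtype == unordered_dtype

# union_categoricals builds one categorical whose categories cover
# both partitions, which is one way to reconcile them.
combined = union_categoricals([part_a, part_b])

print(same_categories, same_ordering, list(combined.categories))
```

If the partitions in the failing run show a similar dtype mismatch, that would point at how categories are reconciled across partitions rather than at the group-by itself.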

@jangorecki
Author

Sorry, the link was broken; I have corrected it. Unfortunately, I am tracking down another issue in cudf and don't really have much time to simplify this now.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@RitikParmar

I am getting this issue when following the same procedure described above, with cudf 22.06.01 and RAPIDS 22.06. Does the fix described in #8332 still work with this version of cudf?

@vyasr
Contributor

vyasr commented May 13, 2024

This no longer fails for me using the same procedure above for data generation.

@vyasr vyasr closed this as completed May 13, 2024