[BUG] Cumulative aggregation functions fail on dataframe/series groupbys with nulls #12055

Closed
charlesbluca opened this issue Nov 3, 2022 · 1 comment · Fixed by #13389
Labels: bug (Something isn't working), Python (Affects Python cuDF API)


@charlesbluca
Member

Describe the bug
When calling the cumulative aggregation functions on a groupby of a DataFrame or Series that contains nulls (e.g. df.groupby("x").cumsum()), we get a ValueError.

Steps/Code to reproduce bug
Note that we would see the same traceback if we used cumcount, cummax, or cummin instead:

import cudf
import cupy as cp

gdf = cudf.DataFrame(
    {
        "xx": cp.random.randint(0, 5, size=10000),
        "x":  cp.random.normal(size=10000),
        "y":  cp.random.normal(size=10000),
    }
)
gdf = gdf.mask(cp.random.choice([True, False], size=gdf.shape))

gdf_grouped = gdf.groupby("xx")


gdf_grouped.cumsum()
gdf_grouped.xx.cumsum()

ValueError                                Traceback (most recent call last)
Cell In [3], line 16
     11 gdf = gdf.mask(cp.random.choice([True, False], size=gdf.shape))
     13 gdf_grouped = gdf.groupby("xx")
---> 16 gdf_grouped.xx.cumsum()
     17 gdf_grouped.cumcount()
     18 gdf_grouped.cummax()

File /raid/charlesb/mambaforge/envs/checkout-groupby-layers/lib/python3.9/site-packages/cudf/core/mixins/mixin_factory.py:11, in _partialmethod.<locals>.wrapper(self, *args2, **kwargs2)
     10 def wrapper(self, *args2, **kwargs2):
---> 11     return method(self, *args1, *args2, **kwargs1, **kwargs2)

File /raid/charlesb/mambaforge/envs/checkout-groupby-layers/lib/python3.9/site-packages/cudf/core/groupby/groupby.py:536, in GroupBy._scan(self, op, *args, **kwargs)
    534 def _scan(self, op: str, *args, **kwargs):
    535     """{op_name} for each group."""
--> 536     return self.agg(op)

File /raid/charlesb/mambaforge/envs/checkout-groupby-layers/lib/python3.9/site-packages/cudf/core/groupby/groupby.py:1749, in SeriesGroupBy.agg(self, func)
   1748 def agg(self, func):
-> 1749     result = super().agg(func)
   1751     # downcast the result to a Series:
   1752     if len(result._data):

File /raid/charlesb/mambaforge/envs/checkout-groupby-layers/lib/python3.9/contextlib.py:79, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
     76 @wraps(func)
     77 def inner(*args, **kwds):
     78     with self._recreate_cm():
---> 79         return func(*args, **kwds)

File /raid/charlesb/mambaforge/envs/checkout-groupby-layers/lib/python3.9/site-packages/cudf/core/groupby/groupby.py:490, in GroupBy.agg(self, func)
    487     result = result.reset_index()
    488 if libgroupby._is_all_scan_aggregate(normalized_aggs):
    489     # Scan aggregations return rows in original index order
--> 490     return self._mimic_pandas_order(result)
    492 return result

File /raid/charlesb/mambaforge/envs/checkout-groupby-layers/lib/python3.9/site-packages/cudf/core/groupby/groupby.py:1724, in GroupBy._mimic_pandas_order(self, result)
   1722 gather_map = order_cols[0].argsort()
   1723 result = result.take(gather_map)
-> 1724 result.index = self.obj.index
   1725 return result

File /raid/charlesb/mambaforge/envs/checkout-groupby-layers/lib/python3.9/site-packages/cudf/core/dataframe.py:1091, in DataFrame.__setattr__(self, key, col)
   1089     super().__setattr__(key, col)
   1090 else:
-> 1091     super().__setattr__(key, col)

File /raid/charlesb/mambaforge/envs/checkout-groupby-layers/lib/python3.9/site-packages/cudf/core/indexed_frame.py:533, in IndexedFrame.index(self, value)
    531 # A DataFrame with 0 columns can have an index of arbitrary length.
    532 if len(self._data) > 0 and new_length != old_length:
--> 533     raise ValueError(
    534         f"Length mismatch: Expected axis has {old_length} elements, "
    535         f"new values have {len(value)} elements"
    536     )
    537 self._index = Index(value)

ValueError: Length mismatch: Expected axis has 5032 elements, new values have 10000 elements
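The error itself is a length check: in this run the scan result has 5032 rows while `self.obj.index` has 10000. That gap is consistent with the reproducer's `mask` call nulling roughly half of the `xx` key values, and with rows whose key is null being dropped by the groupby (which defaults to `dropna=True`), so the reorder step in `_mimic_pandas_order` receives fewer rows than the original index. The pure-Python sketch below only illustrates that step under those assumptions; it is not cuDF's code. `argsort`, `restore_order_strict`, and `restore_order_scatter` are hypothetical helpers, `None` stands in for null, and `order` plays the role of the order column used to build `gather_map`. The strict version reproduces the failure; the scatter version shows one possible way to get a full-length, null-filled result like pandas'.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def argsort(xs: Sequence[int]) -> List[int]:
    """Indices that would sort xs (stand-in for a column argsort)."""
    return sorted(range(len(xs)), key=lambda i: xs[i])


def restore_order_strict(order: Sequence[int], values: Sequence[T], n_rows: int) -> List[T]:
    """Gather result rows back into original order, then require exactly one
    row per input row (analogous to assigning the full original index)."""
    reordered = [values[i] for i in argsort(order)]
    if len(reordered) != n_rows:
        raise ValueError(
            f"Length mismatch: result has {len(reordered)} rows, index has {n_rows}"
        )
    return reordered


def restore_order_scatter(order: Sequence[int], values: Sequence[T], n_rows: int) -> List[Optional[T]]:
    """Write each result row to its original position; positions that never
    appear in `order` (e.g. null-key rows) stay None, so length == n_rows."""
    out: List[Optional[T]] = [None] * n_rows
    for pos, value in zip(order, values):
        out[pos] = value
    return out


# Hypothetical input: keys [1, None, 0, 1, 0], values [10, 20, 30, 40, 50].
# Row 1 has a null key and is dropped; the scan result comes back in group
# order (group 0: rows 2, 4; group 1: rows 0, 3) with running sums.
order = [2, 4, 0, 3]        # original row position of each result row
cumsums = [30, 80, 10, 50]  # per-group running sums, in group order

print(restore_order_scatter(order, cumsums, 5))  # [10, None, 30, 50, 80]
try:
    restore_order_strict(order, cumsums, 5)
except ValueError as exc:
    print(exc)  # Length mismatch: result has 4 rows, index has 5
```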

Expected behavior
I would expect these operations to succeed and return one row per input row, roughly matching the output of pandas:

In [2]: gdf.to_pandas().groupby("xx").cumsum()
Out[2]: 
              x          y
0           NaN        NaN
1           NaN        NaN
2           NaN        NaN
3           NaN        NaN
4           NaN        NaN
...         ...        ...
9995  15.674543  13.537169
9996 -22.231675        NaN
9997        NaN -19.599722
9998        NaN        NaN
9999        NaN        NaN

[10000 rows x 2 columns]
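For reference, the shape of the pandas result above can be modeled with a short pure-Python sketch (no cuDF involved). `grouped_scan` is a hypothetical helper and `None` stands in for null/NaN. It assumes pandas-like defaults: rows with a null key belong to no group and get a null result, and a null value yields null at its own position without resetting the group's running aggregate. Under those assumptions the output always has one entry per input row, which is the property the cuDF traceback shows being violated.

```python
from typing import Callable, Dict, Hashable, List, Optional


def grouped_scan(
    keys: List[Optional[Hashable]],
    values: List[Optional[float]],
    op: Callable[[float, float], float],
) -> List[Optional[float]]:
    """Running per-group aggregation that keeps one output per input row.

    None marks a null. Null-key rows produce None; null values produce None
    at their own position but the group's running aggregate carries on.
    """
    running: Dict[Hashable, float] = {}  # group key -> aggregate so far
    out: List[Optional[float]] = []
    for key, value in zip(keys, values):
        if key is None or value is None:
            out.append(None)
            continue
        acc = running.get(key)
        acc = value if acc is None else op(acc, value)
        running[key] = acc
        out.append(acc)
    return out


keys = [0, None, 0, 1, 0]
vals = [1.0, 5.0, None, 2.0, 3.0]
print(grouped_scan(keys, vals, lambda a, b: a + b))  # cumsum: [1.0, None, None, 2.0, 4.0]
print(grouped_scan(keys, vals, max))                 # cummax: [1.0, None, None, 2.0, 3.0]
print(grouped_scan(keys, vals, min))                 # cummin: [1.0, None, None, 2.0, 1.0]
```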

Environment overview

  • Environment location: bare-metal
  • Method of cuDF install: source

Environment details

 ***git***

print_env.sh: 11: [: true: unexpected operator
Not inside a git repository

 ***OS Information***
 DGX_NAME="DGX Server"
 DGX_PRETTY_NAME="NVIDIA DGX Server"
 DGX_SWBUILD_DATE="2020-03-04"
 DGX_SWBUILD_VERSION="4.4.0"
 DGX_COMMIT_ID="ee09ebc"
 DGX_PLATFORM="DGX Server for DGX-1"
 DGX_SERIAL_NUMBER="QTFCOU8220028"
 
 DGX_R418_REPO_ENABLED=20220727-142458
 
 DGX_OTA_VERSION="4.13.0"
 DGX_OTA_DATE="Wed Jul 27 14:38:05 PDT 2022"
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=18.04
 DISTRIB_CODENAME=bionic
 DISTRIB_DESCRIPTION="Ubuntu 18.04.6 LTS"
 NAME="Ubuntu"
 VERSION="18.04.6 LTS (Bionic Beaver)"
 ID=ubuntu
 ID_LIKE=debian
 PRETTY_NAME="Ubuntu 18.04.6 LTS"
 VERSION_ID="18.04"
 HOME_URL="https://www.ubuntu.com/"
 SUPPORT_URL="https://help.ubuntu.com/"
 BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
 PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
 VERSION_CODENAME=bionic
 UBUNTU_CODENAME=bionic
 Linux dgx12 4.15.0-189-generic #200-Ubuntu SMP Wed Jun 22 19:53:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
 
 ***GPU Information***
 Thu Nov  3 07:06:22 2022       
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 515.48.07    Driver Version: 515.48.07    CUDA Version: 11.7     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |                               |                      |               MIG M. |
 |===============================+======================+======================|
 |   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
 | N/A   33C    P0    42W / 300W |      0MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
 | N/A   30C    P0    42W / 300W |      0MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
 | N/A   28C    P0    41W / 300W |      0MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
 | N/A   28C    P0    41W / 300W |      0MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
 | N/A   30C    P0    42W / 300W |      0MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
 | N/A   30C    P0    41W / 300W |      0MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
 | N/A   32C    P0    43W / 300W |      0MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
 | N/A   29C    P0    41W / 300W |      0MiB / 32768MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 
 +-----------------------------------------------------------------------------+
 | Processes:                                                                  |
 |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
 |        ID   ID                                                   Usage      |
 |=============================================================================|
 |  No running processes found                                                 |
 +-----------------------------------------------------------------------------+
 
 ***CPU***
 Architecture:        x86_64
 CPU op-mode(s):      32-bit, 64-bit
 Byte Order:          Little Endian
 CPU(s):              80
 On-line CPU(s) list: 0-79
 Thread(s) per core:  2
 Core(s) per socket:  20
 Socket(s):           2
 NUMA node(s):        2
 Vendor ID:           GenuineIntel
 CPU family:          6
 Model:               79
 Model name:          Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
 Stepping:            1
 CPU MHz:             1528.974
 CPU max MHz:         3600.0000
 CPU min MHz:         1200.0000
 BogoMIPS:            4389.99
 Virtualization:      VT-x
 L1d cache:           32K
 L1i cache:           32K
 L2 cache:            256K
 L3 cache:            51200K
 NUMA node0 CPU(s):   0-19,40-59
 NUMA node1 CPU(s):   20-39,60-79
 Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
 
 ***CMake***
 /raid/charlesb/mambaforge/envs/checkout-groupby-layers/bin/cmake
 cmake version 3.24.3
 
 CMake suite maintained and supported by Kitware (kitware.com/cmake).
 
 ***g++***
 /raid/charlesb/mambaforge/envs/checkout-groupby-layers/bin/g++
 g++ (conda-forge gcc 9.5.0-19) 9.5.0
 Copyright (C) 2019 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
 
 ***nvcc***
 /usr/local/cuda-11.5/bin/nvcc
 nvcc: NVIDIA (R) Cuda compiler driver
 Copyright (c) 2005-2021 NVIDIA Corporation
 Built on Thu_Nov_18_09:45:30_PST_2021
 Cuda compilation tools, release 11.5, V11.5.119
 Build cuda_11.5.r11.5/compiler.30672275_0
 
 ***Python***
 /raid/charlesb/mambaforge/envs/checkout-groupby-layers/bin/python
 Python 3.9.13
 
 ***Environment Variables***
 PATH                            : /usr/local/cuda-11.5/bin:/home/nfs/charlesb/.local/bin:/raid/charlesb/.vscode-server/bin/d045a5eda657f4d7b676dedbfa7aab8207f8a075/bin/remote-cli:/home/nfs/charlesb/.cargo/bin:/raid/charlesb/mambaforge/envs/checkout-groupby-layers/bin:/raid/charlesb/mambaforge/condabin:/usr/local/cuda-11.5/bin:/home/nfs/charlesb/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
 LD_LIBRARY_PATH                 : /usr/local/cuda-11.5/lib64:/usr/local/cuda-11.5/lib64
 NUMBAPRO_NVVM                   : 
 NUMBAPRO_LIBDEVICE              : 
 CONDA_PREFIX                    : /raid/charlesb/mambaforge/envs/checkout-groupby-layers
 PYTHON_PATH                     : 
 
 ***conda packages***
 conda is /raid/charlesb/mambaforge/condabin/conda
 /raid/charlesb/mambaforge/condabin/conda
 # packages in environment at /raid/charlesb/mambaforge/envs/checkout-groupby-layers:
 #
 # Name                    Version                   Build  Channel
 _libgcc_mutex             0.1                 conda_forge    conda-forge
 _openmp_mutex             4.5                  2_kmp_llvm    conda-forge
 _sysroot_linux-64_curr_repodata_hack 3                   h5bd9786_13    conda-forge
 abseil-cpp                20211102.0           h93e1e8c_3    conda-forge
 aiobotocore               2.4.0              pyhd8ed1ab_0    conda-forge
 aiohttp                   3.8.3            py39hb9d737c_1    conda-forge
 aioitertools              0.11.0             pyhd8ed1ab_0    conda-forge
 aiosignal                 1.2.0              pyhd8ed1ab_0    conda-forge
 alabaster                 0.7.12                     py_0    conda-forge
 anyio                     3.6.2              pyhd8ed1ab_0    conda-forge
 argon2-cffi               21.3.0             pyhd8ed1ab_0    conda-forge
 argon2-cffi-bindings      21.2.0           py39hb9d737c_3    conda-forge
 arrow-cpp                 9.0.0           py39h2531139_1_cpu    conda-forge
 asttokens                 2.1.0              pyhd8ed1ab_0    conda-forge
 async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
 attrs                     22.1.0             pyh71513ae_1    conda-forge
 aws-c-cal                 0.5.11               h95a6274_0    conda-forge
 aws-c-common              0.6.2                h7f98852_0    conda-forge
 aws-c-event-stream        0.2.7               h3541f99_13    conda-forge
 aws-c-io                  0.10.5               hfb6a706_0    conda-forge
 aws-checksums             0.1.11               ha31a3da_7    conda-forge
 aws-sam-translator        1.53.0             pyhd8ed1ab_0    conda-forge
 aws-sdk-cpp               1.8.186              hecaee15_4    conda-forge
 aws-xray-sdk              2.10.0             pyhd8ed1ab_0    conda-forge
 babel                     2.10.3             pyhd8ed1ab_0    conda-forge
 backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
 backports                 1.0                        py_2    conda-forge
 backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
 backports.zoneinfo        0.2.1            py39hf3d152e_7    conda-forge
 bcrypt                    3.2.2            py39hb9d737c_1    conda-forge
 beautifulsoup4            4.11.1             pyha770c72_0    conda-forge
 binutils                  2.39                 hdd6e379_0    conda-forge
 binutils_impl_linux-64    2.39                 h6ceecb4_0    conda-forge
 binutils_linux-64         2.39                h5fc0e48_11    conda-forge
 bleach                    5.0.1              pyhd8ed1ab_0    conda-forge
 bokeh                     2.4.3              pyhd8ed1ab_3    conda-forge
 boto3                     1.24.59            pyhd8ed1ab_0    conda-forge
 botocore                  1.27.59            pyhd8ed1ab_0    conda-forge
 brotlipy                  0.7.0           py39hb9d737c_1005    conda-forge
 bzip2                     1.0.8                h7f98852_4    conda-forge
 c-ares                    1.18.1               h7f98852_0    conda-forge
 c-compiler                1.3.0                h7f98852_0    conda-forge
 ca-certificates           2022.9.24            ha878542_0    conda-forge
 cachetools                5.2.0              pyhd8ed1ab_0    conda-forge
 certifi                   2022.9.24          pyhd8ed1ab_0    conda-forge
 cffi                      1.15.1           py39he91dace_2    conda-forge
 cfgv                      3.3.1              pyhd8ed1ab_0    conda-forge
 cfn-lint                  0.69.1             pyhd8ed1ab_0    conda-forge
 charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
 clang                     11.1.0               ha770c72_1    conda-forge
 clang-11                  11.1.0          default_ha53f305_1    conda-forge
 clang-tools               11.1.0          default_ha53f305_1    conda-forge
 clangxx                   11.1.0          default_ha53f305_1    conda-forge
 click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
 cloudpickle               2.2.0              pyhd8ed1ab_0    conda-forge
 cmake                     3.24.3               h816a3e0_0    conda-forge
 cmake_setuptools          0.1.3                      py_0    rapidsai
 colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
 commonmark                0.9.1                      py_0    conda-forge
 cryptography              38.0.2           py39hd97740a_2    conda-forge
 cubinlinker               0.2.0            py39h11215e4_1    rapidsai
 cuda-python               11.7.1           py39h1eff087_1    conda-forge
 cudatoolkit               11.5.1              h59c8dcf_10    conda-forge
 cudf                      22.12.0a0+205.g543f8db6df.dirty          pypi_0    pypi
 cudnn                     8.4.1.50             hed8a83a_0    conda-forge
 cupy                      11.2.0           py39hc3c280e_0    conda-forge
 cxx-compiler              1.3.0                h4bd325d_0    conda-forge
 cyrus-sasl                2.1.27               h230043b_5    conda-forge
 cython                    0.29.32          py39h5a03fae_1    conda-forge
 cytoolz                   0.12.0           py39hb9d737c_1    conda-forge
 dask                      2022.10.3a221102  py_gc137ac08_3    dask/label/dev
 dask-core                 2022.10.3a221101 py_g65f40ad46_1    dask/label/dev
 dask-cuda                 22.12.00a221102 py39_g3e5a19b_21    rapidsai-nightly
 dask-cudf                 22.12.0a0+205.g543f8db6df.dirty          pypi_0    pypi
 dataclasses               0.8                pyhc8e2a94_3    conda-forge
 debugpy                   1.6.3            py39h5a03fae_1    conda-forge
 decopatch                 1.4.10             pyhd8ed1ab_0    conda-forge
 decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
 defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
 distlib                   0.3.5              pyhd8ed1ab_0    conda-forge
 distributed               2022.10.3a221102  py_gc137ac08_3    dask/label/dev
 distro                    1.6.0              pyhd8ed1ab_0    conda-forge
 dlpack                    0.5                  h9c3ff4c_0    conda-forge
 docker-py                 6.0.0              pyhd8ed1ab_0    conda-forge
 docutils                  0.19             py39hf3d152e_1    conda-forge
 double-conversion         3.2.0                h27087fc_1    conda-forge
 doxygen                   1.8.20               had0d8f1_0    conda-forge
 ecdsa                     0.18.0             pyhd8ed1ab_1    conda-forge
 entrypoints               0.4                pyhd8ed1ab_0    conda-forge
 exceptiongroup            1.0.0              pyhd8ed1ab_0    conda-forge
 execnet                   1.9.0              pyhd8ed1ab_0    conda-forge
 executing                 1.2.0              pyhd8ed1ab_0    conda-forge
 expat                     2.5.0                h27087fc_0    conda-forge
 fastavro                  1.7.0            py39hb9d737c_0    conda-forge
 fastrlock                 0.8              py39h5a03fae_3    conda-forge
 filelock                  3.8.0              pyhd8ed1ab_0    conda-forge
 flask                     2.1.3              pyhd8ed1ab_0    conda-forge
 flask_cors                3.0.10             pyhd3deb0d_0    conda-forge
 flit-core                 3.7.1              pyhd8ed1ab_0    conda-forge
 freetype                  2.12.1               hca18f0e_0    conda-forge
 frozenlist                1.3.1            py39hb9d737c_1    conda-forge
 fsspec                    2022.10.0          pyhd8ed1ab_0    conda-forge
 future                    0.18.2             pyhd8ed1ab_6    conda-forge
 gcc                       9.5.0               h1fea6ba_11    conda-forge
 gcc_impl_linux-64         9.5.0               h99780fb_19    conda-forge
 gcc_linux-64              9.5.0               h4258300_11    conda-forge
 gettext                   0.21.1               h27087fc_0    conda-forge
 gflags                    2.2.2             he1b5a44_1004    conda-forge
 glog                      0.6.0                h6f12383_0    conda-forge
 gmp                       6.2.1                h58526e2_0    conda-forge
 gmpy2                     2.1.2            py39h376b7d2_1    conda-forge
 graphql-core              3.2.3              pyhd8ed1ab_0    conda-forge
 greenlet                  1.1.3.post0      py39h5a03fae_0    conda-forge
 grpc-cpp                  1.46.4               h6fc47f4_3    conda-forge
 gxx                       9.5.0               h1fea6ba_11    conda-forge
 gxx_impl_linux-64         9.5.0               h99780fb_19    conda-forge
 gxx_linux-64              9.5.0               h43f449f_11    conda-forge
 heapdict                  1.0.1                      py_0    conda-forge
 huggingface_hub           0.10.1             pyhd8ed1ab_0    conda-forge
 hypothesis                6.56.4             pyha770c72_0    conda-forge
 identify                  2.5.8              pyhd8ed1ab_0    conda-forge
 idna                      3.4                pyhd8ed1ab_0    conda-forge
 imagesize                 1.4.1              pyhd8ed1ab_0    conda-forge
 importlib-metadata        5.0.0              pyha770c72_1    conda-forge
 importlib_metadata        5.0.0                hd8ed1ab_1    conda-forge
 importlib_resources       3.3.1              pyhd8ed1ab_1    conda-forge
 iniconfig                 1.1.1              pyh9f0ad1d_0    conda-forge
 ipykernel                 6.17.0             pyh210e3f2_0    conda-forge
 ipython                   8.6.0              pyh41d4057_1    conda-forge
 ipython_genutils          0.2.0                      py_1    conda-forge
 itsdangerous              2.1.2              pyhd8ed1ab_0    conda-forge
 jedi                      0.18.1             pyhd8ed1ab_2    conda-forge
 jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
 jmespath                  1.0.1              pyhd8ed1ab_0    conda-forge
 joblib                    1.2.0              pyhd8ed1ab_0    conda-forge
 jpeg                      9e                   h166bdaf_2    conda-forge
 jschema-to-python         1.2.3              pyhd8ed1ab_0    conda-forge
 jsondiff                  2.0.0              pyhd8ed1ab_0    conda-forge
 jsonpatch                 1.32               pyhd8ed1ab_0    conda-forge
 jsonpickle                2.2.0              pyhd8ed1ab_0    conda-forge
 jsonpointer               2.0                        py_0    conda-forge
 jsonschema                3.2.0              pyhd8ed1ab_3    conda-forge
 junit-xml                 1.9                pyh9f0ad1d_0    conda-forge
 jupyter-cache             0.5.0              pyhd8ed1ab_0    conda-forge
 jupyter_client            7.3.4              pyhd8ed1ab_0    conda-forge
 jupyter_core              4.11.1           py39hf3d152e_1    conda-forge
 jupyter_server            1.21.0             pyhd8ed1ab_0    conda-forge
 jupyterlab_pygments       0.2.2              pyhd8ed1ab_0    conda-forge
 kernel-headers_linux-64   3.10.0              h4a8ded7_13    conda-forge
 keyutils                  1.6.1                h166bdaf_0    conda-forge
 krb5                      1.19.3               h3790be6_0    conda-forge
 lcms2                     2.14                 h6ed2654_0    conda-forge
 ld_impl_linux-64          2.39                 hc81fddc_0    conda-forge
 lerc                      4.0.0                h27087fc_0    conda-forge
 libabseil                 20211102.0      cxx17_h48a1fff_3    conda-forge
 libblas                   3.9.0            16_linux64_mkl    conda-forge
 libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
 libbrotlidec              1.0.9                h166bdaf_8    conda-forge
 libbrotlienc              1.0.9                h166bdaf_8    conda-forge
 libcblas                  3.9.0            16_linux64_mkl    conda-forge
 libclang-cpp11.1          11.1.0          default_ha53f305_1    conda-forge
 libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
 libcudf                   22.12.00a221102 cuda11_g5ace809af6_207    rapidsai-nightly
 libcurl                   7.86.0               h7bff187_1    conda-forge
 libdeflate                1.14                 h166bdaf_0    conda-forge
 libedit                   3.1.20191231         he28a2e2_2    conda-forge
 libev                     4.33                 h516909a_1    conda-forge
 libevent                  2.1.10               h9b69904_4    conda-forge
 libffi                    3.4.2                h7f98852_5    conda-forge
 libgcc-devel_linux-64     9.5.0               h0a57e50_19    conda-forge
 libgcc-ng                 12.2.0              h65d4601_19    conda-forge
 libgcrypt                 1.10.1               h166bdaf_0    conda-forge
 libgfortran-ng            12.2.0              h69a702a_19    conda-forge
 libgfortran5              12.2.0              h337968e_19    conda-forge
 libgomp                   12.2.0              h65d4601_19    conda-forge
 libgoogle-cloud           1.40.2               hefc27d0_0    conda-forge
 libgpg-error              1.45                 hc0c96e0_0    conda-forge
 libgsasl                  1.10.0               h5b4c23d_0    conda-forge
 libiconv                  1.17                 h166bdaf_0    conda-forge
 liblapack                 3.9.0            16_linux64_mkl    conda-forge
 libllvm11                 11.1.0               he0ac6c6_5    conda-forge
 libnghttp2                1.47.0               hdcd2b5c_1    conda-forge
 libnsl                    2.0.0                h7f98852_0    conda-forge
 libntlm                   1.4               h7f98852_1002    conda-forge
 libpng                    1.6.38               h753d276_0    conda-forge
 libprotobuf               3.20.1               h6239696_4    conda-forge
 librdkafka                1.7.0                hc49e61c_1    conda-forge
 librmm                    22.12.00a221102 cuda11_gc1af31ff_49    rapidsai-nightly
 libsanitizer              9.5.0               h2f262e1_19    conda-forge
 libsodium                 1.0.18               h36c2ea0_1    conda-forge
 libsqlite                 3.39.4               h753d276_0    conda-forge
 libssh2                   1.10.0               haa6b8db_3    conda-forge
 libstdcxx-devel_linux-64  9.5.0               h0a57e50_19    conda-forge
 libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
 libthrift                 0.16.0               h491838f_2    conda-forge
 libtiff                   4.4.0                h55922b4_4    conda-forge
 libutf8proc               2.8.0                h166bdaf_0    conda-forge
 libuuid                   2.32.1            h7f98852_1000    conda-forge
 libuv                     1.44.2               h166bdaf_0    conda-forge
 libwebp-base              1.2.4                h166bdaf_0    conda-forge
 libxcb                    1.13              h7f98852_1004    conda-forge
 libzlib                   1.2.13               h166bdaf_4    conda-forge
 livereload                2.6.3              pyh9f0ad1d_0    conda-forge
 llvm-openmp               14.0.4               he0ac6c6_0    conda-forge
 llvmlite                  0.39.1           py39h7d9a04d_1    conda-forge
 locket                    1.0.0              pyhd8ed1ab_0    conda-forge
 lz4                       4.0.2            py39h029007f_0    conda-forge
 lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
 magma                     2.5.4                hc72dce7_4    conda-forge
 make                      4.3                  hd18ef5c_1    conda-forge
 makefun                   1.15.0             pyhd8ed1ab_0    conda-forge
 markdown                  3.4.1              pyhd8ed1ab_0    conda-forge
 markdown-it-py            2.1.0              pyhd8ed1ab_0    conda-forge
 markupsafe                2.1.1            py39hb9d737c_2    conda-forge
 matplotlib-inline         0.1.6              pyhd8ed1ab_0    conda-forge
 mdit-py-plugins           0.3.1              pyhd8ed1ab_0    conda-forge
 mdurl                     0.1.0              pyhd8ed1ab_0    conda-forge
 mimesis                   6.1.1              pyhd8ed1ab_0    conda-forge
 mistune                   2.0.4              pyhd8ed1ab_0    conda-forge
 mkl                       2022.1.0           h84fe81f_915    conda-forge
 moto                      4.0.9              pyhd8ed1ab_0    conda-forge
 mpc                       1.2.1                h9f54685_0    conda-forge
 mpfr                      4.1.0                h9202a9a_1    conda-forge
 msgpack-python            1.0.4            py39hf939315_1    conda-forge
 multidict                 6.0.2            py39hb9d737c_2    conda-forge
 myst-nb                   0.17.1             pyhd8ed1ab_0    conda-forge
 myst-parser               0.18.1             pyhd8ed1ab_0    conda-forge
 nbclassic                 0.4.5              pyhd8ed1ab_0    conda-forge
 nbclient                  0.5.13             pyhd8ed1ab_0    conda-forge
 nbconvert                 7.2.3              pyhd8ed1ab_0    conda-forge
 nbconvert-core            7.2.3              pyhd8ed1ab_0    conda-forge
 nbconvert-pandoc          7.2.3              pyhd8ed1ab_0    conda-forge
 nbformat                  5.7.0              pyhd8ed1ab_0    conda-forge
 nbsphinx                  0.8.9              pyhd8ed1ab_0    conda-forge
 nccl                      2.14.3.1             h0800d71_0    conda-forge
 ncurses                   6.3                  h27087fc_1    conda-forge
 nest-asyncio              1.5.6              pyhd8ed1ab_0    conda-forge
 networkx                  2.8.8              pyhd8ed1ab_0    conda-forge
 ninja                     1.11.0               h924138e_0    conda-forge
 nodeenv                   1.7.0              pyhd8ed1ab_0    conda-forge
 notebook                  6.5.1              pyha770c72_0    conda-forge
 notebook-shim             0.2.0              pyhd8ed1ab_0    conda-forge
 numba                     0.56.3           py39h61ddf18_0    conda-forge
 numpy                     1.23.4           py39h3d75532_1    conda-forge
 numpydoc                  1.5.0              pyhd8ed1ab_0    conda-forge
 nvcc_linux-64             11.5                h44f499b_21    conda-forge
 nvtx                      0.2.3            py39hb9d737c_2    conda-forge
 openapi-schema-validator  0.2.3              pyhd8ed1ab_0    conda-forge
 openapi-spec-validator    0.4.0              pyhd8ed1ab_1    conda-forge
 openjpeg                  2.5.0                h7d73246_1    conda-forge
 openssl                   1.1.1s               h166bdaf_0    conda-forge
 orc                       1.7.5                h6c59b99_0    conda-forge
 packaging                 21.3               pyhd8ed1ab_0    conda-forge
 pandas                    1.5.1            py39h4661b88_1    conda-forge
 pandoc                    1.19.2                        0    conda-forge
 pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
 paramiko                  2.11.0             pyhd8ed1ab_0    conda-forge
 parquet-cpp               1.5.1                         2    conda-forge
 parso                     0.8.3              pyhd8ed1ab_0    conda-forge
 partd                     1.3.0              pyhd8ed1ab_0    conda-forge
 pbr                       5.11.0             pyhd8ed1ab_0    conda-forge
 pexpect                   4.8.0              pyh1a96a4e_2    conda-forge
 pickleshare               0.7.5                   py_1003    conda-forge
 pillow                    9.2.0            py39hf3a2cdf_3    conda-forge
 pip                       22.3               pyhd8ed1ab_0    conda-forge
 platformdirs              2.5.2              pyhd8ed1ab_1    conda-forge
 pluggy                    1.0.0              pyhd8ed1ab_5    conda-forge
 pre-commit                2.20.0           py39hf3d152e_1    conda-forge
 prometheus_client         0.15.0             pyhd8ed1ab_0    conda-forge
 prompt-toolkit            3.0.31             pyha770c72_0    conda-forge
 protobuf                  3.20.1           py39h5a03fae_0    conda-forge
 psutil                    5.9.3            py39hb9d737c_1    conda-forge
 pthread-stubs             0.4               h36c2ea0_1001    conda-forge
 ptxcompiler               0.7.0            py39h1eff087_2    conda-forge
 ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
 pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
 py-cpuinfo                9.0.0              pyhd8ed1ab_0    conda-forge
 pyarrow                   9.0.0           py39h58137f1_1_cpu    conda-forge
 pyasn1                    0.4.8                      py_0    conda-forge
 pycparser                 2.21               pyhd8ed1ab_0    conda-forge
 pydata-sphinx-theme       0.11.0             pyhd8ed1ab_1    conda-forge
 pygments                  2.13.0             pyhd8ed1ab_0    conda-forge
 pynacl                    1.5.0            py39hb9d737c_2    conda-forge
 pynvml                    11.4.1             pyhd8ed1ab_0    conda-forge
 pyopenssl                 22.1.0             pyhd8ed1ab_0    conda-forge
 pyorc                     0.7.0            py39h3720fd5_0    conda-forge
 pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
 pyrsistent                0.19.1           py39hb9d737c_0    conda-forge
 pysocks                   1.7.1              pyha2e5f31_6    conda-forge
 pytest                    7.2.0              pyhd8ed1ab_2    conda-forge
 pytest-benchmark          4.0.0              pyhd8ed1ab_0    conda-forge
 pytest-cases              3.6.13             pyhd8ed1ab_0    conda-forge
 pytest-xdist              3.0.2              pyhd8ed1ab_0    conda-forge
 python                    3.9.13          h9a8a25e_0_cpython    conda-forge
 python-confluent-kafka    1.7.0            py39h3811e60_2    conda-forge
 python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
 python-fastjsonschema     2.16.2             pyhd8ed1ab_0    conda-forge
 python-jose               3.3.0              pyh6c4a22f_1    conda-forge
 python-snappy             0.6.0            py39he8e2bb5_2    conda-forge
 python_abi                3.9                      2_cp39    conda-forge
 pytorch                   1.11.0          cuda112py39ha0cca9b_202    conda-forge
 pytz                      2022.6             pyhd8ed1ab_0    conda-forge
 pywin32-on-windows        0.1.0              pyh1179c8e_3    conda-forge
 pyyaml                    6.0              py39hb9d737c_5    conda-forge
 pyzmq                     24.0.1           py39headdf64_1    conda-forge
 rapidjson                 1.1.0             he1b5a44_1002    conda-forge
 re2                       2022.06.01           h27087fc_0    conda-forge
 readline                  8.1.2                h0f457ee_0    conda-forge
 recommonmark              0.7.1              pyhd8ed1ab_0    conda-forge
 regex                     2022.10.31       py39hb9d737c_0    conda-forge
 requests                  2.28.1             pyhd8ed1ab_1    conda-forge
 responses                 0.21.0             pyhd8ed1ab_0    conda-forge
 rhash                     1.4.3                h166bdaf_0    conda-forge
 rmm                       22.12.00a221102 cuda11_py39_gc1af31ff_49    rapidsai-nightly
 rsa                       4.9                pyhd8ed1ab_0    conda-forge
 s2n                       1.0.10               h9b69904_0    conda-forge
 s3fs                      2022.10.0          pyhd8ed1ab_0    conda-forge
 s3transfer                0.6.0              pyhd8ed1ab_0    conda-forge
 sacremoses                0.0.53             pyhd8ed1ab_0    conda-forge
 sarif-om                  1.0.4              pyhd8ed1ab_0    conda-forge
 scikit-build              0.16.1             pyhb871ab6_0    conda-forge
 scipy                     1.9.3            py39hddc5342_1    conda-forge
 sed                       4.8                  he412f7d_0    conda-forge
 send2trash                1.8.0              pyhd8ed1ab_0    conda-forge
 setuptools                65.5.0             pyhd8ed1ab_0    conda-forge
 six                       1.16.0             pyh6c4a22f_0    conda-forge
 sleef                     3.5.1                h9b69904_2    conda-forge
 snappy                    1.1.9                hbd366e4_1    conda-forge
 sniffio                   1.3.0              pyhd8ed1ab_0    conda-forge
 snowballstemmer           2.2.0              pyhd8ed1ab_0    conda-forge
 sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
 soupsieve                 2.3.2.post1        pyhd8ed1ab_0    conda-forge
 spdlog                    1.8.5                h4bd325d_1    conda-forge
 sphinx                    5.3.0              pyhd8ed1ab_0    conda-forge
 sphinx-autobuild          2021.3.14          pyhd8ed1ab_0    conda-forge
 sphinx-copybutton         0.5.0              pyhd8ed1ab_0    conda-forge
 sphinx-markdown-tables    0.0.17             pyh6c4a22f_0    conda-forge
 sphinxcontrib-applehelp   1.0.2                      py_0    conda-forge
 sphinxcontrib-devhelp     1.0.2                      py_0    conda-forge
 sphinxcontrib-htmlhelp    2.0.0              pyhd8ed1ab_0    conda-forge
 sphinxcontrib-jsmath      1.0.1                      py_0    conda-forge
 sphinxcontrib-qthelp      1.0.3                      py_0    conda-forge
 sphinxcontrib-serializinghtml 1.1.5              pyhd8ed1ab_2    conda-forge
 sphinxcontrib-websupport  1.2.4              pyhd8ed1ab_1    conda-forge
 sqlalchemy                1.4.42           py39hb9d737c_1    conda-forge
 sqlite                    3.39.4               h4ff8645_0    conda-forge
 sshpubkeys                3.3.1              pyhd8ed1ab_0    conda-forge
 stack_data                0.6.0              pyhd8ed1ab_0    conda-forge
 streamz                   0.6.4              pyh6c4a22f_0    conda-forge
 sysroot_linux-64          2.17                h4a8ded7_13    conda-forge
 tabulate                  0.9.0              pyhd8ed1ab_1    conda-forge
 tbb                       2021.6.0             h924138e_1    conda-forge
 tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
 terminado                 0.17.0             pyh41d4057_0    conda-forge
 tinycss2                  1.2.1              pyhd8ed1ab_0    conda-forge
 tk                        8.6.12               h27826a3_0    conda-forge
 tokenizers                0.10.3           py39hd6d55de_1    conda-forge
 toml                      0.10.2             pyhd8ed1ab_0    conda-forge
 tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
 toolz                     0.12.0             pyhd8ed1ab_0    conda-forge
 tornado                   6.1              py39hb9d737c_3    conda-forge
 tqdm                      4.64.1             pyhd8ed1ab_0    conda-forge
 traitlets                 5.5.0              pyhd8ed1ab_0    conda-forge
 transformers              4.10.3             pyhd8ed1ab_0    conda-forge
 typing-extensions         4.4.0                hd8ed1ab_0    conda-forge
 typing_extensions         4.4.0              pyha770c72_0    conda-forge
 tzdata                    2022f                h191b570_0    conda-forge
 ukkonen                   1.0.1            py39hf939315_3    conda-forge
 urllib3                   1.26.11            pyhd8ed1ab_0    conda-forge
 virtualenv                20.16.5          py39hf3d152e_1    conda-forge
 wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
 webencodings              0.5.1                      py_1    conda-forge
 websocket-client          1.4.1              pyhd8ed1ab_0    conda-forge
 werkzeug                  2.1.2              pyhd8ed1ab_1    conda-forge
 wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
 wrapt                     1.14.1           py39hb9d737c_1    conda-forge
 xmltodict                 0.13.0             pyhd8ed1ab_0    conda-forge
 xorg-libxau               1.0.9                h7f98852_0    conda-forge
 xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
 xz                        5.2.6                h166bdaf_0    conda-forge
 yaml                      0.2.5                h7f98852_2    conda-forge
 yarl                      1.8.1            py39hb9d737c_0    conda-forge
 zeromq                    4.3.4                h9c3ff4c_1    conda-forge
 zict                      2.2.0              pyhd8ed1ab_0    conda-forge
 zipp                      3.10.0             pyhd8ed1ab_0    conda-forge
 zlib                      1.2.13               h166bdaf_4    conda-forge
 zstd                      1.5.2                h6239696_4    conda-forge

Additional context
Ran into this issue while adding null testing to dask-cudf's groupby tests in #10853.

@wence-
Contributor

wence- commented May 17, 2023

See discussion in #13349

wence- added a commit to wence-/cudf that referenced this issue May 19, 2023
Scan-based groupbys are massaged back into pandas (original dataframe)
order by a post-processing step. Previously, this did the wrong thing
if the grouping key contained null (or NaN) values. In this situation,
dropna=True causes libcudf to produce an output table that is
smaller than the input frame. To mimic pandas, we need to expand this
output to the original frame size, inserting nulls in the missing rows
and reordering correctly.

Furthermore, the previous reordering code had an out-of-bounds memory
access when there were null keys, since we were asking the grouping
object to group a column of the same length as the result, but it
expects columns with the length of the original input (which is larger
when dropna=True and there are null keys).

To fix these issues, compute the reordering on a column of appropriate
size, and, if dropna is true and any of the key columns have nulls, go
down a more expensive reordering path that inserts nulls correctly by
reindexing the result.

- Closes rapidsai#13349
- Closes rapidsai#12055
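As a rough illustration of the post-processing described in the commit message, here is a pure-Python sketch. It is not cuDF's actual implementation, and the function names `grouped_cumsum_sorted` and `restore_original_order` are hypothetical. The first function stands in for a libcudf scan that returns results in grouped (key-sorted) order and drops null-key rows (as with dropna=True). The second scatters those results back into an output sized to the original input, leaving `None` where pandas would produce NaN.

```python
from itertools import groupby
from typing import List, Optional, Sequence, Tuple


def grouped_cumsum_sorted(
    keys: Sequence[Optional[int]], values: Sequence[float]
) -> Tuple[List[int], List[float]]:
    """Hypothetical stand-in for a libcudf scan.

    Returns the original row index of each result (in grouped order)
    and the per-group cumulative sums. Rows with a null (None) key are
    dropped, so the result can be shorter than the input.
    """
    # Stable sort of the non-null-key row indices by key value.
    order = sorted(
        (i for i, k in enumerate(keys) if k is not None),
        key=lambda i: keys[i],
    )
    results: List[float] = []
    for _, rows in groupby(order, key=lambda i: keys[i]):
        total = 0.0
        for i in rows:
            total += values[i]
            results.append(total)
    return order, results


def restore_original_order(
    n_rows: int, order: Sequence[int], results: Sequence[float]
) -> List[Optional[float]]:
    """Scatter grouped-order results back to their original row positions.

    The output is sized to the *input* (n_rows), not to the shorter
    result; rows dropped because of null keys stay None.
    """
    out: List[Optional[float]] = [None] * n_rows
    for row, value in zip(order, results):
        out[row] = value
    return out
```

For keys `[1, None, 0, 1, 0]` and values `[1.0, 2.0, 3.0, 4.0, 5.0]`, the scan returns only four results, and restoring the original order yields `[1.0, None, 3.0, 5.0, 8.0]`. Sizing `out` by the number of results instead of `n_rows` is exactly the length mismatch the commit message describes.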
@rapids-bot rapids-bot bot closed this as completed in f1e8863 May 19, 2023