ZeroDivisionError when groupby rank with method="dense" and pct=True #23666

njryo · 2018-11-13T14:30:34Z

Calling the groupby rank function with the method="dense" and pct=True options raises a ZeroDivisionError.

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2],
                   "B": [1, 1, 1, 1, 2, 2],
                   "C": [1, 2, 1, 1, 1, 2]})
df.groupby(["A", "B"])["C"].rank(method="dense", pct=True)

error:

Traceback (most recent call last):
  File "c:/Users/<user_name>/Documents/test.py", line 6, in <module>
    df.groupby(["A", "B"])["C"].rank(method="dense", pct=True)
  File "C:\Users\<user_name>\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 1906, in rank
    na_option=na_option, pct=pct, axis=axis)
  File "C:\Users\<user_name>\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 1025, in _cython_transform
    **kwargs)
  File "C:\Users\<user_name>\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 2630, in transform
    return self._cython_operation('transform', values, how, axis, **kwargs)
  File "C:\Users\<user_name>\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 2590, in _cython_operation
    **kwargs)
  File "C:\Users\<user_name>\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 2664, in _transform
    transform_func(result, values, comp_ids, is_datetimelike, **kwargs)
  File "C:\Users\<user_name>\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 2479, in wrapper
    return f(afunc, *args, **kwargs)
  File "C:\Users\<user_name>\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 2431, in <lambda>
    kwargs.get('na_option', 'keep')
  File "pandas\_libs\groupby_helper.pxi", line 1292, in pandas._libs.groupby.group_rank_int64
ZeroDivisionError: float division

Problem description

I encountered a ZeroDivisionError when I called the groupby rank function.

I can't tell exactly what the problem is, but when I drop either the method="dense" or the pct=True option, the code above works.

If some elements in the DataFrame above are changed, the error disappears. For example, the following code gives the expected output.

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2],
                   "B": [1, 1, 1, 1, 2, 2],
                   "C": [1, 2, 1, 0, 1, 2]}) # a little change in column C
df.groupby(["A", "B"])["C"].rank(method="dense", pct=True)

output:

0    0.5
1    1.0
2    0.5
3    1.0
4    0.5
5    1.0
Name: C, dtype: float64
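Since the call works without pct=True, one possible workaround on an affected version is to compute the percentage by hand. The sketch below assumes that a dense percentile rank is the dense rank divided by the number of distinct values in each group, which is consistent with the expected output above. The names `grouped`, `dense`, `n_distinct` and `pct` are illustrative only.

```python
import pandas as pd

# The DataFrame from the failing example in this report.
df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2],
                   "B": [1, 1, 1, 1, 2, 2],
                   "C": [1, 2, 1, 1, 1, 2]})

grouped = df.groupby(["A", "B"])["C"]

# Dense rank without pct=True, which the report says does not raise.
dense = grouped.rank(method="dense")

# Assumption: the denominator is the count of distinct values per group.
n_distinct = grouped.transform("nunique")

pct = dense / n_distinct
print(pct)
```

For this data, each group's largest value maps to 1.0, and the single-row group (A, B) = (2, 1) also maps to 1.0.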

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 3.0.0
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.5
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Koustav-Samaddar · 2018-11-22T22:19:46Z

I think this is caused by groups of size 1. Removing the (A, B) = (2, 1) group makes the error go away.

@WillAyd Do you mind if I tackle this? It's my first time contributing to pandas, but I think I have a rough idea of how to fix the problem.
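To see which groups this hypothesis points at, the group sizes can be listed directly. This is only a quick check on the data from the report, not a diagnosis of the underlying bug, and the names `sizes` and `singletons` are illustrative.

```python
import pandas as pd

# The DataFrame from the failing example in this report.
df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2],
                   "B": [1, 1, 1, 1, 2, 2],
                   "C": [1, 2, 1, 1, 1, 2]})

# Number of rows in each (A, B) group.
sizes = df.groupby(["A", "B"]).size()

# Keep only the groups that contain a single row.
singletons = sizes[sizes == 1]
print(singletons)
```

Only (A, B) = (2, 1) has a single row here. The working variant in the original report keeps that same size-1 group with a different value in C, so group size alone may not be the whole story.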

WillAyd · 2018-11-22T22:26:37Z

Go for it!

Koustav-Samaddar · 2018-11-22T23:36:13Z

@WillAyd Sorry to bother you, but I have a quick question about bug fix contributions.

While roughly testing the fix, I found another bug in groupby rank (I checked that it also exists without my fix). Should I fix it in another commit in the same PR, or should I open a new issue and go from there?

The new bug isn't directly related to the current one but arises from nearby code.

WillAyd · 2018-11-23T00:55:10Z

It's typically best to create a separate issue and PR for tracking and review purposes.

WillAyd added Bug Groupby labels Nov 13, 2018

Koustav-Samaddar mentioned this issue Nov 23, 2018

ZeroDivisionError when groupby rank with method="dense" and pct=True #23864

Merged

jreback added this to the 0.24.0 milestone Dec 3, 2018

WillAyd closed this as completed in #23864 Dec 3, 2018
