Rank 'na_option="bottom"' Usage Clarification #19499

WillAyd · 2018-02-01T23:51:43Z

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [2, np.nan, 2, 8, 2, np.nan, 6]})

# Works as documented - missing values are ranked first (smallest rank values)
In []: df.rank(na_option='top')
Out []: 
   val
0  4.0
1  1.5
2  4.0
3  7.0
4  4.0
5  1.5
6  6.0

# Technically works - missing values are ranked last (largest rank values)
In []: df.rank(na_option='bottom')
Out []: 
   val
0  2.0
1  6.5
2  2.0
3  5.0
4  2.0
5  6.5
6  4.0

# However, any other value, such as 'foo', is silently accepted and behaves like 'bottom'
In []: df.rank(na_option='foo')
Out []: 
   val
0  2.0
1  6.5
2  2.0
3  5.0
4  2.0
5  6.5
6  4.0
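
For comparison, the three documented values can be exercised side by side. This is a minimal sketch that reuses the frame above; the 'keep' result is not part of the original report and is included only to show that missing values stay NaN under that option.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [2, np.nan, 2, 8, 2, np.nan, 6]})

# Rank the same column once per documented na_option value.
ranks = {opt: df['val'].rank(na_option=opt) for opt in ('keep', 'top', 'bottom')}

for opt, result in ranks.items():
    print(opt, result.tolist())
```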

Problem description

To be explicit, it would be better to raise for an unknown na_option or, alternatively, to update the documentation to reflect that any value other than 'keep' or 'top' triggers the 'bottom' behavior.
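
The kind of check being asked for could look roughly like the sketch below. Note that `validate_na_option` and `VALID_NA_OPTIONS` are hypothetical names used for illustration; this is not the pandas implementation, only one way such validation might be written.

```python
# Hypothetical names, for illustration only; not part of the pandas API.
VALID_NA_OPTIONS = ('keep', 'top', 'bottom')


def validate_na_option(na_option):
    """Return na_option unchanged, or raise ValueError if it is not recognised."""
    # Check the type first so that non-string inputs (True, None, lists)
    # are rejected with the same error as unknown strings.
    if not isinstance(na_option, str) or na_option not in VALID_NA_OPTIONS:
        raise ValueError(
            f"na_option must be one of {VALID_NA_OPTIONS}, got {na_option!r}"
        )
    return na_option
```

With a check like this, both `validate_na_option('foo')` and `validate_na_option(True)` would fail loudly instead of silently falling through to the 'bottom' behavior.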

INSTALLED VERSIONS

commit: d3f7d2a
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+169.gd3f7d2a66.dirty
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.1
openpyxl: 2.5.0b1
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.13
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: 0.1.2
fastparquet: 0.1.3
pandas_gbq: None
pandas_datareader: None


peterpanmj · 2018-07-23T09:52:22Z

I think raising a ValueError is better, so that users will know when an argument is passed by mistake, e.g.
df.rank(na_option=True)

WillAyd · 2018-07-23T15:39:33Z

@peterpanmj agreed on the ValueError - PRs are welcome if interested!

raguiar2 · 2018-07-24T06:00:00Z

I'm trying to create a PR for this, but I keep getting the error
remote: Permission to pandas-dev/pandas.git denied to raguiar2.
Any advice?

WillAyd · 2018-07-24T06:02:23Z

I'm assuming from the message that you are trying to push directly to the pandas repo instead of to your own fork. Assuming you have your own fork set up as origin, you could just do:

git push origin your-branch-name

If in doubt be sure to check out the forking section of the contributing guide as well:

https://pandas.pydata.org/pandas-docs/stable/contributing.html#forking


peterpanmj · 2018-07-30T01:39:53Z

It is not fully solved.

In [1]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame({'val': [2, np.nan, 2, 8, 2, np.nan, 6]})
   ...: df["key"] = pd.Series(["foo"]*7)

In [2]: df
Out[2]:
   val  key
0  2.0  foo
1  NaN  foo
2  2.0  foo
3  8.0  foo
4  2.0  foo
5  NaN  foo
6  6.0  foo

In [5]: df.groupby("key").rank(na_option="not bottom")
Out[5]:
   val
0  2.0
1  6.5
2  2.0
3  5.0
4  2.0
5  6.5
6  4.0

The above is the behavior on current master (pandas: 0.24.0.dev0+377.gd30c4a069).
It is always tricky when methods are implemented separately for the groupby and non-groupby code paths.
I am working on a PR. Should this issue be reopened, or should I submit a separate one?
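
A small check like the one below can compare the two code paths directly. It is only a sketch: `raises_value_error` is a hypothetical helper, not part of pandas. On a build that contains both fixes referenced in this thread, both entries should be True; on the master build quoted above, the groupby entry would be False.

```python
import numpy as np
import pandas as pd


def raises_value_error(func):
    """Return True if calling func() raises ValueError, else False."""
    try:
        func()
    except ValueError:
        return True
    return False


df = pd.DataFrame({'val': [2, np.nan, 2, 8, 2, np.nan, 6]})
df['key'] = 'foo'

# Exercise both implementations with the same invalid option.
results = {
    'DataFrame.rank': raises_value_error(
        lambda: df[['val']].rank(na_option='not bottom')
    ),
    'groupby.rank': raises_value_error(
        lambda: df.groupby('key').rank(na_option='not bottom')
    ),
}
print(results)
```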


WillAyd · 2018-07-30T02:22:56Z

Thanks for the callout. I didn't realize it in the PR that closed this, but we may have only fixed this when dealing with object dtypes.

@peterpanmj better to open a separate issue at this point. cc @raguiar2


WillAyd added this to the Contributions Welcome milestone Jul 23, 2018

WillAyd added the Groupby, Effort Low, and good first issue labels Jul 23, 2018

raguiar2 mentioned this issue Jul 24, 2018

Raised value error on incorrect na_option #22037

Merged

4 tasks

TomAugspurger closed this as completed in #22037 Jul 25, 2018

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Jul 29, 2018

BUG: Better handling of invalid na_option argument for groupby.rank(p…

7a985f1

…andas-dev#19499)

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Jul 30, 2018

BUG: Better handling of invalid na_option argument for groupby.rank(p…

2bd4145

…andas-dev#19499)

peterpanmj mentioned this issue Jul 30, 2018

groupby.rank 'na_option="bottom"' Usage Clarification #22124

Closed

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Jul 30, 2018

BUG: Better handling of invalid na_option argument for groupby.rank(p…

8029fe0

…andas-dev#19499)

peterpanmj mentioned this issue Jul 30, 2018

BUG: Better handling of invalid na_option argument for groupby.rank #22125

Merged

4 tasks
