Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rank 'na_option="bottom"' Usage Clarification #19499

Closed
WillAyd opened this issue Feb 1, 2018 · 6 comments · Fixed by #22037
Closed

Rank 'na_option="bottom"' Usage Clarification #19499

WillAyd opened this issue Feb 1, 2018 · 6 comments · Fixed by #22037

Comments

@WillAyd
Copy link
Member

WillAyd commented Feb 1, 2018

Code Sample, a copy-pastable example if possible

df = pd.DataFrame({'val': [2, np.nan, 2, 8, 2, np.nan, 6]})

# Works as documented - missing values are highest rank
In []: df.rank(na_option='top')
Out []: 
   val
0  4.0
1  1.5
2  4.0
3  7.0
4  4.0
5  1.5
6  6.0

# Technically works - missing values are lowest rank
In []: df.rank(na_option='bottom')
Out []: 
   val
0  2.0
1  6.5
2  2.0
3  5.0
4  2.0
5  6.5
6  4.0 

# However, we could say anything besides 'foo'
In []: df.rank(na_option='foo')
Out []: 
   val
0  2.0
1  6.5
2  2.0
3  5.0
4  2.0
5  6.5
6  4.0 

Problem description

For the sake of being explicit it would be better to raise for an unknown na_option, or alternately update the documentation to reflect that any value outside of 'keep' and 'top' would trigger this behavior

INSTALLED VERSIONS

commit: d3f7d2a
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+169.gd3f7d2a66.dirty
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.1
openpyxl: 2.5.0b1
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.13
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: 0.1.2
fastparquet: 0.1.3
pandas_gbq: None
pandas_datareader: None

@peterpanmj
Copy link
Contributor

I think raising a value error is better so that users will know when a argument is passed by mistake, e.g
df.rank(na_option=True)

@WillAyd
Copy link
Member Author

WillAyd commented Jul 23, 2018

@peterpanmj agreed on the ValueError - PRs are welcome if interested!

@WillAyd WillAyd added this to the Contributions Welcome milestone Jul 23, 2018
@raguiar2
Copy link
Contributor

I'm trying to create a PR for this, but I keep getting the error
remote: Permission to pandas-dev/pandas.git denied to raguiar2.
Any advice?

@WillAyd
Copy link
Member Author

WillAyd commented Jul 24, 2018

I'm assuming from the message you are trying to push directly to the pandas repo instead of to your own. Assuming you have your own fork established as origin you could just do:

git push origin your-branch-name

If in doubt be sure to check out the forking section of the contributing guide as well:

https://pandas.pydata.org/pandas-docs/stable/contributing.html#forking

@peterpanmj
Copy link
Contributor

It is not fully solved.

In [1]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame({'val': [2, np.nan, 2, 8, 2, np.nan, 6]})
   ...: df["key"] = pd.Series(["foo"]*7)

In [2]: df
Out[2]:
   val  key
0  2.0  foo
1  NaN  foo
2  2.0  foo
3  8.0  foo
4  2.0  foo
5  NaN  foo
6  6.0  foo

In [5]: df.groupby("key").rank(na_option="not bottom")
Out[5]:
   val
0  2.0
1  6.5
2  2.0
3  5.0
4  2.0
5  6.5
6  4.0

The above is the behavior from current master ( pandas: 0.24.0.dev0+377.gd30c4a069)
It is always tricky when it comes to method implemented separately for groupby and none-groupby.
I am working on a PR. Reopen it or should I submit another issue ?

@WillAyd
Copy link
Member Author

WillAyd commented Jul 30, 2018

Thanks for the callout. I didn't realize in the PR that closed this but we may have only fixed this when dealing with object dtypes.

@peterpanmj better to open a separate issue at this point. cc @raguiar2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants