groupby.rank 'na_option="bottom"' Usage Clarification #22124

Closed
peterpanmj opened this issue Jul 30, 2018 · 0 comments · Fixed by #22125
Labels
Error Reporting Incorrect or improved errors from pandas Groupby Reshaping Concat, Merge/Join, Stack/Unstack, Explode

@peterpanmj
Contributor

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame({'val': [2, np.nan, 2, 8, 2, np.nan, 6]})
   ...: df["key"] = pd.Series(["foo"]*7)

In [2]: df
Out[2]:
   val  key
0  2.0  foo
1  NaN  foo
2  2.0  foo
3  8.0  foo
4  2.0  foo
5  NaN  foo
6  6.0  foo

In [5]: df.groupby("key").rank(na_option="not bottom")
Out[5]:
   val
0  2.0
1  6.5
2  2.0
3  5.0
4  2.0
5  6.5
6  4.0
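
The ranks above match what ascending, average-tie ranking produces when the two NaN rows are placed last: the three 2.0 values share rank 2.0, 6.0 and 8.0 get 4.0 and 5.0, and the NaNs share (6 + 7) / 2 = 6.5. The following is a minimal pure-Python sketch of that arithmetic, not pandas' implementation; the helper name `average_rank_na_bottom` is hypothetical.

```python
import math


def average_rank_na_bottom(values):
    """Ascending ranks with average tie-breaking; NaNs are ranked last.

    Illustrative only: this is not how pandas computes ranks internally.
    """
    def sort_key(i):
        v = values[i]
        # NaNs get a key that sorts after every real number and
        # compares equal to other NaNs, so they form one tie group.
        return (1, 0.0) if math.isnan(v) else (0, v)

    order = sorted(range(len(values)), key=sort_key)
    ranks = [0.0] * len(values)

    start = 0
    while start < len(order):
        # Extend `end` over the run of positions that tie with `start`.
        end = start
        while end + 1 < len(order) and sort_key(order[end + 1]) == sort_key(order[start]):
            end += 1
        # Average of the 1-based positions covered by the tie group.
        avg = ((start + 1) + (end + 1)) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


vals = [2, float("nan"), 2, 8, 2, float("nan"), 6]
print(average_rank_na_bottom(vals))
# [2.0, 6.5, 2.0, 5.0, 2.0, 6.5, 4.0]
```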

Problem description

When an invalid value is passed to groupby.rank for the na_option argument, no ValueError is raised, although one is expected. The same invalid value raises ValueError("na_option must be one of 'keep', 'top', or 'bottom'") in DataFrame.rank and Series.rank.
The expected output is derived from #19499.

Expected Output

In [1]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame({'val': [2, np.nan, 2, 8, 2, np.nan, 6]})
   ...: df["key"] = pd.Series(["foo"]*7)
   ...:

In [2]: df.rank(na_option="no bottom")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-ec2afc565a7d> in <module>()
----> 1 df.rank(na_option="no bottom")

C:\Users\Public\pandas-peter\pandas\core\generic.py in rank(self, axis, method, numeric_only, na_option, ascending, pct)
   7523         if na_option not in {'keep', 'top', 'bottom'}:
   7524             msg = "na_option must be one of 'keep', 'top', or 'bottom'"
-> 7525             raise ValueError(msg)
   7526
   7527         def ranker(data):

ValueError: na_option must be one of 'keep', 'top', or 'bottom'
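
The check visible in the traceback (generic.py, lines 7523 to 7525) can be sketched as a standalone helper that a groupby rank path could call before doing any work. This is only an illustration of the requested behavior: `validate_na_option` and `VALID_NA_OPTIONS` are hypothetical names, and the sketch does not claim to be how the fix in #22125 was implemented.

```python
# Hypothetical names; the set and message are copied from the traceback above.
VALID_NA_OPTIONS = {"keep", "top", "bottom"}


def validate_na_option(na_option):
    """Return na_option unchanged if it is valid, otherwise raise ValueError."""
    if na_option not in VALID_NA_OPTIONS:
        msg = "na_option must be one of 'keep', 'top', or 'bottom'"
        raise ValueError(msg)
    return na_option


# A valid value passes through untouched.
print(validate_na_option("bottom"))

# An invalid value fails loudly instead of being silently accepted.
try:
    validate_na_option("not bottom")
except ValueError as err:
    print(f"ValueError: {err}")
```

Validating up front means a misspelled option fails before any ranking is done, which is the same behavior DataFrame.rank shows in the traceback.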

Output of pd.show_versions()

INSTALLED VERSIONS

commit: d30c4a0
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8
LOCALE: None.None

pandas: 0.24.0.dev0+377.gd30c4a069
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.28.4
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.7
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Error Reporting Incorrect or improved errors from pandas labels Jul 30, 2018
@jreback jreback added this to the 0.24.0 milestone Jul 30, 2018
minggli added a commit to minggli/pandas that referenced this issue Aug 5, 2018
* master: (47 commits)
  Run tests in conda build [ci skip] (pandas-dev#22190)
  TST: Check DatetimeIndex.drop on DST boundary (pandas-dev#22165)
  CI: Fix Travis failures due to lint.sh on pandas/core/strings.py (pandas-dev#22184)
  Documentation: typo fixes in MultiIndex / Advanced Indexing (pandas-dev#22179)
  DOC: added .join to 'see also' in Series.str.cat (pandas-dev#22175)
  DOC: updated Series.str.contains see also section (pandas-dev#22176)
  0.23.4 whatsnew (pandas-dev#22177)
  fix: scalar timestamp assignment (pandas-dev#19843) (pandas-dev#19973)
  BUG: Fix get dummies unicode error (pandas-dev#22131)
  Fixed py36-only syntax [ci skip] (pandas-dev#22167)
  DEPR: pd.read_table (pandas-dev#21954)
  DEPR: Removing previously deprecated datetools module (pandas-dev#6581) (pandas-dev#19119)
  BUG: Matplotlib scatter datetime (pandas-dev#22039)
  CLN: Use public method to capture UTC offsets (pandas-dev#22164)
  implement tslibs/src to make tslibs self-contained (pandas-dev#22152)
  Fix categorical from codes nan 21767 (pandas-dev#21775)
  BUG: Better handling of invalid na_option argument for groupby.rank(pandas-dev#22124) (pandas-dev#22125)
  use memoryviews instead of ndarrays (pandas-dev#22147)
  Remove depr. warning in SeriesGroupBy.count (pandas-dev#22155)
  API: Default to_* methods to compression='infer' (pandas-dev#22011)
  ...

3 participants