groupby.rank 'na_option="bottom"' Usage Clarification #22124

Closed
peterpanmj opened this issue Jul 30, 2018 · 0 comments

@peterpanmj (Contributor) commented Jul 30, 2018

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame({'val': [2, np.nan, 2, 8, 2, np.nan, 6]})
   ...: df["key"] = pd.Series(["foo"]*7)

In [2]: df
Out[2]:
   val  key
0  2.0  foo
1  NaN  foo
2  2.0  foo
3  8.0  foo
4  2.0  foo
5  NaN  foo
6  6.0  foo

In [5]: df.groupby("key").rank(na_option="not bottom")
Out[5]:
   val
0  2.0
1  6.5
2  2.0
3  5.0
4  2.0
5  6.5
6  4.0
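For comparison, here is a minimal sketch that ranks the same data once for each of the three valid na_option values. It uses SeriesGroupBy.rank (selecting "val" before calling rank) so that each result is a plain Series, and it keeps the default method="average", which gives tied values the mean of their ranks.

```python
import numpy as np
import pandas as pd

# Same data as In [1] above: a single group ("foo") that contains two NaN values.
df = pd.DataFrame({"val": [2, np.nan, 2, 8, 2, np.nan, 6]})
df["key"] = pd.Series(["foo"] * 7)

# Rank "val" within each "key" group once per valid na_option.
# "keep" leaves NaN unranked, "top" ranks NaN first, "bottom" ranks NaN last.
ranks = {
    opt: df.groupby("key")["val"].rank(na_option=opt)
    for opt in ("keep", "top", "bottom")
}

for opt, result in ranks.items():
    print(opt, result.tolist())
```

The "bottom" result is identical to Out[5], which shows that the invalid value "not bottom" ends up being handled as if na_option="bottom" had been passed.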

Problem description

When an invalid value is passed to the na_option argument of groupby.rank, no ValueError is raised. In Out[5] above, the NaN values are silently ranked last (6.5), as if na_option="bottom" had been passed. Passing the same kind of invalid value to DataFrame.rank or Series.rank raises ValueError("na_option must be one of 'keep', 'top', or 'bottom'").
The expected output is derived from #19499
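Until groupby.rank validates its argument, callers can apply the same membership check that DataFrame.rank performs (see generic.py lines 7523-7525 in the traceback below) before delegating. This is a minimal sketch; the helper name rank_within_groups and its signature are hypothetical and not part of the pandas API.

```python
import numpy as np
import pandas as pd

# Accepted values, mirroring the check DataFrame.rank performs.
VALID_NA_OPTIONS = {"keep", "top", "bottom"}


def rank_within_groups(frame, by, column, na_option="keep", **kwargs):
    """Hypothetical helper: validate na_option, then delegate to groupby.rank.

    Raising here means an invalid value never reaches groupby.rank, so it
    cannot be silently treated as one of the valid options.
    """
    if na_option not in VALID_NA_OPTIONS:
        raise ValueError("na_option must be one of 'keep', 'top', or 'bottom'")
    return frame.groupby(by)[column].rank(na_option=na_option, **kwargs)


df = pd.DataFrame({"val": [2, np.nan, 2, 8, 2, np.nan, 6], "key": ["foo"] * 7})
print(rank_within_groups(df, "key", "val", na_option="bottom").tolist())
try:
    rank_within_groups(df, "key", "val", na_option="not bottom")
except ValueError as exc:
    print("rejected:", exc)
```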

Expected Output

In [1]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame({'val': [2, np.nan, 2, 8, 2, np.nan, 6]})
   ...: df["key"] = pd.Series(["foo"]*7)
   ...:

In [2]: df.rank(na_option="no bottom")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-ec2afc565a7d> in <module>()
----> 1 df.rank(na_option="no bottom")

C:\Users\Public\pandas-peter\pandas\core\generic.py in rank(self, axis, method, numeric_only, na_option, ascending, pct)
   7523         if na_option not in {'keep', 'top', 'bottom'}:
   7524             msg = "na_option must be one of 'keep', 'top', or 'bottom'"
-> 7525             raise ValueError(msg)
   7526
   7527         def ranker(data):

ValueError: na_option must be one of 'keep', 'top', or 'bottom'
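A quick way to check the expected behavior is to trigger both error paths with the same invalid value and compare the messages. This sketch assumes a pandas version that includes the fix referenced in the timeline below (pandas-dev#22125, milestone 0.24.0); on the version reported here, group_msg would be None because no error is raised. The helper error_message is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"val": [2, np.nan, 2, 8, 2, np.nan, 6]})
df["key"] = pd.Series(["foo"] * 7)


def error_message(call):
    """Return the ValueError message raised by call(), or None if nothing is raised."""
    try:
        call()
    except ValueError as exc:
        return str(exc)
    return None


# Trigger both code paths with the same invalid na_option value.
frame_msg = error_message(lambda: df.rank(na_option="not bottom"))
group_msg = error_message(lambda: df.groupby("key").rank(na_option="not bottom"))
print(frame_msg)
print(group_msg)
```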

Output of pd.show_versions()

INSTALLED VERSIONS

commit: d30c4a0
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8
LOCALE: None.None

pandas: 0.24.0.dev0+377.gd30c4a069
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.28.4
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.7
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Jul 30, 2018

@jreback jreback added this to the 0.24.0 milestone Jul 30, 2018

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Jul 31, 2018

@WillAyd WillAyd added the Groupby label Jul 31, 2018

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Aug 1, 2018

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Aug 1, 2018

jreback added a commit that referenced this issue Aug 1, 2018

minggli added a commit to minggli/pandas that referenced this issue Aug 5, 2018

merge master
* master: (47 commits)
  Run tests in conda build [ci skip] (pandas-dev#22190)
  TST: Check DatetimeIndex.drop on DST boundary (pandas-dev#22165)
  CI: Fix Travis failures due to lint.sh on pandas/core/strings.py (pandas-dev#22184)
  Documentation: typo fixes in MultiIndex / Advanced Indexing (pandas-dev#22179)
  DOC: added .join to 'see also' in Series.str.cat (pandas-dev#22175)
  DOC: updated Series.str.contains see also section (pandas-dev#22176)
  0.23.4 whatsnew (pandas-dev#22177)
  fix: scalar timestamp assignment (pandas-dev#19843) (pandas-dev#19973)
  BUG: Fix get dummies unicode error (pandas-dev#22131)
  Fixed py36-only syntax [ci skip] (pandas-dev#22167)
  DEPR: pd.read_table (pandas-dev#21954)
  DEPR: Removing previously deprecated datetools module (pandas-dev#6581) (pandas-dev#19119)
  BUG: Matplotlib scatter datetime (pandas-dev#22039)
  CLN: Use public method to capture UTC offsets (pandas-dev#22164)
  implement tslibs/src to make tslibs self-contained (pandas-dev#22152)
  Fix categorical from codes nan 21767 (pandas-dev#21775)
  BUG: Better handling of invalid na_option argument for groupby.rank(pandas-dev#22124) (pandas-dev#22125)
  use memoryviews instead of ndarrays (pandas-dev#22147)
  Remove depr. warning in SeriesGroupBy.count (pandas-dev#22155)
  API: Default to_* methods to compression='infer' (pandas-dev#22011)
  ...