Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Use total_tie_count to normalize dense ranking #18297

Closed
wants to merge 1 commit into from
Closed

BUG: Use total_tie_count to normalize dense ranking #18297

wants to merge 1 commit into from

Conversation

proinsias
Copy link
Contributor

As reported in #18296, for a Series with repeated values, Series.rank(pct=True, method='dense').max() may not be <=1 as expected.

This is due to the division of the ranks by the total number of elements in the Series, instead of the maximum rank assigned. Here we update the calculation.

As reported in #18296, for a `Series` with repeated values,
`Series.rank(pct=True, method='dense').max()` may not be `<=1` as
expected.

This is due to the division of the ranks by the total number of
elements in the `Series`, instead of the maximum rank assigned. Here
we update the calculation.
@proinsias
Copy link
Contributor Author

I've confirmed that the updated code gives the expected result from the example in #18296.

@codecov
Copy link

codecov bot commented Nov 15, 2017

Codecov Report

Merging #18297 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18297      +/-   ##
==========================================
- Coverage    91.4%   91.38%   -0.02%     
==========================================
  Files         164      164              
  Lines       49880    49880              
==========================================
- Hits        45592    45583       -9     
- Misses       4288     4297       +9
Flag Coverage Δ
#multiple 89.19% <ø> (ø) ⬆️
#single 39.42% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 148ed63...b7a81a9. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Nov 15, 2017

how about some tests

@jreback
Copy link
Contributor

jreback commented Nov 15, 2017

see #15630 (and linked PR soln but has comments).

@jreback jreback added Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Nov 15, 2017
@@ -117,7 +117,7 @@ Reshaping
Numeric
^^^^^^^

-
- Fixed incorrect maximum :func:`Series.rank` percentile when using the `dense` method with repeated values (:issue:`18296`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to 0.23

@jreback
Copy link
Contributor

jreback commented Feb 10, 2018

closing as stale. ping if you want to update.

@jreback jreback closed this Feb 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Series.rank(pct=True, method='dense').max() != 1 for repeated values
2 participants