BUG: Series.rank(pct=True).max() != 1 for a large series of floats #18271
Code Sample, a copy-pastable example if possible
```python
import numpy as np
import pandas as pd

rs = np.random.RandomState(seed=0)
len_df = 20000000
df = pd.DataFrame(data=rs.rand(len_df), columns=['abc']).sort_values('abc')
df['Rank'] = df['abc'].rank()
df['Rank_Pct'] = df['abc'].rank(pct=True)
df['Rank_Pct_Manual'] = df['Rank'] / len_df
df.describe()
```
I have a set of 20 million floats, and I am trying to follow this StackOverflow example, which discusses calculating the percentile ranking of a column either with `rank(pct=True)` or by dividing the rank by the number of rows.
I noticed that the former values (`Rank_Pct`) have a maximum that is not 1, while the latter (`Rank_Pct_Manual`) have the expected maximum of 1.
I have tried this with the latest (0.21.0) version of pandas.
It seems to be related to the number of rows being greater than 2^23 – you can see this by comparing the output when `len_df = 2**23` with the output when `len_df = 2**23 + 1`.
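The 2^23 threshold is suggestive: this is my own hypothesis rather than a confirmed diagnosis, but tied ranks averaged with `method='average'` produce half-integer values, and a float32 (24-bit significand) can represent half-integers exactly only up to 2^23. If any intermediate rank value is held in float32, precision is lost exactly past that boundary:

```python
import numpy as np

# Half-integer ranks (from averaged ties) need one more bit of
# significand than whole integers. float32 has a 24-bit significand,
# so half-integers are exact only up to 2**23.
below = np.float32(2**23 - 0.5)  # still exactly representable
above = np.float32(2**23 + 0.5)  # rounds (ties-to-even) to 2**23

print(below == 2**23 - 0.5)  # True
print(above == 2**23 + 0.5)  # False: the .5 is lost
print(above == 2**23)        # True
```

This would explain why the percentile ranks stop agreeing with the manual calculation once the row count exceeds 2^23.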
I believe I have identified the responsible code, and I'm working on a PR now.
I would expect the values of `Rank_Pct` to match `Rank_Pct_Manual`, with a maximum of exactly 1.