BUG:Can't calculate quantiles from Int64Dtype Series when results are floats #42936

Merged
merged 6 commits into from
Aug 10, 2021

debnathshoham
Member

@debnathshoham debnathshoham commented Aug 8, 2021

@jreback jreback added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff ExtensionArray Extending pandas with custom dtypes or arrays. labels Aug 8, 2021
@simonjayhawkins simonjayhawkins added this to the 1.3.2 milestone Aug 9, 2021
@simonjayhawkins
Member

Needs a release note. This fixes a regression, so targeting 1.3.2.

@jreback jreback merged commit 14cf6e2 into pandas-dev:master Aug 10, 2021
@jreback
Contributor

jreback commented Aug 10, 2021

thanks @debnathshoham

@jreback
Contributor

jreback commented Aug 10, 2021

@meeseeksdev backport 1.3.x

@lumberbot-app

lumberbot-app bot commented Aug 10, 2021

Something went wrong ... Please have a look at my logs.

@debnathshoham debnathshoham deleted the gh42626 branch August 10, 2021 20:07
jreback pushed a commit that referenced this pull request Aug 10, 2021
…ies when results are floats (#42974)

Co-authored-by: Shoham Debnath <debnathshoham@gmail.com>
@simonjayhawkins
Member

In summary, and for the record, this changed the behavior from 1.2.5.

1.2.5 always returned an object-dtype Series of floats when q is list-like. If q is a scalar, the return type was always a Python float (as documented at https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.quantile.html).

>>> pd.__version__
'1.2.5'
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile([0, 0.75])
>>> result
0.00    1.0
0.75    2.5
dtype: object
>>> 
>>> type(result[0])
<class 'float'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile([0])
>>> result
0.0    1.0
dtype: object
>>> 
>>> type(result[0])
<class 'float'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile([0.75])
>>> result
0.75    2.5
dtype: object
>>> 
>>> type(result[0.75])
<class 'float'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile(0.75)
>>> result
2.5
>>> 
>>> type(result)
<class 'float'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile(0)
>>> result
1.0
>>> 
>>> type(result)
<class 'float'>

1.3.2 now returns a nullable integer (Int64) or numpy float (float64) Series when q is list-like, depending on the values in the result. If q is a scalar, the return type is now a numpy float or a numpy int, which is inconsistent with the documentation.

>>> pd.__version__
'1.4.0.dev0+415.g99cf794ae2'
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile([0, 0.75])
>>> result
0.00    1.0
0.75    2.5
dtype: float64
>>> 
>>> type(result[0])
<class 'numpy.float64'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile([0])
>>> result
0.0    1
dtype: Int64
>>> 
>>> type(result[0])
<class 'numpy.int64'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile([0.75])
>>> result
0.75    2.5
dtype: float64
>>> 
>>> type(result[0.75])
<class 'numpy.float64'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile(0.75)
>>> result
2.5
>>> 
>>> type(result)
<class 'numpy.float64'>
>>> 
>>> result = pd.Series([1, 2, 3, pd.NA], dtype="Int64").quantile(0)
>>> result
1
>>> 
>>> type(result)
<class 'numpy.int64'>
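
The rule this session exhibits can be written down as a small pure-Python sketch. `result_dtype` and `scalar_result` are hypothetical helpers that model only the observable behavior above (Python `int`/`float` stand in for `numpy.int64`/`numpy.float64`); they are not how pandas implements it.

```python
from typing import List, Union

Number = Union[int, float]


def result_dtype(quantiles: List[float]) -> str:
    """Dtype a list-like q produces in the 1.3.2 session (hypothetical helper)."""
    # The nullable integer dtype survives only if no quantile has a fractional part.
    if all(float(x).is_integer() for x in quantiles):
        return "Int64"
    return "float64"


def scalar_result(quantile: float) -> Number:
    """Scalar q: an integer when the value is whole, else a float (hypothetical helper)."""
    return int(quantile) if float(quantile).is_integer() else quantile


# Quantile values taken from the sessions above (q=0 -> 1.0, q=0.75 -> 2.5).
print(result_dtype([1.0, 2.5]))  # float64
print(result_dtype([1.0]))       # Int64
print(result_dtype([2.5]))       # float64
print(repr(scalar_result(1.0)), repr(scalar_result(2.5)))  # 1 2.5
```

So the output type depends on the data, not only on the input dtype and the shape of q. If downstream code needs float results in both 1.2.5 and 1.3.2, one option is to cast before calling quantile, e.g. `s.astype("float64").quantile([0, 0.75])`, which turns pd.NA into NaN and takes the plain float64 path.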

feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021
Development

Successfully merging this pull request may close these issues.

BUG: Cannot calculate quantiles from Int64Dtype Series when results are floats
4 participants