Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: replace pandas.compat.scipy.scoreatpercentile with numpy.percentile #6810

Merged
merged 2 commits into from
Apr 16, 2014

Conversation

gdraps
Copy link
Contributor

@gdraps gdraps commented Apr 5, 2014

PR to fix #5824.

  • Replaces compat.scipy.scoreatpercentile with numpy.percentile.
  • Sets axis=0 in a few tests, because axis=None by default on numpy.percentile, which returns a scalar (i.e., operates on a flattened version of the array).
  • Fixes a test that fails after the switch due to differences in how fractions are computed. (Created a np.timedelta64 directly since to_timedelta doesn't support enough precision.)

Let me know if you'd like to see any changes..

@@ -229,7 +229,7 @@ def test_timedelta_ops(self):

result = td.quantile(.1)
# This properly returned a scalar.
expected = to_timedelta('00:00:02.6')
expected = np.timedelta64(2599999999,'ns')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a rounding issue yes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, Julian looked into the difference in rounding methods between pandas.compat.scipy.scoreatpercentile and numpy in a comment on #5824.. and also offered to update numpy. do you think this hard-coded expect should be removed and expect whatever numpy.percentile returns, in case they do change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I think this is ok to change it, sort of go with np.percentile results.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do a fuzzy comparison instead of equality? (I guess as its an integers almost_equal does not work)
I may still update numpy as this method saves a few precious cycles for small percentiles

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juliantaylor this pr replace our original method so using np.percentile now fully
and ok with all numpy (incl numpy master)

if u do make a change in numpy master then we can change the test (to more of a allclose one)

@jreback jreback added this to the 0.14.0 milestone Apr 5, 2014
@jreback jreback added the Algos label Apr 5, 2014
@jreback
Copy link
Contributor

jreback commented Apr 5, 2014

pls add a release note referinc the issue (do it in API changes, even though just a compat change).

@jreback
Copy link
Contributor

jreback commented Apr 5, 2014

side note, I am trying to figure out why @wesm included this in the first place, as I think np.percentile has been around a while? any idea

@gdraps
Copy link
Contributor Author

gdraps commented Apr 15, 2014

@jreback, sorry for the delay, but I've added a second commit to this PR with:

  • added to release notes

  • Series.quantile on datetime[ns] series changed to return a Timestamp object (with an explicit test)

  • added a test for Series.quantile on timedelta64 series, which didn't require any
    special handling, though the test is skipped at numpy 1.6.x, because
    numpy.percentile throws the following exception with timedelta64 objects
    pre-1.7:

        TypeError: ufunc 'multiply' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'
    
  • removed timedelta64 handling (introduced by BUG: Series.quantile raising on an object dtype (GH6555) #6558) in Series.quantile,
    because not np.isscalar is always false and, in my testing,
    numpy.percentile properly returns timedelta64 objects on numpy 1.7 & up.

No idea why np.percentile wasn't used originally, but will note that scipy.stats.scoreatpercentile existed first. Perhaps, when the scipy dependency was removed from pandas, the difference in rounding made copying scoreatpercentile the easiest way forward. Will also note that scipy has deprecated scoreatpercentile in favor of np.percentile.

if com.is_datetime64_dtype(self):
from pandas.tseries.tools import to_datetime
values = _values_from_object(self).view('i8')
result = to_datetime(_quantile(values, q * 100))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just make this a call to Timestamp rather than use to_datetime

@jreback
Copy link
Contributor

jreback commented Apr 15, 2014

@gdraps looks good

just a couple of minor comments

timedeltas are thoroughly broken under numpy < 1.7
so I gave up on trying to make them work a while ago

@gdraps
Copy link
Contributor Author

gdraps commented Apr 16, 2014

@jreback, updated the last commit of this PR based on your feedback -- let me know if you see anything else

jreback added a commit that referenced this pull request Apr 16, 2014
CLN: replace pandas.compat.scipy.scoreatpercentile with numpy.percentile
@jreback jreback merged commit f30278e into pandas-dev:master Apr 16, 2014
@jreback
Copy link
Contributor

jreback commented Apr 16, 2014

@gdraps thanks for the work on this gr8!

@gdraps gdraps deleted the replace-scoreatpercentile branch April 16, 2014 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Compat pandas objects compatability with Numpy or Python functions Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Numpy 1.8 DeprecationWarning in compat/scipy.py
3 participants