ENH: make Series.ptp() handle missing values #11163

Closed
ajcr opened this Issue Sep 21, 2015 · 4 comments

3 participants
Contributor

ajcr commented Sep 21, 2015

Currently (in master), Series.ptp() is just implemented using np.ptp() and so the method will return nan for any Series that has one or more missing values:

>>> s = pd.Series([5, 0, np.nan, -3, 2])
>>> s.ptp()
nan

It is simple to write s.max() - s.min() instead, but the ptp() result is surprising as most pandas methods are designed to handle missing data gracefully. I think most users would expect the ptp() method to ignore NaN.
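A minimal sketch of the difference, for illustration only. It calls np.ptp() on the underlying array rather than Series.ptp() itself, on the assumption that this mirrors what the method does in master:

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 0, np.nan, -3, 2])

# np.ptp() on the raw values propagates the missing value.
raw_range = np.ptp(s.to_numpy())

# Series.max() and Series.min() skip NaN by default (skipna=True),
# so their difference is the range of the non-missing values.
nan_aware_range = s.max() - s.min()

print(raw_range)        # nan
print(nan_aware_range)  # 8.0
```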

If there is any agreement as to whether ptp() should be changed, I would like to work on a pull request!


Extending the idea, it might be useful to have both DataFrame.ptp() and groupby.ptp() methods.

For this example DataFrame...

df = pd.DataFrame({'a': [1, 2, 2, 1, 1],
                   'b': [3, 11, 72, 46, 32],
                   'c': [1.2, 6.7, 13.9, np.nan, -7.7],
                   'd': ['v', 'w', 'x', 'y', 'z']})

...I would expect the following behaviour:

>>> df.ptp()
a     1.0
b    69.0
c    21.6
dtype: float64

>>> df.ptp(axis=1)
0     2.0
1     9.0
2    70.0
3    45.0
4    39.7
dtype: float64

>>> df.groupby('a').ptp()
    b    c
a         
1  43  8.9
2  61  7.2

Again, if there is any consensus from the community on whether these additional methods should be added, I'd be happy to work on the pull request.
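Until such methods exist, the same results can be obtained from existing max() and min() calls. The following is only a sketch of that workaround, not a proposed implementation; the variable names are illustrative, and the non-numeric column 'd' is dropped with select_dtypes() as an assumption about how it should be treated:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 1, 1],
                   'b': [3, 11, 72, 46, 32],
                   'c': [1.2, 6.7, 13.9, np.nan, -7.7],
                   'd': ['v', 'w', 'x', 'y', 'z']})

# Keep only numeric columns; subtraction is undefined for strings.
numeric = df.select_dtypes(include='number')

# Range per column and per row; max()/min() skip NaN by default.
column_range = numeric.max() - numeric.min()
row_range = numeric.max(axis=1) - numeric.min(axis=1)

# Range per group for the numeric, non-key columns.
grouped = df.groupby('a')[['b', 'c']]
group_range = grouped.max() - grouped.min()

print(column_range)
print(row_range)
print(group_range)
```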

Contributor

jreback commented Sep 21, 2015

Absolutely. This should do .align, then .max() - .min(). It's probably only here as a convenience.

Want to do a pull request?

jreback added this to the Next Major Release milestone Sep 21, 2015

Contributor

ajcr commented Sep 21, 2015

Sure, I can work on this. It looks pretty straightforward, although I guess non-numeric columns will have to be skipped over, as 'str' - 'str' will raise an error, for example.
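To illustrate the concern about non-numeric columns, here is a small sketch. The Series reuses the values of column 'd' from the example DataFrame, and the `raised` flag is just an illustrative name:

```python
import pandas as pd

labels = pd.Series(['v', 'w', 'x', 'y', 'z'])

# max() and min() work on strings, but subtracting the results does not.
try:
    labels.max() - labels.min()  # evaluates 'z' - 'v'
    raised = False
except TypeError:
    raised = True

print(raised)  # True
```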

Certainly OK to fix the NaN issue in Series.ptp! (we could consider it a bug)

But, I am a bit more hesitant on the second part, adding it to DataFrame. Some reasons: 1) we already have many methods, and personally I don't think ptp has enough added value to justify addition, 2) it is really easy to do this yourself and 3) I also don't find ptp a very clear name.

Contributor

ajcr commented Sep 22, 2015

I agree ptp is very easily implemented by the user if they ever need it, so maybe it's not worth adding it as a new method to DataFrame and groupby for now.

In that case, I'll just change Series.ptp() so that it's written as max() - min() and so is NaN-aware.
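As a rough sketch of that approach (not the code that was eventually merged), a NaN-aware range could look like the function below. The name nan_aware_ptp and its skipna parameter are hypothetical and chosen here only for illustration:

```python
import numpy as np
import pandas as pd


def nan_aware_ptp(series, skipna=True):
    """Hypothetical helper: peak-to-peak range computed as max minus min."""
    # Forward skipna so the caller can still ask for NaN propagation.
    return series.max(skipna=skipna) - series.min(skipna=skipna)


s = pd.Series([5, 0, np.nan, -3, 2])

ignoring_nan = nan_aware_ptp(s)                   # missing value skipped
propagating_nan = nan_aware_ptp(s, skipna=False)  # missing value propagates

print(ignoring_nan)     # 8.0
print(propagating_nan)  # nan
```

Forwarding skipna keeps the old NaN-propagating behaviour available for anyone who relies on it.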

jreback modified the milestone: 0.17.1, Next Major Release Oct 5, 2015

jreback modified the milestone: Next Major Release, 0.17.1 Nov 13, 2015

jreback closed this in #11172 Nov 14, 2015
