ENH: make Series.ptp() handle missing values #11163

Closed
ajcr opened this Issue Sep 21, 2015 · 4 comments

Contributor

ajcr commented Sep 21, 2015

Currently (in master), Series.ptp() is just implemented using np.ptp() and so the method will return nan for any Series that has one or more missing values:

>>> s = pd.Series([5, 0, np.nan, -3, 2])
>>> s.ptp()
nan

It is simple to write s.max() - s.min() instead, but the ptp() result is surprising as most pandas methods are designed to handle missing data gracefully. I think most users would expect the ptp() method to ignore NaN.
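As a small illustration of the difference described above, the sketch below compares NumPy's `np.ptp` on the raw values with the `max() - min()` workaround. It calls `np.ptp` directly on the underlying array rather than `Series.ptp()`, on the assumption that this mirrors what the method does in master at the time of writing.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 0, np.nan, -3, 2])

# np.ptp on the raw values propagates the NaN
raw_range = np.ptp(s.to_numpy())

# Series.max() and Series.min() skip NaN by default (skipna=True),
# so their difference ignores the missing value
nan_aware_range = s.max() - s.min()

print(raw_range)        # nan
print(nan_aware_range)  # 8.0
```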

If there is any agreement as to whether ptp() should be changed, I would like to work on a pull request!


Extending the idea, it might be useful to have both DataFrame.ptp() and groupby.ptp() methods.

For this example DataFrame...

df = pd.DataFrame({'a': [1, 2, 2, 1, 1],
                   'b': [3, 11, 72, 46, 32],
                   'c': [1.2, 6.7, 13.9, np.nan, -7.7],
                   'd': ['v', 'w', 'x', 'y', 'z']})

...I would expect the following behaviour:

>>> df.ptp()
a     1.0
b    69.0
c    21.6
dtype: float64

>>> df.ptp(axis=1)
0     2.0
1     9.0
2    70.0
3    45.0
4    39.7
dtype: float64

>>> df.groupby('a').ptp()
    b    c
a         
1  43  8.9
2  61  7.2

Again, if there is any consensus from the community on whether these additional methods should be added, I'd be happy to work on the pull request.
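Until such methods exist, the behaviour proposed above can be approximated with existing pandas calls. The following is only a sketch: it selects the numeric columns `a`, `b` and `c` by hand and relies on `max()` and `min()` skipping NaN by default. The variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 1, 1],
                   'b': [3, 11, 72, 46, 32],
                   'c': [1.2, 6.7, 13.9, np.nan, -7.7],
                   'd': ['v', 'w', 'x', 'y', 'z']})

# Keep only the numeric columns; 'd' holds strings
num = df[['a', 'b', 'c']]

# Column-wise range, like the proposed df.ptp()
col_range = num.max() - num.min()

# Row-wise range, like the proposed df.ptp(axis=1)
row_range = num.max(axis=1) - num.min(axis=1)

# Per-group range, like the proposed df.groupby('a').ptp()
grouped = num.groupby('a')
group_range = grouped.max() - grouped.min()
```

Each result skips the NaN in column `c` because the reductions use `skipna=True` unless told otherwise.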

Contributor

jreback commented Sep 21, 2015

Absolutely. This should do .align, then .max() - .min(). It's probably only here as a convenience.

Want to do a pull request?

@jreback jreback added this to the Next Major Release milestone Sep 21, 2015

Contributor

ajcr commented Sep 21, 2015

Sure, I can work on this. It looks pretty straightforward, although I guess non-numeric columns will have to be skipped over, as 'str' - 'str' will raise an error, for example.
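To make the concern about non-numeric columns concrete, here is a minimal sketch. It uses a small made-up frame and `select_dtypes` as one possible way to skip string columns; this is an assumption about how the skipping could be done, not the implementation from any pull request.

```python
import pandas as pd

df = pd.DataFrame({'b': [3, 11, 72], 'd': ['v', 'w', 'x']})

# max() and min() work on strings, but subtracting the results does not
try:
    df['d'].max() - df['d'].min()
    subtraction_failed = False
except TypeError:
    subtraction_failed = True

# One way to skip such columns: restrict to numeric dtypes first
numeric = df.select_dtypes(include='number')
ranges = numeric.max() - numeric.min()
```

After this runs, `subtraction_failed` is `True` and `ranges` contains an entry for `b` only.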

Member

jorisvandenbossche commented Sep 22, 2015

Certainly OK to fix the NaN issue in Series.ptp! (we could consider it a bug)

But, I am a bit more hesitant on the second part, adding it to DataFrame. Some reasons: 1) we already have many methods, and personally I don't think ptp has enough added value to justify addition, 2) it is really easy to do this yourself and 3) I also don't find ptp a very clear name.

Contributor

ajcr commented Sep 22, 2015

I agree ptp is very easily implemented by the user if they ever need it, so maybe it's not worth adding it as a new method to DataFrame and groupby for now.

In that case, I'll just change Series.ptp() so that it's written as max() - min() and so is NaN-aware.
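A rough sketch of what such a change could look like, written as a standalone function rather than the actual method. The name `nan_aware_ptp` and the `skipna` parameter are hypothetical and chosen only to mirror the existing `max()` and `min()` signatures.

```python
import numpy as np
import pandas as pd


def nan_aware_ptp(series, skipna=True):
    """Return max - min of a Series (hypothetical helper).

    With skipna=True missing values are ignored; with skipna=False
    any NaN in the data propagates to the result.
    """
    return series.max(skipna=skipna) - series.min(skipna=skipna)


s = pd.Series([5, 0, np.nan, -3, 2])
ignoring_nan = nan_aware_ptp(s)                    # 8.0
propagating_nan = nan_aware_ptp(s, skipna=False)   # nan
```

Exposing `skipna` would let callers recover the old NaN-propagating result when they want it.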

@jreback jreback modified the milestones: 0.17.1, Next Major Release Oct 5, 2015

@jreback jreback modified the milestones: Next Major Release, 0.17.1 Nov 13, 2015
