ENH: add quantiles kw to Series.describe to create dynamic quantiles #4196

Closed
jreback opened this issue Jul 10, 2013 · 8 comments · Fixed by #7088
Labels: API Design, Numeric Operations (Arithmetic, Comparison, and Logical operations)

@jreback
Contributor

jreback commented Jul 10, 2013

This can of course be done directly, but adding a quantiles keyword that accepts a number or a list (to create multiple fields) seems like a nice idea.

And maybe deprecate percentile_width, replacing it with quantiles=[50].

like this:
http://stackoverflow.com/questions/17578115/pass-percentiles-to-pandas-agg-function
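
For reference, a minimal sketch of the "done directly" route, assuming a recent pandas in which Series.quantile accepts a list. The sample data is made up for illustration. The second call uses the `percentiles` argument that later pandas releases expose on describe; it takes fractions between 0 and 1, and its name differs from the `quantiles` keyword proposed here.

```python
import pandas as pd

# Illustrative data only; not taken from the issue.
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype="float64")

# Direct route: request several quantiles at once.
# The result is a Series indexed by the requested quantiles.
direct = s.quantile([0.1, 0.5, 0.9])

# describe() with explicit percentiles (fractions in [0, 1]);
# the summary gains rows labelled like "10%" and "90%".
summary = s.describe(percentiles=[0.1, 0.5, 0.9])

print(direct)
print(summary)
```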

@ghost assigned hayd Oct 23, 2013
@hayd
Contributor

hayd commented Oct 23, 2013

Assigned this to myself; I think it should be pretty straightforward (I don't think it even needs to be cythonized...)

@ghost

ghost commented Jan 24, 2014

removed @hayd as assignee

@TomAugspurger
Contributor

I've got an implementation for this.

If you were to do

In [7]: df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['a', 'b', 'c'])

In [8]: df
Out[8]: 
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

[3 rows x 3 columns]

In [9]: df.quantile([.1, .9])

Would you expect

Out[9]: 
   0.1  0.9
a  1.6  6.4
b  2.6  7.4
c  3.6  8.4

[3 rows x 2 columns]

or

Out[10]: 
       a    b    c
0.1  1.6  2.6  3.6
0.9  6.4  7.4  8.4

[2 rows x 3 columns]

For comparison:

In [11]: df.quantile(.1)
Out[11]: 
a    1.6
b    2.6
c    3.6
dtype: float64

and

In [13]: df['a'].quantile([.1, .9])
Out[13]: 
0.1    1.6
0.9    6.4
dtype: float64

@TomAugspurger self-assigned this Apr 23, 2014
@jreback
Contributor Author

jreback commented Apr 23, 2014

I think this one makes the most sense as you get back the same columns (of course they are just a transpose away)

Out[10]: 
       a    b    c
0.1  1.6  2.6  3.6
0.9  6.4  7.4  8.4

[2 rows x 3 columns]
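
A short sketch of the layout preferred above, assuming a recent pandas in which DataFrame.quantile accepts a list: the requested quantiles become the row index, the original columns are kept, and the other orientation is one transpose away.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
    columns=["a", "b", "c"],
)

# Rows are the requested quantiles; columns are the original columns.
by_quantile = df.quantile([0.1, 0.9])

# The alternative layout (columns as rows) is just the transpose.
by_column = by_quantile.T

print(by_quantile)
print(by_column)
```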

@hayd
Contributor

hayd commented Apr 24, 2014

👍 df['a'].quantile(..) should be the same as df.quantile(..)['a'] (so I guess there should be some name attributes kicking around here). This would be a nice addition!
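
The consistency property described here can be checked with a small sketch, assuming a recent pandas where both Series.quantile and DataFrame.quantile accept a list. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
    columns=["a", "b", "c"],
)
qs = [0.1, 0.9]

# Quantiles of a single column, computed two ways.
from_series = df["a"].quantile(qs)
from_frame = df.quantile(qs)["a"]

# Compare values and index, and show the name attribute on each result.
same_values = np.allclose(from_series.to_numpy(), from_frame.to_numpy())
same_index = list(from_series.index) == list(from_frame.index)
print(same_values, same_index, from_series.name, from_frame.name)
```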

@TomAugspurger
Contributor

Thanks, that's what I was thinking.

I've got one failing test; it may be a separate issue.

In pandas/core/frame.py,

    def _apply_empty_result(self, func, axis, reduce):
        if reduce is None:
            reduce = False
            try:
                reduce = not isinstance(func(_EMPTY_SERIES), Series)  # <---
            except Exception:
                pass

        if reduce:
            return Series(NA, index=self._get_agg_axis(axis))
        else:
            return self.copy()

Should this function take the *args and **kwargs passed to df.apply? When I get to the line I marked <---, my func expects a second argument (the quantile) given in *args. Since it isn't passed, an Exception is raised.
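
A standalone sketch of the idea, not the pandas implementation: the helper name looks_like_reduction and the example function quantile_of are hypothetical. It probes func with an empty Series, as the heuristic above does, but forwards the extra arguments so that a function needing a quantile can still be classified.

```python
import pandas as pd


def looks_like_reduction(func, args=(), kwargs=None):
    """Hypothetical helper: guess whether func reduces a Series to a scalar.

    The probe calls func on an empty Series and forwards args/kwargs.
    Any exception is treated as "not a reduction", mirroring the
    try/except in the snippet above.
    """
    kwargs = kwargs or {}
    empty = pd.Series([], dtype="float64")
    try:
        return not isinstance(func(empty, *args, **kwargs), pd.Series)
    except Exception:
        return False


def quantile_of(col, q):
    # Example function that requires a second positional argument.
    return col.quantile(q)


# With the quantile forwarded, the probe call succeeds.
with_args = looks_like_reduction(quantile_of, args=(0.5,))

# Without it, the call raises TypeError and the probe falls back to False.
without_args = looks_like_reduction(quantile_of)

print(with_args, without_args)
```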

@jreback
Contributor Author

jreback commented Apr 24, 2014

Hmm... this is a heuristic to figure out if it's a reduction function, so yes, I think that would be right (take the args).

@jreback
Contributor Author

jreback commented Apr 24, 2014

I think that fix should be done as a separate PR.
