ENH: Add Series.histogram() #3945

Closed
danielballan opened this Issue Jun 18, 2013 · 13 comments

5 participants

@danielballan

Should I write a Series.histogram that calls numpy.histogram and returns a new Series of counts indexed by bins?

In [4]: s = Series(randn(100))

In [5]: counts, bins = np.histogram(s)

In [6]: Series(counts, index=bins[:-1])
Out[6]: 
-2.968575     1
-2.355032     4
-1.741488     5
-1.127944    26
-0.514401    23
 0.099143    23
 0.712686    12
 1.326230     5
 1.939773     0
 2.553317     1
dtype: int32

...would be accomplished by...

Series.histogram()

which would accept all the arguments that np.histogram accepts (e.g., choosing bins manually, specifying the number or range of bins, etc.).

from http://stackoverflow.com/a/17150734/1221924

The proposed method doesn't save all that much typing, but it would help users discover this convenient way of storing a histogram. Proceed?
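For concreteness, here is a minimal sketch of what such a method might do, written as a free function. The name series_histogram and its signature are hypothetical, not an existing pandas API; it only wraps np.histogram and indexes the counts by the left bin edges, as in the session above.

```python
import numpy as np
import pandas as pd


def series_histogram(s, bins=10, **kwargs):
    """Hypothetical helper (not part of pandas): bin a Series with
    np.histogram and return the counts indexed by left bin edge."""
    # Extra keyword arguments (e.g. range=, weights=) are passed through.
    counts, edges = np.histogram(s.to_numpy(), bins=bins, **kwargs)
    # np.histogram returns len(counts) + 1 edges; drop the last (right) edge.
    return pd.Series(counts, index=edges[:-1], name=s.name)


rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))
hist = series_histogram(s, bins=10)
print(hist)
```

Indexing by the left edge discards the rightmost edge, so the width of the last bin has to be inferred from the spacing of the others.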

@jreback
jreback commented Jun 18, 2013

What you are proposing is basically this
(I am not sure what is a better result in any event)

In [9]: s.groupby(pd.cut(s,10)).count()
Out[9]: 
(-2.955, -2.419]     1
(-2.419, -1.889]     2
(-1.889, -1.359]     2
(-1.359, -0.829]     8
(-0.829, -0.299]    24
(-0.299, 0.231]     21
(0.231, 0.761]      15
(0.761, 1.291]      14
(1.291, 1.821]      10
(1.821, 2.351]       3
dtype: int64

@danielballan

That's new to me. Not sure which output is more valuable for whatever users would do next.

@jreback
jreback commented Jun 18, 2013

yours more friendly for plotting
cut is NaN friendly (not sure if np.histogram is)

@jtratner

@jreback cut returns string row labels, right?

@jreback
jreback commented Jun 20, 2013

it returns a Categorical which has both labels and levels

@cpcloud
cpcloud commented Jul 20, 2013

would this be more useful as a top level function instead of a method?

np.histogram is not nan friendly

@jreback
jreback commented Jul 20, 2013

I think an instance method on Series is about right (but maybe using groupby and cut as above)

@danielballan

I don't have strong feelings about this. It's certainly easy to work around, but I really hate working with Categoricals.

@jreback
jreback commented Jul 22, 2013

problem is np.histogram doesn't like nan
maybe just drop em anyhow
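A minimal sketch of the "drop them" option, assuming NaNs should simply be excluded from the counts. The sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data containing two missing values.
s = pd.Series([0.1, 0.4, np.nan, 0.8, 1.5, np.nan, 2.0])

# Drop NaNs first so np.histogram only sees finite values.
clean = s.dropna()
counts, edges = np.histogram(clean.to_numpy(), bins=4)

# Counts indexed by left bin edge; NaNs are not counted anywhere.
hist = pd.Series(counts, index=edges[:-1])
print(hist)
```

The total count then equals the number of non-missing values rather than len(s).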

@cpcloud
cpcloud commented Jul 22, 2013

or u can just get the bins from cut

cat, bins = cut(s, 10, retbins=True)
s = Series(cat.values, index=bins[:-1], name=cat.name)

not sure if name should be there...
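A runnable sketch of the same idea against a recent pandas, under the assumption that the goal is one count per bin: the per-row bin assignments from cut have to be aggregated before they can be indexed by the ten left edges. The variable names and sample data are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100), name="x")
s.iloc[[3, 7]] = np.nan  # cut leaves NaN rows unbinned

# retbins=True also returns the array of 11 bin edges.
cat, edges = pd.cut(s, 10, retbins=True)

# Count rows per bin; observed=False keeps empty bins with a count of 0.
counts = s.groupby(cat, observed=False).count()

# Re-index the counts by the numeric left edge of each bin.
hist = pd.Series(counts.to_numpy(), index=edges[:-1], name=s.name)
print(hist)
```

Note that cut widens the range slightly so the minimum falls inside the first bin, so the first edge sits a little below s.min().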

@hayd
hayd commented Aug 6, 2013

confusing having different hist and histogram methods... perhaps this could be an argument (bins) to value_counts ?
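In current pandas, Series.value_counts does accept a bins argument. A short sketch of that usage (the sample data is made up), with sort=False to keep the bins in order rather than sorted by count:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))

# bins=10 groups the values into 10 equal-width intervals;
# sort=False keeps them in bin order instead of descending count.
counts = s.value_counts(bins=10, sort=False)
print(counts)

# The index holds intervals; the numeric left edges are available if needed.
left_edges = counts.index.left
```

The result is indexed by intervals rather than by bare left edges, so both ends of each bin are kept.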

@hayd
hayd commented Aug 6, 2013

This is pretty easy, pr on the way, will also add more functionality to Series.value_counts which was never updated from pd.value_counts.

@hayd hayd closed this in #4502 Aug 27, 2013