
ENH: Add Series.histogram() #3945

danielballan opened this Issue Jun 18, 2013 · 13 comments

5 participants


Should I write a Series.histogram that calls numpy.histogram and returns a new Series of counts indexed by bins?

In [4]: s = Series(randn(100))

In [5]: counts, bins = np.histogram(s)

In [6]: Series(counts, index=bins[:-1])
-2.968575     1
-2.355032     4
-1.741488     5
-1.127944    26
-0.514401    23
 0.099143    23
 0.712686    12
 1.326230     5
 1.939773     0
 2.553317     1
dtype: int32

...would be accomplished by...


which would accept all the arguments that np.histogram accepts (e.g., choosing bins manually, specifying the number or range of bins, etc.).


The proposed method doesn't save all that much typing, but it would help users discover this convenient way of storing a histogram. Proceed?
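For illustration, here is a minimal sketch of what such a wrapper might look like. The name series_histogram is hypothetical (it is a free function, not an existing pandas method), and it simply forwards keyword arguments to np.histogram as the proposal describes.

```python
import numpy as np
import pandas as pd


def series_histogram(s, **kwargs):
    """Hypothetical helper: bin counts as a Series indexed by left bin edges."""
    # np.histogram returns (counts, edges); edges has one more entry than counts,
    # so the last (rightmost) edge is dropped to line the index up with the counts.
    counts, edges = np.histogram(np.asarray(s), **kwargs)
    return pd.Series(counts, index=edges[:-1])


s = pd.Series(np.random.default_rng(0).standard_normal(100))
hist = series_histogram(s, bins=5)  # any np.histogram keyword can be passed through
```

Indexing by the left edges loses the final right edge, so a caller who needs the full bin boundaries would have to keep the edges array separately.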

jreback commented Jun 18, 2013

What you are proposing is basically this
(I am not sure what is a better result in any event)

In [9]: s.groupby(pd.cut(s,10)).count()
(-2.955, -2.419]     1
(-2.419, -1.889]     2
(-1.889, -1.359]     2
(-1.359, -0.829]     8
(-0.829, -0.299]    24
(-0.299, 0.231]     21
(0.231, 0.761]      15
(0.761, 1.291]      14
(1.291, 1.821]      10
(1.821, 2.351]       3
dtype: int64

That's new to me. Not sure which output is more valuable for whatever users would do next.

jreback commented Jun 18, 2013

yours more friendly for plotting
cut is NaN friendly (not sure if np.histogram is)

Python for Data member

@jreback cut returns string row labels, right?

jreback commented Jun 20, 2013

it returns a Categorical which has both labels and levels
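A small sketch of inspecting what cut returns. This assumes a recent pandas release, where (as far as I know) the two attributes mentioned above are exposed as codes and categories rather than labels and levels.

```python
import numpy as np
import pandas as pd

values = np.array([0.1, 0.4, 0.9, 1.5])

# Passing a plain array (not a Series) to pd.cut gives back a Categorical.
cat = pd.cut(values, 2)

codes = cat.codes            # integer position of the bin each value fell into
categories = cat.categories  # the bins themselves, one entry per bin
```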

cpcloud commented Jul 20, 2013

would this be more useful as a top level function instead of a method?

np.histogram is not nan friendly

jreback commented Jul 20, 2013

I think an instance method on series is about right (but maybe using groupby and cut as above)


I don't have strong feelings about this. It's certainly easy to work around, but I really hate working with Categoricals.

jreback commented Jul 22, 2013

problem is np.histogram doesn't like nan
maybe just drop em anyhow
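A minimal sketch of the "drop them" option, assuming that silently discarding missing values is acceptable to the caller. The sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.1, 0.4, np.nan, 0.9, 1.5, np.nan])

# Drop missing values first so np.histogram only sees finite data.
clean = s.dropna()
counts, edges = np.histogram(clean, bins=3)

# Same layout as the original proposal: counts indexed by left bin edges.
hist = pd.Series(counts, index=edges[:-1])
```

The NaN entries are not counted anywhere in the result, so the total of the histogram is the number of non-missing values, not the length of the original Series.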

cpcloud commented Jul 22, 2013

or u can just get the bins from cut

cat, bins = cut(s, 10, retbins=True)
s = Series(cat.values, index=bins[:-1],

not sure if name should be there...

hayd commented Aug 6, 2013

confusing having different hist and histogram methods... perhaps this could be an argument (bins) to value_counts ?

hayd commented Aug 6, 2013

This is pretty easy, pr on the way, will also add more functionality to Series.value_counts which was never updated from pd.value_counts.
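A short sketch of the value_counts route, assuming a pandas version in which Series.value_counts accepts a bins argument. The sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.default_rng(0).standard_normal(100))

# bins=10 splits the value range into 10 equal-width intervals, much like pd.cut;
# sort=False keeps the intervals in ascending order instead of ordering by count.
counts = s.value_counts(bins=10, sort=False)
```

Leaving sort at its default would order the bins by frequency, which is usually less convenient when the result is meant to be read or plotted as a histogram.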

@hayd hayd closed this in #4502 Aug 27, 2013