Should I write a Series.histogram that calls numpy.histogram and returns a new Series of counts indexed by bins?
In : s = Series(randn(100))
In : counts, bins = np.histogram(s)
In : Series(counts, index=bins[:-1])
...would be accomplished by...
which would accept all the arguments that np.histogram accepts (e.g., choosing bins manually, specifying number or range of bins, etc.).
The proposed method doesn't save all that much typing, but it would help users discover this convenient way of storing a histogram. Proceed?
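For concreteness, here is a minimal sketch of what the proposed method might look like, written as a free function rather than a real `Series` method. The name `series_histogram` is hypothetical and is not part of pandas; it only wraps the three lines above and forwards keyword arguments to `np.histogram`.

```python
import numpy as np
import pandas as pd


def series_histogram(s, **kwargs):
    """Hypothetical helper: bin counts as a Series indexed by left bin edges.

    All keyword arguments (bins, range, ...) are passed to np.histogram.
    """
    counts, edges = np.histogram(s.to_numpy(), **kwargs)
    # edges has one more entry than counts; drop the rightmost edge
    return pd.Series(counts, index=edges[:-1], name=s.name)


rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))

# Five equal-width bins over a fixed range; values outside it are not counted
hist = series_histogram(s, bins=5, range=(-3, 3))
print(hist)
```

Indexing by the left edge is just one choice; bin centers or the full edge array would work too.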
What you are proposing is basically this
(I am not sure what is a better result in any event)
In : s.groupby(pd.cut(s,10)).count()
(-2.955, -2.419] 1
(-2.419, -1.889] 2
(-1.889, -1.359] 2
(-1.359, -0.829] 8
(-0.829, -0.299] 24
(-0.299, 0.231] 21
(0.231, 0.761] 15
(0.761, 1.291] 14
(1.291, 1.821] 10
(1.821, 2.351] 3
That's new to me. Not sure which output is more valuable for whatever users would do next.
yours more friendly for plotting
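To make the comparison concrete, the sketch below builds both results from the same data. It assumes a recent pandas, where `pd.cut` labels bins with `Interval` objects; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))

# Output 1: np.histogram counts indexed by numeric left edges
counts, edges = np.histogram(s)
hist = pd.Series(counts, index=edges[:-1])

# Output 2: groupby on the bins produced by pd.cut
cut_counts = s.groupby(pd.cut(s, 10), observed=False).count()

# The cut result is indexed by intervals; numeric edges can be recovered
left_edges = [interval.left for interval in cut_counts.index]

print(hist.index[:3])        # plain floats
print(cut_counts.index[:3])  # interval labels
```

A float index can be handed straight to a plotting call, while the interval index carries the full bin bounds in its labels.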
cut is NaN friendly (not sure if np.histogram is)
@jreback cut returns string row labels, right?
it returns a Categorical which has both labels and levels
would this be more useful as a top level function instead of a method?
np.histogram is not nan friendly
I think an instance method on Series is about right (but maybe using groupby and cut as above)
I don't have strong feelings about this. It's certainly easy to work around, but I really hate working with Categoricals.
problem is np.histogram doesn't like nan
maybe just drop em anyhow
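A small sketch of the "just drop them" option, next to the cut-based route for comparison. The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.5, 1.5, np.nan, 2.5, np.nan, 3.5])

# Drop missing values first so np.histogram only sees finite data
clean = s.dropna()
counts, edges = np.histogram(clean, bins=3)
hist = pd.Series(counts, index=edges[:-1])

# pd.cut leaves NaN entries unbinned, and count() skips them
cut_counts = s.groupby(pd.cut(s, 3), observed=False).count()

print(hist)
print(cut_counts)
```

Both routes end up counting only the four non-missing observations.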
or you can just get the bins from cut
cat, bins = cut(s, 10, retbins=True)
s = Series(cat.values, index=bins[:-1], name=cat.name)
not sure if name should be there...
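One way the idea above could be spelled out, as a sketch: the categorical returned by `cut` has one entry per observation, so it is counted per bin before being re-indexed by the bin edges. Carrying over the name is optional, as noted.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100), name="x")

# retbins=True also returns the array of 11 bin edges
cat, bins = pd.cut(s, 10, retbins=True)

# Count observations per bin, keeping empty bins, in bin order
per_bin = s.groupby(cat, observed=False).count()

# Re-index by the left edges taken from cut
hist = pd.Series(per_bin.to_numpy(), index=bins[:-1], name=s.name)
print(hist)
```

Note that `cut` nudges the lowest edge slightly below the minimum so that the smallest value falls inside the first bin.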
confusing having different hist and histogram methods... perhaps this could be an argument (bins) to value_counts ?
This is pretty easy; PR on the way. It will also add more functionality to Series.value_counts, which was never updated to match pd.value_counts.
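For reference, a short sketch of the `bins` argument as it exists on `Series.value_counts` in current pandas releases. The data is random and for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))

# bins=10 groups values into ten equal-width intervals;
# sort=False keeps the result in bin order rather than by count
counts = s.value_counts(bins=10, sort=False)
print(counts)
```

The result is indexed by intervals, much like the groupby/cut output earlier in the thread.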