Expose histogram bin estimators? #10183

Closed
dstansby opened this issue Dec 9, 2017 · 10 comments

Comments

@dstansby
Contributor

dstansby commented Dec 9, 2017

Would it be possible to expose the histogram bin estimators available in numpy? e.g.

def _hist_bin_auto(x):
and others.

We would find it useful in Matplotlib for computing bins without having to spend time computing the full histogram (xref matplotlib/matplotlib#8636)

@eric-wieser
Member

Is the application here:

  • Compute the bins for np.concatenate([a, b, c])
  • Compute the histogram for each of a, b, c, using those bins?

That seems like a pretty reasonable application to me. I'm not sure if we should directly expose those functions, or just expose a bins_only=True argument to np.histogram.
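The two-step workflow in the list above can be sketched as follows. This is only an illustration: it assumes the public function that eventually came out of this issue, `np.histogram_bin_edges` (NumPy 1.15 and later), and the arrays `a`, `b`, `c` are made-up sample data.

```python
import numpy as np

# Made-up sample data standing in for the three datasets a, b, c.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 500)
b = rng.normal(2.0, 0.5, 300)
c = rng.normal(-1.0, 2.0, 200)

# Step 1: compute bin edges once, from the pooled data.
# Assumes NumPy >= 1.15, where np.histogram_bin_edges is available.
shared_edges = np.histogram_bin_edges(np.concatenate([a, b, c]), bins="auto")

# Step 2: histogram each dataset separately against the shared edges.
counts = [np.histogram(x, bins=shared_edges)[0] for x in (a, b, c)]
```

Because every dataset is binned against the same edges, the resulting counts can be stacked or compared bin by bin.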

@eric-wieser
Member

I'm not sure how we can expose the uniform range optimization in that use case, but I suppose optimality is not the primary concern.

@eric-wieser
Member

It's waiting on #10186, but I have a local patch that exposes bin_edges = histogram_bins(a, bins=10, range=None, weights=None).

Does that sound like a reasonable API?

@dstansby
Contributor Author

I think the application is computing bins for np.concatenate([a,b,c]). All of your suggestions above sound good!

@shoyer
Member

shoyer commented Dec 12, 2017

I like the separate histogram_bins function.

@eric-wieser
Member

eric-wieser commented Dec 24, 2017

Alright, as of #10261 this is now trivial to patch on master in np.lib.histograms as

def histogram_bin_edges(a, bins=10, range=None, weights=None):
    a, weights = _ravel_and_check_weights(a, weights)
    bin_edges, _ = _histogram_bins(a, bins, range, weights)
    return bin_edges

Most of the work here is adding a docstring and some simple tests, so I'm tagging this as an easy fix.

I suppose emailing the mailing list to confirm the function name could be valuable too.

@kirit93
Contributor

kirit93 commented Feb 14, 2018

Does this issue still need to be worked on? If so, I'd like to work on this!

@eric-wieser
Member

@kirit93: Go for it!

@eric-wieser eric-wieser added this to the 1.15.0 release milestone Feb 14, 2018
@kirit93
Contributor

kirit93 commented Feb 15, 2018

@eric-wieser Have you added this function to master?

    a, weights = _ravel_and_check_weights(a, weights)
    bin_edges, _ = _histogram_bins(a, bins, range, weights)
    return bin_edges

There's no function _histogram_bins in np.lib.histograms.

Shouldn't the function be

def histogram_bin_edges(a, bins=10, range=None, weights=None):
    a, weights = _ravel_and_check_weights(a, weights)
    bin_edges, _ = _get_bin_edges(a, bins, range, weights)
    return bin_edges

I just need to add this function to np.lib.histograms with a docstring, and add a test for it in np.lib.tests.test_histograms, right?

Thanks!
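A minimal sketch of the kind of test being discussed, assuming the function ends up public as `np.histogram_bin_edges` (as it did in NumPy 1.15). The helper name `check_edges_match_histogram` and the sample data are hypothetical, not taken from the NumPy test suite. The idea is that the new function should return exactly the edges `np.histogram` would have used.

```python
import numpy as np
from numpy.testing import assert_array_equal


def check_edges_match_histogram(a, bins, range=None):
    """Hypothetical helper: edges must equal those returned by np.histogram."""
    expected = np.histogram(a, bins=bins, range=range)[1]
    actual = np.histogram_bin_edges(a, bins=bins, range=range)
    assert_array_equal(actual, expected)


# Made-up data; any small 1-D array would do.
data = np.array([1.0, 2.0, 2.0, 3.0, 5.0, 8.0])

# Cover an integer bin count, two string estimators, and explicit edges.
for bins in (10, "auto", "fd", [0, 2, 4, 10]):
    check_edges_match_histogram(data, bins)

# Also cover an explicit range.
check_edges_match_histogram(data, 4, range=(0, 10))
```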

@eric-wieser
Member

eric-wieser commented Mar 16, 2018

@dstansby: This will be in 1.15 - although matplotlib's minimum numpy version is currently 8 minor versions behind that, so you could be waiting a long time!

I suppose you could backport the np.lib.histograms module in its entirety?
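A lighter-weight alternative to backporting the whole module would be a small compatibility shim. The sketch below is an assumption, not something proposed in this thread: `bin_edges_compat` is a hypothetical name, and on older NumPy it falls back to computing the full histogram, which gives up the speed benefit that motivated this issue.

```python
import numpy as np


def bin_edges_compat(a, bins=10, range=None, weights=None):
    """Hypothetical shim: return bin edges on both old and new NumPy."""
    if hasattr(np, "histogram_bin_edges"):
        # NumPy >= 1.15: edges only, no counting pass over the data.
        return np.histogram_bin_edges(a, bins=bins, range=range, weights=weights)
    # Older NumPy: compute the full histogram and discard the counts.
    _, edges = np.histogram(a, bins=bins, range=range, weights=weights)
    return edges
```

Callers can then use `bin_edges_compat(data, bins="auto")` regardless of the installed NumPy version, and the fallback branch can be deleted once the minimum supported version reaches 1.15.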
