feat: MonkeyClass pyplot #100

andrzejnovak · 2020-03-10T22:06:00Z

Allows for a more seamless matplotlib-like experience. As well as "patching" some objects in the future like set_xlabel before mpl catches up.

import mplhep.pyplot as plt
fig, ax = plt.subplots()
ax.histplot()

One downside is, this updates the Axes mpl class so while I don't know why you would need to, you cannot use matplotlib.pyplot at the same time. @HDembinski Do you know if this works for jax? If so I'll take a look at how to implement it.

Being able to do stuff like this main reason why I would prefer histplot not be renamed to hist but I'd be fine with hist_plot or some other non-conflicting name.

HDembinski · 2020-03-13T09:56:14Z

I think this feature needs wider discussion by the Scikit-HEP community. I can see the advantage of monkey-patching the Axes object.

We need to discuss the two options then regarding plotting of pre-binned histograms.

We overload the normal `hist` method

We overload the existing hist method. The existing method accepts a positional argument x, which normally is an array of values or a sequence of arrays (of equal length!). We could check for this case and then forward it to the original implementation. If x is a tuple like the one returned by numpy.histogram (which are arrays of non-equal length), we run our code. If x is a boost-histogram, we also run our code.

from mplhep import pyplot as plt
import numy as np
import boost_histogram as bh

x = (1, 2, 3, 4)
plt.hist(x) # calls original hist
plt.hist(np.histogram(x)) # calls our implementation
h = bh.Histogram(bh.axis.Regular(10, 0, 5))
h.fill(x)
plt.hist(h) # calls our implementation

Some arguments of original hist make no sense for pre-binned histograms, if these are set we need to raise an error. We can, however, also add new keywords that are only valid for our monkey-patched version and not the original hist.

Pro:

Straight-forward interface
Easy to learn and remember

Con:

Some arguments of original hist make no sense for pre-binned histograms but have to be included in the interface

We make a new function `plot_hist`

This gives us more freedom, which may seem great, but freedom is a two-sided edge. It means we have to make many more decisions as designers and our users have to learn more things to use our code.

Firstly regarding the name. As a general rule, methods and functions should be verbs since they model actions, and classes should be nouns. I read hist_plot as a noun, shorthand for "histogram plot", while plot_hist is an action, shorthand for "plot this histogram". Therefore, plot_hist is a better name than hist_plot.

plot_hist should still be as close as possible in interface to hist, to make adaptation easier. If you know how to make simple plots with pyplot.hist then those simple plots should also work with plot_hist. Nobody wants to look up the reference all the time. It would be especially confusing if the same keyword would do something different in hist and plot_hist.

Pro:

No superfluous keywords in the interface

Con:

Synchronization between hist and plot_hist interfaces has to be assured manually
Two similar functions with similar actions
One more interface to learn

Conclusion

I think we should override pyplot.hist.

jpivarski · 2020-03-13T13:49:12Z

I don't know the Matplotlib interface well enough to say that monkey-patching is a good idea here (it usually isn't a good idea, but there are exceptions). Note that if you want to support Python 2, the new methods have to be added with types.MethodType(...).

The two options @HDembinski suggested are, if I'm reading it right, a choice to make after already having decided that we're going to monkey-patch Axes. If option 1 replaces the original hist method, making it no longer available, and option 2 supplements it with a new method with a new name, I strongly prefer the latter, because then at least the monkey-patching is non-obtrusive. If Matplotlib uses its own Axes.hist inside some function, or a third-party library does, then replacing that method would break things unexpectedly. A new name would only break things if someone's iterating over all methods or introduces a new method with exactly that name in the future—both are much less likely.

As for noun or verb, it's good for them to be consistent, and I don't know how consistent Matplotlib is in its method names. The word "plot" can be a noun or a verb (more likely a verb if placed first, as @HDembinski has pointed out), but "show" and "draw" are unambiguous.

On the other hand, the thing that distinguishes this histogram method from the standard one is that it draws prebinned data. Is there any way to get the word "prebinned" into it without making the method name long and hard to type? Looking at a list of methods, the existence of hist and plot_hist tells me nothing about why I should prefer one over the other, and saying that the latter is for HEP isn't a good answer. But if there was hist and prebinned_hist, then I would know right away which is my case.

I know that prebinned_hist sounds a lot like a noun, but maybe there's some way to rework it? If there's more than one of these, then a suite featuring the tag "prebin" or "prebinned" would help to unify the interface, making it clear what its domain is.

andrzejnovak · 2020-03-13T19:24:13Z

I agree with @HDembinski that the primary guiding principle here should be

plot_hist should still be as close as possible in interface to hist, to make adaptation easier. If you know how to make simple plots with pyplot.hist then those simple plots should also work with plot_hist. Nobody wants to look up the reference all the time. It would be especially confusing if the same keyword would do something different in hist and plot_hist

(which is also why I think we should support passing an array of histograms, even though the points raised in the other thread are very valid)

I, however, draw the opposite conclusion and think that a separate method name, while keeping the rest of the API as close as possible to plt.hist is the least confusing. Regardless of possible monkey-patching issues raised by @jpivarski I think that a hist function that would at once take arrays on which to calculate histograms as well as already binned histograms, is not transparent.

Finding a good (short) name is a bit tricky. I understand the verb argument, though matplotlib is not very consistent with/on this. There are Axes methods that are interpretable as both verb and noun or just a noun. Consider: plot, scatter, step, fill_between vs errorbar, bar, pie, boxplot, violinplot There seems to be a good precedens for <something>plot hence the current name, but everything mentioned above could be justified

hist_plot (adding the "_")
plot_hist/draw_hist
prebinned_hist (bit too long imo)
binned_hist
bin_hist

Maybe we could have something of a vote

HDembinski · 2020-03-19T08:18:01Z

@jpivarski Thank you for commenting. I don't like monkey-patching much myself, but I am not against it in this case.

You misunderstood option 1. I explicitly wrote in my description that the original pyplot.hist should still work. We keep the original method around and when you pass a sequence that the original method would have handled, our method would simply pass that array to the orignal method. We just wrap the original hist to also accept pre-binned histograms. This is possible without ambiguity, since the data structures are different. I added a code example to make this more clear.

HDembinski · 2020-03-19T08:35:45Z

If you prefer to have a separate name, how about the name prebinned. It is short and it follows @jpivarski excellent advice to pick a name that signifies why you need to use this over hist. prebinned_hist is redundant. There is nothing else that can be binned, if it is binned it is a histogram.

As a general rule, functions and methods should be verbs, but pyplot is a bit special and local consistency within pyplot is more important than global consistency, I agree with that. @andrzejnovak Your remark makes sense to me that many methods on pyplot read like "plot something"

from matplotlib import pyplot as plot
plot.hist(...) # plot histogram
plot.scatter(...) # plot scatter graph

So I think the following fits in very well

plot.prebinned(...) # plot prebinned data

jpivarski · 2020-03-19T12:09:51Z

I like the name prebinned. I just find it funny that it's neither a verb nor a noun: it's an adjective.

andrzejnovak · 2020-03-19T19:43:32Z

prebinned makes a lot of sense. Though I still feel like it's a bit long. Do you think just binned would be confusing for users?

jpivarski · 2020-03-19T19:48:53Z

There's a tradeoff between brevity and clarity: clarity is some optimal point in the curve. I think "binned" is a bit too little because "hist" is binned as well. The new thing here is the "pre".

eduardo-rodrigues · 2020-03-19T20:52:38Z

For what it's worth I agree as well - I always go for clarity even if that means a bit longer a word. Honestly how much of a pain is 3 letters compared to what we have to write/code on a daily basis? It's not even a permille … ;-)

HDembinski · 2020-04-05T10:01:26Z

So it seems that we converged for the name to "prebinned" and perhaps "prebinned2d". I am not sure whether we need the latter though, since pyplot.pcolormesh is doing a decent job of plotting prebinned 2D histograms.

Todo:

add methods directly to Axes class, no need to inherit from Axes. There is only one class instance of Axes, so by adding methods to Axes, we change it everywhere.
finalize the interface for "prebinned", what keywords make sense? It is better to err on side of caution and do not add too many keywords at the first round. One can always add more later, but taking existing keywords away is cumbersome as it is breaking peoples code and you have to deprecate them and so on.

HDembinski · 2020-04-05T10:02:16Z

@andrzejnovak I have some time, so is it ok if I make the changes to this branch?

andrzejnovak · 2020-04-05T12:39:05Z

@HDembinski Sure, this makes sense to me.

I am thinking now that maybe the way forward would be to have prebinned as the "bare" method you were requesting, plotting one histogram at a time and not needing much else apart from (H, bins, errors). histplot could stay the "bloaty" user convenience interface for things like some basic error calculation, stacking/norming as well as the different histtypes

HDembinski · 2020-04-06T08:20:03Z

That's a possible compromise, but not a good one. I don't see the value in keeping histplot. Are you perhaps motivated by the feeling that you don't want to loose the time you already invested in this? But that's how software development goes. We write a lot of code for the trash bin. By discussing the design with the Scikit-HEP community on gitter before actually implementing things, you can reduce this work.

Again: Design is about reducing interface bloat (and hence implementation bloat), to reduce mental load when learning a new library and to ease maintenance in the long run. One of the design goals of Scikit-HEP is to write small specialized tools that interoperate well with each other and with standard libs like numpy and matplotlib. This means we must resist the temptation to duplicate interfaces and functionality in the name of "convenenice". If it can be easily achieved with standard numpy/matplotlib, then we should not implement it.

I am writing a lot of words to give you arguments why your choices are not optimal. You should argue for your case as well, not just say: I want histplot to stay.

HDembinski · 2020-04-06T08:31:49Z

One thing I think makes sense for prebinned is a histtype keyword like in pyplot.hist. In addition to the existing hist options bar, barstacked, step (which would draw our skyline), stepfilled, which we should also support, we could add errorbar to draw the equivalent of pyplot.errorbar, and box to draw a box with the width of a bin and vertical extension value +/- 1 std.dev.

andrzejnovak · 2020-04-06T08:49:44Z

I am writing a lot of words to give you arguments why your choices are not optimal. You should argue for your case as well, not just say: I want histplot to stay.

That's all I ask :)

I want to emphasize that I don't disagree with you on the underlying principles, I just don't arrive to the same conclusions. I am much less motivated by the time I've invested in mplhep already than by the time it's saving me on daily basis (if matplotlib ever grows a proper histplot function we can retire mplhep just to install the styles and fonts), so if prebinned has all the functionality I think I need, we can retire histplot too.

I think you are spot-on for the paramount principle here: "reduce mental load when learning" and like we agreed in the other thread, in this specific case it's more important to keep local consistency with matplotlib rather than a global strict adherence to python maxims. That's why I think that prebinned should be as intuitive as possible when coming from pyplot.hist and that means to include all of it's "conveniences".

If prebinned turns out to be just a thin wrap on pyplot.step, then there would be merit in keeping histplot to do what hist does, but your above post makes me think that's not in the end what you are aiming for, so there's some middle ground to be found here.

Could you elaborate on bar, barstacked and stepfilled, it sounds like one thing to me. I see

step (errors are drawn if passed in)
step (no edges)
fill
errorbar
box (currently missing from histplot)

andrzejnovak · 2020-04-06T08:55:41Z

I am curious on what your take is on whether if we go the monkey-patch route, we should also allow the functions to be used independently (like my original commit here). I know "There should be one-- and preferably only one --obvious way to do it.", but I don't know whether we should gate the functionality behind the monkey patch, especially if we also modify some existing fucntions like set_{x|y}label to have a loc/align kwarg.

HDembinski · 2020-04-07T12:04:29Z

I like the monkey patch solution, it seems like the most straight-forward.

Could you elaborate on bar, barstacked and stepfilled, it sounds like one thing to me.

I took those options from the docs of pyplot.hist, you can look it up there.

step (errors are drawn if passed in)

That does not work if we pass in the whole histogram, and it is not necessary. If you want a skyline and error bars on top (which is not a recommended plotting style), then users can simply call prebinned twice with different options, once with step and once with errorbar.

HDembinski · 2020-04-07T12:12:39Z

I want to emphasize that I don't disagree with you on the underlying principles, I just don't arrive to the same conclusions.

Still one of us must be wrong. Either the logic is flawed or the assumptions are different or we apply different weights to the same values. But anyhow, perhaps we can converge.

andrzejnovak · 2020-04-07T12:46:30Z

I took those options from the docs of pyplot.hist, you can look it up there.

Ok

"bar" behaviour is not needed because if the data is already binned, plt.bar does the job (anyway only makes a difference when multiple hists are passed in, otherwise it's same as "stepfilled")
"barstacked" is a weird conflation of stack kwarg and histtype kwarg, which is pretty dumb
So the 4 histtype options in plt.hist reduce to two for us "step" and "stepfilled" (maybe we should rename our "filled") on top of which we also want "errorbar" and "band"

step (errors are drawn if passed in)

That does not work if we pass in the whole histogram, and it is not necessary. If you want a skyline and error bars on top (which is not a recommended plotting style), then users can simply call prebinned twice with different options, once with step and once with errorbar.

I thought bh doesn't hold error info the same way a TH1 does no? I could have misread the README. Either way, if it holds errors or they'll be computed in mplhep, we could either have separate histtypes step and sth-like step-error or the
yerr kwarg taking True/False

I am against forcing users to make multiple calls to prebinned for the desired result. One call should be one "object". I don't want to argue, whether steps-with-errors should be recommended or not, but given how common it is to see ROOT plots like that and the community we're serving, it should be easy to do.

HDembinski · 2020-04-14T11:36:14Z

I thought bh doesn't hold error info the same way a TH1 does no?

Both can track uncertainties, but don't do it by default, because it costs extra. If you don't call TH1::Sumw2(), TH1 returns the sqrt(N) error, which you can also extract from a boost-histogram.

When you want to track uncertainties from weighted fills, you are supposed to use storage=Weight() when you construct a boost-histogram, while in TH1 you are supposed to call TH1::Sumw2().

HDembinski · 2020-04-17T11:47:57Z

We should wrap this PR up to have the monkey patching in place. The discussion about the design of prebinned should happen in a separate PR made about prebinned.

I am currently working on the improvements to patching code that I wrote about. I rebased to the current master and now I cannot run the tests.

========================================================== ERRORS ==========================================================
______________________________________________ ERROR collecting tests/test.py ______________________________________________
tests/test.py:9: in <module>
    import mplhep as hep  # noqa
mplhep/__init__.py:48: in <module>
    fm.fontManager.addfont(font)
E   AttributeError: 'FontManager' object has no attribute 'addfont'
===================================================== warnings summary =====================================================
/usr/local/Caskroom/miniconda/base/lib/python3.7/site-packages/ansiwrap/core.py:6
  /usr/local/Caskroom/miniconda/base/lib/python3.7/site-packages/ansiwrap/core.py:6: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

-- Docs: https://docs.pytest.org/en/latest/warnings.html
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

I am using matplotlib-3.1.3. It looks like you are using matplotlib-2.x in CI testing. Perhaps addfont was removed in matplotlib-3?

andrzejnovak · 2020-04-17T12:59:33Z

Yeah it's a matplotlib==3.2 thing. https://github.com/scikit-hep/mplhep/blob/master/setup.py#L7

It looks like you are using matplotlib-2.x in CI testing

What makes it look like that?

I agree, monkey patching machinery and prebinned need not be in the same PR.

HDembinski · 2020-04-17T13:12:04Z

I wanted to work on this patch, and I cannot pytest it, unless this .addfont thing is fixed. What do you propose as a solution? You cannot require matplotlib-3.2 for this package, it also has to work with older matplotlib versions.

What makes it look like that?

I tried to find out how this problem could pass CI.

andrzejnovak · 2020-04-17T13:22:30Z

@HDembinski Why can we not require matplotlib-3.2?

The previous solution is still available in the master-py27 branch and raises a DeprecationWarning from matplotlib in 3.2 and will not work as soon as 3.3 rolls around.

We could check the version on import.

It looks like you are using matplotlib-2.x in CI testing

What makes it look like that?

I tried to find out how this problem could pass CI.

I don't understand, the CI obviously uses 3.2, or it would be failing.

andrzejnovak · 2020-05-20T14:54:02Z

@HDembinski Still planning to take a look at this, or should I put it back on my to-do list? The 3.2 backcomp was added :)

tacaswell · 2020-07-13T16:30:01Z

You should do this by registering a custom projection which lets you control the (sub)class of the Axes that is created. See https://matplotlib.org/gallery/misc/custom_projection.html .

This is supported by every version of Matplotlib that is plausible to support.

andrzejnovak · 2020-07-13T16:54:13Z

I am not sure I understand. The idea (which we, unfortunately, didn't follow up on) was to have a from mplhep import pyplot as plt kind of like you have from jax import numpy as np with the hopes of providing a temporary extension for things that would ultimately make it into matplotlib proper. So that user could write ax.histplot now instead of waiting for a suitable solution in mpl and user code would not have to change after it got properly implemented.

HDembinski · 2020-07-14T11:54:12Z

@andrzejnovak I will have a look at it.

feat: MonkeyClass pyplot

df94ad6

HDembinski mentioned this pull request Mar 13, 2020

Allow hist objects to go into histplot(), Use edges by default, small fixes #99

Merged

andrzejnovak mentioned this pull request Apr 17, 2020

[Discussion] prebinned spec #126

Closed

andrzejnovak closed this Apr 4, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: MonkeyClass pyplot #100

feat: MonkeyClass pyplot #100

andrzejnovak commented Mar 10, 2020

HDembinski commented Mar 13, 2020 •

edited

jpivarski commented Mar 13, 2020

andrzejnovak commented Mar 13, 2020

HDembinski commented Mar 19, 2020 •

edited

HDembinski commented Mar 19, 2020 •

edited

jpivarski commented Mar 19, 2020

andrzejnovak commented Mar 19, 2020

jpivarski commented Mar 19, 2020

eduardo-rodrigues commented Mar 19, 2020

HDembinski commented Apr 5, 2020 •

edited

HDembinski commented Apr 5, 2020

andrzejnovak commented Apr 5, 2020

HDembinski commented Apr 6, 2020

HDembinski commented Apr 6, 2020

andrzejnovak commented Apr 6, 2020 •

edited

andrzejnovak commented Apr 6, 2020

HDembinski commented Apr 7, 2020

HDembinski commented Apr 7, 2020 •

edited

andrzejnovak commented Apr 7, 2020

HDembinski commented Apr 14, 2020 •

edited

HDembinski commented Apr 17, 2020 •

edited

andrzejnovak commented Apr 17, 2020

HDembinski commented Apr 17, 2020

andrzejnovak commented Apr 17, 2020 •

edited

andrzejnovak commented May 20, 2020

tacaswell commented Jul 13, 2020

andrzejnovak commented Jul 13, 2020

HDembinski commented Jul 14, 2020

feat: MonkeyClass pyplot #100

feat: MonkeyClass pyplot #100

Conversation

andrzejnovak commented Mar 10, 2020

HDembinski commented Mar 13, 2020 • edited

We overload the normal hist method

We make a new function plot_hist

Conclusion

jpivarski commented Mar 13, 2020

andrzejnovak commented Mar 13, 2020

HDembinski commented Mar 19, 2020 • edited

HDembinski commented Mar 19, 2020 • edited

jpivarski commented Mar 19, 2020

andrzejnovak commented Mar 19, 2020

jpivarski commented Mar 19, 2020

eduardo-rodrigues commented Mar 19, 2020

HDembinski commented Apr 5, 2020 • edited

HDembinski commented Apr 5, 2020

andrzejnovak commented Apr 5, 2020

HDembinski commented Apr 6, 2020

HDembinski commented Apr 6, 2020

andrzejnovak commented Apr 6, 2020 • edited

andrzejnovak commented Apr 6, 2020

HDembinski commented Apr 7, 2020

HDembinski commented Apr 7, 2020 • edited

andrzejnovak commented Apr 7, 2020

HDembinski commented Apr 14, 2020 • edited

HDembinski commented Apr 17, 2020 • edited

andrzejnovak commented Apr 17, 2020

HDembinski commented Apr 17, 2020

andrzejnovak commented Apr 17, 2020 • edited

andrzejnovak commented May 20, 2020

tacaswell commented Jul 13, 2020

andrzejnovak commented Jul 13, 2020

HDembinski commented Jul 14, 2020

HDembinski commented Mar 13, 2020 •

edited

We overload the normal `hist` method

We make a new function `plot_hist`

HDembinski commented Mar 19, 2020 •

edited

HDembinski commented Mar 19, 2020 •

edited

HDembinski commented Apr 5, 2020 •

edited

andrzejnovak commented Apr 6, 2020 •

edited

HDembinski commented Apr 7, 2020 •

edited

HDembinski commented Apr 14, 2020 •

edited

HDembinski commented Apr 17, 2020 •

edited

andrzejnovak commented Apr 17, 2020 •

edited