Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: MonkeyClass pyplot #100

Closed
wants to merge 1 commit into from
Closed

feat: MonkeyClass pyplot #100

wants to merge 1 commit into from

Conversation

andrzejnovak
Copy link
Member

Allows for a more seamless matplotlib-like experience. As well as "patching" some objects in the future like set_xlabel before mpl catches up.

import mplhep.pyplot as plt
fig, ax = plt.subplots()
ax.histplot()

One downside is, this updates the Axes mpl class so while I don't know why you would need to, you cannot use matplotlib.pyplot at the same time. @HDembinski Do you know if this works for jax? If so I'll take a look at how to implement it.

Being able to do stuff like this main reason why I would prefer histplot not be renamed to hist but I'd be fine with hist_plot or some other non-conflicting name.

@HDembinski
Copy link
Member

HDembinski commented Mar 13, 2020

I think this feature needs wider discussion by the Scikit-HEP community. I can see the advantage of monkey-patching the Axes object.

We need to discuss the two options then regarding plotting of pre-binned histograms.

We overload the normal hist method

We overload the existing hist method. The existing method accepts a positional argument x, which normally is an array of values or a sequence of arrays (of equal length!). We could check for this case and then forward it to the original implementation. If x is a tuple like the one returned by numpy.histogram (which are arrays of non-equal length), we run our code. If x is a boost-histogram, we also run our code.

from mplhep import pyplot as plt
import numy as np
import boost_histogram as bh

x = (1, 2, 3, 4)
plt.hist(x) # calls original hist
plt.hist(np.histogram(x)) # calls our implementation
h = bh.Histogram(bh.axis.Regular(10, 0, 5))
h.fill(x)
plt.hist(h) # calls our implementation

Some arguments of original hist make no sense for pre-binned histograms, if these are set we need to raise an error. We can, however, also add new keywords that are only valid for our monkey-patched version and not the original hist.

Pro:

  • Straight-forward interface
  • Easy to learn and remember

Con:

  • Some arguments of original hist make no sense for pre-binned histograms but have to be included in the interface

We make a new function plot_hist

This gives us more freedom, which may seem great, but freedom is a two-sided edge. It means we have to make many more decisions as designers and our users have to learn more things to use our code.

Firstly regarding the name. As a general rule, methods and functions should be verbs since they model actions, and classes should be nouns. I read hist_plot as a noun, shorthand for "histogram plot", while plot_hist is an action, shorthand for "plot this histogram". Therefore, plot_hist is a better name than hist_plot.

plot_hist should still be as close as possible in interface to hist, to make adaptation easier. If you know how to make simple plots with pyplot.hist then those simple plots should also work with plot_hist. Nobody wants to look up the reference all the time. It would be especially confusing if the same keyword would do something different in hist and plot_hist.

Pro:

  • No superfluous keywords in the interface

Con:

  • Synchronization between hist and plot_hist interfaces has to be assured manually
  • Two similar functions with similar actions
  • One more interface to learn

Conclusion

I think we should override pyplot.hist.

@jpivarski
Copy link
Member

I don't know the Matplotlib interface well enough to say that monkey-patching is a good idea here (it usually isn't a good idea, but there are exceptions). Note that if you want to support Python 2, the new methods have to be added with types.MethodType(...).

The two options @HDembinski suggested are, if I'm reading it right, a choice to make after already having decided that we're going to monkey-patch Axes. If option 1 replaces the original hist method, making it no longer available, and option 2 supplements it with a new method with a new name, I strongly prefer the latter, because then at least the monkey-patching is non-obtrusive. If Matplotlib uses its own Axes.hist inside some function, or a third-party library does, then replacing that method would break things unexpectedly. A new name would only break things if someone's iterating over all methods or introduces a new method with exactly that name in the future—both are much less likely.

As for noun or verb, it's good for them to be consistent, and I don't know how consistent Matplotlib is in its method names. The word "plot" can be a noun or a verb (more likely a verb if placed first, as @HDembinski has pointed out), but "show" and "draw" are unambiguous.

On the other hand, the thing that distinguishes this histogram method from the standard one is that it draws prebinned data. Is there any way to get the word "prebinned" into it without making the method name long and hard to type? Looking at a list of methods, the existence of hist and plot_hist tells me nothing about why I should prefer one over the other, and saying that the latter is for HEP isn't a good answer. But if there was hist and prebinned_hist, then I would know right away which is my case.

I know that prebinned_hist sounds a lot like a noun, but maybe there's some way to rework it? If there's more than one of these, then a suite featuring the tag "prebin" or "prebinned" would help to unify the interface, making it clear what its domain is.

@andrzejnovak
Copy link
Member Author

I agree with @HDembinski that the primary guiding principle here should be

plot_hist should still be as close as possible in interface to hist, to make adaptation easier. If you know how to make simple plots with pyplot.hist then those simple plots should also work with plot_hist. Nobody wants to look up the reference all the time. It would be especially confusing if the same keyword would do something different in hist and plot_hist

(which is also why I think we should support passing an array of histograms, even though the points raised in the other thread are very valid)

I, however, draw the opposite conclusion and think that a separate method name, while keeping the rest of the API as close as possible to plt.hist is the least confusing. Regardless of possible monkey-patching issues raised by @jpivarski I think that a hist function that would at once take arrays on which to calculate histograms as well as already binned histograms, is not transparent.

Finding a good (short) name is a bit tricky. I understand the verb argument, though matplotlib is not very consistent with/on this. There are Axes methods that are interpretable as both verb and noun or just a noun. Consider: plot, scatter, step, fill_between vs errorbar, bar, pie, boxplot, violinplot There seems to be a good precedens for <something>plot hence the current name, but everything mentioned above could be justified

  • hist_plot (adding the "_")
  • plot_hist/draw_hist
  • prebinned_hist (bit too long imo)
  • binned_hist
  • bin_hist

Maybe we could have something of a vote

@HDembinski
Copy link
Member

HDembinski commented Mar 19, 2020

@jpivarski Thank you for commenting. I don't like monkey-patching much myself, but I am not against it in this case.

You misunderstood option 1. I explicitly wrote in my description that the original pyplot.hist should still work. We keep the original method around and when you pass a sequence that the original method would have handled, our method would simply pass that array to the orignal method. We just wrap the original hist to also accept pre-binned histograms. This is possible without ambiguity, since the data structures are different. I added a code example to make this more clear.

@HDembinski
Copy link
Member

HDembinski commented Mar 19, 2020

If you prefer to have a separate name, how about the name prebinned. It is short and it follows @jpivarski excellent advice to pick a name that signifies why you need to use this over hist. prebinned_hist is redundant. There is nothing else that can be binned, if it is binned it is a histogram.

As a general rule, functions and methods should be verbs, but pyplot is a bit special and local consistency within pyplot is more important than global consistency, I agree with that. @andrzejnovak Your remark makes sense to me that many methods on pyplot read like "plot something"

from matplotlib import pyplot as plot
plot.hist(...) # plot histogram
plot.scatter(...) # plot scatter graph

So I think the following fits in very well

plot.prebinned(...) # plot prebinned data

@jpivarski
Copy link
Member

I like the name prebinned. I just find it funny that it's neither a verb nor a noun: it's an adjective.

@andrzejnovak
Copy link
Member Author

prebinned makes a lot of sense. Though I still feel like it's a bit long. Do you think just binned would be confusing for users?

@jpivarski
Copy link
Member

There's a tradeoff between brevity and clarity: clarity is some optimal point in the curve. I think "binned" is a bit too little because "hist" is binned as well. The new thing here is the "pre".

@eduardo-rodrigues
Copy link
Member

For what it's worth I agree as well - I always go for clarity even if that means a bit longer a word. Honestly how much of a pain is 3 letters compared to what we have to write/code on a daily basis? It's not even a permille … ;-)

@HDembinski
Copy link
Member

HDembinski commented Apr 5, 2020

So it seems that we converged for the name to "prebinned" and perhaps "prebinned2d". I am not sure whether we need the latter though, since pyplot.pcolormesh is doing a decent job of plotting prebinned 2D histograms.

Todo:

  • add methods directly to Axes class, no need to inherit from Axes. There is only one class instance of Axes, so by adding methods to Axes, we change it everywhere.
  • finalize the interface for "prebinned", what keywords make sense? It is better to err on side of caution and do not add too many keywords at the first round. One can always add more later, but taking existing keywords away is cumbersome as it is breaking peoples code and you have to deprecate them and so on.

@HDembinski
Copy link
Member

@andrzejnovak I have some time, so is it ok if I make the changes to this branch?

@andrzejnovak
Copy link
Member Author

@HDembinski Sure, this makes sense to me.

I am thinking now that maybe the way forward would be to have prebinned as the "bare" method you were requesting, plotting one histogram at a time and not needing much else apart from (H, bins, errors). histplot could stay the "bloaty" user convenience interface for things like some basic error calculation, stacking/norming as well as the different histtypes

@HDembinski
Copy link
Member

That's a possible compromise, but not a good one. I don't see the value in keeping histplot. Are you perhaps motivated by the feeling that you don't want to loose the time you already invested in this? But that's how software development goes. We write a lot of code for the trash bin. By discussing the design with the Scikit-HEP community on gitter before actually implementing things, you can reduce this work.

Again: Design is about reducing interface bloat (and hence implementation bloat), to reduce mental load when learning a new library and to ease maintenance in the long run. One of the design goals of Scikit-HEP is to write small specialized tools that interoperate well with each other and with standard libs like numpy and matplotlib. This means we must resist the temptation to duplicate interfaces and functionality in the name of "convenenice". If it can be easily achieved with standard numpy/matplotlib, then we should not implement it.

I am writing a lot of words to give you arguments why your choices are not optimal. You should argue for your case as well, not just say: I want histplot to stay.

@HDembinski
Copy link
Member

One thing I think makes sense for prebinned is a histtype keyword like in pyplot.hist. In addition to the existing hist options bar, barstacked, step (which would draw our skyline), stepfilled, which we should also support, we could add errorbar to draw the equivalent of pyplot.errorbar, and box to draw a box with the width of a bin and vertical extension value +/- 1 std.dev.

@andrzejnovak
Copy link
Member Author

andrzejnovak commented Apr 6, 2020

I am writing a lot of words to give you arguments why your choices are not optimal. You should argue for your case as well, not just say: I want histplot to stay.

That's all I ask :)

I want to emphasize that I don't disagree with you on the underlying principles, I just don't arrive to the same conclusions. I am much less motivated by the time I've invested in mplhep already than by the time it's saving me on daily basis (if matplotlib ever grows a proper histplot function we can retire mplhep just to install the styles and fonts), so if prebinned has all the functionality I think I need, we can retire histplot too.

I think you are spot-on for the paramount principle here: "reduce mental load when learning" and like we agreed in the other thread, in this specific case it's more important to keep local consistency with matplotlib rather than a global strict adherence to python maxims. That's why I think that prebinned should be as intuitive as possible when coming from pyplot.hist and that means to include all of it's "conveniences".

If prebinned turns out to be just a thin wrap on pyplot.step, then there would be merit in keeping histplot to do what hist does, but your above post makes me think that's not in the end what you are aiming for, so there's some middle ground to be found here.

Could you elaborate on bar, barstacked and stepfilled, it sounds like one thing to me. I see

  • step (errors are drawn if passed in)
  • step (no edges)
  • fill
  • errorbar
  • box (currently missing from histplot)

@andrzejnovak
Copy link
Member Author

I am curious on what your take is on whether if we go the monkey-patch route, we should also allow the functions to be used independently (like my original commit here). I know "There should be one-- and preferably only one --obvious way to do it.", but I don't know whether we should gate the functionality behind the monkey patch, especially if we also modify some existing fucntions like set_{x|y}label to have a loc/align kwarg.

@HDembinski
Copy link
Member

I like the monkey patch solution, it seems like the most straight-forward.

Could you elaborate on bar, barstacked and stepfilled, it sounds like one thing to me.

I took those options from the docs of pyplot.hist, you can look it up there.

step (errors are drawn if passed in)

That does not work if we pass in the whole histogram, and it is not necessary. If you want a skyline and error bars on top (which is not a recommended plotting style), then users can simply call prebinned twice with different options, once with step and once with errorbar.

@HDembinski
Copy link
Member

HDembinski commented Apr 7, 2020

I want to emphasize that I don't disagree with you on the underlying principles, I just don't arrive to the same conclusions.

Still one of us must be wrong. Either the logic is flawed or the assumptions are different or we apply different weights to the same values. But anyhow, perhaps we can converge.

@andrzejnovak
Copy link
Member Author

I took those options from the docs of pyplot.hist, you can look it up there.

Ok

  • "bar" behaviour is not needed because if the data is already binned, plt.bar does the job (anyway only makes a difference when multiple hists are passed in, otherwise it's same as "stepfilled")
  • "barstacked" is a weird conflation of stack kwarg and histtype kwarg, which is pretty dumb
    So the 4 histtype options in plt.hist reduce to two for us "step" and "stepfilled" (maybe we should rename our "filled") on top of which we also want "errorbar" and "band"

step (errors are drawn if passed in)

That does not work if we pass in the whole histogram, and it is not necessary. If you want a skyline and error bars on top (which is not a recommended plotting style), then users can simply call prebinned twice with different options, once with step and once with errorbar.

I thought bh doesn't hold error info the same way a TH1 does no? I could have misread the README. Either way, if it holds errors or they'll be computed in mplhep, we could either have separate histtypes step and sth-like step-error or the
yerr kwarg taking True/False

I am against forcing users to make multiple calls to prebinned for the desired result. One call should be one "object". I don't want to argue, whether steps-with-errors should be recommended or not, but given how common it is to see ROOT plots like that and the community we're serving, it should be easy to do.

@HDembinski
Copy link
Member

HDembinski commented Apr 14, 2020

I thought bh doesn't hold error info the same way a TH1 does no?

Both can track uncertainties, but don't do it by default, because it costs extra. If you don't call TH1::Sumw2(), TH1 returns the sqrt(N) error, which you can also extract from a boost-histogram.

When you want to track uncertainties from weighted fills, you are supposed to use storage=Weight() when you construct a boost-histogram, while in TH1 you are supposed to call TH1::Sumw2().

@HDembinski
Copy link
Member

HDembinski commented Apr 17, 2020

We should wrap this PR up to have the monkey patching in place. The discussion about the design of prebinned should happen in a separate PR made about prebinned.

I am currently working on the improvements to patching code that I wrote about. I rebased to the current master and now I cannot run the tests.

========================================================== ERRORS ==========================================================
______________________________________________ ERROR collecting tests/test.py ______________________________________________
tests/test.py:9: in <module>
    import mplhep as hep  # noqa
mplhep/__init__.py:48: in <module>
    fm.fontManager.addfont(font)
E   AttributeError: 'FontManager' object has no attribute 'addfont'
===================================================== warnings summary =====================================================
/usr/local/Caskroom/miniconda/base/lib/python3.7/site-packages/ansiwrap/core.py:6
  /usr/local/Caskroom/miniconda/base/lib/python3.7/site-packages/ansiwrap/core.py:6: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

-- Docs: https://docs.pytest.org/en/latest/warnings.html
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

I am using matplotlib-3.1.3. It looks like you are using matplotlib-2.x in CI testing. Perhaps addfont was removed in matplotlib-3?

@andrzejnovak
Copy link
Member Author

Yeah it's a matplotlib==3.2 thing. https://github.com/scikit-hep/mplhep/blob/master/setup.py#L7

It looks like you are using matplotlib-2.x in CI testing

What makes it look like that?

I agree, monkey patching machinery and prebinned need not be in the same PR.

@HDembinski
Copy link
Member

I wanted to work on this patch, and I cannot pytest it, unless this .addfont thing is fixed. What do you propose as a solution? You cannot require matplotlib-3.2 for this package, it also has to work with older matplotlib versions.

What makes it look like that?

I tried to find out how this problem could pass CI.

@andrzejnovak
Copy link
Member Author

andrzejnovak commented Apr 17, 2020

@HDembinski Why can we not require matplotlib-3.2?

The previous solution is still available in the master-py27 branch and raises a DeprecationWarning from matplotlib in 3.2 and will not work as soon as 3.3 rolls around.

We could check the version on import.

It looks like you are using matplotlib-2.x in CI testing

What makes it look like that?

I tried to find out how this problem could pass CI.

I don't understand, the CI obviously uses 3.2, or it would be failing.

@andrzejnovak
Copy link
Member Author

@HDembinski Still planning to take a look at this, or should I put it back on my to-do list? The 3.2 backcomp was added :)

@tacaswell
Copy link

You should do this by registering a custom projection which lets you control the (sub)class of the Axes that is created. See https://matplotlib.org/gallery/misc/custom_projection.html .

This is supported by every version of Matplotlib that is plausible to support.

@andrzejnovak
Copy link
Member Author

I am not sure I understand. The idea (which we, unfortunately, didn't follow up on) was to have a from mplhep import pyplot as plt kind of like you have from jax import numpy as np with the hopes of providing a temporary extension for things that would ultimately make it into matplotlib proper. So that user could write ax.histplot now instead of waiting for a suitable solution in mpl and user code would not have to change after it got properly implemented.

@HDembinski
Copy link
Member

@andrzejnovak I will have a look at it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants