Add support for batching NdOverlay plots #717
We've wanted this feature for ages! Here are my overall comments regarding the structure of the code. As I don't know the details, you can explain why this won't work:
Now you'll tell me all the reasons it can't be done like this. :-) If it could be done this way, I would like it a lot. It would keep the class hierarchy as it is, and support batching just by implementing one method and setting one flag. Even if we needed two methods and one flag, this would be a better way of supporting batching (imho) than introducing new classes. If batching needs to be enabled/disabled, maybe the attribute

Anyway, it is good to see these prototypes, and the speedups are nice. We definitely do want this feature!
I've already partially done this because I wasn't going to reimplement the entire Element plot, but you definitely have a point: all Element plots could optionally support batched plotting directly, without a separate plotting class. If I can implement this nicely, the OverlayPlot that would otherwise have created a bunch of individual subplots can handle dispatching the objects, which would avoid the additional handling I had to put on the Renderer.
Right, I like this proposal and will prototype it.
If we go with this proposal we can simply make
There are only two places where the method is used, so I can just use the correct one depending on the
Yes in this proposal there is no more problem, we should still come up with a better solution for adjoined plots though (in a different issue).
Ah yes, good we independently decided this might be a good idea.
Yes, this has huge scope for speeding up plotting, and the implementation is shaping up to be a lot simpler than I had originally imagined. Particularly for bokeh, where the performance penalty is both in creating the plot and rendering it in the browser, this should have huge benefits, but even for matplotlib this will make certain plot types feasible. Specifically, choropleth plots of US counties using Polygons were basically not possible before but should actually become usable now. For an initial implementation I'd like to have batched plots for Points, Curves and Polygons in bokeh and matplotlib.
Okay, I've now implemented batching for Curves and Points in Bokeh. Implementing batching in matplotlib will be a bit more difficult because it will also need new methods to update the artists. With a bit more cleanup the bokeh implementation will be fairly clean and usable at least. You can now control batching via a plot option on the OverlayPlot.

After working on the implementation a bit more, I found a few more restrictions for batched mode: in some cases it will be impossible to provide legends. I don't think this is a major issue, since batching only becomes necessary when there are so many items that a legend doesn't make sense anyway. I'd argue the batched parameter should be False by default though.

For now I think merging this PR with a few implementations for batched bokeh plots will be sufficient, since it addresses the biggest issue of seriously slowing down the browser when plotting a lot of overlaid curves or points. I'll open a separate issue to implement the equivalent for the matplotlib backend.
Here is a comparison between the non-batched (left) and batched versions of the three plot types I've implemented now:

Polygons

    # imports assumed by all three snippets
    import numpy as np
    import holoviews as hv

    polygons = hv.NdOverlay({i: hv.Polygons([np.random.rand(10,2)]) for i in range(10)})
    polygons(plot=dict(batched=False)) + polygons

Points

    points = hv.NdOverlay({i: hv.Scatter(np.random.rand(10,2)) for i in range(50)})
    points(plot=dict(batched=False)) + points

Curve

    curves = hv.NdOverlay({i: hv.Curve(np.random.rand(10,2)) for i in range(50)})
    curves(plot=dict(batched=False)) + curves

The missing legend is the major difference. However, during batched plotting, controlling the zorder of the individual plot Elements is often impossible too, because they are all rendered as one glyph, which has only one zorder. There's not much we can do about it, and as long as you're forced to explicitly enable batching I don't think it's an issue. This notebook has some additional profiling information.
Great! I'm glad my proposal is reasonable. My only potential objection is making

Here is what I propose: make

Edit 1: I just saw what you said about z-order. I understand that the entire batch has a single z-order but why does that mean the legend can't still be there on top of the whole batch?

Edit 2: Given that
Because usually you'd have one legend entry per artist, but in these batched plots I only create a single artist, which won't allow separate legend entries.
That's correct, although it would be nicer if it wasn't a plot option for Overlays, as they don't actually support batching, only NdOverlays, which are guaranteed to contain Elements of a consistent type, do. |
That makes sense. Though maybe it would be possible to fake? I.e create a single invisible point somewhere in the plot for each entry you want in the legend? Not saying we should, but I think you can fake the old behavior in a simple and inexpensive way. |
In matplotlib maybe, in bokeh legends are currently not exposed at a low enough level to really make this happen and adding fake glyphs would defeat the entire point of batching. |
I'm not sure that is true. The point of batching in my mind is optimization; the fake glyphs you would need to add would be as small as possible and therefore inexpensive. Is it really not possible to add hidden fake glyphs (i.e. set visible to False) and get the legend out of that? Note that I'm not saying this is necessarily a good idea, but we should try to consider some possibilities anyway.
The expensive part is precisely in creating lots of individual glyphs; how much data each one contains is secondary. The entire speedup we get from batch plotting comes from the fact that you're no longer creating a bunch of individual glyphs, since the overall amount of data plotted is the same in both cases. I'd also still argue that once you have so many artists that batch plotting is needed, a legend will be pretty much useless to you anyway. Based on the current support for legends in bokeh, I don't think we can fake legends in any sensible way without losing most of the performance improvement that we're hoping to gain. We can revisit this at a later point though, when there is more general support for legends in bokeh, at which point we should be able to create a custom legend more easily. Therefore I'd say for now
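To make the cost argument above concrete, here is a minimal sketch in plain Python (it does not import bokeh or HoloViews). It only shows that the same points can be described either as one data dict per glyph or as a single dict for one batched glyph. The function names are hypothetical; the `xs`/`ys` list-of-lists layout is meant to mirror the column layout a `multi_line`-style glyph takes, and is not taken from this PR's code.

```python
def per_glyph_data(curves):
    """One data dict per curve: N curves would mean N separate glyphs."""
    return [{"x": [x for x, _ in pts], "y": [y for _, y in pts]} for pts in curves]


def batched_data(curves):
    """A single data dict holding every curve: one glyph regardless of N."""
    return {
        "xs": [[x for x, _ in pts] for pts in curves],
        "ys": [[y for _, y in pts] for pts in curves],
    }


# 50 small curves of three points each (illustrative data only)
curves = [[(0, i), (1, i + 1), (2, i)] for i in range(50)]

separate = per_glyph_data(curves)
batched = batched_data(curves)

# The amount of data is identical; only the number of glyph objects differs.
n_points_separate = sum(len(d["x"]) for d in separate)
n_points_batched = sum(len(xs) for xs in batched["xs"])
print(len(separate), "glyphs vs 1 glyph;", n_points_separate, "vs", n_points_batched, "points")
```

The sketch says nothing about actual timings; it only illustrates why the point count stays constant while the glyph count drops to one.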
To me, that is the answer. I can't imagine a legend being useful with 100 entries or even 50 entries. What I am talking about is batch mode being on by default, and used for cases where there are say, only 10 entries in the legend which is reasonable. I would argue for batch mode being always on (so people automatically get the benefits), with support from legends when it makes sense (i.e up to some limit in the number of entries in the legend). For instance, maybe To me that seems like a good tradeoff. Fast when there are lots of entries (legends disable themselves by default), fast when legends are off and only slow when there are lots of entries and the user explicitly asks for a huge legend (which is unlikely to be useful or desirable anyway). Edit: Instead of overloading |
I'd agree that that's reasonable behavior. That said, I have not yet implemented a way to update legends in bokeh anyway, I think it should be possible but I can't be absolutely sure. So I think if we agree to do this, it will be in a separate PR. Then there's the zorder issue to work out, we should file an issue with bokeh to see if we can get the I'd suggest I do a final cleanup of this PR setting batched to False for now, and then I'll follow it up with another PR to work on your suggestion for legends, at which point the behavior should match and we can enable it by default. |
Ok great! At least we agree on what the behaviour should be if we can get it to work that way. I'm happy to merge this PR once you've done the cleanup. I think the |
You guys came to the conclusions that I was going to propose anyway. :-) Having the results differ due to lack of z-order control is indeed alarming, and it seems like Bokeh should be respecting the order. Alternatively, if the order is well defined in the batched case, one could consider reordering the non-batched case to match, but it sounds like the behavior in the non-batched case is correct already (respecting the original order). Isn't the fact that the legend is appearing below the data a bug, unrelated to this PR? It's happening in the non-batched case, and seems like it should never be happening. You could imagine having batched be a Boolean Dynamic parameter that has access to the number of items (so that the rest of the code doesn't have to worry about the limit parameter), but I'm not sure how the number of items would be provided to such an object. We do need to make Booleans and Strings be dynamic, in any case.
Yes, I'll try to work out if there is any well defined order and then file an issue.
Yes, that is also a bug. The legend is set to the 'annotation' level which I thought would be on top, but I can try setting it to 'overlay' to fix that. I'll do that in the separate legends PR.
That might be nice, but since the OverlayPlot decides whether to batch, it happens in only one place anyway. However, even in the absence of that, instead of going through the work of faking a legend, which I believe destroys all performance gains, we can simply offer control over the number of items it should render normally before batching is enabled and no legend is generated. That means we don't have to complicate the implementation further but still offer the same level of control. Batching is only enabled when you have too many items for a legend to be sensible, which is exactly when you need it. Plotting 30 glyphs is not really a problem; plotting >100 is, and that's well beyond the point where a legend is sensible.
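The threshold behaviour described in this comment can be sketched roughly as follows. This is an illustration only, not the PR's implementation: `plan_overlay`, `legend_limit` and `supports_batching` are hypothetical names, the default of 25 is a placeholder, and whether the comparison should be `>` or `>=` is an assumption.

```python
def plan_overlay(n_items, legend_limit=25, supports_batching=True):
    """Decide how an overlay of n_items elements would be drawn.

    Below or at the limit, every element gets its own glyph and a legend
    can be shown. Above the limit, everything is drawn as one batched
    glyph and no legend is generated.
    """
    batched = supports_batching and n_items > legend_limit
    return {
        "batched": batched,
        "n_glyphs": 1 if batched else n_items,
        "show_legend": not batched,
    }


for n in (10, 30, 100):
    print(n, plan_overlay(n))
```

With a single number controlling both the legend and batching, users who really want a long legend can raise the limit and accept the slower, non-batched rendering.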
Sounds good, but there's a grey area between when legends start being not very usable and when they become completely ridiculous. I'd think legends are only typically useful up to about 12 or so (actually only 9 fit with the current spacing for Points legends, but 12 could presumably fit, especially if it's moved out of the plot), but they are still sensible (i.e., conceivable in some cases) up to 30 or so. I'd put the limit fairly conservatively, i.e. only show a legend when it is definitely useful, but then allow people to up the limit if they verify it makes sense in their case. |
The main issue is that control over legends is still limited in bokeh, it can't be placed outside the plot for instance and I'm dubious about being able to create hidden artists which only show up in the legend since they will also be hidden automatically. I think that along with things like colorbars, this is something the bokeh team will look at after the layout work is done. So in the meantime long legends are of limited usefulness in bokeh anyway and I'd like to avoid implementing some custom hack to fake them with hidden artists. We can revisit that decision at any point by making batching independent of the
We can probably drop the first requirement for an eventual matplotlib batched implementation and potentially also for bokeh if they expose more control over the legends. |
That sounds good. I wonder if Bokeh is sorting by x, then y, or something else bizarre like that. |
After zooming in it's broken again, something iffy on the bokehJS side. |
The
That's also a question we'll have to ask the bokeh team. I've read the patches code and it simply separates paths with NaNs, which should be pretty solid. So it could be WebGL,
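For readers unfamiliar with the NaN-separator technique mentioned here, the following is a small, self-contained sketch of the general idea in plain Python. It is not bokeh's code; `join_paths` and `split_paths` are hypothetical helpers written only to show that several paths can share one pair of coordinate arrays and still be recovered.

```python
import math


def join_paths(paths):
    """Concatenate paths into flat x/y lists, with a NaN between paths."""
    xs, ys = [], []
    for i, path in enumerate(paths):
        if i > 0:
            # A NaN pair marks the break between two consecutive paths.
            xs.append(math.nan)
            ys.append(math.nan)
        for x, y in path:
            xs.append(x)
            ys.append(y)
    return xs, ys


def split_paths(xs, ys):
    """Recover the individual paths from NaN-separated x/y lists."""
    paths, current = [], []
    for x, y in zip(xs, ys):
        if math.isnan(x) or math.isnan(y):
            if current:
                paths.append(current)
                current = []
        else:
            current.append((x, y))
    if current:
        paths.append(current)
    return paths


paths = [[(0, 0), (1, 0), (1, 1)], [(2, 2), (3, 2), (3, 3)]]
xs, ys = join_paths(paths)
print(len(xs), split_paths(xs, ys) == paths)
```

A renderer that lifts the pen at each NaN draws the two shapes separately even though they arrive as one array.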
Note that it seemingly looks better in the batched case. I think there are fixes to hiDPI WebGL rendering in bokeh 0.12 though, so these differences might not persist. I'll be checking out the latest 0.12 dev release tomorrow, to check how we're doing on compatibility in general.
Ok great! I couldn't tell which way round Jim's screenshots were. I assumed the worst and that batched version was lower quality. :-) |
Right; batched looks better, i.e. sharper and clearer, though I don't know which version actually matches the true extent of the line segment (as they can't both). The originals for my screenshot are towards the middle of this thread above; it's just a screenshot of a zoomed in screenshot. |
Okay, I'll get the tests in order now. There was one instance of a plot with more than 20 legend items in the tests, so I've upped the limit slightly to 25. I've also added a comment to the

Not sure what to do about the remaining differences though. I can briefly try to determine which line drawing call is more correct and then check whether there are any changes/improvements in the latest dev version, but we may just have to come to terms with the differences for now.
I would consider simply trying to call Generally, I think we should always assume a batched implementation is better and should always try to use |
As for

Then you only need to specify the 'single' keyword in most cases:

    _plot_methods = dict(single='scatter')

Or both if they differ:

    _plot_methods = dict(single='line', batched='multi_line')

Then for getting the batched plot method, it can fall back to the 'single' one if not specified:

    batch_plot_method = self._plot_methods.get('batched', self._plot_methods['single'])

Seems less redundant in many cases and avoids adding an attribute which I expect will be rarely needed and would mirror
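The fallback suggested in this comment can be tried out in isolation with a small sketch. The class names and the `plot_method` helper below are hypothetical and exist only for illustration; the string values stand in for the names of the backend drawing calls.

```python
class ElementPlotSketch:
    # Only 'single' is declared, so batched mode falls back to it.
    _plot_methods = dict(single="scatter")

    def plot_method(self, batched=False):
        """Return the name of the drawing method for the requested mode."""
        single = self._plot_methods["single"]
        if batched:
            # Fall back to the 'single' method when no 'batched' one is declared.
            return self._plot_methods.get("batched", single)
        return single


class CurvePlotSketch(ElementPlotSketch):
    # Both modes are declared because they differ.
    _plot_methods = dict(single="line", batched="multi_line")


print(ElementPlotSketch().plot_method(batched=True))
print(CurvePlotSketch().plot_method(batched=True))
print(CurvePlotSketch().plot_method())
```

Note that this only covers the lookup of the method name; it does not address which object (a single Element or a whole NdOverlay) the plot has been handed.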
That doesn't really help since it's the OverlayPlot that has to check whether the plot supports batching or not and either create multiple subplots or a single one.
I don't think this is generally true. Yes, in most cases it's what you want when you have a lot of plots, but there are certain restrictions, such as the inability to use various hover and selection tools, which means we should be very clear about what batched mode does and does not support.
That sounds like a good idea.
Fallbacks like that won't work. As I said above, by this point the OverlayPlot will have already decided whether to assign the plot an NdOverlay of Elements or a single Element, and it will choose the appropriate one for the object it has been passed. It should never end up in a situation where a plot that doesn't support batching has been given an NdOverlay.
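The dispatch described in this comment could look roughly like the sketch below. Everything here is hypothetical (the class names, the `supports_batching` flag and `create_subplots`); it assumes the overlay is a plain dict of key to element and is only meant to show the decision being made once, before any subplot is created.

```python
class PointPlotSketch:
    supports_batching = True  # hypothetical flag declared on the plot class

    def __init__(self, data):
        self.data = data


class TextPlotSketch:
    supports_batching = False

    def __init__(self, data):
        self.data = data


def create_subplots(plot_class, overlay, batched=True):
    """Create either one batched subplot or one subplot per element.

    A plot class without batching support is never handed the whole
    overlay; it only ever sees single elements.
    """
    if batched and plot_class.supports_batching:
        return {"batched": plot_class(overlay)}
    return {key: plot_class(element) for key, element in overlay.items()}


overlay = {0: "element-a", 1: "element-b", 2: "element-c"}
point_plots = create_subplots(PointPlotSketch, overlay)
text_plots = create_subplots(TextPlotSketch, overlay)
print(len(point_plots), len(text_plots))
```

Because the choice is made by the caller, the individual plot classes need no fallback logic of their own.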
Ok that makes sense. You could use
I was not implying that we would ignore the How about this...instead of A list of plotting classes that support batching would do, assuming all the necessary plotting classes will be defined (which must be true if One of the reasons I feel this is better (other than avoiding |
Not a huge fan of having another registry for a specific type of plots. We have a lot of them and I'm not convinced that's actually any cleaner than declaring it on the plot itself. That said, I've decided to unify the matplotlib and bokeh backends a bit more by creating a general |
Ok, I'm very happy with this use of I'll have a quick look again to see if I have any other comments, but right now I am happy to merge (once the tests pass). |
    def get_zorder(self, overlay, key, el):
        spec = util.get_overlay_spec(overlay, key, el)
A docstring here would be helpful.
Agreed, doing that now.
I've made one minor comment above and I am happy to merge once tests pass. In general, I think
I remember wanting something like this ages ago when I investigated getting style options using matplotlib's eccentric documentation system. Edit: Linking to the underlying documentation of the plotting system would be useful for users as well, e.g. to look up what style options really mean...
PR build passed, not sure why the push build didn't restart but the branch is up to date anyway. Ready to merge. |
Great! |
Following the proposal in #716, I've prototyped an initial implementation of batched plots for NdOverlays. The idea behind this is that creating a lot of individual plot instances for very similar plot elements is expensive and results in the creation of a huge number of artists and glyphs. Hooking into HoloViews to create plot classes that plot a bunch of Elements at once is pretty straightforward and can provide considerable speed benefits. To test this I've prototyped a BatchedCurvePlot class, which can plot large NdOverlays of Curves very quickly in the bokeh backend.
I've tested this with this simple example:
After quick testing, the batched plot renders this in 0.25 seconds vs the regular implementation, which takes about 1.25 seconds to generate. More importantly, however, the generated bokeh output no longer freezes the browser temporarily, because it renders all the curves at once rather than as separate glyphs. Visually the two plots are identical, except that in bokeh the batched plot is coming out a lot crisper for me for some reason (I think this has been fixed in bokeh master).
There are still a few fixes to make, and we need to decide on a general place for specialized plot registries, such as for the batched plots here and for adjoined plots in other cases.