Adding a plot API to HoloViews #2446
Comments
Sounds like an excellent idea! I'd prefer it to stay part of HoloViews itself so that we can use it whenever is convenient. |
Frankly, I'm not the least bit enthusiastic about this style of API.
I think holoviews core should either support this idea or the That doesn't mean there couldn't be a separate extension that would live along I do agree that maintaining multiple redundant codebases is annoying so I am happy to see the common code live somewhere: the question is where. |
Supporting easy use of wide datasets is very important in the real world, where you can't control choices made by someone providing data to you, and you don't always want to tidy everything up just to get a plot. We can always make our own data we generate tidy, and our own examples tidy, but that's not the situation most people are in. That said, would there be a way to provide the wide-data support as part of |
I would much prefer this approach. |
The number of people who are familiar with this style of API vastly exceeds the HoloViews userbase, and if we want to reach a larger number of people, providing an easy and consistent API that addresses most users' needs is essential. This also ignores the central issue in HoloViews that this proposal addresses, which is the lack of an API that allows users to explore wide datasets, a recurring limitation that absolutely needs to be addressed in some form.
I think that's an avenue that might be worth considering, there's two main things I'd do to make the
If we extended Nonetheless even if the plotting API does not live in HoloViews itself, it seems a shame not to offer |
Extending |
I fully agree, but I suspect @jlstevens will not. The limitation of |
If people are familiar with this style, why would they switch to holoviews instead of continuing to use the tools they are already using? What is the point of offering something they already have? It seems to make more work for us for no benefit. As the |
Because what they don't have is easy composability ( |
They will never know about any of those things as they will just be sticking to the API they are already familiar with. This API is catering for people who don't want to learn anything new...so they won't. In other words we will just be offering what they already have except with the additional burden of maintaining everything. |
I'll be maintaining the API anyway for intake, streamz and pandas at least so this is basically a moot point.
It's not about the precise incantation of this API; there will be differences in any case because I'm not copying the 100 different (and inconsistent) matplotlib-based options that the pandas matplotlib API uses. It's simply about familiarity and consistency: learning the incantations for a wide range of elements is a lot of learning overhead, and this API will smooth over those differences by providing APIs that are consistent within a few broad classes of plots, starting with charts and statistical plots as shown in the intake example and in future for path/shape data and gridded data. A lot of the benefits they will immediately get for free, e.g. bokeh interactivity and composition; the only additional thing they will have to learn is parameter exploration through groupbys, which is one extra argument and therefore not a huge leap to make. In any case, I'm happy to start by extending |
I respectfully but vehemently disagree with the idea that people won't learn something new and won't get anything out of this. The proposed API gives people starting with one of the dataset types an easy way in to discovering all the features of Bokeh and HoloViews, eliminating what is typically a terminally large band gap that most people, most of the time, will fail to jump over. |
Improving the |
You mentioned that this somehow doesn't conform to the vision outlined in the SciPy paper, but going through the core design principles 1-by-1 it seems to me none of that original vision is lost:
This still applies, the API simply provides a convenient and consistent way to assign your data a useful representation after which it reveals itself. I'd argue it's superior on the "must be easy" front due to improved consistency, something we probably can't address in HoloViews itself until version 3.0 maybe.
Also still applies, the API still outputs elements providing atomic wrappers around your data.
The core signature which consists of
The visual options are optional and whether these are specified in a separate Overall the visual options handled by the API basically reduce to the plot and style options that are shared across backends: e.g.
All API methods return compositional objects so none of this is lost. Personally I'm in love with the API because you get the ease of use and consistency of a pandas/xarray-like plot API with all the benefits of HoloViews - it's the best of both worlds:
Anyway, we can shop this API around with intake, pandas, dask, streamz, xarray and geopandas and if there's strong uptake we can make the decision at the HoloViews level. Therefore I'm happy to postpone this discussion. |
I'll read through your response shortly but for now I'll just say that my biggest problem is having a method called |
That is true, in other libraries the name makes sense, in HoloViews not so much. |
I want to be clear that I'm not trying to put my foot down and say 'no' to this idea - I think it is inevitable in some form and I am also against the current code duplication. I just want to find a way to make this API available and useful while not confusing the message about the separation between data/plotting/options that is at the core of the design. I think there are some ideas that make it more palatable to me, for instance documenting this on pyviz.org - a website explicitly about getting different tools to work together - which can point to holoviews.org (which of course can also point back). One thing which would make me happier would be if it was something like |
Or maybe Edit: I do realize |
I just checked..our abstract class is |
I would be pretty happy calling it |
I think it makes things worse, not better, if the same API has a different name in different contexts. For better or worse, it's called I vote for calling it the PyViz .plot() API, explaining that (a) it's supported across pandas, xarray, streamz, geopandas, intake, holoviews, and geoviews, and (b) regardless of the context, it returns HoloViews objects, which you can find out more about at holoviews.org but which effectively work like plots that can be composed. |
That isn't true as this won't render anything:
I can agree with all that, just saying that it returns a holoviews viewable object (and then in holoviews it is |
At the very least, we can alias |
Any other name makes it worse, not better, because it becomes more confusing, not less. And I didn't say |
The difference is that in other libraries which plot, |
That gives me an idea for another option, we could subclass Dataset in pyviz (or wrap it in some other way). Much in the same way pyviz provides a global place to get the imports it could provide a universal Dataset object, providing a starting place to explore a Dataset of almost any type. |
I have no objection to this approach if you are happy with it. |
And thus no one is going to do that with the other libraries, so it's a moot point... |
That is exactly the problem! They won't do that, which means they won't have a handle on the result and therefore won't use the compositionality that holoviews offers! |
We can worry about whether a subclassed Dataset goes in PyViz or HV itself later; no rush on that... |
Okay, then the last thing for now, what to call the plotting API package, |
Maybe just |
I'm not sure what you mean; Dataset is in holoviews.core.data, which is in holoviews itself, and indeed in holoviews core? |
Yeah, sorry, I meant the new repo and library I'll be developing the API in. |
Maybe pyviz/pvplot. |
Why does it need a prefix? Why not |
For the import, |
When I say |
Not sure I like |
I vote for |
@rsignell-usgs |
@philippjfr, yes, I was following the scipy2018 pyviz tutorial and the first lesson was on It blows my mind that:

```python
import xarray as xr
import hvplot.xarray  # registers the .hvplot accessor on xarray objects

url = 'http://thredds.ucar.edu/thredds/dodsC/grib/FNMOC/WW3/Global_1p0deg/FNMOC_WW3_Global_1p0deg_20180818_0000.grib1'
ds = xr.open_dataset(url)
ds['sig_wav_ht_surface'].hvplot(groupby='time1', clim=(0,5))
```

Here's the full notebook. Amazing! Thanks! |
Over the past few months I have been working on HoloViews-based plotting APIs for a number of libraries including `intake`, `streamz` and `pandas`. In general I have borrowed heavily from the pandas DataFrame plotting API while mostly staying consistent with the HoloViews plot options. The API defines a plot namespace on the dataset objects of the respective libraries, which defines a wide array of plot types, including `.area`, `.bars`, `.box`, `.heatmap`, `.histogram`, `.kde`, `.line`, `.scatter`, `.table` and `.violin`. A fully fleshed out example for the intake library can be seen here.

All of these APIs are almost identical and maintaining them separately does not make much sense, since any divergences will become quite annoying. Therefore I've been wondering whether it might make more sense to introduce the API to HoloViews instead, providing an easy and familiar introduction to HoloViews and a powerful companion to the `.to` interface.

The `.to` interface is very powerful when dealing with tidy data; however, we have long struggled to deal with wide data, where observations along some dimensions are grouped by column rather than row (see #2341, #2015, #2162). The plot interface provides a clean solution to this problem, automatically grouping and overlaying each column/variable. Adding this API to `Dataset` would, I think, be a good option, providing an easy, more familiar and consistent API to specify plots (while still constructing declarative HoloViews objects), which I think could be made highly consistent with the HoloViews API.

As a brief summary I'll outline the two main ways of using the API:

- `x`/`y` columns and an optional `by` kwarg to group the data by another variable (this spelling is very similar to the `.to` method)
- `use_index` or an explicit `index` column and optionally a list of `columns` to plot, which will overlay the different columns (useful for wide datasets).

So far the interfaces I've designed only work with pandas/dask datasets but I'll soon be working on extending it to also cover xarray and geopandas types.
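
To make the difference between the two calling styles concrete, here is a minimal pure-Python sketch. The helper names `series_from_tidy` and `series_from_wide` and the sample data are hypothetical; they are not part of the proposed API or of HoloViews. They only illustrate how a `by` column in tidy data and a set of columns in wide data can both end up as the same set of overlaid series.

```python
from collections import defaultdict

# Hypothetical helpers (not part of HoloViews or any real plotting API):
# each returns {label: [(x, y), ...]}, i.e. one series per overlaid curve.


def series_from_tidy(rows, x, y, by=None):
    """Tidy data: one observation per row; `by` names the grouping column."""
    series = defaultdict(list)
    for row in rows:
        label = row[by] if by is not None else y
        series[label].append((row[x], row[y]))
    return dict(series)


def series_from_wide(rows, index, columns=None):
    """Wide data: one column per variable; each column becomes a series."""
    if columns is None:
        # Default to every column except the index, in first-row order.
        columns = [c for c in rows[0] if c != index]
    return {c: [(row[index], row[c]) for row in rows] for c in columns}


# The same measurements in both layouts (made-up sample values).
tidy = [
    {"day": 1, "city": "Oslo", "temp": 3.0},
    {"day": 2, "city": "Oslo", "temp": 4.5},
    {"day": 1, "city": "Rome", "temp": 12.0},
    {"day": 2, "city": "Rome", "temp": 13.5},
]
wide = [
    {"day": 1, "Oslo": 3.0, "Rome": 12.0},
    {"day": 2, "Oslo": 4.5, "Rome": 13.5},
]

from_tidy = series_from_tidy(tidy, x="day", y="temp", by="city")
from_wide = series_from_wide(wide, index="day")
print(from_tidy == from_wide)  # both layouts describe the same two curves
```

In the tidy case the user has to name the grouping column, while in the wide case the grouping is implicit in the column layout, which is why a column-overlaying interface saves a reshaping step.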
I think this API would complement the explicit declarative approach to constructing HoloViews objects and would therefore be a very valuable addition to the core library. However, we could also consider creating a new library for this interface, which other libraries could use, but this way we would not get the benefit of adding the plot interface to our Datasets (by default at least).
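
As a rough illustration of how one plot namespace could be written once and attached to several dataset types, wherever the code ends up living, here is a small standard-library sketch. Everything in it (`Element`, `Overlay`, `PlotNamespace`, `plot_accessor`, `Table`) is a hypothetical stand-in and not the HoloViews implementation. The point is only that the namespace methods return declarative objects that can be composed, rather than rendering anything themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Stand-in for a declarative plot element (a kind plus two columns)."""
    kind: str
    x: str
    y: str

    def __mul__(self, other):
        # `*` overlays elements instead of drawing anything.
        return Overlay((self,)) * other


@dataclass(frozen=True)
class Overlay:
    """Stand-in for a composed collection of elements."""
    items: tuple

    def __mul__(self, other):
        extra = other.items if isinstance(other, Overlay) else (other,)
        return Overlay(self.items + extra)


class PlotNamespace:
    """Shared plotting methods; written once, reused by every container."""

    def __init__(self, data):
        self._data = data

    def _element(self, kind, x, y):
        missing = {x, y} - set(self._data.columns)
        if missing:
            raise KeyError(f"unknown columns: {sorted(missing)}")
        return Element(kind, x, y)

    def line(self, x, y):
        return self._element("line", x, y)

    def scatter(self, x, y):
        return self._element("scatter", x, y)


class plot_accessor:
    """Descriptor exposing `obj.plot` as a PlotNamespace bound to obj."""

    def __get__(self, obj, objtype=None):
        return self if obj is None else PlotNamespace(obj)


class Table:
    """Hypothetical dataset type from some third-party library."""
    plot = plot_accessor()

    def __init__(self, columns):
        self.columns = columns  # mapping of column name -> values


table = Table({"time": [0, 1, 2], "speed": [1.0, 1.5, 0.5]})
combined = table.plot.line("time", "speed") * table.plot.scatter("time", "speed")
print([item.kind for item in combined.items])
```

Because each method returns an object instead of rendering as a side effect, the caller keeps a handle on the result and can keep composing it, which is the property a shared plot namespace would need to preserve.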