hvPlot as the universal entry point to HoloViz tools #533

Open
3 of 11 tasks
jbednar opened this issue Dec 10, 2020 · 15 comments

@jbednar
Member

jbednar commented Dec 10, 2020

HoloViz.org shows how all the various HoloViz tools (Panel, hvPlot, HoloViews, GeoViews, Datashader, Param, Colorcet, etc.) fit together, forming a coherent suite of complementary tools that together solve a very wide range of problems. However, it can be very difficult for new users to make sense of this ecosystem, not knowing whether to start with hvPlot, HoloViews/GeoViews, Panel, or Datashader, each of which can be used on its own for some tasks but which also make sense together. Many users either give up and choose a less powerful but more approachable alternative outside of HoloViz, choose the wrong tool for the job, or struggle longer than necessary to make sense of things before they can become productive.

HoloViz.org was introduced as a way to solve this problem, with a tutorial introducing the various tools and telling people how and when to use each one. But it's still difficult, because each of the tools individually tells its own story about what it's for and how to use it, and each of these stories differs from the one at holoviz.org. Probably not many people actually make it through all the material, and even fewer retain enough of it to be sure they are using the right tool for the job and know which subset of that tool's docs to focus on. Plus, we have dozens of separate examples scattered around, each using some "best practice approach of year XXXX", adding to the information overload. Of course, we could go through and update our docs with the latest and best advice for each example, but the improvements have mostly been incremental, so the motivation to do all that work has been low. Even our best-practice advice (use hvPlot to construct simple layouts, but then learn about Panel as soon as you need to control a widget, learn about HoloViews as soon as you need a stream or opts, and so on) has been difficult to communicate and for users to get a handle on.

I think this situation has recently changed dramatically, and that we now have an opportunity to rally around a consistent and far simpler story for what to tell our users, so that they get the maximum power for the minimum investment (i.e., minimal learning of new APIs and concepts). Specifically, the recent introduction of the .interactive() API in hvPlot makes it possible, for the first time, to make very nearly all of the power available in HoloViz usable from hvPlot, using a simple, clean, and easy-to-explore API that users are already learning anyway (e.g. the Pandas or xarray APIs). hvPlot already provided access to much of the power of HoloViews, GeoViews, and Datashader, but Panel had an equally good claim to be the starting point for using HoloViz, so we had to tell people "use Panel if you want to make apps and dashboards, and/or use hvPlot if you want to make plots". Now it's finally plausible to tell people to start with hvPlot even if they want to make dashboards.

Capitalizing on this new situation, I propose that we use hvPlot as our single entry point to the HoloViz ecosystem, with each of our tools' docs loudly recommending that users approach via hvPlot rather than via that tool directly, and that users dive into the individual tools only when certain clearly statable conditions are met. I think if we can achieve a good experience there, we will make a very large fraction of the power supported by these tools available without users having to know or master:

  • defining functions or callbacks (nearly everything is just a method call, which is tab-discoverable)
  • defining classes or using decorators (used often under the hood, but not part of the user experience)
  • Elements, HoloMaps, value dimensions, key dimensions, dimensions in general (key concepts from HoloViews)
  • Canvas, glyphs, aggregations, or shading (key concepts from Datashader)
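The claim that callbacks and decorators can be "used often under the hood, but not part of the user experience" can be illustrated with a toy sketch in plain Python. Everything here is hypothetical: `Control`, `Pipeline`, `evaluate`, and `live` are made-up names, not hvPlot, Panel, or Param APIs, and real `.interactive()` objects do far more (widgets, plots, layouts). The sketch only shows the core idea: the user writes an ordinary chain of method calls, a placeholder value stands in for a widget, and the chain is re-run automatically whenever that value changes, with the callback wiring hidden inside the library.

```python
from typing import Any, Callable, List, Tuple


class Control:
    """Hypothetical stand-in for a widget: holds a value and notifies watchers."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self._watchers: List[Callable[[], None]] = []

    def watch(self, fn: Callable[[], None]) -> None:
        self._watchers.append(fn)

    def set(self, value: Any) -> None:
        self.value = value
        for fn in self._watchers:
            fn()


class Pipeline:
    """Hypothetical proxy that records method calls and replays them on demand."""

    def __init__(self, data: Any, calls: Tuple = ()) -> None:
        self._data = data
        self._calls = calls  # tuple of (method_name, args, kwargs)

    def __getattr__(self, name: str) -> Callable[..., "Pipeline"]:
        if name.startswith("_"):
            raise AttributeError(name)

        def record(*args: Any, **kwargs: Any) -> "Pipeline":
            # Return a new Pipeline with one more recorded call; nothing runs yet
            return Pipeline(self._data, self._calls + ((name, args, kwargs),))

        return record

    def evaluate(self) -> Any:
        """Replay the recorded calls, substituting each Control's current value."""
        result = self._data
        for name, args, kwargs in self._calls:
            args = [a.value if isinstance(a, Control) else a for a in args]
            kwargs = {k: v.value if isinstance(v, Control) else v
                      for k, v in kwargs.items()}
            result = getattr(result, name)(*args, **kwargs)
        return result

    def live(self, sink: List[Any]) -> None:
        """Evaluate now, and again whenever any Control used in the chain changes."""
        def update() -> None:
            sink.append(self.evaluate())

        for _, args, kwargs in self._calls:
            for arg in list(args) + list(kwargs.values()):
                if isinstance(arg, Control):
                    arg.watch(update)  # the callback lives here, not in user code
        update()


outputs: List[Any] = []
separator = Control(" ")  # plays the role of a text widget
# Plain method calls in user code: no callback, no decorator
Pipeline("hvPlot HoloViews Panel").replace(" ", separator).upper().live(outputs)
separator.set(" | ")  # "moving the widget" re-runs the whole chain
print(outputs)  # ['HVPLOT HOLOVIEWS PANEL', 'HVPLOT | HOLOVIEWS | PANEL']
```

The design choice worth noting is that the user-facing surface is just the data object's own methods (here `str.replace` and `str.upper`), which is the property the proposal relies on: users keep thinking in terms of an API they already know.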

That way users who invest even a little effort in learning hvPlot will get a big payoff, those who invest just a little bit more will get even more, and if they stop there they can still do at least a large fraction of what HoloViz tools can do, without ever having to dive deeper.

Most of what it takes to make this happen is just documentation, but there are a few key limitations in the hvPlot-first approach that I've listed as separate issues:

  • holoviews#4739: can't get to streams without using hv concepts
  • hvplot#531: how to use .apply or other methods to achieve what can be done with DynamicMaps
  • panel#1824: Panel support for .interactive objects
  • panel#1826: widgets() helper function to make .interactive usable without learning about specific widget types
  • hvplot#532: hvPlot needs to return consistent object types

I imagine there are a few other rough edges, but if these can be addressed, I think we can have a much more compelling approach and starting point where we ruthlessly defend hvplot.holoviz.org as the starting point, never letting complex APIs or tricky concepts from other projects spill over into it, and ensuring that what it presents gives functionality without requiring wrestling with any of those APIs or concepts. We can then have a strict boundary between hvPlot and all other projects, happily sending people off from hvPlot into the other projects when truly necessary, but making it clear that it's a step up in complexity or commitment, and that staying with hvPlot alone may be all they ever need.

If we do go this route, in addition to the above functionality issues, I think we'll need:

  • Finish hvPlot's Reference Gallery. Basically, if it's not in there, we should assume people will not know it exists! So we can't be relying on the HoloViews or Panel reference galleries; we have to duplicate some part of that in an hvPlot-focused form that we fully expect to be all that many users will ever see.
  • Fully document hvPlot as if it were the only library available for a set of well-defined functionality. This may duplicate some material currently on other sites, but it's not really duplication, it's a subset with a very different (and larger) audience.
  • Bifurcate our set of examples, e.g. those at examples.pyviz.org: make all those that are covered by hvPlot's functionality (or can be gently massaged to be covered) use only hvPlot-documented APIs and interfaces. Every other example should explicitly explain at the start why it needs the special features of library X, and should be prepared to defend and justify that choice.
  • Polish up our tab-completion support to give a good experience for users exploring what a given object can do or provide, getting them to a solution without having to consult the docs whenever possible.
  • Improve our online-docs-lookup experience; see this SO discussion

Of course, there are downsides of this approach, at least for now:

  • hvPlot only fully supports the Bokeh backend, so far. It's not a huge amount of work to support other backends, but Matplotlib and Plotly are both second-class citizens for hvPlot, usable but not mapping Pandas plotting options in the same way. So for now we have to be ok saying that if users want Matplotlib or Plotly plots, hvPlot is not the starting point for them.
  • hvPlot isn't declarative in the same way HoloViews is, requiring information to be provided at plotting time that could have been declared up front in HoloViews. That's true, but it depends on the data backend; e.g. xarray already allows a lot of the semantic metadata to be stored persistently, and there are proposals to let Pandas store that metadata as well (Allow custom metadata to be attached to panel/df/series? pandas-dev/pandas#2485, potential for adding metadata to pandas data frames jupyterlab/jupyterlab-metadata-service#10). While it's a valid objection, especially for Pandas users, in practice most users create their HoloViews objects on the fly anyway, at which point there's no net loss -- whether the semantic metadata is provided in a throwaway HoloViews Element constructor or in a throwaway .plot() call, it's just as lost. So the solution here is to improve the capabilities of the underlying libraries, which isn't really an issue with using hvPlot per se.

How do people feel about this overall plan and vision?

@julioasotodv

IMHO this has more to do with hvPlot's visibility than anything else.

I give lectures on data visualization with Python in an MSc program (mostly using the HoloViz ecosystem). I start by showing how Bokeh works and how extensible/customizable it is.

Then I switch to HoloViews, and the API design is very polarizing: for some students thinking in kdims and vdims becomes second nature, whereas others struggle.

After that I present GeoViews and Datashader, and pretty much everyone likes their capabilities.

By the time I start talking about hvPlot, lots of students ask me: Why didn't we start with hvPlot in the first place? It is simple, and close to what I'd say a lot of people expect from a chart: defining its elements based on the columns/dimensions in their dataset. It's straightforward and simple.

I'd say that hvPlot's largest drawback is its lack of popularity. Bokeh is popular because it has been around for quite a few years, but most people don't know about hvPlot. And by the looks of it, I would say that most Pythonistas creating interactive charts would find 99% of what they need in hvPlot. Even in data visualization, a lot of users only want something that works (that is, as high-level as possible). That is also why Plotly rolled out Plotly Express, and they basically encourage users to use it as the main API. In fact, Plotly Express is all over Plotly's docs. If it weren't for Plotly Express being announced everywhere, I'm sure lots of users would have given up on Plotly (not because it is hard, but because it is inconvenient if all you want is a simple, fast interactive chart).

If more people knew hvPlot exists, the HoloViz ecosystem would probably end up having a larger userbase. If they want more fine-grained detail, they can always fall back to hv / bokeh.

My two cents :)

@WesleyTheGeolien

Having picked up the library recently, I must say I have been confused about which library I was using, notably about the difference between holoviews and hvplot (I don't think it helped that I import holoviews as hv, which looks similar to hvplot). So yes, having a "correct" starting place is a great idea 👍

I also agree about the best practices. I have the impression(?) that the API differs between packages, or maybe, as you say, the documentation uses the standard of year XXXX and I took that as the gold standard (I was thinking of geoviews.Dataset().to(geoviews.SOME_PLOT), whereas holoviews does holoviews.SOME_PLOT(holoviews.Dataset)).

Not sure I can chime in on some of the more technical questions

@jbednar
Member Author

jbednar commented Dec 18, 2020

If more people knew hvPlot exists, the HoloViz ecosystem would probably end up having a larger userbase. If they want more fine-grained detail, they can always fall back to hv / bokeh.

I completely agree, and indeed that's one of the main reasons we created hvPlot, so that these tools would be accessible to a larger userbase. Here I'm basically saying that this plan has now been successful on a technical level, so let's do what it takes to make it successful on a community level, by inverting all of our messaging so that hvPlot is the first story, not an afterthought. Let's ignore history and tell the story the right way now!

@SandervandenOord
Contributor

My position is that visual analytics should be at the speed of thought.
It should take little (mental) effort to create the plots I need. This is why I use hvplot.

It's very similar to pandas plotting (df.plot()), which is why it was relatively easy to switch to.

Only when you know a library well can you be highly effective and fast in it. That's why I don't use bokeh, plotly, and matplotlib: it takes much more time to know them well, and I need too many lines of code to get things done, which slows analysis down.

The only serious interactive alternative is plotly express.

The .interactive() feature could be a killer feature as it would make it even easier to explore data quickly.

If you need help on hvplot, I'm more than willing to help.

@jbednar
Member Author

jbednar commented Dec 18, 2020

Thanks! Also see https://discourse.holoviz.org/t/using-the-new-interactive-with-pandas-not-xarray/1583/5, where some of these issues are discussed further.

@mycarta

mycarta commented Dec 19, 2020

@jbednar : so, if one were to start from day zero again, would this be the right place (if you like video tutorials)?
PyViz Unifying Python Tools for In Browser Data Visualization | SciPy 2018 (introducing hvplot at 12'33" and onwards)

@MarcSkovMadsen
Collaborator

I also think hvPlot would be the right entry point, together with a modernized styling of the underlying Bokeh engine.

With that you technically have a more powerful and extensible package than Plotly and Plotly Express, and a lot of Pandas users who would understand what you/we are talking about. Then there are still the docs, examples, community, and communication, where Plotly is light years ahead.

One thing I'd like to understand: how would it be possible not to introduce Panel at least a little if .interactive is at the center of hvPlot? You need the widgets from somewhere.

@mycarta

mycarta commented Dec 19, 2020

recommending that users only dive into the individual tools when certain clearly state-able conditions are met

I think the overall plan sounds awesome. I also think listing those conditions somewhere in the documentation right now would already help new users.

@MarcSkovMadsen
Collaborator

After having looked at the hvplot site I would think it needed an overhaul. The current site does not signal that hvPlot is at the center of HoloViz. The documentation is sparse and there is not even a search function.

@jbednar
Member Author

jbednar commented Dec 20, 2020

Right; before making it the center of the universe, I wanted to get some buy-in. The current site reflects the point in history at which hvPlot was introduced -- a bit cleaner design, a bit simpler story, but still quite sparse because most things were already documented on other sites, and hvPlot was an afterthought. To change that requires a lot of work, which we can do if there's agreement it's a good idea, which so far there seems to be.

@MarcSkovMadsen, for the widgets, I have a partial proposal at holoviz/panel#1826, where I propose doing something like interactive does to infer widgets from scalars and ranges. I don't think this approach would address the ambitious apps that you yourself are building, but that's not the point of this proposal; detailed control over widgets can still be passed off to Panel (particularly if pointing to specific sections of the Panel website, such as the widgets reference gallery). The goal would be for most users most of the time to not need to go figure out the name of such a widget or to look up its detailed options, but for it to be clear that if people need to do that they are welcome to do so.
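The idea of inferring widgets from scalars and ranges can be sketched in plain Python. The function below is hypothetical and is not Panel's API or the design in holoviz/panel#1826: it maps an ordinary value to a description of the widget that might be created for it (bool to checkbox, string to text box, list to selector, numeric tuple to a slider over that range, bare number to a slider around it). The widget kinds and the default range and step choices are arbitrary illustrative assumptions.

```python
from numbers import Integral, Real
from typing import Any, Dict, Optional


def _slider(start: Real, end: Real, value: Real, integral: bool,
            step: Optional[Real] = None) -> Dict[str, Any]:
    if step is None:
        # Integer ranges step by 1; float ranges get 100 steps (arbitrary choice)
        step = 1 if integral else (end - start) / 100
    return {"kind": "int_slider" if integral else "float_slider",
            "start": start, "end": end, "step": step, "value": value}


def infer_widget(value: Any) -> Dict[str, Any]:
    """Hypothetical: describe a widget suitable for a plain Python value."""
    # Check bool first, because bool is a subclass of int
    if isinstance(value, bool):
        return {"kind": "checkbox", "value": value}
    if isinstance(value, str):
        return {"kind": "text", "value": value}
    if isinstance(value, list):
        # A list of options becomes a selector defaulting to the first option
        if not value:
            raise ValueError("cannot build a selector from an empty list")
        return {"kind": "select", "options": list(value), "value": value[0]}
    if isinstance(value, tuple):
        # (start, end) or (start, end, step) becomes a slider over that range
        numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in value)
        if len(value) not in (2, 3) or not numeric:
            raise TypeError(f"unsupported range specification: {value!r}")
        start, end = value[0], value[1]
        if start > end:
            raise ValueError("range start must not exceed range end")
        integral = all(isinstance(v, Integral) for v in value)
        step = value[2] if len(value) == 3 else None
        return _slider(start, end, start, integral, step)
    if isinstance(value, Real):
        # A bare number becomes a slider centred on it (range is arbitrary)
        span = abs(value) or 1
        return _slider(value - span, value + span, value,
                       isinstance(value, Integral))
    raise TypeError(f"no widget rule for {type(value).__name__}")


print(infer_widget((1, 10)))
print(infer_widget(["mean", "max"]))
```

Keeping the rules this small is deliberate: the point of the proposal is that most users pass values they already have, and anyone who needs a specific widget type or option can still go to Panel directly.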

@mycarta, sure, that's a reasonable starting point for plotting with hvPlot, in that the material there hasn't changed since then. It doesn't cover building apps using your hvPlot objects, which previously would have required callbacks and decorators, and now with .interactive only requires method calls. The hope is to be able to do viz and even apps "at the speed of thought", as @SandervandenOord says (as long as you think in terms of Pandas or Xarray API calls!).

@mycarta

mycarta commented Dec 20, 2020

The hope is to be able to do viz and even apps "at the speed of thought", as @SandervandenOord says (as long as you think in terms of Pandas or Xarray API calls!).

That would be fantastic!!! For my part, I cannot foresee needing much more than that in my scientific computing and explorations, whether by day (job) or night (hobby).

@SandervandenOord
Contributor

Here are a few more of my thoughts:

  • HoloViews has such an interesting and original way of looking at data and visualizing data. I hope that doesn't get lost when putting more focus on hvplot.

  • Users have a need for quick plotting, but there are quite a few good alternatives: plt.plot(), df.plot(), sns.scatterplot(), px.scatter(), df.hvplot(). So what do users need/want? I still don't understand why people choose plt.plot() over any of the other possibilities. How often does one need a completely customized, specifically designed plot?

  • What I wonder about: if I check this page, there are relatively few downloads for both hvplot and plotly express: https://pyviz.org/tools.html
    The normal plotly has 3 million downloads a month, while plotly express has only 65k a month. Although maybe that's not fair, since plotly express is now 'embedded' into plotly and you can just do: import plotly.express as px. But bokeh has 1.5 million, holoviews 140k and hvplot 75k. I'm curious how these numbers have been developing over time. Is hvplot catching up with holoviews? There are more questions on holoviews than on hvplot on stackoverflow and discourse.holoviz.org.

@jbednar
Member Author

jbednar commented Dec 21, 2020

HoloViews has such an interesting and original way of looking at data and visualizing data. I hope that doesn't get lost when putting more focus on hvplot.

Definitely. The original way that HoloViews looks at data is both its strength and its liability. It gets power from its deep model of how data works and what it means to plot it, but for users to exploit that power, they have to expand their own mental models to accommodate HoloViews. hvPlot takes the opposite approach: try to surface as much of the power available in HoloViews while staying roughly within the mental model people already have. At first it wasn't clear to me just how much could be made available in that way, but with .interactive it's clear to me that very, very much can be! The native HoloViews API will still be available for people like @SandervandenOord and the HoloViz developers who have developed that mental model, but in most cases people will no longer have to embrace the underlying HoloViews philosophy and model; they can benefit from it anyway!

Users have a need for quick plotting, but there are quite a few good alternatives,

I've argued that of the available high-level APIs, it makes sense for users to focus on the Pandas .plot API because it's the only one that is available from a wide variety (at least 6!) of different libraries with different strengths. Thus given that no user can ever learn all APIs, and that most users will simply learn one and get back to what they were doing, I feel confident strongly recommending that that one API is .plot, regardless of whether you choose hvPlot.

What I wonder about: if I check this page, there are relatively few downloads for both hvplot and plotly express:

Download counts are very hard to reason about, but there are many reasons why low-level tools like Matplotlib and Bokeh will have higher download counts than high-level tools:

  • Often the low-level tool is much older, which means that it had first-mover and networking effects (recommendations in courses, tutorials, blogs, etc.), making most people find it sooner than high-level tools
  • Viz tools are very sticky -- once you've learned one API, it's very hard to justify learning another, so again older tools have a strong advantage
  • Both low-level and older tools are more likely to be integrated into CI systems for other libraries, which can inflate download counts
  • A low-level tool is typically installed alongside every high-level tool installed. E.g. every hvPlot conda download is also a HoloViews and Matplotlib and Bokeh download. Separating core and user packages (matplotlib vs. matplotlib-core) can help here, but only if other tools respect that distinction, which again is sticky
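The last point is simple arithmetic: if every deliberate install of a high-level package also installs its dependencies, the dependencies' counts absorb the high-level package's users. The sketch below uses a simplified dependency graph chosen only to match the example in that bullet (it is not real packaging metadata) and made-up direct-install numbers; it demonstrates the counting effect, not actual download statistics.

```python
from typing import Dict, List, Set

# Illustrative graph following the bullet's example; NOT real package metadata
DEPENDS_ON: Dict[str, List[str]] = {
    "hvplot": ["holoviews"],
    "holoviews": ["bokeh", "matplotlib"],
    "bokeh": [],
    "matplotlib": [],
}


def transitive_deps(pkg: str, graph: Dict[str, List[str]]) -> Set[str]:
    """All packages pulled in, directly or indirectly, when installing pkg."""
    seen: Set[str] = set()
    stack = list(graph.get(pkg, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


def observed_downloads(direct: Dict[str, int],
                       graph: Dict[str, List[str]]) -> Dict[str, int]:
    """Counts seen by a package index when each install also fetches dependencies."""
    counts = {pkg: 0 for pkg in graph}
    for pkg, n in direct.items():
        for target in {pkg} | transitive_deps(pkg, graph):
            counts[target] = counts.get(target, 0) + n
    return counts


# Made-up numbers: how many people deliberately chose each package
chosen = {"hvplot": 100, "holoviews": 50, "bokeh": 30, "matplotlib": 20}
print(observed_downloads(chosen, DEPENDS_ON))
# {'hvplot': 100, 'holoviews': 150, 'bokeh': 180, 'matplotlib': 170}
```

With these illustrative inputs, the package the most people deliberately chose ends up with the lowest observed count, which is why raw download numbers say little about which entry point users actually prefer.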

Personally, I strongly believe that nearly all users should start with a high-level tool and leave it behind only when they outgrow it or get frustrated with its limitations. So I think there should be vastly more people who use hvPlot than Bokeh's or Matplotlib's native APIs, if things could be made rational at this point. But first-mover, history, and network effects, plus fear of lock-in, instead lead users to start with low-level tools, making their lives much more difficult than they need to be. This proposal is basically proposing to (a) really polish and defend hvPlot to make it a suitable solution for the vast majority of users, then (b) try to point people to hvPlot directly from as many locations as possible so that they don't get stuck in complexity or learning curves that they don't actually have to navigate to solve their problems.

@MarcSkovMadsen
Collaborator

MarcSkovMadsen commented Dec 22, 2020

What do the developers of HoloViews think? Will they be just as happy to continue contributing to it? Or will they find it less attractive, and development will stop?

@jbednar
Member Author

jbednar commented Dec 23, 2020

hvPlot is only a thin layer on top of HoloViews. Nearly all innovation and bugfixing happens at the hv level, automatically fixing hvPlot or adding features to it. So there's no danger of hv atrophying. The big issue will just be that things will need documenting at the hvPlot level for people to realize they exist. So fixes that make things "just work" will have no extra cost; ones that require explaining will have to be evaluated for whether they are surfaced into the hvPlot universe. Either way, they'll exist first in HoloViews and only then, optionally, in hvPlot.


7 participants