Feature Proposal: xarray.interactive module #3709

Closed
TomNicholas opened this issue Jan 20, 2020 · 36 comments

Comments

@TomNicholas
Contributor

Feature proposal: xarray.interactive module

I've been experimenting with ipython widgets in jupyter notebooks, and I've been working on how we might use them to make xarray more interactive.

Motivation:

Most users exploring their data will find themselves rerunning the same cells repeatedly with slightly different values.
In xarray's case that will often be in an .isel() or .sel() call, or selecting variables from a dataset.
IPython widgets allow you to interact with your functions in a very intuitive way, which we could exploit.
There are lots of tutorials on how to interact with pandas data (e.g. this great one), but I haven't seen any for interacting with xarray objects.

Relationship to other libraries:

Some downstream plotting libraries (such as @hvplot) already use widgets when interactively plotting xarray-derived data structures, but they don't seem to go the full N dimensions.
This also isn't something that should be confined to plotting functions - you often choose slices or variables at the start of analysis, not just at the end.
I'll come back to this idea later.

The default ipython widgets are pretty good, but we could write an xarray.interactive module in such a way that downstream developers can easily replace them with their own widgets.

Usage examples:

# imports
import ipywidgets as widgets
import xarray as xr
import xarray.plot as xplot
import xarray.interactive as interactive  # the proposed module (does not exist yet)

# Load tutorial data
da = xr.tutorial.open_dataset('air_temperature')['air']

Plotting against multiple dimensions interactively

interactive.isel(da, xplot.plot, lat=10, lon=50)

[image: isel_lat_and_lon]

Interactively select a range from a dimension

def plot_mean_over_time(da):
    da.mean(dim='time').plot()
interactive.isel(da, plot_mean_over_time, time=slice(100, 500))

[image: mean_over_time_slice]

Animate over one dimension

from ipywidgets import Play
interactive.isel(da, xplot.plot, time=Play())

[image: Play]

API ideas:

We can write a function like this

interactive.isel(da, func=xplot.plot, time=10)

which could also be used as a decorator something like this

@interactive.isel(da, time=10)
def plot(da):
    return xplot.plot(da)

It would be nicer to be able to do this

@Interactive(da).isel(time=10)
def plot(da):
    return xplot.plot(da)

but Guido forbade it.

But we can attach these functions to an accessor to get

da.interactive.isel(xplot.plot, time=10)
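
As a rough illustration of how one callable could serve both as a plain function and as a decorator, here is a widget-free sketch. Everything in it is an assumption made for illustration: the `take` helper, the `(dims, values)` tuple standing in for a DataArray, and the returned `update` callable standing in for a slider callback. None of it is xarray or ipywidgets API.

```python
def take(values, dims, indexers):
    """Index nested lists by dimension name (toy stand-in for .isel)."""
    if not dims:
        return values
    head, rest = dims[0], dims[1:]
    if head in indexers:
        return take(values[indexers[head]], rest, indexers)
    return [take(v, rest, indexers) for v in values]


def isel(data, func=None, **defaults):
    """Hypothetical helper: usable as isel(data, func, ...) or @isel(data, ...)."""
    dims, values = data
    if func is None:
        # Called without a function: return a decorator that supplies it later.
        return lambda f: isel(data, f, **defaults)

    def update(**changes):
        # A real implementation would call this from a widget callback.
        indexers = {**defaults, **changes}
        return func(take(values, dims, indexers))

    return update


# Toy 2-D "array" with dimensions (time, x).
data = (("time", "x"), [[1, 2, 3], [4, 5, 6]])

# Function form.
total = isel(data, sum, time=0)

# Decorator form.
@isel(data, time=1)
def largest(row):
    return max(row)
```

Calling `total()` uses the default index, while `total(time=1)` mimics dragging the slider to a new position.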

Other ideas

Select variables from datasets

@interactive.data_vars(da1=ds['n'], da2=ds['T'], ...)
def correlation(da1, da2, ...):
    ...

# Would produce a dropdown list of variables for each dataset

Choose dimensions to apply functions over

@interactive.dims(dim='time')
def mean(da, dim):
    ...
    
# Would produce a dropdown list of dimensions in the dataarray

General interactive.explore() method to see variation over any number of dimensions, the default being all of them.

What do people think about this? Is it something that makes sense to include within xarray itself? (Dependencies aren't a problem because it's fine to have ipywidgets as an optional dependency just for this module.)

@TomNicholas
Contributor Author

Difficulties with method chaining

Arbitrarily long method chaining would be great, i.e.

da.interactive.isel(time=10).mean('time').plot()

but I think it will be considerably more complicated.

The problem is that the way the ipywidgets.interactive() function works means that each time a widget value is altered (e.g. a slider dragged to a new position), then the function wrapped by interactive must be recomputed.
For single functions that's fine, but for method chaining it means the final .plot() method has to know about all the previous methods back up to the .interactive input.
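
The recompute-on-change behaviour can be sketched without ipywidgets. The `Slider` class and this `interactive` function are toy stand-ins written for illustration (they only mimic the idea, not the real ipywidgets signatures): every change to a widget value re-runs the whole wrapped function.

```python
class Slider:
    """Toy widget: holds a value and notifies observers when it changes."""

    def __init__(self, value):
        self._value = value
        self._observers = []

    def observe(self, callback):
        self._observers.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        self._value = new
        for callback in self._observers:
            callback()


def interactive(func, **widgets):
    """Re-run `func` with the current widget values on every change."""
    results = []

    def recompute():
        current = {name: widget.value for name, widget in widgets.items()}
        results.append(func(**current))

    for widget in widgets.values():
        widget.observe(recompute)
    recompute()  # initial evaluation, like the first render
    return results


series = [10, 20, 30]
time_slider = Slider(0)
outputs = interactive(lambda time: series[time], time=time_slider)

time_slider.value = 2  # "dragging the slider" triggers a full recompute
```

Because the callback only knows about `func`, anything that should react to the slider has to live inside that one function, which is why a chain of separate method calls is harder to support.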

I've found a way to get around this, but I'd like some feedback on the approach because it might be needlessly complicated.

I would like to do it by subclassing to create an InteractiveDataArray, which you could create with an interactive accessor method like

ida = da.interactive.isel(time=10)

This class would store the widgets and decorate its inherited methods to either propagate them (e.g. through ida.reduce()) or display them (e.g. after ida.plot()).
It would define the _ipython_display_() method so that calling display(ida) revealed the widgets.

To allow the final method to recompute all the previous steps, each inherited computation method would be wrapped by a decorator which records the function used and its arguments.
That way the final method (which you know will really be either .plot() or __repr__()) can re-evaluate its whole history when the slider tells it to recompute.

I've got a very rough example of this working, but as I said there might be a much easier way...
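
A minimal sketch of the record-and-replay idea, not the rough prototype mentioned above: the `Pipeline` and `Param` classes are hypothetical names, and a plain string stands in for a DataArray. Each method call is recorded instead of executed, and `evaluate()` replays the whole history with the current parameter values.

```python
class Param:
    """Toy stand-in for a widget: a mutable box holding the current value."""

    def __init__(self, value):
        self.value = value


def _resolve(arg):
    # Substitute the current value for any Param; pass other arguments through.
    return arg.value if isinstance(arg, Param) else arg


class Pipeline:
    """Records method calls on `source` and replays them on demand."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def __getattr__(self, name):
        # Only reached for names not defined on Pipeline itself.
        if name.startswith("_"):
            raise AttributeError(name)

        def record(*args, **kwargs):
            step = (name, args, kwargs)
            return Pipeline(self._source, self._steps + (step,))

        return record

    def evaluate(self):
        obj = self._source
        for name, args, kwargs in self._steps:
            args = [_resolve(a) for a in args]
            kwargs = {k: _resolve(v) for k, v in kwargs.items()}
            obj = getattr(obj, name)(*args, **kwargs)
        return obj


char = Param("0")
shout = Pipeline("hello world").replace("o", char).upper()

first = shout.evaluate()
char.value = "*"  # "move the slider", then replay the recorded chain
second = shout.evaluate()
```

The same pattern would apply to `.isel(...).mean(...).plot()`: the final step triggers a replay from the original object, so it does not need special knowledge of each earlier method.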

[image: method_chaining]

@TomNicholas
Contributor Author

TomNicholas commented Jan 20, 2020

(also I realise that the suggestion at the end is similar to a task graph of dask.delayed objects, but I assume something will go wrong if I try to wrap dask arrays with xarray dataarrays with dask delayed?)

@benbovy
Member

benbovy commented Jan 21, 2020

This looks fantastic @TomNicholas!!

IMHO, I would rather see this maintained in a separate project (something like ipyxarray ? or xarray-interactive as you already suggest). Adding an optional dependency is not really a problem indeed, but it's more about trying to avoid adding too much maintenance burden to this repository (issues/PRs list, CI, etc.).

@mathause
Collaborator

related: #2034

@TomNicholas
Contributor Author

IMHO, I would rather see this maintained in a separate project

Yeah that's a fair point. I think this is another case where the ecosystem of packages orbiting xarray could do with being more explicitly organised.

Reasons for direct integration in xarray:

  • Availability to all users: Functionality should be of general interest to anyone using xarray with jupyter, it's not domain-specific at all,
  • Makes writing robust code a bit easier because it can then rely on private xarray methods for parsing indexers and so on

Reasons for a separate xarray-interactive repository:

  • Keeps developer maintenance / issue tracking separate
  • If plotting library-specific interfaces are desired they can be added without cluttering the main repo

I guess either way I could just write it in a separate repo and if in future we decided to include it in xarray master then move it.

@philippjfr @rabernat would be interested in your perspectives as developers/users of these downstream libraries? Would this be useful or not really?

@philippjfr

This looks really cool and I like the API! I'll have to give it a try to give more detailed feedback. Note that I'm not a core developer of xarray but I also think this is best managed as an external project.

Just wanted to ask some clarification on some of your comments.

Some downstream plotting libraries (such as @hvplot) already use widgets when interactively plotting xarray-derived data structures, but they don't seem to go the full N dimensions.

What do you mean by this? hvPlot does let you explore n-dimensional data using widgets, what is the limitation you were seeing there?

This also isn't something that should be confined to plotting functions - you often choose slices or variables at the start of analysis, not just at the end.

This is a good point, but I guess I'm not yet entirely clear on how your proposed APIs would deal with this.

@TomNicholas
Contributor Author

This looks really cool and I like the API!

Great!

I'll have to give it a try to give more detailed feedback.

Thanks, but it's definitely not ready for that yet, I'll post here and tag you when it is.

What do you mean by this? hvPlot does let you explore n-dimensional data using widgets, what is the limitation you were seeing there?

I had a go with hvPlot's gridded data classes and although it worked well for plotting variation along one dimension with a single slider, I got some errors when I tried to plot N-D data with multiple slider widgets along more than one dimension. It looks like that might have been user error though... I'll compare more closely and raise issues if necessary.

I'm not yet entirely clear on how your proposed APIs would deal with this.

I'm referring to the discussion on method chaining: that proposed API (using an InteractiveDataArray) would allow you to interactively select a subset of data

ida = da.interactive.isel(lat=50, lon=60)

before specifying the analysis to perform on it

ida = (ida - ida.mean('time')).std(dim='time')

and an ida.plot() or compute call on the same object later would still be tied to the original sliders. That's quite different to only being able to create the sliders in the final call.

@dcherian
Contributor

Also see https://xrviz.readthedocs.io/en/latest/ and napari/napari#14

@TomNicholas
Contributor Author

Thanks @dcherian , I hadn't seen those.

I think the difference between what I'm proposing here and what already exists (e.g. in holoviews, xrviz, etc.) is considering interactivity as something that is useful independent of plotting.

The aim would be to allow interactive parameterization of arbitrary functions, which could (and often would) be plotting functions, but could actually be anything. That way analysis can be interactively parameterized, and the plotting can be handled by any library. (Plotting libraries could also choose to reuse these interactivity functions, but wouldn't have to.) I think that approach would integrate well with being able to change plotting backends too (#3553).

@benbovy
Member

benbovy commented Jan 22, 2020

The aim would be to allow interactive parameterization of arbitrary functions, which could (and often would) be plotting functions, but could actually be anything.

That would be awesome! I have a strong interest in that with xarray-simlab, i.e., setting-up model parameters and running simulations interactively.

@jbednar
Contributor

jbednar commented Jan 22, 2020

I think the difference between what I'm proposing here and what already exists (e.g. in holoviews, xrviz, etc.) is considering interactivity as something that is useful independent of plotting.

The interactive widgets in holoviews and xrviz are obtained from Panel, which is a separate library that is already explicitly designed for specifying and constructing interactivity independent of plotting. E.g. we often use Panel widgets with no plotting to set up simulations or analyses interactively, then run whatever we specified. The interactive function in Panel already works much like what you laid out above, unless I'm missing something.

It sounds like you're hoping for something that is independent of plotting (like Panel) and provides interactive widgets (like Panel) but also has specific support for multidimensional arrays (like HoloViews)? I don't think that's much code, but it could be useful to provide for Xarray in a convenient API.

@philippjfr

philippjfr commented Jan 22, 2020

I think the real power in this proposal is in the ability to chain operations on interactive components using an API that will be familiar to xarray users. We have a similar concept in HoloViews which allows you to build complex processing and visualization pipelines. I'll work through some examples in HoloViz ecosystem to show what is possible there and maybe provide some ideas or approaches that might work here.

Let's work with a relatively contrived but simple example and load the air_temperature sample dataset:

import holoviews as hv
import numpy as np
import xarray as xr

airtemps = xr.tutorial.open_dataset('air_temperature')
ds = hv.Dataset(airtemps)

In this example you split your dataset into individual chunks for each longitude, then apply a reduction along the latitude, and finally cast the output to a Curve, giving us a Curve of the mean temperature at each longitude:

curves = ds.groupby('lon', dynamic=True).apply.reduce(lat=np.mean).apply(hv.Curve).opts(width=600, framewise=True)

[screenshot: Screen Shot 2020-01-23 at 12 15 27 AM]

Now we decide we want to resample the data too, so we import the resample operation and apply it to our existing pipeline:

from holoviews.operation.timeseries import resample
resample(curves, rule='7d')

[screenshot: Screen Shot 2020-01-23 at 12 15 03 AM]

But really we don't just want to compute the mean: we want to pick the reduce function, and we also want to be able to set the resampling frequency and pick a color. By combining Panel and HoloViews you can inject widget parameters at every stage:

import panel as pn

function = pn.widgets.Select(name='Function', options={'mean': np.mean, 'min': np.min, 'max': np.max})
color = pn.widgets.ColorPicker(name='Color', value='#000000')
rule = pn.widgets.TextInput(name='Rule', value='7d')

obj = (ds.groupby('lon', dynamic=True)
 .apply.reduce(lat=function)
 .apply(hv.Curve)
 .apply.opts(width=600, color=color, framewise=True)
 .apply(resample, rule=rule)
)

hv_pane = pn.pane.HoloViews(obj)

pn.Row(
    hv_pane[0],
    pn.Column(*hv_pane[1][0], function, color, rule)
)

[screenshot: Screen Shot 2020-01-23 at 12 17 13 AM]

So this shows pretty clearly how useful this kind of chaining/pipeline building can be, especially when built on top of an API like xarray which allows for very powerful data manipulation. I don't have enough of a perspective to say how feasible it would be to implement something like this that comprehensively wraps xarray's API but I'd certainly love to see it. Whether it is built on Panel (which I am of course partial to as the author) or ipywidgets or even supporting both.

My main comments therefore are about the API. It is not clear to me, based on what you have said so far, which parts of the API are actually interactive, e.g. in this case:

ida = da.interactive.isel(lat=50, lon=60)
ida = (ida - ida.mean('time')).std(dim='time')

Is only sel/isel ever interactive, or can other methods also be set interactively? If only sel/isel is interactive, then that's all clear enough and the scope is relatively narrow but well defined. If, however, you intend the entire API (or at least some well-defined subset of it) to be interactive, then I think there should be some explicit way to declare which parts are interactive and where the values are coming from (and what the values should be if they can't be automatically determined). In the HoloViews example I showed above you explicitly supply widgets, but if you don't want users to deal with manually laying things out then you could also just let the user supply the specification of the valid values. Something like in your first example:

interactive.isel(da, plot_mean_over_time, time=slice(100, 500))

but expanded to include support for discrete lists of items, explicit widgets, and so on.
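
One way such a specification could be interpreted is sketched below. The widget classes and the `widget_for` function are hypothetical placeholders, not Panel, ipywidgets or xarray API; the point is only the dispatch from spec type to widget kind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntSlider:
    min: int
    max: int
    value: int


@dataclass(frozen=True)
class RangeSlider:
    min: int
    max: int
    value: tuple


@dataclass(frozen=True)
class Dropdown:
    options: tuple
    value: object


WIDGET_TYPES = (IntSlider, RangeSlider, Dropdown)


def widget_for(spec, size):
    """Translate an indexer spec for a dimension of length `size` into a widget."""
    if isinstance(spec, WIDGET_TYPES):
        return spec  # explicit widget supplied by the user: pass through
    if isinstance(spec, int):
        return IntSlider(0, size - 1, spec)  # single index -> slider
    if isinstance(spec, slice):
        start, stop, _ = spec.indices(size)  # clip the slice to the dimension
        return RangeSlider(0, size - 1, (start, stop))
    if isinstance(spec, (list, tuple)):
        options = tuple(spec)  # discrete choices -> dropdown
        return Dropdown(options, options[0])
    raise TypeError(f"cannot build a widget from {spec!r}")


single = widget_for(10, 25)
window = widget_for(slice(100, 500), 2920)
choices = widget_for([0, 5, 10], 25)
explicit = widget_for(Dropdown(("a", "b"), "b"), 2)
```

With this kind of mapping, `time=slice(100, 500)` would yield a range slider, a list would yield a dropdown, and a user-supplied widget would be used as given.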

Hope that's at all helpful! I think the idea is really neat and it could be very powerful indeed.

@philippjfr

One thing I didn't mention above is that in the pipeline I showed HoloViews will cache the intermediate changes so that if you change the color or change the resampling frequency it only executes the part of the pipeline downstream from where the parameter changed.
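
The caching behaviour can be illustrated with a toy pipeline. This is a sketch of the general technique (invalidate a stage and everything downstream of it), not HoloViews' actual implementation; the `Stage` class and its `calls` counter are invented for the example.

```python
class Stage:
    """One pipeline step that caches its result until a parameter changes."""

    def __init__(self, func, upstream=None, **params):
        self.func = func
        self.upstream = upstream
        self.params = dict(params)
        self.downstream = []
        self.calls = 0  # how many times func actually ran
        self._valid = False
        self._value = None
        if upstream is not None:
            upstream.downstream.append(self)

    def set(self, **params):
        self.params.update(params)
        self._invalidate()

    def _invalidate(self):
        # A change here makes this stage and everything after it stale.
        self._valid = False
        for stage in self.downstream:
            stage._invalidate()

    def value(self):
        if not self._valid:
            args = () if self.upstream is None else (self.upstream.value(),)
            self._value = self.func(*args, **self.params)
            self.calls += 1
            self._valid = True
        return self._value


load = Stage(lambda: list(range(10)))
thin = Stage(lambda xs, step: xs[::step], load, step=2)
scaled = Stage(lambda xs, factor: [x * factor for x in xs], thin, factor=10)

first = scaled.value()
scaled.set(factor=1)   # only the last stage is stale
second = scaled.value()
calls_after_last_change = (load.calls, thin.calls, scaled.calls)

thin.set(step=5)       # the middle stage and the one after it are stale
third = scaled.value()
calls_after_middle_change = (load.calls, thin.calls, scaled.calls)
```

Changing `factor` re-runs only the final stage, while changing `step` re-runs the last two; the initial load runs once throughout.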

@TomNicholas
Contributor Author

It sounds like you're hoping for something that is independent of plotting (like Panel) and provides interactive widgets (like Panel) but also has specific support for multidimensional arrays (like HoloViews)? I don't think that's much code, but it could be useful to provide for Xarray in a convenient API.

Thanks @jbednar , I think that's a good summary of most of what I was imagining.

I think the real power in this proposal is in the ability to chain operations on interactive components using an API that will be familiar to xarray users.

Yes exactly. There will be a lot of users who do their work in xarray and being able to achieve interactivity in their existing workflows with almost exactly the same API would improve their experience without presenting much of a barrier to adoption.

Thanks for the (impressive) example @philippjfr !

I think there should be some explicit way to declare which parts are interactive

I was imagining that functions/methods following the .interactive accessor was the only place where interactivity occurred, but it might well be possible to do it more generally than that and still keep it intuitive.

I didn't appreciate exactly how much of this Panel/HoloViews can already do - I think I need to go away and experiment with using/wrapping them but aiming for an xarray-like syntax.

@philippjfr

I didn't appreciate exactly how much of this Panel/HoloViews can already do - I think I need to go away and experiment with using/wrapping them but aiming for an xarray-like syntax.

Maybe wait until early next week when I anticipate new Panel and HoloViews releases to be out which smooth out some issues with these workflows.

@jbednar
Contributor

jbednar commented Jan 23, 2020

I didn't appreciate exactly how much of this panel/holoviews can already do

On the one hand, yes, HoloViews + Panel is quite powerful and clean for what it can already do. But just so everyone is on the same page, the workflow @philippjfr shows above is only possible for the operations that HoloViews has implemented internally. The operations available in HoloViews are only a small subset of what can be done with the native Xarray or Pandas APIs, and adding new capability like that to HoloViews is difficult because HoloViews supports many different underlying data formats (lists, dictionaries, NumPy, Pandas, Xarray, etc.). So while there are advantages to what's already available in HoloViews:

  • Same syntax for working with a wide variety of data libraries or native Python types
  • Easy interactive, reactive pipelines (lazy operations that replay on demand)
  • Native support for multiple plotting libraries

there are also major disadvantages:

  • You have to learn HoloViews syntax for operations you quite likely already know how to do in your data library of choice (Xarray, Pandas, etc.)
  • The supported operations aren't ever going to be as rich as what's available from individual specific libraries

Note that hvPlot injects the plotting capability from HoloViews into Xarray and Pandas, letting you use the native data APIs for plotting, but it doesn't give you the control over lazy/interactive/reactive pipelines that HoloViews' native API offers. So to me what this issue's proposal would entail is taking the idea of hvPlot further, making Xarray (and Pandas) natively act like HoloViews already does -- with lazy operations where interactive controls can be inserted at every stage, letting people stay in their preferred rich, native data API while having the power to easily make anything interactive and to easily make anything visualizable.

@philippjfr

philippjfr commented Apr 7, 2020

Having taken the ideas presented here as inspiration, the latest HoloViews release extends what we had described above: it provides the capability to use arbitrary xarray methods to transform the data and to control the parameters of those transforms using Panel-based widgets. The HoloViews docs show one such example built on xarray, which is built around so-called dim expressions:

import holoviews as hv
import panel as pn
import xarray as xr

air_temp = xr.tutorial.load_dataset('air_temperature')

# We declare a dim expression which uses the `quantile` method from the `xr` namespace
# and provides a panel FloatSlider as the argument to the expression 
q = pn.widgets.FloatSlider(name='quantile')
quantile_expr = hv.dim('air').xr.quantile(q, dim='time')

# We now wrap the xarray Dataset in a HoloViews one, apply the dim expression and cast the result to an image
temp_ds = hv.Dataset(air_temp, ['lon', 'lat'])
transformed = temp_ds.apply.transform(air=quantile_expr).apply(hv.Image)

# Now we display the resulting transformation pipeline alongside the widget
pn.Column(q, transformed.opts(colorbar=True, width=400))

[image: transform]

I am likely to integrate this capability with hvPlot with a more intuitive API, e.g. in this case I'd expect to be able to spell this something like this:

xrds = xr.tutorial.load_dataset('air_temperature')
q = pn.widgets.FloatSlider(name='quantile')
quantile_expr = hv.dim('air').xr.quantile(q, dim='time')
xrds.hvplot.image(transforms={'air': quantile_expr})

@jbednar
Contributor

jbednar commented Apr 7, 2020

Thanks, @philippjfr!

What Philipp outlines above addresses the key limitation that I pointed out previously:

The operations available in HoloViews are only a small subset of what can be done with the native Xarray or Pandas APIs, and adding new capability like that to HoloViews is difficult

As of HoloViews release 1.13.2 that limitation is now completely gone, because a HoloViews interactive operation pipeline can now invoke arbitrary Xarray or Pandas API calls. So you're no longer limited to what has been encapsulated in HoloViews, and you can use the native Xarray method syntax that you're used to. Thus it's now possible to achieve most (all?) of the functionality discussed above, i.e. easily constructing arbitrarily deep Xarray-method pipelines with interactive widgets controlling any step along the way, replaying only that portion of the pipeline when that widget is changed.

So, what's left? As Philipp suggests, we can make the syntax for working with this functionality simpler in hvPlot. At that point we should probably show the syntax required for each of the interactive pipelines demonstrated or suggested in this issue, and see if there's any change to Xarray that would help make the syntax easier or more natural for Xarray users. Either way, the power is now there already!

@TomNicholas
Contributor Author

This looks absolutely great @philippjfr ! I would be keen to help you and @jbednar with making the syntax as intuitive and familiar as possible for xarray users. If you have any relevant issues/PR's in holoviews or here then please tag me :)

@philippjfr

@TomNicholas I've been playing around with an interactive accessor, very much an experiment for now (and requires some small fixes in HoloViews) but I think this could be heading in the right direction:

https://anaconda.org/philippjfr/xarray_interactive/notebook

@jbednar
Contributor

jbednar commented Apr 26, 2020

That is so cool! I think the syntax is already as good as I can imagine.

@TomNicholas
Contributor Author

@philippjfr that looks incredible!

The accessor syntax is exactly what I was imagining too, great job.

requires some small fixes in HoloViews

I would love to have a go, plus I had a few other ideas I would like to try out - is there a branch somewhere I could check out to get it going locally?

@max-sixty
Collaborator

This is very cool, nice work @philippjfr !

@StanczakDominik
Contributor

That's amazing. This would single-handedly turn xarray from "nice to have, pretty useful" to "I recommend it to all my friends". I would absolutely love to be able to use it.

@jbednar
Contributor

jbednar commented Dec 16, 2020

hvPlot's .interactive() support for xarray and pandas was released in hvPlot 0.7.0 (installable with conda install hvplot=0.7) and is now documented on the website.

There are a few things I think we can still improve (listed at holoviz/panel#1824, holoviz/panel#1826, holoviz/hvplot#531, holoviz/hvplot#533), but it's already really fun to use -- just take your xarray or pandas pipeline da.method1(val1=arg1).method2(val2=arg2,val3=arg3).plot(), add .interactive, and then substitute a Panel widget or ipywidget for any of the arguments: da.interactive.method1(val1=widget1).method2(val2=arg2,val3=widget2).plot()

You can use this with the native .plot() plotting, interactive .hvplot() plots, or pretty much anything you can get out of such a pipeline (table, text, etc.). Try it out and let us know how it goes (here, on one of the issues linked above, or in a new issue at https://github.com/holoviz/hvplot/issues)! Thanks for all the suggestions and ideas here...

@jbednar
Contributor

jbednar commented Jul 28, 2021

Update: hvPlot's .interactive support has been greatly improved and expanded in the new hvPlot 0.7.3 release. It is now showcased at holoviz.org, which introduces how to use hvPlot to build plots, then how to use xarray .interactive and pandas .interactive to add widgets (whether to hvPlot plots or to anything else, including .plot output or tables or xarray reprs). There are still plenty of improvements to make, but apart from documenting .interactive in xarray's docs, I would think this issue can now be closed.

@TomNicholas
Contributor Author

@jbednar that all looks amazing! Can't wait to properly try it out.

Given that much of what I imagined is now available in holoviews, I will close this issue now. But if you would like to raise a PR pointing towards this functionality somewhere in xarray's docs (maybe either as a more detailed description in the Ecosystem page or as a note in the plotting page of the user guide) then that would be welcome!

@MarcSkovMadsen

MarcSkovMadsen commented Nov 5, 2021

Just for completeness: you can find @philippjfr's PyData 2021 .interactive talk here https://pydata.org/global2021/schedule/presentation/51/build-polished-data-driven-applications-directly-from-your-pandas-or-xarray-pipelines/. Quite powerful.

[image]

Inspired by that I've created a gist here https://gist.github.com/MarcSkovMadsen/e666503df2aa1d8d047dcb9555b5da6d. It's for a pandas DataFrame. But the principle is the same for xarray.

[video: hvplot-interactive-speedup15.mp4]

@TomNicholas
Contributor Author

Just for completeness: you can find @philippjfr's PyData 2021 .interactive talk here

Oh awesome! Can I watch this talk anywhere? That link just seems to have a summary.

@jbednar
Contributor

jbednar commented Nov 5, 2021

I'm not sure if this link will expire, but until it's on youtube, you can watch the talk at https://zoom.us/rec/play/DzaWjz_hMBP23Vqv7T5jPcY1zU4fps2ZL-yAi8MyM5-lbYq-ZQS4ejWMzwxRW53vGu2F1DybYiKSb8M.mYwmkdDSK6ECc8Ux?startTime=1635508803000&_x_zm_rtaid=hMxhM6kwS-ae1hLStT7UXA.1635955310424.1ade0b45b8e3297ff743d3acc0aa08e1&_x_zm_rhtaid=397

@MarcSkovMadsen

I meant to add this link to the PyData talk on .interactive, including video: https://discourse.holoviz.org/t/pydata-2021-build-polished-data-driven-applications-directly-from-your-pandas-or-xarray-pipelines/3017/4

@MarcSkovMadsen

MarcSkovMadsen commented Jan 19, 2022

Sophia Yang and I wrote a blog post about hvplot interactive. It's based on Pandas dataframes but it works the same way for Xarray. Check it out https://towardsdatascience.com/the-easiest-way-to-create-an-interactive-dashboard-in-python-77440f2511d1

[image: the-easiest-way]

You can also find the repo and links to binder+colab here https://github.com/sophiamyang/hvplot_interactive

[image: hvplot-interactive-binder (2)]

@nvaytet

nvaytet commented May 30, 2022

Just been sent a link to this discussion after having worked on something very similar for our project (which resembles Xarray in many ways): scipp/scipp#2573
I am now wondering if we could somehow use the .interactive approach for our needs instead.

@philippjfr how much work would it be to implement an .interactive method for our own classes? Our DataArray is slightly different from Xarray's. Thanks!

@philippjfr

We'd probably have to write a so-called HoloViews DataInterface for scipp. See the equivalent xarray implementation: https://github.com/holoviz/holoviews/blob/master/holoviews/core/data/xarray.py

@nvaytet

nvaytet commented May 31, 2022

Great, I'll look at that implementation. Thanks!

@MarcSkovMadsen

MarcSkovMadsen commented Oct 27, 2023

FYI, this concept has now been generalized further by @philippjfr into Reactive Expressions, which are now a part of Param. See https://param.holoviz.org/user_guide/Reactive_Expressions.html

Here are a couple of examples with Panel

[image: reactive-expressions-basic]

[image: reactive-expressions]
