ENH: first draft of MPL artist #200

Closed
wants to merge 1 commit into from


tacaswell
Contributor

@tacaswell tacaswell commented Jul 16, 2016

Minimal datashader-aware matplotlib artist.

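import pandas as pd
import datashader as ds
from datashader.mpl_ext import DSArtist
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# df: NYC taxi data with dropoff_x, dropoff_y and passenger_count columns
df = pd.read_csv('data/nyc_taxi.csv', usecols=['dropoff_x', 'dropoff_y', 'passenger_count'])

fig, ax = plt.subplots()
da = DSArtist(ax, df, 'dropoff_x', 'dropoff_y', ds.count('passenger_count'), norm=mcolors.LogNorm())
ax.add_artist(da)
ax.set_aspect('equal')

fig.colorbar(da)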

so

This uses DS only for the binning and then reuses mpl's existing normalization and color-mapping tools.
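The split described above (DS does the binning, mpl does normalization and color mapping) can be sketched without either library. This is a minimal sketch assuming NumPy only: `np.histogram2d` stands in for a datashader `count()` aggregation, and the log normalization roughly mirrors what `mcolors.LogNorm()` does, including masking empty (<= 0) bins instead of assigning them a color. The function names are hypothetical.

```python
import numpy as np

def count_aggregate(x, y, x_range, y_range, width, height):
    """Count points per pixel (a stand-in for a datashader count() reduction).

    Returns a (height, width) array; row 0 holds the lowest y values.
    """
    counts, _, _ = np.histogram2d(y, x, bins=(height, width),
                                  range=[y_range, x_range])
    return counts

def log_normalize(counts):
    """Map counts to [0, 1] on a log scale, masking empty (<= 0) bins."""
    masked = np.ma.masked_less_equal(counts, 0)
    logs = np.ma.log10(masked)
    vmin, vmax = logs.min(), logs.max()
    span = (vmax - vmin) or 1.0   # avoid dividing by zero if all counts are equal
    return (logs - vmin) / span

x = np.array([0.5, 0.5, 1.5])
y = np.array([0.5, 0.5, 0.5])
counts = count_aggregate(x, y, x_range=(0, 2), y_range=(0, 2), width=2, height=2)
normed = log_normalize(counts)
```

With Matplotlib available, `ax.imshow(counts, origin='lower', norm=mcolors.LogNorm(), cmap='viridis_r')` would perform the normalization step itself; masked bins are then drawn with the colormap's "bad" color.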

@tacaswell
Contributor Author

attn @mdboom @astrofrog

@jbednar
Member

jbednar commented Jul 18, 2016

Looks great, thanks! I'll try it out and merge if it's all ok.

@tacaswell
Contributor Author

This requires the 2.0 beta to work (I lost track of when that private class was created). The beta is on conda-forge.

I also have an idea on how to make the connection to the ds pipeline more general.


@astrofrog

@tacaswell - would it make sense to expose (as public) the currently private class and the private _make_image methods, so that we can rely on them for examples like this?

@tacaswell
Contributor Author

Another reasonable option would be to move this artist into MPL and tweak the API so that all of the datashader dependency is injected as a pipeline argument.

@jbednar
Member

jbednar commented Jul 21, 2016

@tacaswell, I'm not sure how to get the mpl 2.0 beta from conda-forge. It's only offering me 1.5.2:

0172-jbednar:~> conda install -c conda-forge  matplotlib
Using Anaconda Cloud api site https://api.anaconda.org

# All requested packages already installed.
# packages in environment at /Users/jbednar/anaconda:
#
matplotlib                1.5.2               np111py27_4    conda-forge

@tacaswell
Contributor Author

You need to ask for the rc channel as well:

conda install -c conda-forge/label/rc -c conda-forge matplotlib

Sorry for not being clearer about that.


@jbednar
Member

jbednar commented Jul 21, 2016

Very nice! I couldn't get scroll zooming to work, but box zooming was very snappy. I had to make some edits for Python 2 compatibility:

0172-jbednar:~/datashader/datashader> diff mpl_ext.py~ mpl_ext.py
14c14
<         super().__init__(ax, **kwargs)
---
>         super(DSArtist,self).__init__(ax, **kwargs)
48c48
<         return (*self.axes.get_xlim(), *self.axes.get_ylim())
---
>         return self.axes.get_xlim() + self.axes.get_ylim()

We'd want to include a runnable example with the distribution, so I adapted your snippet above into a new file examples/nyc_taxi_mpl.py:

import pandas as pd
import datashader as ds
from datashader.mpl_ext import DSArtist
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

df = pd.read_csv('data/nyc_taxi.csv', usecols=['dropoff_x', 'dropoff_y', 'passenger_count'])

fig, ax = plt.subplots()
# reversed viridis so that low counts fade toward the white background
da = DSArtist(ax, df, 'dropoff_x', 'dropoff_y', ds.count('passenger_count'),
              norm=mcolors.LogNorm(), cmap='viridis_r')
ax.add_artist(da)
ax.set_aspect('equal')

fig.colorbar(da)
plt.show()

This pair of files worked well for me, anyway! Note that I reversed the colormap, so that it works better on a white background:

[image: the NYC taxi example with the reversed viridis colormap]

@tacaswell
Contributor Author

Scroll zooming is not one of the default interactions, which is why it didn't work 😈

The reversed color map does look much better.

@jbednar
Member

jbednar commented Jul 21, 2016

Overall, it seems like this approach will work well for simple datashader pipelines, as in the case illustrated above (basically anything supported by datashader.pipeline.Pipeline). But it won't support more complex pipelines, where it's not just the reduction (argument "agg" of DSArtist) that needs to be overridden, but the pipeline itself. E.g. in the census example, there are user-defined operations on the aggregate array before it is displayed:

tf.colorize(agg.where(agg.sel(race='w') < agg.sel(race='b')).fillna(0), color_key, how='eq_hist')

I'm not sure how a user could inject the agg.sel operation into the DSArtist, which would mean they'd have to copy that class and edit it to do what should be a simple operation.
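For readers less familiar with xarray, the census expression above keeps every category's count at pixels where the 'w' count is lower than the 'b' count, and sets everything else to 0. That is the kind of user-defined step a DSArtist hook would need to accept. A NumPy-only sketch of the same masking, assuming a hypothetical `(height, width, n_categories)` array whose last axis follows a given list of category names:

```python
import numpy as np

def keep_where_less(counts, categories, a, b):
    """Keep all category counts where counts[..., a] < counts[..., b], else 0.

    Intended to mirror agg.where(agg.sel(race=a) < agg.sel(race=b)).fillna(0)
    for a (height, width, n_categories) array ordered like `categories`.
    """
    counts = np.asarray(counts)
    ia, ib = categories.index(a), categories.index(b)
    keep = counts[..., ia] < counts[..., ib]        # (height, width) boolean
    return np.where(keep[..., None], counts, 0)    # broadcast over categories

cats = ['w', 'b', 'a']
agg = np.array([[[5, 1, 0], [1, 4, 2]]])   # one row, two pixels
filtered = keep_where_less(agg, cats, 'w', 'b')
```

Unlike the xarray version, which passes through NaN before `fillna(0)` and so ends up as floats, this keeps the integer dtype.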

Those examples also use colorize, which takes categorical information that I'm not sure how to integrate in this approach, if matplotlib is doing the colorizing.

It seems like it would be more general if mpl_ext could support a create_image() callback instead of the current approach, as datashader.InteractiveImage does now, so that users could supply any arbitrary pipeline in just a few lines of code. Supporting a create_aggregate() callback would also be useful, so that users could employ mpl's own colormapping, though I'm not sure how that would work for categorical information.

@jbednar
Member

jbednar commented Jul 21, 2016

| It seems like it would be more general if mpl_ext could support a create_image() callback

| Another reasonable option would be to move this artist into MPL and tweak the API so that all of the datashader dependency is injected as a pipeline argument.

These two suggestions may amount to the same thing; if so then it's clear how to move forward!

@tacaswell
Contributor Author

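My current best thought is to have the users provide a callback with a signature like:

import datashader as ds
def ds_cb(canvas, data):
    # return an NxM float or int image ('x'/'y' are placeholder column names)
    return canvas.points(data, 'x', 'y', agg=ds.count()).values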

mpl has support for discrete color maps (if your norm returns integers the values are used as direct lookups in the color table).

Given that mpl users already know how to use the mpl colorization code (one hopes), I would greatly prefer that that level be delegated to us, but making this class smart enough to check whether it got back an NxM or an NxMx4 array is not too much work (or it might just fall through correctly now).
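A minimal sketch of that shape check, assuming the callback's result arrives as a NumPy array. `to_rgba` is a hypothetical helper, not part of this PR: an NxMx4 result is treated as already colorized RGBA, an NxMx1 result is squeezed to NxM, integer NxM results are used as direct indices into the colormap's table (Matplotlib's behavior for integer input), and anything else goes through the norm and then the colormap.

```python
import numpy as np
import matplotlib.colors as mcolors

def to_rgba(img, cmap, norm=None):
    """Return an (N, M, 4) RGBA array for a callback result `img`."""
    img = np.asanyarray(img)
    if img.ndim == 3 and img.shape[-1] == 4:
        return img                      # already colorized, e.g. by tf.colorize
    if img.ndim == 3 and img.shape[-1] == 1:
        img = img[..., 0]               # treat NxMx1 like NxM
    if np.issubdtype(img.dtype, np.integer):
        return cmap(img)                # integers index the colormap's table directly
    norm = norm if norm is not None else mcolors.Normalize()
    return cmap(norm(img))              # floats: normalize to [0, 1], then colormap

cmap = mcolors.ListedColormap(['red', 'green', 'blue'])
rgba_in = np.zeros((2, 2, 4))
from_ints = to_rgba(np.array([[0, 2]]), cmap)
from_floats = to_rgba(np.array([[0.0, 10.0]]), cmap)
```

Letting `norm` default to an autoscaling `Normalize()` keeps the float path identical to what `imshow` would do when no norm is given.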

I agree that we seem to be in agreement.

attn @story645 (who is a GSoC student working on integrating categorical plotting into mpl)

@jbednar
Member

jbednar commented Jul 21, 2016

I agree that we'd want as much of the processing to use mpl's code as is practical, to help integrate it more easily into mpl users' workflows and make it more familiar to them. Let datashader do what datashader is best at, and let mpl handle the rest!

mpl's discrete colormap support may work for categorical information, but I don't know enough about it to be sure. datashader.tf.colorize() does use discrete colors, but it then (a) mixes those discrete colors according to the counts in each category for that pixel, and (b) adjusts the alpha value of the color from a continuous range, depending on the total count for that pixel compared to the others. So the result is an arbitrarily large set of colors, starting from the nominally discrete colormap like the 5 base colors used here:

[image: categorical example built from 5 base colors]

Not sure if that's similar to what mpl supports or will support.
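A rough NumPy sketch of the two steps described above: mix per-category colors by each pixel's category fractions (a), then derive alpha from the pixel's total count (b). This is an approximation under stated assumptions, not datashader's `colorize` code. It scales alpha linearly between a hypothetical `min_alpha` and 1, whereas datashader can apply `how` (e.g. 'eq_hist') to that step. The default `min_alpha` of 40/255 is only meant to echo datashader's 0-255 `min_alpha` setting.

```python
import numpy as np

def blend_categories(counts, colors, min_alpha=40 / 255):
    """Blend category colors per pixel and set alpha from the total count.

    counts: (H, W, C) per-category counts; colors: (C, 3) RGB values in [0, 1].
    Returns an (H, W, 4) RGBA float array; empty pixels are fully transparent.
    """
    counts = np.asarray(counts, dtype=float)
    colors = np.asarray(colors, dtype=float)
    total = counts.sum(axis=-1)                                  # (H, W)
    with np.errstate(invalid='ignore', divide='ignore'):
        weights = np.nan_to_num(counts / total[..., None])      # 0 for empty pixels
    rgb = weights @ colors                                       # (H, W, 3) mixed colors

    alpha = np.zeros_like(total)
    nonempty = total > 0
    if nonempty.any():
        t = total[nonempty]
        span = t.max() - t.min()
        scaled = (t - t.min()) / span if span else np.ones_like(t)
        alpha[nonempty] = min_alpha + (1 - min_alpha) * scaled  # linear, not eq_hist
    return np.concatenate([rgb, alpha[..., None]], axis=-1)

red_blue = [(1, 0, 0), (0, 0, 1)]
counts = [[[3, 1], [1, 0], [0, 0]]]      # one row: mixed, pure red, empty
rgba = blend_categories(counts, red_blue)
```

The result already has an alpha channel, which is why an NxMx4 pass-through path in the artist would be needed for this kind of output.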

@tacaswell
Contributor Author

There is no support for categorical blending yet, but we have been talking about generalizing the norm/colormap chain for a while now.

@jbednar
Member

jbednar commented Jul 21, 2016

Ok, then it sounds like supporting both NxMx1 and NxMx4 would be good in the meantime.

@jbednar jbednar self-assigned this Sep 8, 2016
@jbednar
Member

jbednar commented Sep 9, 2016

I'd love to get matplotlib support into datashader. Any progress on addressing some of the issues above?

@tacaswell
Contributor Author

Sorry, I have been swamped with other work.

@StevenCHowell

@tacaswell @jbednar Have there been any updates to this?

Looking at the Travis output, it seems to work fine with Python 3. I will see how far I can get with the example above.

@StevenCHowell

I realize my question is more related to usage (and probably just demonstrates my unfamiliarity with datashader and matplotlib) and is not specifically related to accepting/updating this PR. This was just the best resource I found when searching for how to use datashader with matplotlib. Let me know if you would prefer that I move this to Stack Overflow, and I will delete this.

Following the above example, with the DSArtist definition from this PR, I almost have what I need. I am not sure how to get matplotlib coloring to normalize the same way datashader does by default; the matplotlib result is much fainter than datashader with 'eq_hist' (the default), 'log', and even 'linear'.

Here is what I get using datashader with 'eq_hist':
[image: datashader, how='eq_hist']

This is the much fainter version I get using matplotlib:
[image: matplotlib version]
My end goal is to use matplotlib to add the axes, labels, and red success points (blue are the ~30,000 trials, red are the ~10 successes), and then to produce this plot from a script in a headless environment (on a remote server without X).

This screenshot shows what I have been trying. Any thoughts @tacaswell?

[screenshot: mpl_datashader_almost]

@jbednar
Member

jbednar commented Feb 3, 2017

For now, this PR's discussion is fine as a place to collect anything about MPL support for datashader.

I'm surprised that you aren't seeing comparable results between MPL's shading and the linear shading in datashader. It would be good to post a side by side comparison using the same colormap and ranges with linear mapping; those should be at least very nearly the same regardless of who is doing the colormapping.

@tacaswell
Contributor Author

As a side note, I am including a reference to this in mpl's GSoC ideas list.

@jbednar
Member

jbednar commented Feb 5, 2017

Great! I'd be happy to work with a GSoC-er to help make this move forward. We are hoping to have funding for datashader start up again soon, and we'll be putting various functions in place that will help make it simpler to build legends, colorbars, etc. Those functions should help any downstream plotting library to summarize what's in the plot accurately and easily.

@StevenCHowell

StevenCHowell commented Feb 6, 2017

I have had some difficulty using/defining matplotlib palettes, as they are not as simple as the lists used by datashader and bokeh, but these should both be the reversed viridis palette. Also, I am not sure how to change the MPL shading scheme. The example above uses norm=mcolors.LogNorm(), which I took to be the same as datashader's how='log' option, so I used these for the plots below.

import datashader
from bokeh.palettes import Viridis256

# some setup definitions (`trial` is my DataFrame with 'chi2' and 'energy' columns)
width = 6  # inches
height = width / 3.0
x_range = [0, 40]
y_range = [trial['energy'].min(), trial['energy'].max()]
w = int(width * 100)  # pixels, assuming 100 pixels per inch
h = int(height * 100)
# code for the datashader version
palette = list(reversed(Viridis256))  # reversed copy; avoids mutating bokeh's shared palette
canvas = datashader.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
agg = canvas.points(trial, 'chi2', 'energy', agg=datashader.count())
img = datashader.transfer_functions.shade(agg, cmap=palette, how='log')
img  # displays inline in a notebook

[image: datashader, how='log']

# code for matplotlib version
fig, ax = plt.subplots(figsize=(width, height), dpi=400)
da = DSArtist(ax, trial, 'chi2', 'energy', agg=datashader.count(), norm=mcolors.LogNorm(),
              cmap='viridis_r')
ax.add_artist(da)
ax.set_xlabel(r'$\chi^2$')
ax.set_ylabel(r'LR Docking Score')
ax.set_xlim(x_range)
ax.set_ylim(y_range)

plt.show()

[image: matplotlib, LogNorm]

I expected these to look essentially the same, but the difference is obvious. I am not sure how to account for this; it is possible I simply do not understand how to use matplotlib well enough. In case I am not actually using logarithmic scaling in MPL, here is the datashader plot using how='linear':
[image: datashader, how='linear']
and here is the datashader plot using how='eq_hist':
[image: datashader, how='eq_hist']

Overall, the matplotlib plot looks too sparse, like I should not need to use datashader. That said, here is what the plot looks like without datashader:

plt.plot(trial.chi2, trial.energy, '.', ms=0.1)

[image: plt.plot without datashader]

For comparison, here are these same plots using bokeh.
[images: bokeh_plot 1, bokeh_plot 2, bokeh_plot]

@StevenCHowell

StevenCHowell commented Feb 6, 2017

In addition to the general question on how to make the matplotlib version match the datashader version, I have two specific questions.

  • Is it possible to use equal histogram shading in matplotlib, the datashader option how='eq_hist'? If so, how would I do this?
  • How do I give matplotlib a colormap with a single color, or technically white and some other color? This seems to be the default for datashader but if I want to manually select the color I could define it simply using a list, e.g., cmap=['white', 'firebrick']:
transfer_functions.shade(canvas.points(trial, 'chi2', 'energy', agg=ds.count()), 
                                       cmap=['white', 'firebrick'], how='log')

[image: datashader, cmap=['white', 'firebrick'], how='log']

@jbednar
Member

jbednar commented Feb 6, 2017

In addition to the general question on how to make the matplotlib version match the datashader version

To answer that, can you please post the same size image from both mpl and datashader colormapping, using a grayscale colormap, with linear mapping? You might have already provided enough info above, but I can't find any pair of images that should truly be mathematically identical, which is always the safest place to start. Grayscale should be comparable across all libraries.

  • Is it possible to use equal histogram shading in matplotlib, the datashader option how='eq_hist'? If so, how would I do this?

I'm not aware of any histogram equalization option in matplotlib or bokeh, or else we probably would have just used those instead of adding our own to datashader. It would be very convenient if plotting libraries would support eq_hist directly, which would make it simpler to have meaningful colorbars, legends, and hover information. MPL is welcome to steal our eq-hist code; it's only 15 lines of Numpy-based Python, adapted from scikit-image.
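For reference, histogram equalization in this spirit can be sketched in a few lines of NumPy, following the scikit-image approach of mapping each value through the cumulative histogram. This is a simplified stand-in, not a copy of datashader's code: `eq_hist` and its `nbins` default here are illustrative, and empty pixels are assumed to be NaN.

```python
import numpy as np

def eq_hist(data, nbins=256):
    """Map values to [0, 1] by their position in the cumulative histogram.

    NaN entries are excluded from the histogram and stay NaN in the output.
    """
    data = np.asarray(data, dtype=float)
    valid = np.isfinite(data)
    hist, edges = np.histogram(data[valid], bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    cdf = hist.cumsum() / hist.sum()        # fraction of values at or below each bin
    out = np.full(data.shape, np.nan)
    out[valid] = np.interp(data[valid], centers, cdf)
    return out

values = np.array([1.0, 2.0, 3.0, 100.0, np.nan])
equalized = eq_hist(values)
```

The single outlier (100) no longer squeezes the other values toward zero, as a linear mapping would. The equalized array could then be passed to Matplotlib with a plain `mcolors.Normalize(0, 1)`, although a colorbar would then be labeled in equalized units (0 to 1) rather than raw counts.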

  • How do I give matplotlib a colormap with a single color, or technically white and some other color? This seems to be the default for datashader but if I want to manually select the color I could define it simply using a list, e.g., cmap=['white', 'firebrick']:
transfer_functions.shade(canvas.points(df, 'x', 'y', agg=ds.count()), cmap=['white', 'firebrick'], how='log')

See: http://matplotlib.org/examples/pylab_examples/custom_cmap.html
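One way to build the white-to-single-color map, following the custom-colormap approach in the Matplotlib example linked above: `LinearSegmentedColormap.from_list` interpolates smoothly between the listed colors. The name 'white_firebrick' is arbitrary.

```python
import matplotlib.colors as mcolors

# Continuous map from white (low values) to firebrick (high values)
white_firebrick = mcolors.LinearSegmentedColormap.from_list(
    'white_firebrick', ['white', 'firebrick'])

low = white_firebrick(0.0)    # RGBA tuple at the bottom of the map
high = white_firebrick(1.0)   # RGBA tuple at the top of the map
```

The result can be passed as `cmap=white_firebrick` wherever a Matplotlib colormap is accepted. By contrast, `ListedColormap(['white', 'firebrick'])` is a discrete map with exactly two colors, not a gradient.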

@tacaswell
Contributor Author

See http://matplotlib.org/users/colormapnorms.html for details of how the color mapping process in mpl works.

Also, turn the DPI down on the mpl plots; the spatial bins passed to datashader are set by the physical pixels in the axes.
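To make the DPI effect concrete: if the bins follow the axes' size in physical pixels, the bin count is roughly figure inches × DPI × the fraction of the figure the axes occupy. The helper below is hypothetical, and its margin defaults assume Matplotlib's default subplot parameters (left 0.125, right 0.9, bottom 0.11, top 0.88).

```python
def axes_pixel_size(fig_w_in, fig_h_in, dpi,
                    left=0.125, right=0.9, bottom=0.11, top=0.88):
    """Approximate (width, height) in pixels of an axes inside a figure."""
    width_px = fig_w_in * dpi * (right - left)
    height_px = fig_h_in * dpi * (top - bottom)
    return round(width_px), round(height_px)

# figsize=(6, 2) at dpi=400, as in the comparison above, vs. the same figure at dpi=100
hi_res = axes_pixel_size(6, 2, 400)
lo_res = axes_pixel_size(6, 2, 100)
# a full-figure axes, plt.axes([0, 0, 1, 1]), on a 6x3 inch figure at dpi=100
full = axes_pixel_size(6, 3, 100, left=0, right=1, bottom=0, top=1)
```

At dpi=400 there are about 16 times as many bins as at dpi=100, so each bin receives fewer points and a count-based colormap looks correspondingly fainter.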

@StevenCHowell

StevenCHowell commented Feb 6, 2017

Thanks for the links. Based on those, I created a black and white palette and colormap that should match:

from matplotlib.colors import ListedColormap
palette = ['white', 'black']
cmap = ListedColormap(palette)

then defined the image size and plot ranges

width = 600  # in units of pixels
height = 300  # in units of pixels
x_range = [0, 40] 
y_range = [-100, 20]

then generated the datashader plot using how='linear'

canvas = datashader.Canvas(plot_width=width, plot_height=height, x_range=x_range, y_range=y_range)
agg = canvas.points(trial, 'chi2', 'energy', agg=datashader.count())
img = datashader.transfer_functions.shade(agg, cmap=palette, how='linear')  

[image: datashader]
and the matplotlib plot using linear normalization, norm=mcolors.Normalize():

dpi = 100
x_inches = width/dpi
y_inches = height/dpi
fig = plt.figure(figsize=(x_inches, y_inches), dpi=dpi)
ax = plt.axes([0., 0., 1., 1.], frameon=False, xticks=[], yticks=[])
da = DSArtist(ax, trial, 'chi2', 'energy', agg=datashader.count(), 
              norm=mcolors.Normalize(), cmap=cmap)  
ax.add_artist(da)
plt.savefig('mpl.png', dpi=dpi, transparent=True)

[image: mpl]

Note that datashader uses pixel units, while matplotlib uses inches and DPI. You can see the code I used to convert between these and to eliminate the axes on the matplotlib plot, so that the image uses the entire space.

I verified that these are each 600x300 pixels:

(datashader) ➜  odin: test/> file mpl.png datashader.png
mpl.png:        PNG image data, 600 x 300, 8-bit/color RGBA, non-interlaced
datashader.png: PNG image data, 600 x 300, 8-bit/color RGBA, non-interlaced

@jbednar
Member

jbednar commented Feb 6, 2017

Looks like the mpl linear version isn't respecting the NaN mask in the same way, but it's good to see the logarithmic versions matching.
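One likely explanation, offered as an assumption rather than verified against these plots: `LogNorm` masks values <= 0, so empty pixels become "bad" and are left transparent (Matplotlib's default bad color), whereas plain `Normalize` maps a count of 0 to the bottom of the colormap and paints it. A NumPy sketch of masking empty pixels before a linear normalization, using a hypothetical helper name:

```python
import numpy as np

def linear_normalize_nonempty(counts):
    """Linearly map counts to [0, 1], masking empty (zero-count) pixels.

    The range is taken from the non-empty pixels only; masked pixels are left
    for the colormap's "bad" color (transparent by default in Matplotlib).
    """
    masked = np.ma.masked_equal(np.asarray(counts, dtype=float), 0)
    vmin, vmax = masked.min(), masked.max()
    span = (vmax - vmin) or 1.0
    return (masked - vmin) / span

counts = np.array([[0, 1], [2, 3]])
scaled = linear_normalize_nonempty(counts)
```

Passing `np.ma.masked_equal(agg, 0)` to Matplotlib together with `norm=mcolors.Normalize()` should have a similar effect, since `Normalize` autoscales from the unmasked values.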

@StevenCHowell

Another quirk, of either matplotlib or Jupyter, is that the plot looks very different in Jupyter than the saved version. I just repeated this with my real data, and in Jupyter the colors went back to being incredibly faint. (Note that I want to use white and blue, but this shows red and blue because I worried the white would not be visible.)

Here is the jupyter version (right click and save)
[image: mpl_log_real_data_jupyter]

and here is the version saved right before running plt.show()
[image: mpl_log_real_data]

then here is the datashader version
[image: ds_log_real_data]

@philippjfr
Member

philippjfr commented Feb 6, 2017

Looks to me like this is happening because different renderers are drawing at different resolutions. My hypothesis is that the inline backend is using the hi-dpi option (presumably because you have a macbook) and therefore sampling at a higher resolution than you get when saving the datashader plot directly or using matplotlib to save it.

@jbednar
Member

jbednar commented Feb 7, 2017

Right -- the results will vary a lot at different resolutions, by design, though you can use tf.spread() or tf.dyn_spread() to ensure that individual dots are visible at high resolutions.
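The idea behind `tf.spread()` is to grow each non-empty pixel into a small neighborhood so that isolated points stay visible at high resolution. A NumPy sketch on a 2D aggregate, using a max filter over a square of radius `px`. This is an illustration only: datashader's own spread operates on shaded images and composites overlapping pixels, and `dyn_spread` chooses the amount adaptively from the image density.

```python
import numpy as np

def spread_max(agg, px=1):
    """Give every pixel the max value found within a (2*px+1)-wide square around it."""
    agg = np.asarray(agg)
    h, w = agg.shape
    padded = np.pad(agg, px, mode='constant')          # zeros beyond the edges
    out = np.zeros_like(agg)
    for dy in range(2 * px + 1):
        for dx in range(2 * px + 1):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

grid = np.zeros((5, 5), dtype=int)
grid[2, 2] = 7          # one isolated point in the middle
grid[0, 0] = 1          # one in the corner
spread_grid = spread_max(grid, px=1)
```

Where spread neighborhoods overlap, the max keeps the larger value; that is a simpler rule than datashader's image compositing.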

@jbednar
Member

jbednar commented Feb 15, 2017

BTW, note that recent dev releases of HoloViews now support datashader, with matplotlib or any other backend. Here's an example: https://anaconda.org/jbednar/census-hv-mpl/notebook

@maartenbreddels

I noticed this thread on Twitter; it reminded me to put in a matplotlib backend for vaex based on ipympl. The code lives here and might be useful for this discussion, since it tries to tackle a similar problem. What might be useful is the debounce decorator that I use, for instance, here; it only executes after 0.5 seconds have passed, to avoid many updates when moving and zooming. It only works when there is an ipykernel; for Qt you need a different debounce method (I should have that code somewhere).
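The linked vaex code isn't reproduced here, but the debounce idea itself is small. This is a stdlib-only sketch using `threading.Timer`, so it does not depend on an ipykernel. Each call cancels the pending one, so only the last call in a burst runs, `wait` seconds after it. One caveat: the callback runs on a timer thread. For real redraws with GUI backends, it would likely be safer to schedule work through the canvas's own timer (for example `fig.canvas.new_timer`), which this sketch does not do.

```python
import functools
import threading

def debounce(wait):
    """Delay calls to the decorated function until `wait` seconds pass without a new call."""
    def decorator(fn):
        timer = None
        lock = threading.Lock()

        @functools.wraps(fn)
        def debounced(*args, **kwargs):
            nonlocal timer
            with lock:
                if timer is not None:
                    timer.cancel()            # drop the previously scheduled call
                timer = threading.Timer(wait, fn, args=args, kwargs=kwargs)
                timer.start()
        return debounced
    return decorator

@debounce(0.5)
def update_view(xlim, ylim):
    print('re-aggregating for', xlim, ylim)
```

Calling `update_view` repeatedly while panning would then trigger a single re-aggregation, half a second after the last call.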

@stonebig

Datashader is not available on PyPI; a solution that could also support "pypi-compatible" alternatives, like "mpl-scatter-density", would be great.

@ruiyangl

I couldn't do:
from datashader.mpl_ext import DSArtist
Here's the error:
DLL load failed: The specified module could not be found.

@jbednar
Member

jbednar commented Aug 22, 2018

@ruiyangl, to get this experimental code, you would have to check out the branch of datashader associated with this PR, and use that instead of any released datashader version. We'd be happy to merge support for Matplotlib into Datashader whenever this PR can be completed. In the meantime, you can use HoloViews+Matplotlib to see static Datashader output inside a Matplotlib plot as mentioned above.

@jolespin

Is this available in the current version?

@jbednar
Member

jbednar commented Sep 26, 2019

The HoloViews support for Matplotlib+Datashader is in any recent version, but is only Agg (image) based (not interactive). No one has ever finished up this PR and made it mergeable, but if anyone wants to take it on, I'd be very happy to help get it merged! Meanwhile, https://github.com/astrofrog/mpl-scatter-density does similar things (though only for points).

@jbednar
Member

jbednar commented Sep 26, 2019

datashader is not made available on pypi

BTW, Datashader is available on PyPI nowadays (since 2018 at least).

@jolespin

I'm pretty new to datashader but plan on integrating it heavily into my ecosystem. Forgive my ignorance, but does DSArtist allow one to convert datashader objects to a matplotlib ax? Or does it actually create the plots natively with matplotlib?

Just to be clear, there is no way to use datashader layouts with matplotlib because that would defeat the purpose of plotting large complex graphs?

@jbednar
Member

jbednar commented Sep 27, 2019

This PR allows creating a Matplotlib artist that uses Datashader internally; it doesn't accept any Datashader pipelines you've already created. So there's no conversion involved. I'm not sure what you mean by a Datashader layout or why that would defeat the purpose, but the purpose of this PR is to create an object that will interactively re-draw on zooming; you can already create a matrix that you can plot manually with Matplotlib. If that's not what you're asking, please elaborate!

@jolespin

I'm not sure what you mean by a Datashader layout or why that would defeat the purpose, but the purpose of this PR is to create an object that will interactively re-draw on zooming; you can already create a matrix that you can plot manually with Matplotlib.

Sorry, by datashader layout I meant the hammer_bundle algorithm http://datashader.org/api.html#datashader.bundling.hammer_bundle

I noticed it returns many more rows than there are nodes. What are these rows? Is it possible to use this output to plot using matplotlib natively?

Regarding "defeating the purpose", I was referring to the fact that the output of hammer_bundle is a very large dataframe. If those are arc coordinates, then plotting them may take a while in matplotlib compared to datashader.

Hope that makes sense!

@jbednar
Member

jbednar commented Sep 27, 2019

Ah, I see what you mean. I don't think this PR will help you. This PR hard-codes a call to cvs.points() to plot a 2D histogram of points, whereas a bundled-edge network graph requires line plotting, not point plotting.

Each row of the output of hammer_bundle is a line segment that approximates one bit of a curve connecting two nodes. Given that there are typically more edges than nodes and given that each curve will be approximated by many line segments, yes, there will be many more rows in the output than nodes.

These rows can be visualized using any plotting program capable of plotting line segments, including presumably Matplotlib, if you want to write code for that library. But you are correct that for large networks the number of line segments involved will typically be too large for most plotting programs.
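If the bundled output is laid out as x/y columns with a row of NaNs between consecutive curves (an assumption here; check the DataFrame your datashader version actually returns), it can be split into per-curve vertex arrays for Matplotlib's `LineCollection`. A NumPy sketch with a hypothetical helper:

```python
import numpy as np

def split_nan_separated(x, y):
    """Split NaN-separated coordinate columns into a list of (N, 2) vertex arrays."""
    xy = np.column_stack([np.asarray(x, dtype=float), np.asarray(y, dtype=float)])
    is_break = np.isnan(xy).any(axis=1)
    paths, start = [], 0
    for stop in np.flatnonzero(is_break):
        if stop > start:                 # skip empty runs, e.g. consecutive NaN rows
            paths.append(xy[start:stop])
        start = stop + 1
    if start < len(xy):                  # last path if there is no trailing NaN row
        paths.append(xy[start:])
    return paths

nan = float('nan')
paths = split_nan_separated([0, 1, nan, 2, 3, 4, nan], [0, 1, nan, 0, 0, 0, nan])
```

With Matplotlib, `ax.add_collection(LineCollection(paths, linewidths=0.1, alpha=0.2))` (from `matplotlib.collections`) followed by `ax.autoscale()` would draw them. For very large outputs, rasterizing with datashader first is still the scalable option.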

With that in mind, you can use Datashader on the resulting segment dataframe to create a rectangular array with the fully rendered output, and then display that in any plotting program capable of plotting rectangular arrays, including Matplotlib.

This PR won't be useful for any of those options unless it is heavily generalized, so I'd recommend just sticking with the functions described at Datashader.org, using the output from them with whatever your favorite plotting library is.

@jbednar
Member

jbednar commented Jan 27, 2020

It would be great to have a more configurable version of this PR, especially if it works with ipympl so that it would be interactive in notebooks, but for the moment, I think most or all of what it provides is covered by mpl_scatter_density (zoomable point density plots) and the Matplotlib backend already available in HoloViews (not zoomable, but which can be dynamically updated with widgets). So I'll close this for now, but we would welcome more full featured Matplotlib support for Datashader whenever anyone wants to work on it.

@jbednar jbednar closed this Jan 27, 2020
@tacaswell
Contributor Author

This should work with ipympl out of the box, and tweaking it to take in a pre-configured pipeline should be straightforward.

@jbednar
Member

jbednar commented Jan 27, 2020

Are you interested in making it configurable in that way? If not, do you consider it useful in its current form? If so I'm happy to merge; it just seemed to be sitting here getting more stale while still being labeled "draft". If it works with ipympl I could put an example showing how to do this on examples.pyviz.org for people to run, which would make a cool demo...

@tacaswell
Contributor Author

I suspect it would be much easier for someone familiar with datashader to make that change; I don't know what people would expect the API to be.

@jbednar
Member

jbednar commented Apr 7, 2020

Reopening this PR so that we don't forget about it; the work that there is to do is relatively minor and at the Datashader side, so if we ever get a chance to look at it, we should be able to merge this.

@jbednar
Member

jbednar commented Aug 27, 2020

Closing in favor of #939.

@jbednar jbednar closed this Nov 11, 2020

9 participants