# ENH: first draft of MPL artist #200

Open
wants to merge 1 commit into
from
+66 −0

## Conversation

8 participants

### tacaswell commented Jul 16, 2016 • edited

Minimal datashader-aware matplotlib artist.

    import datashader as ds
    from datashader.mpl_ext import DSArtist
    import matplotlib.pyplot as plt
    import matplotlib.colors as mcolors

    # df: a DataFrame with 'dropoff_x', 'dropoff_y' and 'passenger_count' columns
    fig, ax = plt.subplots()
    da = DSArtist(ax, df, 'dropoff_x', 'dropoff_y',
                  ds.count('passenger_count'), norm=mcolors.LogNorm())
    ax.add_artist(da)
    ax.set_aspect('equal')
    fig.colorbar(da)

This uses DS just to do the binning and then reuses mpl's existing normalization and color-mapping tools.
 ENH: first draft of MPL artist 
Minimal datashader aware matplotlib artist.
 5cd5623 

Collaborator

### jbednar commented Jul 18, 2016

 Looks great, thanks! I'll try it out and merge if it's all ok.

### tacaswell commented Jul 18, 2016

This requires the 2.0 beta to work (I lost track of when that private class was created). The beta is on conda-forge. I also have an idea on how to make the connection to the ds pipeline more general.

### astrofrog commented Jul 18, 2016

@tacaswell - would it make sense to make the currently private class and the private _make_image methods public, so that examples like this can rely on them?

### tacaswell commented Jul 18, 2016

 Another reasonable option would be to move this artist into MPL and tweak the API so that all of the datashader dependency is injected as a pipeline argument.
Collaborator

### jbednar commented Jul 21, 2016

@tacaswell, I'm not sure how to get the mpl 2.0 beta from conda-forge. It's only offering me 1.5.2:

    0172-jbednar:~> conda install -c conda-forge matplotlib
    Using Anaconda Cloud api site https://api.anaconda.org
    # All requested packages already installed.
    # packages in environment at /Users/jbednar/anaconda:
    matplotlib                1.5.2               np111py27_4    conda-forge

### tacaswell commented Jul 21, 2016

You need to ask for the rc channel as well

conda install -c conda-forge/label/rc -c conda-forge matplotlib

Sorry for not being clearer about that.


Collaborator

### jbednar commented Jul 21, 2016

Very nice! I couldn't get scroll zooming to work, but box zooming was very snappy. I had to make some edits for Python 2 compatibility:

    0172-jbednar:~/datashader/datashader> diff mpl_ext.py~ mpl_ext.py
    14c14
    < super().__init__(ax, **kwargs)
    ---
    > super(DSArtist,self).__init__(ax, **kwargs)
    48c48
    < return (*self.axes.get_xlim(), *self.axes.get_ylim())
    ---
    > return self.axes.get_xlim() + self.axes.get_ylim()

We'd want to include a runnable example with the distribution, so I adapted your snippet above into a new file examples/nyc_taxi_mpl.py:

    import pandas as pd
    df = pd.read_csv('data/nyc_taxi.csv', usecols=['dropoff_x', 'dropoff_y', 'passenger_count'])

    import datashader as ds
    from datashader.mpl_ext import DSArtist
    import matplotlib.pyplot as plt
    import matplotlib.colors as mcolors

    fig, ax = plt.subplots()
    da = DSArtist(ax, df, 'dropoff_x', 'dropoff_y', ds.count('passenger_count'),
                  norm=mcolors.LogNorm(), cmap='viridis_r')
    ax.add_artist(da)
    ax.set_aspect('equal')
    fig.colorbar(da)
    plt.show()

This pair of files worked well for me, anyway! Note that I reversed the colormap, so that it works better on a white background.

### tacaswell commented Jul 21, 2016

Scroll zooming is not one of the default interactions, which is why it didn't work 😈 The reversed color map does look much better.
Collaborator

### jbednar commented Jul 21, 2016 • edited

Overall, it seems like this approach will work well for simple datashader pipelines, as in the case illustrated above (basically anything supported by datashader.pipeline.Pipeline). But it won't support more complex pipelines, where it's not just the reduction (argument "agg" of DSArtist) that needs to be overridden, but the pipeline itself. E.g. in the census example, there are user-defined operations on the aggregate array before it is displayed:

    tf.colorize(agg.where(agg.sel(race='w') < agg.sel(race='b')).fillna(0), color_key, how='eq_hist')

I'm not sure how a user could inject the agg.sel operation into the DSArtist, which would mean they'd have to copy that class and edit it to do what should be a simple operation. Those examples also use colorize, which takes categorical information that I'm not sure how to integrate in this approach, if matplotlib is doing the colorizing.

It seems like it would be more general if mpl_ext could support a create_image() callback instead of the current approach, as datashader.InteractiveImage does now, so that users could supply any arbitrary pipeline in just a few lines of code. Supporting a create_aggregate() callback would also be useful, so that users could employ mpl's own colormapping, though I'm not sure how that would work for categorical information.
Collaborator

### jbednar commented Jul 21, 2016 • edited

> It seems like it would be more general if mpl_ext could support a create_image() callback

> Another reasonable option would be to move this artist into MPL and tweak the API so that all of the datashader dependency is injected as a pipeline argument.

These two suggestions may amount to the same thing; if so, then it's clear how to move forward!

### tacaswell commented Jul 21, 2016

My current best thought is to have the users provide a callback which has a signature like

    def ds_cb(canvas, data):
        return float_or_int_img

mpl has support for discrete color maps (if your norm returns integers, the values are used as direct lookups in the color table). Given that mpl users already know how to use the mpl colorization code (one hopes), I would greatly prefer that level get delegated to us, but making this class smart enough to check if it got back an NxM or NxMx4 array is not too much work (or it might just fall through correctly now).

I agree that we seem to be in agreement.

attn @story645 (who is a GSoC student working on integrating categorical plotting into mpl)
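
The NxM versus NxMx4 check mentioned above could look roughly like the sketch below. It is only an illustration: `classify_callback_output` is a hypothetical helper, not part of this PR or of matplotlib, and it assumes the callback returns something convertible to a NumPy array.

```python
import numpy as np


def classify_callback_output(img):
    """Decide how an artist should treat what a user callback returned.

    Hypothetical helper, not part of the PR: returns 'scalar' for an NxM
    array (to be passed through mpl's norm and colormap) and 'rgba' for an
    NxMx4 array (already colorized, to be drawn as-is).
    """
    arr = np.asarray(img)
    if arr.ndim == 2:
        return 'scalar'
    if arr.ndim == 3 and arr.shape[2] == 4:
        return 'rgba'
    raise ValueError('expected an NxM or NxMx4 array, got shape %r' % (arr.shape,))


counts = np.zeros((3, 5), dtype=int)   # e.g. a count aggregate
rgba = np.zeros((3, 5, 4))             # e.g. an already-colorized image
kinds = (classify_callback_output(counts), classify_callback_output(rgba))
```

With a check like this, the scalar branch keeps norm, colormap and colorbar handling inside mpl, while the RGBA branch leaves room for pipelines that do their own colorizing.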
Collaborator

### jbednar commented Jul 21, 2016 • edited

I agree that we'd want as much of the processing to use mpl's code as is practical, to help integrate it more easily into mpl users' workflows and make it more familiar to them. Let datashader do what datashader is best at, and let mpl handle the rest!

mpl's discrete colormap support may work for categorical information, but I don't know enough about it to be sure. datashader.tf.colorize() does use discrete colors, but it then (a) mixes those discrete colors according to the counts in each category for that pixel, and (b) adjusts the alpha value of the color from a continuous range, depending on the total count for that pixel compared to the others. So the result is an arbitrarily large set of colors, starting from a nominally discrete colormap like the 5 base colors used here.

Not sure if that's similar to what mpl supports or will support.
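
To make steps (a) and (b) concrete, here is a minimal NumPy sketch of that kind of blending. It is not datashader's colorize() implementation: `blend_categories` and `min_alpha` are hypothetical names, and the alpha is scaled linearly with the total count purely for simplicity.

```python
import numpy as np


def blend_categories(counts, base_colors, min_alpha=0.2):
    """Blend per-category counts into an RGBA image (illustrative only).

    counts:      (H, W, C) array of non-negative counts per category.
    base_colors: (C, 3) array of RGB values in [0, 1], one per category.
    """
    counts = np.asarray(counts, dtype=float)
    base_colors = np.asarray(base_colors, dtype=float)
    total = counts.sum(axis=2)                       # (H, W) total count per pixel
    safe_total = np.where(total > 0, total, 1.0)     # avoid dividing by zero
    # (a) mix the base colors, weighted by each category's share of the pixel
    rgb = counts @ base_colors / safe_total[..., None]
    # (b) alpha grows with the pixel's total count; empty pixels stay transparent
    alpha = np.where(total > 0,
                     min_alpha + (1.0 - min_alpha) * total / max(total.max(), 1.0),
                     0.0)
    return np.dstack([rgb, alpha])


counts = np.array([[[3, 1], [0, 0]]])     # one row, two pixels, two categories
colors = np.array([[1.0, 0.0, 0.0],       # category 0: red
                   [0.0, 0.0, 1.0]])      # category 1: blue
img = blend_categories(counts, colors)
```

The first pixel comes out three parts red to one part blue at full opacity, and the empty pixel is fully transparent, which is why the set of output colors is effectively unbounded even though only two base colors were supplied.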

### tacaswell commented Jul 21, 2016

There is no support for the categorical blending yet (but we have been talking about generalizing the norm/colormap chain for a while now).
Collaborator

### jbednar commented Jul 21, 2016

 Ok, then it sounds like supporting both NxMx1 and NxMx4 would be good in the meantime.

Collaborator

### jbednar commented Sep 9, 2016

 I'd love to get matplotlib support into datashader. Any progress on addressing some of the issues above?

### tacaswell commented Sep 9, 2016

 Sorry, I have been swamped with other work.

### StevenCHowell commented Feb 2, 2017

 @tacaswell @jbednar Have there been any updates to this? Looking at the travis output, it seems to work fine with python 3. I will see how far I can get with the example above.

### StevenCHowell commented Feb 2, 2017

I realize my question is more related to usage (and probably just demonstrates my unfamiliarity with datashader and matplotlib) and is not specifically related to accepting/updating this PR. This was just the best resource I found when searching for how to use datashader with matplotlib. Let me know if you prefer I move this to Stack Overflow and I will delete this.

Following the above example, with the DSArtist definition from this PR, I almost have what I need. I am not sure how to get matplotlib coloring to normalize the same way datashader does by default. It is so much fainter than 'eq_hist' (the default), 'log', and even 'linear'. Here is what I get using datashader with 'eq_hist':

This is the much fainter version I get using matplotlib:

My end goal is to use matplotlib to add the axes, labels, and red success points (blue are the ~30,000 trials, red are the ~10 successes), then plot this using a script in a headless environment (on a remote server without X). This screenshot shows what I have been trying.

Any thoughts @tacaswell?
Collaborator

### jbednar commented Feb 3, 2017

 For now, this PR's discussion is fine as a place to collect anything about MPL support for datashader. I'm surprised that you aren't seeing comparable results between MPL's shading and the linear shading in datashader. It would be good to post a side by side comparison using the same colormap and ranges with linear mapping; those should be at least very nearly the same regardless of who is doing the colormapping.

### tacaswell commented Feb 4, 2017

 As a side note, I am including a reference to this in mpl's GSoC ideas list.
Collaborator

### jbednar commented Feb 5, 2017

 Great! I'd be happy to work with a GSoC-er to help make this move forward. We are hoping to have funding for datashader start up again soon, and we'll be putting various functions in place that will help make it simpler to build legends, colorbars, etc. Those functions should help any downstream plotting library to summarize what's in the plot accurately and easily.

### StevenCHowell commented Feb 6, 2017 • edited

I have had some difficulty using/defining matplotlib palettes, as they are not as simple as the lists used by datashader and bokeh, but I think these should both be the reversed viridis palette. Also, I am not sure how to change the MPL shading scheme. The example above uses norm=mcolors.LogNorm(), which I took to be the same as datashader's how='log' option, so I used these for the plots below.

    # some setup definitions
    width = 6
    height = width/3.0
    x_range = [0, 40]
    y_range = [trial['energy'].min(), trial['energy'].max()]
    w = int(width * 100)
    h = int(height * 100)

    # code for the datashader version
    from bokeh.palettes import Viridis256 as palette
    palette.reverse()
    canvas = datashader.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
    agg = canvas.points(trial, 'chi2', 'energy', agg=datashader.count())
    img = datashader.transfer_functions.shade(agg, cmap=palette, how='log')
    img

    # code for matplotlib version
    fig, ax = plt.subplots(figsize=(width, height), dpi=400)
    da = DSArtist(ax, trial, 'chi2', 'energy', agg=datashader.count(),
                  norm=mcolors.LogNorm(), cmap='viridis_r')
    ax.add_artist(da)
    ax.set_xlabel(r'$\chi^2$')
    ax.set_ylabel(r'LR Docking Score')
    ax.set_xlim(x_range)
    ax.set_ylim(y_range)
    plt.show()

I expected these to look essentially the same, but the difference is obvious. I am not sure how to account for this, but it is possible I simply do not understand how to use matplotlib well enough.

In case I am not actually using logarithmic binning in MPL, here is the datashader plot using how='linear':

and here is the datashader plot using how='eq_hist':

Overall, the matplotlib plot looks too sparse, like I should not need to use datashader. That said, here is what the plot looks like without datashader:

    plt.plot(trial.chi2, trial.energy, '.', ms='.1')

For comparison, here are these same plots using bokeh.

### StevenCHowell commented Feb 6, 2017 • edited

In addition to the general question on how to make the matplotlib version match the datashader version, I have two specific questions.

Is it possible to use equal histogram shading in matplotlib, the datashader option how='eq_hist'? If so, how would I do this?

How do I give matplotlib a colormap with a single color, or technically white and some other color? This seems to be the default for datashader, but if I want to manually select the color I could define it simply using a list, e.g., cmap=['white', 'firebrick']:

    transfer_functions.shade(canvas.points(trial, 'chi2', 'energy', agg=ds.count()), cmap=['white', 'firebrick'], how='log')
Collaborator

### jbednar commented Feb 6, 2017

> In addition to the general question on how to make the matplotlib version match the datashader version

To answer that, can you please post the same size image from both mpl and datashader colormapping, using a grayscale colormap, with linear mapping? You might have already provided enough info above, but I can't find any pair of images that should truly be mathematically identical, which is always the safest place to start. Grayscale should be comparable across all libraries.

> Is it possible to use equal histogram shading in matplotlib, the datashader option how='eq_hist'? If so, how would I do this?

I'm not aware of any histogram equalization option in matplotlib or bokeh, or else we probably would have just used those instead of adding our own to datashader. It would be very convenient if plotting libraries would support eq_hist directly, which would make it simpler to have meaningful colorbars, legends, and hover information. MPL is welcome to steal our eq-hist code; it's only 15 lines of Numpy-based Python, adapted from scikit-image.

> How do I give matplotlib a colormap with a single color, or technically white and some other color? This seems to be the default for datashader but if I want to manually select the color I could define it simply using a list, e.g., cmap=['white', 'firebrick']:

    transfer_functions.shade(canvas.points(df, 'x', 'y', agg=ds.count()), cmap=['white', 'firebrick'], how='log')
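
As a rough illustration of what the histogram equalization discussed above does to the counts before colormapping, here is a minimal NumPy sketch. It is not datashader's eq_hist code; the function below is a simplified stand-in written for this example, and `nbins=256` is an assumed default.

```python
import numpy as np


def eq_hist(data, nbins=256):
    """Histogram-equalize the finite values of `data` into [0, 1].

    Sketch only: bins the finite values, builds the normalized cumulative
    distribution, and maps every value through it by interpolation.
    NaNs (e.g. empty bins) are passed through unchanged.
    """
    data = np.asarray(data, dtype=float)
    finite = np.isfinite(data)
    hist, bin_edges = np.histogram(data[finite], bins=nbins)
    bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2.0
    cdf = hist.cumsum().astype(float)
    cdf /= cdf[-1]
    out = np.full(data.shape, np.nan)
    out[finite] = np.interp(data[finite], bin_centers, cdf)
    return out


# Mostly small counts plus one very large outlier and one empty bin.
counts = np.array([1, 1, 1, 1, 2, 3, 1000, np.nan])
equalized = eq_hist(counts)
```

A linear mapping would squash the small counts close to zero because of the single large value, whereas the equalized values for the small counts land well above zero, which is why the result looks less faint.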

### tacaswell commented Feb 6, 2017

See http://matplotlib.org/users/colormapnorms.html for details of how the color mapping process in mpl works. Also, turn the DPI down on the mpl plots; the spatial bins passed to datashader are set by the physical pixels in the axes.
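
The effect of DPI on the number of bins is simple arithmetic, sketched below. `axes_size_in_pixels` and its `axes_fraction` parameter are hypothetical names for this example; the real axes in a default figure covers somewhat less than the whole figure, so the numbers are upper bounds.

```python
def axes_size_in_pixels(figsize_inches, dpi, axes_fraction=(1.0, 1.0)):
    """Return the (width, height) of an axes in physical pixels.

    figsize_inches: (width, height) of the figure in inches.
    dpi:            dots per inch used for rendering.
    axes_fraction:  fraction of the figure covered by the axes in each
                    direction (hypothetical parameter; (1.0, 1.0) means the
                    axes fills the whole figure).
    """
    width_in, height_in = figsize_inches
    frac_w, frac_h = axes_fraction
    return (int(round(width_in * dpi * frac_w)),
            int(round(height_in * dpi * frac_h)))


# The figure from the earlier comment: figsize=(6, 2) at dpi=400 ...
high_dpi = axes_size_in_pixels((6, 2), 400)
# ... versus the same figure at dpi=100, matching a 600x200 datashader canvas.
low_dpi = axes_size_in_pixels((6, 2), 100)
```

At dpi=400 the same points are spread over up to 16 times as many bins as on the 600x200 datashader canvas, so each bin holds far fewer points and the image looks much fainter.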

### StevenCHowell commented Feb 6, 2017 • edited

Thanks for the links. Based on those, I created a black and white palette and colormap that should match

    from matplotlib.colors import ListedColormap
    palette = ['white', 'black']
    cmap = ListedColormap(palette)

then defined the image size and plot ranges

    width = 600  # in units of pixels
    height = 300  # in units of pixels
    x_range = [0, 40]
    y_range = [-100, 20]

then generated the datashader plot using how='linear'

    canvas = datashader.Canvas(plot_width=width, plot_height=height, x_range=x_range, y_range=y_range)
    agg = canvas.points(trial, 'chi2', 'energy', agg=datashader.count())
    img = datashader.transfer_functions.shade(agg, cmap=palette, how='linear')

and the matplotlib plot using linear normalization:

    norm = mcolors.Normalize()
    dpi = 100
    x_inches = width/dpi
    y_inches = height/dpi
    fig = plt.figure(figsize=(x_inches, y_inches), dpi=dpi)
    ax = plt.axes([0., 0., 1., 1.], frameon=False, xticks=[], yticks=[])
    da = DSArtist(ax, trial, 'chi2', 'energy', agg=datashader.count(),
                  norm=mcolors.Normalize(), cmap=cmap)
    ax.add_artist(da)
    plt.savefig('mpl.png', dpi=dpi, transparent=True)

Note that datashader uses pixel units and matplotlib uses inches and dpi. You can see the code I used to convert between these and eliminate the axes on the matplotlib plot so the image uses the entire space. I verified these are each 600x300 pixels:

    (datashader) ➜ odin: test/> file mpl.png datashader.png
    mpl.png:        PNG image data, 600 x 300, 8-bit/color RGBA, non-interlaced
    datashader.png: PNG image data, 600 x 300, 8-bit/color RGBA, non-interlaced

### StevenCHowell commented Feb 6, 2017 • edited

Here is an example using the sample data from the pipeline notebook:

    import datashader
    import matplotlib.pyplot as plt
    import matplotlib.colors as mcolors
    import numpy as np
    import pandas as pd
    from matplotlib.colors import LinearSegmentedColormap
    # DSArtist is the class defined in this PR

    np.random.seed(1)
    num=100000
    dists = {cat: pd.DataFrame(dict(x=np.random.normal(x,s,num),
                                    y=np.random.normal(y,s,num),
                                    val=val,cat=cat))
             for x,y,s,val,cat in
             [(2,2,0.01,10,"d1"), (2,-2,0.1,20,"d2"), (-2,-2,0.5,30,"d3"), (-2,2,1.0,40,"d4"), (0,0,3,50,"d5")]}
    df = pd.concat(dists,ignore_index=True)
    df["cat"]=df["cat"].astype("category")

    palette = ['gray', 'black']  # white created issues with the low blending into the background
    cmap = LinearSegmentedColormap.from_list('test', palette, N=256)

    width = 300  # in units of pixels
    height = 300  # in units of pixels
    x_range = [-15, 15]
    y_range = [-15, 15]

    canvas = datashader.Canvas(plot_width=width, plot_height=height, x_range=x_range, y_range=y_range)
    agg = canvas.points(df, 'x', 'y', agg=datashader.count())
    img = datashader.transfer_functions.shade(agg, cmap=palette, how='log')
    # img = datashader.transfer_functions.shade(agg, cmap=palette, how='linear')  # uncomment for linear
    img

    dpi = 100
    x_inches = width/dpi
    y_inches = height/dpi
    fig = plt.figure(figsize=(x_inches, y_inches), dpi=dpi)
    ax = plt.axes([0., 0., 1., 1.], frameon=False, xticks=[], yticks=[])
    da = DSArtist(ax, df, 'x', 'y', agg=datashader.count(),
                  norm=mcolors.LogNorm(), cmap=cmap)
    # da = DSArtist(ax, df, 'x', 'y', agg=datashader.count(),  # uncomment for linear
    #               norm=mcolors.Normalize(), cmap=cmap)
    ax.add_artist(da)
    ax.set_xlim(x_range)
    ax.set_ylim(y_range)
    plt.savefig('mpl_sample_data.png', dpi=dpi, transparent=True)

Here are the logarithmic results (datashader, then matplotlib). These do seem to match now.

Here are the linear results (datashader, then matplotlib). These still do not match.
Collaborator

### jbednar commented Feb 6, 2017

 For matplotlib, I think you want a LinearSegmentedColormap (which interpolates between colors), not a ListedColormap (which uses only the provided colors).

### tacaswell commented Feb 6, 2017

 Or just use the existing gray color map.

### StevenCHowell commented Feb 6, 2017

 It is not clear how to use the same color map for both datashader, which takes a list of strings, and matplotlib, which takes a more custom format. This may be the root problem. I changed to using LinearSegmentedColormap.from_list('name', palette, N=100) and they still differ. I am not sure what N to use to make the two colormaps comparable. I will update my comment above using the sample data to reflect this new colormap definition and the generated plots.
Collaborator

### jbednar commented Feb 6, 2017

 Datashader uses a 256-color palette, so in this case N=256 (or use "gray" as suggested).
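
The difference between the two colormap classes, and the effect of N=256, can be checked directly with matplotlib.colors. The sketch below assumes only that matplotlib is installed; the colormap names ('white_black', 'white_firebrick') are arbitrary labels chosen for this example.

```python
from matplotlib.colors import LinearSegmentedColormap, ListedColormap

palette = ['white', 'black']

# ListedColormap uses only the listed colors: every input maps to pure
# white or pure black, with nothing in between.
listed = ListedColormap(palette)

# LinearSegmentedColormap.from_list interpolates between the listed colors;
# N=256 gives a 256-entry lookup table.
interpolated = LinearSegmentedColormap.from_list('white_black', palette, N=256)

listed_quarter = listed(0.25)              # RGBA tuple
interpolated_quarter = interpolated(0.25)  # RGBA tuple

# The same constructor gives a two-color "white to some color" map.
white_firebrick = LinearSegmentedColormap.from_list(
    'white_firebrick', ['white', 'firebrick'], N=256)
top_color = white_firebrick(1.0)
```

A quarter of the way along, the listed colormap still returns pure white, while the interpolated one returns a light gray, which is the behaviour a datashader-style list of colors is expected to have.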

### StevenCHowell commented Feb 6, 2017

 Using LinearSegmentedColormap.from_list('name', palette, N=256) I was able to get the logarithmic plots to match. While the linear plots do not match, neither looks helpful anyway. Since matplotlib does not offer equalized histogram (yet?), I think the logarithmic option with this colormap definition provides the best solution. I will try this with the real data.
Collaborator

### jbednar commented Feb 6, 2017

 Looks like the mpl linear version isn't respecting the NaN mask in the same way, but it's good to see the logarithmic versions matching.
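
One way the two linear renderings could diverge is in how empty bins are treated. This is an assumption rather than a diagnosis confirmed in this thread: matplotlib's LogNorm masks non-positive values, whereas a plain Normalize treats a zero count as an ordinary value and maps it to the bottom of the colormap. A minimal NumPy sketch of linear scaling that masks empty bins first (`linear_norm_masking_empty` is a hypothetical helper):

```python
import numpy as np


def linear_norm_masking_empty(counts):
    """Linearly scale counts to [0, 1], masking bins that received no points.

    Sketch only: zero-count bins are masked so that a colormap can draw them
    as transparent/background instead of as the lowest color.
    """
    masked = np.ma.masked_equal(np.asarray(counts, dtype=float), 0)
    if masked.count() == 0 or masked.max() == masked.min():
        return masked * 0.0          # nothing to scale; keep the mask
    lo, hi = masked.min(), masked.max()
    return (masked - lo) / (hi - lo)


counts = np.array([[0, 1],
                   [3, 5]])
scaled = linear_norm_masking_empty(counts)
```

Here the empty bin stays masked while the occupied bins are scaled between the smallest and largest non-zero counts.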

### StevenCHowell commented Feb 6, 2017

Another quirk of either matplotlib or jupyter is that the plot looks very different in jupyter from the saved version. I just repeated this with my real data, and in jupyter the colors went back to incredibly faint. (Note that I want to use white and blue, but this shows red and blue because I worried the white was not visible.)

Here is the jupyter version (right click and save)

and here is the version saved right before running plt.show()

then here is the datashader version
Collaborator

### philippjfr commented Feb 6, 2017 • edited

 Looks to me like this is happening because different renderers are drawing at different resolutions. My hypothesis is that the inline backend is using the hi-dpi option (presumably because you have a macbook) and therefore sampling at a higher resolution than you get when saving the datashader plot directly or using matplotlib to save it.
Collaborator

### jbednar commented Feb 7, 2017

 Right -- the results will vary a lot at different resolutions, by design, though you can use tf.spread() or tf.dyn_spread() to ensure that individual dots are visible at high resolutions.
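
For intuition about why spreading helps at high resolutions, here is a minimal NumPy sketch that grows every bin into a small square. It is not datashader's tf.spread() implementation; `spread_square` is a hypothetical function written for this example, and it simply sums shifted copies of the aggregate.

```python
import numpy as np


def spread_square(agg, px=1):
    """Spread every bin of a 2D count array over a (2*px+1)-wide square.

    Sketch only: sums shifted copies of the array so that isolated
    single-pixel dots grow into small squares.
    """
    agg = np.asarray(agg)
    padded = np.pad(agg, px, mode='constant')
    out = np.zeros_like(agg)
    h, w = agg.shape
    for dy in range(2 * px + 1):
        for dx in range(2 * px + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out


agg = np.zeros((5, 5), dtype=int)
agg[2, 2] = 1                      # a single isolated point
spread = spread_square(agg, px=1)  # becomes a 3x3 block
```

An isolated point that would be a single, barely visible pixel on a high-resolution canvas becomes a 3x3 block after spreading.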
Collaborator

### jbednar commented Feb 15, 2017

 BTW, note that recent dev releases of HoloViews now support datashader, with matplotlib or any other backend. Here's an example: https://anaconda.org/jbednar/census-hv-mpl/notebook


### maartenbreddels commented Apr 13, 2017

I noticed this thread on twitter; it reminded me to put in a matplotlib backend for vaex based on ipympl. The code lives here and might be useful for this discussion, since it tries to attack a similar problem. What might be useful is the debounced decorator that I use, for instance here, which will only execute after 0.5 seconds have passed, to avoid many updates when moving and zooming. It only works when there is an ipykernel; for Qt you need a different debounce method (I should have that code somewhere).
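
The debouncing idea can be sketched with the standard library alone. This is not the vaex implementation (which relies on the ipykernel event loop); it is a thread-based variant using threading.Timer, and `debounced` and `redraw` are names made up for this example.

```python
import threading


def debounced(delay_seconds):
    """Decorator: run the wrapped function only once calls stop arriving.

    Each call cancels the previously scheduled run and schedules a new one
    `delay_seconds` in the future, so a burst of calls triggers one execution
    with the arguments of the last call.
    """
    def decorator(func):
        lock = threading.Lock()
        pending = [None]  # the currently scheduled threading.Timer, if any

        def wrapper(*args, **kwargs):
            with lock:
                if pending[0] is not None:
                    pending[0].cancel()
                timer = threading.Timer(delay_seconds, func, args, kwargs)
                timer.daemon = True
                pending[0] = timer
                timer.start()
        return wrapper
    return decorator


calls = []


@debounced(0.05)
def redraw(limits):
    # Stand-in for an expensive re-aggregation at the new axis limits.
    calls.append(limits)


for step in range(5):      # simulate a burst of pan/zoom events
    redraw(step)
```

Note that the wrapped function runs on the timer's thread, so a GUI toolkit would normally need its own timer mechanism instead, in line with the remark about Qt above.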

### stonebig commented Jul 22, 2017

datashader is not available on PyPI; a solution that could also support "pypi-compatible" alternatives, like "mpl-scatter-density", would be great.


### ruiyangl commented Aug 22, 2018

I couldn't run the import `from datashader.mpl_ext import DSArtist`. Here's the error: DLL load failed: The specified module could not be found.
Collaborator

### jbednar commented Aug 22, 2018

 @ruiyangl, to get this experimental code, you would have to check out the branch of datashader associated with this PR, and use that instead of any released datashader version. We'd be happy to merge support for Matplotlib into Datashader whenever this PR can be completed. In the meantime, you can use HoloViews+Matplotlib to see static Datashader output inside a Matplotlib plot as mentioned above.
