ENH: first draft of MPL artist #200

Open
wants to merge 1 commit into
base: master
from

8 participants
@tacaswell

tacaswell commented Jul 16, 2016

Minimal datashader aware matplotlib artist.

import datashader as ds
from datashader.mpl_ext import DSArtist
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# df is a DataFrame with dropoff_x, dropoff_y, and passenger_count columns
fig, ax = plt.subplots()
da = DSArtist(ax, df, 'dropoff_x', 'dropoff_y', ds.count('passenger_count'), norm=mcolors.LogNorm())
ax.add_artist(da)
ax.set_aspect('equal')

fig.colorbar(da)

so

This is using DS to just do the binning and then re-using mpl's existing normalization and color-mapping tools.
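As a rough illustration of that split, the sketch below starts from a grid of per-pixel counts (the part datashader would produce) and applies a logarithmic normalization by hand, which is approximately what `LogNorm` does when its limits are left unset, before the colormap lookup. The `counts` array and the `log_normalize` helper are made up for this example and are not part of the PR.

```python
import numpy as np

def log_normalize(counts):
    """Map positive counts onto [0, 1] on a log scale; empty bins become NaN."""
    counts = np.asarray(counts, dtype=float)
    out = np.full(counts.shape, np.nan)
    positive = counts > 0
    if not positive.any():
        return out
    lo = np.log(counts[positive].min())
    hi = np.log(counts[positive].max())
    if hi == lo:
        out[positive] = 0.0  # every non-empty bin holds the same count
    else:
        out[positive] = (np.log(counts[positive]) - lo) / (hi - lo)
    return out

# Hypothetical 2x2 grid of counts, standing in for a datashader aggregate
counts = np.array([[0, 1], [10, 100]])
normalized = log_normalize(counts)
```

The normalized values are what a colormap would then turn into colors, so a colorbar built from the same norm and colormap stays consistent with the image.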

ENH: first draft of MPL artist
Minimal datashader aware matplotlib artist.
@tacaswell

jbednar Jul 18, 2016

Collaborator

Looks great, thanks! I'll try it out and merge if it's all ok.

tacaswell Jul 18, 2016

This requires the matplotlib 2.0 beta to work (I lost track of when that private class
was created). The beta is on conda-forge.

I also have an idea on how to make the connection to the ds pipeline more
general.

astrofrog Jul 18, 2016

@tacaswell - would it make sense to make the currently private class and its private _make_image method public, so that examples like this can rely on them?

tacaswell Jul 18, 2016

Another reasonable option would be to move this artist into MPL and tweak the API so that all of the datashader dependency is injected as a pipeline argument.

jbednar Jul 21, 2016

Collaborator

@tacaswell, I'm not sure how to get the mpl 2.0 beta from conda-forge. It's only offering me 1.5.2:

0172-jbednar:~> conda install -c conda-forge  matplotlib
Using Anaconda Cloud api site https://api.anaconda.org

# All requested packages already installed.
# packages in environment at /Users/jbednar/anaconda:
#
matplotlib                1.5.2               np111py27_4    conda-forge

tacaswell Jul 21, 2016

You need to ask for the rc channel as well

conda install -c conda-forge/label/rc -c conda-forge matplotlib

Sorry for not being clearer about that.

jbednar Jul 21, 2016

Collaborator

Very nice! I couldn't get scroll zooming to work, but box zooming was very snappy. I had to make some edits for Python2 compatibility:

0172-jbednar:~/datashader/datashader> diff mpl_ext.py~ mpl_ext.py
14c14
<         super().__init__(ax, **kwargs)
---
>         super(DSArtist,self).__init__(ax, **kwargs)
48c48
<         return (*self.axes.get_xlim(), *self.axes.get_ylim())
---
>         return self.axes.get_xlim() + self.axes.get_ylim()

We'd want to include a runnable example with the distribution, so I adapted your snippet above into a new file examples/nyc_taxi_mpl.py:

import pandas as pd
df = pd.read_csv('data/nyc_taxi.csv',usecols=['dropoff_x','dropoff_y', 'passenger_count'])

import datashader as ds
from datashader.mpl_ext import DSArtist
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

fig, ax = plt.subplots()
da = DSArtist(ax, df, 'dropoff_x', 'dropoff_y', ds.count('passenger_count'),
              norm=mcolors.LogNorm(), cmap='viridis_r')
ax.add_artist(da)
ax.set_aspect('equal')

fig.colorbar(da)
plt.show()

This pair of files worked well for me, anyway! Note that I reversed the colormap, so that it works better on a white background:

image

tacaswell Jul 21, 2016

Scroll zooming is not one of the default interactions, which is why it didn't work 😈

The reversed color map does look much better.

jbednar Jul 21, 2016

Collaborator

Overall, it seems like this approach will work well for simple datashader pipelines, as in the case illustrated above (basically anything supported by datashader.pipeline.Pipeline). But it won't support more complex pipelines, where it's not just the reduction (argument "agg" of DSArtist) that needs to be overridden, but the pipeline itself. E.g. in the census example, there are user-defined operations on the aggregate array before it is displayed:

tf.colorize(agg.where(agg.sel(race='w') < agg.sel(race='b')).fillna(0), color_key, how='eq_hist')

I'm not sure how a user could inject the agg.sel operation into the DSArtist, which would mean they'd have to copy that class and edit it to do what should be a simple operation.

Those examples also use colorize, which takes categorical information that I'm not sure how to integrate in this approach, if matplotlib is doing the colorizing.

It seems like it would be more general if mpl_ext could support a create_image() callback instead of the current approach, as datashader.InteractiveImage does now, so that users could supply any arbitrary pipeline in just a few lines of code. Supporting a create_aggregate() callback would also be useful, so that users could employ mpl's own colormapping, though I'm not sure how that would work for categorical information.

jbednar Jul 21, 2016

Collaborator

| It seems like it would be more general if mpl_ext could support a create_image() callback

| Another reasonable option would be to move this artist into MPL and tweak the API so that all of the datashader dependency is injected as a pipeline argument.

These two suggestions may amount to the same thing; if so then it's clear how to move forward!

tacaswell Jul 21, 2016

My current best thought is to have the users provide a callback which has a signature like

def ds_cb(canvas, data):
    return float_or_int_img

mpl has support for discrete color maps (if your norm returns integers the values are used as direct lookups in the color table).

Given that mpl users already know how to use the mpl colorization code (one hopes), I would greatly prefer that level get delegated to us, but making this class smart enough to check if it got back a NxM or NxMx4 is not too much work (or it might just fall through correctly now).
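A minimal sketch of that shape check, assuming the callback result is either a 2-D scalar array or an RGBA array. The names `to_rgba`, `linear_norm`, and `gray_cmap` are hypothetical stand-ins; in practice the `norm` and `cmap` arguments would be matplotlib's own norm and colormap objects, which are likewise callables.

```python
import numpy as np

def to_rgba(img, norm, cmap):
    """Colormap a 2-D scalar image; pass an RGBA image through unchanged."""
    img = np.asarray(img)
    if img.ndim == 2:
        # Scalar data: normalize, then look up colors
        return np.asarray(cmap(norm(img)))
    if img.ndim == 3 and img.shape[2] == 4:
        # Already colorized (for example by the user's own pipeline)
        return img
    raise ValueError("expected an NxM or NxMx4 array, got shape %r" % (img.shape,))

def linear_norm(values):
    """Stand-in for a linear norm: scale values onto [0, 1]."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

def gray_cmap(scaled):
    """Stand-in for a colormap: gray level equals the scaled value, opaque alpha."""
    scaled = np.asarray(scaled, dtype=float)
    return np.stack([scaled, scaled, scaled, np.ones_like(scaled)], axis=-1)

scalar_img = np.array([[0, 1], [2, 4]])
rgba_from_scalar = to_rgba(scalar_img, linear_norm, gray_cmap)
rgba_passthrough = to_rgba(rgba_from_scalar, linear_norm, gray_cmap)
```

With this split, a callback that returns counts gets matplotlib-style colormapping, while a callback that does its own colorizing is displayed as-is.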

I agree that we seem to be in agreement.

attn @story645 (who is a GSOC student working on integrating categorical plotting into mpl)

jbednar Jul 21, 2016

Collaborator

I agree that we'd want as much of the processing to use mpl's code as is practical, to help integrate it more easily into mpl users' workflows and make it more familiar to them. Let datashader do what datashader is best at, and let mpl handle the rest!

mpl's discrete colormap support may work for categorical information, but I don't know enough about it to be sure. datashader.tf.colorize() does use discrete colors, but it then (a) mixes those discrete colors according to the counts in each category for that pixel, and (b) adjusts the alpha value of the color from a continuous range, depending on the total count for that pixel compared to the others. So the result is an arbitrarily large set of colors, starting from the nominally discrete colormap like the 5 base colors used here:

image

Not sure if that's similar to what mpl supports or will support.
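To make the two steps concrete, here is a simplified sketch of that kind of blending. It is not datashader's implementation: the `blend_categories` helper is hypothetical, and alpha is scaled linearly with the total count purely for simplicity.

```python
import numpy as np

def blend_categories(counts, colors):
    """counts: (H, W, C) per-category counts; colors: (C, 3) RGB values in 0-255.

    Returns an (H, W, 4) float RGBA array.
    """
    counts = np.asarray(counts, dtype=float)
    colors = np.asarray(colors, dtype=float)
    total = counts.sum(axis=2)
    # (a) mix the base colors, weighted by each category's share of the pixel
    safe_total = np.where(total > 0, total, 1.0)
    rgb = counts.dot(colors) / safe_total[..., np.newaxis]
    # (b) set alpha from the pixel's total count relative to the densest pixel
    peak = total.max()
    alpha = 255.0 * total / peak if peak > 0 else np.zeros_like(total)
    return np.dstack([rgb, alpha])

# One row, two pixels, two categories (red and blue)
counts = np.array([[[1, 1], [4, 0]]])
colors = np.array([[255, 0, 0], [0, 0, 255]])
rgba = blend_categories(counts, colors)
```

The first pixel, split evenly between the two categories, comes out as a half-transparent purple; the second is fully opaque red. Even with two base colors the output can contain many distinct colors, which is why a plain discrete colormap lookup does not cover this case.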

tacaswell Jul 21, 2016

There is no support for the categorical blending yet (but we have been talking about generalizing the norm/colormap chain for a while now).

jbednar Jul 21, 2016

Collaborator

Ok, then it sounds like supporting both NxMx1 and NxMx4 would be good in the meantime.

@jbednar jbednar self-assigned this Sep 8, 2016

jbednar Sep 9, 2016

Collaborator

I'd love to get matplotlib support into datashader. Any progress on addressing some of the issues above?

tacaswell Sep 9, 2016

Sorry, I have been swamped with other work.

@jbednar jbednar added the in progress label Sep 21, 2016

StevenCHowell Feb 2, 2017

@tacaswell @jbednar Have there been any updates to this?

Looking at the travis output, it seems to work fine with python 3. I will see how far I can get with the example above.

StevenCHowell Feb 2, 2017

I realize my question is more about usage (and probably just demonstrates my unfamiliarity with datashader and matplotlib) and is not specifically related to accepting or updating this PR. This was just the best resource I found when searching for how to use datashader with matplotlib. Let me know if you would prefer that I move this to Stack Overflow, and I will delete it here.

Following the above example, with DSArtist definition from this PR, I almost have what I need. I am not sure how to get matplotlib coloring to normalize the same way datashader does by default. It is so much more faint than 'eq_hist' (the default), 'log', and even 'linear'.

Here is what I get using datashader with 'eq_hist':
image

This is the much fainter version I get using matplotlib:
image
My end goal is to use matplotlib to add the axes, labels, and red success points (blue are the ~30,000 trials, red are the ~10 successes) then plot this using a script in a headless environment (on a remote server without X).

This screenshot shows what I have been trying. Any thoughts @tacaswell?

mpl_datashader_almost

jbednar Feb 3, 2017

Collaborator

For now, this PR's discussion is fine as a place to collect anything about MPL support for datashader.

I'm surprised that you aren't seeing comparable results between MPL's shading and the linear shading in datashader. It would be good to post a side by side comparison using the same colormap and ranges with linear mapping; those should be at least very nearly the same regardless of who is doing the colormapping.

tacaswell Feb 4, 2017

As a side note, I am including a reference to this in mpl's GSoC ideas list.

jbednar Feb 5, 2017

Collaborator

Great! I'd be happy to work with a GSoC-er to help make this move forward. We are hoping to have funding for datashader start up again soon, and we'll be putting various functions in place that will help make it simpler to build legends, colorbars, etc. Those functions should help any downstream plotting library to summarize what's in the plot accurately and easily.

StevenCHowell Feb 6, 2017

I have had some difficulty using and defining matplotlib palettes, as they are not as simple as the lists used by datashader and bokeh, but I think these should both be the reversed viridis palette. Also, I am not sure how to change the MPL shading scheme. The example above uses norm=mcolors.LogNorm(), which I took to be the same as datashader's how='log' option, so I used these for the plots below.

# some setup definitions
width = 6
height = width/3.0
x_range = [0, 40] 
y_range = [trial['energy'].min(), trial['energy'].max()]
w = int(width * 100)
h = int(height * 100)
# code for the datashader version
from bokeh.palettes import Viridis256
palette = list(reversed(Viridis256))  # reversed copy; leaves bokeh's palette untouched
canvas = datashader.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
agg = canvas.points(trial, 'chi2', 'energy', agg=datashader.count())
img = datashader.transfer_functions.shade(agg, cmap=palette, how='log')
img

image

# code for matplotlib version
fig, ax = plt.subplots(figsize=(width, height), dpi=400)
da = DSArtist(ax, trial, 'chi2', 'energy', agg=datashader.count(), norm=mcolors.LogNorm(),
              cmap='viridis_r')
ax.add_artist(da)
ax.set_xlabel(r'$\chi^2$')
ax.set_ylabel(r'LR Docking Score')
ax.set_xlim(x_range)
ax.set_ylim(y_range)

plt.show()

image

I expected these to look essentially the same but the difference is obvious. I am not sure how to account for this but it is possible I simply do not understand how to use matplotlib well enough. In case I am not actually using logarithmic binning in MPL, here is the datashader plot using how='linear':
image
and here is the datashader plot using how='eq_hist':
image

Overall, the matplotlib plot looks too sparse, like I should not need to use datashader. That said, here is what the plot looks like without datashader:

plt.plot(trial.chi2, trial.energy, '.', ms='.1')

image

For comparison, here are these same plots using bokeh.
bokeh_plot 1
bokeh_plot 2
bokeh_plot

StevenCHowell Feb 6, 2017

In addition to the general question on how to make the matplotlib version match the datashader version, I have two specific questions.

  • Is it possible to use equal histogram shading in matplotlib, the datashader option how='eq_hist'? If so, how would I do this?
  • How do I give matplotlib a colormap with a single color, or technically white and some other color? This seems to be the default for datashader but if I want to manually select the color I could define it simply using a list, e.g., cmap=['white', 'firebrick']:
transfer_functions.shade(canvas.points(trial, 'chi2', 'energy', agg=ds.count()), 
                                       cmap=['white', 'firebrick'], how='log')

image

jbednar Feb 6, 2017

Collaborator

| In addition to the general question on how to make the matplotlib version match the datashader version

To answer that, can you please post the same size image from both mpl and datashader colormapping, using a grayscale colormap, with linear mapping? You might have already provided enough info above, but I can't find any pair of images that should truly be mathematically identical, which is always the safest place to start. Grayscale should be comparable across all libraries.

| • Is it possible to use equal histogram shading in matplotlib, the datashader option how='eq_hist'? If so, how would I do this?

I'm not aware of any histogram equalization option in matplotlib or bokeh, or else we probably would have just used those instead of adding our own to datashader. It would be very convenient if plotting libraries would support eq_hist directly, which would make it simpler to have meaningful colorbars, legends, and hover information. MPL is welcome to steal our eq-hist code; it's only 15 lines of Numpy-based Python, adapted from scikit-image.
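For readers unfamiliar with the technique, histogram equalization can be sketched in a few lines of NumPy. This is an illustrative version written for this discussion, not datashader's actual code; the function name `eq_hist_sketch` and the `nbins` default are assumptions.

```python
import numpy as np

def eq_hist_sketch(data, nbins=256):
    """Rescale finite values to [0, 1] by their position in the cumulative histogram."""
    data = np.asarray(data, dtype=float)
    mask = np.isfinite(data)
    values = data[mask]
    hist, bin_edges = np.histogram(values, bins=nbins)
    bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2.0
    # Cumulative distribution, scaled so the last bin maps to 1
    cdf = hist.cumsum() / float(hist.sum())
    out = np.full(data.shape, np.nan)
    out[mask] = np.interp(values, bin_centers, cdf)
    return out

# Heavily skewed counts: a linear mapping would leave most pixels near zero
data = np.array([[1.0, 1.0, 2.0], [10.0, 100.0, 1000.0]])
equalized = eq_hist_sketch(data)
```

Because each value is replaced by roughly the fraction of values at or below it, the output spreads across the color range even when the counts are heavily skewed. The ordering of values is preserved, but distances between them are not, which is why colorbars for this mapping need special care.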

| • How do I give matplotlib a colormap with a single color, or technically white and some other color? This seems to be the default for datashader but if I want to manually select the color I could define it simply using a list, e.g., cmap=['white', 'firebrick']:
| transfer_functions.shade(canvas.points(df, 'x', 'y', agg=ds.count()), cmap=['white', 'firebrick'], how='log')

See: http://matplotlib.org/examples/pylab_examples/custom_cmap.html
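One way to do this in matplotlib, sketched here as a suggestion rather than something tested against this PR, is `LinearSegmentedColormap.from_list`, which interpolates between the listed colors. The colormap name `'white_firebrick'` is arbitrary.

```python
from matplotlib.colors import LinearSegmentedColormap, to_rgba

# Continuous ramp from white (low values) to firebrick (high values)
white_to_firebrick = LinearSegmentedColormap.from_list(
    'white_firebrick', ['white', 'firebrick'])

low = white_to_firebrick(0.0)    # RGBA tuple at the bottom of the range
high = white_to_firebrick(1.0)   # RGBA tuple at the top of the range
expected = [to_rgba('white'), to_rgba('firebrick')]
```

The resulting object can be passed wherever matplotlib accepts a `cmap`. Note that a `ListedColormap` built from the same two-entry list would give two discrete colors rather than a gradient.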

tacaswell Feb 6, 2017

See http://matplotlib.org/users/colormapnorms.html for details of how the color mapping process in mpl works.

Also, turn the DPI down on the mpl plots; the spatial bins passed to datashader are set by the physical pixels in the axes.
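
For readers who have not followed the link, here is a rough sketch of the two-step process it describes: a norm first rescales the data to the 0..1 range, and a colormap then turns those numbers into RGBA colors. The counts and the `bw` colormap below are made-up illustration values, not data from this thread.

```python
import numpy as np
import matplotlib.colors as mcolors
from matplotlib.colors import LinearSegmentedColormap

# Made-up bin counts standing in for an aggregate.
counts = np.array([[0.0, 1.0], [10.0, 100.0]])

# Step 1: the norm maps data values onto the 0..1 range.
norm = mcolors.Normalize(vmin=0, vmax=100)
scaled = np.ma.getdata(norm(counts))

# Step 2: the colormap maps 0..1 onto RGBA colors.
cmap = LinearSegmentedColormap.from_list("bw", ["white", "black"], N=256)
rgba = cmap(scaled)

print(scaled)
print(rgba.shape)
```

Swapping `Normalize` for `LogNorm` changes only step 1; the colormap step stays the same.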

StevenCHowell Feb 6, 2017

Thanks for the links. Based on those, I created a black and white palette and colormap that should match

from matplotlib.colors import ListedColormap
palette = ['white', 'black']
cmap = ListedColormap(palette)

then defined the image size and plot ranges

width = 600  # in units of pixels
height = 300  # in units of pixels
x_range = [0, 40] 
y_range = [-100, 20]

then generated the datashader plot using how='linear'

canvas = datashader.Canvas(plot_width=width, plot_height=height, x_range=x_range, y_range=y_range)
agg = canvas.points(trial, 'chi2', 'energy', agg=datashader.count())
img = datashader.transfer_functions.shade(agg, cmap=palette, how='linear')  

[image: datashader]

and the matplotlib plot using linear normalization, norm=mcolors.Normalize():

dpi = 100
x_inches = width/dpi
y_inches = height/dpi
fig = plt.figure(figsize=(x_inches, y_inches), dpi=dpi)
ax = plt.axes([0., 0., 1., 1.], frameon=False, xticks=[], yticks=[])
da = DSArtist(ax, trial, 'chi2', 'energy', agg=datashader.count(), 
              norm=mcolors.Normalize(), cmap=cmap)  
ax.add_artist(da)
plt.savefig('mpl.png', dpi=dpi, transparent=True)

[image: mpl]

Note that datashader sizes the image in pixels, while matplotlib uses inches and DPI. The code above converts between the two and removes the axes decorations so that the matplotlib image fills the entire figure.

I verified that both images are 600x300 pixels:

(datashader) ➜  odin: test/> file mpl.png datashader.png
mpl.png:        PNG image data, 600 x 300, 8-bit/color RGBA, non-interlaced
datashader.png: PNG image data, 600 x 300, 8-bit/color RGBA, non-interlaced
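
The pixel-to-inch conversion used above could be wrapped in a small helper. This is only a sketch: `figure_for_pixels` is a hypothetical name, and the Agg backend is selected here just so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def figure_for_pixels(width_px, height_px, dpi=100):
    """Create a figure whose single axes covers width_px x height_px pixels."""
    fig = plt.figure(figsize=(width_px / dpi, height_px / dpi), dpi=dpi)
    # Axes spanning the whole figure, with no frame or ticks.
    ax = fig.add_axes([0.0, 0.0, 1.0, 1.0], frameon=False, xticks=[], yticks=[])
    return fig, ax


fig, ax = figure_for_pixels(600, 300)
canvas_size = fig.canvas.get_width_height()
print(canvas_size)
plt.close(fig)
```

Saving with the same `dpi` that was used to build the figure keeps the output at the requested pixel size.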

StevenCHowell Feb 6, 2017

Here is an example using the sample data from the pipeline notebook:

import datashader
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np
import pandas as pd
from matplotlib.colors import LinearSegmentedColormap
from datashader.mpl_ext import DSArtist  # provided by the branch for this PR

np.random.seed(1)
num=100000

dists = {cat: pd.DataFrame(dict(x=np.random.normal(x,s,num),
                                y=np.random.normal(y,s,num),
                                val=val,cat=cat))
         for x,y,s,val,cat in 
         [(2,2,0.01,10,"d1"), (2,-2,0.1,20,"d2"), (-2,-2,0.5,30,"d3"), (-2,2,1.0,40,"d4"), (0,0,3,50,"d5")]}

df = pd.concat(dists,ignore_index=True)
df["cat"]=df["cat"].astype("category")
palette = ['gray', 'black']  # white created issues with the low blending into the background
cmap = LinearSegmentedColormap.from_list('test', palette, N=256)

width = 300  # in units of pixels
height = 300  # in units of pixels
x_range = [-15, 15] 
y_range = [-15, 15]

canvas = datashader.Canvas(plot_width=width, plot_height=height, x_range=x_range, y_range=y_range)
agg = canvas.points(df, 'x', 'y', agg=datashader.count())
img = datashader.transfer_functions.shade(agg, cmap=palette, how='log')
# img = datashader.transfer_functions.shade(agg, cmap=palette, how='linear')  # uncomment for linear
img

dpi = 100
x_inches = width/dpi
y_inches = height/dpi
fig = plt.figure(figsize=(x_inches, y_inches), dpi=dpi)
ax = plt.axes([0., 0., 1., 1.], frameon=False, xticks=[], yticks=[])

da = DSArtist(ax, df, 'x', 'y', agg=datashader.count(), 
              norm=mcolors.LogNorm(), cmap=cmap)
# da = DSArtist(ax, df, 'x', 'y', agg=datashader.count(),  # uncomment for linear
#              norm=mcolors.Normalize(), cmap=cmap)
ax.add_artist(da)
ax.set_xlim(x_range)
ax.set_ylim(y_range)
plt.savefig('mpl_sample_data.png', dpi=dpi, transparent=True)

Here are the logarithmic results.

datashader: [image: ds_log_sample_data]

matplotlib: [image: mpl_log_sample_data]

These do seem to match now.

Here are the linear results.

datashader: [image: ds_linear_sample_data]

matplotlib: [image: mpl_linear_sample_data]

These still do not match.

jbednar Feb 6, 2017

Collaborator

For matplotlib, I think you want a LinearSegmentedColormap (which interpolates between colors), not a ListedColormap (which uses only the provided colors).
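
A small sketch of the difference, using the white/black palette from the earlier comment (the colormap name `bw` is arbitrary): the listed colormap can only return one of the two colors, while the segmented one returns an in-between gray.

```python
import numpy as np
from matplotlib.colors import ListedColormap, LinearSegmentedColormap

palette = ["white", "black"]

listed = ListedColormap(palette)  # only the two listed colors
segmented = LinearSegmentedColormap.from_list("bw", palette, N=256)  # interpolated

# Look up the same normalized value in both colormaps.
listed_rgba = listed(0.25)
segmented_rgba = segmented(0.25)

print(listed_rgba)     # still pure white
print(segmented_rgba)  # a light gray between white and black
```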

tacaswell Feb 6, 2017

Or just use the existing gray color map.

StevenCHowell Feb 6, 2017

It is not clear how to use the same color map for both datashader, which takes a list of color strings, and matplotlib, which expects its own colormap format. This may be the root problem.

I changed to using LinearSegmentedColormap.from_list('name', palette, N=100) and they still differ. I am not sure what N to use to make the two colormaps comparable.

I will update my comment above using the sample data to reflect this new colormap definition and the generated plots.

jbednar Feb 6, 2017

Collaborator

Datashader uses a 256-color palette, so in this case N=256 (or use "gray" as suggested).
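
To see what `N` controls on the matplotlib side, the sketch below counts how many distinct colors a white-to-black `from_list` colormap can produce for a given `N`. The helper name `distinct_levels` is hypothetical and exists only for this illustration.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap


def distinct_levels(n):
    """Count the distinct RGBA colors a white-to-black colormap with N=n produces."""
    cmap = LinearSegmentedColormap.from_list("bw", ["white", "black"], N=n)
    # Sample the colormap densely so that every lookup-table entry is hit.
    samples = cmap(np.linspace(0.0, 1.0, 2000))
    return len(np.unique(samples, axis=0))


levels_100 = distinct_levels(100)
levels_256 = distinct_levels(256)
print(levels_100, levels_256)
```

With `N=100` the gradient is quantized more coarsely than a 256-entry palette, which is one way the two renderings can drift apart.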

StevenCHowell Feb 6, 2017

Using LinearSegmentedColormap.from_list('name', palette, N=256) I was able to get the logarithmic plots to match. While the linear plots do not match, neither looks helpful anyway. Since matplotlib does not offer equalized histogram (yet?), I think the logarithmic option with this colormap definition provides the best solution. I will try this with the real data.
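
For reference, histogram equalization itself is short to write with NumPy. The function below is a generic sketch of the idea, not datashader's actual `eq_hist` code; the name `eq_hist_sketch`, the `nbins` default, and the choice to treat empty bins as missing are all assumptions made for this example.

```python
import numpy as np


def eq_hist_sketch(counts, nbins=256):
    """Map positive counts to 0..1 through their empirical CDF; empty bins become NaN."""
    counts = np.asarray(counts, dtype=float)
    out = np.full(counts.shape, np.nan)
    valid = counts > 0  # assumption: zero counts are treated as missing
    values = counts[valid]
    if values.size == 0:
        return out
    hist, edges = np.histogram(values, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    cdf = hist.cumsum() / hist.sum()
    # Interpolate each value's position along the cumulative distribution.
    out[valid] = np.interp(values, centers, cdf)
    return out


counts = np.array([[0, 1], [10, 1000]])
equalized = eq_hist_sketch(counts)
print(equalized)
```

The output is already in the 0..1 range, so it could be fed straight into a matplotlib colormap, although colorbar ticks would then show ranks rather than counts.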

jbednar Feb 6, 2017

Collaborator

Looks like the mpl linear version isn't respecting the NaN mask in the same way, but it's good to see the logarithmic versions matching.
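
One plausible reading of this, sketched below with made-up counts: `LogNorm` masks non-positive values on its own, so empty bins drop out, whereas `Normalize` maps a zero count to the first color of the colormap. Masking the zeros explicitly and giving the colormap a transparent "bad" color is one way to get similar behavior in the linear case. This assumes that datashader leaves empty bins transparent, as the comment above suggests.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap, LogNorm, Normalize

counts = np.array([0.0, 1.0, 10.0, 100.0])  # made-up counts; 0 is an empty bin

log_scaled = LogNorm()(counts)    # the zero count comes back masked
lin_scaled = Normalize()(counts)  # the zero count maps to 0.0, i.e. the first color

# Mask empty bins explicitly before applying the linear norm.
masked_counts = np.ma.masked_equal(counts, 0)
lin_masked = Normalize()(masked_counts)

cmap = LinearSegmentedColormap.from_list("gray_black", ["gray", "black"], N=256)
cmap.set_bad(alpha=0.0)  # draw masked (empty) bins as fully transparent
rgba = cmap(lin_masked)

print(np.ma.getmaskarray(log_scaled))
print(np.ma.getmaskarray(lin_scaled))
print(rgba[:, 3])  # alpha channel
```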

StevenCHowell Feb 6, 2017

Another quirk, of either matplotlib or Jupyter, is that the plot looks very different in Jupyter than in the saved version. I just repeated this with my real data, and in Jupyter the colors went back to being incredibly faint. (Note that I want to use white and blue, but this shows red and blue because I worried the white would not be visible.)

Here is the Jupyter version (right click and save):
[image: mpl_log_real_data_jupyter]

and here is the version saved right before running plt.show():
[image: mpl_log_real_data]

Then here is the datashader version:
[image: ds_log_real_data]

philippjfr Feb 6, 2017

Collaborator

Looks to me like this is happening because different renderers are drawing at different resolutions. My hypothesis is that the inline backend is using the hi-dpi option (presumably because you have a MacBook) and is therefore sampling at a higher resolution than you get when saving the datashader plot directly or saving it from matplotlib.
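
The effect of resolution on per-bin counts can be illustrated with plain NumPy, independent of either library. The sketch below bins the same synthetic points on a coarse grid and on a grid with twice the resolution; the data, ranges, and grid sizes are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 3.0, 100_000)
y = rng.normal(0.0, 3.0, 100_000)
extent = [[-16, 16], [-16, 16]]

# Same points, same extent, but twice as many bins along each axis.
coarse, _, _ = np.histogram2d(x, y, bins=32, range=extent)
fine, _, _ = np.histogram2d(x, y, bins=64, range=extent)

print(coarse.max(), fine.max())
```

Each fine bin covers a quarter of the area of a coarse bin, so its counts are lower, and a colormap scaled to the data will render sparse regions differently at the two resolutions.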

jbednar Feb 7, 2017

Collaborator

Right -- the results will vary a lot at different resolutions, by design, though you can use tf.spread() or tf.dyn_spread() to ensure that individual dots are visible at high resolutions.
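
As a rough NumPy illustration of what spreading means (this is not how `tf.spread()` is implemented, and the helper `spread_square` is hypothetical), each non-empty bin can be copied into its neighbors so that isolated points cover more than one pixel:

```python
import numpy as np


def spread_square(counts, px=1):
    """Copy each bin's value into its (2*px+1)-wide square neighborhood, keeping the maximum."""
    counts = np.asarray(counts)
    padded = np.pad(counts, px, mode="constant", constant_values=0)
    out = np.zeros_like(counts)
    height, width = counts.shape
    # Take the elementwise maximum over every shifted view of the padded array.
    for dy in range(2 * px + 1):
        for dx in range(2 * px + 1):
            out = np.maximum(out, padded[dy:dy + height, dx:dx + width])
    return out


single_point = np.zeros((5, 5), dtype=int)
single_point[2, 2] = 7
spread = spread_square(single_point, px=1)
print(spread)
```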

jbednar Feb 15, 2017

Collaborator

BTW, note that recent dev releases of HoloViews now support datashader, with matplotlib or any other backend. Here's an example: https://anaconda.org/jbednar/census-hv-mpl/notebook

maartenbreddels Apr 13, 2017

I noticed this thread on twitter; it reminded me to put in a matplotlib backend for vaex based on ipympl. The code lives here and might be useful for this discussion, since it tries to attack a similar problem. What might be useful is the debounced decorator that I use, for instance here, which will only execute after 0.5 seconds have passed, to avoid many updates when moving and zooming. It only works when there is an ipykernel; for Qt you need a different debounce method (I should have that code somewhere).
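
The linked code is not reproduced here, so the following is only a generic sketch of the debounce idea using `threading.Timer` from the standard library instead of the ipykernel event loop. The names `debounced` and `redraw` are hypothetical, and the delay is shortened so the example finishes quickly.

```python
import threading
import time


def debounced(delay):
    """Decorator: run the function only once `delay` seconds pass without a new call."""
    def decorator(func):
        timer = None
        lock = threading.Lock()

        def wrapper(*args, **kwargs):
            nonlocal timer
            with lock:
                if timer is not None:
                    timer.cancel()  # drop the pending call; a newer one replaces it
                timer = threading.Timer(delay, func, args=args, kwargs=kwargs)
                timer.daemon = True
                timer.start()

        return wrapper
    return decorator


calls = []


@debounced(0.05)
def redraw(limits):
    calls.append(limits)


# Simulate a burst of pan/zoom events; only the last one should trigger a redraw.
for limits in [(0, 10), (1, 9), (2, 8)]:
    redraw(limits)

time.sleep(0.3)
print(calls)
```

Because the timer fires on a worker thread, a real GUI backend would need to hand the redraw back to its own event loop, which is presumably why the comment mentions needing a different method for Qt.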

stonebig Jul 22, 2017

datashader is not available on PyPI; a solution that could also support "PyPI-compatible" alternatives, like "mpl-scatter-density", would be great.

ruiyangl Aug 22, 2018

I couldn't do:
from datashader.mpl_ext import DSArtist
Here's the error:
DLL load failed: The specified module could not be found.

jbednar Aug 22, 2018

Collaborator

@ruiyangl, to get this experimental code, you would have to check out the branch of datashader associated with this PR, and use that instead of any released datashader version. We'd be happy to merge support for Matplotlib into Datashader whenever this PR can be completed. In the meantime, you can use HoloViews+Matplotlib to see static Datashader output inside a Matplotlib plot as mentioned above.
