Tell datashader to use a specific color for NaNs in categorical data #1019

Noskario · 2021-08-23T18:59:08Z

I have a very large dataset that I cannot plot directly using HoloViews. I want to make a scatter plot with categorical data. Unfortunately, my data is very sparse and many points have NA as their category. I would like to make these points gray. Is there any way to tell Datashader to do this?

Here is how I do it now (more or less as proposed in https://holoviews.org/user_guide/Large_Data.html ), with an example:

import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
import datashader as ds
from datashader.colors import Sets1to3
from holoviews.operation.datashader import datashade,dynspread



raw_data = [('Alice', 60, 'London', 5),
            ('Bob', 14, 'Delhi', 7),
            ('Charlie', 66, np.nan, 11),
            ('Dave', np.nan, 'Delhi', 15),
            ('Eveline', 33, 'Delhi', 4),
            ('Fred', 32, 'New York', np.nan),
            ('George', 95, 'Paris', 11)
            ]
# Create a DataFrame object
df = pd.DataFrame(raw_data, columns=['Name', 'Age', 'City', 'Experience'])
df['City']=pd.Categorical(df['City'])



x='Age'
y='Experience'
color='City'
cats=df[color].cat.categories





# Make dummy-points (currently the only way to make a legend: https://holoviews.org/user_guide/Large_Data.html)
for cat in cats:
    #Just to make clear how many points of a given category we have
    print(cat,((df[color]==cat)&(df[x].notnull())&(df[y].notnull())).sum())
color_key=[(name,color) for name, color in zip(cats,Sets1to3)]
color_points = hv.NdOverlay({n: hv.Points([0,0], label=str(n)).opts(color=c,size=0) for n,c in color_key})


# Create the plot with datashader
points=hv.Points(df, [x, y],label="%s vs %s" % (x, y),)
datashaded=datashade(points,aggregator=ds.by(color)).opts(width=800, height=480)

(dynspread(datashaded)*color_points).opts(legend_position='right')

It produces the following picture:

Although there is just one person from Paris, the NA person (Charlie) is also drawn in purple, the color for Paris. Is there a way to make that dot gray? I have tried many plots, and it seems that NAs always take the color of the last item in the legend.

It would be nice to have the possibility of passing a parameter like NA_color='gray' to the datashade method. An option for not plotting points with an NA category at all, using the same kind of interface, would also be nice, but that is less important.

jbednar · 2021-08-23T19:49:21Z

I've transferred this issue to the Datashader repo since the code changes involved would be at the Datashader level. For current versions of Datashader, my advice would be to use Pandas to modify the data before plotting, either to replace the NaNs with 'Unknown' or 'Other', or to delete rows where the category is NaN. That way the data will either be clearly labeled or not included, as desired.
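A minimal sketch of the two Pandas workarounds described above, using a trimmed-down version of the example data. The label 'Unknown' and the variable names `labeled` and `dropped` are illustrative choices, not anything Datashader requires.

```python
import numpy as np
import pandas as pd

# Small stand-in for the data in the original report (one missing City).
df = pd.DataFrame({
    "Age": [60, 14, 66, 33, 95],
    "Experience": [5, 7, 11, 4, 11],
    "City": pd.Categorical(["London", "Delhi", np.nan, "Delhi", "Paris"]),
})

# Option 1: give missing categories an explicit label.
# The new label must be registered as a category before fillna can use it.
labeled = df.copy()
labeled["City"] = labeled["City"].cat.add_categories("Unknown").fillna("Unknown")

# Option 2: drop the rows whose category is missing.
dropped = df.dropna(subset=["City"]).copy()
dropped["City"] = dropped["City"].cat.remove_unused_categories()

print(list(labeled["City"].cat.categories))
print(len(dropped))
```

Either frame can then be passed to hv.Points in place of the original df. With the first option, 'Unknown' becomes an ordinary category, so it gets its own color and legend entry instead of borrowing another category's color.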

That said, there are some reasonable feature requests here for Datashader's shade() function:

Accept a separate nan_color value to use for NaN categorical values (presumably gray by default), as Bokeh provides already. Note that you will then need to handle NaN specially when you construct the legend as above, making sure that the legend shows what color is used for NaNs and has an appropriate label ('Unknown', 'Missing', etc.).
Add a flag skip_nans or skip_missing_categories, defaulting to False, which silently drops points where the category information is not known.
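Until something like nan_color exists, the legend handling mentioned above can be approximated by hand once missing values have been relabeled. The sketch below is only an illustration: `build_color_key` is a hypothetical helper (not part of Datashader or HoloViews), the hex colors are arbitrary, and it assumes NaNs were already replaced with the label 'Unknown'.

```python
def build_color_key(categories, palette,
                    missing_label="Unknown", missing_color="gray"):
    """Map each category to a palette color and the missing label to gray.

    Hypothetical helper for illustration only.
    """
    if len(categories) > len(palette):
        raise ValueError("palette has fewer colors than there are categories")
    key = dict(zip(categories, palette))
    # The placeholder category gets a fixed, neutral color.
    key[missing_label] = missing_color
    return key


# Known categories first, with arbitrary example colors.
color_key = build_color_key(
    ["Delhi", "London", "Paris"],
    ["#e41a1c", "#377eb8", "#4daf4a"],
)
print(color_key)
```

The resulting dict can be passed as color_key to datashade and reused for the dummy legend points from the original example, so the legend and the shaded image agree on which color means 'Unknown'.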

Compared to the rest of Datashader, the shade() function is fairly self contained and is not terribly tricky for a new contributor to figure out. I'd be happy to review a PR adding either or both of these features.

jbednar changed the title ~~tell datashader to use a specific color for NA's when plotting categorial data~~ Tell datashader to use a specific color for NaNs in categorical data Aug 23, 2021

jbednar transferred this issue from holoviz/holoviews Aug 23, 2021

maximlt added this to the wishlist milestone Nov 29, 2021

maximlt added the enhancement label Nov 29, 2021
