I have a very large dataset that I cannot plot directly using HoloViews. I want to make a scatterplot with categorical data. Unfortunately, my data is very sparse and many points have NA as their category. I would like to draw these points in gray. Is there any way to tell Datashader what I want to do?
I show you the way I do it now (as more or less proposed in https://holoviews.org/user_guide/Large_Data.html). Here is an example:
import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
import datashader as ds
from datashader.colors import Sets1to3
from holoviews.operation.datashader import datashade,dynspread
raw_data = [('Alice', 60, 'London', 5) ,
('Bob', 14, 'Delhi' , 7) ,
('Charlie', 66, np.nan, 11),
('Dave', np.nan, 'Delhi', 15),
('Eveline', 33, 'Delhi', 4),
('Fred', 32, 'New York', np.nan),
('George', 95, 'Paris', 11)
]
# Create a DataFrame object
df = pd.DataFrame(raw_data, columns=['Name', 'Age', 'City', 'Experience'])
df['City']=pd.Categorical(df['City'])
x='Age'
y='Experience'
color='City'
cats=df[color].cat.categories
# Make dummy-points (currently the only way to make a legend: https://holoviews.org/user_guide/Large_Data.html)
for cat in cats:
    # Just to make clear how many points of a given category we have
    print(cat, ((df[color] == cat) & (df[x].notnull()) & (df[y].notnull())).sum())
color_key=[(name,color) for name, color in zip(cats,Sets1to3)]
color_points = hv.NdOverlay({n: hv.Points([0,0], label=str(n)).opts(color=c,size=0) for n,c in color_key})
# Create the plot with datashader
points=hv.Points(df, [x, y],label="%s vs %s" % (x, y),)
datashaded=datashade(points,aggregator=ds.by(color)).opts(width=800, height=480)
(dynspread(datashaded)*color_points).opts(legend_position='right')
In the resulting plot, although there is just one person from Paris, the NA person (Charlie) is also drawn in purple, the color for Paris. Is there a way to make that dot gray? I have tried many plots, and the NAs always seem to take the color of the last item in the legend.
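A small pandas-only illustration of one plausible way this "last color" symptom can arise: pandas stores a missing categorical value as code -1, and plain Python indexing with -1 wraps around to the last entry of a color list. Whether Datashader's `ds.by` reduction does exactly this internally is an assumption here; the `palette` values are placeholders.

```python
import pandas as pd

# Same shape of data as above: one missing city between two known ones
city = pd.Categorical(["London", None, "Paris"])
palette = ["red", "purple"]  # placeholder colors, one per category in category order

print(list(city.categories))  # ['London', 'Paris']
print(city.codes.tolist())    # [0, -1, 1]  (pandas encodes a missing category as -1)

# Looking up colors by code with plain indexing: -1 silently wraps to the last color
naive = [palette[int(code)] for code in city.codes]
print(naive)  # ['red', 'purple', 'purple']
```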
It would be nice to have a parameter like NA_color='gray' for the datashade method. An option to not plot NA-category points at all, with the same kind of interface, would also be nice, but that is less important.
jbednar changed the title from "tell datashader to use a specific color for NA's when plotting categorial data" to "Tell datashader to use a specific color for NaNs in categorical data" on Aug 23, 2021.
I've transferred this issue to the Datashader repo since the code changes involved would be at the Datashader level. For current versions of Datashader, my advice would be to use Pandas to modify the data before plotting, either to replace the NaNs with 'Unknown' or 'Other', or to delete rows where the category is NaN. That way the data will either be clearly labeled or not included, as desired.
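A minimal sketch of the two Pandas workarounds suggested here, assuming a categorical `City` column like the one in the example. For a categorical column, the new label has to be added as a category before `fillna` will accept it. The palette values and the "Unknown" label are placeholders; the resulting dict is meant to be passed as `color_key` to `datashade()` (or to Datashader's `shade()`), which is not executed here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Charlie", "George"],
    "Age": [60, 66, 95],
    "City": pd.Categorical(["London", np.nan, "Paris"]),
    "Experience": [5, 11, 11],
})

# Option 1: label missing categories explicitly so they get their own color
labeled = df.copy()
labeled["City"] = labeled["City"].cat.add_categories("Unknown").fillna("Unknown")

# Option 2: drop rows whose category is missing so they are not plotted at all
dropped = df.dropna(subset=["City"])

# Build an explicit color key: placeholder colors for known cities, gray for "Unknown"
palette = ["#e41a1c", "#377eb8"]
known = [c for c in labeled["City"].cat.categories if c != "Unknown"]
color_key = dict(zip(known, palette))
color_key["Unknown"] = "gray"
# e.g. datashade(hv.Points(labeled, ["Age", "Experience"]),
#                aggregator=ds.by("City"), color_key=color_key)
```

Labeling keeps the missing points visible and makes them show up in a legend built from the categories; dropping is simpler when those points carry no useful information.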
That said, there are some reasonable feature requests here for Datashader's shade() function:
Accept a separate nan_color value to use for NaN categorical values (presumably gray by default), as Bokeh already provides. Note that you will then need to handle NaN specially when you construct the legend as above, making sure that the legend shows what color is used for NaNs and has an appropriate label ('Unknown', 'Missing', etc.).
Add a flag skip_nans or skip_missing_categories, defaulting to False, which silently drops points where the category information is not known.
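To make the proposed semantics concrete, here is a hypothetical sketch in plain pandas of how `nan_color` and `skip_nans` could behave, including the legend entry for missing values. `shade_colors` is an illustrative name, not part of Datashader, and this is not how `shade()` is actually implemented; it only maps per-point categories to colors.

```python
import pandas as pd

def shade_colors(cat, color_key, nan_color="gray", skip_nans=False):
    """Hypothetical sketch of the proposed nan_color / skip_nans options.

    cat: a pandas Categorical (missing values have code -1).
    color_key: dict mapping each category to a color.
    Returns (colors, legend), where legend is a list of (label, color) pairs.
    """
    colors = []
    for code in cat.codes:
        if code < 0:  # missing category
            if not skip_nans:
                colors.append(nan_color)  # dedicated color instead of wrapping to the last one
            # with skip_nans=True the point is silently dropped
        else:
            colors.append(color_key[cat.categories[int(code)]])
    legend = [(name, color_key[name]) for name in cat.categories]
    if not skip_nans and (cat.codes < 0).any():
        legend.append(("Missing", nan_color))  # legend must say what gray means
    return colors, legend

city = pd.Categorical(["London", None, "Paris"])
key = {"London": "red", "Paris": "purple"}
colors, legend = shade_colors(city, key)
dropped_colors, dropped_legend = shade_colors(city, key, skip_nans=True)
```

With the defaults, Charlie's point would come out gray and the legend gains a "Missing" entry; with `skip_nans=True` the point and the extra legend entry both disappear.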
Compared to the rest of Datashader, the shade() function is fairly self-contained and not terribly tricky for a new contributor to figure out. I'd be happy to review a PR adding either or both of these features.