Ensure that dataframe with multiple columns is categorically shaded #759

Merged (9 commits) on Jul 29, 2022

philippjfr
Member

Ensures wide charts, i.e. those with multiple y-values, are categorically datashaded:

pd._testing.makeTimeDataFrame().hvplot.line(datashade=True)

[Image: bokeh_plot screenshot]

@maximlt
Member

maximlt commented Jun 24, 2022

@philippjfr could you add some comments to make it easier to understand the long conditions? 🙏

The tests failed because of a transient issue with mamba.

@maximlt maximlt added this to the 0.8.1 milestone Jul 18, 2022
@maximlt
Member

maximlt commented Jul 24, 2022

@philippjfr I've added a test, feel free to merge as is or to add a comment on that pretty long if condition 🙃

@maximlt
Member

maximlt commented Jul 25, 2022

Actually there are some failing tests, I'll check them.

Member

@maximlt maximlt left a comment

One test is still failing.

@jbednar
Member

jbednar commented Jul 25, 2022

Ensures wide charts, i.e. those with multiple y-values, are categorically datashaded:

This sounds good and I think it's the most useful default. Note that datashader can happily render hundreds of thousands of columns, but it will run out of colors for them after 22 columns by default. Even with Glasbey categorical maps from Colorcet, it will run out after 256, raising a fatal error. It might be good to catch that error in hvPlot and either drop back to non-categorical with a warning, or at least raise a useful error telling people what to do:

Too many columns for the number of colors in the `color_key` ({len(color_key)}); 
reverting to a non-categorical plot. This warning can be suppressed by specifying 
`categorical=False` (??) or by supplying a `color_key` with at least {len(columns)} 
colors.
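
A rough sketch of the pre-check suggested above, in plain Python. The helper name `resolve_categorical` and its return convention are hypothetical and are not part of hvPlot or Datashader; the sketch only assumes that the column names and the `color_key` are available as sequences.

```python
import warnings


def resolve_categorical(columns, color_key):
    """Return True if categorical shading is feasible, else warn and return False.

    Hypothetical helper: neither hvPlot nor Datashader exposes this function.
    """
    n_columns, n_colors = len(columns), len(color_key)
    if n_columns <= n_colors:
        return True
    # Not enough colors: tell the user what happened and how to fix it.
    warnings.warn(
        f"Too many columns for the number of colors in the `color_key` "
        f"({n_colors}); reverting to a non-categorical plot. Supply a "
        f"`color_key` with at least {n_columns} colors to keep categorical "
        f"shading.",
        UserWarning,
        stacklevel=2,
    )
    return False
```

The caller would use the boolean to decide whether to request a categorical aggregation at all, so the fatal error is never reached.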

@maximlt
Member

maximlt commented Jul 26, 2022

Indeed, datashader raises `ValueError: Insufficient colors provided (256) for the categorical fields available (299)` when this code is executed:

import numpy as np
import pandas as pd
import hvplot.pandas

df = pd.DataFrame(np.random.rand(10, 300))
df.columns = [str(c) for c in df.columns]

df.hvplot(x=df.columns[0], y=list(df.columns)[1:], datashade=True)
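
One user-side way around the error is to provide at least as many colors as there are y-columns (299 in the reproducer above). The sketch below cycles a short palette up to the required length. `extend_color_key` is a hypothetical helper and the palette values are arbitrary. Because colors repeat, the categories are no longer visually distinct. How the resulting list is passed to the plotting call is left out here.

```python
from itertools import cycle, islice


def extend_color_key(palette, n_categories):
    """Repeat `palette` until it holds `n_categories` colors (hypothetical helper)."""
    if not palette:
        raise ValueError("palette must contain at least one color")
    return list(islice(cycle(palette), n_categories))


palette = ["#1f77b4", "#ff7f0e", "#2ca02c"]  # arbitrary example colors
color_key = extend_color_key(palette, 299)   # one color per y-column
```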

I'm not sure how I would catch this error at the hvPlot level, as it's raised in the plotting machinery of HoloViews. I'd be interested to know if and how that could be achieved, even just out of curiosity. @jlstevens, maybe you can help?

If a categorical plot is obtained with `self.by`, I think that hvPlot doesn't hold the list of categories; instead it delegates that to HoloViews. Knowing the categories is required to emit the warning you suggest, as their length needs to be computed. The code would need to handle all the ways a categorical plot can be obtained, and the different types of data structures hvPlot supports (e.g. a Dask DataFrame). Is it worth the trouble?

@jlstevens
Collaborator

@maximlt is correct that the error is raised when the display machinery is activated at the HoloViews level, so I don't think hvplot can directly catch the error.

One option is to try to catch the conditions where such an error is expected, to raise a better error message early. The other option is to catch the error at the holoviews level and raise a better error message there instead (using language appropriate for hvplot users as well).
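
The second option could look roughly like the sketch below: catch the low-level `ValueError` where rendering is triggered and re-raise it with guidance. `render_with_helpful_error` and the `render` callable are assumptions made for illustration and are not HoloViews APIs. The matched text is the error message quoted earlier in this thread.

```python
def render_with_helpful_error(render, *args, **kwargs):
    """Call `render` and re-raise the color-count error with more guidance.

    `render` stands in for whatever callable triggers the display machinery;
    it is an assumption of this sketch, not a HoloViews API.
    """
    try:
        return render(*args, **kwargs)
    except ValueError as exc:
        if "Insufficient colors provided" not in str(exc):
            raise  # unrelated ValueError: leave it untouched
        # Chain the original exception so the full traceback is preserved.
        raise ValueError(
            f"{exc}. Supply a `color_key` with more colors or reduce the "
            "number of categories being shaded."
        ) from exc
```

The wording avoids hvPlot-specific options so it stays meaningful for people using HoloViews directly.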

@philippjfr
Member Author

Right, it shouldn't be hard to count the colors in the colormap/`color_key` and the number of columns in the NdOverlay that is being shaded, and to error or warn when there are more categories than colors. This is slightly more awkward when using `by`, because computing the distinct categories in a dask dataframe can be quite expensive.
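
To illustrate why the two cases differ in cost, here is a minimal sketch using a plain dict of lists in place of a dataframe. `count_categories` is a hypothetical helper, not hvPlot or HoloViews code.

```python
def count_categories(table, y=None, by=None):
    """Count the categories a categorical shade would need colors for.

    `table` is a plain dict mapping column names to lists of values; it
    stands in for a dataframe purely for illustration.
    """
    if by is not None:
        # Long form: one category per distinct value, which requires a full
        # pass over the data (the costly step for a large or lazy dataframe).
        return len(set(table[by]))
    # Wide form: one category per y column, known from the names alone.
    return len(y)


table = {
    "x": [0, 1, 2],
    "a": [1.0, 2.0, 3.0],
    "b": [3.0, 2.0, 1.0],
    "kind": ["u", "v", "u"],
}
n_wide = count_categories(table, y=["a", "b"])  # counts columns only
n_by = count_categories(table, by="kind")       # scans the "kind" values
```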

@jbednar
Member

jbednar commented Jul 27, 2022

If a categorical plot is obtained with `self.by`, I think that hvPlot doesn't hold the list of categories; instead it delegates that to HoloViews.

Good point. Maybe HoloViews can then create a similar warning or error?

@jlstevens
Collaborator

Maybe HoloViews can then create a similar warning or error?

Sure! Any PR improving the errors in HoloViews is welcome, as long as the message is helpful to people using HoloViews directly, as well as hvplot users.

@maximlt
Member

maximlt commented Jul 29, 2022

Since it seems that the warning could be upstreamed to HoloViews, I will merge this PR as is and open an issue in HoloViews.

4 participants