string or categorical integer values considered as numeric in strip/box/violin plots #1918

Open
DrGFreeman opened this issue Nov 22, 2019 · 6 comments
Labels
bug something broken P3 backlog


@DrGFreeman

Issue:

When using integer values as a categorical variable in a strip / box / violin plot, the values of the categorical variable are mapped to a continuous numeric axis even if they are of string or pd.Categorical type.

Example:

We create a dataframe whose column names are integer values in string format. These could be any category that makes sense for the specific business case (e.g. product codes).

import numpy as np
import pandas as pd
import plotly_express as px

n = 50

df = pd.DataFrame({
    '1': np.random.normal(2, .3, n),
    '2': np.random.lognormal(.5, .2, n),
    '34': np.random.triangular(0, 2, 3, n),
    '123': np.random.uniform(1, 3, n)
})

We unpivot the data using df.melt() and make a strip plot.

df1 = df.melt()

px.strip(df1, x='variable', y='value')

[Image: strip plot with string x values]

The values of the categorical variable ('1', '2', '34' and '123') get mapped to a continuous numeric scale. Here, the variables '1' and '2' blend together, and this can get worse if there are orders of magnitude between the different values.

Converting the string values to pd.Categorical type yields the same result as above.

df2 = df.melt()
df2.variable = pd.Categorical(df2.variable)

px.strip(df2, x='variable', y='value')

Workaround:

Adding a character to the values makes them recognized as categorical, which is the expected result (except for the added character in the category names). Unfortunately, adding only a blank space does not work.

df3 = df.copy()
df3.columns = [f"c{c}" for c in df3.columns]

px.strip(df3.melt(), x='variable', y='value')

[Image: strip plot with a character added to the string x values]

Considering that numeric categorical values are legitimate in many contexts, it should be possible to use numbers as categories if they are represented by a string or categorical data type, as is the case with the color parameter (plotly/plotly_express#140):

px.strip(df.melt(), y='value', color='variable')

[Image: strip plot colored by variable]

Thanks!

Package              Version  
-------------------- --------- 
plotly               4.1.1    
plotly-express       0.4.1    

@emmanuelle, this is one of the two issues we discussed at the PyData meetup.

@emmanuelle
Contributor

Hey @DrGFreeman, sure, this is a valid concern. You can create the plotly figure using the px function and then call

fig.update_layout(xaxis_type='category')

which will force your axis to be categorical. Then of course there is the question whether we should impose this at the plotly.express level...

@DrGFreeman
Author

Thanks for the quick response @emmanuelle. I will use this tip for sure.

> Then of course there is the question whether we should impose this at the plotly.express level...

Of course, that is up to the core developers to decide. From the perspective of API consistency, if integers passed as strings to the color parameter yield a categorical color scale (as opposed to a continuous one), I would intuitively expect to get similar results on an axis.

@cvrnogueira

@emmanuelle thanks for your solution, but in a plot that uses facet_col this only works for the last plot, not for all of them. Do you have any solution? Thanks

@nicolaskruchten
Contributor

@cvrnogueira If you use fig.update_xaxes(type='category') it will apply to all facets.

@nicolaskruchten
Contributor

> Of course, that is up to the core developers to decide. From the perspective of API consistency, if integers passed as strings to the color parameter yield a categorical color scale (as opposed to a continuous one), I would intuitively expect to get similar results on an axis.

I agree, but for backwards-compatibility reasons throughout the API, we can't really change this at the moment: we've always accepted stringified numbers as numbers on positional axes like x and y.

@harrybiddle

This is also an issue with Figure.add_shape. Even when the axis has been set to be categorical and x0/x1 are of string type, a value that represents a number is still interpreted as a coordinate rather than a category. I've searched through the API and I can't find a way around this... Any suggestions welcome.

@gvwilson gvwilson self-assigned this Jul 3, 2024
@gvwilson gvwilson removed their assignment Aug 2, 2024
@gvwilson gvwilson added the P3 backlog label Aug 12, 2024
@gvwilson gvwilson changed the title px - string or categorical integer values considered as numeric in strip/box/violin plots string or categorical integer values considered as numeric in strip/box/violin plots Aug 12, 2024
@gvwilson gvwilson added the bug something broken label Aug 12, 2024