string or categorical integer values considered as numeric in strip/box/violin plots #1918

@DrGFreeman

Issue:

When using integer values as a categorical variable in a strip / box / violin plot, the values of the categorical variable are mapped to a continuous numeric axis even if they are of string or pd.Categorical type.

Example:

We create a dataframe whose column names are integer values in string format. These could be any category that makes sense for the specific business case (e.g. product codes).

import numpy as np
import pandas as pd
import plotly_express as px

n = 50

df = pd.DataFrame({
    '1': np.random.normal(2, .3, n),
    '2': np.random.lognormal(.5, .2, n),
    '34': np.random.triangular(0, 2, 3, n),
    '123': np.random.uniform(1, 3, n)
})

We unpivot the data using melt and make a strip plot.

df1 = df.melt()

px.strip(df1, x='variable', y='value')

[Image: string]

The categorical variable (values '1', '2', '34' and '123') gets mapped to a continuous numeric scale. Here, the variables '1' and '2' blend together, and this can get worse if there are orders of magnitude between the different values.

Converting the string values to pd.Categorical type yields the same result as above.

df2 = df.melt()
df2.variable = pd.Categorical(df2.variable)

px.strip(df2, x='variable', y='value')
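
As a sanity check that the data itself is not numeric, the dtypes can be inspected before plotting. This is a minimal sketch using only pandas and numpy, rebuilding the same dataframe as in the example above. The helper names (melted, all_str, str_is_numeric, cat_is_numeric) are illustrative and not part of the original report.

```python
import numpy as np
import pandas as pd

n = 50

# Same construction as the example dataframe above
df = pd.DataFrame({
    '1': np.random.normal(2, .3, n),
    '2': np.random.lognormal(.5, .2, n),
    '34': np.random.triangular(0, 2, 3, n),
    '123': np.random.uniform(1, 3, n)
})

melted = df.melt()

# Every entry of the 'variable' column is a Python str, not a number
all_str = all(isinstance(v, str) for v in melted['variable'])

# pandas does not treat either representation as a numeric dtype
str_is_numeric = pd.api.types.is_numeric_dtype(melted['variable'])
cat_is_numeric = pd.api.types.is_numeric_dtype(
    pd.Categorical(melted['variable']).dtype
)

print(all_str, str_is_numeric, cat_is_numeric)
```

If both checks report a non-numeric dtype, the conversion to a numeric axis would have to happen on the plotting side rather than in the dataframe.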

Workaround:

Prefixing the values with a character makes them recognized as categorical, which is the expected result (apart from the added character in the category names). Unfortunately, adding a blank space instead does not work.

df3 = df.copy()
df3.columns = [f"c{c}" for c in df3.columns]

px.strip(df3.melt(), x='variable', y='value')

[Image: string+char]

Considering that numeric categorical values are legitimate in many contexts, it should be possible to use numbers as categories if they are represented by a string or categorical data type, as is the case with the color parameter (plotly/plotly_express#140):

px.strip(df.melt(), y='value', color='variable')

[Image: color]

Thanks!

Package              Version  
-------------------- --------- 
plotly               4.1.1    
plotly-express       0.4.1    

@emmanuelle, this is one of the two issues we discussed at the PyData meetup.
