string or categorical integer values considered as numeric in strip/box/violin plots #1918
Hey @DrGFreeman, sure, this is a valid concern. You can force the axis to be categorical by creating the plotly figure using the
which will force your axis to be categorical. Then of course there is the question whether we should impose this at the |
Thanks for the quick response @emmanuelle. I will use this tip for sure.
Of course, that is up to the core developers to decide. From the perspective of API consistency, if integers passed as strings to the |
@emmanuelle thanks for your solution, but when we are in a plot using |
@cvrnogueira If you use |
I agree, but for backwards-compatibility reasons throughout the API, we can't really change this at the moment: we've always accepted stringified numbers as numbers on positional axes like |
This is also an issue with |
Issue:
When integer values are used as a categorical variable in a strip / box / violin plot, the values of the categorical variable are mapped to a continuous numeric axis, even if they are of string or pd.Categorical type.
Example:
We create a dataframe whose column names are integer values in string format. These could be any categories that make sense for the specific business case (e.g. product codes).
We then unpivot the data and make a strip plot.
The categorical variable (values '1', '2', '34' and '123') gets mapped to a continuous numeric scale. Here, the values '1' and '2' blend together, and this gets worse when the values differ by orders of magnitude.
Converting the string values to pd.Categorical type yields the same result as above.
Workaround:
Adding a character to the values gets them recognized as categorical, which is the expected result (apart from the added character in the category names). Unfortunately, adding a blank space instead does not work.
Considering that numeric categorical values are legitimate in many contexts, it should be possible to use numbers as categories when they are represented by a string or categorical data type, as is already the case with the color parameter (plotly/plotly_express#140).
Thanks!
@emmanuelle, this is one of the two issues we discussed at the PyData meetup.