scale_color_manual: specified breaks do not correspond in order to specified values #445

Closed
jwhendy opened this issue Oct 5, 2020 · 2 comments

@jwhendy

jwhendy commented Oct 5, 2020

I noticed unexpected behavior and wanted to bring it here for comment/input.

Toy example, default behavior:

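import pandas as pd
from plotnine import (ggplot, aes, geom_point,
                      scale_color_discrete, scale_color_manual)

tmp = pd.DataFrame({'x': [1, 2, 3],
                    'y': [4, 5, 6],
                    'col': [True, True, False]})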

p = ggplot(tmp, aes(x='x', y='y', color='col')) + geom_point()
p

grab_2020-10-05_145533

Changing the order of the breaks (colors stay the same):

p = ggplot(tmp, aes(x='x', y='y', color='col')) + geom_point()
p = p + scale_color_discrete(name='col', breaks=[True, False])
p

grab_2020-10-05_145707

When adding custom labels, the label order follows the breaks:

p = ggplot(tmp, aes(x='x', y='y', color='col')) + geom_point()
p = p + scale_color_discrete(name='col', breaks=[True, False], labels=['true is blue', 'false is red'])
p

grab_2020-10-05_145819

If you want to specify custom colors, however, values appear to follow the original (presumably `factor()`) order rather than the order of breaks:

p = ggplot(tmp, aes(x='x', y='y', color='col')) + geom_point()
p = p + scale_color_manual(name='col', breaks=[True, False], labels=['true is blue', 'false is red'], values=['red', 'black'])
p

grab_2020-10-05_153337

Using limits instead of breaks fixes this:

p = ggplot(tmp, aes(x='x', y='y', color='col')) + geom_point()
p = p + scale_color_manual(name='col',
                           limits=[True, False],
                           labels=['true is blue', 'false is red'],
                           values=['blue', 'red'])
p

This is not the ggplot2 behavior, however.

library(ggplot2)
tmp <- data.frame(x=c(1, 2, 3), y=c(4, 5, 6), col=c(T, T, F))
ggplot(tmp, aes(x=x, y=y, color=col)) + geom_point() + scale_color_manual("col", breaks=c(T, F), labels=c("true is blue", "false is red"), values=c("blue", "red"))

grab_2020-10-05_154452

I tend to use limits to, say, exclude something from a range, and breaks/labels to change the legend labels, on the assumption that values should correspond to break order, as specified by the ggplot2 docs:

The values will be matched in order (usually alphabetical) with the limits of the scale, or with breaks if provided.

If this is the intended behavior, maybe a note in the docs would help users understand what's ordered by breaks vs. what's ordered by the underlying, say, pd.Categorical order. Many thanks for taking a look.
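To make the two orderings concrete, here is a minimal sketch in plain Python (no plotnine calls). The helper name `values_for_limits` is hypothetical, and the "underlying" order is assumed to be the sorted order of the levels (False before True). It pairs each value with its break, then re-emits the values in the underlying order, which is the list one would have to pass as values if they are matched against that order rather than against breaks.

```python
def values_for_limits(breaks, values, limits):
    """Reorder `values` (written in break order) to line up with `limits`.

    Hypothetical helper for illustration only; not part of plotnine.
    """
    by_break = dict(zip(breaks, values))          # e.g. {True: 'blue', False: 'red'}
    return [by_break[level] for level in limits]  # values in underlying order


# Assumption: the scale's underlying level order is the sorted order.
limits = sorted([True, False])   # [False, True]
breaks = [True, False]           # order wanted in the legend
values = ['blue', 'red']         # written to match `breaks`

reordered = values_for_limits(breaks, values, limits)
print(reordered)  # ['red', 'blue']
```

Keeping the break-to-value pairing in one mapping avoids relying on which ordering the scale applies to values.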

has2k1 added a commit that referenced this issue Oct 6, 2020
@has2k1 has2k1 closed this as completed in 7398758 Oct 6, 2020
@jwhendy
Author

jwhendy commented Oct 6, 2020

You are a machine! Thanks so much.

Btw, I just made a submission to r/dataisbeautiful on an analysis of 3.1 million reddit comments. plotnine generated the scatter plots used, which I think turned out really nice :) Thanks for continuing to provide and maintain the best plotting library in python. I tell everyone at work about it any chance I get.

@has2k1
Owner

has2k1 commented Oct 6, 2020

Btw, I just made a submission to r/dataisbeautiful on an analysis of 3.1 million reddit comments. plotnine generated the scatter plots used, which I think turned out really nice :)

Nice plots there. It would be interesting to see how they vary over time!
