scale_color_manual: specified breaks do not correspond in order to specified values #445

@jwhendy

I noticed unexpected behavior and wanted to bring it here for comment/input.

Toy example, default behavior:

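import pandas as pd
from plotnine import (ggplot, aes, geom_point,
                      scale_color_discrete, scale_color_manual)

tmp = pd.DataFrame({'x': [1, 2, 3],
                    'y': [4, 5, 6],
                    'col': [True, True, False]})

p = ggplot(tmp, aes(x='x', y='y', color='col')) + geom_point()
p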

[screenshot: grab_2020-10-05_145533]

Changing the order of the breaks (colors stay the same):

p = ggplot(tmp, aes(x='x', y='y', color='col')) + geom_point()
p = p + scale_color_discrete(name='col', breaks=[True, False])
p

[screenshot: grab_2020-10-05_145707]

When adding custom labels, the label order follows the breaks:

p = ggplot(tmp, aes(x='x', y='y', color='col')) + geom_point()
p = p + scale_color_discrete(name='col',
                             breaks=[True, False],
                             labels=['true is blue', 'false is red'])
p

[screenshot: grab_2020-10-05_145819]

If you want to specify custom colors, however, values appears to follow the original order (presumably factor() order) rather than the order of breaks:

p = ggplot(tmp, aes(x='x', y='y', color='col')) + geom_point()
p = p + scale_color_manual(name='col',
                           breaks=[True, False],
                           labels=['true is blue', 'false is red'],
                           values=['red', 'black'])
p

[screenshot: grab_2020-10-05_153337]

Using limits instead of breaks fixes this:

p = ggplot(tmp, aes(x='x', y='y', color='col')) + geom_point()
p = p + scale_color_manual(name='col',
                           limits=[True, False],
                           labels=['true is blue', 'false is red'],
                           values=['blue', 'red'])
p

This is not the ggplot2 behavior, however.

library(ggplot2)
tmp <- data.frame(x=c(1, 2, 3), y=c(4, 5, 6), col=c(T, T, F))
ggplot(tmp, aes(x=x, y=y, color=col)) + geom_point() + scale_color_manual("col", breaks=c(T, F), labels=c("true is blue", "false is red"), values=c("blue", "red"))

[screenshot: grab_2020-10-05_154452]

I tend to use limits to, say, exclude something from a range, and breaks/labels to change the legend labels, on the assumption that values correspond to break order, as the ggplot2 docs specify:

The values will be matched in order (usually alphabetical) with the limits of the scale, or with breaks if provided.
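
The two matching rules can be contrasted in plain Python. This is only a sketch of the semantics, not plotnine's or ggplot2's actual code; `match_values` is a hypothetical helper, and the assumption that the default limits for a boolean column are the sorted unique values ([False, True]) is inferred from the behavior reported here.

```python
def match_values(values, limits, breaks=None):
    """Pair each value with a level, in order.

    Hypothetical helper: matches against `breaks` when they are given,
    otherwise against `limits`, following the quoted ggplot2 rule.
    """
    keys = breaks if breaks is not None else limits
    return dict(zip(keys, values))


# Assumed default limits: sorted unique values of the 'col' column.
limits = sorted({True, True, False})
values = ['blue', 'red']

# Rule quoted from the ggplot2 docs: breaks take precedence over limits.
documented = match_values(values, limits, breaks=[True, False])

# Behavior reported in this issue: breaks are ignored when matching values.
observed = match_values(values, limits)
```

Under these assumptions, `documented` maps True to 'blue' while `observed` maps True to 'red', which is the discrepancy described above.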

If this is the intended behavior, maybe a note in the docs would help clarify what is ordered by breaks vs. what is ordered by the underlying order (say, that of a pd.Categorical). Many thanks for taking a look.
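
If values really are matched against the underlying order, one possible stopgap is to write the colors in break order and reorder them before passing them on. The sketch below is plain Python; `values_in_limit_order` is a hypothetical helper, not part of plotnine, and the limits [False, True] are an assumption about the default order for this toy data.

```python
def values_in_limit_order(breaks, values, limits):
    """Reorder break-ordered values so they follow `limits` order.

    Hypothetical helper: assumes every level in `limits` also
    appears in `breaks`.
    """
    by_break = dict(zip(breaks, values))
    return [by_break[level] for level in limits]


reordered = values_in_limit_order(
    breaks=[True, False],
    values=['blue', 'red'],   # written in break order
    limits=[False, True],     # assumed underlying order
)
```

Here `reordered` is ['red', 'blue'], which could then be passed as values alongside breaks=[True, False]. Whether this is needed depends on the plotnine version and on the behavior described in this issue being unchanged.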
