Default style for more than 6 data series #1513

stevenwong · 2018-07-23T04:16:23Z

Hi all,

Just using the lineplot example but extending this to 8 series:

import numpy as np
import pandas as pd
import seaborn as sns

# Eight random-walk series, one per column, indexed by date
rs = np.random.RandomState(365)
values = rs.randn(365, 8).cumsum(axis=0)
dates = pd.date_range("1 1 2016", periods=365, freq="D")
data = pd.DataFrame(values, dates, columns=["A", "B", "C", "D", "E", "F", "G", "H"])
data = data.rolling(7).mean()  # 7-day rolling mean

sns.lineplot(data=data, palette="tab10", linewidth=2.5)

I get this exception, which seems to indicate that I need to specify styles for data series beyond the 6th.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-272055c27972> in <module>()
----> 1 sns.lineplot(data=data, palette="tab10", linewidth=2.5)

C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in
lineplot(x, y, hue, size, style, data, palette, hue_order, hue_norm, sizes, size
_order, size_norm, dashes, markers, style_order, units, estimator, ci, n_boot, s
ort, err_style, err_kws, legend, ax, **kwargs)
   1076         dashes=dashes, markers=markers, style_order=style_order,
   1077         units=units, estimator=estimator, ci=ci, n_boot=n_boot,
-> 1078         sort=sort, err_style=err_style, err_kws=err_kws, legend=legend,
   1079     )
   1080

C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in
__init__(self, x, y, hue, size, style, data, palette, hue_order, hue_norm, sizes
, size_order, size_norm, dashes, markers, style_order, units, estimator, ci, n_b
oot, sort, err_style, err_kws, legend)
    670         self.parse_hue(plot_data["hue"], palette, hue_order, hue_norm)
    671         self.parse_size(plot_data["size"], sizes, size_order, size_norm)

--> 672         self.parse_style(plot_data["style"], markers, dashes, style_orde
r)
    673
    674         self.units = units

C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in
parse_style(self, data, markers, dashes, order)
    492
    493             dashes = self.style_to_attributes(
--> 494                 levels, dashes, self.default_dashes, "dashes"
    495             )
    496

C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in
style_to_attributes(self, levels, style, defaults, name)
    303             if any(missing_levels):
    304                 err = "These `style` levels are missing {}: {}"
--> 305                 raise ValueError(err.format(name, missing_levels))
    306
    307         return attrdict

ValueError: These `style` levels are missing dashes: {'H', 'G'}

Is there a way to automatically set style for large datasets?

[Edit] Seems like it's because relational.py:30 defines 6 default dashes. Setting sns.lineplot(data=data, dashes=False) fixes the issue.
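To make the failure mode concrete, here is a simplified sketch of the kind of check the traceback points at. It is not seaborn's actual source: `assign_default_dashes` and `DEFAULT_DASHES` are hypothetical names, and the six patterns are assumed to be the defaults because they are the first six entries of the dash list posted later in this thread.

```python
# Simplified illustration only; this is NOT seaborn's actual source.
# The six patterns are assumed to be the defaults ("" means a solid line).
DEFAULT_DASHES = ["", (4, 1.5), (1, 1), (3, 1, 1.5, 1), (5, 1, 1, 1), (5, 1, 2, 1, 2, 1)]


def assign_default_dashes(levels, defaults=DEFAULT_DASHES):
    """Pair each style level with a default dash pattern, or raise."""
    levels = list(levels)
    attrdict = dict(zip(levels, defaults))  # zip stops at the shorter input
    missing_levels = set(levels) - set(attrdict)
    if missing_levels:
        err = "These `style` levels are missing {}: {}"
        raise ValueError(err.format("dashes", missing_levels))
    return attrdict


columns = ["A", "B", "C", "D", "E", "F", "G", "H"]
try:
    assign_default_dashes(columns)
except ValueError as exc:
    message = str(exc)
    print(message)
```

With eight columns and six patterns, the last two columns are left without a dash style, which is why "G" and "H" appear in the error.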


mwaskom · 2018-07-23T16:31:24Z

So there are a few related but distinguishable questions here:

1. Should there be more than 6 default dash styles?
2. What should happen when there are more style levels than dash patterns?
3. Given the constraints on the style semantic, should it be applied by default to "wide-form" data (where you haven't explicitly asked for it)?

This kind of issue is what I was talking about in the release notes when I said "default behavior may change", because I'm interested in hearing what people find surprising or annoying.

My current thoughts are:

1. I don't think there have to be exactly 6, but that's about the number that I felt could be reliably distinguished in a variety of plots. Users can specify larger dash sets that are tailored to their specific visualizations, but I'm not wild about having defaults that don't really provide useful information.
2. Things in matplotlib cycle, but things in seaborn generally don't (i.e. with hue you'll always get unique colors). Issue #1511 ("Can't handle more than 6 styles") suggested cycling different numbers of dash/marker sets to get a relatively large number of unique combinations. That's a clever suggestion, but I'm not sure it's the best approach, because for most datasets the dashes and markers go together and it might be confusing that at some point they become independent. So I'm open to better ideas, but it seemed best to start with the most disruptive response (raising an exception) and then possibly scale back, rather than going in the opposite direction.
3. This is just a balance between "by default make maximally accessible plots" and "by default try not to raise in a confusing way". I'm ambivalent and could be persuaded either way. Unfortunately, the logic of how the functions work makes it a little difficult to defer the decision on whether there should be a style semantic until we know how many style levels are needed.

stevenwong · 2018-07-24T05:59:07Z

Apologies, I didn't notice the other issue. For my purposes I didn't need dashes, so I can just turn them off, but I think a more informative error message would be nice at the very least.

Rather than "randomly" cycling through dash/markers, what if you define a more predictable pattern? Co-vary the two by some amount? Or turn dashes off by default, so that users won't be surprised by an error if they are just trying out a generic plot.

mwaskom · 2018-07-24T15:43:33Z

> Rather than "randomly" cycling through dash/markers, what if you define a more predictable pattern? Co-vary the two by some amount?

It's possible in principle but I'm not convinced it's a good idea. So e.g. you can programmatically generate markers with an arbitrary number of sides. But it's really hard to discriminate exactly how many sides the markers have above ~5. And once you get above ~7 it's very hard to tell that it's a polygon and not a circle. Similarly, you could programmatically generate dashes with slightly longer and longer solid segments, but for most plots it will be impossible to tell which is which.

These points depend on things like size and density, so it's possible to make a custom plot that works. But I don't think it's a great approach to defaults.
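As a rough illustration of the programmatic generation described above, the sketch below builds dash tuples with steadily longer solid segments and polygon markers with more and more sides. The helper names and the `first`, `step`, `gap`, and `start` parameters are hypothetical; the `(numsides, 0, angle)` tuple is matplotlib's regular-polygon marker form.

```python
def growing_dashes(n, first=1.0, step=0.5, gap=1.0):
    """Return n (segment, gap) dash tuples with steadily longer segments."""
    return [(first + i * step, gap) for i in range(n)]


def polygon_markers(n, start=3):
    """Return n matplotlib marker tuples: regular polygons with start, start+1, ... sides."""
    return [(sides, 0, 0) for sides in range(start, start + n)]


patterns = growing_dashes(8)
markers = polygon_markers(8)
for dash, marker in zip(patterns, markers):
    print(dash, marker)
```

Every generated value is technically unique, but neighbouring dash patterns differ by only half a unit of segment length, and the later polygons have 8 to 10 sides. That is the discriminability problem: uniqueness in the data structure does not mean the lines can be told apart on a plot.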

jemshit · 2019-03-30T18:00:25Z

If anybody wants to add more dash styles to the 6 defaults:

dash_styles = ["",
               (4, 1.5),
               (1, 1),
               (3, 1, 1.5, 1),
               (5, 1, 1, 1),
               (5, 1, 2, 1, 2, 1),
               (2, 2, 3, 1.5),
               (1, 2.5, 3, 1.2)]

sns.relplot(...,  dashes=dash_styles,...)

Each style tuple must have an even number of elements, alternating (segment, gap) lengths.
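A small sketch of how the list above might be checked and tied to specific style levels before plotting. `build_dash_map` is a hypothetical helper, not part of seaborn, and the level names A to H are borrowed from the original report.

```python
dash_styles = ["",
               (4, 1.5),
               (1, 1),
               (3, 1, 1.5, 1),
               (5, 1, 1, 1),
               (5, 1, 2, 1, 2, 1),
               (2, 2, 3, 1.5),
               (1, 2.5, 3, 1.2)]


def build_dash_map(levels, styles):
    """Map each style level to its own dash pattern, validating as we go."""
    levels = list(levels)
    if len(styles) < len(levels):
        raise ValueError(f"Need {len(levels)} dash styles, got {len(styles)}")
    for style in styles:
        # "" means a solid line; tuples alternate (segment, gap) lengths.
        if style != "" and len(style) % 2 != 0:
            raise ValueError(f"Dash style {style!r} must have an even number of elements")
    return dict(zip(levels, styles))


dash_map = build_dash_map(["A", "B", "C", "D", "E", "F", "G", "H"], dash_styles)
print(dash_map)
```

Since the `dashes` parameter is documented as accepting a dictionary as well as a list, the result could presumably be passed as `dashes=dash_map`, which makes the level-to-pattern pairing explicit instead of depending on list order.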

ventilator · 2019-04-05T12:44:04Z

sns.scatterplot with a dataframe that has more than 6 columns of data has the same problem: it runs out of markers.
A workaround that provides 15 markers is to define
filled_markers = ('o', 'v', '^', '<', '>', '8', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')
and pass it as the markers argument:
sns.scatterplot(data=df, markers=filled_markers)
Source: https://matplotlib.org/2.0.2/api/markers_api.html

craymichael · 2019-06-28T03:52:04Z

I think the error message is also a little unintuitive. Perhaps it could be changed to something clearer, e.g.

err = "These `style` levels do not have defaults {}: {}"

instead of

err = "These `style` levels are missing {}: {}"

At first, I thought the categorical column values were missing dashes in the strings, which was quite confusing.

Another option is to wrap around the dash_styles (e.g. index 6 reuses style 0, index 7 reuses style 1, etc.), as long as a UserWarning is issued.
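The wrap-around idea could look roughly like the sketch below. It is only an illustration of the proposal, not behavior seaborn implements; `cycled_dashes` is a hypothetical name, and the six patterns are assumed to be the defaults discussed earlier in the thread.

```python
import warnings
from itertools import cycle

# Assumed to be the six default patterns ("" means a solid line).
DEFAULT_DASHES = ["", (4, 1.5), (1, 1), (3, 1, 1.5, 1), (5, 1, 1, 1), (5, 1, 2, 1, 2, 1)]


def cycled_dashes(levels, dash_styles=DEFAULT_DASHES):
    """Assign dash styles to levels, reusing styles from the start when they run out."""
    levels = list(levels)
    if len(levels) > len(dash_styles):
        warnings.warn(
            f"{len(levels)} style levels but only {len(dash_styles)} dash "
            "styles; patterns will repeat.",
            UserWarning,
        )
    # zip stops once every level has a style, so cycle() never runs forever.
    return dict(zip(levels, cycle(dash_styles)))


dash_map = cycled_dashes(["A", "B", "C", "D", "E", "F", "G", "H"])
print(dash_map["G"], dash_map["H"])
```

Here "G" ends up sharing the solid line with "A" and "H" shares the first dashed pattern with "B", so the two pairs can only be told apart by another semantic such as hue.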

alemol · 2020-02-07T21:36:53Z

> sns.scatterplot with a dataframe with more than 6 columns of data has the same problem, it runs out of markers.
> A workaround for 15 markers is to define
> filled_markers = ('o', 'v', '^', '<', '>', '8', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')
> and set markers to filled_markers
> sns.scatterplot(data=df, markers=filled_markers)
> Source: https://matplotlib.org/2.0.2/api/markers_api.html

I tried this but it didn't work (for lineplot):

ax = sns.lineplot(x="iteration", y="kappa", hue="r_state", style=True, markers=filled_markers, data=df)

produced:

Even worse,

ax = sns.lineplot(x="iteration", y="kappa", hue="r_state", style="r_state", markers=filled_markers, data=df)
causes:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-80-ee157b4ed38e> in <module>()
      6 ax = sns.lineplot(x="iteration", y="kappa",
      7                   hue="r_state", style="r_state", markers=filled_markers,
----> 8                   data=df)
      9 
     10 plt.legend(labels=['random initialization {}'.format(i) for i in range(n_randstates)])

3 frames
/usr/local/lib/python3.6/dist-packages/seaborn/relational.py in style_to_attributes(self, levels, style, defaults, name)
    307             if any(missing_levels):
    308                 err = "These `style` levels are missing {}: {}"
--> 309                 raise ValueError(err.format(name, missing_levels))
    310 
    311         return attrdict

ValueError: These `style` levels are missing dashes: {8, 9, 6, 7}

Any other ideas?
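Note that the error above complains about missing dashes, not markers: with style="r_state", lineplot tries to assign a dash pattern to every level as well, and only markers were supplied. One possible way around it is to switch dashes off, as in the first post, while keeping the longer marker list. The sketch below only assembles the keyword arguments; `style_kwargs` is a hypothetical helper, and the ten integer levels 0 to 9 are an assumption read off the error message.

```python
filled_markers = ('o', 'v', '^', '<', '>', '8', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')


def style_kwargs(levels, markers):
    """Build lineplot keyword arguments: one marker per level, no dashes."""
    levels = list(levels)
    if len(levels) > len(markers):
        raise ValueError(f"Need {len(levels)} markers, got {len(markers)}")
    return {"markers": dict(zip(levels, markers)), "dashes": False}


# Assumption: r_state takes the integer values 0..9.
kwargs = style_kwargs(range(10), filled_markers)
print(kwargs)
```

These could then be passed along as sns.lineplot(x="iteration", y="kappa", hue="r_state", style="r_state", data=df, **kwargs).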

melvis02 · 2020-03-25T19:18:26Z

> Any other ideas?

For sns.lineplot, I'm just dynamically disabling markers if there would be more than six; example:

ax = sns.lineplot(x="iteration", y="kappa", hue="r_state", style="r_state", markers=len(df['r_state'].drop_duplicates()) <= 6, data=df)

mwaskom mentioned this issue Jul 23, 2018

Can't handle more than 6 styles #1511

Closed

mwaskom mentioned this issue May 15, 2020

Programmatically define arbitrarily large style mappings #2075

Merged

mwaskom closed this as completed in #2075 May 16, 2020
