
Default style for more than 6 data series #1513

Closed
stevenwong opened this issue Jul 23, 2018 · 8 comments · Fixed by #2075


@stevenwong

stevenwong commented Jul 23, 2018

Hi all,

I'm using the lineplot example, but extended to 8 series:

import numpy as np
import pandas as pd
import seaborn as sns

rs = np.random.RandomState(365)
values = rs.randn(365, 8).cumsum(axis=0)
dates = pd.date_range("1 1 2016", periods=365, freq="D")
data = pd.DataFrame(values, dates, columns=["A", "B", "C", "D", "E", "F", "G", "H"])
data = data.rolling(7).mean()

sns.lineplot(data=data, palette="tab10", linewidth=2.5)

I get this exception, which seems to indicate that I need to specify styles for data series beyond the 6th.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-272055c27972> in <module>()
----> 1 sns.lineplot(data=data, palette="tab10", linewidth=2.5)

C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in lineplot(x, y, hue, size, style, data, palette, hue_order, hue_norm, sizes, size_order, size_norm, dashes, markers, style_order, units, estimator, ci, n_boot, sort, err_style, err_kws, legend, ax, **kwargs)
   1076         dashes=dashes, markers=markers, style_order=style_order,
   1077         units=units, estimator=estimator, ci=ci, n_boot=n_boot,
-> 1078         sort=sort, err_style=err_style, err_kws=err_kws, legend=legend,
   1079     )
   1080

C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in __init__(self, x, y, hue, size, style, data, palette, hue_order, hue_norm, sizes, size_order, size_norm, dashes, markers, style_order, units, estimator, ci, n_boot, sort, err_style, err_kws, legend)
    670         self.parse_hue(plot_data["hue"], palette, hue_order, hue_norm)
    671         self.parse_size(plot_data["size"], sizes, size_order, size_norm)
--> 672         self.parse_style(plot_data["style"], markers, dashes, style_order)
    673
    674         self.units = units

C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in parse_style(self, data, markers, dashes, order)
    492
    493             dashes = self.style_to_attributes(
--> 494                 levels, dashes, self.default_dashes, "dashes"
    495             )
    496

C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in style_to_attributes(self, levels, style, defaults, name)
    303             if any(missing_levels):
    304                 err = "These `style` levels are missing {}: {}"
--> 305                 raise ValueError(err.format(name, missing_levels))
    306
    307         return attrdict

ValueError: These `style` levels are missing dashes: {'H', 'G'}

Is there a way to automatically set style for large datasets?

[Edit] Seems like it's because relational.py:30 defines 6 default dashes. Setting sns.lineplot(data=data, dashes=False) fixes the issue.

@mwaskom
Owner

mwaskom commented Jul 23, 2018

So there are a few related but distinct questions here:

  1. Should there be more than 6 default dash styles?
  2. What should happen when there are more style levels than dash patterns?
  3. Given the constraints on the style semantic, should it be applied by default to "wide-form" data (where you haven't explicitly asked for it)?

This kind of issue is what I was talking about in the release notes when I said "default behavior may change", because I'm interested in hearing what people find surprising or annoying.

My current thoughts are

  1. I don't think there have to be exactly 6, but that's about the number that I felt could be reliably distinguished in a variety of plots. Users can specify larger dash sets that are tailored to their specific visualizations, but I'm not wild about having defaults that don't really provide useful information.
  2. Things in matplotlib cycle, but things in seaborn generally don't (e.g. with hue you'll always get unique colors). #1511 ("Can't handle more than 6 styles") suggested cycling dash and marker sets of different lengths to get a relatively large number of unique combinations. That's a clever suggestion, but I'm not sure it's the best approach: for most datasets the dashes and markers go together, and it might be confusing that at some point they become independent. So I'm open to better ideas, but it seemed better to start with the most disruptive response (raising an exception) and possibly scale back later than to go in the opposite direction.
  3. This is just a balance between "by default, make maximally accessible plots" and "by default, try not to raise in a confusing way". I'm ambivalent and could be persuaded either way. Unfortunately, the way the functions work makes it a little difficult to defer deciding whether there should be a style semantic until we know how many style levels are needed.

@stevenwong
Author

Apologies, I didn't notice the other issue. For my purposes I didn't need dashes, so I can just turn them off, but I think a more informative error message would be nice at the very least.

Rather than "randomly" cycling through dashes/markers, what if you defined a more predictable pattern? Co-vary the two by some amount? Or turn dashes off by default, so that users won't be surprised by an error if they are just trying out a generic plot.

@mwaskom
Owner

mwaskom commented Jul 24, 2018

Rather than "randomly" cycling through dash/markers, what if you define a more predictable pattern? Co-vary the two by some amount?

It's possible in principle but I'm not convinced it's a good idea. So e.g. you can programmatically generate markers with an arbitrary number of sides. But it's really hard to discriminate exactly how many sides the markers have above ~5. And once you get above ~7 it's very hard to tell that it's a polygon and not a circle. Similarly, you could programmatically generate dashes with slightly longer and longer solid segments, but for most plots it will be impossible to tell which is which.

These points depend on things like size and density, so it's possible to make a custom plot that works. But I don't think it's a great approach to defaults.
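To make the generated-styles idea concrete, here is a sketch that builds regular-polygon markers with more and more sides and dash patterns with longer and longer solid segments. The helpers `polygon_markers` and `growing_dashes` are hypothetical, not seaborn API. The marker tuples use matplotlib's `(numsides, style, angle)` form, where style `0` means a regular polygon. Saving the figure (e.g. with `fig.savefig`) is a quick way to judge by eye how distinguishable the higher levels really are.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def polygon_markers(n, start_sides=3):
    """Hypothetical helper: n regular-polygon markers with 3, 4, 5, ... sides.

    matplotlib accepts a marker tuple (numsides, style, angle); style 0 is a
    regular polygon.
    """
    return [(start_sides + i, 0, 0) for i in range(n)]


def growing_dashes(n, gap=1.5):
    """Hypothetical helper: (on, off) dash patterns whose solid segment grows
    by one point per level while the gap stays fixed."""
    return [(1 + i, gap) for i in range(n)]


fig, ax = plt.subplots(figsize=(4, 4))
for i, (marker, dashes) in enumerate(zip(polygon_markers(8), growing_dashes(8))):
    # Level i gets a (3 + i)-sided marker and a dash segment of (1 + i) points.
    ax.plot([0, 1, 2], [i, i, i], marker=marker, dashes=dashes, markersize=10)
```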

@jemshit

jemshit commented Mar 30, 2019

If somebody is trying to add more styles beyond the 6 default ones:

dash_styles = ["",
               (4, 1.5),
               (1, 1),
               (3, 1, 1.5, 1),
               (5, 1, 1, 1),
               (5, 1, 2, 1, 2, 1),
               (2, 2, 3, 1.5),
               (1, 2.5, 3, 1.2)]

sns.relplot(..., dashes=dash_styles, ...)

Each dash tuple must have an even number of elements (alternating segment and gap lengths).

@ventilator

sns.scatterplot with a dataframe with more than 6 columns of data has the same problem: it runs out of markers.
A workaround for 15 markers is to define
filled_markers = ('o', 'v', '^', '<', '>', '8', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')
and set markers to filled_markers
sns.scatterplot(data=df, markers=filled_markers)
Source: https://matplotlib.org/2.0.2/api/markers_api.html
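Here is a runnable version of this workaround. It is a sketch that assumes a recent seaborn and a made-up wide-form `df` with 10 numeric columns. In wide-form mode each column becomes its own `style` level, so the marker list needs at least as many entries as there are columns. The sketch slices the list to exactly that length. Sticking to filled markers also matters, because seaborn's scatterplot may refuse to mix filled and line-art markers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# The 15 filled marker codes from the comment above.
filled_markers = ('o', 'v', '^', '<', '>', '8', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

# Made-up wide-form data: 20 rows x 10 columns, i.e. 10 style levels.
rs = np.random.RandomState(0)
df = pd.DataFrame(rs.randn(20, 10), columns=[f"c{i}" for i in range(10)])

# One marker per column; slicing avoids passing more markers than levels.
ax = sns.scatterplot(data=df, markers=list(filled_markers[: df.shape[1]]))
```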

@craymichael
Contributor

I think the error message is also a little unintuitive. Perhaps it could be changed to something clearer, e.g.

err = "These `style` levels do not have defaults {}: {}"

instead of

err = "These `style` levels are missing {}: {}"

At first, I thought the categorical column values were missing dashes in the strings, which was quite confusing.

An alternative would be to wrap around the dash styles (e.g. index 6 becomes 0, 7 becomes 1, etc.), as long as a UserWarning is issued.
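A minimal sketch of that wrap-around idea, using a hypothetical helper `map_styles_with_cycle` that is not part of seaborn. Levels are zipped against an endlessly cycled list of defaults, so the 7th level reuses the 1st pattern. A `UserWarning` is issued whenever any value has to be reused. The six patterns in the usage line are the first six from @jemshit's list above.

```python
import itertools
import warnings


def map_styles_with_cycle(levels, defaults, name):
    """Hypothetical helper (not seaborn's API): map each style level to a
    default value, wrapping around when there are more levels than defaults.

    Level index 6 reuses defaults[0], index 7 reuses defaults[1], and so on.
    """
    levels = list(levels)
    if len(levels) > len(defaults):
        warnings.warn(
            f"{len(levels)} `style` levels but only {len(defaults)} default "
            f"{name}; some {name} will be reused",
            UserWarning,
        )
    # itertools.cycle is infinite, so zip stops once every level is mapped.
    return dict(zip(levels, itertools.cycle(defaults)))


# Usage: 8 levels against 6 dash patterns (warns once).
six_dashes = ["", (4, 1.5), (1, 1), (3, 1, 1.5, 1), (5, 1, 1, 1), (5, 1, 2, 1, 2, 1)]
mapping = map_styles_with_cycle("ABCDEFGH", six_dashes, "dashes")
# mapping["G"] == "" and mapping["H"] == (4, 1.5)
```

Whether reuse should warn or raise is exactly the trade-off discussed earlier in the thread. This sketch only shows the mechanics of the warning variant.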

@alemol

alemol commented Feb 7, 2020

I tried @ventilator's filled_markers workaround, but it didn't work (for lineplot):

ax = sns.lineplot(x="iteration", y="kappa", hue="r_state", style=True, markers=filled_markers, data=df)

produced:

[screenshot of the resulting plot, not preserved]

Even worse,

ax = sns.lineplot(x="iteration", y="kappa", hue="r_state", style="r_state", markers=filled_markers, data=df)
causes:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-80-ee157b4ed38e> in <module>()
      6 ax = sns.lineplot(x="iteration", y="kappa",
      7                   hue="r_state", style="r_state", markers=filled_markers,
----> 8                   data=df)
      9 
     10 plt.legend(labels=['random initialization {}'.format(i) for i in range(n_randstates)])

3 frames
/usr/local/lib/python3.6/dist-packages/seaborn/relational.py in style_to_attributes(self, levels, style, defaults, name)
    307             if any(missing_levels):
    308                 err = "These `style` levels are missing {}: {}"
--> 309                 raise ValueError(err.format(name, missing_levels))
    310 
    311         return attrdict

ValueError: These `style` levels are missing dashes: {8, 9, 6, 7}

Any other ideas?

@melvis02


For sns.lineplot, I'm just dynamically disabling markers if there would be more than six; example:

ax = sns.lineplot(x="iteration", y="kappa", hue="r_state", style="r_state", markers=len(df['r_state'].drop_duplicates()) <= 6, data=df)
