custom error bars for factorplot/bar plot? #question #enhancement #331

Closed
szeitlin opened this issue Oct 15, 2014 · 19 comments
Comments

@szeitlin

Missing from the tutorial/documentation, not sure if this is doable or how to do it:
dataframe -> factor plot #works great

How do I get it to use one of my other columns containing the errors (calculated elsewhere)?

Tried following matplotlib instructions for 'tweaking', doesn't work. Can't find good examples.

@mwaskom
Owner

mwaskom commented Oct 16, 2014

This isn't functionality that is currently possible to do in any easy way, and it would be enough of a departure from the way things currently work that it isn't likely to be added any time soon. Of course, there is plt.errorbar and plt.bar has a yerr argument which should do what you want.
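A minimal sketch of the plt.bar route mentioned above, assuming the means and errors have already been computed elsewhere. The category names and numbers are made up for illustration, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical precomputed summary: one mean and one error value per category.
categories = ["A", "B", "C"]
means = [2.4, 1.7, 3.1]
errors = [0.3, 0.2, 0.5]

fig, ax = plt.subplots()
positions = list(range(len(categories)))
# yerr draws symmetric error bars of +/- errors around each bar height.
bars = ax.bar(positions, means, yerr=errors, capsize=4)
ax.set_xticks(positions)
ax.set_xticklabels(categories)
ax.set_ylabel("mean")
```

Calling `import seaborn` and `seaborn.set()` beforehand should apply seaborn's styling to a plot like this, which may address some of the aesthetic objections to plain matplotlib.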

@szeitlin
Author

Ah, that's what I suspected.

I have tried plt.errorbar and plt.bar... the truth is, I hate
matplotlib. It's just UGLY.

Anyway thanks for the fast reply! I really like seaborn.

Sam

@mwaskom
Owner

mwaskom commented Oct 16, 2014

Just out of curiosity is there some major difference in the way you're calculating your error bars that can't be covered by what's in factorplot?

@szeitlin
Author

No, I don't think so. I just happened to have a situation where I needed
to collapse a few categories into one (some genius made this table so it
had '0', 'None' and 'Unknown' as categories in the same column!).

And I already had calculated the descriptive statistics using the
revised list of categories, so what I ended up with was a small df with
something like ('category names', 'means', 'errors') and I just want to
plot that.

Anyway, I think I can make the plot I want, I just have to re-factor the
categories in the original dataframe, which is not a big deal.

Sam
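The re-factoring step described above might look roughly like the following sketch. The column names and values are hypothetical, and using the standard error of the mean for the 'errors' column is an assumption, since the thread does not say how the errors were calculated.

```python
import pandas as pd

# Hypothetical raw table in which three labels all mean "missing".
df = pd.DataFrame({
    "category": ["0", "None", "Unknown", "low", "high", "low"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Map the redundant labels onto a single category before plotting.
df["category"] = df["category"].replace({"0": "Unknown", "None": "Unknown"})

# Small summary table of the kind described: category names, means, errors.
# "sem" (standard error of the mean) is an assumed choice of error measure.
summary = (
    df.groupby("category")["value"]
    .agg(means="mean", errors="sem")
    .reset_index()
)
```

Once the categories are cleaned in the original dataframe, it can be passed to factorplot directly and the built-in error bars take over.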

@mwaskom mwaskom closed this as completed Oct 17, 2014
@linbug

linbug commented Mar 22, 2015

Hi Michael,

Related to this question, as I understand it the confidence intervals for factorplot and barplot are calculated using a bootstrapping procedure, and are therefore suitable for use with small or nonparametric datasets. Is this correct? (apologies if this is stated somewhere, but I can't find the method for their generation explicitly written in the docs)

Thanks,
Lin

@mwaskom
Owner

mwaskom commented Mar 22, 2015

Correct, all the confidence intervals are computed using the bootstrap.
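For readers unfamiliar with the technique, here is a generic percentile bootstrap for the mean, written with the standard library only. It is a sketch of the general idea and is not seaborn's own implementation; the function name, defaults and sample values are made up for illustration.

```python
import random
import statistics

def bootstrap_ci(values, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Keep the central `ci` percent of the sorted bootstrap means.
    alpha = (100 - ci) / 2 / 100
    lo = boot_means[int(alpha * n_boot)]
    hi = boot_means[int((1 - alpha) * n_boot) - 1]
    return lo, hi

sample = [2.1, 2.5, 1.9, 2.8, 2.3, 2.2, 2.6]
low, high = bootstrap_ci(sample)
```

Because the interval is read off the resampled means rather than derived from a formula, no distributional assumption is needed, though the resamples can only ever contain values that were actually observed.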

@alkalait

I see two ways of using precomputed errors with seaborn.barplot, by plotting the errorbars directly onto the existing axis:

import matplotlib.pyplot as plt
plt.errorbar(x=x, y=y, fmt='none', xerr=error, ecolor='k', elinewidth=2)

or

seaborn.barplot(..., **{'xerr':error})

In the former, the fmt='none' option draws just the errorbars. The latter passes arguments to the underlying plt.bar caller, but is less flexible, e.g. does not accept an elinewidth parameter.

The only aesthetic difference I can see from a pure seaborn solution is the round capstyle of the error bar lines. errorbar returns the bar lines as LineCollection objects, but they don't seem to store/control capstyle information.

@mwaskom
Owner

mwaskom commented Aug 25, 2015

It's redundant to use the ** operator and a dictionary, you can just do xerr=error. This won't work for nested bars, by the way.

@nileracecrew

One use case for custom errors is large datasets where bootstrap confidence intervals take a long time to compute, and sensible CIs can be computed by other means.

@buddrball

This thread was very helpful. Thank you, everyone. Out of curiosity, in my field we frequently have very few replicates (n = 3-5) because of the difficulty/expense per replicate. How well does bootstrapping work for representing the error of very small datasets? I typically use standard deviation rather than standard error to avoid overstating the significance of my data. Does anyone have a recommendation on how best to use bootstrapping in this case? Thanks for your time. And Seaborn rocks.

@mwaskom
Owner

mwaskom commented Jan 17, 2017

If you have an N = 3-5, why not just show all the datapoints?

@domenico-somma

Hi. Thanks for seaborn. I love it.

I have an issue with the custom error bars
I have a dataframe like:
df

n   gene_short_name  Sample    Average   CI
0   IL6              Sample 1  0.000000  0.000000
1   IL17RC           Sample 1  2.409620  0.756890
2   CCL15            Sample 1  0.000000  0.000000
3   IL6              Sample 2  1.000000  0.900000
4   IL17RC           Sample 2  2.556600  0.781060
5   CCL15            Sample 2  0.000000  0.000000
6   IL6              Sample 3  0.029063  0.029063
7   IL17RC           Sample 3  2.093810  0.682930
8   CCL15            Sample 3  0.000000  0.000000
9   IL6              Sample 4  0.016800  0.016800
10  IL17RC           Sample 4  1.727750  0.594910
11  CCL15            Sample 4  1.000000  0.800000

And I can plot easily with
g = sns.factorplot(x="Sample", y="Average", hue="gene_short_name", data=df)

If I understood correctly, there is not an easy way to plot the error bars.

Here I found that a possible workaround could be:

  • Find the x, y coordinates for each point:
    x_coords = []
    y_coords = []
    for point_pair in ax.collections:
        for x, y in point_pair.get_offsets():
            x_coords.append(x)
            y_coords.append(y)
  • Calculate the type of error to plot as the error bars. Make sure the order is the same as the order in which the points were looped over:
    errors = tips.groupby(['smoker', 'sex']).std()['tip']
    colors = ['steelblue']*2 + ['coral']*2
    ax.errorbar(x_coords, y_coords, yerr=errors,
                ecolor=colors, fmt=' ', zorder=-1)

But I am not sure about the order. Is it the Reverse Alphabetical Order for hue?

I tried plt.errorbar as explained here, but of course it didn't work:

g = sns.FacetGrid(df, col="gene_short_name")
g.map(plt.errorbar, "Sample", "Average", yerr="CI")

ValueError: could not convert string to float: 'Sample 4'

Am I missing something? Any suggestion?
Thanks
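One possible way around the string-to-float error, sketched in plain matplotlib rather than through FacetGrid: map the "Sample" labels to numeric positions first, then call errorbar once per gene. The dataframe reproduces the table above; the variable names `positions` and `containers` are just illustrative, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The summary table from the comment above.
df = pd.DataFrame({
    "gene_short_name": ["IL6", "IL17RC", "CCL15"] * 4,
    "Sample": [f"Sample {i}" for i in range(1, 5) for _ in range(3)],
    "Average": [0.0, 2.40962, 0.0, 1.0, 2.5566, 0.0,
                0.029063, 2.09381, 0.0, 0.0168, 1.72775, 1.0],
    "CI": [0.0, 0.75689, 0.0, 0.9, 0.78106, 0.0,
           0.029063, 0.68293, 0.0, 0.0168, 0.59491, 0.8],
})

samples = list(df["Sample"].unique())  # category order on the x axis
positions = {name: i for i, name in enumerate(samples)}

fig, ax = plt.subplots()
containers = {}
for gene, sub in df.groupby("gene_short_name", sort=False):
    # Convert the string labels to numeric x positions before plotting.
    x = sub["Sample"].map(positions).to_numpy()
    containers[gene] = ax.errorbar(
        x, sub["Average"].to_numpy(), yerr=sub["CI"].to_numpy(),
        marker="o", capsize=3, label=gene,
    )
ax.set_xticks(range(len(samples)))
ax.set_xticklabels(samples)
ax.set_ylabel("Average")
ax.legend(title="gene_short_name")
```

Because each gene's errors are taken from the same rows as its averages, this avoids having to guess the order in which seaborn drew the hue levels.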

@m-beau

m-beau commented Dec 5, 2018

@mwaskom

It's redundant to use the ** operator and a dictionary, you can just do xerr=error. This won't work for nested bars, by the way.

Indeed, I cannot feed in something like yerr=[1,2,1,1] to a barplot displaying 4 bars for instance. Has this issue been addressed? It is very sad to be forced to use matplotlib to plot a barplot with pre-calculated errorbars.

@mwaskom
Owner

mwaskom commented Dec 5, 2018

That doesn’t sound sad at all. That sounds like using the correct tool for its intended purpose.

@m-beau

m-beau commented Dec 5, 2018

Sorry but I do not understand, why wouldn't seaborn (or actually pyplot if I get it right) barplot function argument 'yerr' be the correct tool to plot custom nested error bars?

I simply meant that seaborn is much much nicer than matplotlib, hence it is sad to have to use matplotlib to get custom error bars, I was criticizing matplotlib rather than your great module :)

@loic001

loic001 commented Dec 5, 2019

@mwaskom That doesn’t sound sad at all. That sounds like using the correct tool for its intended purpose.

My opinion is that seaborn is a (general-purpose) visualization tool, so it should not be up to seaborn to calculate the confidence intervals. It's practical to be able to do so, and that is a plus, but above all it should be able to display the given data. No?

@romanwerpachowski
Contributor

romanwerpachowski commented Aug 5, 2020

If the confidence interval is symmetric, the data can be massaged pretty easily to work around the lack of this feature:

data_up = data.copy()
data_down = data.copy()
data_up["value"] = data_up["value"] + data_up["error"]
data_down["value"] = data_down["value"] - data_down["error"]
data = pd.concat([data_up, data_down])
sns.barplot(x="category", y="value", hue="group", data=data)

And Bob's your uncle.

Repository owner locked as resolved and limited conversation to collaborators Aug 5, 2020
@mwaskom
Owner

mwaskom commented Aug 5, 2020

I would really encourage everyone who wants to add errorbars from prespecified error values to use matplotlib, a tool that handles this use case well, rather than trying to hack functionality from seaborn that is intended to do something different.
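As a rough illustration of the matplotlib route for the nested (grouped) bar case raised earlier in the thread, the sketch below offsets one set of bars per group and passes the precomputed errors through yerr. The group names and all numbers are hypothetical, and the 0.8 cluster width is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

samples = ["Sample 1", "Sample 2", "Sample 3", "Sample 4"]
# Hypothetical precomputed means and error half-widths, one list per group.
means = {"control": [1.2, 1.5, 1.1, 1.8], "treated": [2.4, 2.6, 2.1, 1.7]}
errors = {"control": [0.2, 0.3, 0.1, 0.4], "treated": [0.8, 0.8, 0.7, 0.6]}

width = 0.8 / len(means)  # split each category slot among the groups
fig, ax = plt.subplots()
containers = []
for i, group in enumerate(means):
    # Centre the cluster of bars on the integer tick position.
    offset = (i - (len(means) - 1) / 2) * width
    x = [pos + offset for pos in range(len(samples))]
    containers.append(
        ax.bar(x, means[group], width=width,
               yerr=errors[group], capsize=3, label=group)
    )
ax.set_xticks(range(len(samples)))
ax.set_xticklabels(samples)
ax.legend()
```

For asymmetric intervals, yerr also accepts a pair of sequences giving the lower and upper extents separately.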

@mwaskom
Owner

mwaskom commented Jan 23, 2021

#2407
