-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider adding support for repeated-measures error bars/ CIs (following Morey, 2008) #3723
Comments
This seems a bit specific to your domain. Is there anything preventing you from passing a callable to the errorbar parameter, which allows you to define how the error is computed ? |
My impression is that this would be a beneficial feature to almost also social science domains (and any clinical work that compares post-treatments to baselines), but I might be wrong. I looked into customizable error bar callables using the old API (I haven't migrated yet), but at least in the documentation it appears that the callables should only expect a 1D vector of data, and thus wouldn't be able to make adjustments that require richer vectors with information like a individual identifier. However, it's possible I've just been looking in the wrong place and/or haven't understood how flexible the callables can be. Here's an example that illustrates the stroop example above - if someone were able to provide a flexible solution to it, I'd be very happy to withdraw the issue!
|
Hm, indeed I have no obvious solution here (besides doing the statistical work outside of seaborn). I could probably whip up something for the objects interface though (which is probably the way to go; I believe that the changes to the old interface would need to be somewhat heavy for this to work). |
I've made plots like these before, IIRC the upstream data transform can be done in a pandas one-liner; I'm not sure why the linked stackoverflow question makes it seem so complicated. But I am 👎 on adding this in seaborn; even though the data transform is straightforward, exposing a specification API that is sufficiently general for all experimental designs where you would want to use it is not. Agree it could probably be supported through a plugin stat object. Thanks for the suggestion. |
@henrymj Ok, so I built this for the object interface (it is a bit ugly, sorry) : import numpy as np
import pandas as pd
import seaborn.objects as so
from seaborn._stats.aggregation import Est
class CMEst(Est):
def __call__(
self, data, groupby, orient, scales,
):
var = {"x": "y", "y": "x"}[orient]
means_indiv = data.groupby([orient,"id_var"]).agg(np.mean)[var]
means_all = data.groupby(orient).agg(np.mean)[var]
num_cond = None
def normalize_df(df):
nonlocal num_cond
if num_cond is None:
num_cond = len(df)
new_df = df.copy()
i = df.reset_index().loc[0,[orient,"id_var"]]
partial_mean = means_indiv[i[orient],i["id_var"]]
full_mean = means_all[i[orient]]
new_df[var] = new_df[var] - partial_mean + full_mean
return new_df
data = data.groupby([orient,"id_var"],group_keys=False).apply(normalize_df)
def adjust_variance(df):
df[var] = df[var].mean() + np.sqrt(num_cond/(num_cond-1)) * (df[var] - df[var].mean())
return df
data = groupby.apply(data,adjust_variance)
return super().__call__(data, groupby, orient, scales) This is close to a drop-in replacement of so.Est(), except that you need to specify an id_var column which represents the individual you are considering. The example from your stackexchange post would look like : df = pd.read_csv("DemoWS-30x2.csv")
# Busy work for converting to long-form
df["index"] = df.index
df = pd.wide_to_long(df,[f"activation.{i}" for i in range(1,31)],i="index",j="condition",sep=".").reset_index()
df["index"] = df.index
df = pd.wide_to_long(df,"activation",i="index",j="timepoints",sep=".").reset_index()
p = (
so.Plot(data=df, x="timepoints", y="activation", color="condition")
.add(so.Band(), CMEst(),id_var="id")
#.add(so.Band(),so.Est())
.add(so.Line(marker="o"), so.Agg())
.scale(color=so.Nominal(["red","blue"]))
)
p.show() giving (there are some differences with the provided example though, I believe it might be due to the way the CI is computed, but don't hesitate to check the code as I may have made a mistake) |
Thank you so much @thuiop! I'll test out this approach. I really appreciate you taking the time to generate this - hopefully other social scientists will also benefit from this example. |
I (and many psychologists) frequently work with repeated-measure designs, where a given individual participates in multiple conditions. A classic example would be the Stroop task, where participants see trials where the word and the color of the text are the same (congruent), and when they are different (incongruent). Almost everyone is a little bit slower for incongruent trials compared to congruent trials.
To my knowledge, Seaborn does not have a way of removing individual effects from its error-bars to highlight the consistent effects across individuals, though I believe the option would help many psychologists communicate their effects (and any other field which collects multiple measures from one "individual").
Following Loftus & Masson, 1994, and Cousineau, 2005, Morey, 2008 proposed a very simple estimate of confidence intervals that correct for repeated-measures designs. Hopefully it would be an easy feature to add to Seaborn!
Here's one example showing the benefits of this correction, using an R implementation.
Morey, R. D. (2008). Confidence intervals from normalized data: A correction to Cousineau (2005). Tutorials in quantitative methods for psychology, 4(2), 61-64.
https://www.tqmp.org/RegularArticles/vol04-2/p061/p061.pdf
The text was updated successfully, but these errors were encountered: