How to use a weighted mean estimator in seaborn factor plot (incl bootstrapping)? #722

Closed
timalthoff opened this issue Oct 2, 2015 · 8 comments

Comments

@timalthoff

[Note: I am reviving a Stack Overflow question that I was unable to figure out, with some new insights on how it might work. See: http://stackoverflow.com/questions/32771520/how-to-use-a-weighted-mean-estimator-in-seaborn-factor-plot-incl-bootstrapping]

I have a dataframe where each of the rows has a certain weight which needs to be accounted for in the mean computations. I love seaborn factorplots and their bootstrapped 95% confidence intervals but haven't been able to get seaborn to accept a new weighted mean estimator.

Here is an example of what I would like to do.

import numpy as np
import seaborn as sns

tips_all = sns.load_dataset("tips")
tips_all["weight"] = 10 * np.random.rand(len(tips_all))
sns.factorplot("size", "total_bill",
               data=tips_all, kind="point")
# here I would like to have a mean estimator that computes a weighted mean
# the bootstrapped confidence intervals should also use this weighted mean estimator
# something like (tips_all["weight"] * tips_all["total_bill"]).sum() / tips_all["weight"].sum()
# but on bootstrapped samples (for the confidence interval)
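
A minimal sketch of the computation described in the comments above, written with plain NumPy rather than seaborn's internal bootstrap. The function names, the synthetic data, and the choice of a percentile interval are assumptions made for illustration; this is not seaborn's implementation. The key point is that values and weights are resampled together, row by row.

```python
import numpy as np

def weighted_mean(values, weights):
    """Weighted mean: sum(w * v) / sum(w)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return (weights * values).sum() / weights.sum()

def bootstrap_weighted_mean_ci(values, weights, n_boot=1000, ci=95, seed=None):
    """Percentile bootstrap interval for the weighted mean (hypothetical helper).

    Rows are resampled with replacement, so each value keeps its own weight.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(values)
    # One row of resampled indices per bootstrap replicate.
    idx = rng.integers(0, n, size=(n_boot, n))
    boots = (values[idx] * weights[idx]).sum(axis=1) / weights[idx].sum(axis=1)
    half = (100 - ci) / 2
    low, high = np.percentile(boots, [half, 100 - half])
    return low, high

# Synthetic stand-in for one category of the plot (e.g. one "size" group).
rng = np.random.default_rng(0)
bills = rng.normal(20.0, 5.0, size=200)
w = 10 * rng.random(200)

estimate = weighted_mean(bills, w)
low, high = bootstrap_weighted_mean_ci(bills, w, seed=1)
print(estimate, (low, high))
```

To reproduce a point plot by hand, this would be applied once per category, with the estimate as the point and the interval as the error bar.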

The problem I have is that the estimator function only gets to see the "main variable" (y axis) instead of the full dataframe that would allow the estimator to access more than just "y".
See here:

boots = bootstrap(stat_data, func=estimator,

Is there any simple way to do this?

If not, what is the easiest way to extend the categorical plotting to allow for weighted estimators?

Thanks a lot,
Tim

PS: I couldn't figure out how to set labels. My guess is "question" and "wishlist".

@mwaskom
Owner

mwaskom commented Oct 3, 2015

It's not really supported, but I think it is possible to hack together a solution. This seems to work?

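import numpy as np
import seaborn as sns

tips = sns.load_dataset("tips")
tips["weight"] = 10 * np.random.rand(len(tips))

# On Python 3, zip returns an iterator, so wrap it: list(zip(tips.tip, tips.weight))
tips["tip_and_weight"] = zip(tips.tip, tips.weight)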

def weighted_mean(x, **kws):
    val, weight = map(np.asarray, zip(*x))
    return (val * weight).sum() / weight.sum()

g = sns.factorplot("size", "tip_and_weight", data=tips,
                   estimator=weighted_mean, orient="v")
g.set_axis_labels("size", "tip")
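
To see what the estimator does with the packed column, here is a small standalone check that does not need seaborn. The toy numbers are hypothetical and not taken from the tips dataset; the estimator body is the one from the snippet above.

```python
import numpy as np

def weighted_mean(x, **kws):
    # x is a sequence of (value, weight) pairs; split it into two arrays.
    val, weight = map(np.asarray, zip(*x))
    return (val * weight).sum() / weight.sum()

# Hypothetical toy data for illustration.
values = [1.0, 2.0, 4.0]
weights = [1.0, 1.0, 2.0]
pairs = list(zip(values, weights))  # materialized as a list of tuples

result = weighted_mean(pairs)
print(result)  # (1*1 + 2*1 + 4*2) / (1 + 1 + 2) = 2.75
```

Because each pair travels as a single element, any resampling of the column keeps every value attached to its weight.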

@timalthoff
Author

Thanks Michael! It looks like this is working. I really appreciate the hacky but clever workaround.

@pkmn99

pkmn99 commented Sep 27, 2017

mwaskom's solution works when orient="v" is specified.
Otherwise, it raises an error:
"TypeError: zip argument #1 must support iteration"
This is strange...

@adam-katz-rf

The solution worked for me only after I changed the assignment to: tips["tip_and_weight"] = list(zip(tips.tip, tips.weight))

@MaozGelbart
Contributor

The above solution (which is very useful) works with v0.10.1 but not with v0.11.0, which now raises the following error:

  File "/Users/maoz/seaborn/seaborn/categorical.py", line 3714, in factorplot
    return catplot(*args, **kwargs)
  File "/Users/maoz/seaborn/seaborn/_decorators.py", line 46, in inner_f
    return f(**kwargs)
  File "/Users/maoz/seaborn/seaborn/categorical.py", line 3779, in catplot
    p.establish_variables(x_, y_, hue, data, orient, order, hue_order)
  File "/Users/maoz/seaborn/seaborn/categorical.py", line 156, in establish_variables
    orient = infer_orient(
  File "/Users/maoz/seaborn/seaborn/_core.py", line 1310, in infer_orient
    raise TypeError(nonnumeric_dv_error.format("Vertical", "y"))
TypeError: Vertical orientation requires numeric `y` variable.

MWE:

import seaborn as sns, numpy as np
tips = sns.load_dataset("tips")
tips["weight"] = 10 * np.random.rand(len(tips))
tips["tip_and_weight"] = list(zip(tips.tip, tips.weight))
def weighted_mean(x, **kws):
    val, weight = map(np.asarray, zip(*x))
    return (val * weight).sum() / weight.sum()

sns.pointplot(x="size", y="tip_and_weight", data=tips, estimator=weighted_mean, orient='v')

@fkloosterman

Another workaround that does not trigger "TypeError: Vertical orientation requires numeric `y` variable" (and thus should be compatible with v0.11.0 and higher) is to store the value and weight as the real and imaginary parts of a complex number. Something like this (tested with v0.11.2):

import seaborn as sns, numpy as np
tips = sns.load_dataset("tips")
tips["weight"] = 10 * np.random.rand(len(tips))
tips["tip_and_weight"] = [ v + w*1j for v,w in zip(tips.tip, tips.weight)]
def weighted_mean(x, **kws):
    return np.sum(np.real(x) * np.imag(x)) / np.sum(np.imag(x))

sns.pointplot(x="size", y="tip_and_weight", data=tips, estimator=weighted_mean, orient='v')
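
As a quick sanity check that the complex-number packing computes an ordinary weighted average, here is a sketch using only NumPy. The synthetic data and variable names are assumptions for illustration; the estimator body is the one from the snippet above.

```python
import numpy as np

def weighted_mean(x, **kws):
    # Real part holds the value, imaginary part holds the weight.
    return np.sum(np.real(x) * np.imag(x)) / np.sum(np.imag(x))

# Synthetic values and weights, standing in for tips.tip and tips.weight.
rng = np.random.default_rng(0)
values = rng.normal(3.0, 1.0, size=50)
weights = 10 * rng.random(50)

packed = values + 1j * weights  # one complex number per observation
estimate = weighted_mean(packed)
expected = np.average(values, weights=weights)
print(np.isclose(estimate, expected))
```

The trick relies on the values being real-valued, since the imaginary part is taken over by the weights.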

@mwaskom
Owner

mwaskom commented Apr 8, 2022

Oh that’s super clever!

@IzaakC

IzaakC commented Sep 19, 2022

Will this be a supported option in the future, without the need for the hacky workaround?
