sns.lmplot x_estimator / logistic estimation speed #347
I'm a long-time
You can see more details on my blog here, but the short story is: plotting a summary of binomial data, with summarised x values, some logistic regression lines, and a facet_wrap by subject (
I'm afraid I haven't done extensive testing so I can't tell you what specifically is slow (bootstrapping? logistic model fitting?). I wanted to raise it here because I couldn't see anyone talking about this on SO or here.
Are there plans to optimise the backend you're using? If someone wanted to help, where would they start?
The R (ggplot) code I compare it to is also bootstrapping (at least for the data points). The line
produces points that show the data mean, and does 1000 bootstrap iterations to compute 95% confidence intervals. So it's not that R is computing analytically.
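For readers unfamiliar with what "bootstrapping the data points" involves, here is a minimal sketch of a percentile bootstrap for a mean, using only the Python standard library. It is not the code that ggplot or seaborn actually runs; the function name `bootstrap_mean_ci` and the example responses are made up for illustration.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=1000, ci=95, seed=0):
    """Return (mean, lower, upper) using a percentile bootstrap.

    Hypothetical helper for illustration only; not part of seaborn or ggplot.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_boot times and record each resample's mean.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Take the central `ci` percent of the sorted bootstrap means.
    alpha = (100 - ci) / 200
    lo_idx = round(alpha * n_boot)
    hi_idx = round((1 - alpha) * n_boot) - 1
    return statistics.fmean(values), boot_means[lo_idx], boot_means[hi_idx]


# Made-up binomial responses (1 = correct, 0 = incorrect) at a single x value.
responses = [1] * 70 + [0] * 30
mean, lo, hi = bootstrap_mean_ci(responses)
print(f"mean={mean:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

The cost scales with the number of resamples times the cost of the statistic, which is cheap for a mean but much less so for a model fit.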
Thanks for the tip on the cis.
Using the class object that is actually doing everything gives some further insight. This is on my two-year-old MacBook Air:
dat_conditioned = dat.query("subject == 'S1' and sf == 0.5")
plotter = sns.linearmodels._RegressionPlotter("log_contrast", "correct", dat_conditioned, x_estimator=np.mean, logistic=True)
This computes the point estimate and CIs for each level of conditioning (it's actually a property):
%timeit plotter.estimate_data
1 loops, best of 3: 228 ms per loop
This needs to be done 25 times to get the whole plot (5 hue levels and 5 col levels), which means the aggregating and bootstrapping for the point estimates takes about 5-6 seconds.
Bootstrapping the logistic regression takes substantially longer:
%timeit plotter.fit_regression(x_range=(-7, 0))
1 loops, best of 3: 4.2 s per loop
Fitting a logistic regression is fairly computationally expensive:
plotter = sns.linearmodels._RegressionPlotter("log_contrast", "correct", dat_conditioned, x_estimator=np.mean, logistic=True, ci=None)
%timeit plotter.fit_regression(x_range=(-7, 0))
100 loops, best of 3: 4.6 ms per loop
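To put the timings above together, here is a rough back-of-the-envelope calculation. The three timings and the count of 25 conditioning cells come from this thread. The bootstrap count of 1000 is an assumption (I believe it is the default `n_boot`), and the variable names are just for illustration.

```python
# Timings quoted in this thread, in seconds.
point_estimate_s = 0.228   # plotter.estimate_data for one hue/col cell
boot_fit_s = 4.2           # fit_regression with bootstrapped CIs
single_fit_s = 0.0046      # fit_regression with ci=None (one fit, no bootstrap)

n_cells = 25               # number of hue/col combinations in the full plot
n_boot = 1000              # ASSUMPTION: default number of bootstrap resamples

# Total time spent on point estimates and on bootstrapped fits for the plot.
points_total_s = n_cells * point_estimate_s
fits_total_s = n_cells * boot_fit_s

# If bootstrapping simply refits the model n_boot times, one bootstrapped fit
# should cost roughly n_boot single fits.
predicted_boot_fit_s = n_boot * single_fit_s

print(f"point estimates, whole plot:   {points_total_s:.1f} s")
print(f"bootstrapped fits, whole plot: {fits_total_s:.1f} s")
print(f"predicted cost of one bootstrapped fit: {predicted_boot_fit_s:.1f} s")
```

Under that assumption the predicted cost of one bootstrapped fit is close to the measured 4.2 s, which suggests the time is dominated by refitting the logistic model on each resample rather than by overhead elsewhere.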