
MemoryError in 0.7.0 #407

Open
tdhopper opened this issue Jun 13, 2020 · 1 comment
tdhopper commented Jun 13, 2020

The build for Pythonplot.com started failing with 0.7.0 trying to run

(ggplot(diamonds) +
  aes('depth', fill='cut', color='cut') +
  geom_density(alpha=0.1))

I get a memory error thrown by statsmodels which you can see in the build log.


has2k1 commented Jun 13, 2020

v0.7.0 changed the default method for computing the bandwidth for density estimation. This was in response to #317.

For the example at pythonplot.com, the difference is captured by:

from plotnine.data import diamonds
from statsmodels.nonparametric.bandwidths import bw_normal_reference as nr
from statsmodels.sandbox.nonparametric import kernels
from plotnine.stats.stat_density import nrd0

k = kernels.Gaussian()
for cut, gdf in diamonds.groupby('cut'):
    x = gdf.depth
    print(
        f'{cut}\n'
        f'normal_reference: {nr(x, k)}\n'
        f'nrd0:             {nrd0(x)}\n'
    )
Fair
normal_reference: 0.2689687852105908
nrd0:             0.2285370639409318

Good
normal_reference: 0.35873214763916517
nrd0:             0.30480708643752136

Very Good
normal_reference: 0.22284759734989557
nrd0:             0.18934887022210023

Premium
normal_reference: 0.18242428971018324
nrd0:             0.15500204430500442

Ideal
normal_reference: 0.09605607167961927
nrd0:             0.08161680389109885

where the normal_reference values are pre-v0.7.0 and the nrd0 values are v0.7.0. So far it is not clear to me why that would result in significantly more memory being used. The nrd0 bandwidth is slightly smaller for each group; that means when computing the density there are fewer points under each kernel function and therefore more kernel functions. I do not know how much memory Travis allocates; maybe that difference is enough to tip it over in this case.
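For context, nrd0 is R's `bw.nrd0` rule of thumb (Silverman's rule). A minimal sketch of the standard formula, assuming the usual definition rather than plotnine's exact implementation in `plotnine.stats.stat_density`:

```python
import numpy as np

def nrd0(x):
    """Silverman's rule-of-thumb bandwidth, as in R's bw.nrd0:
    0.9 * min(sd, IQR / 1.34) * n**(-1/5).
    A sketch; plotnine's own nrd0 may differ in edge-case handling."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sd = np.std(x, ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    lo = min(sd, iqr / 1.34)
    # Guard against a zero spread estimate, roughly as bw.nrd0 does
    if lo == 0:
        lo = sd or abs(x[0]) or 1.0
    return 0.9 * lo * n ** (-0.2)
```

Because the final factor takes the *minimum* of the two spread estimates, nrd0 can only be as large as `normal_reference` (which uses a similar constant against the standard deviation alone), which is consistent with nrd0 being smaller for every group above.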

TODO NEXT: Find out the differences in memory used for these density computations.
