
Memory Leak in BaseEstimator Class #1576

Closed
gregbolet opened this issue Oct 24, 2022 · 2 comments
gregbolet commented Oct 24, 2022

Subject of the issue (with proposed fix)

Creating multiple MaximumLikelihoodEstimator objects and calling estimate_cpd for every model node in each appears to leak memory. This is exactly what repeated calls to BayesianNetwork.fit do.

The state_counts method of the BaseEstimator class is wrapped with a decorator (https://github.com/pgmpy/pgmpy/blob/dev/pgmpy/estimators/base.py#L66):

@lru_cache(maxsize=2048)

which seems to be the culprit. When I remove it, the memory leak goes away. I haven't dug deeper into the issue, but it is something worth keeping in mind when training models repeatedly.
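For context (this is my understanding of the mechanism, not verified against pgmpy internals): functools.lru_cache on an instance method includes self in the cache key, so the class-level cache keeps a strong reference to every estimator instance that ever called the method, along with whatever data those instances hold. A minimal sketch with a hypothetical Estimator class:

```python
import functools
import gc
import weakref

class Estimator:
    def __init__(self, data):
        self.data = data  # stands in for a large training DataFrame

    @functools.lru_cache(maxsize=2048)
    def state_counts(self, variable):
        # `self` is part of the cache key, so every call pins this
        # instance (and everything it references) inside the cache.
        return variable

est = Estimator([0] * 1_000_000)
est.state_counts("a")
ref = weakref.ref(est)
del est
gc.collect()
print(ref() is not None)  # True: the class-level cache still holds the instance
```

With maxsize=2048, up to 2048 such entries (each pinning an instance and its data) can accumulate before any eviction happens.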

Your environment

  • Google Colab Notebook
  • PGMPY version: pgmpy-0.1.20
  • Python version: 3.7.15
  • Operating System: Ubuntu 18.04.6 LTS (Bionic Beaver)
  • uname -a output: Linux 5.10.133+

Steps to reproduce

Run the following code; you should see memory usage slowly climb.

import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator

data = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

# Connect each of the first three columns to each of the remaining columns
cols = list(data.columns)
edgelist = [(a, b) for a in cols[0:3] for b in cols[3:] if a != b]

model = BayesianNetwork()
model.add_nodes_from(cols)
model.add_edges_from(edgelist)

for i in range(1000):
    model.fit(data, MaximumLikelihoodEstimator)
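The same growth pattern can be reproduced without pgmpy at all (a sketch; Model here is a stand-in, not pgmpy code): because the lru_cache lives on the class, every freshly created instance adds new cache entries that outlive the instance itself.

```python
import functools

class Model:
    @functools.lru_cache(maxsize=2048)
    def count(self, key):
        return key * 2

models = []
for i in range(100):
    m = Model()     # fresh estimator, as in each model.fit call
    m.count(i)      # inserts (m, i) into the shared class-level cache
    models.append(m)

models.clear()      # drop our references...
# ...but all 100 instances are still alive inside the cache:
print(Model.count.cache_info().currsize)  # 100
```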

Expected behaviour

Repeatedly calling model.fit with the MLE should not consume memory indefinitely: the model is re-trained from scratch each time, so old data should not persist.

Actual behaviour

During the 1000 iterations, additional RAM is steadily consumed and never freed.

gregbolet (Author) commented:
Looks like this might be a similar issue: limix/glimix-core#15

@gregbolet gregbolet reopened this Oct 24, 2022
@ankurankan ankurankan added the Bug label Oct 24, 2022
ankurankan (Member) commented:

Fixed for now: caching has been disabled, as it wasn't improving performance by much.
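On versions that still carry the cache, a possible workaround (my sketch, using a stand-in Estimator class rather than pgmpy itself) is to call cache_clear(), which functools.lru_cache attaches to every wrapped function, between fits:

```python
import functools
import gc
import weakref

class Estimator:
    @functools.lru_cache(maxsize=2048)
    def state_counts(self, variable):
        return variable

est = Estimator()
est.state_counts("a")
ref = weakref.ref(est)
del est
gc.collect()
assert ref() is not None          # the cache still pins the instance

# Standard lru_cache API: drop every cached entry at once.
Estimator.state_counts.cache_clear()
gc.collect()
print(ref() is None)  # True: the instance is finally freed
```

If the decorated pgmpy method exposes the same standard attribute, calling it after each fit would bound the leak, at the cost of losing any caching benefit (which, per the fix above, was small anyway).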
