When creating multiple `MaximumLikelihoodEstimator` objects and calling `estimate_cpd` for all the model nodes in each, we seem to get a memory leak. This is what happens with repeated calls to `BayesianNetwork.fit`.
The `state_counts` function of the `BaseEstimator` class has a decorator (https://github.com/pgmpy/pgmpy/blob/dev/pgmpy/estimators/base.py#L66):

```python
@lru_cache(maxsize=2048)
```

which seems to be the culprit: when I remove it, the memory leak goes away. I haven't dug deeper into the issue, but it is worth considering if you repeatedly train models.
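For anyone curious why the decorator leaks: `functools.lru_cache` on an instance method uses the call arguments, including `self`, as part of the cache key, so the class-level cache keeps a strong reference to every estimator (and its training data) until the entry is evicted. A minimal stand-in class (illustrative only, not pgmpy's actual code) demonstrates this:

```python
import gc
import weakref
from functools import lru_cache

class Estimator:
    """Stand-in for an estimator whose method is decorated with lru_cache."""

    def __init__(self, data):
        self.data = data  # potentially large training data

    @lru_cache(maxsize=2048)
    def state_counts(self, variable):
        # The cache key includes `self`, so every instance that calls
        # this method is pinned by the shared, class-level cache.
        return len(self.data)

est = Estimator(data=[0] * 1_000_000)
est.state_counts("x")
ref = weakref.ref(est)
del est
gc.collect()
print(ref() is None)  # False: the lru_cache still holds the instance
```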
Try the following code out. You should see the memory slowly get eaten up.
```python
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator

data = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
cols = list(data.columns)
edgelist = [(a, b) for a in cols[0:3] for b in cols[3:] if a != b]

model = BayesianNetwork()
model.add_nodes_from(cols)
model.add_edges_from(edgelist)

for i in range(1000):
    model.fit(data, MaximumLikelihoodEstimator)
```
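If pgmpy isn't handy, the same growth pattern can be observed with `tracemalloc` and a toy class that mimics the decorated method (the names here are illustrative, not pgmpy's):

```python
import tracemalloc
from functools import lru_cache

class Estimator:
    def __init__(self, data):
        self.data = data

    @lru_cache(maxsize=2048)
    def state_counts(self, variable):
        return len(self.data)

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

for i in range(200):
    # Mimics repeated fit calls: a fresh estimator each iteration,
    # each one pinned in the shared cache after state_counts runs.
    est = Estimator(data=[0] * 10_000)
    est.state_counts("x")

current, _ = tracemalloc.get_traced_memory()
print(current - baseline)  # retained bytes grow with the iteration count
```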
Expected behaviour
Repeatedly calling `model.fit` with the MLE should not consume memory indefinitely; the model is re-trained each time, so old data should not persist.
Actual behaviour
During the 1000 iterations, additional RAM is consumed and never freed.
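As a stopgap on affected versions, an `lru_cache` wrapper exposes `cache_clear()`, which releases the pinned instances. Sketched on the same toy class (for pgmpy this would presumably be `BaseEstimator.state_counts.cache_clear()`, assuming the decorator is still applied there):

```python
import gc
import weakref
from functools import lru_cache

class Estimator:
    def __init__(self, data):
        self.data = data

    @lru_cache(maxsize=2048)
    def state_counts(self, variable):
        return len(self.data)

est = Estimator(data=[0] * 1_000_000)
est.state_counts("x")
ref = weakref.ref(est)
del est

# Clearing the cache drops the (self, args) keys that pin old estimators.
Estimator.state_counts.cache_clear()
gc.collect()
print(ref() is None)  # True: the instance is now collectable
```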
Your environment
`uname -a` output: Linux 5.10.133+