Description
- PyTorch-Forecasting version: 0.8.5
- PyTorch version: 1.9.0
- PyTorch Lightning version: 1.4.0
- Python version: 3.8
- Operating System: MacOS 11.4
Expected behavior
When a TimeSeriesDataSet instance is no longer being used, I'd expect the memory it uses to be released.
Actual behavior
Instead, memory accumulates when multiple instances of TimeSeriesDataSet are created, which is what happens under the hood when, for example, the predict method of the TemporalFusionTransformer class is called with a pandas DataFrame. As a result, my deployment that serves predictions with a TemporalFusionTransformer gets OOMKilled after some time.
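For context, the serving path looks roughly like the sketch below. This is only an illustration, not the actual deployment code: tft stands for an already-trained TemporalFusionTransformer, and get_next_request is a hypothetical helper returning a pandas DataFrame in the same format as the training data.

# Illustrative serving loop (hypothetical names, see note above).
while True:
    df = get_next_request()
    # predict() constructs a fresh TimeSeriesDataSet from the DataFrame on
    # every call, so whatever that dataset fails to release piles up across
    # requests until the container is OOMKilled.
    predictions = tft.predict(df)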
Code to reproduce the problem
A minimal example can be found below. Simply run this script locally and monitor your machine's memory usage: memory accumulates over time. Uncommenting the last lines, where some of the attributes are explicitly set to None, seems to alleviate the problem somewhat, but does not completely solve it.
import numpy as np
import pandas as pd

from pytorch_forecasting import TimeSeriesDataSet

test_data = pd.DataFrame(
    {
        "value": np.random.rand(3000000) - 0.5,
        "group": np.repeat(np.arange(3), 1000000),
        "time_idx": np.tile(np.arange(1000000), 3),
    }
)

# Memory accumulates when creating `TimeSeriesDataSet`s. Seems like not everything is
# being garbage collected after a `TimeSeriesDataSet` instance is no longer used.
for i in range(100):
    print("Creating dataset ", i)
    dataset = TimeSeriesDataSet(
        test_data,
        group_ids=["group"],
        target="value",
        time_idx="time_idx",
        min_encoder_length=5,
        max_encoder_length=5,
        min_prediction_length=2,
        max_prediction_length=2,
        time_varying_unknown_reals=["value"],
        predict_mode=False,
    )
    # Uncommenting the following lines helps to reduce the memory leak, but does not
    # completely solve it. Some memory is still not released.
    # dataset.index = None
    # dataset.data = None
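For what it's worth, the accumulation can be put into numbers with the standard-library tracemalloc module. The snippet below is only a measurement sketch: make_dataset is a hypothetical helper that wraps the same constructor call as above, and test_data is reused from the snippet above.

import gc
import tracemalloc

def make_dataset(df):
    # Same constructor arguments as in the reproduction loop above.
    return TimeSeriesDataSet(
        df,
        group_ids=["group"],
        target="value",
        time_idx="time_idx",
        min_encoder_length=5,
        max_encoder_length=5,
        min_prediction_length=2,
        max_prediction_length=2,
        time_varying_unknown_reals=["value"],
        predict_mode=False,
    )

tracemalloc.start()
for i in range(10):
    dataset = make_dataset(test_data)
    del dataset
    gc.collect()
    current, peak = tracemalloc.get_traced_memory()
    # If the dataset were fully released, `current` should stay roughly flat
    # from one iteration to the next; instead it keeps growing.
    print(f"iteration {i}: current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")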