[BUG] Memory leak in TimeSeriesDataSet #648

@TomSteenbergen

Description

  • PyTorch-Forecasting version: 0.8.5
  • PyTorch version: 1.9.0
  • PyTorch Lightning version: 1.4.0
  • Python version: 3.8
  • Operating System: macOS 11.4

Expected behavior

When a TimeSeriesDataSet instance is no longer being used, I'd expect the memory it uses to be released.

Actual behavior

Instead, memory seems to accumulate when creating multiple instances of a TimeSeriesDataSet, which is what happens under the hood when calling e.g. the predict method on the TemporalFusionTransformer class with a pandas DataFrame. This causes my deployment that serves predictions using a TemporalFusionTransformer to get OOMKilled after some time.

Code to reproduce the problem

A minimal example can be found below. Run the script locally and monitor your machine's memory usage: it grows with every iteration. Uncommenting the last lines, which explicitly set some of the attributes to None, alleviates the problem somewhat but does not completely solve it.

import numpy as np
import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet

test_data = pd.DataFrame(
    {
        "value": np.random.rand(3000000) - 0.5,
        "group": np.repeat(np.arange(3), 1000000),
        "time_idx": np.tile(np.arange(1000000), 3),
    }
)

# Memory accumulates when creating `TimeSeriesDataSet`s. Seems like not everything is
# being garbage collected after a `TimeSeriesDataSet` instance is no longer used.
for i in range(100):
    print("Creating dataset ", i)
    dataset = TimeSeriesDataSet(
        test_data,
        group_ids=["group"],
        target="value",
        time_idx="time_idx",
        min_encoder_length=5,
        max_encoder_length=5,
        min_prediction_length=2,
        max_prediction_length=2,
        time_varying_unknown_reals=["value"],
        predict_mode=False
    )
    
    # Uncommenting the following lines helps to reduce the memory leak, but does not
    # completely solve it. Some memory is still not released.
    # dataset.index = None
    # dataset.data = None

Status: Fixed/resolved