
Massive Memory Leak c# DirectML #14466

Closed
elephantpanda opened this issue Jan 28, 2023 · 16 comments · Fixed by #15040
Labels
ep:DML (issues related to the DirectML execution provider), platform:windows (issues related to the Windows platform)

Comments

@elephantpanda

elephantpanda commented Jan 28, 2023

Describe the issue

Create some big inference sessions (say, >2 GB).

This increases memory by 2 GB on the GPU and also by 2 GB in RAM.

Call session.Dispose();

This clears the memory on the GPU but keeps the memory in RAM.

Thus 2 GB of RAM is never freed.

Close the program. The RAM is finally freed.

Why is this a problem?
When working in an IDE such as Unity, there needs to be a way to free the RAM without closing the IDE. Each time you run the program without exiting the IDE, RAM usage increases until an out-of-memory error occurs.

What could be happening? Maybe it is loading the onnx file into RAM but not freeing it after uploading it to the GPU. Just a guess.

Possible Reason
This seems to happen mostly with big (~2 GB) models, which I noticed often have their weights in separate files. So perhaps the runtime is freeing the memory for the main file but not for linked files such as weights.pb.

To reproduce

Using the latest C# ONNX Runtime developer build (1.14; also tried 1.15).

Create some sessions with DirectML (do nothing with them).
Dispose of the sessions.

Watch the GPU and RAM monitors in Task Manager.

Close the program.
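
A minimal console sketch of the steps above (a sketch only: "model.onnx" is a placeholder path, and Process.WorkingSet64 stands in for watching the RAM column in Task Manager):

using System;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;

class Repro
{
    static long WorkingSetMB()
    {
        var p = Process.GetCurrentProcess();
        p.Refresh();                      // re-read the process memory counters
        return p.WorkingSet64 / (1024 * 1024);
    }

    static void Main()
    {
        Console.WriteLine($"Before load:   {WorkingSetMB()} MB");

        var so = new SessionOptions();
        so.AppendExecutionProvider_DML(); // DirectML EP
        var session = new InferenceSession("model.onnx", so); // also pulls in weights.pb if external
        Console.WriteLine($"After load:    {WorkingSetMB()} MB");

        session.Dispose();
        so.Dispose();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine($"After Dispose: {WorkingSetMB()} MB"); // stays ~2 GB higher if the leak occurs

        Console.ReadLine(); // keep the process alive while checking Task Manager
    }
}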

Urgency

No response

Platform

Windows

OS Version

Windows 10

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

Microsoft.ML.OnnxRuntime.Managed.1.15.0-dev-20230128-0428-7aecb2150f

ONNX Runtime API

C#

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

No response

@github-actions bot added the ep:DML and platform:windows labels Jan 28, 2023
@elephantpanda elephantpanda changed the title Possible Memory Leak Possible Memory Leak c# DirectML Jan 28, 2023
@elephantpanda elephantpanda changed the title Possible Memory Leak c# DirectML Massive Memory Leak c# DirectML Jan 28, 2023
@elephantpanda
Author

elephantpanda commented Jan 29, 2023

I have found a partial fix: make sure the onnx model is a single file. This only works for models <2 GB.

@uchuusen

I've noticed the same issue in Python as well. If I have an app that loads a Stable Diffusion onnx model and then unloads it repeatedly, for the purpose of switching models or clearing out VRAM, it seems to lose track of a couple of gigabytes of system RAM every time it does so.
In my case, I'm using an integrated GPU, an AMD Vega 10 on a Ryzen 3700U chipset.

@RyanUnderhill
Member

@fdwr Is this a DirectML issue or is this C#?

@elephantpanda
Author

@fdwr Is this a DirectML issue or is this C#?

I think it's a general issue of the weights.pb file not getting released from memory when the onnx file is in multiple parts.

@RyanUnderhill
Member

Is it possible to create a minimal repro scenario and paste it here? Just so we're sure we're doing the same thing you are.

@elephantpanda
Author

elephantpanda commented Feb 2, 2023

The scenario is as follows:

SessionOptions so = new SessionOptions
{
    ExecutionMode = ExecutionMode.ORT_SEQUENTIAL,
    GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_EXTENDED
};
so.AppendExecutionProvider_DML();
InferenceSession session = new InferenceSession("model.onnx", so);
session.Dispose();

If you try this with a model that is in two parts, model.onnx (100 MB) plus a separate weights.pb file (1.5 GB), it seems not to release the weights.pb data from memory. Whereas if it is a single file model.onnx (1.6 GB), then it all gets cleared from RAM.

This is not a problem for me anymore, since I now make sure I only use single-file onnx models and not ones in multiple parts. Mind you, this may be a problem for people running bigger models, since they must be split if they are over 2 GB.

Seems like a simple bug in which auxiliary files are not getting released.

Here is a conversion script I used.

@fdwr
Contributor

fdwr commented Feb 3, 2023

@fdwr Is this a DirectML issue or is this C#?

Ryan: 🤔 My hunch is that it's a general GC resource lifetime issue (seeing both C# and Python repros in the comments), because the DML EP has no awareness of, or differences in behavior for, external vs internal weights - it just uses whatever was passed to it from ORT. AFAIK, we never did any work on the DML EP to support external weights, so whoever added support must have done it in a way that works with all the EPs generically.

@elephantpanda
Author

In general, can I just say that it would be beneficial to release as much RAM as possible after loading the values from the files, including releasing file pointers and anything else.

Many people are working with 1-2 GB model files on consumer PCs, which in general have 8 GB of VRAM and 8-12 GB of RAM if they're lucky. So every bit of memory counts.

Thanks!

@elephantpanda
Author

elephantpanda commented Feb 24, 2023

@fdwr Is this a DirectML issue or is this C#?

Ryan: 🤔 My hunch is that it's a general GC resource lifetime issue (seeing both C# and Python repros in the comments), because the DML EP has no awareness of or differences in behavior for external vs internal weights - it just uses whatever was passed to it from ORT. AFAIK, we never did any work on the DML EP to support external weights, and so whoever added support must have done it in a way that it works with all the EP's generically.

Hello, is anyone working on this? It seems like lots of people have pointed out the same problem, but nobody at Microsoft knows how to fix it? Seems like a 5-minute fix for the right person: just unload the external weights files once they've been consumed.

@pranavsharma
Contributor

The session should have released all memory on destruction. There's no known issue here. Have you tried using our C/C++ API (which has deterministic destruction)?
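
(Side note: in C#, Dispose is also deterministic when invoked explicitly or via a using block; a minimal sketch, with "model.onnx" as a placeholder path:

using (var so = new SessionOptions())
{
    so.AppendExecutionProvider_DML();
    using (var session = new InferenceSession("model.onnx", so))
    {
        // run inference here
    } // session.Dispose() runs here, at scope exit, before any GC
}

The reports above already call Dispose explicitly, so the question is whether the native side actually releases the external-weights buffer at that point.)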

@fdwr
Contributor

fdwr commented Mar 9, 2023

Sorry Mr @pauldog, but I haven't reproduced it :/. I tried with ORT 1.9 and ORT 1.14.1 using the Stable Diffusion unet with a separate weights.pb file. The memory grows huge (gigabytes) but then falls back down after the Dispose (before the last WriteLine statement). Here's my code in its entirety:


using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

namespace OrtTestApp
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Begin...");

            SessionOptions sessionOptions = new SessionOptions
            {
                ExecutionMode = ExecutionMode.ORT_SEQUENTIAL,
                GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_EXTENDED,
                EnableMemoryPattern = false,
            };
            sessionOptions.AppendExecutionProvider_DML(0);
            InferenceSession session = new InferenceSession("D:\\stable_diffusion_onnx\\unet\\model.onnx", sessionOptions);
            session.Dispose();

            Console.WriteLine("End...");
        }
    }
}

And to confirm, this does not happen with the CPU provider? (removing AppendExecutionProvider_DML)

@elephantpanda
Author

elephantpanda commented Mar 9, 2023

Your program seems flawed because it closes right after it writes "End", whereas the memory leak appears while the program stays open. Try adding Console.ReadLine(); after the Console.WriteLine("End...") and repeat the experiment.

I am using C# in Unity. Using the same code:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using System;
using System.IO;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

class OrtTestApp : MonoBehaviour
{
    void Reload()
    {
        Debug.Log("Begin...");

        SessionOptions sessionOptions = new SessionOptions
        {
            ExecutionMode = ExecutionMode.ORT_SEQUENTIAL,
            GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_EXTENDED,
            EnableMemoryPattern = false,
        };
        sessionOptions.AppendExecutionProvider_DML(0);
        InferenceSession session = new InferenceSession("model.onnx", sessionOptions);
        session.Dispose();

        Debug.Log("End...");
    }

    private void Update()
    {
        if (Input.GetKeyDown(KeyCode.Space))
        {
            Reload();
        }
    }
}

[screenshot: Task Manager memory graph]

As you can see, there is a memory leak of over 1 GB.

Perhaps it's just a Unity thing, though I don't see how. I am using the latest dev build of onnxruntime DirectML and the onnxruntime managed library.

Same experiment but without an external weights file (no memory leak here):
[screenshot: Task Manager memory graph, single-file model]
Therefore my only conclusion is that the external weights files are not getting cleared by Dispose.

@fdwr
Contributor

fdwr commented Mar 9, 2023

@pauldog I repro it when running a longer loop. It's definitely, like you say, related to these separate weight files. Results (✅ = memory released, ❌ = leaks):

✅ CPU EP + C#
✅ CPU EP + C++
✅ DML EP + C# + single model
✅ DML EP + C++ + single model
❌ DML EP + C# + separate weights
❌ DML EP + C++ + separate weights (e.g. SD1.5)

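A loop along these lines (a sketch assembled from the code earlier in the thread; the path and iteration count are illustrative) shows the working set climbing on each iteration when the model has external weights:

using System;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;

class LoopRepro
{
    static void Main()
    {
        for (int i = 0; i < 20; ++i)
        {
            var sessionOptions = new SessionOptions
            {
                ExecutionMode = ExecutionMode.ORT_SEQUENTIAL,
                GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_EXTENDED,
                EnableMemoryPattern = false,
            };
            sessionOptions.AppendExecutionProvider_DML(0);
            var session = new InferenceSession("D:\\stable_diffusion_onnx\\unet\\model.onnx", sessionOptions);
            session.Dispose();
            sessionOptions.Dispose();

            var proc = Process.GetCurrentProcess();
            proc.Refresh(); // update memory counters after the dispose
            Console.WriteLine($"Iteration {i}: working set {proc.WorkingSet64 >> 20} MB");
        }
    }
}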

@ssube

ssube commented Mar 13, 2023

FWIW, I don't think this behavior is specific to C# or DirectML: I have a Python program doing repeated inferences, and it reliably runs out of memory after about 95 runs, with a very similar memory graph, on both CUDA and DirectML (I'm attempting to test on ROCm as well). It does seem to be related to models with external weights; I have not seen it with smaller single-file models, but the SD v1.5 UNet hits ~300 GB of virtual memory on the CPU side within 100 runs, even if I close the sessions/options. Unloading and reloading the model after 10 runs frees up VRAM but does not fix the CPU side. Restarting the worker process entirely after 10 runs does free all of the memory, but requires multiprocessing and reloading everything.

@elephantpanda
Author

Yes, it needs to be fixed to be able to run models >2 GB without leaking memory.

@pranavsharma
Contributor

I can repro this on CUDA as well. Will try to take a look this week.
