
Massive Memory Leak c# DirectML #14466

Closed
elephantpanda opened this issue Jan 28, 2023 · 16 comments · Fixed by #15040
Labels
ep:DML (issues related to the DirectML execution provider), platform:windows (issues related to the Windows platform)

Comments

@elephantpanda

elephantpanda commented Jan 28, 2023

Describe the issue

Create some big inference sessions (say, >2 GB).

This increases memory by 2 GB on the GPU and also by 2 GB in RAM.

Call session.Dispose();

This clears the memory on the GPU but keeps the memory in RAM.

Thus 2 GB of RAM is never freed.

Close the program. The RAM is finally freed.

Why is this a problem?
When working in an IDE such as Unity, there needs to be a way to free the RAM without closing the IDE. Each time you run the program without exiting the IDE, RAM usage increases until an out-of-memory error occurs.

What could be happening? Maybe it is loading the onnx file into RAM but not freeing it after uploading it to the GPU. Just a guess.

Possible Reason
This seems to happen mostly with big (~2 GB) models, which I noticed often have their weights in separate files. So perhaps the runtime is freeing the memory for the main file but not for linked files such as weights.pb.

To reproduce

Using the latest C# ONNX Runtime developer build (1.14; also tried 1.15).

Create some sessions with DirectML (do nothing with them).
Dispose of the sessions.

Watch the GPU and RAM monitors in Task Manager.

Close the program.
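
A minimal console sketch of the steps above (a sketch only: "model.onnx" is a placeholder path, and Process.WorkingSet64 stands in for watching the RAM column in Task Manager):

using System;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;

class Repro
{
    static long WorkingSetMB()
    {
        var p = Process.GetCurrentProcess();
        p.Refresh();                      // re-read the process memory counters
        return p.WorkingSet64 / (1024 * 1024);
    }

    static void Main()
    {
        Console.WriteLine($"Before load:   {WorkingSetMB()} MB");

        var so = new SessionOptions();
        so.AppendExecutionProvider_DML(); // DirectML EP
        var session = new InferenceSession("model.onnx", so); // also pulls in weights.pb if external
        Console.WriteLine($"After load:    {WorkingSetMB()} MB");

        session.Dispose();
        so.Dispose();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine($"After Dispose: {WorkingSetMB()} MB"); // stays ~2 GB higher if the leak occurs

        Console.ReadLine(); // keep the process alive while checking Task Manager
    }
}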

Urgency

No response

Platform

Windows

OS Version

Windows 10

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

Microsoft.ML.OnnxRuntime.Managed.1.15.0-dev-20230128-0428-7aecb2150f

ONNX Runtime API

C#

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

No response

@github-actions bot added the ep:DML and platform:windows labels Jan 28, 2023
@elephantpanda elephantpanda changed the title Possible Memory Leak Possible Memory Leak c# DirectML Jan 28, 2023
@elephantpanda elephantpanda changed the title Possible Memory Leak c# DirectML Massive Memory Leak c# DirectML Jan 28, 2023
@elephantpanda
Author

elephantpanda commented Jan 29, 2023

I have found a partial fix: make sure the onnx model is a single file. This only works for models <2 GB.

@uchuusen

I've noticed the same issue in Python as well. If I have an app that loads a Stable Diffusion onnx model and then unloads it repeatedly, for the purpose of switching models or clearing out VRAM, it seems to lose track of a couple of gigabytes of system RAM every time it does so.
In my case, I'm using an integrated GPU, an AMD Vega 10 on a Ryzen 3700U chipset.

@RyanUnderhill
Member

@fdwr Is this a DirectML issue or is this C#?

@elephantpanda
Author

@fdwr Is this a DirectML issue or is this C#?

I think it's a general issue of the weights.pb file not getting released from memory when the onnx file is in multiple parts.

@RyanUnderhill
Member

Is it possible to create a minimal repro scenario and paste it here? Just so we're sure we're doing the same thing you are.

@elephantpanda
Author

elephantpanda commented Feb 2, 2023

The scenario is as follows:

SessionOptions so = new SessionOptions
{
    ExecutionMode = ExecutionMode.ORT_SEQUENTIAL,
    GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_EXTENDED
};
so.AppendExecutionProvider_DML();
InferenceSession session = new InferenceSession("model.onnx", so);
session.Dispose();

If you try this with a model that is in two parts, model.onnx (100 MB) plus a separate weights.pb file (1.5 GB), it seems not to release the weights.pb data from memory. Whereas if it is a single file model.onnx (1.6 GB), then it all gets cleared from RAM.

This is not a problem for me anymore, since I now make sure I only use single-file onnx models and not ones in multiple parts. Mind you, this may be a problem for people running bigger models, since they must be split if they are over 2 GB.

Seems like a simple bug in which auxiliary files are not getting released.

Here is a conversion script I used.

@fdwr
Contributor

fdwr commented Feb 3, 2023

@fdwr Is this a DirectML issue or is this C#?

Ryan: 🤔 My hunch is that it's a general GC resource lifetime issue (seeing both C# and Python repros in the comments), because the DML EP has no awareness of, or differences in behavior for, external vs internal weights - it just uses whatever was passed to it from ORT. AFAIK, we never did any work on the DML EP to support external weights, so whoever added support must have done it in a way that works with all the EPs generically.

@elephantpanda
Author

In general, can I just say that it would be beneficial to release as much RAM as possible after loading the values from the files, including releasing file pointers and anything else.

Many people are working with 1-2 GB model files on consumer PCs, which in general have 8 GB of VRAM and 8-12 GB of RAM if they're lucky. So every bit of memory counts.

Thanks!

@elephantpanda
Author

elephantpanda commented Feb 24, 2023

@fdwr Is this a DirectML issue or is this C#?

Ryan: 🤔 My hunch is that it's a general GC resource lifetime issue (seeing both C# and Python repros in the comments), because the DML EP has no awareness of or differences in behavior for external vs internal weights - it just uses whatever was passed to it from ORT. AFAIK, we never did any work on the DML EP to support external weights, and so whoever added support must have done it in a way that it works with all the EP's generically.

Hello, is anyone working on this? It seems like lots of people have pointed out the same problem, but nobody at Microsoft knows how to fix it? Seems like a 5-minute fix for the right person: just unload the external weights files once they've been consumed.

@pranavsharma
Contributor

The session should have released all memory on destruction. There's no known issue here. Have you tried using our C/C++ API (which has deterministic destruction)?
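
(Side note: in C#, Dispose is also deterministic when invoked explicitly or via a using block; a minimal sketch, with "model.onnx" as a placeholder path:

using (var so = new SessionOptions())
{
    so.AppendExecutionProvider_DML();
    using (var session = new InferenceSession("model.onnx", so))
    {
        // run inference here
    } // session.Dispose() runs here, at scope exit, before any GC
}

The reports above already call Dispose explicitly, so the question is whether the native side actually releases the external-weights buffer at that point.)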

@fdwr
Contributor

fdwr commented Mar 9, 2023

Sorry Mr @pauldog, but I haven't reproduced it :/. I tried with ORT 1.9 and ORT 1.14.1 using the Stable Diffusion unet with a separate weights.pb file. The memory grows huge (gigabytes) but then falls back down after the Dispose (before the last WriteLine statement). Here's my code in its entirety:


using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

namespace OrtTestApp
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Begin...");

            SessionOptions sessionOptions = new SessionOptions
            {
                ExecutionMode = ExecutionMode.ORT_SEQUENTIAL,
                GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_EXTENDED,
                EnableMemoryPattern = false,
            };
            sessionOptions.AppendExecutionProvider_DML(0);
            InferenceSession session = new InferenceSession("D:\\stable_diffusion_onnx\\unet\\model.onnx", sessionOptions);
            session.Dispose();

            Console.WriteLine("End...");
        }
    }
}

And to confirm, this does not happen with the CPU provider? (removing AppendExecutionProvider_DML)

@elephantpanda
Author

elephantpanda commented Mar 9, 2023

Your program seems flawed because it closes right after it writes "End", whereas the memory leak appears while the program stays open. Try adding Console.ReadLine(); after the Console.WriteLine("End...") and repeat the experiment.

I am using C# in Unity. Using the same code:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using System;
using System.IO;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

class OrtTestApp : MonoBehaviour
{
    void Reload()
    {
        Debug.Log("Begin...");

        SessionOptions sessionOptions = new SessionOptions
        {
            ExecutionMode = ExecutionMode.ORT_SEQUENTIAL,
            GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_EXTENDED,
            EnableMemoryPattern = false,
        };
        sessionOptions.AppendExecutionProvider_DML(0);
        InferenceSession session = new InferenceSession("model.onnx", sessionOptions);
        session.Dispose();

        Debug.Log("End...");
    }

    private void Update()
    {
        if (Input.GetKeyDown(KeyCode.Space))
        {
            Reload();
        }
    }
}

[screenshot: Task Manager memory graph]

As you can see, there is a memory leak of over 1 GB.

Perhaps it's just a Unity thing, though I don't see how. I am using the latest dev build of onnxruntime DirectML and the onnxruntime managed library.

Same experiment but without an external weights file (no memory leak here):
[screenshot: Task Manager memory graph, single-file model]
Therefore my only conclusion is that the external weights files are not getting cleared by Dispose.

@fdwr
Contributor

fdwr commented Mar 9, 2023

@pauldog I repro it when running a longer loop. It's definitely, like you say, related to these separate weight files. Results (✅ = memory released, ❌ = leaks):

✅ CPU EP + C#
✅ CPU EP + C++
✅ DML EP + C# + single model
✅ DML EP + C++ + single model
❌ DML EP + C# + separate weights
❌ DML EP + C++ + separate weights (e.g. SD1.5)

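A loop along these lines (a sketch assembled from the code earlier in the thread; the path and iteration count are illustrative) shows the working set climbing on each iteration when the model has external weights:

using System;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;

class LoopRepro
{
    static void Main()
    {
        for (int i = 0; i < 20; ++i)
        {
            var sessionOptions = new SessionOptions
            {
                ExecutionMode = ExecutionMode.ORT_SEQUENTIAL,
                GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_EXTENDED,
                EnableMemoryPattern = false,
            };
            sessionOptions.AppendExecutionProvider_DML(0);
            var session = new InferenceSession("D:\\stable_diffusion_onnx\\unet\\model.onnx", sessionOptions);
            session.Dispose();
            sessionOptions.Dispose();

            var proc = Process.GetCurrentProcess();
            proc.Refresh(); // update memory counters after the dispose
            Console.WriteLine($"Iteration {i}: working set {proc.WorkingSet64 >> 20} MB");
        }
    }
}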

@ssube

ssube commented Mar 13, 2023

FWIW, I don't think this behavior is specific to C# or DirectML: I have a Python program doing repeated inferences, and it reliably runs out of memory after about 95 runs, with a very similar memory graph, on both CUDA and DirectML (I'm attempting to test on ROCm as well). It does seem to be related to models with external weights; I have not seen it with smaller single-file models, but the SD v1.5 UNet hits ~300 GB of virtual memory on the CPU side within 100 runs, even if I close the sessions/options. Unloading and reloading the model after 10 runs frees up VRAM but does not fix the CPU side. Restarting the worker process entirely after 10 runs does free all of the memory, but requires multiprocessing and reloading everything.

@elephantpanda
Author

Yes, it needs to be fixed to be able to run models >2 GB without leaking memory.

@pranavsharma
Contributor

I can repro this on CUDA as well. Will try to take a look this week.
