Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak in PoseNet's video processing pattern #129

Open
davepagurek opened this issue Apr 6, 2024 · 5 comments
Open

Memory leak in PoseNet's video processing pattern #129

davepagurek opened this issue Apr 6, 2024 · 5 comments

Comments

@davepagurek
Copy link

Hi everyone! I realize there's a new version of the library being developed so this bug doesn't actually have to be addressed, but I thought maybe the learnings of the cause might be good to know for future development.

→ Step 1: Describe the issue 📝

So I was running PoseNet in a sketch being used in an art installation, intended to be running for days at a time. After a few hours on, it would mostly still run fast, but would drop a bunch of frames once in a while (at an accelerating pace, after being left on for e.g. 12 hours.) This shows up in a performance profile as a 1.2s(!!) garbage collection:
image

I compared some heap profiles over time to see what got added, and noticed that some of the added memory includes a promise, referenced by a promise, referenced by a promise, referenced by a promise...:

image

→ Step 2: Screenshots or Relevant Documentation 🖼

I narrowed this down to the usage of promises in multiPose:
https://github.com/ml5js/ml5-library/blob/f80f95aa6b31191ab7e79ff466e8506f5e48a172/src/PoseNet/index.js#L180-L182

→ Step 3: Share an example of the issue 🦄

So, this code has the leak, using the automatic detection on the next frame, going through the promises above:

class MyDetector {
  constructor(constraints) {
    this.capture = createCapture(constraints)
    this.poseNet = ml5.poseNet(this.capture, {
       modelUrl: 'lib/model-stride16.json',
       graphModelURL: 'lib/group1-shard1of1.bin',
    })
    this.poseNet.on('pose', (results) => {
      this.processNextPose(results)
    })
  }

  processNextPose(results) {}
}

Meanwhile, this code does not seem to leak, by not initializing PoseNet with a video, and instead manually calling multiPose:

class MyDetector {
  constructor(constraints) {
    this.capture = createCapture(constraints)
    this.ready = false; // Manually store when we're ready to process a frame
    this.poseNet = ml5.poseNet(undefined, { // Pass undefined here
       modelUrl: 'lib/model-stride16.json',
       graphModelURL: 'lib/group1-shard1of1.bin',
    }, () => {
      this.ready = true
    })
    this.poseNet.on('pose', (results) => {
      this.processNextPose(results)
    })
  }

  // Call this every frame in draw():
  update() {
    if (this.ready) {
      this.ready = false;
      this.poseNet.multiPose(this.capture)
      requestAnimationFrame(() => {
        this.ready = true
      })
    }
  }

  processNextPose(results) {}
}

So it seems like the problem was in fact the buildup of contexts in the promises. It tries to return the promise from multiPose, effectively doing an asynchronous infinite loop on itself, and I guess the browser keeps around the whole chain of promises due to the fact that it's being returned instead of just awaited.

Anyway, I just wanted to flag this in case the same pattern is being used elsewhere!

→ Describe your setup 🦄

  • Web browser & version: Chrome 123
  • Operating System: MacOS 14.2.1
  • ml5 version you're using: 0.12.2
@shiffman
Copy link
Member

shiffman commented May 3, 2024

Hi @davepagurek thank you for reporting this and for your patience! I am working adding some documentation to this repo and archiving it. We have moved current ml5.js development to a new repository:

https://github.com/ml5js/ml5-next-gen

The latest examples are here:

https://editor.p5js.org/ml5/collections/pUzWMkdmE

We are planning to launch a new version of the library + website in June.

The new BodyPose functionality of ml5.js uses updated models behind the scenes (MoveNet and BlazePose). @ziyuan-linn have you noticed any signs of a memory leak like this in our newer examples?

@davepagurek
Copy link
Author

Just skimming the code, it looks like the looping pattern here https://github.com/ml5js/ml5-next-gen/blob/main/src%2FBodyPose%2Findex.js#L312 would avoid the issue since it doesn't return the result of the promise, but I can try profiling the examples when I get back to my computer to be sure.

@davepagurek
Copy link
Author

I was testing out this one https://editor.p5js.org/ml5/sketches/OukJYAJAb and taking heap snapshots every few minutes, and the snapshots seem to increase in size over time:

image

So that might mean there's a leak somewhere still? Here's the first and last snapshot for comparison:
BEFORE-Heap-20240504T094341.heapsnapshot.zip
AFTER-Heap-20240504T094350.heapsnapshot.zip

I see something kind of similar going on in the added Objects. They don't appear as promises this time, but the last one in the chain does say "pending."
image

@shiffman shiffman transferred this issue from ml5js/ml5-library May 5, 2024
@shiffman
Copy link
Member

shiffman commented May 5, 2024

Thanks @davepagurek I moved this issue into the current repo for further discussion and tracking!

@ziyuan-linn
Copy link
Member

@davepagurek Thank you for reporting this issue!

I did some testing and found memory leaks in a few models. Here are the heap snapshots:

bodyPose, with MoveNet as the underlying model:

bodyPose, with BlazePose as the underlying model on mediapipe runtime:

faceMesh, on tfjs runtime:

faceMesh, on mediapipe runtime:

This problem only seems to occur when using the mediapipe runtime. I think the problem could be due to MediaPipe using WASM and C++(no automatic garbage collection). Even this tfjs demo of face-landmark-detection on MediaPipe runtime have a memory leak. I could not find much information about this, but here is a potentially related issue with a solution.

For now, switching to tfjs runtime or MoveNet on the new version of ml5 should fix the memory leak:

bodyPose = ml5.bodyPose("BlazePose", { runtime: "tfjs" });

or

bodyPose = ml5.bodyPose("MoveNet");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants