Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Main Process Can Leak Memory/File Handles/Temp Files When Renderer Requests Files From ASAR #29292

Closed
dsanders11 opened this issue May 23, 2021 · 2 comments · Fixed by #29293

Comments

@dsanders11
Copy link
Member

dsanders11 commented May 23, 2021

Bug Description

When a renderer sends a request for a file in an ASAR archive (including unpacked files), the main process can leak the C++ asar::Archive object, leading to leaked memory and a file handle. If any temp files were created for that archive via asar::Archive::CopyFileOut, those would also be leaked. This doesn't happen on every request, see "Reproduction" and "Root Cause" for more info.

Refs #28962.
Might ref #27982 and explain why the switch to memory mapped files caused issues, as those would also be leaked by this issue. cc @nornagon.

Reproduction

To reproduce, send a handful (to guarantee reproduction, send dozens or even hundreds at once) of fetch requests from a renderer for a file in an ASAR. It doesn’t matter if the file is marked as unpacked or not (as long as you request them from a path with app.asar, and not app.asar.unpacked), either case will reproduce the bug. The amount of memory leaked is related to the header size of the ASAR archive (not the size of the file requested), so the bigger the archive the better, it only leaks ~10x the header size per reproduction. The leak can also be confirmed by looking at open file handles for the main process (lsof -p <pid> | grep 'app.asar$), there shouldn't be more than a few, and reproducing will cause them to increase.

There’s a timing aspect to the leak (which isn’t strictly timing related, more on that in "Root Cause"), so it’s best to wait 30 to 60 seconds between attempts to reproduce, leaving the app completely idle during that time.

The Signal app reliably reproduces the bug on macOS simply by opening the emoji picker in a conversation and flipping through the emoji categories, since it will send requests for dozens of emoji images, which are in the ASAR file for the app. That app has a large ASAR file (~100MB), so the memory leak is substantial and noticeable.

Root Cause

The leak is due to the usage of base::ThreadLocalPointer in shell/common/asar/asar_util.cc to cache asar::Archive files per-thread. That class doesn’t take ownership of the pointer, so memory management needs to be handled manually. From the header file comment for the class:

This means for uses of ThreadLocalPointer, you must correctly manage the memory yourself, these classes will not destroy the pointer for you. There are no at-thread-exit actions taken by these classes.

While there is an asar::ClearArchives function which will delete the pointer stored in base::ThreadLocalPointer, it does not align well with the real-world lifetime of the threads which handle URL requests from renderers. It also appears asar::ClearArchives is only called in a few locations, where it's executed on a main thread (main process, renderer, web worker), so it wouldn't effect the thread-local storage on worker threads. This means threads get destroyed without there being a call to asar::ClearArchives, leaking the asar::Archive instance. Due to the lifespan and nature of worker threads, it's not trivial to add a call to asar::ClearArchives which will be executed before thread exit.

In particular, the pointer in base::ThreadLocalPointer gets populated when asar::AsarURLLoader is run (via asar::GetOrCreateAsarArchive), and that URL loader is executed on the worker thread pool, which creates and destroys threads as needed. This is what creates the “timing” aspect in the reproduction. Rather than waiting an arbitrary amount of time, you can watch the number of threads for the main process and the leak will reproduce again once threads have been destroyed, meaning new threads will be created to handle future requests. In testing, an app left entirely idle will dwindle down to ~28-30 total threads for the main process (lldb shows 1 or 2 "ThreadPoolForegroundWorker" threads), and when hundreds of requests from the renderer are sent it increases to ~45-48 total threads (lldb shows ~16 "ThreadPoolForegroundWorker" threads). When left idle for a few dozen seconds, it then dwindles back down to ~28-30 threads again.

Is The ASAR Archive Cache Necessary?

Probably? At the moment it's only being used on some subset of requests from renderers since worker threads come and go. App startup is likely a time when a lot of apps load a handful of files at once, which would use multiple worker threads, which will each have a cache miss, so currently a common path isn't really getting much benefit from the caching. However, testing with the Signal app since has a large ASAR did show that with caching disabled, each request took ~40ms longer, so the improvement with caching is not insignificant for larger apps. For real world usage it looks a little mixed though, as there seems to be a bottleneck elsewhere in the main process code when more than a few requests are sent at once, and fixing this issue didn't seem to create any real difference in the startup time of the Signal app.

@gpetrov
Copy link

gpetrov commented May 23, 2021

This is really ground breaking bug report! Awesome work @dsanders11 - great research and PR solution!

We were just about to dump Asar usage just because of this memory leak and the tons of out of memory crashes coming because of it. But now there is light on the horizon!

Hope the PR gets quickly merged by the Electron team!

zcbenz pushed a commit that referenced this issue Jun 4, 2021
* fix: change ASAR archive cache to per-process to fix leak (#29292)

* chore: address code review comments

* chore: tighten up thread-safety

* chore: better address code review comments

* chore: more code review changes
zcbenz pushed a commit that referenced this issue Jun 4, 2021
* fix: change ASAR archive cache to per-process to fix leak (#29292)

* chore: address code review comments

* chore: tighten up thread-safety

* chore: better address code review comments

* chore: more code review changes

Co-authored-by: David Sanders <dsanders11@ucsbalum.com>
zcbenz pushed a commit that referenced this issue Jun 4, 2021
* fix: change ASAR archive cache to per-process to fix leak (#29292)

* chore: address code review comments

* chore: tighten up thread-safety

* chore: better address code review comments

* chore: more code review changes

Co-authored-by: David Sanders <dsanders11@ucsbalum.com>
dsanders11 added a commit to dsanders11/electron that referenced this issue Jun 5, 2021
…9293)

* fix: change ASAR archive cache to per-process to fix leak (electron#29292)

* chore: address code review comments

* chore: tighten up thread-safety

* chore: better address code review comments

* chore: more code review changes
zcbenz pushed a commit that referenced this issue Jun 7, 2021
* fix: change ASAR archive cache to per-process to fix leak (#29292)

* chore: address code review comments

* chore: tighten up thread-safety

* chore: better address code review comments

* chore: more code review changes
@wenssh
Copy link

wenssh commented Nov 16, 2022

Good job!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants