Leak on Linux? #955

Closed
asilvas opened this issue Sep 20, 2017 · 64 comments

asilvas commented Sep 20, 2017

I've been troubleshooting a leak in https://github.com/asilvas/node-image-steam (it processes millions of images every day). I originally thought it was in my project, but after a number of heap dump checks I determined it wasn't a leak in V8.

In order to break it down to its simplest parts, I recorded the traffic in serial form so it can be replayed in a pure sharp script.

https://gist.github.com/asilvas/474112440535051f2608223c8dc2fcdf

npm i sharp request

curl https://gist.githubusercontent.com/asilvas/474112440535051f2608223c8dc2fcdf/raw/be4e593c6820c0246acf2dc9604012653d71c353/sharp.js > sharp.js
curl https://gist.githubusercontent.com/asilvas/474112440535051f2608223c8dc2fcdf/raw/be4e593c6820c0246acf2dc9604012653d71c353/sharp.log > sharp.log

node sharp.js http://img1.wsimg.com/isteam sharp.log

It downloads the files on the fly, which avoids any FS caching that would bloat memory usage, and forwards the instructions (in sharp.log) directly to sharp, one at a time.

Memory usage gets to 500MB+ within a few minutes (at least on Docker+CentOS) and seems to eventually peak; on some systems I've seen over 2GB. Processing only a single image at a time should keep memory usage fairly flat. Have you seen this before? Any ideas? I wasn't aware of anything sharp/vips does that should trigger Linux's file caching.

Edit: While memory usage on Mac is still higher than I'd expect for a single image processed at a time (~160MB after a couple hundred images), it's nowhere near as high as on Linux, and it seems to peak quickly. So it appears to be a Linux-only issue. Docker is also involved, so I'm not ruling that out either.

lovell commented Sep 21, 2017

Hello, how is memory usage being measured? If RSS, please remember this includes free memory that has not (yet) been returned to the OS, which explains why different OSs report different RSS for the same task.

If you've not seen them, there have been quite a few related questions previously:
https://github.com/lovell/sharp/search?utf8=%E2%9C%93&q=rss+%22returned+to+the+OS%22&type=Issues
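
For anyone measuring, here is a minimal sketch (not from the original thread) of separating RSS from the V8 heap in Node; a large, stable gap between rss and heapUsed/external points at native allocations or allocator fragmentation rather than a JavaScript-level leak:

// Log RSS vs V8 heap every 5 seconds.
const toMB = (n) => (n / 1048576).toFixed(1);
setInterval(() => {
  const { rss, heapUsed, external } = process.memoryUsage();
  console.log(`rss=${toMB(rss)}MB heapUsed=${toMB(heapUsed)}MB external=${toMB(external)}MB`);
}, 5000);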

asilvas commented Sep 21, 2017

Yes, RSS is the main indicator. But I also rely on buff/cache and available memory to better understand "free-able" memory, and this indicates that the memory is never being released back to the OS. It doesn't seem to reside in the V8 memory space either, as indicated by dozens of heap dump tests.

Thanks for the links to the other issues that seem connected. I'm not entirely sure the issue is fully understood, though. I'm not convinced it's an issue with sharp either, but I'm hoping we can work around the problem, as the impact is quite significant: we're using ~5x the memory we should be, which becomes a big deal when you're serving (many) millions of requests/day.

I'll continue investigating the related cases. So far the only workaround I've found is to avoid toBuffer.

lovell commented Sep 21, 2017

It's worth subscribing to nodejs/node#1671 for V8 updates that will improve GC of Buffer objects.

If you're not already doing so, you might want to experiment with a different memory allocator such as jemalloc. You'll probably see less fragmentation, but that's still dealing with the effect rather than the cause.

asilvas commented Sep 21, 2017

Will do, thanks.

Probably no surprise, but I was at least able to correlate usage with the concurrency setting.

In my isolated (sharp-only) test, these were the findings (see the sketch after this list for how the setting is applied):

0 concurrency: 276MB (should be detecting 4 in my local setup)
1 concurrency: 190MB
4 concurrency: 276MB
8 concurrency: 398MB (our prod env)
16 concurrency: 500MB
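
(A hedged sketch of the knob being varied above; the value 4 is just an example.)

const sharp = require('sharp');

// 0 lets libvips detect the number of CPU cores;
// any positive value pins the size of its thread pool.
sharp.concurrency(4);
console.log('libvips thread pool size:', sharp.concurrency());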

lovell commented Sep 22, 2017

Does the prod environment use 8 real CPU cores, or is this "vCPU" hyper-threading? If the latter, perhaps also experiment with halving concurrency to improve throughput (and reduce the memory effects).

asilvas commented Sep 22, 2017

I am artificially limiting cores to keep memory in check, but at the cost of up to 30% slower response times. Temporary test.

asilvas commented Sep 25, 2017

Feel free to close this as a duplicate of the others. But from everything I've learned of the problem thus far, there doesn't seem to be any conclusive evidence that this is a Node and/or V8 issue. From the symptoms of the isolated tests I've run (as well as others), it does seem to be an issue with sharp or vips, as run-away memory growth of this nature isn't a common problem in the Node community. I was able to verify this is not a case of GC'd memory not being released back to the OS: the memory was in use, and any further reduction in available memory resulted in memory allocation failures. But as I said, nothing conclusive either way.

I tried to investigate ways to resolve or work around the problem within sharp but was unsuccessful -- hopefully someone with more expertise in V8 native modules will have better luck.

lovell commented Sep 25, 2017

"I was able to verify this is not a case of GC'd memory not being released back to the OS"

Could memory fragmentation explain this?

asilvas commented Sep 25, 2017

I haven't proven or disproven that theory. But given the modest number of objects being processed, it would take some pretty severe fragmentation to account for memory usage this high.

Do you think the fragmentation is in V8 or in native space?

lovell commented Sep 26, 2017

Are you using Node 8? If not, do you see the same RSS levels with it?

Were you able to try jemalloc? It provides useful debugging via malloc_stats_print.

Given you're using CentOS, have you tried disabling transparent huge pages?

asilvas commented Sep 26, 2017

Not using Node 8 in prod, but yes, I was able to reproduce similar RSS levels (with 8.5.0) in the isolated test.

It might be a while before I can look into the other options, but I'll keep them in mind, thanks.

lovell commented Oct 15, 2017

If sharpen or blur operations are being used then the small leak fixed in https://github.com/jcupitt/libvips/issues/771 may be related here.

asilvas commented Oct 15, 2017

The test sample for this topic doesn't use those two operations so probably unrelated. But we do use them on occasion, thanks!

@trev-dev

I've found that running my sharp modules in a child_process spawn that exits once it's completed works really well for me. It keeps the memory load down.
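
(A hedged sketch of that approach, not trev-dev's actual code; the file name resize-worker.js and the resize parameters are made up for illustration.)

// parent.js: run one sharp job per short-lived child process.
const { fork } = require('child_process');

function resizeInChild(input, output) {
  return new Promise((resolve, reject) => {
    const child = fork('./resize-worker.js', [input, output]);
    child.on('error', reject);
    child.on('exit', (code) =>
      code === 0 ? resolve() : reject(new Error(`worker exited with code ${code}`))
    );
  });
}

// resize-worker.js: do one job, then exit so all native memory
// is returned to the OS along with the process.
const sharp = require('sharp');
const [input, output] = process.argv.slice(2);

sharp(input)
  .resize(800)
  .toFile(output)
  .then(() => process.exit(0))
  .catch((err) => { console.error(err); process.exit(1); });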

asilvas commented Nov 24, 2017

Was hoping to avoid spawning a child process, but it is something I had considered as well. It's manageable at the moment so holding out for now.

lovell commented Dec 19, 2017

@asilvas The sharp tests just revealed a memory leak on one possible libvips error path when using toBuffer (or pipe output) with JPEG output - see https://github.com/jcupitt/libvips/pull/835

asilvas commented Dec 19, 2017

Excellent find (and fix), @lovell! Thanks, these sorts of fixes make a big difference when processing millions of images. Any idea when the fix will be available?

lovell commented Dec 19, 2017

@asilvas The next libvips v8.6.1 patch release should contain this fix, which then allows the release of sharp v0.19.0.

@kishorgandham

libvips v8.6.1 is out
https://github.com/jcupitt/libvips/releases/tag/v8.6.1

lovell commented Feb 3, 2018

@asilvas Are you seeing an improvement with the latest libvips/sharp?

asilvas commented Feb 3, 2018

In testing, will let you know next week.

vinerz commented Feb 7, 2018

I am currently having the same issue.

Stack information:
Heroku on Ubuntu Server 16.04.3
Node 9.5.0

Library versions:
LIBVIPS=8.6.2
LIBWEBP=0.6.1
LIBTIFF=4.0.9
LIBSVG=2.42.2
LIBGIF=5.1.4

Sharp:
Version: 0.19.0

I also use a lot of JPEG toBuffer.

(screenshots: memory usage graphs, 2018-02-07)

Even under stress the V8 heap doesn't change, but the non-heap memory grows consistently on every request.

Before the update, my memory usage was steady at 300MB, using libvips 7.42.3 and sharp 0.17.1.

asilvas commented Feb 7, 2018

Seeing similar results after ~24 hours in production:

(screenshots: production memory usage graphs)

Overall memory usage patterns seem to be a bit improved, but still far higher than I'd expect (eventually reaching 2GB, perhaps related to the prior suggestions). I have noticed some perf improvements overall, though that could be due to the relatively short life of the new containers. I might revisit doing some more memory profiling at some point, but it'll have to wait for now.

I'll let you know if any new data surfaces.

vinerz commented Feb 7, 2018

@asilvas would you mind sharing the throughput of your servers, and whether they are processing many different images?

asilvas commented Feb 7, 2018

@vinerz We generate over 20 million images per day, from millions of source images. Overall throughput is much higher, but that part is unrelated to this topic. Powered by https://github.com/asilvas/node-image-steam, and of course sharp+libvips.

vinerz commented Feb 7, 2018

Thanks for the answer!

Based on this information, I can see that my leak is growing much, much faster than yours, even though I'm manipulating only 200 thousand images per day from around 50 thousand different sources.

It might be related to the fact that I use toBuffer a lot in my code due to filters / resizing chains.

I'll try disabling libvips cache to see what happens.

asilvas commented Feb 7, 2018

We use toBuffer for the final result of every image as well: https://github.com/asilvas/node-image-steam/blob/b96c3d39bc7b125f552b1cef0d1dfa05be3b488e/lib/processor/processor.js#L103

Our sharp options include:

cache: false
concurrency: 4
simd: true

I've toyed with the options quite a bit in the past, but it might be worth revisiting them with the recent changes/fixes.
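
(For reference, a sketch of how those settings map onto sharp's module-level API; this is illustrative, not the exact node-image-steam code.)

const sharp = require('sharp');

sharp.cache(false);    // disable libvips' operation cache
sharp.concurrency(4);  // cap the libvips thread pool at 4
sharp.simd(true);      // enable SIMD-accelerated code paths where available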

vinerz commented Feb 8, 2018

I modified the core app flow to use a single sharp object and changed the whole toBuffer chain to a single stream piped directly to the Express response, but I am getting the same memory results. It might be related to something else.

I'm currently using cache: false and concurrency: 2.
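
(A hedged sketch of the stream-based flow described above; the route, source path and resize parameters are invented for illustration and are not the actual app code.)

const express = require('express');
const sharp = require('sharp');

sharp.cache(false);
sharp.concurrency(2);

const app = express();

app.get('/img/:name', (req, res) => {
  // Single sharp pipeline per request, piped straight to the response
  // instead of materialising an intermediate Buffer with toBuffer().
  const transformer = sharp(`./images/${req.params.name}`).resize(800).jpeg();
  transformer.on('error', () => {
    if (!res.headersSent) res.sendStatus(500);
    else res.end();
  });
  res.type('image/jpeg');
  transformer.pipe(res);
});

app.listen(3000);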

lovell commented Feb 9, 2018

@asilvas Thank you for the detailed updates!

@vinerz Your comments mention "Node 9.5.0" and "Before the update... using libvips 7.42.3 and sharp 0.17.1". I suspect you were using a different version of Node "before the update" too. If so, does returning to the previous version make any difference?

lovell commented Jul 23, 2020

@egekhter Whilst Node.js Worker Threads will probably help the single-threaded world of jimp, they won't offer much to help the multi-threaded world of sharp/libvips and can cause greater heap fragmentation faster.

30x worker threads each spawning 4x libuv threads each spawning 16x (c5.4xlarge vCPU) libvips threads is a concurrency of 1920 threads all allocating/freeing memory from the same pool.

If you'd like to "manage" concurrency via Worker Threads, then try setting sharp.concurrency to 1 so libvips doesn't also try to do so.
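
(A hedged sketch of that combination, not code from this thread; the input file and width are made up for illustration.)

// One resize job per Worker Thread, with libvips' own threading
// disabled so the only parallelism is the worker pool itself.
const { Worker, isMainThread, workerData, parentPort } = require('worker_threads');
const sharp = require('sharp');

if (isMainThread) {
  const worker = new Worker(__filename, { workerData: { input: './test.jpg', width: 800 } });
  worker.on('message', (buf) => console.log('resized bytes:', buf.length));
  worker.on('error', console.error);
} else {
  sharp.concurrency(1); // keep libvips single-threaded inside each worker
  sharp(workerData.input)
    .resize(workerData.width)
    .toBuffer()
    .then((buf) => parentPort.postMessage(buf))
    .catch((err) => { console.error(err); process.exit(1); });
}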

egekhter commented Jul 23, 2020

Solved my problem with your help.

sharp.concurrency(1);

Up to 70x child processes and still have plenty of memory left.

Thanks for all your work!

(screenshot: memory usage, 2020-07-23)

lovell commented Mar 10, 2021

Please see #2607 for a change in the default concurrency for glibc-based Linux users that will be in v0.28.0.

FoxxMD commented May 5, 2022

Not related... but @vinerz, what application were you using to monitor memory in this comment? Is that Heroku's dashboard?

vinerz commented May 5, 2022


Hey @FoxxMD, that's New Relic's application monitor for node 😄

@alcidesbsilvaneto

I switched from Debian to Alpine (and admittedly Node 11 to 12): (screenshot)

Thanks to everyone who suggested this as a fix!

Still the solution for me in 2023.

daniellockyer added a commit to TryGhost/SDK that referenced this issue May 8, 2023
refs lovell/sharp#955 (comment)

- we've seen Ghost hogging memory whenever images are uploaded
- it seems to be due to an issue with memory fragmentation, because a
  heap snapshot of a container after the memory grows shows nothing in
  JS code
- we've been using jemalloc in production but it still seems to occur
- this change has been suggested on the referenced thread in order to
  improve fragmentation on top of using jemalloc
- can easily revert if it causes issues
nandi95 commented Sep 3, 2023

Went from:
(screenshot, 2023-09-03 21:33)
to:
(screenshot, 2023-09-03 23:48)

It's a definite improvement.

rambo-panda commented Dec 28, 2023

const sharp = require("sharp");
(async () => {
    await Promise.all([...Array(100)].map(() => sharp("./test.webp").rotate(90).toBuffer()));
})();     
  • libjemalloc.so, cache: true, concurrency: 1 (screenshot)
  • libjemalloc.so, cache: false, concurrency: 1 (screenshot)
  • libjemalloc.so, cache: true, concurrency: 16 (screenshot)
  • libjemalloc.so, cache: false, concurrency: 16 (screenshot)
  • glibc, cache: false, concurrency: 1 (screenshot)
  • glibc, cache: true, concurrency: 1 (screenshot)
  • glibc, cache: false, concurrency: 16 (screenshot)
  • glibc, cache: true, concurrency: 16 (screenshot)
