-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory leak when saving pyramid files #88
Comments
Update: I have a slightly more complex module that does other things in addition to loading and saving. This one still exhibits the memory leak if I remove Full module source code:
|
This sounds weird, I'll have a look. Thanks for the report! |
I tried here with a 1500 x 2000 pixel jpg image as source and I see a steady 105mb of memory use. It rises a bit at the start, but stabilizes after a few 100 iterations. I measured memuse by watching RES in What size image are you using, and what version of libvips and pyvips? |
Right, I missed the important details:
The same error occurs also on an Amazon EC2 instance. I cannot check right now for the libvips version, but it should be the latest one packaged with Ubuntu Xenial. The sample image I used is 4081x3206 pixels, so you might want to try with a substantially larger image—that might make a difference. The resident memory for the script reaches 1Gb within the first 30 iterations and only goes up. |
BTW, my source image format is TIFF. |
Ah, OK, yes I see high memuse here with a tiff source as well. It seems to be the operation cache -- if I add:
at the start, I see steady memory use of about 80mb. I'll see if I can improve behaviour with the cache enabled. |
This runs in a steady 150mb: /* compile with
*
* gcc -g -Wall leak.c `pkg-config vips --cflags --libs`
*/
#include <vips/vips.h>
int
main( int argc, char **argv )
{
int i;
if( VIPS_INIT( argv[0] ) )
vips_error_exit( NULL );
for( i = 0; i < 10000; i++ ) {
char *str;
size_t length;
VipsImage *im;
printf( "loop %d ...\n", i );
if( !(str = vips__file_read_name( argv[1], NULL, &length )) )
vips_error_exit( NULL );
if( !(im = vips_image_new_from_buffer( str, length, "", NULL )) )
vips_error_exit( NULL );
if( vips_image_write_to_file( im, "x.tif", NULL ) )
vips_error_exit( NULL );
g_object_unref( im );
g_free( str );
}
return( 0 );
} So the problem is pyvips holding an extra ref to the input buffer. |
What are the implications of disabling the cache (I assume that is what that does) if I want to try it as a temporary workaround? |
It can run slower, depending on your code, but you'll get the same result in the end. libvips caches the last 1000 or so operations you perform, and will reuse results when it can. This gets you things like common-subexpression removal, and reuse of open images between threads. |
@jcupitt FYI, I tried adding the |
@workergnome track this |
Hi again, I tried this version of your test program: #!/usr/bin/python3
import sys
import pyvips
for i in range(100000):
print('loop {} ...'.format(i))
with open(sys.argv[1], 'rb') as fh:
img = pyvips.Image.new_from_buffer(fh.read(), '', access='sequential')
img.tiffsave('/tmp/flush.tif', compression='jpeg', Q=90) And tried with a fairly small 1500 x 2000 pixel RGB tif. I watched RES in top. Memory use rose to 1gb by 100 loops, 2.1gb by 300 loops, and then stabilised. I left it for 5000 ieterations, but memory use remained at 2.1gb. I turned off the cache like this: #!/usr/bin/python3
import sys
import pyvips
pyvips.cache_set_max(0)
for i in range(100000):
print('loop {} ...'.format(i))
with open(sys.argv[1], 'rb') as fh:
img = pyvips.Image.new_from_buffer(fh.read(), '', access='sequential')
img.tiffsave('/tmp/flush.tif', compression='jpeg', Q=90) And watched RES again. Memory use stabilised at 88mb after a few iterations. I left it for 10,000 loops. So ... in the short-term, I would disable the cache. Longer term, we need to stop pyvips keeping these extra copies of the input buffer. |
OK. I tested my full conversion script with the cache option off and now it seems to be flushing the memory periodically. I thought that in earlier attempts it did not. I will test again later today in my complete framework, and if it behaves as expected I will disable the cache as a temporary fix. I'll let you know soon. |
Now, here's something interesting. Your script does not leak in my laptop (Arch Linux), it does in my server (AWS, Ubuntu Xenial). If you have an EC2 instance with Ubuntu handy, are you able to confirm the same behavior? The instance where I see the leak happening is:
|
Ouch, libvips 8.2 is four years old. I would update that. |
I just noticed and I'm compiling the most recent one from source. I'll report back. |
libvips 4.7 is confirmed to get rid of the problem. I only tried 140 loops but with a quite large image (7892x9867). The res memory usage never goes above 400Mb. Aside from the root problem with the cache, it may be important to notify the maintainer(s) of the Ubuntu package that the version packaged with Xenial has a critical issue. I can take care of that. |
I've never had any luck nudging Ubuntu about updates, but do have a go. The one in cosmic is much better, fortunately. |
In my institution we use 16.04 LTS in EC2 instances, which is I think a norm. I sent the support team an email just in case. Updating from that archaic version with a memory leak should be a big incentive for using vips more! |
Totally gratuitous comment: after fixing the leak I was able to process 1K images for a total of 86 Gb in about half hour. Memory usage peaked at just above 2Gb when processing a 1.8Gb image. This is very efficient, kudos! |
Yes, my institution has 16.04 on all desktops and servers too. I'm glad it's working now. |
See reply from Ubuntu maintainers:
They have a pretty high bar for upgrading packages for a stable release, however I think that this upgrade would fall in the "severe regression" case in the linked guidelines, because the current Ubuntu package is pretty much unusable and may cause system instability. I'll take a stab at it, can't guarantee how far this goes. If a cherry-pick is needed, that would definitely be beyond my capacity. |
Unfortunately I have to report that there is still a memory leak by just opening and saving a file. Minimum program reproducing the issue: https://gist.github.com/scossu/cdde59337ff27f4436eafe23a61142b2 You can notice that the memory usage increases pretty sharply initially, then slows down to a creep but it's still there. It is enough to crash the program in a 32Gb machine when given enough, and large enough, images. I have tried removing lines from the test program. Removing the jpeg compression seemed to slow the leak down somewhat, and removing To make sure I am interpreting the results from the |
Hi again @scossu, I tried your program on this laptop with a 1450x2048 jpg image and 10000 iterations. I see: As you say, memory use rises at the beginning, but stabilises. I don't think there's a leak, just memory fragmentation. This is with 8.8, I'll try 8.7, there have been a couple of memory fixes. |
8.7 was very similar: it was at a steady 58mb after 10,000 iterations. |
@jcupitt Thanks for checking. My images are in the 5000x7000 range or larger, so that the memory increas may have been larger. After less than 1500 images the whole 32Gb system memory was filled up. I am working on a version of my program without vips, to verify whether it's that what is causing the issue. I will report back when I have more information. |
Oh, interesting, I'll try with a larger image. |
Do you have a lot of cores on your test machine, by the way? I'm running on a two-core, four-thread laptop. |
I have 8 logical cores, 8 processes running in parallel. I am not excluding memory not being released in my multiprocessing implementation, which is what I hope to know once I get to change the code from transforming the image with pyvips to a simple passthrough or similar.
…On May 11, 2019 7:47:51 AM CDT, John Cupitt ***@***.***> wrote:
Do you have a lot of cores on your test machine, by the way? I'm
running on a two-core, four-thread laptop.
|
I tried with a 10,000 x 10,000 pixel jpg and let it run for 10,000 iterations (overnight!): And it stabilised at 180mb after about 3500 loops. That's with git master libvips. Could there be some other factor? Do some of your images have errors in, for example? We did recently find a few memleaks when handling corrupt images. |
I did more testing using pyvips and a rough implementation using tifffile. The latter is much less efficient and produces much larger images (probably because of LZMA compression), but memory usage is stable over time. Pyvips has a pronounced buildup: The two images above are snapshots of 1 hour of processing, but the trend remains the same for both for several hours (i.e. with vips memory usage increases linearly). One more test I can do is using pyvips single-threaded in a simple loop, but that will be 8 times slower and I have to let it run overnight in order to gather enough data. |
I tried with the python3 image on dockerhub and I see growth there with your test program. That's using a very old libvips, though. I'll see if I can make a Dockerfile for pyvips with current stable libvips on AWS. |
Here's pyvips 2.1.5 (current stable) plus libvips 8.7.4 (current stable) running on amazonlinux 2018.03: https://github.com/jcupitt/docker-builds/tree/master/pyvips-aws-soak Build with:
Run with:
I see:
Could you be picking up an old version of libvips somehow? |
You can find out the version of libvips that pyvips has linked to like this:
|
I have the same versions on the server I ran my tests on, pyvips 2.1.5 and vips 8.7.4. |
By the way, I am using Python 3.7.3. |
Before I try compiling python 3.7.3 from source, could you check that you really are picking up the latest libvips in your tests? Please try this version of your test program: #!/usr/bin/python3
# usage: soak.py <test-image> <output-image>
import resource
import sys
import pyvips
pyvips.cache_set_max(0)
print("linked to libvips {}.{}.{}"
.format(pyvips.version(0), pyvips.version(1), pyvips.version(2)) )
for i in range(10000):
with open(sys.argv[1], "rb") as fh:
img = pyvips.Image.new_from_buffer(fh.read(), "", access="sequential")
kw = {
"compression": "jpeg",
"Q": 90,
"tile": True,
"tile_width": 256,
"tile_height": 256,
"pyramid": True,
"bigtiff": True,
}
img.tiffsave(sys.argv[2], **kw)
print("{}, {}"
.format(i, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)) And verify that you see a leak. |
You don't have a pre-compiled Python 3.7? I found it for Ubuntu 14 LTS. Your script output:
The memory increase is slow, but with larger images and multiple processes it consumes large quantities of RAM, even if I kill processes after only one job. I am not yet sure about my multiprocess handling not releasing memory, but this does not seem to happen with other implementations. |
You might need to let it run a little longer -- for me, memory did not completely stabilize until ~3500 iterations. libvips 8.8 is out now, by the way. It fixes a couple of leaks when images contain errors. I'll try with py 3.7. |
Here's pyvips 2.1.6, libvips 8.8, python 3.7.3 running on alpine: https://github.com/jcupitt/docker-builds/tree/master/pyvips-python3.7-alpine I see: It's impressive how much lower memuse is on alpine. It seems stable too. |
It could well be a problem with my code that only shows up with pyvips then. I will upgrade to v8.8 and retry. For now I have been able to keep memory in check by killing processes after each job, but at a higher scale it may not work. Thanks for all the testing. |
I have been chasing this issue down for some time, and the issue seems to be with the Script to reproduce (you need to throw a sRGB ICC profile file in the folder): https://gist.github.com/scossu/27111a4429b8b7f9df1af789741d7c3d Memory graph, run with 8 parallel processes on a 15K x 8K pixel image with 10K iterations: You can notice that memory goes up very quickly, and then bounces up and down, eventually going to a flat line. Sometimes (as in this case) the process goes on for very long and completes, but the script never ends. Some other time the processes go zombie one by one within 15-30 minutes and they stop doing anything. Processes left at the end of the run: If you comment out the pyramid option the graph changes drastically: I believe the issue may be somewhere along these lines: https://github.com/libvips/libvips/blob/master/libvips/foreign/vips2tiff.c#L1737-L1761 but I haven't investigated further. I'll change the title of the issue to be more relevant. |
Hi @scossu, Thanks for the detailed report. I'll have a look here. |
I tried a slight variant of your code: #!/usr/bin/python3
# Use: test_vips.py image_path iterations
import sys
import pyvips as vips
vips.leak_set(True)
vips.cache_set_max(0)
def _process(fname):
img = vips.Image.new_from_file(fname, access="sequential")
img.tiffsave_buffer(compression="jpeg", tile=True, pyramid=True)
if __name__ == "__main__":
fname = sys.argv[1]
ct = int(sys.argv[2])
for i in range(ct):
print(f"Processing image #{i} ...")
_process(fname) And I see memory growth here too. It looks like libvips itself isn't leaking, I always get 5mb of pixel buffer use however many iterations I do, but do see a steady MB or two per iteration. Perhaps it's in tiff memory output? I'll investigate. Thanks for reporting this! |
Huh, it was a silly error, we were not freeing layers other than the top layer in TIFF memory output to pyramid. This program now runs in a steady 50mb: /* compile with
*
* gcc -g -Wall leak.c `pkg-config vips --cflags --libs`
*/
#include <vips/vips.h>
int
main( int argc, char **argv )
{
int i;
if( VIPS_INIT( argv[0] ) )
vips_error_exit( NULL );
vips_cache_set_max( 0 );
vips_leak_set( TRUE );
for( i = 0; i < atoi( argv[2] ); i++ ) {
VipsImage *im;
void *buf;
size_t length;
printf( "loop %d ...\n", i );
if( !(im = vips_image_new_from_file( argv[1],
"access", VIPS_ACCESS_SEQUENTIAL,
NULL )) )
vips_error_exit( NULL );
if( vips_image_write_to_buffer( im, ".tif", &buf, &length,
"compression", VIPS_FOREIGN_TIFF_COMPRESSION_JPEG,
"pyramid", TRUE,
"tile", TRUE,
NULL ) )
vips_error_exit( NULL );
g_object_unref( im );
g_free( buf );
}
return( 0 );
} Thanks! This will be in 8.8.1. |
The commit: libvips/libvips@dc16f12 |
That's great news! I will pull the commit and test in my framework. |
And thanks to you for pushing on this. |
@jcupitt Any word on releasing libvips 8.8.1? |
For anyone googling for how to deal with extracting the pages from large PDF files as images, this Issue got me on the right path. The cache doesn't have to be disabled completely, just limited. Example script here. |
When I open a TIFF image using
access='sequential'
and then save it usingcompression='jpeg'
memory is not released.Minimal example to reproduce the issue (run passing a large image file path as first parameter:
You can observe memory constantly increasing until eventually the program crashes.
Note that if I don't use sequential access, or JPEG compression, the memory is flushed as normal.
The text was updated successfully, but these errors were encountered: