
Memory leak when saving pyramid files #88

Closed
scossu opened this issue Mar 16, 2019 · 52 comments

Comments


scossu commented Mar 16, 2019

When I open a TIFF image using access='sequential' and then save it using compression='jpeg', the memory is not released.

Minimal example to reproduce the issue (run it with a large image file path as the first parameter):


#!/usr/bin/env python3

import sys

import pyvips

fpath = sys.argv[1]

i = 1
while True:
    print('Iteration {}'.format(i))
    with open(fpath, 'rb') as fh:
        img = pyvips.Image.new_from_buffer(fh.read(), '', access='sequential')
        img.tiffsave('/tmp/flush.tif', compression='jpeg', Q=90)
    i += 1

You can observe memory constantly increasing until eventually the program crashes.

Note that if I don't use sequential access, or JPEG compression, the memory is flushed as normal.


scossu commented Mar 16, 2019

Update: I have a slightly more complex module that does other things in addition to loading and saving. This one still exhibits the memory leak even if I remove access='sequential'. I did not try removing JPEG compression, because that is something I need to keep.

Full module source code:

import logging

from io import BytesIO
from os import path

import pyvips

from pyvips.enums import Interpretation as chmode, Intent

import gcis


SRGB_PROFILE_FPATH = path.join(gcis.basedir, "resources", "sRGB2014.icc")
GRAY_PROFILE_FPATH = path.join(
    gcis.basedir, "resources", "generic_gray_gamma_2.2.icc"
)

logger = logging.getLogger(__name__)


def tiff_to_ptiff(data, out_path=None, auto_flatten=False):
    """
    Convert a TIFF image to a pyramidal TIFF using pyvips and tiffcp.

    :param BytesIO data: Streaming buffer to read from.
    :param str out_path: Path to write to. If omitted, a BytesIO object is
        returned.
    :param bool auto_flatten: If set to True, when an image with alpha is
        encountered, the channel will be discarded and the image flattened.
        If False (the default), an exception is raised in order to allow
        manual inspection and correction of the image.

    :rtype: BytesIO or None
    :return: Byte buffer with encoded TIFF image contents, if ``out_path`` was
        not set, otherwise ``None``.
    """
    img = pyvips.Image.new_from_buffer(data.read(), "", access="sequential")

    channels = img.interpretation
    if channels not in (chmode.SRGB, chmode.RGB, chmode.B_W):
        raise ValueError(f"Color mode {channels} is not supported.")

    # Since only RGB and gray are supported, 4 bands means RGBA, 2 bands
    # means Gray + A.
    if img.bands == 4 or img.bands == 2:
        if auto_flatten:
            logger.info("Removing alpha channel.")
            img = img.flatten()
        else:
            raise ValueError(
                "Alpha channel detected. Not trying to flatten automatically."
            )

    # Set the right ICC profile unconditionally.
    logger.info("Setting ICC profile.")
    profile_fpath = (
        GRAY_PROFILE_FPATH if channels == chmode.B_W else SRGB_PROFILE_FPATH
    )
    # with open(profile_fpath) as fh:
    #    img.set('icc-profile-data', fh.read())
    img = img.icc_transform(profile_fpath, intent=Intent.PERCEPTUAL)

    kw = {
        "compression": "jpeg",
        "Q": 90,
        "profile": profile_fpath,
        "tile": True,
        "tile_width": 256,
        "tile_height": 256,
        "pyramid": True,
        "bigtiff": True,
        #"region_shrink": "average",
    }

    if out_path:
        img.tiffsave(out_path, **kw)
    else:
        return BytesIO(img.tiffsave_buffer(**kw))


jcupitt commented Mar 17, 2019

This sounds weird, I'll have a look. Thanks for the report!


jcupitt commented Mar 17, 2019

I tried here with a 1500 x 2000 pixel jpg image as source and I see a steady 105mb of memory use. It rises a bit at the start, but stabilizes after a few hundred iterations. I measured memuse by watching RES in top. I waited for 6000 loops.

What size image are you using, and what version of libvips and pyvips?


scossu commented Mar 17, 2019

Right, I missed the important details:

  • libvips 8.7.4-1 (Arch Linux)
  • pyvips 2.1.5

The same error also occurs on an Amazon EC2 instance. I cannot check the libvips version right now, but it should be the latest one packaged with Ubuntu Xenial.

The sample image I used is 4081x3206 pixels, so you might want to try with a substantially larger image—that might make a difference.

The resident memory for the script reaches 1Gb within the first 30 iterations and only goes up.


scossu commented Mar 17, 2019

The sample image I used is 4081x3206 pixels

BTW, my source image format is TIFF.


jcupitt commented Mar 17, 2019

Ah, OK, yes I see high memuse here with a tiff source as well.

It seems to be the operation cache -- if I add:

pyvips.cache_set_max(0)

at the start, I see steady memory use of about 80mb.

I'll see if I can improve behaviour with the cache enabled.


jcupitt commented Mar 17, 2019

This runs in a steady 150mb:

/* compile with
 *
 *      gcc -g -Wall leak.c `pkg-config vips --cflags --libs`
 */

#include <vips/vips.h>

int
main( int argc, char **argv )
{
        int i;

        if( VIPS_INIT( argv[0] ) )
                vips_error_exit( NULL );

        for( i = 0; i < 10000; i++ ) {
                char *str;
                size_t length;
                VipsImage *im;

                printf( "loop %d ...\n", i );
                
                if( !(str = vips__file_read_name( argv[1], NULL, &length )) )
                        vips_error_exit( NULL );
                if( !(im = vips_image_new_from_buffer( str, length, "", NULL )) )
                        vips_error_exit( NULL );
                if( vips_image_write_to_file( im, "x.tif", NULL ) )
                        vips_error_exit( NULL );

                g_object_unref( im );
                g_free( str );
        }

        return( 0 );
}

So the problem is pyvips holding an extra ref to the input buffer.


scossu commented Mar 18, 2019

pyvips.cache_set_max(0)

What are the implications of disabling the cache (I assume that is what that does) if I want to try it as a temporary workaround?


jcupitt commented Mar 18, 2019

It can run slower, depending on your code, but you'll get the same result in the end.

libvips caches the last 1000 or so operations you perform, and will reuse results when it can. This gets you things like common-subexpression removal, and reuse of open images between threads.


scossu commented Mar 18, 2019

@jcupitt FYI, I tried adding the pyvips.cache_set_max(0) line in the complete example I pasted above, but it's still leaking memory. I will wait to test your fix once that is available. Thanks.


scossu commented Mar 20, 2019

@workergnome track this


jcupitt commented Mar 22, 2019

Hi again, I tried this version of your test program:

#!/usr/bin/python3

import sys
import pyvips

for i in range(100000):
    print('loop {} ...'.format(i))
    with open(sys.argv[1], 'rb') as fh:
        img = pyvips.Image.new_from_buffer(fh.read(), '', access='sequential')
        img.tiffsave('/tmp/flush.tif', compression='jpeg', Q=90)

And tried with a fairly small 1500 x 2000 pixel RGB tif. I watched RES in top. Memory use rose to 1gb by 100 loops, 2.1gb by 300 loops, and then stabilised. I left it for 5000 iterations, but memory use remained at 2.1gb.

I turned off the cache like this:

#!/usr/bin/python3

import sys
import pyvips

pyvips.cache_set_max(0)

for i in range(100000):
    print('loop {} ...'.format(i))
    with open(sys.argv[1], 'rb') as fh:
        img = pyvips.Image.new_from_buffer(fh.read(), '', access='sequential')
        img.tiffsave('/tmp/flush.tif', compression='jpeg', Q=90)

And watched RES again. Memory use stabilised at 88mb after a few iterations. I left it for 10,000 loops.

So ... in the short-term, I would disable the cache. Longer term, we need to stop pyvips keeping these extra copies of the input buffer.


scossu commented Mar 22, 2019

OK. I tested my full conversion script with the cache option off and now it seems to be flushing the memory periodically. I thought that in earlier attempts it did not.

I will test again later today in my complete framework, and if it behaves as expected I will disable the cache as a temporary fix. I'll let you know soon.


scossu commented Mar 22, 2019

Now, here's something interesting. Your script does not leak on my laptop (Arch Linux), but it does on my server (AWS, Ubuntu Xenial).

If you have an EC2 instance with Ubuntu handy, are you able to confirm the same behavior?

The instance where I see the leak happening is:

  • /etc/debian_version: stretch/sid
  • glibc: 2.48.2
  • libvips: 8.2.2-1
  • pyvips: 2.1.5


jcupitt commented Mar 22, 2019

Ouch, libvips 8.2 is four years old. I would update that.


scossu commented Mar 22, 2019

I just noticed that, and I'm compiling the most recent one from source. I'll report back.


scossu commented Mar 22, 2019

libvips 8.7 is confirmed to get rid of the problem. I only tried 140 loops, but with a quite large image (7892x9867). The resident memory usage never goes above 400Mb.

Aside from the root problem with the cache, it may be important to notify the maintainer(s) of the Ubuntu package that the version packaged with Xenial has a critical issue. I can take care of that.


jcupitt commented Mar 23, 2019

I've never had any luck nudging Ubuntu about updates, but do have a go.

The one in cosmic is much better, fortunately.


scossu commented Mar 23, 2019

In my institution we use 16.04 LTS on EC2 instances, which I think is the norm. I sent the support team an email just in case. Getting away from that archaic version with a memory leak should be a big incentive for using vips more!


scossu commented Mar 23, 2019

Totally gratuitous comment: after fixing the leak I was able to process 1K images totalling 86 Gb in about half an hour. Memory usage peaked at just above 2Gb when processing a 1.8Gb image. This is very efficient, kudos!


jcupitt commented Mar 23, 2019

Yes, my institution has 16.04 on all desktops and servers too. I'm glad it's working now.


scossu commented Mar 25, 2019

See reply from Ubuntu maintainers:

This needs someone to volunteer an update. Please see
https://wiki.ubuntu.com/StableReleaseUpdates for details. A wholesale
update to the latest upstream stable release may or may not be
acceptable depending on the changes involved. A cherry-pick of the
specific bugfix required is more likely to be acceptable and therefore
easier to land.

They have a pretty high bar for upgrading packages in a stable release; however, I think this upgrade would fall under the "severe regression" case in the linked guidelines, because the current Ubuntu package is pretty much unusable and may cause system instability. I'll take a stab at it, but I can't guarantee how far it goes. If a cherry-pick is needed, that would definitely be beyond my capacity.


scossu commented May 9, 2019

Unfortunately I have to report that there is still a memory leak by just opening and saving a file.

Minimum program reproducing the issue: https://gist.github.com/scossu/cdde59337ff27f4436eafe23a61142b2

The memory usage increases pretty sharply at first, then slows down to a creep, but it's still there. It is enough to crash the program on a 32Gb machine when given enough, and large enough, images.

I have tried removing lines from the test program. Removing the jpeg compression seemed to slow the leak down somewhat, and removing tiffsave reduced it even further, but even just new_from_buffer seems to leak.

To make sure I am interpreting the results from the resource module correctly I tried the same with PIL.Image and the memory usage remains perfectly constant across loops.


jcupitt commented May 10, 2019

Hi again @scossu,

I tried your program on this laptop with a 1450x2048 jpg image and 10000 iterations. I see:

[graph: memory use over 10,000 iterations]

As you say, memory use rises at the beginning, but stabilises. I don't think there's a leak, just memory fragmentation.

This is with 8.8, I'll try 8.7, there have been a couple of memory fixes.


jcupitt commented May 10, 2019

8.7 was very similar: it was at a steady 58mb after 10,000 iterations.


scossu commented May 10, 2019

@jcupitt Thanks for checking. My images are in the 5000x7000 range or larger, so the memory increase may have been larger. After fewer than 1500 images, the whole 32Gb of system memory was filled up.

I am working on a version of my program without vips, to verify whether that is what is causing the issue. I will report back when I have more information.


jcupitt commented May 11, 2019

Oh, interesting, I'll try with a larger image.


jcupitt commented May 11, 2019

Do you have a lot of cores on your test machine, by the way? I'm running on a two-core, four-thread laptop.


scossu commented May 11, 2019 via email


jcupitt commented May 12, 2019

I tried with a 10,000 x 10,000 pixel jpg and let it run for 10,000 iterations (overnight!):

[graph: memory use over 10,000 iterations, 10,000 x 10,000 pixel source]

And it stabilised at 180mb after about 3500 loops. That's with git master libvips.

Could there be some other factor? Do some of your images have errors in them, for example? We did recently find a few memleaks when handling corrupt images.


scossu commented May 15, 2019

I did more testing using pyvips and a rough implementation using tifffile. The latter is much less efficient and produces much larger images (probably because of LZMA compression), but memory usage is stable over time.

[graph: tifffile run, memory usage stable over time]

Pyvips has a pronounced buildup:

[graph: pyvips run, memory usage steadily increasing]

The two images above are snapshots of 1 hour of processing, but the trend remains the same for both for several hours (i.e. with vips memory usage increases linearly).

One more test I can do is using pyvips single-threaded in a simple loop, but that will be 8 times slower and I have to let it run overnight in order to gather enough data.
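For what it's worth, a sketch of one way to force the single-threaded run (this assumes the standard VIPS_CONCURRENCY mechanism, which libvips reads once at initialisation):

```python
import os

# libvips picks up VIPS_CONCURRENCY when it initialises, so this must
# happen before the first `import pyvips` anywhere in the program.
os.environ["VIPS_CONCURRENCY"] = "1"
```

After this, each libvips pipeline uses a single worker thread, at the cost of the roughly 8x slowdown mentioned above.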


scossu commented May 16, 2019

This is a single-threaded run of only 144 images (unfortunately an I/O error raised an exception, but you can still see the trend):

[graph: single-threaded run, 144 images]


jcupitt commented May 16, 2019

I tried with the python3 image on dockerhub and I see growth there with your test program. That's using a very old libvips, though.

I'll see if I can make a Dockerfile for pyvips with current stable libvips on AWS.


jcupitt commented May 16, 2019

Here's pyvips 2.1.5 (current stable) plus libvips 8.7.4 (current stable) running on amazonlinux 2018.03:

https://github.com/jcupitt/docker-builds/tree/master/pyvips-aws-soak

Build with:

docker pull amazonlinux:2018.03
docker build -t pyvips-amazonlinux:2018.03 .

Run with:

docker run --rm -it -v $PWD:/data pyvips-amazonlinux:2018.03 \
  python3 /data/soak.py Gugg_coloured.jpg x.tif

I see:

$ docker run --rm -it -v $PWD:/data pyvips-amazonlinux:2018.03 \
>   python3 /data/soak.py Gugg_coloured.jpg x.tif
0, 27564
1, 29600
2, 31060
3, 32396
4, 34716
5, 34892
...
996, 41840
997, 41840
998, 41840
999, 41840
1000, 41840
1001, 41840
1002, 41840
...
1996, 43096
1997, 43096
1998, 43096
1999, 43096
2000, 43096
2001, 43096
2002, 43096
...
2998, 44084
2999, 44084
3000, 44084
3001, 44084
3002, 44084
3003, 44084
...
3997, 44084
3998, 44084
3999, 44084
4000, 44084
4001, 44084
4002, 44084
4003, 44084
...

Could you be picking up an old version of libvips somehow?


jcupitt commented May 16, 2019

You can find out the version of libvips that pyvips has linked to like this:

$ docker run --rm -it -v $PWD:/data pyvips-amazonlinux:2018.03 
bash-4.2# python3
Python 3.6.8 (default, Mar 18 2019, 18:57:19) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyvips
>>> pyvips.version(0)
8
>>> pyvips.version(1)
7
>>> pyvips.version(2)
4


scossu commented May 16, 2019

I have the same versions on the server I ran my tests on, pyvips 2.1.5 and vips 8.7.4.


scossu commented May 16, 2019

By the way, I am using Python 3.7.3.


jcupitt commented May 17, 2019

Before I try compiling python 3.7.3 from source, could you check that you really are picking up the latest libvips in your tests?

Please try this version of your test program:

#!/usr/bin/python3
  
# usage: soak.py <test-image> <output-image>

import resource
import sys
import pyvips

pyvips.cache_set_max(0)

print("linked to libvips {}.{}.{}"
    .format(pyvips.version(0), pyvips.version(1), pyvips.version(2)) )

for i in range(10000):
    with open(sys.argv[1], "rb") as fh:
        img = pyvips.Image.new_from_buffer(fh.read(), "", access="sequential")

    kw = {
        "compression": "jpeg",
        "Q": 90,
        "tile": True,
        "tile_width": 256,
        "tile_height": 256,
        "pyramid": True,
        "bigtiff": True,
    }

    img.tiffsave(sys.argv[2], **kw)

    print("{}, {}"
        .format(i, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))

And verify that you see a leak.


scossu commented May 22, 2019

You don't have a pre-compiled Python 3.7? I found it for Ubuntu 14 LTS.

Your script output:

linked to libvips 8.7.4
0, 171652
1, 260668
2, 276420
3, 289632
4, 289632
5, 296332
6, 305520
7, 309664
8, 315720
9, 321668
10, 324772
[...]
100, 363932
[...]
200, 363932
[...]
500, 372800
[...]
1000, 374840
[...]
2000, 376312
[...]

The memory increase is slow, but with larger images and multiple processes it consumes large quantities of RAM, even if I kill processes after only one job.

I am not yet sure whether my multiprocess handling is failing to release memory, but this does not seem to happen with other implementations.


jcupitt commented May 22, 2019

You might need to let it run a little longer -- for me, memory did not completely stabilize until ~3500 iterations.

libvips 8.8 is out now, by the way. It fixes a couple of leaks when images contain errors.

I'll try with py 3.7.


jcupitt commented May 22, 2019

Here's pyvips 2.1.6, libvips 8.8, python 3.7.3 running on alpine:

https://github.com/jcupitt/docker-builds/tree/master/pyvips-python3.7-alpine

I see:

[graph: memory use on alpine, low and stable]

It's impressive how much lower memuse is on alpine. It seems stable too.


scossu commented May 22, 2019

It could well be a problem with my code that only shows up with pyvips then. I will upgrade to v8.8 and retry. For now I have been able to keep memory in check by killing processes after each job, but at a higher scale it may not work. Thanks for all the testing.


scossu commented Jun 5, 2019

I have been chasing this down for some time, and the issue seems to be with the pyramid option of the save command.

Script to reproduce (you need to throw a sRGB ICC profile file in the folder): https://gist.github.com/scossu/27111a4429b8b7f9df1af789741d7c3d

Memory graph, run with 8 parallel processes on a 15K x 8K pixel image with 10K iterations:

[graph: memory usage, 8 parallel processes, 15K x 8K pixel image, 10K iterations]

Memory goes up very quickly, then bounces up and down, eventually flattening out. Sometimes (as in this case) the processing goes on for a very long time and completes, but the script never ends. Other times the processes go zombie one by one within 15-30 minutes and stop doing anything.

Processes left at the end of the run:

[screenshot: processes left at the end of the run]

If you comment out the pyramid option the graph changes drastically:

[graph: memory usage with the pyramid option commented out]

I believe the issue may be somewhere along these lines: https://github.com/libvips/libvips/blob/master/libvips/foreign/vips2tiff.c#L1737-L1761 but I haven't investigated further.

I'll change the title of the issue to be more relevant.

scossu changed the title from Memory leak when opening with access='sequential' and saving with JPEG compression to Memory leak when saving pyramid files Jun 5, 2019

jcupitt commented Jun 5, 2019

Hi @scossu,

Thanks for the detailed report. I'll have a look here.


jcupitt commented Jun 6, 2019

I tried a slight variant of your code:

#!/usr/bin/python3

# Use: test_vips.py image_path iterations

import sys
import pyvips as vips

vips.leak_set(True)
vips.cache_set_max(0)

def _process(fname):
    img = vips.Image.new_from_file(fname, access="sequential")
    img.tiffsave_buffer(compression="jpeg", tile=True, pyramid=True)

if __name__ == "__main__":
    fname = sys.argv[1]
    ct = int(sys.argv[2])

    for i in range(ct):
        print(f"Processing image #{i} ...")
        _process(fname)

And I see memory growth here too.

It looks like libvips itself isn't leaking (I always get 5mb of pixel buffer use, however many iterations I do), but I do see a steady MB or two of growth per iteration. Perhaps it's in the tiff memory output? I'll investigate.

Thanks for reporting this!


jcupitt commented Jun 6, 2019

Huh, it was a silly error: we were not freeing layers other than the top layer when writing a TIFF pyramid to memory. This program now runs in a steady 50mb:

/* compile with
 *
 *      gcc -g -Wall leak.c `pkg-config vips --cflags --libs`
 */

#include <vips/vips.h>

int
main( int argc, char **argv )
{
        int i;

        if( VIPS_INIT( argv[0] ) )
                vips_error_exit( NULL );

        vips_cache_set_max( 0 );
        vips_leak_set( TRUE );

        for( i = 0; i < atoi( argv[2] ); i++ ) {
                VipsImage *im;
                void *buf;
                size_t length;

                printf( "loop %d ...\n", i );

                if( !(im = vips_image_new_from_file( argv[1], 
                        "access", VIPS_ACCESS_SEQUENTIAL, 
                        NULL )) )
                        vips_error_exit( NULL );
                if( vips_image_write_to_buffer( im, ".tif", &buf, &length,
                        "compression", VIPS_FOREIGN_TIFF_COMPRESSION_JPEG,
                        "pyramid", TRUE,
                        "tile", TRUE,
                        NULL ) )
                        vips_error_exit( NULL );
                g_object_unref( im );
                g_free( buf );
        }

        return( 0 );
}

Thanks! This will be in 8.8.1.


jcupitt commented Jun 6, 2019

The commit: libvips/libvips@dc16f12


scossu commented Jun 6, 2019

That's great news! I will pull the commit and test in my framework.


scossu commented Jun 6, 2019

This is more like it:

[graph: memory usage after the fix, flat]

1.3Tpx processed in 70 minutes, with a ridiculously small memory footprint. Beautiful.

Thanks for the fix! Looking forward to the release.

scossu closed this as completed Jun 6, 2019

jcupitt commented Jun 6, 2019

And thanks to you for pushing on this.


scossu commented Jul 3, 2019

@jcupitt Any word on releasing libvips 8.8.1?

@sixtyfive

For anyone googling how to extract the pages from large PDF files as images, this issue got me on the right path. The cache doesn't have to be disabled completely, just limited. Example script here.
