wrapping Image.copy causing an adverse impact on memory use #63

Closed
ptone opened this Issue Nov 10, 2011 · 7 comments

4 participants

@ptone

f570bd0 changed the way images were opened, and wraps the original copy method with one that tries to preserve extra imagekit related info.

However, this results in the image data somehow being retained in memory - I'm not sure exactly how or where.

commenting out this line:

img.copy = types.MethodType(_wrap_copy(img.copy), img, img.__class__)

results in a substantially reduced memory footprint
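For anyone following along without the referenced commit handy, here is a hypothetical reconstruction of what such a wrapper looks like, using a stand-in class in place of the real PIL image (FakeImage, _wrap_copy, and the attributes carried over are illustrative, not imagekit's actual code):

```python
import types

class FakeImage:
    """Stand-in for a PIL JPEG image; its copy() drops the JPEG-plugin
    attributes, just as PIL's Image.copy does."""
    def __init__(self):
        self.app = {'APP0': b'JFIF'}
    def _getexif(self):
        return {274: 1}  # EXIF Orientation tag
    def copy(self):
        return FakeImage.__new__(FakeImage)  # clone without JPEG attrs

def _wrap_copy(original_copy):
    # Carry the JPEG-plugin attributes over to the clone, and re-wrap
    # the clone's copy so further copies preserve them too.
    def copy(self):
        new = original_copy()
        new.app = self.app
        new._getexif = self._getexif
        new.copy = types.MethodType(_wrap_copy(new.copy), new)
        return new
    return copy

img = FakeImage()
# Python 3's MethodType takes (func, instance); the issue's Python 2
# code passed three arguments: (func, instance, cls).
img.copy = types.MethodType(_wrap_copy(img.copy), img)
clone = img.copy()   # clone.app and clone._getexif survive the copy
```

Note that the wrapper closes over the original *bound* method and stores a new bound method on the instance itself, which is where the memory behaviour discussed below comes from.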

I've created a quick sample app/project to demonstrate this - if you watch the process memory - you can see it climb.

https://github.com/ptone/imagefoobar (note this is using the new Django 1.4/trunk layout)

cc @lettertwo

@ptone

I've spent some time pondering this - it's tricky: EXIF data is specific only to some JPEGs, but the processor pipeline expects to chain generic PIL images - which do not always have EXIF data.

The above referenced commit tries to do this by wrapping copy to include the _getexif method on the copy, however:

  • though it's still unclear to me why, this has an adverse impact on memory
  • it doesn't completely solve the problem, as the data may still be stripped by an earlier processor (say, for example, reflection) that creates a new image.

I think the processor pipeline design is good, but it relies on a simple contract: a processor's process method takes a generic PIL image object and returns one.

EXIF data breaks that contract.

There is no reliable way to pass it through the pipeline.
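The contract, and why EXIF can't survive it, can be made concrete with a small sketch (the classes below are stand-ins, not imagekit's real Reflection processor): any processor that returns a fresh image silently drops attributes attached to the old instance.

```python
class FakeImage:
    """Stand-in for a generic PIL image."""
    def copy(self):
        return FakeImage()

class Reflection:
    """A processor honouring the contract: generic image in, generic
    image out. It has no obligation to carry instance attributes over."""
    def process(self, img):
        return img.copy()   # a brand-new image; per-instance attrs are gone

src = FakeImage()
src._getexif = lambda: {274: 6}   # attached per-instance, as the wrapper does
out = Reflection().process(src)   # out has no _getexif any more
```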

I think we need something akin to the autoconvert processor - or even to build EXIF transpose into the autoconvert processor. It needs a first bite at the apple, and in that case ProcessPipeline can handle the image.copy() step.

However, it is not entirely clear to me why the pipeline needs to start with a copy of the image, rather than the image itself.

The other thing being done in the wrapping of the .copy() method is the setting of an .app attr - but for the life of me, I can find no other reference to the use of this attribute in imagekit. Is it just spurious?

@lettertwo
Collaborator

Indeed, the memory footprint is increased. My guess is that it's a side effect of wrapping a reference to the parent instance into the copy() method on the clone.

However, I don't think this constitutes a memory leak because, as is illustrated below, these reference cycles are cleaned up by the garbage collector.
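That guess can be checked directly: a bound method stored on the instance itself forms a reference cycle that plain reference counting never frees, and only the cycle collector reclaims. A minimal sketch, using a stand-in class in place of the PIL image:

```python
import gc
import types
import weakref

class Img:
    def copy(self):
        return Img()

def _wrap(original_copy):
    # the closure keeps a reference to the original *bound* method
    def copy(self):
        return original_copy()
    return copy

img = Img()
# Storing a bound method on the instance creates a cycle:
# img -> __dict__['copy'] -> method.__self__ -> img
img.copy = types.MethodType(_wrap(img.copy), img)
ref = weakref.ref(img)

del img
alive_before_gc = ref() is not None   # kept alive by the cycle
gc.collect()
alive_after_gc = ref() is not None    # the cycle collector reclaims it
```

So each image is eventually freed, but only when the collector gets around to it, which matches the sawtooth memory profile in the numbers below.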

Here are some memory usage numbers under three different scenarios (a sample was taken on each iteration, then the samples were analyzed after the iteration completed):

Scenario 1

Using the PIL version of copy:

   AVG: 42M
MEDIAN: 42M
  HIGH: 42M
   LOW: 19M

Scenario 2

Using the wrapped version of copy:

   AVG: 172M
MEDIAN: 153M
  HIGH: 315M
   LOW: 20M

Scenario 3

Using the wrapped version of copy, forcing garbage collection each iteration:

   AVG: 42M
MEDIAN: 42M
  HIGH: 42M
   LOW: 20M

The increased memory usage in the second example above is likely a result of the garbage collector simply deferring collection until some later time. The numbers in that second example are markedly higher than those for the PIL implementation of copy (first example), but they do not continually rise, as you would expect in a true memory leak. Rather, they rise to a certain level, then drop, then rise again, then drop to roughly the same level (as is expected when a periodic garbage collection occurs). Here is a snippet of the iterative sampling to illustrate the garbage collection in action:

(iteration 59) MEM: 270.57M
(iteration 60) MEM: 279.38M
(iteration 61) MEM: 288.19M
(iteration 62) MEM: 296.98M
gc: collecting generation 0...
gc: objects in each generation: 922 29 18995
gc: done, 828 unreachable, 0 uncollectable, 0.0625s elapsed.
(iteration 63) MEM: 152.77M
(iteration 64) MEM: 147.59M

And again, later in the cycle:

(iteration 92) MEM: 288.21M
(iteration 93) MEM: 297.01M
(iteration 94) MEM: 305.80M
(iteration 95) MEM: 314.59M
gc: collecting generation 0...
gc: objects in each generation: 978 67 18995
gc: done, 882 unreachable, 0 uncollectable, 0.0668s elapsed.
(iteration 96) MEM: 152.80M
(iteration 97) MEM: 147.62M

The memory levels rise to ~300M, then drop to ~150M repeatedly, which shows that the reference cycles are detected and cleaned up by the garbage collector.

So it seems that the memory footprint is increased by the wrapped version of copy, but it is not leaking memory (thanks to Python's garbage collector).
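The deferral behaviour follows from CPython's generational collector: generation 0 is only scanned once the count of allocations minus deallocations crosses a threshold. Tuning that threshold is another (hedged) way to flatten the profile without sprinkling gc.collect() calls around:

```python
import gc

# CPython's generational collector only scans generation 0 after the
# number of allocations minus deallocations crosses a threshold
# (the long-standing default is (700, 10, 10)):
gen0, gen1, gen2 = gc.get_threshold()

# Lowering the gen-0 threshold trades some CPU for a flatter memory
# profile, so cycles like the wrapped-copy ones are reclaimed sooner:
gc.set_threshold(100, gen1, gen2)
lowered = gc.get_threshold()[0]
gc.set_threshold(gen0, gen1, gen2)   # restore the previous setting
```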

Here is the modified test case I used (you will need to install psutil to run it):

import os, sys, gc
import psutil
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "imagefoobar.settings")
from dummy.models import Dummy
import ImageFile

ImageFile.MAXBLOCK = 1024*1024*5
sample = 'pony.jpg'
Dummy.objects.all().delete()

print 'imagekit'
gc.set_debug(gc.DEBUG_STATS)
process = psutil.Process(os.getpid())
mem_readings = []

for i in xrange(100):
    # gc.collect()
    reading = float(process.get_memory_info()[0]) / 1048576
    mem_readings.append(reading)
    print '(iteration %s) MEM: %.2fM' % (i, reading)
    f = None  # guard against a NameError in finally if open() fails
    try:
        d = Dummy()
        f = open(sample, 'rb')
        d.spec_image.save('foo.jpg', f)
    finally:
        if f: f.close()

print
print 'done'
print
print '   AVG: %2.fM' % (sum(mem_readings) / len(mem_readings))
print 'MEDIAN: %2.fM' % sorted(mem_readings)[len(mem_readings)/2]
print '  HIGH: %2.fM' % max(mem_readings)
print '   LOW: %2.fM' % min(mem_readings)

I ran scenario 1 by modifying imagekit.utils.open_image to not wrap the image instance's copy method.

I ran scenario 2 by using the imagekit.utils.open_image implementation.

I ran scenario 3 like scenario 2, but uncommented the line in the test case that forces garbage collection: # gc.collect()

@ptone

You are correct that this is not a leak, and I've updated the issue accordingly. For background, this was discovered when trying to use imagekit to batch something on ep.io, and the process runners/monitors they use were killing the task due to the high memory use.

Thank you for performing the more detailed analysis. I still consider this an adverse situation, and an issue. Using gc would be a workaround - but I still feel there are some non-memory issues with wrapping copy:

  • as pointed out in my followup comment, there is no way to ensure that the customized img instance will be preserved through the pipeline and be available by the time it gets to a processor that would use exif.

  • this is essentially monkeypatching the copied instance with a private method - there's something dodgy about that.

  • I've not seen how/where this 'app' attribute is used?

I support the idea of using the EXIF data, and I don't mind using PIL's access to it (as compared to requiring another dependency, or reinventing it) - but I think it should be some sort of special pre-pipeline step, and it can be done without the ~7x memory increase.
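A pre-pipeline step along those lines could read the orientation once, before any processor has a chance to strip it. A minimal sketch - the function name, stand-in classes, and the decision to just report the orientation (rather than apply the transpose) are all illustrative, not imagekit API:

```python
# EXIF tag 274 ("Orientation") records how the camera was held; values
# 2-8 each map to a flip/rotate, 1 means "as stored".
ORIENTATION_TAG = 274

def pre_pipeline_transpose(img):
    """Resolve the orientation before the pipeline can strip the metadata.
    Works on any image: non-JPEGs simply have no _getexif at all."""
    exif = getattr(img, '_getexif', lambda: None)() or {}
    return exif.get(ORIENTATION_TAG, 1)

class FakeJpeg:
    def _getexif(self):
        return {ORIENTATION_TAG: 6}  # rotated 90 degrees by the camera

class FakePng:
    pass  # no EXIF support at all
```

After this step the pipeline only ever sees a plain, already-transposed image, so the image.copy() contract problem disappears.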

@matthewwithanm

Yeah, the copy-wrapping thing was definitely something that we weren't happy about. This was our attempt to work around a PIL limitation. Our attitude regarding the data not being preserved was basically just "auto transpositions have to be done first"—a restriction which should at the very least be pointed out in the documentation. I'll have to think about this. I can't really see a good way to avoid the kludgy pseudo-monkeypatching just yet.

As far as the "app" attribute, I believe that was just another attribute that @lettertwo found PIL's copy() was stripping.

@lettertwo
Collaborator

Haha, @matthewwithanm beat me to this one by a few mins. Here are my thoughts anyway:

  • there is no way to ensure that the customized img instance will be preserved through the pipeline and be available by the time it gets to a processor that would use exif

Well, that is the nature (and even the point) of the pipeline - to manipulate the data as it moves through. The problem we're addressing with this copy wrapper is that PIL strips the metadata arbitrarily. If a processor chooses to strip it, that is ok.

  • this is essentially monkeypatching the copied instance with a private method

Yes, it is gross. We found this to be the least bad of the approaches we considered when writing it. We'll all have to ponder this one a bit more.

  • I've not seen how/where this 'app' attribute is used?

The app property (as defined by PIL's JpegImagePlugin) is a map of metadata markers commonly found in JPEG files. These common markers include APP0 (indicates metadata conforming to the JFIF standard), APP1 (indicates metadata conforming to the EXIF standard), and COM (indicates text comments embedded in the image).

See Wikipedia's article on JPEG files for further explanation.

While there is separate access for metadata that conforms to the EXIF standard (the _getexif method), the JFIF standard (frequently used by image-editing software) may also encode metadata via these markers (color profiles, embedded thumbnails, etc.).

Since we are making an effort to preserve the EXIF metadata, it makes sense to also conserve the JFIF metadata. While none of the processors packaged with ImageKit (so far) make use of JFIF metadata, it's entirely possible that some processor in the future may want access to that metadata during processing, in which case, it should be made available.
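Those APP markers are plain length-prefixed segments at the start of the JPEG byte stream, so they are easy to inspect without PIL at all. A small stand-alone sketch that scans a hand-built JPEG header for its APPn segments (the parser and the synthetic bytes are illustrative, not imagekit code):

```python
import struct

def list_app_markers(data):
    """Scan a JPEG byte stream and return (marker_name, payload) pairs
    for the APPn segments that precede the image data."""
    assert data[:2] == b'\xff\xd8', 'not a JPEG (missing SOI marker)'
    pos = 2
    markers = []
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if not 0xE0 <= marker <= 0xEF:   # stop at the first non-APPn marker
            break
        # the 2-byte big-endian length includes itself but not the marker
        (length,) = struct.unpack('>H', data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        markers.append(('APP%d' % (marker - 0xE0), payload))
        pos += 2 + length
    return markers

# Hand-built header: SOI, then an APP0 segment carrying "JFIF\0"
jfif_payload = b'JFIF\x00\x01\x02\x00\x00\x01\x00\x01\x00\x00'
app0 = b'\xff\xe0' + struct.pack('>H', 2 + len(jfif_payload)) + jfif_payload
data = b'\xff\xd8' + app0 + b'\xff\xdb'   # a DQT marker ends the scan

markers = list_app_markers(data)   # [('APP0', b'JFIF\x00...')]
```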

@pennersr

I lost a lot of time tracking down excessive memory usage on a server, eventually discovering that the behaviour described in this ticket was the cause of it all. A simple gc.collect() after imagekit processing did wonders for us. To prevent others from having to go down the same path, I am all 👍 for adding a one-liner workaround as a temporary solution while the proper solution is being researched...
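The workaround amounts to a forced collection after each image. A hedged sketch of how it might sit in a batch loop (the function names are hypothetical; the exact placement depends on the application):

```python
import gc

def process_batch(images, process):
    """Process images one at a time, forcing a cycle collection after
    each so the wrapped-copy reference cycles are reclaimed promptly
    instead of piling up until the collector's threshold is crossed."""
    results = []
    for img in images:
        results.append(process(img))
        gc.collect()   # the one-liner workaround
    return results
```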

@matthewwithanm

The copy method is no longer wrapped in IK3, so I'm going to close this.
