Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image.crop is not thread safe #4848

Closed
lanzkron opened this issue Aug 10, 2020 · 8 comments
Closed

Image.crop is not thread safe #4848

lanzkron opened this issue Aug 10, 2020 · 8 comments

Comments

@lanzkron
Copy link

What did you do?

Try to run several Image.crop actions concurrently.

What did you expect to happen?

Success

What actually happened?

Got some errors (here's a partial list)

  • unrecognized data stream contents when reading image file
  • 'NoneType' object has no attribute 'im_text'
  • 'NoneType' object has no attribute 'read'
  • broken PNG file (chunk b'yw^.')

And even a crash of the Python process.

What are your OS, Python and Pillow versions?

  • OS: Windows 10
  • Python: 3.7.5
  • Pillow: 7.2.0
from PIL import Image
from concurrent import futures
import threading

lock = threading.Lock()
def crop_image(image, x, y, width, height, use_lock):
    if use_lock:
        with lock:
            return image.crop((x, y, x + width, y + height))
    return image.crop((x, y, x + width, y + height))

try:
    image = Image.open("./image.png")
    with futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(crop_image, image, i, i, 10, 10, False) for i in range(10)]

    cropped = [future.result() for future in futures]

    for i in range(len(cropped)):
        cropped[i].save(f"output-{i}.png", "PNG")

except Exception as ex:
    print(f"Exception: {ex}")
@radarhere
Copy link
Member

Hi. When I run your code, I mostly see "'NoneType' object has no attribute 'read'". I did get a segfault once.

If I insert a line to load the image,

try:
    image = Image.open("./image.png")
    image.load()

Then I find that the problem is solved. The reason for this would be that Pillow does not finish reading the file from disk after Image.open. You may be interested in https://pillow.readthedocs.io/en/stable/reference/open_files.html#image-lifecycle

If you'd like to talk about other errors, please attach the image that you're using.

@lanzkron
Copy link
Author

Thanks @radarhere this solved the issue for me too.

Basically you're saying that load isn't thread-safe ;)

I would personally still consider this an issue but if you think it's acceptable we can close it.
Perhaps something should be added.

BTW, is calling close necessary or will it be called automatically?

@radarhere
Copy link
Member

radarhere commented Aug 10, 2020

close will not be called automatically, as of Pillow 7.0.0.

I wouldn't say that load isn't thread safe. The issue is that in Image.open you're opening a file handler, and then in load - or crop, since crop calls load within Pillow - you're closing it. You're passing the same open file handler into different threads, but once one thread closes the file handler, the remaining threads raise an error because they're now dealing with a closed file handler.

So an alternative way of fixing the problem would be to open the image within the thread -

try:
    with futures.ThreadPoolExecutor() as executor:
        def x(i):
            image = Image.open("/Users/andrewmurray/Dropbox/docs/pillow/Pillow/Tests/images/hopper.png")
            return executor.submit(crop_image, image, i, i, 10, 10, False)
        futures = [x(i) for i in range(10)]

    cropped = [future.result() for future in futures]

    for i in range(len(cropped)):
        cropped[i].save(f"output-{i}.png", "PNG")

except Exception as ex:
    print(f"Exception: {ex}")

@lanzkron
Copy link
Author

Obviously I could open the image in the thread but I wanted to do common work once (under the assumption that it's more performant than opening the image multiple times)

Now I'm confused about resource management, do I have to use images in context managers? The code you just posted doesn't do so (nor call close explicitly).

I see that load says:

This function is only required to close images that have not
had their file read and closed by the
:py:meth:~PIL.Image.Image.load method. See
:ref:file-handling for more information.

If I load all images I create do I still need to use a context manager? If I don't call close, will a file handle be leaked for every image I open and don't load?

@radarhere
Copy link
Member

radarhere commented Jan 30, 2021

To be clear, the documentation you quoted comes from close() -

Pillow/src/PIL/Image.py

Lines 582 to 592 in 27a8852

def close(self):
"""
Closes the file pointer, if possible.
This operation will destroy the image core and release its memory.
The image data will be unusable afterward.
This function is only required to close images that have not
had their file read and closed by the
:py:meth:`~PIL.Image.Image.load` method. See
:ref:`file-handling` for more information.

You should use either a context manager or close(), yes. The fact that I didn't should be considered a mistake.

https://pillow.readthedocs.io/en/stable/reference/open_files.html

Users of the library should use a context manager or call Image.Image.close() on any image opened with a filename or Path object to ensure that the underlying file is closed.

@radarhere
Copy link
Member

If I load all images I create do I still need to use a context manager? If I don't call close, will a file handle be leaked for every image I open and don't load?

Testing, I don't find that your particular code leaks multiple file handles, only one.
I also find that your code does not leak a file handle for PNGs, which would be because of load().

However, https://pillow.readthedocs.io/en/stable/reference/open_files.html#proposed-file-handling

Image.Image.load() should close the image file, unless there are multiple frames.

So it does leak a file handle if I change the format to SPIDER.

I've created #5239 to clarify the documentation.

@radarhere
Copy link
Member

#5239 has been merged, changing the close() docstring to

Pillow/src/PIL/Image.py

Lines 582 to 592 in 023dbe3

def close(self):
"""
Closes the file pointer, if possible.
This operation will destroy the image core and release its memory.
The image data will be unusable afterward.
This function is required to close images that have multiple frames or
have not had their file read and closed by the
:py:meth:`~PIL.Image.Image.load` method. See :ref:`file-handling` for
more information.

@radarhere
Copy link
Member

As to why a context manager or close() should be used if load() closes the file handler, I think this is a stylistic declaration rather than a practical one.

If you look at #3577, part of the reasoning is to 'close the file handler at a specified point in the user's code'.

If you have further questions, let us know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants