Serious performance issues in 12.2 (AWS S3) #351

Closed
pySilver opened this issue Jan 21, 2015 · 19 comments

@pySilver

Hello,

There is a serious performance issue in the current 12.2 branch. We have ~2M thumbnails, and the problem shows up whenever execution reaches the marked line here:

# /sorl/thumbnail/base.py:101 (get_thumbnail)
        # We have to check exists() because the Storage backend does not
        # overwrite in some implementations.
        if not thumbnail.exists(): # <---- This is the root of the problem!
            try:
                source_image = default.engine.get_image(source)
            except IOError:
                if settings.THUMBNAIL_DUMMY:
                    return DummyImageFile(geometry_string)
                else:
                    # if S3Storage says file doesn't exist remotely, don't try to
                    # create it and exit early.
                    # Will return working empty image type; 404'd image
                    logger.warn(text_type('Remote file [%s] at [%s] does not exist'),
                                file_, geometry_string)

                    return thumbnail

            # We might as well set the size since we have the image in memory
            image_info = default.engine.get_image_info(source_image)
            options['image_info'] = image_info
            size = default.engine.get_image_size(source_image)
            source.set_size(size)

            try:
                self._create_thumbnail(source_image, geometry_string, options,
                                       thumbnail)
                self._create_alternative_resolutions(source_image, geometry_string,
                                                     options, thumbnail.name)
            finally:
                default.engine.cleanup(source_image)

It actually asks boto to return a LIST of all stored thumbnails (without even using a prefix), so the application hangs with 100% CPU and high memory usage (not a surprise, really).

Wouldn't it be better to provide a prefix for the lookup (constructed with the same function that is used to store the thumbnail)?
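
A minimal sketch of the difference being described, using a toy in-memory bucket rather than boto. `ToyBucket`, `exists_by_full_listing` and `exists_by_prefix` are hypothetical names made up for this illustration, not boto's or sorl-thumbnail's API. The point is only that a listing without a prefix returns every stored key, while a listing scoped to the thumbnail's own name returns just the keys that match.

```python
class ToyBucket:
    """Hypothetical stand-in for a remote bucket; counts keys sent to the client."""

    def __init__(self, keys):
        self._keys = set(keys)
        self.keys_returned = 0

    def list(self, prefix=""):
        # Filtering is modeled as happening on the remote side;
        # only keys actually handed back to the caller are counted.
        for key in sorted(self._keys):
            if key.startswith(prefix):
                self.keys_returned += 1
                yield key


def exists_by_full_listing(bucket, name):
    # Pulls every key in the bucket into memory, then searches it.
    return name in list(bucket.list())


def exists_by_prefix(bucket, name):
    # Only keys that start with the thumbnail's own name come back.
    return any(key == name for key in bucket.list(prefix=name))


keys = ["cache/%02x/thumb-%d.jpg" % (i % 256, i) for i in range(10000)]
target = keys[1234]

full = ToyBucket(keys)
found_full = exists_by_full_listing(full, target)

scoped = ToyBucket(keys)
found_scoped = exists_by_prefix(scoped, target)

print(found_full, full.keys_returned)
print(found_scoped, scoped.keys_returned)
```

With ~2M thumbnails instead of 10,000, the unscoped variant is the one that would explain the CPU and memory usage reported above.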

In the meantime we've had to revert to 11.12.1b, which works better.

This is related to a fix introduced in #92

@camilonova
Member

It's so broken.

I had to revert to sorl-thumbnail==12.1c to make it work.

@mariocesar
Collaborator

This is directly related to the storage implementation; exists() is needed to make sure we overwrite the thumbnail file.

We have a possible solution: add a THUMBNAIL_FORCE_OVERWRITE setting, set to False by default. With the default, a thumbnail is only regenerated if one with the same name doesn't already exist in storage; when set to True, the exists() check is skipped and the thumbnail is overwritten.

In particular, we cannot fix a bad storage implementation where .exists() is slow. The check was added to avoid hitting the storage for every thumbnail creation; it is fast for some storage implementations and slow for others.

cc: @relekang
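
A rough sketch of the decision being proposed, assuming the key-value cache is consulted first and the storage is only touched on a cache miss. This is an illustration with hypothetical names (`get_thumbnail_name`, `CountingStorage`), not the code from sorl-thumbnail or from any branch mentioned in this thread.

```python
class CountingStorage:
    """Hypothetical storage that records how often exists() is called."""

    def __init__(self):
        self.files = set()
        self.exists_calls = 0

    def exists(self, name):
        self.exists_calls += 1
        return name in self.files

    def save(self, name):
        self.files.add(name)


def get_thumbnail_name(name, kvstore, storage, force_overwrite=False):
    """Return (name, created); kvstore is a plain set standing in for the cache."""
    if name in kvstore:
        # Cache hit: the storage is not touched at all.
        return name, False

    created = False
    # With force_overwrite=True the `or` short-circuits,
    # so the potentially slow exists() call never happens.
    if force_overwrite or not storage.exists(name):
        storage.save(name)
        created = True

    kvstore.add(name)
    return name, created


default_storage = CountingStorage()
get_thumbnail_name("a.jpg", set(), default_storage)

forced_storage = CountingStorage()
get_thumbnail_name("a.jpg", set(), forced_storage, force_overwrite=True)

print(default_storage.exists_calls, forced_storage.exists_calls)
```

The trade-off in this sketch is that a cache miss with the flag enabled always regenerates and re-uploads the thumbnail, even if an identical file is already in storage.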

@mariocesar
Collaborator

@camilonova A quote from #92 about this issue:

"exists() is extremely slow when using S3boto as a backend, so I've make the assumption that if the thumbnail is not cached, it doesn't exist" I am not going to merge that, better use a custom backend if you like - or add option not to check or something. --sorl

Would you still want to provide a solution in the sorl-thumbnail code base, or solve it in your storage implementation?

@camilonova
Member

@mariocesar I'm sure we can set it to overwrite the thumbnail if it does not exist in the cache, only when we are using S3.

If you agree it's a good idea, let me know where and I'll work on a PR.

@mariocesar
Collaborator

I already started this at https://github.com/mariocesar/sorl-thumbnail/tree/features/force_overwrite; please test it and see how it can be improved.

It would be great to have tests for this.

Here is the commit that introduces the new setting: eb569ad

@camilonova
Member

Looks nice. The best default value for the new setting would be True when S3 is in use; that would make it easier for people who don't read the manual.

@camilonova
Member

It would also be nice to have a test where we can make sure (for future changes) that this stays the behavior from now on.
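
One possible shape for such a test, sketched with `unittest.mock` against a hypothetical helper (`ensure_thumbnail` is made up here and only mirrors the check discussed in this thread; a real test would call the library's own code path instead).

```python
import unittest
from unittest import mock


def ensure_thumbnail(storage, name, force_overwrite):
    # Hypothetical stand-in for the "create if needed" step.
    if force_overwrite or not storage.exists(name):
        storage.save(name, b"thumbnail-bytes")


class ForceOverwriteTest(unittest.TestCase):
    def test_exists_is_skipped_when_forced(self):
        storage = mock.Mock()
        ensure_thumbnail(storage, "cache/thumb.jpg", force_overwrite=True)
        # The slow call must not happen at all.
        storage.exists.assert_not_called()
        storage.save.assert_called_once_with("cache/thumb.jpg", b"thumbnail-bytes")

    def test_exists_is_checked_by_default(self):
        storage = mock.Mock()
        storage.exists.return_value = True
        ensure_thumbnail(storage, "cache/thumb.jpg", force_overwrite=False)
        storage.exists.assert_called_once_with("cache/thumb.jpg")
        # An existing thumbnail is left alone.
        storage.save.assert_not_called()
```

Asserting on the mock's `exists` attribute pins down the behavior that matters for S3: the number of round trips, not just the final result.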

@camilonova
Member

@mariocesar ping

@a115

a115 commented Jul 6, 2015

Is a fix for this planned to be released anytime soon?

mariocesar added a commit that referenced this issue Jul 8, 2015
@mariocesar
Collaborator

There is already THUMBNAIL_FORCE_OVERWRITE. It is not smart to set it to True by default, as it's a fix very specific to S3, but I added a special note in the docs about it.

@bartels

bartels commented Aug 11, 2015

Hi @mariocesar

I see the README mentions the THUMBNAIL_FORCE_OVERWRITE setting, but I can't find it anywhere in the source code. Was it removed?

@mariocesar
Collaborator

@bartels you are right, give me some time and I will dig into this.

@mariocesar mariocesar reopened this Aug 11, 2015
@variable

Ping, just curious what the status of this issue is?

@haakenlid
Contributor

I've created a pull request to reintroduce the THUMBNAIL_FORCE_OVERWRITE commit, since it seems to have unintentionally vanished from the repo.

@mariocesar
Collaborator

Thanks @haakenlid !

@marco-lavagnino

What if, instead of using storage.exists(), you used storage.open() and caught the error if the file is not found?
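
A small sketch of that idea with a toy storage. `ToyStorage` and `read_or_create` are hypothetical, and the exception a real backend raises for a missing file is backend-specific, so `FileNotFoundError` is an assumption here.

```python
import io


class ToyStorage:
    """Hypothetical in-memory storage with open() and save() only."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def open(self, name):
        try:
            return io.BytesIO(self.files[name])
        except KeyError:
            # Assumed behavior: a missing file surfaces as FileNotFoundError.
            raise FileNotFoundError(name)

    def save(self, name, content):
        self.files[name] = content


def read_or_create(storage, name, make_content):
    """Try to open first; create the file only when opening fails."""
    try:
        with storage.open(name) as handle:
            return handle.read(), False
    except FileNotFoundError:
        content = make_content()
        storage.save(name, content)
        return content, True


storage = ToyStorage({"existing.jpg": b"old"})
hit = read_or_create(storage, "existing.jpg", lambda: b"new")
miss = read_or_create(storage, "missing.jpg", lambda: b"new")
print(hit, miss)
```

One caveat with this approach: on a remote backend, opening a file may transfer its contents, which can cost more than a metadata-only existence check.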

@iddqd1

iddqd1 commented Jun 5, 2019

It is slow with the latest boto 2 and sorl. It makes a lot of exists() calls even when THUMBNAIL_FORCE_OVERWRITE is True.

@pfcodes

pfcodes commented Oct 3, 2022

I see that THUMBNAIL_FORCE_OVERWRITE is supposed to be the solution to this. But what are the downsides and/or side effects of setting this variable to True?

@sowinski

Hi,

I use version 12.9.0 with S3 and THUMBNAIL_FORCE_OVERWRITE=True and performance is terrible. It takes minutes to load a page.

How can I debug the problem? I do not understand what is taking so long.
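
One generic way to see where the time goes is to wrap the storage object in a proxy that counts and times every method call. The sketch below is a hypothetical helper (`TimedStorage` is not part of sorl-thumbnail, Django, or boto), and `SlowStorage` only simulates a slow backend; how to plug such a wrapper into a real project depends on the storage configuration.

```python
import time
from collections import defaultdict


class TimedStorage:
    """Hypothetical proxy: records call count and total seconds per method."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self.stats = defaultdict(lambda: [0, 0.0])  # method name -> [calls, seconds]

    def __getattr__(self, attr):
        # Only reached for attributes not defined on the proxy itself.
        target = getattr(self._wrapped, attr)
        if not callable(target):
            return target

        def timed(*args, **kwargs):
            start = time.perf_counter()
            try:
                return target(*args, **kwargs)
            finally:
                entry = self.stats[attr]
                entry[0] += 1
                entry[1] += time.perf_counter() - start

        return timed


class SlowStorage:
    """Simulated backend where every exists() call is slow."""

    def exists(self, name):
        time.sleep(0.01)
        return False


storage = TimedStorage(SlowStorage())
for i in range(3):
    storage.exists("thumb-%d.jpg" % i)

calls, seconds = storage.stats["exists"]
print("exists: %d calls, %.3f s" % (calls, seconds))
```

If the counts show many `exists` calls per page even with THUMBNAIL_FORCE_OVERWRITE enabled, that would point at a different code path than the one discussed at the top of this issue.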
