Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Prevent large images from repeatedly crashing PHP on resize #2753

Closed
wants to merge 1 commit into from

Conversation

kinglozzer
Copy link
Member

This is a (late) follow up to the discussion here about GD crashing when attempting to open large or high DPI images.

The Google Group discussion involved ideas around “image unavailable” icons in the CMS, this setup (by storing which manipulation failed) will hopefully pave the way for that functionality to be added later, while addressing the immediate issue of completely crashing entire sections of websites.

Implements the same method for checking available memory as #2569, though it’s different to the method Ingo suggested. I’m not really sure which is more robust, the method in the php.net comments doesn’t take into account that bits and channels aren’t always present in the image info.

Marked as an API change as I’ve added a few methods to the Image_Backend interface. Have only tested the basic functionality so far, unit tests etc are still todo. Perhaps this would be better opened against master and delayed until 3.2?

Doesn’t cause any issues with the CMS, the images just don’t appear for now:

screen shot 2014-01-02 at 16 15 27
screen shot 2014-01-02 at 16 17 23

All comments welcome.

@jakr
Copy link
Contributor

jakr commented Jan 6, 2014

Hi, thanks for keeping up with this. I actually developed a slightly different solution the last time I played with it, but never got around to creating a pull: jakr@5913ad0

I used a slightly different method for cleaning up. Since we are not really deleting the ImageBackend, I thought we did not have to introduce onBeforeDelete and could just directly pass the filename information to the backend, see cleanupBeforeDelete.

Feel free to mix and match the approaches.

@kinglozzer
Copy link
Member Author

Hi @jakr,

Our solutions are quite similar, I was simply matching the existing naming convention (for onBeforeDelete() vs cleanupBeforeDelete()). I think I prefer 'onBeforeDelete', simply because it’s less specific - other libraries might not ‘clean up’ as such.

Thanks for the work on the memory calculations, I copied that from your original PR :)

@Zauberfisch
Copy link
Contributor

looks very good.

I know, you wrote it's a quick fix, but may I still make 1 feature request:
display some sort of message to the admin that this image is to big, and therefore will not be resized.
(this might be important for a content author, lets say he uploads an 10MB image to the startpage of his website, assuming that the website scales it down to 100kb. the fact that it does not scale down, is going to be a big performance issue in this case)

@kinglozzer
Copy link
Member Author

@Zauberfisch For that example it wouldn’t resample the image, so it wouldn’t display it at all - rather than showing the full size image, it simply won’t show one.

A message of some kind in the CMS is the eventual goal, I might look at addressing that in a separate pull request if I get time. I’ve been trying to think of how best to achieve it without coupling stuff to GD. There are a few different places this will need to be addressed, off the top of my head these include:

  • UploadField
  • AssetAdmin
  • The “Insert Media” dialog (GridField?)

I don’t know how we can manage to display a useful error message (i.e. “Image is too large to be resized, please try a smaller image” instead of “Image unavailable”). Just running ideas from the top of my head here, but perhaps something like:

  • UploadField/GridField/AssetAdmin attempt to get a thumbnail.
  • If the resample call returns false, the image failed - show the new ‘unavailable’ icon.
  • We then call an Image::$backend->diagnose_failure() method on the image to retrieve an error message which could then be displayed as a title attribute or tooltip or something:
public static function diagnose_failure($filename) {
        if ( ! $this->checkAvailableMemory($filename)) {
                return "Image is too large to be resampled. Please try a smaller image";
        } else if($this->failedResample($filename)) {
                return "Image failed to resample - it may be corrupt or too large";
        } else {
                return "Image failed to resample - an unknown error occurred";
        }
}

@kinglozzer
Copy link
Member Author

Updated to avoid problems with image manipulations causing fatal errors after the image has been loaded into memory. I’ve adjusted the ‘lock file’ logic to use a destructor to remove the file: if all is well __destruct() is called and the lock file is removed, but if there’s a fatal error __destruct() is never called and it’ll be spotted next time around.

In some situations, the image would pass memory checks and be loaded into GDBackend fine, but then cause a fatal error when the actual manipulation took place. An extreme example would be $Image.SetWidth(1500000) - if the image is a reasonable size, it would load fine, but then fail on modification.

Thanks to @Zauberfisch for discovering and helping with this.

@jonom
Copy link
Contributor

jonom commented Jan 22, 2014

Hi Loz,

Thanks for working on this. Maybe you already thought of this but may be good to include a Dev task "Remove all image lock files". One case where this could be useful is if the memory limit has been increased so it may be worth having a second attempt at resampling any files which failed on the previous memory limit.

@kinglozzer
Copy link
Member Author

Hi @jonom,

No I’ve not looked at that yet, I was thinking of perhaps integrating it with the FlushGeneratedImagesTask but it’d probably make more sense as its own task.

One thing I’ve thought of is that the lock file names aren’t “modification-specific” - if $Image.SetWidth(100000) fails, the lock file will be left there and $Image.SetWidth(10) will not run, when it might actually be able to run fine. I’m trying to think of a way of letting each method write its own lock file (e.g. .lock.croppedresize300500.my-image.jpg) without repeating the same code snippets inside each modification method.

Perhaps we could pass the cache file name returned by Image->cacheFilename() as a second argument to GD->__construct(), and use that when generating the lock file name. We’d need more logic for the GD->onBeforeDelete() to find each of the resampled lock file names, but I’m sure it’s possible as most of the logic already exists in Image->deleteFormattedImages().

@jonom
Copy link
Contributor

jonom commented Jan 22, 2014

Modification-specific lock files would certainly be handy for tracing the source of any crashes - especially if there was a report that listed all the lock files to be found in the assets folder.

But if it's a lot of work I wonder if it would be worth it. Whenever I've had a GD memory crash in SilverStripe, the cause was always a too-high resolution image that should have been down-sampled before uploading. So I would say the core of the problem is identifying those problematic images so that they can be replaced or removed. In that case a single lock file and a simple "This image may cause problems" indicator in the CMS should get the job done most of the time.

@jakr
Copy link
Contributor

jakr commented Jan 24, 2014

Please don't create any special logic for crashes on images that are just slightly too large and only crash in a single resolution.

  1. These "anti-goldilock" files are so rare that it is not worth creating special logic for them. And how often do you think an editor will say: "My 4000_4000px image crasehed when rescaling to 2000_2000, ok, lets just use the thumbnail size" - to me this makes no sense.
  2. The nasty crashes are those in the backend, when no thumbnail can be created and you have no means of deleting the image again.
    I imagine the "usual" workflow will be: Editor loads up large image, sees it's "broken" in backend, deletes (which deletes the lockfile), rescales, reuploads.

I hope that no site will have hundreds of lockfiles lying about, so I think that even the admin task will be overkill, but if you want to write it, go ahead.

@ajshort
Copy link
Contributor

ajshort commented Jan 24, 2014

Is there a reason to use lock files rather than the caching functionality? You wouldn't really want lock files to get transferred around with your assets.

@Zauberfisch
Copy link
Contributor

@jakr I disagree, I have a site that went live last week that had exactly that problem.

the image itself did not crash. 1 re size did not crash it either, but performing a 2nd operation on that image resulted in a crash for about 3 out of 100 images.
and on a site that is designed for displaying user submitted images in a gallery, this is a huge deal.
because when you display 25 images per page, that would mean that 3 out of 4 pages show you a php fatal error.

@kinglozzer
Copy link
Member Author

@ajshort This was just the functionality discussed on the Google Group, but I’d welcome other suggestions 😃. Can I ask what “caching functionality” refers to? Do you mean using Image->cacheFilename() and/or storing the lock files in the _resampled folder, or something else entirely?

@jakr / @Zauberfisch I agree that hundreds of lock files left over would suck (perhaps less so if they’re stored in the _resampled folder), but surely it’s better than an image is only broken in one place (e.g. wherever $SetWidth(4000,4000) was used) than everywhere that image is used?

@ajshort
Copy link
Contributor

ajshort commented Jan 24, 2014

Check out SS_Cache - using this would also make it easier to clear lock flags.

@kinglozzer
Copy link
Member Author

@ajshort Ah I see, that’s a neat idea. Just to make sure I’m understanding fully, we’d have something like:

  • A cache named ‘Images’/‘GD’
  • -1 lifetime
  • Each image’s file path as a cache key (possibly with a serialized array of manipulations?)

I guess we could also clear the cache on flush then as well.

@Zauberfisch
Copy link
Contributor

I wouldn't clear it on flush, I'd like a separated task better. You often have to flush (new files, changed statics, ...), but I might not always want to clear the image locks

@ajshort
Copy link
Contributor

ajshort commented Jan 24, 2014

Yep - I'd be tempted to hash together the file's name, mtime, and any arguments and just use it as a simple lookup table.

@kinglozzer
Copy link
Member Author

I've had very little time to look at this lately, so I've pushed up what little work I've done on this for comments on the implementation.

On my todo list:

  • Check that setting the cache lifetime to null actually results in a permanent cache; according to Zend's docs it does.
  • Fix up the onBeforeDelete() behaviour to remove the cache for the file.
  • Fix up/improve remaining unit tests
  • Dev task for cache cleaning

@kinglozzer
Copy link
Member Author

Okay, reworked to use SS_Cache instead of lock files. Data is stored in the cache like so:

  • Cache key = md5(implode('_', array($filePath, filemtime($filePath))))
  • Each cache key’s value is a serialized array of manipulations that failed, arguments pipe separated. E.g: CroppedImage|112|44

Hopefully, if I/someone ever gets round to it, that’ll allow the method/arguments to be extracted from the cache to provide devs/CMS users with info as to why the image failed to resample.

Add unit tests, fix incorrect class instanciation

Prevent image manipulation methods from causing repeat GD failures

Use destructor to handle lock file removal

Remove attempts to call parent destructor - there isn't one

First draft of using SS_Cache API

Updated unit tests, minor fixes/tweaks

Updated onBeforeDelete logic and unit tests

Add dev task for clearing cache of image manipulations

Store manipulation method/arguments in more accessible way in cache

API: Prevent large images from repeatedly crashing PHP on resize
@kinglozzer
Copy link
Member Author

Re-opened against master as I feel this is too big a change for 3.1: #2859.

@kinglozzer kinglozzer closed this Feb 17, 2014
@@ -97,6 +97,16 @@ public function hasImageResource() {
}

/**
* @todo Implement memory checking/lock file for Imagick?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, Imagick doesn't seems to have the same memory limitations for resizing.

@jedateach
Copy link
Contributor

Shouldn't the Image_Backend interface constructor definition get updated also?

@kinglozzer
Copy link
Member Author

Hey @jedateach,

This behaviour was merged into master instead in #2859. I did miss out Image_Backend::__construct() in that pull request too, though 😞, so it probably should be fixed. Imagick_Backend::__construct() is also missing the extra argument. I’ll create an issue for it.

@kinglozzer kinglozzer deleted the gd-resize-crash branch November 19, 2014 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants