Mitigate DDoS attempts in which requests to fetch create large numbers of resized/cropped images, potentially causing very high CPU load and filling up disk space.
Currently the code includes a hard-coded limit; I set it at 20. Possibly this should be an advanced config setting or a defined constant.
There is one issue with this implementation: if one page requires multiple copies of the same resized image, then on the first view of that page, while the resizing takes place, the second and subsequent fetch requests may return the placeholder cache file before it's replaced with an actual image. Currently, those multiple copy requests would all resize the image, each generating the same result.
A placeholder cache file is necessary to avoid problems with large numbers of essentially simultaneous requests combined with the slow image resize/crop process. Without it, the version-count check could pass while enough resizes are already in progress to push the total above the limit.
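The race can be closed by creating the placeholder atomically, so that exactly one concurrent request wins and performs the resize. A minimal sketch of the idea (Python rather than the actual fetch code; `acquire_resize_slot` is a hypothetical helper, not a DokuWiki function):

```python
import errno
import os


def acquire_resize_slot(cache_path: str) -> bool:
    """Atomically create the placeholder cache file.

    Returns True if this request won the race and should perform the
    resize, False if the placeholder already exists (another request is
    resizing, or has resized, this version).
    """
    try:
        # O_EXCL makes creation fail if the file already exists, so only
        # one of many simultaneous requests gets True.
        fd = os.open(cache_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except OSError as e:
        if e.errno == errno.EEXIST:
            return False
        raise
```

The losing requests can then either wait for the real file or serve the placeholder, which is exactly the behaviour described above.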
avoid creating an intermediate crop image when the final image will match a resize
limit the number of resize/crop versions that can be created from one…
As far as I remember, there are performance issues when glob is used in large directories on certain file systems like NFS (the readdircache configuration option was introduced for this reason). Maybe we could instead maintain a list of all versions that exist of a certain image file? This list could also be used to serve the next larger size (instead of the original image) when no new cached versions can be created. I'm not really a fan of limiting the number of resized versions of media files as long as we don't have any automatic cleanup of the cache.
Concerning the "placeholder" cache image: won't this create problems if the browser caches the empty file, since the resized image could have the same modification time? And I think that when the resize operation fails, the empty cache file will stay and be served until it expires.
To avoid DDoS attacks, I would rather suggest including in the request URL for resized or cropped images the same hash that is also used for external files (and the hash should include the resize parameters). This prevents the problem as long as the attacker doesn't have edit permissions. If the attacker does have edit permissions, there are other ways to create high CPU load (like "previewing" very large pages with large tables etc.) and to fill the disk space by creating a huge number of pages.
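A minimal sketch of that token idea, assuming hypothetical names (`media_token`, and a `SECRET` standing in for a per-wiki secret such as the auth salt); DokuWiki's actual token derivation may differ:

```python
import hashlib
import hmac

# Hypothetical per-wiki secret; in DokuWiki this would come from the
# wiki's own secret/salt, not a hard-coded string.
SECRET = "per-wiki-secret"


def media_token(media_id: str, width: int, height: int) -> str:
    """Derive a short token bound to the media id AND the resize
    parameters, so changing the resolution invalidates the token."""
    msg = f"{media_id}:{width}:{height}".encode()
    return hmac.new(SECRET.encode(), msg, hashlib.md5).hexdigest()[:6]


def check_token(token: str, media_id: str, width: int, height: int) -> bool:
    """Constant-time comparison of the URL token against the expected one."""
    return hmac.compare_digest(token, media_token(media_id, width, height))
```

Since only the wiki itself can compute valid tokens, an external attacker cannot request arbitrary new resolutions; every resize URL it can use was already emitted by a rendered page.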
My instinct (no hard evidence) is that the image resize would still require more time than the glob. Plus, while it might be sensible to store pages & media on NFS, I don't see why anyone would put the cache there; a memdisk maybe, but not a remote disk.
Aren't large tables an issue for the client rather than the server? But point taken, I'd use lots of syntax highlighting.
I do like the idea of keeping the versions somewhere - real metadata for media, but that's something for another version. I'll look at hashing the parameters.
The cache can become relatively large in larger wikis, so you would need to implement a custom cleanup routine; I'm not sure a memdisk is really practical for the cache. Concerning large tables, I was referring to FS#2004, though I'm not sure if high memory usage equals high CPU load in this case.
Maybe @YoBoY can tell us more about the performance of glob and if this is an issue in the cache in his setup. We had more usages of glob in the past, for example for finding all meta files during page deletion, see FS#2301.
[Edit] Even if the resize should be faster than the glob, we should remember that, at least in the current code, this glob can be triggered simply by specifying another image resolution, which means the DDoS attack would then target not the resize code but glob.
Last time I checked (when the glob function was introduced) it was bad, and I'm still using my hack to avoid it.
[Edit] And we don't use NFS anymore…
There are still two glob() calls elsewhere in DokuWiki, one in SimplePie and the other in pageutils.
YoBoY, do you have your cache on a remote disk?
michitux, my point is that a cache needs to be closer/faster, there is little point in making it slower/further away.
Anyway, are we comfortable with a token, which would remove the possibility of external attacks on fetch, or should we aim to have some limit on the number of resize/crop versions of an image, with a better solution than glob?
As YoBoY has added to his comment, they don't use NFS anymore. The glob() in inc/pageutils.php is no longer reached, as metaFiles() isn't used anymore, see c5f9274. The one in SimplePie only runs over a small list of (distributed) files, so it shouldn't be a problem.
A suggestion of YoBoY in the irc channel was to use a set of predefined image sizes. We already have predefined image sizes in the media manager. So maybe we could simply create a list of valid sizes (that can be extended/disabled in the configuration) and serve the next matching size for all other sizes?
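That scheme could look something like the following sketch (`ALLOWED_WIDTHS` and `snap_width` are hypothetical names; the real list would come from the configuration / media manager presets):

```python
# Hypothetical configurable whitelist of resize widths.
ALLOWED_WIDTHS = [160, 320, 640, 1280]


def snap_width(requested: int, allowed=ALLOWED_WIDTHS) -> int:
    """Serve the next matching (equal or larger) predefined size.

    Requests larger than every predefined size fall back to the largest
    one, so the number of cached versions per image is bounded by
    len(allowed) regardless of what the attacker requests.
    """
    for w in sorted(allowed):
        if requested <= w:
            return w
    return max(allowed)
```

This caps the cache at a handful of versions per image without needing glob, a list file, or a version counter at all.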
add a token to fetch urls requiring image resize/crop to prevent external DDOS via fetch
I don't much like the idea of preset image sizes.
A workaround for glob would be to use a file, say cache/hash.resize.txt or mediaMetaFN($id, 'resizes'), to list the existing resizes. I lean towards the cache file, as any cache cleanup is likely to remove both the resized images and the resize list.
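A rough sketch of such a list file, assuming a hypothetical `record_resize` helper and the hard-coded limit mentioned earlier (the real code would be PHP and would need locking around the read/append):

```python
import os


def record_resize(list_path: str, cache_name: str, limit: int = 20) -> bool:
    """Record a resized version in the list file instead of globbing.

    Returns True if the version is already listed or was appended,
    False if the per-image version limit has been reached.
    """
    versions = []
    if os.path.exists(list_path):
        with open(list_path) as f:
            versions = [line.strip() for line in f if line.strip()]
    if cache_name in versions:
        return True          # already known, no new resize needed
    if len(versions) >= limit:
        return False         # limit reached, refuse new version
    with open(list_path, "a") as f:
        f.write(cache_name + "\n")
    return True
```

Counting is then one small file read rather than a directory scan, and the list doubles as the index for serving the next larger existing size.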
use a version list file rather than glob
when version limit is reached, expel stale versions
change version limit back to a reasonable production value (4 was for…
I've created PR#203 with just the token code, to separate it from all the version-limiting source.
@Chris--S could you merge master into this branch, so we can see the differences from what is currently in place more easily?