Proposal for limiting the number of crop/resize versions which can be created from one image #202

Closed
wants to merge 6 commits into from

4 participants

@Chris--S
Collaborator

Mitigate DDoS attempts in which repeated fetch requests create large numbers of resized/cropped images, potentially causing very high CPU load and filling up disk space.

Currently the code includes a hard-coded limit, which I set at 20. Possibly this should be an advanced config setting or a defined constant.

There is one issue with this implementation: if one page requires multiple copies of the same resized image, the first view of that page (when the image resizing takes place) may result in the second and subsequent fetch requests returning the placeholder cache file before it is replaced with an actual image. Currently, multiple requests for the same copy would each resize the image, generating the same result once per request.

A placeholder cache file is necessary to avoid problems with large numbers of essentially simultaneous requests combined with the slow image resize/crop process. Without it, each request's version-count check could pass while the count is still below the limit, even though enough resizes are already in progress to push the total above the limit.
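The reservation idea described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name and file layout are hypothetical, and a plain `flock()` stands in for DokuWiki's `io_lock()`/`io_unlock()` helpers.

```php
<?php
// Under an exclusive lock, count the versions already listed for this source
// image and, if below the limit, touch() an empty placeholder so that
// concurrent requests see the slot as already claimed before the slow
// resize/crop actually runs.
function reserve_version(string $listFile, string $cacheFile, int $limit): bool {
    $fh = fopen($listFile, 'c+');   // create if missing, don't truncate
    flock($fh, LOCK_EX);

    $lines    = @file($listFile) ?: [];
    $versions = array_values(array_filter(array_map('trim', $lines)));

    $ok = false;
    if (count($versions) < $limit) {
        touch($cacheFile);                        // empty placeholder claims the slot
        $versions[] = $cacheFile . ' ' . time();  // list line: {path} {timestamp}
        file_put_contents($listFile, implode("\n", $versions) . "\n");
        $ok = true;
    }

    flock($fh, LOCK_UN);
    fclose($fh);
    return $ok;
}
```

Requests that fail to reserve would fall back to serving the original image, as the diff below does.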

@michitux
Collaborator

As far as I remember there are performance issues when glob is used in large directories on certain file systems like NFS (for this reason the readdircache configuration option was introduced). Maybe we could instead keep a list of all versions that exist of a certain image file? This list could also be used to serve the next larger size (instead of the original image) when no new cached versions can be created. I'm not really a fan of limiting the number of resized versions of media files as long as we don't have any automatic cleanup of the cache.

Concerning the "placeholder" cache image: won't this create problems if the browser caches the empty file, since the resized image could end up with the same modification time? And I think when the resize operation fails, the empty cache file will stay and be used until it expires.

To avoid DDoS attacks I would rather suggest including, in the request URL for resized or cropped images, the same hash that is already used for external files (and the hash should cover the resize parameters). This prevents the problem if the attacker doesn't have edit permissions. If the attacker does have edit permissions, there are other ways to create high CPU load (like "previewing" very large pages with large tables etc.) and to fill the disk by creating a huge number of pages.
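The token suggestion can be sketched as a short hash over a server-side secret plus the media id and resize parameters. A sketch under stated assumptions: `$secret` stands in for DokuWiki's `auth_cookiesalt()`, and these function names are illustrative, not the PR's actual API.

```php
<?php
// A resize request is only honoured when it carries a token that could only
// have been produced server-side, so external callers cannot mint arbitrary
// width/height combinations.
function resize_token(string $secret, string $id, int $w, int $h): string {
    return substr(md5($secret . $id . '.' . $w . '.' . $h), 0, 6);
}

// fetch.php would recompute the token and refuse the resize on mismatch.
function token_valid(string $secret, string $id, int $w, int $h, string $tok): bool {
    return hash_equals(resize_token($secret, $id, $w, $h), $tok);
}
```

Because the secret never leaves the server, an attacker without edit permissions cannot forge tokens for sizes the wiki itself never linked.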

@Chris--S
Collaborator

My instinct (no hard evidence) is that the image resize would still require more time than the glob. Plus, while it might be effective to store pages & media on NFS, I don't see why anyone would put the cache there; a memdisk maybe, but not a remote disk.

Aren't large tables an issue for the client rather than the server? But point taken; I'd use lots of syntax highlighting.

I do like the idea of keeping the versions somewhere (real metadata for media), but that's something for another release. I'll look at hashing the parameters.

@michitux
Collaborator

The cache can become relatively large in larger wikis, so you would need to implement a custom cleanup routine; I'm therefore not sure a memdisk is really used for the cache. Concerning large tables, I was referring to FS#2004, though I'm not sure if high memory usage equals high CPU load in this case.

Maybe @YoBoY can tell us more about the performance of glob and if this is an issue in the cache in his setup. We had more usages of glob in the past, for example for finding all meta files during page deletion, see FS#2301.

[Edit] Even if the glob should be faster than the resize, we should remember that, at least in the current code, this glob can be triggered by simply specifying another image resolution, which means the DDoS attack would then target not the resize code but the glob.

@YoBoY

Last time I checked (when the glob function was introduced) it was bad, and I'm still using my hack to avoid it.

[Edit] And we don't use NFS anymore…

@Chris--S
Collaborator

There are still two glob() uses elsewhere in DokuWiki, one in SimplePie and the other in pageutils.

YoBoY, do you have your cache on a remote disk?
michitux, my point is that a cache needs to be closer/faster; there is little point in making it slower/further away.

Anyway, are we comfortable with a token, which would remove the possibility of external attacks on fetch, or should we aim for some limit on the number of resize/crop versions of an image, but with a better solution than glob?

@michitux
Collaborator

As YoBoY has added to his comment, they don't use NFS anymore. The glob() in inc/pageutils.php is dead code now, as metaFiles() is no longer called, see c5f9274. The one in SimplePie only runs over a small list of (distributed) files, so this shouldn't be a problem.

A suggestion from YoBoY in the IRC channel was to use a set of predefined image sizes. We already have predefined image sizes in the media manager. So maybe we could simply create a list of valid sizes (which could be extended/disabled in the configuration) and serve the next matching size for all other sizes?
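The predefined-sizes idea could look something like the sketch below: snap any requested width to the next entry in a whitelist, so arbitrary `?w=` values can only ever produce a bounded set of cached files. The `$allowed` list is an assumption here; in practice it could be seeded from the media manager's sizes and the configuration.

```php
<?php
// Return the next size >= the requested width from the whitelist, capping at
// the largest allowed size. With a fixed whitelist, the number of distinct
// cached versions per image is bounded regardless of what the URL asks for.
function snap_width(int $requested, array $allowed): int {
    sort($allowed);
    foreach ($allowed as $w) {
        if ($w >= $requested) {
            return $w;          // next larger (or equal) allowed size
        }
    }
    return end($allowed);       // larger than everything: serve the max size
}
```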

@Chris--S
Collaborator

I don't much like the idea of preset image sizes.

A workaround for glob would be to use a file, say cache/hash.resize.txt or mediaMetaFN($id, 'resizes'), to list the existing resizes. I lean towards the cache file, as any cache cleanup is likely to remove both the resized images and the resize list.

@Chris--S
Collaborator

I've created PR#203 with just the token code, to separate it from all the version-limiting source:
98da068

@splitbrain
Owner

@Chris--S could you merge master into this branch, so we can see the differences to what is in place currently easier?

@Chris--S Chris--S closed this
@splitbrain splitbrain deleted the fetchissues2 branch
Showing with 157 additions and 4 deletions.
  1. +4 −0 inc/common.php
  2. +147 −2 inc/media.php
  3. +6 −2 lib/exe/fetch.php
4 inc/common.php
@@ -436,6 +436,10 @@ function exportlink($id = '', $format = 'raw', $more = '', $abs = false, $sep =
function ml($id = '', $more = '', $direct = true, $sep = '&', $abs = false) {
global $conf;
if(is_array($more)) {
+ // add token for resized images
+ if($more['w'] || $more['h']){
+ $more['tok'] = media_get_token($id,$more['w'],$more['h']);
+ }
// strip defaults for shorter URLs
if(isset($more['cache']) && $more['cache'] == 'cache') unset($more['cache']);
if(!$more['w']) unset($more['w']);
149 inc/media.php
@@ -1796,9 +1796,15 @@ function media_resize_image($file, $ext, $w, $h=0){
if($w > 2000 || $h > 2000) return $file;
//cache
- $local = getCacheName($file,'.media.'.$w.'x'.$h.'.'.$ext);
+ $basename = getCacheName($file);
+ $local = $basename.'.media.'.$w.'x'.$h.'.'.$ext;
$mtime = @filemtime($local); // 0 if not exists
+ if (!$mtime && !media_reserve_version($basename,$local)){
+ // unable to reserve a file for a new version, return the original image
+ return $file;
+ }
+
if( $mtime > filemtime($file) ||
media_resize_imageIM($ext,$file,$info[0],$info[1],$local,$w,$h) ||
media_resize_imageGD($ext,$file,$info[0],$info[1],$local,$w,$h) ){
@@ -1828,6 +1834,13 @@ function media_crop_image($file, $ext, $w, $h=0){
// calculate crop size
$fr = $info[0]/$info[1];
$tr = $w/$h;
+
+ // check if the crop can be handled completely by resize,
+ // i.e. the specified width & height match the aspect ratio of the source image
+ if ($w == round($h*$fr)) {
+ return media_resize_image($file, $ext, $w);
+ }
+
if($tr >= 1){
if($tr > $fr){
$cw = $info[0];
@@ -1850,9 +1863,15 @@ function media_crop_image($file, $ext, $w, $h=0){
$cy = (int) (($info[1]-$ch)/3);
//cache
- $local = getCacheName($file,'.media.'.$cw.'x'.$ch.'.crop.'.$ext);
+ $basename = getCacheName($file);
+ $local = $basename.'.media.'.$cw.'x'.$ch.'.crop.'.$ext;
$mtime = @filemtime($local); // 0 if not exists
+ if (!$mtime && !media_reserve_version($basename,$local)){
+ // unable to reserve a file for a new version, return the original image
+ return $file;
+ }
+
if( $mtime > @filemtime($file) ||
media_crop_imageIM($ext,$file,$info[0],$info[1],$local,$cw,$ch,$cx,$cy) ||
media_resize_imageGD($ext,$file,$cw,$ch,$local,$cw,$ch,$cx,$cy) ){
@@ -1865,6 +1884,132 @@ function media_crop_image($file, $ext, $w, $h=0){
}
/**
+ * Reserve a cache file for a new version of an image
+ *
+ * It's necessary to reserve, using touch(), a placeholder for the new
+ * resize/crop version before the resize/crop operation, to ensure
+ * consistency with any other nearly simultaneous fetch requests for
+ * resizes/crops of the same source image.
+ *
+ * @param string $basename base name of the cache file used for the image versions (a hash on the source image)
+ * @param string $version complete path to the cachefile to be used for this version
+ * @return bool true if the cachefile could be reserved
+ *
+ * @author Christopher Smith <chris@jalakai.co.uk>
+ */
+define('MEDIA_VERSION_LIMIT', 20);
+define('MEDIA_VERSION_LIST_EXT', '.versions');
+
+function media_reserve_version($basename,$version){
+ global $conf;
+
+ $version_list = $basename.MEDIA_VERSION_LIST_EXT; // name of the file containing the current list of versions of the image
+ $this_version = $version.' '.time()."\n"; // version list line format: {versionfilepath} {timestamp}
+ $can_reserve = false;
+
+ io_lock($version_list);
+ $version_list_changed = false;
+
+ $versions = @file($version_list);
+ if (!is_array($versions)) $versions = array();
+ $version_count = count($versions);
+
+ if ($version_count >= MEDIA_VERSION_LIMIT){
+ $stale = time() - max($conf['cachetime'],3600);
+
+ foreach ($versions as $i => $line){
+ list($cachefile, $timestamp) = preg_split('/ (?=[^ ]*$)/',trim($line),2); // split at last space
+
+ // test stale - assume non-stale files exist to avoid lots of unnecessary file accesses
+ // (the version list is in the cache, so emptying the cache will remove it, and
+ // presumably anything more specific will only remove cachefiles older than cachetime)
+ if ($stale < $timestamp) continue;
+
+ // version file says "stale", re-check against the actual file
+ $mtime = @filemtime($cachefile);
+ if ($stale < $mtime) continue;
+
+ // remove the cachefile, if it exists (mtime not false)
+ if ($mtime === false || @media_unlink_version($cachefile)){
+ unset($versions[$i]);
+ $version_list_changed = true;
+ --$version_count;
+ }
+
+ // only do the minimum necessary, stale files could still be valid and useful
+ if ($version_count < MEDIA_VERSION_LIMIT) break;
+ }
+ }
+
+ if ($version_count < MEDIA_VERSION_LIMIT) {
+ $can_reserve = true;
+ touch($version);
+
+ $versions[] = $this_version;
+ $version_list_changed = true;
+ }
+
+ if ($version_list_changed) {
+ file_put_contents($version_list, join('',$versions));
+ }
+
+ io_unlock($version_list);
+ return $can_reserve;
+}
+
+/**
+ * delete a resized/cropped image version
+ * and for crops, look for and delete any derived resize versions and their version list
+ *
+ * @param string $version path to version file to be deleted
+ * @return bool success deleting $version
+ *
+ * @author Christopher Smith <chris@jalakai.co.uk>
+ */
+function media_unlink_version($version){
+ if (strpos($version,'.crop.') !== false){
+ $basename = getCacheName($version);
+ $crop_version_list = $basename.MEDIA_VERSION_LIST_EXT;
+ $crop_versions = @file($crop_version_list);
+
+ // if $crop_version_list exists, $crop_versions will be an array
+ if (is_array($crop_versions)) {
+ foreach ($crop_versions as $line) {
+ list($cachefile, $timestamp) = preg_split('/ (?=[^ ]*$)/',trim($line),2); // split at last space
+ @unlink($cachefile);
+ }
+ @unlink($crop_version_list);
+ }
+ }
+
+ return @unlink($version);
+}
+
+/**
+ * Calculate a token to be used to verify fetch requests for resized or
+ * cropped images have been internally generated - and prevent external
+ * DDOS attacks via fetch
+ *
+ * @param string $id id of the image
+ * @param int $w resize/crop width
+ * @param int $h resize/crop height
+ *
+ * @author Christopher Smith <chris@jalakai.co.uk>
+ */
+function media_get_token($id,$w,$h){
+ // token is only required for modified images
+ if ($w || $h) {
+ $token = auth_cookiesalt().$id;
+ if ($w) $token .= '.'.$w;
+ if ($h) $token .= '.'.$h;
+
+ return substr(md5($token),0,6);
+ }
+
+ return '';
+}
+
+/**
* Download a remote file and return local filename
*
* returns false if download fails. Uses cached file if available and
8 lib/exe/fetch.php
@@ -32,7 +32,7 @@
}
// check for permissions, preconditions and cache external files
- list($STATUS, $STATUSMESSAGE) = checkFileStatus($MEDIA, $FILE, $REV);
+ list($STATUS, $STATUSMESSAGE) = checkFileStatus($MEDIA, $FILE, $REV, $WIDTH, $HEIGHT);
// prepare data for plugin events
$data = array(
@@ -180,7 +180,7 @@ function sendFile($file, $mime, $dl, $cache, $public = false) {
* @param $file reference to the file variable
* @returns array(STATUS, STATUSMESSAGE)
*/
-function checkFileStatus(&$media, &$file, $rev = '') {
+function checkFileStatus(&$media, &$file, $rev = '', $width=0, $height=0) {
global $MIME, $EXT, $CACHE, $INPUT;
//media to local file
@@ -200,6 +200,10 @@ function checkFileStatus(&$media, &$file, $rev = '') {
if(empty($media)) {
return array(400, 'Bad request');
}
+ // check token for resized images
+ if (($width || $height) && media_get_token($media, $width, $height) !== $INPUT->str('tok')) {
+ return array(412, 'Precondition Failed');
+ }
//check permissions (namespace only)
if(auth_quickaclcheck(getNS($media).':X') < AUTH_READ) {