Close input early for read errors #1370
This patch seems to work for both random and sequential access:
diff --git a/libvips/foreign/tiff2vips.c b/libvips/foreign/tiff2vips.c
index 1111111..2222222 100644
--- a/libvips/foreign/tiff2vips.c
+++ b/libvips/foreign/tiff2vips.c
@@ -557,6 +557,7 @@ rtiff_strip_read( Rtiff *rtiff, int strip, tdata_t buf )
     if( length == -1 ) {
         vips_foreign_load_invalidate( rtiff->out );
+        rtiff_free( rtiff );
         vips_error( "tiff2vips", "%s", _( "read error" ) );
         return( -1 );
     } |
Oh, nice! This all feels very hacky though :( Perhaps there's a cleaner solution to all these early close problems? I was thinking about the evalstart / eval / evalend signals that are emitted for progress feedback, and the minimise signal that's sent after an eval pipeline has run. Perhaps we could use one of those somehow?
https://github.com/libvips/libvips/blob/master/libvips/conversion/tilecache.c#L350 It's emitted at the end of every threadpool iteration (ie. the end of every large computation): https://github.com/libvips/libvips/blob/master/libvips/iofuncs/threadpool.c#L987 Loaders would need to be prepared to reopen their fds if necessary, but they could maybe close on minimise of their output. |
I made a branch to experiment with using minimise for early close. This signal is emitted by threadpool at the end of a loop over an image and is re-emitted on all upstream images, so I think it should catch all cases, for example:
It'll early-close It seems to work, but obv. needs more testing. I did tiff, png and jpg. Perhaps other loaders could benefit as well. I think it should be a cleaner and more reliable way of catching end of computation. What do you think? |
We close loaders early in order to save file handles, and on Windows to make sure that files can be deleted as soon as possible. Currently loaders do this by watching the Y coordinate of requests and freeing the fd when the final line of the file is fetched. This is messy and does not always work, since there are cases when the final line is not fetched. Instead, this patch gets loaders to listen for "minimise" on their output and close on that. This signal is emitted on all upstream images whenever a threadpool finishes a scan of an image and is usually used to trim caches after computation. See #1370
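The propagation described above can be modelled in a few lines of plain Python. This is only a sketch of the idea, not libvips code: `Image`, `Loader`, `on_minimise` and `minimise` are hypothetical names invented for the illustration, and the real signal is a GObject signal handled in C.

```python
class Image:
    """Toy stand-in for a pipeline node; not the libvips API."""

    def __init__(self, name, upstream=()):
        self.name = name
        self.upstream = list(upstream)
        self.minimise_handlers = []

    def on_minimise(self, handler):
        # Register a callback to run when this image is asked to minimise.
        self.minimise_handlers.append(handler)

    def minimise(self):
        # Run this node's handlers, then re-emit on every upstream image.
        for handler in self.minimise_handlers:
            handler()
        for image in self.upstream:
            image.minimise()


class Loader:
    """Hypothetical loader that keeps its file open until minimise arrives."""

    def __init__(self, path):
        self.handle = open(path, "rb")
        self.out = Image("load:" + path)
        # Close on minimise of the output, whichever line was read last.
        self.out.on_minimise(self.close)

    def close(self):
        if self.handle is not None:
            self.handle.close()
            self.handle = None
```

In this model, calling `minimise()` on the final image of a pipeline closes every loader feeding it, even if a read error meant the last scanline was never requested.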
I added minimise handlers for gif, heif, rad, webp. The branch is here: https://github.com/libvips/libvips/tree/loader-minimise-experiment |
Awesome! That looks much better and more reliable. I just tested the Output:
The PNG, GIF, HEIC, and EXR (I guess this can be ignored) loaders don't early close on truncated files. For PNG, it looks like this return statement is the cause: libvips/libvips/foreign/vipspng.c Line 207 in 4f2f4b4
I'm not sure why the other loaders don't work (perhaps the The above log also reveals that |
Oh huh interesting, I'll have a look. Thanks for testing! |
Kleis pointed out a spurious return in png load minimise. see #1370 (comment)
You're right, the logic was all tangled up in png minimise. Nice! I'm puzzled by gif and heif failing too. Do you have your test files somewhere handy? |
Thanks, PNG seems to work fine now! The GIF and HEIC files were created in this way:
The above gist also shows (within the comments) how the files were created. |
we were not closing early on a read error during gif scan see #1370 (comment)
Ooop, sorry, I missed the comment. I think I've fixed GIF and HEIC. They were closing early if there was an error reading pixels, but they were not closing early if the error happened during a scan of the image header. |
Thanks! The GIF and HEIC loaders seem to work fine now:
PDF and EXR still use the old Y coordinate logic: libvips/libvips/foreign/pdfload.c Lines 401 to 406 in 9373d63
libvips/libvips/foreign/pdfiumload.c Lines 439 to 444 in 9373d63
libvips/libvips/foreign/openexr2vips.c Lines 351 to 354 in 9373d63
libvips/libvips/foreign/openexr2vips.c Line 444 in 9373d63
I guess these loaders can use the same minimise logic? Also, do you plan to include this in the 8.8 branch? |
fwiw, these log messages:
Can be safely ignored, it tries to load the damaged image from the fallback loader. The original images ( |
Great! I added pdf/pdfium as well. OpenEXR can't really do this, unfortunately. It would need revising to be a class so that it could get at the access hint and discover if it's in sequential mode. The whole thing needs revising really -- it uses the OEXR C API, which is really poor. It ought to be redone to use the C++ API. I'm not sure anyone uses it, so there's not much point. Re. release: it feels like quite a big change in policy to me, and likely to have some consequences. We could probably use minimise more elsewhere too. Let's test it in master for a while and release in 8.9. |
pdf/pdfium also work with the minimise logic, thanks! EXR images are not widely used, so I don't think it's worth revising the OpenEXR loader (especially with this weather, it's 40 degrees Celsius here 😅). I'm fine with including this in 8.9 instead of 8.8. Perhaps we could revert commit 5e2d66d within the 8.8 branch? The minimise handlers solve this much more neatly, and they also work with damaged images. I could try to integrate these commits into the Windows build of NetVips. Performance-critical environments are more likely to run on Linux, and this allows us to get feedback at an earlier stage. What do you think of this? |
Heh yes it was 38C in London yesterday. I had to cycle 20 miles :( I've removed that code from shrinkv already, I think. Sure, try it out in NetVips. Early close is important there. I was thinking about other ways we could use this. How about the case where you are assembling 1000s of images with something like arrayjoin? Before, we would close input images when we read out the final line, and that would happen when the output read line swept past them as it built the image. Now though, they won't close until the end of evaluation. This will push memory use up quite a bit. Perhaps operations like insert, composite and arrayjoin need new logic for sequential mode to send minimise signals to their inputs when they have finished with them? |
I did some experiments, but I think the only good solution is close-on-last-line, as we had before. I've put close-on-last back for jpg/tif/png. Together with the new minimise handler, we get nice behaviour for things like:
Even if some of the inputs are damaged. |
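A rough model of the combined behaviour, assuming a loader that reads fixed-size scanlines top to bottom. `SequentialLoader` and its methods are hypothetical names for this sketch and do not correspond to libvips functions.

```python
class SequentialLoader:
    """Hypothetical top-to-bottom reader; not a libvips class."""

    def __init__(self, path, height, line_bytes):
        self.height = height
        self.line_bytes = line_bytes
        self.handle = open(path, "rb")
        self.next_line = 0

    def read_line(self):
        if self.handle is None:
            raise OSError("loader already closed")
        data = self.handle.read(self.line_bytes)
        if len(data) < self.line_bytes:
            # Damaged input: the final line will never be reached.
            raise OSError("read error: file is truncated")
        self.next_line += 1
        if self.next_line == self.height:
            # Trigger 1: the final line was fetched, so close right away.
            self.close()
        return data

    def minimise(self):
        # Trigger 2: end of evaluation, reached even after a read error.
        self.close()

    def close(self):
        # Safe to call more than once, since both triggers may fire.
        if self.handle is not None:
            self.handle.close()
            self.handle = None
```

With an intact file the handle is released as soon as the read line sweeps past the image, which keeps the number of open inputs low in something like arrayjoin. With a truncated file the first trigger never fires, and the minimise trigger releases the handle at the end of evaluation instead.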
One thing we're not doing is freeing the memory associated with the jpg etc. images early. I'll see if there's something that can be added there. |
Nope :( Not a simple thing. Let's merge and close. |
Many thanks for all your work! I'll integrate these commits into the Windows build of NetVips.
It has been removed within the master branch but not within the 8.8 branch, see e.g. the change log: |
It needs to stay in 8.8 (I think!). We were able to remove it in master when we added support for "minimise". |
Indeed, let's keep that commit in the 8.8 branch. We can always refer users to the master branch where early close is handled more neatly. (I thought it might cause confusion because the early shutdown behaviour in shrinkv doesn't work for damaged images) |
It looks like the recent commits that make the GIF parser less strict break early close in this test:
import logging
import os
import gc
import pyvips

logging.basicConfig(level=logging.DEBUG)
pyvips.cache_set_max(0)

# wget https://github.com/libvips/libvips/raw/master/test/test-suite/images/cogs.gif
# head -c 2000 cogs.gif > truncated.gif
file_name = 'truncated.gif'

try:
    im = pyvips.Image.new_from_file(file_name, access=pyvips.Access.SEQUENTIAL)
    # im = pyvips.Image.new_from_file(file_name)
    im.write_to_file('x.jpg')
except Exception as e:
    print(str(e))
finally:
    # no effect
    gc.collect()

try:
    os.remove(file_name)
except OSError:
    # being used by another process
    print('{0} could not be deleted'.format(file_name))
(tested on Windows with libvips 8.8.2 and the minimise patches applied) |
It works in master (I think). What patches are you applying? All the minimise stuff?
|
I tried your nice C program on master, and I see:
$ head -c 2000 cogs.gif > truncated.gif
$ gcc -g -Wall try271.c `pkg-config vips --cflags --libs`
$ ./a.out truncated.gif
processing truncated.gif ...
open files after sniff and header read:
0 1 2 3
vips_foreign_load_gif_minimise:
open files after processing:
0 1 2 3
open files after cleanup:
0 1 2
So it calls
$ wget https://raw.githubusercontent.com/lovell/sharp/master/test/fixtures/truncated.jpg
$ ./a.out truncated.jpg
processing truncated.jpg ...
open files after sniff and header read:
0 1 2
open files after processing:
0 1 2
(a.out:28078): VIPS-WARNING **: 14:58:46.086: read gave 2 warnings
(a.out:28078): VIPS-WARNING **: 14:58:46.087: VipsJpeg: Premature end of JPEG file
open files after cleanup:
0 1 2 |
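The source of the C test program is not included in this thread. As an approximation of the "open files" lines in the output above, a small Python helper can probe which descriptor numbers are currently open. `open_fds` and its `limit` parameter are names invented for this sketch.

```python
import os


def open_fds(limit=256):
    """Return the descriptor numbers below `limit` that are currently open."""
    fds = []
    for fd in range(limit):
        try:
            # fstat fails with OSError (EBADF) on a descriptor that is not open.
            os.fstat(fd)
        except OSError:
            continue
        fds.append(fd)
    return fds
```

Printing `open_fds()` before loading, after processing and after cleanup gives the same kind of listing as the log above, so a descriptor that stays in the list after processing points at a loader that did not close early.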
Oh, I totally forgot about that. Yes, you're right, the scanner is obviously leaving something open. I'll have a look. |
I think I fixed it -- it was trying to keep the FILE open between header and load, and just rewinding it. It now has close as a proper vfunc and always closes after header read. With that fd test program I see:
Is this patch easy for you to integrate? I could put it on that minimise branch if that would be simpler. |
We were trying to keep the FILE open for gifload between header and load, but this meant some corrupt GIFs could keep the file open longer than they should. Instead, make close into a vfunc and always close between header and load. see #1370 (comment)
Thanks, it's working properly now! Patch integrated with: libvips/build-win64-mxe@6f20676. |
I thought of a horrible problem :( Consider this code:
#!/usr/bin/env php
<?php
require __DIR__ . '/vendor/autoload.php';
use Jcupitt\Vips;
for ($i = 1; $i < count($argv); $i++) {
    $image = Vips\Image::newFromFile($argv[$i], [
        "n" => -1,
        "access" => "sequential"
    ]);
    $page_height = $image->get("page-height");
    $n_pages = $image->get("n-pages");
    echo($argv[$i] . " has " . $n_pages . " pages\n");
    for ($p = 0; $p < $n_pages; $p++) {
        echo(" rendering page " . $p . " ...\n");
        $page = $image->crop(0, $p * $page_height, $image->width, $page_height);
        $page->writeToFile($argv[$i] . "_page_" . $p . ".png");
    }
}
ie. open a PDF and write the pages out as a set of PNGs. After the first
Possible solutions:
Yuk! |
We were using "minimise" to close pdf input early, but this will break programs which make several output images from one sequential input image. For example, loading all pages of a PDF as a toilet-roll image, then saving pages as a set of PNGs. This patch adds vfuncs for open and close, and makes _generate reopen the input if necessary. We will need similar patches for pdfiumload, gifload, gifnsload, tiffload etc. see #1370 (comment)
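The reopen-on-demand approach in this commit message can be sketched as follows. `ReopeningLoader`, `generate` and the `opens` counter are hypothetical and only illustrate the open/close split; the real change is in the C loaders.

```python
class ReopeningLoader:
    """Hypothetical loader whose input can be closed and lazily reopened."""

    def __init__(self, path):
        self.path = path
        self.handle = None
        self.opens = 0  # how many times the file has been opened

    def open(self):
        # Does nothing if the input is already open.
        if self.handle is None:
            self.handle = open(self.path, "rb")
            self.opens += 1

    def close(self):
        if self.handle is not None:
            self.handle.close()
            self.handle = None

    def generate(self, offset, length):
        # Reopen if an earlier minimise closed the input.
        self.open()
        self.handle.seek(offset)
        return self.handle.read(length)

    def minimise(self):
        # End of one evaluation: release the file handle.
        self.close()
```

Each write of a page ends with a minimise, so the file is closed between pages and the next `generate` call opens it again. The cost is one extra open per output image, in exchange for never holding the descriptor while idle.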
I guess you'll need to change that example to this:
#!/usr/bin/env php
<?php
require __DIR__ . '/vendor/autoload.php';
use Jcupitt\Vips;
for ($i = 1, $count = count($argv); $i < $count; $i++) {
    $image = Vips\Image::newFromFile($argv[$i], [
        'page' => 0,
        'access' => 'sequential'
    ]);
    $n_pages = 1;
    if ($image->typeof('n-pages') !== 0) {
        $n_pages = $image->get('n-pages');
    }
    echo $argv[$i] . ' has ' . $n_pages . ' pages' . PHP_EOL;
    echo ' rendering page 0 ...' . PHP_EOL;
    $image->writeToFile($argv[$i] . '_page_0.png', [
        'strip' => true,
    ]);
    for ($p = 1; $p < $n_pages; $p++) {
        echo ' rendering page ' . $p . ' ...' . PHP_EOL;
        $page = Vips\Image::newFromFile($argv[$i], [
            'page' => $p,
            'access' => 'sequential'
        ]);
        $page->writeToFile($argv[$i] . '_page_' . $p . '.png', [
            'strip' => true,
        ]);
    }
}
Otherwise it will not handle multi-size PDF files. See for example: |
Yes, that's true. For things like GIF though, we can be sure that all pages are the same size, and being able to write out a strip as a set of small files is useful. |
heifload will restart read if necessary after minimise see #1370
after minimise, we need to reopen the underlying file passes pytest but a proper test is still to come #1370
do multiple renders from one seq image, check fds are opened and closed as expected see #1370
OK, I think this is all done -- I've revised all the loaders to reopen after minimise, and there's a test that checks this behaviour. |
Nice! I can confirm that the #1370 (comment) test-case no longer produces errors (also tested on Windows). Performing |
Great! Thanks for testing. |
It might be necessary to close the input file prematurely for corrupt images / read errors (similar to kleisauke/net-vips#12 / #1066).
Consider this pyvips example:
This patch seems to work for me (for tiff images):
Originally posted in kleisauke/net-vips#12 (comment)
Unfortunately, the above patch only works for random access. It doesn't work for sequential access.