-
-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Image.WriteToFile crashes with NetVips.Native 8.9.0-rc4 #53
Comments
Hi @Gigas002, Thanks for testing the NetVips.Native v8.9.0-rc4 (pre-release) package! I couldn't reproduce this issue on Windows x64 with this test program: Test programnamespace RandomCropTest
{
using System;
using NetVips;
class RandomCropTest
{
static void Main(string[] args)
{
NetVips.CacheSetMax(0);
// source: https://github.com/Gigas002/GTiff2Tiles/blob/master/Examples/Input/Input3785.tif
var huge = Image.NewFromFile("huge.tif");
const int tileSize = 256;
const double scale = 0.7;
var rnd = new Random();
// make a black background image to insert our resized tiles into
var background = Image.Black(10000, 10000);
for (var j = 0; j < 100; j++)
{
// fetch source pixels from here ... tiles along the right and bottom will be
// cropped against the image edges
var oldX = rnd.Next(0, huge.Width);
var oldY = rnd.Next(0, huge.Height);
var width = Math.Min(tileSize, huge.Width - oldX);
var height = Math.Min(tileSize, huge.Height - oldY);
// write resized pixels here
var newX = rnd.Next(0, background.Width);
var newY = rnd.Next(0, background.Height);
// fetch cropped tile from input ... because tiles have been cropped,
// non-tile areas will appear transparent in the output
using var tile = huge.Crop(oldX, oldY, width, height);
using var resized = tile.Resize(scale);
// write into background
background = background.Insert(resized, newX, newY);
Console.WriteLine($"[{oldX}, {oldY}, {width}, {height}] -> [{newX}, {newY}]");
}
background.WriteToFile("x.png");
Console.WriteLine("Done!");
Console.ReadKey();
}
}
} (tested with both pre-compiled libvips v8.9.0-rc4 and v8.8.4 binaries) It could be a issue with libvips v8.9.0 (since you couldn't reproduce it with libvips 8.8.4). I've a few of questions:
This issue might be bitness problem because the exception did not occur on Linux (all pre-compiled binaries for Linux are 64-bit). @jcupitt Do you have any idea what could go wrong? I noticed that |
Hello. And here’s the results I’ve got from them on Windows 10 x64 (binaries are also built targeting x64 only, not AnyCPU): There’s some math in code, you don’t need to look on all of these, I marked it as |
I think the seek horribleness is OK -- libtiff does the same revolting mess: https://gitlab.com/libtiff/libtiff/blob/master/libtiff/tif_unix.c#L123 ie. they cast -1 to unint64 to signal error. We could make the cast explicit. |
I found this related libtiff issue: |
Thanks for you standalone sample! I could reproduce this issue on Windows x64. This slightly simpler program fails reliably for me (with sequential and random access): namespace RandomCropStressTest
{
using System;
using System.Threading.Tasks;
using NetVips;
public static class RandomCropStressTest
{
static readonly Random Rnd = new Random();
static Image RandomCrop(this Image image, int tileSize)
{
var x = Rnd.Next(0, image.Width);
var y = Rnd.Next(0, image.Height);
var width = Math.Min(tileSize, image.Width - x);
var height = Math.Min(tileSize, image.Height - y);
return image.Crop(x, y, width, height);
}
static void Main()
{
NetVips.CacheSetMax(0);
const string access = "sequential"/*"random"*/;
const int tileSize = 256;
const int threads = 6;
// dimensions: 6569 x 3073
var huge = Image.NewFromFile("4326.tif", access: access);
Parallel.For(0, 1000, new ParallelOptions { MaxDegreeOfParallelism = threads },
i => huge.RandomCrop(tileSize).WriteToFile($"x_{i}.png"));
Console.WriteLine("Done!");
Console.ReadKey();
}
}
} (i.e. random cropping with 6 threads on the same input image) I'll have a look. |
You probably saw, but this is how libtiff tests for seek error: https://gitlab.com/libtiff/libtiff/blob/master/libtiff/tif_aux.c#L409 return off <= (~(uint64)0)/2 && TIFFSeekFile(tif,off,SEEK_SET)==off; So I think we're OK (I think). |
To make our meaning clearer. libtiff uses uint64 for toff_t, with -1 cast to uint64 for an error indication, see: https://gitlab.com/libtiff/libtiff/blob/master/libtiff/tif_unix.c#L123 See also: kleisauke/net-vips#53
Thanks for making the casting explicit. This issue looks like a concurrency problem, I was able to reproduce it with pyvips (on Linux and Windows): pyvips reproduce script#!/usr/bin/env python
import random
import sys
from threading import Thread
import pyvips
# import logging
# logging.basicConfig(level = logging.DEBUG)
# pyvips.cache_set_trace(True)
pyvips.cache_set_max(0)
access = 'sequential' # 'random'
tile_size = 256
threads = 24 # 6
im = pyvips.Image.new_from_file(sys.argv[1], access=access)
def parallel_crop(image):
for i in range(1000):
x = random.randint(0, image.width - 1)
y = random.randint(0, image.height - 1)
width = min(tile_size, image.width - x)
height = min(tile_size, image.height - y)
crop = image.crop(x, y, width, height)
try:
avg = crop.avg()
except pyvips.Error as e:
print(e.message)
print(e.detail)
sys.exit(1)
# print(pyvips.ffi.cast('GObject *', image.pointer).ref_count)
if __name__ == '__main__':
# don't try this at home, I guess
for i in range(threads):
t = Thread(target=parallel_crop, args=(im,))
t.start() On Windows this issue occurs with 6 threads. On Linux it needs 24 threads(!) to reproduce it sometimes (you'll get an The weird thing is that I couldn't reproduce it with the multiprocessing script#!/usr/bin/env python
import multiprocessing as mp
import random
import sys
import pyvips
# import logging
# logging.basicConfig(level = logging.DEBUG)
# pyvips.cache_set_trace(True)
pyvips.cache_set_max(0)
access = 'sequential' # 'random'
tile_size = 256
threads = 6
image = pyvips.Image.new_from_file(sys.argv[1], access=access)
def parallel_crop(i):
x = random.randint(0, image.width - 1)
y = random.randint(0, image.height - 1)
width = min(tile_size, image.width - x)
height = min(tile_size, image.height - y)
crop = image.crop(x, y, width, height)
try:
avg = crop.avg()
except pyvips.Error as e:
print(e.message)
print(e.detail)
sys.exit(1)
print(pyvips.ffi.cast('GObject *', image.pointer).ref_count)
if __name__ == '__main__':
with mp.Pool(processes=threads) as p:
p.map(parallel_crop, range(0, 1000)) But perhaps it's using a mutex (the |
Huh I'm confused now I look at the Python code. You shouldn't be able to do random crops from a TIFF in sequential mode. Is this a tiled tiff? If so, access won't have any effect. |
Ah I see it here too, thanks! Yes, it's a tiled tiff, so access doesn't matter. I can reproduce with: #!/usr/bin/python3
import random
import sys
from threading import Thread
import pyvips
def parallel_crop(image):
for i in range(1000):
x = random.randint(0, image.width - 256)
y = random.randint(0, image.height - 256)
crop = image.crop(x, y, 256, 256)
avg = crop.avg()
im = pyvips.Image.new_from_file(sys.argv[1])
for i in range(24):
t = Thread(target=parallel_crop, args=(im,))
t.start() Some kind of horrible race condition. I'll have a look. |
I did a quick test, it appears that this issue only occurs with tiled-TIFF images. For example: vips copy tiled.tif non-tiled.tif
vips copy non-tiled.tif tiled2.tif[tile]
|
I tried with 8.8 and that seems to work, so I suppose it must be a race somewhere in the new IO system. |
Ah, found it! It's not in read at all, it's minimise. The tiff loader closes the file descriptor after reading the header, and again after each computation. When a computation starts up, it opens the file descriptor again and restores the seek point. There's a lock inside the cache of recent tiles which prevents races on reopen, but there's no lock on the minimise action. Probably the best solution is just not to minimise on end of computation for tiled images. |
We weer minimising sources in the ::minimise handler, but this is called outside the lock that protects _generate. This patch removes minimise in this case. See kleisauke/net-vips#53
That seems to fix it for me. Thanks for reporting this @Gigas002 ! |
Thanks! I confirm that it's working fine now. As a side note, you might be interested in this commit: 40cbfcd. I found it while debugging this issue. Example code: NetVips.CacheSetMax(0);
var image = Image.NewFromFile("lichtenstein.jpg");
for (var i = 0; i < 100; i++)
{
using (var crop = image.Crop(0, 0, 256, 256))
{
var _ = crop.Avg();
}
// RefCount should not increase (i.e. operation should be freed)
Console.WriteLine("reference count: " + image.RefCount);
} The reference count of the main image could easily climb to 100 as the I also tested ruby-vips (on Windows) with this: ruby-vips test#!/usr/bin/ruby
require 'vips'
Vips::cache_set_max 0
im = Vips::Image.new_from_file ARGV[0]
100.times do |_|
crop = im.crop 0, 0, 256, 256
avg = crop.avg
# This can easily climb to 20
puts "reference count: #{im.ref_count}"
end
def ref_count
return @struct[:parent][:parent][:ref_count]
end And noticed a similar behavior, but it looks like the GC is kicking-in around pyvips test#!/usr/bin/env python
import sys
import pyvips
pyvips.cache_set_max(0)
im = pyvips.Image.new_from_file(sys.argv[1])
for i in range(100):
crop = im.crop(0, 0, 256, 256)
avg = crop.avg()
# This prints always 2
print(pyvips.ffi.cast('GObject *', im.pointer).ref_count) The reference count remains steady to 2. Replacing the crop with: pyvips change im = pyvips.Image.new_from_file(sys.argv[1])
for i in range(100):
- crop = im.crop(0, 0, 256, 256)
- avg = crop.avg()
+ avg = im.crop(0, 0, 256, 256).avg()
print(pyvips.ffi.cast('GObject *', im.pointer).ref_count) Changes the reference count to 1. |
Great! Thanks for testing it. That's an interesting commit, I wonder if ruby could free more aggressively. Though it now has a very nice generational GC, so things like this should be found quickly. Python isn't GCd, it's refcounted -- things are freed as soon as they go out of scope. There's a mark-sweep GC as well, but it's only run occasionally to break any ref cycles. It's why py has such good memory behaviour. |
Ah I see, Python does indeed have (pretty) good memory behavior. Let me know if I should open a separate issue on the ruby-vips repo (the reference count might increase if running that loop parallelized, but I didn't test that). @Gigas002 NetVips v1.2.0 and NetVips.Native v8.9.1 is now available. Thanks for reporting this! |
Image.WriteToFile
method with NetVips.Native 8.9.0-rc4 sometimes crashes unpredictably. I have an application, that writes tiles by cropping GeoTIFF. I've run some tests on CI and local machine (Win10). Each test writes approximately 7000 tiles. I've got following results:8 tests on linux (CI) - passes without errors
12 tests on windows (CI) - 1/12 crashed
20 tests on windows (local) - 5/12 crashed
At first I tried to downgrade to NetVips 1.1.0 instead of 1.2.0-rc2 but that didn't solve the issue. Then I tried to downgrade to NetVips.Native 8.8.4 and run through unit tests again:
20 tests on windows (local) - all passes
I also looked into tiles, that was written wrong and their correct versions, they look like this:
Broken:
![27304](https://user-images.githubusercontent.com/24297712/72204488-85e89e80-3489-11ea-80c6-bae506b85739.png)
Correct:
![27304_2](https://user-images.githubusercontent.com/24297712/72204489-87b26200-3489-11ea-9309-580677eb9cdb.png)
The exact fragment of code, throwing the exception is following:
Vips's exception message:
The text was updated successfully, but these errors were encountered: