
"system error: Too many open files" during stitching of more than 14000 single images on Windows #547

@jhackspiel

Description

Windows 10
Python 3.13.5
libvips 8.17

I am using pyvips very successfully to stitch large images. But now I am hitting a limit.
On Windows there seems to be a hard limit on the number of files a process can have open at the same time. It can be raised to 8192, but no further:

import win32file
win32file._setmaxstdio(8192)  # raise the CRT's open-file limit; 8192 is the hard maximum

I have more than 14000 single images (each 2048x2048 px) with ~10% overlap. I correct every single image with vips (vignetting, scaling, rotation, ...). With the merge() command I build one single large image, and finally I extract my region of interest from it with an affine() transformation.
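Condensed, the assembly looks roughly like this (a simplified sketch: the flat-field image, the tile paths/offsets and the ROI values are placeholders for my real parameters, and in reality the merged rows are merged vertically again the same way):

import pyvips

flatfield = pyvips.Image.new_from_file('flat.tif')  # placeholder flat-field image

tile_paths = [...]    # placeholder: the 14000+ tile file names
tile_offsets = [...]  # placeholder: (dx, dy) per tile, derived from the ~10% overlap

def load_corrected(path, scale=1.0, angle=0.0):
    # per-tile correction: vignetting, scaling, rotation (placeholder parameters)
    tile = pyvips.Image.new_from_file(path)
    tile = (tile / flatfield * 255).cast('uchar')  # flat-field vignetting correction
    return tile.similarity(scale=scale, angle=angle)

# stitch pairwise with merge(); dx, dy come from the overlap
mosaic = load_corrected(tile_paths[0])
for path, (dx, dy) in zip(tile_paths[1:], tile_offsets):
    mosaic = mosaic.merge(load_corrected(path), 'horizontal', dx, dy)

# region of interest via affine(); identity matrix stands in for my real transform
vips_large_image = mosaic.affine([1, 0, 0, 1],
                                 oarea=[roi_left, roi_top, roi_width, roi_height])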

I am writing my output image to a zarr array:

import zarr

block_size = 2 ** 14
z = zarr.zeros(
    shape=(im_shape[0], im_shape[1]),
    chunks=(block_size, block_size),
    dtype='uint8',
    store=r'D:\temp\z',
    compressor=zarr.Blosc(cname='zstd', clevel=3, shuffle=zarr.Blosc.SHUFFLE),
)

# copy the stitched image into the zarr store block by block
for y in range(0, im_shape[0], block_size):
    for x in range(0, im_shape[1], block_size):
        h = min(y + block_size, im_shape[0]) - y
        w = min(x + block_size, im_shape[1]) - x
        chunk = vips_large_image.crop(x, y, w, h)
        z[y:y + h, x:x + w] = chunk.numpy()

While the copy loop runs, I am tracking the number of files the process has open, and the count climbs steadily until it reaches 8192. Then everything ends with an error:

import psutil
print(len(psutil.Process().open_files()))

439
531
621
...
...
...
8021
8097
8178
_______________________________________________________
Error: unable to write to memory
D:\some_file.tif: unable to open for read
system error: Too many open files

I have tried calling invalidate() on the chunks after writing them. No effect. I suppose it controls caching only, not whether a file stays open.
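For reference, that attempt sat at the end of the copy loop above (same placeholder names):

chunk = vips_large_image.crop(x, y, w, h)
z[y:y + h, x:x + w] = chunk.numpy()
chunk.invalidate()  # drops cached pixels for this chunk and downstream images,
                    # but the source files apparently stay open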

I have thought about multiprocessing, since it would allow each process to open 8192 files of its own. It could maybe also increase performance: for some reason, CPU usage never surpasses 20% on my system during processing. Or might there be a different bottleneck aside from processing power?
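If I went that route, I imagine something along these lines (a rough sketch only: arrayjoin() stands in for my real merge()-based stitching and ignores the overlap, and the job list is a placeholder):

import multiprocessing as mp

import pyvips
import zarr

def stitch_band(job):
    # each worker opens only its own tiles, so every process
    # gets the full 8192-handle budget to itself
    y0, tile_paths = job
    tiles = [pyvips.Image.new_from_file(p) for p in tile_paths]
    band = pyvips.Image.arrayjoin(tiles, across=len(tiles))  # simplified: no overlap handling
    z = zarr.open(r'D:\temp\z', mode='r+')  # reopen the store in each process
    # bands should align with the zarr chunk grid so that
    # two workers never write into the same chunk
    z[y0:y0 + band.height, :band.width] = band.numpy()

if __name__ == '__main__':
    jobs = [...]  # placeholder: (row offset, tile paths) for every band
    with mp.Pool(processes=8) as pool:
        pool.map(stitch_band, jobs)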

Is there a more elegant and straightforward solution?
