Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Opening concurrent file handles for all instancefiles risks breaching OS limit. #352

Closed
ianmackinnon opened this issue Nov 8, 2023 · 6 comments · Fixed by #363
Closed
Labels
enhancement New feature or request

Comments

@ianmackinnon
Copy link

The use of the BinaryIO type for instancefiles causes file handles to be opened concurrently by Click for all instance files supplied at the command line.

A user may, for example, run a glob expansion on a number of targets that exceeds the limit of open files they are permitted by their operating system. Currently check-jsonschema fails in this case with OSError: [Errno 24] Too many open files. It would be nice if files were opened sequentially so that the user could supply a far greater number of arguments to a single command without error.

A minimal case for simulating this on Linux is as follows:

# Set open files limit to 1024 (this is a typical default on Linux,
#   but we do it explicitly to make the test repeatable):
ulimit -n 1024
mkdir /tmp/c-js
echo '{}' > /tmp/c-js/schema.js
for i in $(seq -f "%04g" 1024); do echo '{}' > /tmp/c-js/instance.$i.json; done;

check-jsonschema --schemafile /tmp/c-js/schema.js /tmp/c-js/instance.0001.json
# ok -- validation done

check-jsonschema --schemafile /tmp/c-js/schema.js /tmp/c-js/instance.*.json
# OSError: [Errno 24] Too many open files

Could it be possible to type the instancefiles argument as a str or Path and only open the file handles one by one as needed?

@sirosen
Copy link
Member

sirosen commented Nov 8, 2023

Thanks for the report/request!

It's certainly possible to do -- it is how earlier versions worked. I switched to the built-in click behavior when adding support for stdin, as it seemed like a nice simplification.

Before I pursue this, a quick question: are you running into this issue today, do you foresee running into it (soon), or is the issue purely theoretical?
This may be relevant for prioritization, and I may be pressed on it if I look to make any upstream contributions to click in support of it.

I'll want to look into click's capabilities to see if it can open the files lazily before pursuing a change here.

@ianmackinnon
Copy link
Author

Thanks for the quick response. The issue already occurs for me on a project where I'm globbing around 2,000 files. However, it's simple to work around as users can normally raise the ulimit value to anything they like (eg ulimit -n 4096), so it's more a matter of convenience and tidy resource management than a critical issue.

@sirosen
Copy link
Member

sirosen commented Nov 9, 2023

I checked and click does have a lazy mode for file objects, but it rigs them up to close as part of the command exiting. I don't think that using it naively will solve this.

I'll have this near the front of my queue when I'm next able to devote some time to this project (likely in a little over a week).

@sirosen
Copy link
Member

sirosen commented Dec 3, 2023

Minor update:

I've been tinkering on this but it's not working quite as smoothly as I wanted. The click lazy file object does an eager open/close to catch some errors (e.g. permissions) early, which messes with some of the fifos in the testsuite.

It works in a live context, when I use a shell to write a fifo, so I must be doing something slightly wrong in the tests. I might also file a click bug report, since I think the open/close behavior is slightly unsafe for special file types.

I'm going to try to figure out how to fix the fifo tests, but I do have some options for handling this all in the worst case.

@sirosen sirosen added the enhancement New feature or request label Dec 5, 2023
@ianmackinnon
Copy link
Author

Tested and working perfectly. Thanks very much!

@sirosen
Copy link
Member

sirosen commented Dec 6, 2023

Wow, thanks for testing before I even cut the release! I was meaning to push this out yesterday but got a bit delayed.
It's great to hear that it's working for you as intended.

I've just dropped v0.27.3 with this included -- it's the primary update in there. Let me know if you see any issues with the released version. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants