Error: too many open files #21

Open
captcoma opened this issue Apr 13, 2023 · 4 comments

Comments

@captcoma

captcoma commented Apr 13, 2023

Usually qpdf::pdf_combine works fine for a few PDFs, but when I try to combine 500 PDFs I get the error: Too many open files.

I found that this error is related to the fact that qpdf opens the files during the process: https://qpdf.readthedocs.io/en/stable/cli.html. There is also a solution with --keep-files-open=[y|n]. However, I think this is not implemented in the R package.

Could I modify pdf_combine so that it works?

@jeroen
Member

jeroen commented Apr 25, 2023

@jberkenbilt should I somehow be closing the input QPDF after reading it here?

qpdf/src/bindings.cpp

Lines 80 to 95 in a9aad79

  QPDF outpdf;
  outpdf.emptyPDF();
  for (int i = 0; i < infiles.size(); i++) {
    QPDF inpdf;
    read_pdf_with_password(infiles.at(i), password, &inpdf);
    std::vector<QPDFPageObjectHelper> pages = QPDFPageDocumentHelper(inpdf).getAllPages();
    for (int i = 0; i < pages.size(); i++) {
      QPDFPageDocumentHelper(outpdf).addPage(pages.at(i), false);
    }
  }
  QPDFWriter outpdfw(outpdf, outfile);
  outpdfw.setStaticID(true); // for testing only
  outpdfw.setStreamDataMode(qpdf_s_preserve);
  outpdfw.write();
  return outfile;
}

fwiw we still bundle qpdf 8.4.0 right now (to support centos-7 systems).

@jberkenbilt

What you'll have to do is to use ClosedFileInputSource and processInputSource. See https://github.com/qpdf/qpdf/blob/989819b75fba380ecdc7416a504ed4b3a2d42ccb/libqpdf/QPDFJob.cc#L2590 as an example, and let me know if you need more guidance. The idea is that ClosedFileInputSource is an input source that opens the file when it needs to use it and closes it afterwards. It causes some overhead, but on a local file system, it's negligible. The overhead is very high over a network file system. ClosedFileInputSource has a stayOpen method you can use as a hint to keep it open if you're going to be doing a lot of operations. The code in QPDFJob that combines pages keeps it open while adding pages, but ultimately it's QPDFWriter that will pull the data out of the original files, and it will open the files multiple times, which shouldn't be an issue. While QPDFJob is later than 8.4.0, all the basic methods called in this example are there in 8.4.0, though you will still need PointerHolder instead of std::shared_ptr. You can probably find this same block of code in qpdf/qpdf.cc in 8.4.0.
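To make the open-when-needed idea from the comment above concrete, here is a minimal standalone sketch that uses only the C++ standard library. It is not qpdf code: `OnDemandFile`, its `read` and `isOpen` members, and the `read_heads` helper are hypothetical names invented for illustration, and the `stayOpen` member only mimics the hint described above rather than reproducing the signature or behaviour of qpdf's ClosedFileInputSource.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical illustration class, NOT qpdf's ClosedFileInputSource.
// It stores only the path and opens the file for the duration of each read.
class OnDemandFile {
public:
    explicit OnDemandFile(const std::string& path) : path_(path) {}
    ~OnDemandFile() { closeHandle(); }
    OnDemandFile(const OnDemandFile&) = delete;
    OnDemandFile& operator=(const OnDemandFile&) = delete;

    // Hint: true keeps the handle open across reads, false closes it
    // immediately and returns to open-per-read behaviour.
    void stayOpen(bool keep) {
        stay_open_ = keep;
        if (!keep) closeHandle();
    }

    // Read up to `len` bytes starting at byte `offset`.
    std::string read(long offset, std::size_t len) {
        if (!fp_) {
            fp_ = std::fopen(path_.c_str(), "rb");
            if (!fp_) throw std::runtime_error("cannot open " + path_);
        }
        std::string buf(len, '\0');
        std::size_t got = 0;
        if (std::fseek(fp_, offset, SEEK_SET) == 0) {
            got = std::fread(&buf[0], 1, len, fp_);
        }
        buf.resize(got);
        if (!stay_open_) closeHandle();  // release the descriptor right away
        return buf;
    }

    bool isOpen() const { return fp_ != nullptr; }

private:
    void closeHandle() {
        if (fp_) {
            std::fclose(fp_);
            fp_ = nullptr;
        }
    }

    std::string path_;
    std::FILE* fp_ = nullptr;
    bool stay_open_ = false;
};

// Read the first `n` bytes of every file while keeping all source objects
// alive, roughly as a combine step would. Only one file handle is open at
// any moment, no matter how many paths are passed in.
std::size_t read_heads(const std::vector<std::string>& paths, std::size_t n) {
    std::vector<std::unique_ptr<OnDemandFile>> sources;
    std::size_t total = 0;
    for (const std::string& p : paths) {
        sources.push_back(std::unique_ptr<OnDemandFile>(new OnDemandFile(p)));
        total += sources.back()->read(0, n).size();
    }
    return total;
}
```

The trade-off is the one mentioned above: each read pays for an extra open and close, which is cheap on a local disk but can be expensive on a network file system, so a `stayOpen(true)` style hint is worth using around a burst of reads.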

@jeroen
Member

jeroen commented Mar 24, 2024

I tried to have a look at this ClosedFileInputSource api but I can't figure it out. I think we'll have to table it anyway until we upgrade the bundled libqpdf.

I wish there was just a simple way to close the files from a QPDF object once we are done with it.

@jberkenbilt

You could use ClosedFileInputSource for this. You can find several examples in QPDFJob.cc. But, yeah, 8.4.0 is really old.
