docinfo fails in threads #27
Comments
It does work with a ProcessPoolExecutor:

from concurrent.futures import ProcessPoolExecutor
from io import BytesIO
from urllib.request import urlopen
import sys
import threading

import pikepdf


def get_docinfo(pdf_bytes):
    thread_name = threading.current_thread().name
    pdf = pikepdf.open(BytesIO(pdf_bytes))
    print(f'{thread_name}: got pdf {pdf}')
    docinfo = pdf.docinfo  # GETS STUCK HERE IN THREAD.
    print(f'{thread_name}: got docinfo')
    # Convert to plain strings so the result can be pickled across processes.
    docinfo = {k: str(v) for k, v in dict(docinfo).items()}
    print(f'{docinfo}')
    return docinfo


# The guard keeps worker processes from re-running this block on platforms
# that start workers by re-importing the main module.
if __name__ == '__main__':
    print(f'sys.version = {sys.version.replace(chr(10), "")}')
    print(f'pikepdf.__version__ = {pikepdf.__version__}')
    print(f'pikepdf.__libqpdf_version__ = {pikepdf.__libqpdf_version__}')
    pdf_bytes = urlopen('https://www.fda.gov/downloads/drugs/guidances/ucm353925.pdf').read()
    local_docinfo = get_docinfo(pdf_bytes)
    executor = ProcessPoolExecutor(max_workers=1)
    threaded_docinfos = list(executor.map(get_docinfo, [pdf_bytes]))
    print('Finished.')
It looks like the issue is in pybind11; it is fixed in master but not yet in a release build. Essentially, pybind11 2.2.4 has problems when a thread tries to acquire the GIL, which is necessary here. If you apply that patch and build pikepdf against a local copy of pybind11, it should resolve the issue. I will wait for a tagged release of pybind11 that contains the fix. For what you're doing, a ProcessPoolExecutor is probably more performant anyway, because it avoids competition for the GIL and so the work can be properly parallelized. As I mentioned above, there is currently a restriction that you can't marshal pikepdf objects across a process boundary, but if you force the objects into some Python representation then there is no issue.
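A minimal sketch of the "force the objects into some Python representation" point, for anyone hitting the same restriction. It does not use pikepdf at all: OpaqueValue is a hypothetical stand-in for a native-backed object that refuses to be pickled, and to_plain_dict and worker are illustrative names, not part of any library. The idea is only that the worker converts everything to plain strings before returning, so the result can cross the process boundary.

```python
import pickle
from concurrent.futures import ProcessPoolExecutor


class OpaqueValue:
    """Hypothetical stand-in for a native-backed object that cannot be pickled."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __reduce__(self):
        # Simulate an object that cannot be sent between processes.
        raise TypeError("OpaqueValue cannot be pickled")


def to_plain_dict(mapping):
    """Copy a mapping into a dict of plain str keys and str values."""
    return {str(key): str(value) for key, value in mapping.items()}


def worker(title):
    # Build the unpicklable object inside the worker process...
    native = {"/Title": OpaqueValue(title)}
    # ...and return only plain Python strings to the parent.
    return to_plain_dict(native)


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=1) as executor:
        results = list(executor.map(worker, ["Example"]))
    print(results)
```

Returning the native mapping directly from worker would fail when the executor tries to pickle the result; returning the converted dict avoids that, at the cost of losing any type information the original values carried.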
I am now using
Fixed for v1.3.1
docinfo doesn't work at all in a thread. The following code can demonstrate the problem. The output is:
It then gets stuck.