Gateway down when doing simple PDF extraction #5576
Comments
This error is actually due to For example, if you change
@AnneYang720 thank you for your help. Yes, I also dropped the installed protobuf to The issue with your suggested approach is that it removes the data associated with the first page of the document. For example, the output of
Is there a guide for handling large files with Jina? Seems like this could be an issue, especially for videos, images, and large PDF files. Or is this happening because the data is all getting loaded as a For example, in the example-video-search-app I see that documents are indexed one at a time. Does that affect message size?
My example was just another attempt to verify that the error is caused by the size limit. After extraction of images from the PDF file, the images with shape
A more general way we suggest is that you extract the images first and store them elsewhere (such as the local file system). Instead of
doc.chunks.extend([Document(tensor=img, mime_type='image/*') for img in images])
you can do
doc.chunks.extend([Document(uri='data/your_image.png', mime_type='image/*') for img in images])
and call the function
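The suggestion above (store the extracted images outside the request and pass only a reference) could be sketched roughly as follows. This is a minimal illustration, not the maintainers' implementation: `save_images` and `build_chunks` are hypothetical helper names, the images are assumed to be already-encoded bytes (the thread works with tensors, which would first need encoding with an image library), and `build_chunks` assumes a docarray version that still exports `Document` with `uri` and `mime_type` fields, as used in the snippet above.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def save_images(images: Iterable[bytes], out_dir: str, stem: str = "page") -> List[str]:
    """Write each encoded image into out_dir and return the file paths to use as URIs.

    Hypothetical helper: one file per image, so every chunk gets its own URI.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    uris = []
    for index, payload in enumerate(images):
        path = target / f"{stem}_{index}.png"
        path.write_bytes(payload)
        uris.append(str(path))
    return uris


def build_chunks(uris: List[str]):
    """Create one chunk per URI instead of embedding the image tensor.

    Assumption: a docarray release that exports `Document` (as in the thread).
    The import is local so save_images stays usable without docarray installed.
    """
    from docarray import Document

    return [Document(uri=uri, mime_type="image/*") for uri in uris]


if __name__ == "__main__":
    # Placeholder payloads for illustration only; these are not valid PNG files.
    fake_images = [b"image-0", b"image-1"]
    with tempfile.TemporaryDirectory() as tmp:
        print(save_images(fake_images, tmp))
```

With this layout each chunk carries a short path rather than the pixel data, so the message sent through the gateway stays small; whichever component needs the pixels loads them from the URI on its side.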
@AnneYang720 thank you for your input.
Describe the bug
Using the PDFSegmenter, a gateway runtime error occurs. I can confirm that the issue is not with the text or image extraction of the PDFSegmenter, as that runs without error. After the code runs, it seems to hang for about 10 additional seconds, then I get an error message:
Additional details:
test.py:
pdf_segmenter.py:
Here is the output with running JINA_LOG_LEVEL=DEBUG python test.py:
I also note that this does not occur with every PDF. The PDF which is causing problems is attached.
rg.25si055505.pdf
Describe how you solve it
Environment