[BUG] Problem when trying to use PdfLoader on its own #367
Comments
Please download the PDF parser checkpoints from ModelScope: https://www.modelscope.cn/models/netease-youdao/QAnything-pdf-parser/files
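OK, thank you very much. One more question: is QAnything unable to handle PDFs that contain no text elements? I took a screenshot and tried to parse it, and it raised an error. If that is the case, what is the OCR in it for? Parsing tables?
The error message is as follows: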
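OK, thanks a lot. Since it does not OCR PDFs, I think the OCR-related parts of the PDF loader could be removed for now; otherwise it is quite confusing, haha. It prints "ocr finished", yet no OCR has actually been performed.
Same here. I uploaded a single-layer PDF that contains only images and it failed: no boxes were found and it errored out right away. Stepping through the code, I found that there is no OCR.
I'm honestly speechless.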
I get the same error: Error in Powerful PDF parsing: max() arg is an empty sequence. The thing is, what I uploaded is a one-page PDF of a paper, not an image.
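For context, this kind of `ValueError` is what Python's built-in `max()` raises when it receives an empty iterable. The sketch below shows how that can happen if a page yields no detected text boxes, and how a `default=` guard avoids it. The function names and the box representation are hypothetical and are not taken from the QAnything code; where the real failure occurs has not been confirmed in this thread.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical box format for illustration only: (x0, top, x1, bottom).
Box = Tuple[float, float, float, float]


def tallest_box_height(boxes: Sequence[Box]) -> float:
    """Unguarded version: raises ValueError when `boxes` is empty."""
    return max(bottom - top for _, top, _, bottom in boxes)


def tallest_box_height_safe(boxes: Sequence[Box]) -> Optional[float]:
    """Guarded version: returns None when no boxes were detected."""
    return max((bottom - top for _, top, _, bottom in boxes), default=None)
```

With a guard like this, a caller can report "no text found on this page" instead of crashing.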
You can use ocrmypdf to preprocess the PDF.
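A minimal sketch of that workaround, assuming the `ocrmypdf` command-line tool and the needed Tesseract language data are installed. The helper names are hypothetical; `--skip-text` and `-l` are ocrmypdf options, and the default language value here is only an example.

```python
import shutil
import subprocess
from typing import List


def build_ocr_command(src: str, dst: str, language: str = "eng") -> List[str]:
    """Build the ocrmypdf command line without running it."""
    # --skip-text leaves pages that already contain text untouched.
    return ["ocrmypdf", "--skip-text", "-l", language, src, dst]


def add_text_layer(src: str, dst: str, language: str = "eng") -> None:
    """Write a copy of `src` with an OCR text layer to `dst`."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocr_command(src, dst, language), check=True)
```

The output file can then be passed to `PdfLoader` in place of the image-only original.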
Is there an existing issue / discussion for this?
Is there an existing answer for this in FAQ?
Current Behavior
I added the following lines to self_pdf_loader.py to test how well it parses a PDF:
file_path = 'cat.pdf'
file_path = os.path.abspath(os.path.join(os.path.dirname(__file__), file_path))
loader = PdfLoader(filename=file_path, from_page=14, to_page=15, root_dir=os.path.dirname(file_path))
markdown_dir = loader.load_to_markdown()
docs = convert_markdown_to_langchaindoc(markdown_dir)
docs = PdfLoader.pdf_process(docs)
print(docs)
However, I ran into a problem: the checkpoints could not be found.
Traceback (most recent call last):
  File "c:\Users\Administrator\Desktop\QAnything-1.4.1\qanything_kernel\core\test.py", line 203, in <module>
    loader = PdfLoader(filename=file_path, root_dir=os.path.dirname(file_path))
  File "c:\Users\Administrator\Desktop\QAnything-1.4.1\qanything_kernel\utils\loader\self_pdf_loader.py", line 14, in __init__
    super().__init__()
  File "c:\Users\Administrator\Desktop\QAnything-1.4.1\qanything_kernel\utils\loader\pdf_to_markdown\core\parser\pdf_parser.py", line 34, in __init__
    self.layouter = LayoutRecognizer("layout")
  File "c:\Users\Administrator\Desktop\QAnything-1.4.1\qanything_kernel\utils\loader\pdf_to_markdown\core\vision\layout_recognizer.py", line 20, in __init__
    super().__init__(self.labels, domain, model_dir)
  File "c:\Users\Administrator\Desktop\QAnything-1.4.1\qanything_kernel\utils\loader\pdf_to_markdown\core\vision\recognizer.py", line 21, in __init__
    raise ValueError("not find model file path {}".format(
raise ValueError("not find model file path {}".format(
ValueError: not find model file path c:\Users\Administrator\Desktop\QAnything-1.4.1\qanything_kernel/utils/loader/pdf_to_markdown\checkpoints/layout\layout.onnx
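The traceback shows that the loader expects `layout.onnx` under a `checkpoints/layout` directory. The sketch below checks for that file before `PdfLoader` is constructed, which gives a clearer message than the failure inside the recognizer. The helper name and the `REQUIRED_CHECKPOINTS` list are hypothetical, and only the one file named in the traceback is listed; other model files may also be required.

```python
from pathlib import Path
from typing import List

# Relative path taken from the traceback; this list is an assumption
# and may be incomplete.
REQUIRED_CHECKPOINTS = ["layout/layout.onnx"]


def missing_checkpoints(checkpoint_dir: str) -> List[str]:
    """Return the required checkpoint files that are absent from `checkpoint_dir`."""
    root = Path(checkpoint_dir)
    return [rel for rel in REQUIRED_CHECKPOINTS if not (root / rel).is_file()]
```

For example, calling `missing_checkpoints` on the `pdf_to_markdown/checkpoints` directory and printing the result shows which files still need to be downloaded.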
Expected Behavior
No response
Environment
QAnything logs
No response
Steps To Reproduce
No response
Anything else?
No response