PdfLoader fails with an error saying the OCR model weights could not be loaded #400

Open
bill4689 opened this issue Jun 14, 2024 · 3 comments

Comments

@bill4689

bill4689 commented Jun 14, 2024

```python
from qanything_kernel.utils.loader.self_pdf_loader import PdfLoader

pdf_loader = PdfLoader(filename='tables/table-03d9ec345317b0115180d7dbcf843ef6.pdf')
markdown_directory = pdf_loader.load_to_markdown()
print(f"Markdown file is at: {markdown_directory}")
```

```
➜ QAnything python QAnything_ocr.py
LOCAL DATA PATH: /mnt/user/QAnything-qanything-python/QAnything/QANY_DB/content
LOCAL_RERANK_REPO: netease-youdao/bce-reranker-base_v1
LOCAL_EMBED_REPO: netease-youdao/bce-embedding-base_v1
Traceback (most recent call last):
  File "/mnt/user/QAnything-qanything-python/QAnything/QAnything_ocr.py", line 6, in <module>
    pdf_loader = PdfLoader(filename='tables/table-03d9ec345317b0115180d7dbcf843ef6.pdf')
  File "/mnt/user/QAnything-qanything-python/QAnything/qanything_kernel/utils/loader/self_pdf_loader.py", line 14, in __init__
    super().__init__()
  File "/mnt/user/QAnything-qanything-python/QAnything/qanything_kernel/utils/loader/pdf_to_markdown/core/parser/pdf_parser.py", line 34, in __init__
    self.layouter = LayoutRecognizer("layout")
  File "/mnt/user/QAnything-qanything-python/QAnything/qanything_kernel/utils/loader/pdf_to_markdown/core/vision/layout_recognizer.py", line 20, in __init__
    super().__init__(self.labels, domain, model_dir)
  File "/mnt/user/QAnything-qanything-python/QAnything/qanything_kernel/utils/loader/pdf_to_markdown/core/vision/recognizer.py", line 28, in __init__
    self.ort_sess = ort.InferenceSession(model_file_path, providers=['CPUExecutionProvider'])
  File "/mnt/user/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/mnt/user/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 472, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /mnt/user/QAnything-qanything-python/QAnything/qanything_kernel/utils/loader/pdf_to_markdown/checkpoints/layout/layout.onnx failed:Protobuf parsing failed.
```
onnxruntime versions I have tried:

- onnxruntime==1.17.1
- onnxruntime-gpu==1.17.1
- onnxruntime==1.18.0
- onnxruntime-gpu==1.18.0
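All four onnxruntime builds fail in the same way, which suggests the problem lies in the `layout.onnx` file rather than in the runtime. One way to check is to compare the local file's size and SHA-256 against a copy known to work (another machine, or a checksum published by the hosting page, if one exists). A minimal stdlib-only sketch; `file_fingerprint` is a hypothetical helper, not part of QAnything:

```python
import hashlib
from pathlib import Path


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at `path`.

    The file is read in 1 MiB chunks so large model files are never
    held in memory all at once.
    """
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # iter() with a b"" sentinel keeps calling read() until EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


if __name__ == "__main__":
    # Path copied from the traceback above; adjust it to your checkout.
    model = Path(
        "/mnt/user/QAnything-qanything-python/QAnything/qanything_kernel/"
        "utils/loader/pdf_to_markdown/checkpoints/layout/layout.onnx"
    )
    if model.is_file():
        size, sha = file_fingerprint(model)
        print(f"{model.name}: {size} bytes, sha256={sha}")
    else:
        print(f"not found: {model}")
```

If the size or hash differs from the working copy, re-downloading the checkpoint is the obvious next step; changing onnxruntime versions would not help.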

@191611

191611 commented Jun 19, 2024

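The model file may be corrupted, or what you downloaded may be a pointer file (such as a Git LFS pointer) rather than the real weights. I re-downloaded it with ModelScope's Python package and it worked.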
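The pointer-file theory can be checked directly: a Git LFS pointer is a short text file whose first line is `version https://git-lfs.github.com/spec/v1`, so onnxruntime would fail to parse it as protobuf, which is consistent with the error above. A minimal sketch; `lfs_pointer_info` is a hypothetical helper, not a QAnything or Git LFS API:

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line (Git LFS spec v1).
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def lfs_pointer_info(path):
    """Return the pointer's fields (e.g. {'oid': 'sha256:...', 'size': '...'})
    if `path` is a Git LFS pointer file, otherwise None."""
    with open(path, "rb") as f:
        head = f.read(1024)  # a pointer is a few lines of text; only the start matters
    if not head.startswith(LFS_HEADER):
        return None
    fields = {}
    # Remaining lines are "key value" pairs such as "oid sha256:<hash>" and "size <bytes>"
    for line in head.decode("utf-8", errors="replace").splitlines()[1:]:
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value
    return fields


if __name__ == "__main__":
    # Relative to the QAnything root, per the traceback in the original report.
    checkpoints = Path("qanything_kernel/utils/loader/pdf_to_markdown/checkpoints")
    for onnx_file in sorted(checkpoints.rglob("*.onnx")):
        info = lfs_pointer_info(onnx_file)
        if info:
            print(f"{onnx_file}: LFS pointer (real size {info.get('size', '?')} bytes)")
        else:
            print(f"{onnx_file}: not an LFS pointer")
```

If a file is reported as a pointer, the real weights were never fetched; running `git lfs pull` in a Git LFS-enabled clone, or re-downloading as described in the reply above, should replace it. A file that is not a pointer can still be truncated, so a size or checksum comparison is still worthwhile.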

@zhudongwork

Did you get it working? Have you run into `Error in Powerful PDF parsing: max() arg is an empty sequence`?
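`max() arg is an empty sequence` is the message of the `ValueError` that Python raises when `max()` receives an empty iterable. This thread does not show where the QAnything parser calls `max()`, so the sketch below only illustrates the Python behavior and the usual guard (the `default=` keyword); it is not a patch for the parser, and `safe_max` is a hypothetical name:

```python
def safe_max(values, default=0):
    """max() that returns `default` instead of raising on an empty iterable."""
    return max(values, default=default)


# Without a default, an empty input raises ValueError; on Python 3.10 the
# message is "max() arg is an empty sequence", as in the error quoted above.
try:
    max([])
except ValueError as exc:
    print(f"ValueError: {exc}")

print(safe_max([]))         # 0
print(safe_max([3, 9, 4]))  # 9
```

In a parser, an empty input here usually means an earlier stage produced nothing for some page or region, so a silent default can hide the real cause; it is worth logging which input was empty.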

@xeon-ye

xeon-ye commented Jun 25, 2024

Code:

```python
import os

from qanything_kernel.utils.loader.self_pdf_loader import PdfLoader

file_path = "C:/home/QAnything/example/test.pdf"
loader = PdfLoader(filename=file_path, binary=None, save_dir=os.path.dirname(file_path))
md = loader.load_to_markdown()
# Prints the path of the generated Markdown file (see the output below)
print(md)
```
Output:

```
LOCAL DATA PATH: C:\home\QAnything\QANY_DB\content
LOCAL_RERANK_REPO: netease-youdao/bce-reranker-base_v1
LOCAL_EMBED_REPO: netease-youdao/bce-embedding-base_v1
table model initing...
cpu
table model inited...
WARNING:root:Miss outlines
23it [00:00, ?it/s]
2024-06-25 15:20:41,009 Start OCR!
2024-06-25 15:20:41,012 OCR finished in 2.0259380000061356 seconds
preprocess
preprocess
23it [00:00, 11515.94it/s]
2024-06-25 15:20:43,110 Error in Powerful PDF parsing: max() arg is an empty sequence
2024-06-25 15:20:43,112 PDF Parse finished in 2.0993914999999106 seconds
C:/home/QAnything/example\test_md\test.md
```
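In this run the error is logged, yet parsing still reports as finished and a Markdown path is returned, so it is worth checking whether that file actually contains anything. A minimal stdlib sketch; `markdown_is_empty` is a hypothetical helper, and the thread does not say what the generated file contained:

```python
from pathlib import Path


def markdown_is_empty(md_path):
    """Return True if the Markdown file is missing or contains only whitespace."""
    p = Path(md_path)
    if not p.is_file():
        return True
    return not p.read_text(encoding="utf-8", errors="replace").strip()


if __name__ == "__main__":
    # Path that load_to_markdown() printed in the output above.
    md = r"C:/home/QAnything/example\test_md\test.md"
    print("empty or missing" if markdown_is_empty(md) else "has content")
```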
