Calling UnstructuredPaddleImageLoader on its own locally raises an error, but UnstructuredPaddlePDFLoader works fine #396

Open
chenkuncloud opened this issue Jun 13, 2024 · 1 comment

Versions:

  • QAnything: v1.4.1
  • Python 3.10.14
  • paddleocr: 2.7.0.3
  • Ubuntu 18.04.6 LTS

Code:

from paddleocr import PaddleOCR

from qanything_kernel.utils.loader import UnstructuredPaddleImageLoader
from qanything_kernel.utils.splitter import ChineseTextSplitter

file_path = r'地不容.png'
ocr_engine = PaddleOCR(use_angle_cls=True, lang="ch", use_gpu=False, show_log=False)
sentence_size = 100

# loader = UnstructuredPaddleImageLoader(file_path, ocr_engine, True)
loader = UnstructuredPaddleImageLoader(file_path, ocr_engine, mode="elements")
texts_splitter = ChineseTextSplitter(pdf=False, sentence_size=sentence_size)
docs = loader.load_and_split(text_splitter=texts_splitter)

# Print the resulting docs
for doc in docs:
    print(doc)
    print('\n')

Error message:

<Logger debug_logger (INFO)> <Logger qa_logger (INFO)>
LOCAL DATA PATH: /home/semweb/service/qanything-python/QANY_DB/content
LOCAL_RERANK_REPO: netease-youdao/bce-reranker-base_v1
LOCAL_EMBED_REPO: netease-youdao/bce-embedding-base_v1
Traceback (most recent call last):
  File "/home/semweb/service/qanything-python/my_ocr/my_ocr_img.py", line 15, in <module>
    docs = loader.load_and_split(text_splitter=texts_splitter)
  File "/home/semweb/miniconda3/envs/qanything-python/lib/python3.10/site-packages/langchain_core/document_loaders/base.py", line 63, in load_and_split
    docs = self.load()
  File "/home/semweb/miniconda3/envs/qanything-python/lib/python3.10/site-packages/langchain_core/document_loaders/base.py", line 29, in load
    return list(self.lazy_load())
  File "/home/semweb/miniconda3/envs/qanything-python/lib/python3.10/site-packages/langchain_community/document_loaders/unstructured.py", line 88, in lazy_load
    elements = self._get_elements()
  File "/home/semweb/service/qanything-python/qanything_kernel/utils/loader/image_loader.py", line 47, in _get_elements
    txt_file_path = image_ocr_txt(self.file_path)
  File "/home/semweb/service/qanything-python/qanything_kernel/utils/loader/image_loader.py", line 38, in image_ocr_txt
    result = self.ocr_engine(img_data)
  File "/home/semweb/miniconda3/envs/qanything-python/lib/python3.10/site-packages/paddleocr/tools/infer/predict_system.py", line 76, in __call__
    dt_boxes, elapse = self.text_detector(img)
  File "/home/semweb/miniconda3/envs/qanything-python/lib/python3.10/site-packages/paddleocr/tools/infer/predict_det.py", line 229, in __call__
    data = transform(data, self.preprocess_op)
  File "/home/semweb/miniconda3/envs/qanything-python/lib/python3.10/site-packages/paddleocr/ppocr/data/imaug/__init__.py", line 56, in transform
    data = op(data)
  File "/home/semweb/miniconda3/envs/qanything-python/lib/python3.10/site-packages/paddleocr/ppocr/data/imaug/operators.py", line 227, in __call__
    src_h, src_w, _ = img.shape
AttributeError: 'dict' object has no attribute 'shape'
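
The traceback shows that image_ocr_txt calls self.ocr_engine(img_data) with a dict, while calling a PaddleOCR object directly expects an image array, hence the missing .shape. One possible workaround is to pass the loader a small adapter instead of the raw PaddleOCR instance. The sketch below is a guess, not the project's confirmed fix: the helper name make_dict_ocr_engine is hypothetical, and the payload keys "img64", "height", "width" and "channels" (with "img64" holding base64 of the raw uint8 pixel bytes) are assumptions that should be checked against image_loader.py.

```python
import base64
from typing import Any, Callable, Dict

import numpy as np


def make_dict_ocr_engine(
    ocr_fn: Callable[[np.ndarray], Any],
) -> Callable[[Dict[str, Any]], Any]:
    """Wrap an array-based OCR callable so it accepts a dict payload.

    ASSUMPTION (not confirmed by this thread): the payload carries
    "img64" (base64 of raw uint8 pixel bytes), "height", "width", "channels".
    """

    def engine(img_data: Dict[str, Any]) -> Any:
        # Decode the base64 string back into the raw pixel bytes.
        raw = base64.b64decode(img_data["img64"])
        shape = (img_data["height"], img_data["width"], img_data["channels"])
        # Rebuild the H x W x C array; copy() makes it writable for the OCR code.
        img = np.frombuffer(raw, dtype=np.uint8).reshape(shape).copy()
        return ocr_fn(img)

    return engine
```

With this in place, the loader could be constructed as UnstructuredPaddleImageLoader(file_path, make_dict_ocr_engine(ocr_engine.ocr), mode="elements"). Passing ocr_engine.ocr rather than ocr_engine itself is also a guess: it assumes the loader expects the nested result format returned by PaddleOCR.ocr rather than the tuple returned by calling the object directly.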

lwdnxu commented Jul 22, 2024

Same error here.
