Calling UnstructuredPaddleImageLoader standalone locally raises an error, but UnstructuredPaddlePDFLoader works fine #396

chenkuncloud · 2024-06-13T08:09:44Z

Versions:

QAnything: v1.4.1
Python 3.10.14
paddleocr: 2.7.0.3
Ubuntu 18.04.6 LTS

Code:

from paddleocr import PaddleOCR

from qanything_kernel.utils.loader import UnstructuredPaddleImageLoader
from qanything_kernel.utils.splitter import ChineseTextSplitter

file_path = r'地不容.png'
ocr_engine = PaddleOCR(use_angle_cls=True, lang="ch", use_gpu=False, show_log=False)
sentence_size = 100

# loader = UnstructuredPaddleImageLoader(file_path, ocr_engine, True)
loader = UnstructuredPaddleImageLoader(file_path, ocr_engine, mode="elements")
texts_splitter = ChineseTextSplitter(pdf=False, sentence_size=sentence_size)
docs = loader.load_and_split(text_splitter=texts_splitter)

# Print the resulting docs
for doc in docs:
    print(doc)
    print('\n')

Error message:

<Logger debug_logger (INFO)> <Logger qa_logger (INFO)>
LOCAL DATA PATH: /home/semweb/service/qanything-python/QANY_DB/content
LOCAL_RERANK_REPO: netease-youdao/bce-reranker-base_v1
LOCAL_EMBED_REPO: netease-youdao/bce-embedding-base_v1
Traceback (most recent call last):
  File "/home/semweb/service/qanything-python/my_ocr/my_ocr_img.py", line 15, in <module>
    docs = loader.load_and_split(text_splitter=texts_splitter)
  File "/home/semweb/miniconda3/envs/qanything-python/lib/python3.10/site-packages/langchain_core/document_loaders/base.py", line 63, in load_and_split
    docs = self.load()
  File "/home/semweb/miniconda3/envs/qanything-python/lib/python3.10/site-packages/langchain_core/document_loaders/base.py", line 29, in load
    return list(self.lazy_load())
  File "/home/semweb/miniconda3/envs/qanything-python/lib/python3.10/site-packages/langchain_community/document_loaders/unstructured.py", line 88, in lazy_load
    elements = self._get_elements()
  File "/home/semweb/service/qanything-python/qanything_kernel/utils/loader/image_loader.py", line 47, in _get_elements
    txt_file_path = image_ocr_txt(self.file_path)
  File "/home/semweb/service/qanything-python/qanything_kernel/utils/loader/image_loader.py", line 38, in image_ocr_txt
    result = self.ocr_engine(img_data)
  File "/home/semweb/miniconda3/envs/qanything-python/lib/python3.10/site-packages/paddleocr/tools/infer/predict_system.py", line 76, in __call__
    dt_boxes, elapse = self.text_detector(img)
  File "/home/semweb/miniconda3/envs/qanything-python/lib/python3.10/site-packages/paddleocr/tools/infer/predict_det.py", line 229, in __call__
    data = transform(data, self.preprocess_op)
  File "/home/semweb/miniconda3/envs/qanything-python/lib/python3.10/site-packages/paddleocr/ppocr/data/imaug/__init__.py", line 56, in transform
    data = op(data)
  File "/home/semweb/miniconda3/envs/qanything-python/lib/python3.10/site-packages/paddleocr/ppocr/data/imaug/operators.py", line 227, in __call__
    src_h, src_w, _ = img.shape
AttributeError: 'dict' object has no attribute 'shape'
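
Reading the traceback: `image_ocr_txt` calls `self.ocr_engine(img_data)`, and PaddleOCR's detector then fails on `img.shape` because it received a `dict` rather than an image array. That suggests the loader expects an OCR callable that accepts a dict payload, not a raw `PaddleOCR` instance. Below is a minimal sketch of an adapter that could bridge the two. It is not taken from QAnything: the class name `DictPayloadOCRAdapter`, the injected `to_array` helper, and the payload keys (`img64`, `height`, `width`, `channels`) are all assumptions, so check `image_loader.py` for the keys your version actually builds.

```python
import base64


class DictPayloadOCRAdapter:
    """Let an OCR callable that expects an image array accept a dict payload.

    ASSUMPTION: the payload keys below are illustrative, not confirmed
    against QAnything's image_loader.py.
    """

    def __init__(self, ocr_fn, to_array):
        # ocr_fn: callable taking an image array,
        #   e.g. lambda img: paddle_ocr.ocr(img, cls=True)
        # to_array: callable turning (raw_bytes, shape) into an image array,
        #   e.g. lambda raw, shape:
        #       np.frombuffer(raw, dtype=np.uint8).reshape(shape).copy()
        self.ocr_fn = ocr_fn
        self.to_array = to_array

    def __call__(self, payload):
        # Only unpack dict payloads; anything else is passed through untouched.
        if isinstance(payload, dict):
            raw = base64.b64decode(payload["img64"])
            shape = (payload["height"], payload["width"], payload["channels"])
            payload = self.to_array(raw, shape)
        return self.ocr_fn(payload)


# Stand-ins so the sketch runs without paddleocr or numpy installed.
def fake_to_array(raw, shape):
    return (raw, shape)


def fake_ocr(img):
    raw, shape = img
    # Mimics the nested [[box, (text, confidence)]] shape of an OCR result.
    return [[[None, (f"{len(raw)} bytes, shape {shape}", 1.0)]]]


engine = DictPayloadOCRAdapter(fake_ocr, fake_to_array)
```

With the real libraries, the adapter (rather than the bare `PaddleOCR` object) would be passed as the `ocr_engine` argument of `UnstructuredPaddleImageLoader`. Whether the loader then parses the returned structure correctly depends on the QAnything version, so treat this as a starting point for debugging rather than a confirmed fix.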

lwdnxu · 2024-07-22T06:52:30Z

same error
