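Current Behavior
Running in Python mode, entirely on CPU, calling an external LLM. Powerful PDF parsing is enabled in the config:

    # PDF parsing parameters
    pdf_config = {
        # Whether to use the fast PDF parser. When set to False, the optimized PDF parser is used, at the cost of speed.
        "USE_FAST_PDF_PARSER": False
    }

After starting the service and uploading a PDF, the backend log shows:

    Error in Powerful PDF parsing: PdfLoader.__init__() got an unexpected keyword argument 'root_dir', use fast PDF parser instead.
    ...
    insert_to_faiss: success num: 1, failed num: 0

The log shows that powerful parsing failed and the upload then fell back to the fast parser.

Expected Behavior
Powerful PDF parsing should run without errors.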
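For readers unfamiliar with this failure mode, here is a minimal sketch of how a constructor that does not accept `root_dir` can trigger the silent fallback described above. It is not QAnything's actual code: the `PdfLoader` class below is a hypothetical stand-in, and `accepts_keyword` and `split_pdf` are illustrative helper names. Only the warning text is taken from the log.

```python
import inspect
import logging

logger = logging.getLogger("pdf_fallback_sketch")


class PdfLoader:
    """Hypothetical stand-in: its constructor has no root_dir parameter."""

    def __init__(self, filename):
        self.filename = filename


def accepts_keyword(cls, name):
    """Return True if cls.__init__ can be called with `name` as a keyword."""
    params = inspect.signature(cls.__init__).parameters
    if name in params:
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs parameter would also accept the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def split_pdf(filename, use_fast_parser):
    """Return which parser handled the file: 'powerful' or 'fast'."""
    if not use_fast_parser:
        try:
            # The caller passes root_dir, but the installed class rejects it,
            # so Python raises TypeError before any parsing happens.
            PdfLoader(filename=filename, root_dir="/tmp")
            return "powerful"
        except Exception as exc:
            # A broad except turns the mismatch into a warning plus fallback.
            logger.warning(
                "Error in Powerful PDF parsing: %s, use fast PDF parser instead.",
                exc,
            )
    return "fast"
```

Calling `accepts_keyword` on the `PdfLoader` class that is actually importable in the affected environment would be one way to check whether the calling code and the installed loader are out of sync, which is what the error message suggests.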
Environment
- OS: Ubuntu 22.04
- NVIDIA Driver: none
- CUDA: none
- docker: none
- docker-compose: none
- NVIDIA GPU: none
- NVIDIA GPU Memory: none
QAnything logs
Contents of debug.log:
2024-06-18 10:40:56,518 - [PID: 88643][MainProcess] - [Function: upload_files] - INFO - upload_files zzp
2024-06-18 10:40:56,520 - [PID: 88643][MainProcess] - [Function: upload_files] - INFO - mode: strong
2024-06-18 10:40:56,524 - [PID: 88643][MainProcess] - [Function: check_kb_exist] - INFO - check_kb_exist [('KB2baad59dd8b346f79ae06061c86da883',)]
2024-06-18 10:40:56,525 - [PID: 88643][MainProcess] - [Function: upload_files] - INFO - ori name: 建筑光伏系统应用技术标准.pdf
2024-06-18 10:40:56,525 - [PID: 88643][MainProcess] - [Function: upload_files] - INFO - decode name: 建筑光伏系统应用技术标准.pdf
2024-06-18 10:40:56,525 - [PID: 88643][MainProcess] - [Function: upload_files] - INFO - cleaned name: 建筑光伏系统应用技术标准.pdf
2024-06-18 10:40:56,526 - [PID: 88643][MainProcess] - [Function: check_user_exist_] - INFO - check_user_exist [('zzp',)]
2024-06-18 10:40:56,527 - [PID: 88643][MainProcess] - [Function: check_kb_exist] - INFO - check_kb_exist [('KB2baad59dd8b346f79ae06061c86da883',)]
2024-06-18 10:40:56,530 - [PID: 88643][MainProcess] - [Function: add_file] - INFO - add_file: e87590666140418eba9d0f135d5ea390
2024-06-18 10:40:56,530 - [PID: 88643][MainProcess] - [Function: upload_files] - INFO - 建筑光伏系统应用技术标准.pdf, e87590666140418eba9d0f135d5ea390, success
2024-06-18 10:40:56,541 - [PID: 88643][MainProcess] - [Function: init] - INFO - success init localfile 建筑光伏系统应用技术标准.pdf
2024-06-18 10:40:56,545 - [PID: 88643][MainProcess] - [Function: insert_files_to_faiss] - INFO - insert_files_to_faiss: KB2baad59dd8b346f79ae06061c86da883
2024-06-18 10:40:56,546 - [PID: 88643][MainProcess] - [Function: split_file_to_docs] - WARNING - Error in Powerful PDF parsing: PdfLoader.__init__() got an unexpected keyword argument 'root_dir', use fast PDF parser instead.
2024-06-18 10:40:57,513 - [PID: 88643][MainProcess] - [Function: split_file_to_docs] - INFO - before 2nd split doc lens: 8
2024-06-18 10:40:57,514 - [PID: 88643][MainProcess] - [Function: split_file_to_docs] - INFO - after 2nd split doc lens: 8
2024-06-18 10:40:57,515 - [PID: 88643][MainProcess] - [Function: split_file_to_docs] - INFO - langchain analysis content head: 住房城乡建设部信息公开
浏览专用
住房城乡建设部信息公开
浏览专用
住房城乡建设部信息公开
浏览专用
住房城乡建设部信息公开
浏览专用
住房城乡建设部信息公开
浏览
2024-06-18 10:40:57,515 - [PID: 88643][MainProcess] - [Function: inner] - INFO - 函数 split_file_to_docs 执行耗时: 0.9691917896270752 秒
2024-06-18 10:40:57,518 - [PID: 88643][MainProcess] - [Function: insert_files_to_faiss] - INFO - split time: 0.9694967269897461 8
2024-06-18 10:40:57,521 - [PID: 88643][MainProcess] - [Function: load_vector_store] - INFO - load faiss index: /root/QAnything/QANY_DB/faiss/KB2baad59dd8b346f79ae06061c86da883/faiss_index
2024-06-18 10:40:58,044 - [PID: 88643][MainProcess] - [Function: _load_kb_to_memory] - INFO - FAISS load kb_ids: ['KB2baad59dd8b346f79ae06061c86da883']
2024-06-18 10:40:58,046 - [PID: 88643][MainProcess] - [Function: get_len_safe_embeddings] - INFO - embedding number: 1
2024-06-18 10:40:59,334 - [PID: 88643][MainProcess] - [Function: get_embedding] - INFO - onnx infer time: 1.2814881801605225
2024-06-18 10:40:59,337 - [PID: 88643][MainProcess] - [Function: get_embedding] - INFO - embedding shape: (8, 768)
2024-06-18 10:40:59,342 - [PID: 88643][MainProcess] - [Function: inner] - INFO - 函数 get_len_safe_embeddings 执行耗时: 1.2964568138122559 秒
2024-06-18 10:40:59,357 - [PID: 88643][MainProcess] - [Function: add_document] - INFO - add documents number: 8
2024-06-18 10:40:59,363 - [PID: 88643][MainProcess] - [Function: add_document] - INFO - save faiss index: /root/QAnything/QANY_DB/faiss/KB2baad59dd8b346f79ae06061c86da883/faiss_index
2024-06-18 10:40:59,363 - [PID: 88643][MainProcess] - [Function: insert_files_to_faiss] - INFO - insert time: 1.847867727279663
2024-06-18 10:40:59,365 - [PID: 88643][MainProcess] - [Function: insert_files_to_faiss] - INFO - insert_to_faiss: success num: 1, failed num: 0
2024-06-18 10:41:22,223 - [PID: 88643][MainProcess] - [Function: list_docs] - INFO - list_docs zzp
2024-06-18 10:41:22,224 - [PID: 88643][MainProcess] - [Function: list_docs] - INFO - kb_id: KB2baad59dd8b346f79ae06061c86da883
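Because the upload still reports `success num: 1`, the fallback is easy to miss in a log like this one. As a rough sketch, assuming debug.log keeps the `timestamp - [PID: …][process] - [Function: …] - LEVEL - message` layout seen above, a few lines of Python can pull out the WARNING and ERROR entries. The `LOG_LINE` pattern and `find_warnings` helper are illustrative names, not part of QAnything.

```python
import re

# Pattern inferred from the log excerpt above; other deployments may differ.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"\[PID: (?P<pid>\d+)\]\[(?P<proc>[^\]]+)\] - "
    r"\[Function: (?P<func>[^\]]+)\] - "
    r"(?P<level>[A-Z]+) - (?P<msg>.*)$"
)


def find_warnings(lines):
    """Return (timestamp, function, message) for WARNING and ERROR entries."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        # Continuation lines of multi-line messages do not match and are skipped.
        if match and match.group("level") in ("WARNING", "ERROR"):
            hits.append((match.group("ts"), match.group("func"), match.group("msg")))
    return hits
```

For example, `find_warnings(open("debug.log", encoding="utf-8"))` would list each entry where the powerful parser gave up, along with the function that logged it.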
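Steps To Reproduce
1. Run in Python mode, entirely on CPU, calling an external LLM.
2. Enable powerful PDF parsing in the config.
3. Start the service.
4. Upload a PDF and watch the log.

Anything else?
No response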
The same problem. Any solution?
I ran into the same problem.