demo.py runs for a long time without finishing #121

Closed
randydl opened this issue Jul 3, 2024 · 9 comments

randydl commented Jul 3, 2024

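Is MinerU a tool that can be deployed offline? My machine has no network access, and running demo.py goes on for a long time without finishing.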

randydl (Author) commented Jul 3, 2024

(miner) ➜ MinerU git:(master) python demo/demo.py
Traceback (most recent call last):
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/http/client.py", line 1283, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/http/client.py", line 1329, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/http/client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/http/client.py", line 1038, in _send_output
self.send(msg)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/http/client.py", line 976, in send
self.connect()
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/http/client.py", line 1448, in connect
super().connect()
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/http/client.py", line 942, in connect
self.sock = self._create_connection(
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/socket.py", line 845, in create_connection
raise err
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/socket.py", line 833, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/magic_pdf/libs/language.py", line 9, in detect_lang
lang_upper = detect_langs(text)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_langs
lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 70, in get_model_loaded
download(url=url, folder=cache, filename=name, proxy=download_proxy, retry_max=3, timeout=20)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/robust_downloader/downloader.py", line 69, in download
url = _get_redirect_url(url, max_hops=max_redirect_hops)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/robust_downloader/downloader.py", line 272, in _get_redirect_url
with urllib.request.urlopen(
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/urllib/request.py", line 519, in open
response = self._open(req, data)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/http/client.py", line 1283, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/http/client.py", line 1329, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/http/client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/http/client.py", line 1038, in _send_output
self.send(msg)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/http/client.py", line 976, in send
self.connect()
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/http/client.py", line 1448, in connect
super().connect()
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/http/client.py", line 942, in connect
self.sock = self._create_connection(
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/socket.py", line 845, in create_connection
raise err
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/socket.py", line 833, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/nas_data/userdata/randy/projects/MinerU/demo/demo.py", line 18, in <module>
pipe.pipe_classify()
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/magic_pdf/pipe/UNIPipe.py", line 25, in pipe_classify
self.pdf_type = AbsPipe.classify(self.pdf_bytes)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/magic_pdf/pipe/AbsPipe.py", line 69, in classify
pdf_meta = pdf_meta_scan(pdf_bytes)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 337, in pdf_meta_scan
text_language = get_language(doc)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 289, in get_language
page_language = detect_lang(text_block)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/magic_pdf/libs/language.py", line 12, in detect_lang
lang_upper = detect_langs(html_no_ctrl_chars)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/fast_langdetect/ft_detect/__init__.py", line 23, in detect_langs
lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 80, in detect
model = get_model_loaded(low_memory=low_memory, download_proxy=model_download_proxy)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py", line 70, in get_model_loaded
download(url=url, folder=cache, filename=name, proxy=download_proxy, retry_max=3, timeout=20)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/robust_downloader/downloader.py", line 69, in download
url = _get_redirect_url(url, max_hops=max_redirect_hops)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/site-packages/robust_downloader/downloader.py", line 272, in _get_redirect_url
with urllib.request.urlopen(
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/urllib/request.py", line 519, in open
response = self._open(req, data)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/home/app.e0016372/miniconda3/envs/miner/lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable>

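myhloli (Collaborator) commented Jul 3, 2024

On the first run, some internal modules may need a network connection to download a few small model resources. Judging from your error log, fast_langdetect needs to download a model file used for language detection. If your machine cannot go online, extract the contents of the attached archive into the "/tmp" directory.
fasttext-langdetect.zip
Reference:
https://github.com/LlmKira/fast-langdetect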
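
The extraction step above can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the archive is assumed to have been copied next to the script under the name of the attachment, the "/tmp" target is taken from the comment above rather than checked against fast_langdetect itself, and `extract_model_archive` is a hypothetical helper name, not part of MinerU or fast_langdetect.

```python
import zipfile
from pathlib import Path


def extract_model_archive(archive: Path, target_dir: Path) -> list[str]:
    """Extract `archive` into `target_dir` and return the extracted file names, sorted."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        # Directory entries end with "/"; report only regular files.
        return sorted(name for name in zf.namelist() if not name.endswith("/"))


if __name__ == "__main__":
    # Assumption: the attachment was copied to the offline machine under this name.
    archive = Path("fasttext-langdetect.zip")
    if archive.exists():
        # "/tmp" is the directory named in the comment above.
        for name in extract_model_archive(archive, Path("/tmp")):
            print("extracted:", name)
    else:
        print(f"{archive} not found; copy the attachment here first")
```

Listing the extracted names makes it easy to see where the model file ended up before re-running demo.py.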

randydl (Author) commented Jul 3, 2024

Thanks, that was indeed the problem. I downloaded the file manually and it is resolved now.

randydl closed this as completed Jul 3, 2024

@SweeneyW

The results are generated, but the error from the title is still reported. What is going on here?
(MinerU) [appadmin@DX-HH15-12-H20-node1 MinerU]$ magic-pdf pdf-command --pdf "/home/MinerU/pdf_demo/demo1.pdf" --inside_model true
2024-07-20 17:32:03.994 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json /home/MinerU/pdf_demo/demo1.json existed
2024-07-20 17:32:03.994 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /home/MinerU/magic-pdf/demo1/auto
2024-07-20 17:32:04.880 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 9, text_len: 33339, cid_chars_radio: 0.000270392068499324
[2024-07-20 17:32:18,795] [ ERROR] check_version.py:39 - Error fetching version info
Traceback (most recent call last):
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/http/client.py", line 1283, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/http/client.py", line 1329, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/http/client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/http/client.py", line 1038, in _send_output
self.send(msg)
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/http/client.py", line 976, in send
self.connect()
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/http/client.py", line 1448, in connect
super().connect()
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/http/client.py", line 942, in connect
self.sock = self._create_connection(
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/socket.py", line 845, in create_connection
raise err
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/socket.py", line 833, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/site-packages/albumentations/check_version.py", line 29, in fetch_version_info
with opener.open(url, timeout=2) as response:
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/urllib/request.py", line 519, in open
response = self._open(req, data)
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable>

@SweeneyW

Here is the complete command-line output:

(MinerU) [appadmin@DX-HH15-12-H20-node1 MinerU]$ magic-pdf pdf-command --pdf "/home/MinerU/pdf_demo/demo2.pdf" --inside_model true
2024-07-20 17:34:24.468 | WARNING | magic_pdf.cli.magicpdf:get_model_json:310 - not found json /home/MinerU/pdf_demo/demo2.json existed
2024-07-20 17:34:24.468 | INFO | magic_pdf.cli.magicpdf:do_parse:91 - local output dir is /home/MinerU/magic-pdf/demo2/auto
2024-07-20 17:34:25.244 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 14, text_len: 26394, cid_chars_radio: 0.0005324003650745361
[2024-07-20 17:34:39,003] [ ERROR] check_version.py:39 - Error fetching version info
Traceback (most recent call last):
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/http/client.py", line 1283, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/http/client.py", line 1329, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/http/client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/http/client.py", line 1038, in _send_output
self.send(msg)
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/http/client.py", line 976, in send
self.connect()
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/http/client.py", line 1448, in connect
super().connect()
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/http/client.py", line 942, in connect
self.sock = self._create_connection(
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/socket.py", line 845, in create_connection
raise err
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/socket.py", line 833, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/site-packages/albumentations/check_version.py", line 29, in fetch_version_info
with opener.open(url, timeout=2) as response:
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/urllib/request.py", line 519, in open
response = self._open(req, data)
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/home/appadmin/.conda/envs/MinerU/lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable>
2024-07-20 17:34:40.056 | INFO | magic_pdf.model.pdf_extract_kit:init:92 - DocAnalysis init, this may take some times. apply_layout: True, apply_formula: True, apply_ocr: False
2024-07-20 17:34:40.057 | INFO | magic_pdf.model.pdf_extract_kit:init:100 - using device: cpu
CustomVisionEncoderDecoderModel init
CustomMBartForCausalLM init
CustomMBartDecoder init
[07/20 17:34:49 detectron2]: Rank of current process: 0. World size: 1
cuobjdump info : File '/home/appadmin/.conda/envs/MinerU/lib/python3.10/site-packages/detectron2/_C.cpython-310-x86_64-linux-gnu.so' does not contain device code
[07/20 17:34:51 detectron2]: Environment info:


sys.platform linux
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
numpy 1.26.4
detectron2 0.6 @/home/appadmin/.conda/envs/MinerU/lib/python3.10/site-packages/detectron2
detectron2._C not built correctly: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by /home/appadmin/.conda/envs/MinerU/lib/python3.10/site-packages/detectron2/_C.cpython-310-x86_64-linux-gnu.so)
Compiler ($CXX) c++ (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10)
CUDA compiler Build cuda_12.2.r12.2/compiler.32965470_0
detectron2 arch flags /home/appadmin/.conda/envs/MinerU/lib/python3.10/site-packages/detectron2/_C.cpython-310-x86_64-linux-gnu.so
DETECTRON2_ENV_MODULE
PyTorch 2.3.1+cu121 @/home/appadmin/.conda/envs/MinerU/lib/python3.10/site-packages/torch
PyTorch debug build False
torch._C._GLIBCXX_USE_CXX11_ABI False
GPU available Yes
GPU 0,1,2,3,4,5,6,7 NVIDIA H20 (arch=9.0)
Driver version 535.161.08
CUDA_HOME /usr/local/cuda
Pillow 10.4.0
torchvision 0.18.1+cu121 @/home/appadmin/.conda/envs/MinerU/lib/python3.10/site-packages/torchvision
torchvision arch flags 5.0, 6.0, 7.0, 7.5, 8.0, 8.6, 9.0
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.6.0


PyTorch built with:

  • GCC 9.3
  • C++ Version: 201703
  • Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX512
  • CUDA Runtime 12.1
  • NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  • CuDNN 8.9.2
  • Magma 2.6.1
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

[07/20 17:34:51 detectron2]: Command line arguments: {'config_file': '/home/appadmin/.conda/envs/MinerU/lib/python3.10/site-packages/magic_pdf/resources/model_config/layoutlmv3/layoutlmv3_base_inference.yaml', 'resume': False, 'eval_only': False, 'num_gpus': 1, 'num_machines': 1, 'machine_rank': 0, 'dist_url': 'tcp://127.0.0.1:57823', 'opts': ['MODEL.WEIGHTS', '/home/MinerU/models/Layout/model_final.pth']}
[07/20 17:34:51 detectron2]: Contents of args.config_file=/home/appadmin/.conda/envs/MinerU/lib/python3.10/site-packages/magic_pdf/resources/model_config/layoutlmv3/layoutlmv3_base_inference.yaml:
AUG:
DETR: true
CACHE_DIR: /mnt/localdata/users/yupanhuang/cache/huggingface
CUDNN_BENCHMARK: false
DATALOADER:
ASPECT_RATIO_GROUPING: true
FILTER_EMPTY_ANNOTATIONS: false
NUM_WORKERS: 4
REPEAT_THRESHOLD: 0.0
SAMPLER_TRAIN: TrainingSampler
DATASETS:
PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
PROPOSAL_FILES_TEST: []
PROPOSAL_FILES_TRAIN: []
TEST:

  • scihub_train
    TRAIN:
  • scihub_train
    GLOBAL:
    HACK: 1.0
    ICDAR_DATA_DIR_TEST: ''
    ICDAR_DATA_DIR_TRAIN: ''
    INPUT:
    CROP:
    ENABLED: true
    SIZE:
    • 384
    • 600
      TYPE: absolute_range
      FORMAT: RGB
      MASK_FORMAT: polygon
      MAX_SIZE_TEST: 1333
      MAX_SIZE_TRAIN: 1333
      MIN_SIZE_TEST: 800
      MIN_SIZE_TRAIN:
  • 480
  • 512
  • 544
  • 576
  • 608
  • 640
  • 672
  • 704
  • 736
  • 768
  • 800
    MIN_SIZE_TRAIN_SAMPLING: choice
    RANDOM_FLIP: horizontal
    MODEL:
    ANCHOR_GENERATOR:
    ANGLES:
      • -90
      • 0
      • 90
        ASPECT_RATIOS:
      • 0.5
      • 1.0
      • 2.0
        NAME: DefaultAnchorGenerator
        OFFSET: 0.0
        SIZES:
      • 32
      • 64
      • 128
      • 256
      • 512
        BACKBONE:
        FREEZE_AT: 2
        NAME: build_vit_fpn_backbone
        CONFIG_PATH: ''
        DEVICE: cuda
        FPN:
        FUSE_TYPE: sum
        IN_FEATURES:
    • layer3
    • layer5
    • layer7
    • layer11
      NORM: ''
      OUT_CHANNELS: 256
      IMAGE_ONLY: true
      KEYPOINT_ON: false
      LOAD_PROPOSALS: false
      MASK_ON: true
      META_ARCHITECTURE: VLGeneralizedRCNN
      PANOPTIC_FPN:
      COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
      INSTANCE_LOSS_WEIGHT: 1.0
      PIXEL_MEAN:
  • 127.5
  • 127.5
  • 127.5
    PIXEL_STD:
  • 127.5
  • 127.5
  • 127.5
    PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
    RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    • false
    • false
    • false
    • false
      DEPTH: 50
      NORM: FrozenBN
      NUM_GROUPS: 1
      OUT_FEATURES:
    • res4
      RES2_OUT_CHANNELS: 256
      RES5_DILATION: 1
      STEM_OUT_CHANNELS: 64
      STRIDE_IN_1X1: true
      WIDTH_PER_GROUP: 64
      RETINANET:
      BBOX_REG_LOSS_TYPE: smooth_l1
      BBOX_REG_WEIGHTS:
    • 1.0
    • 1.0
    • 1.0
    • 1.0
      FOCAL_LOSS_ALPHA: 0.25
      FOCAL_LOSS_GAMMA: 2.0
      IN_FEATURES:
    • p3
    • p4
    • p5
    • p6
    • p7
      IOU_LABELS:
    • 0
    • -1
    • 1
      IOU_THRESHOLDS:
    • 0.4
    • 0.5
      NMS_THRESH_TEST: 0.5
      NORM: ''
      NUM_CLASSES: 10
      NUM_CONVS: 4
      PRIOR_PROB: 0.01
      SCORE_THRESH_TEST: 0.05
      SMOOTH_L1_LOSS_BETA: 0.1
      TOPK_CANDIDATES_TEST: 1000
      ROI_BOX_CASCADE_HEAD:
      BBOX_REG_WEIGHTS:
      • 10.0
      • 10.0
      • 5.0
      • 5.0
      • 20.0
      • 20.0
      • 10.0
      • 10.0
      • 30.0
      • 30.0
      • 15.0
      • 15.0
        IOUS:
    • 0.5
    • 0.6
    • 0.7
      ROI_BOX_HEAD:
      BBOX_REG_LOSS_TYPE: smooth_l1
      BBOX_REG_LOSS_WEIGHT: 1.0
      BBOX_REG_WEIGHTS:
    • 10.0
    • 10.0
    • 5.0
    • 5.0
      CLS_AGNOSTIC_BBOX_REG: true
      CONV_DIM: 256
      FC_DIM: 1024
      NAME: FastRCNNConvFCHead
      NORM: ''
      NUM_CONV: 0
      NUM_FC: 2
      POOLER_RESOLUTION: 7
      POOLER_SAMPLING_RATIO: 0
      POOLER_TYPE: ROIAlignV2
      SMOOTH_L1_BETA: 0.0
      TRAIN_ON_PRED_BOXES: false
      ROI_HEADS:
      BATCH_SIZE_PER_IMAGE: 512
      IN_FEATURES:
    • p2
    • p3
    • p4
    • p5
      IOU_LABELS:
    • 0
    • 1
      IOU_THRESHOLDS:
    • 0.5
      NAME: CascadeROIHeads
      NMS_THRESH_TEST: 0.5
      NUM_CLASSES: 10
      POSITIVE_FRACTION: 0.25
      PROPOSAL_APPEND_GT: true
      SCORE_THRESH_TEST: 0.05
      ROI_KEYPOINT_HEAD:
      CONV_DIMS:
    • 512
    • 512
    • 512
    • 512
    • 512
    • 512
    • 512
    • 512
      LOSS_WEIGHT: 1.0
      MIN_KEYPOINTS_PER_IMAGE: 1
      NAME: KRCNNConvDeconvUpsampleHead
      NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
      NUM_KEYPOINTS: 17
      POOLER_RESOLUTION: 14
      POOLER_SAMPLING_RATIO: 0
      POOLER_TYPE: ROIAlignV2
      ROI_MASK_HEAD:
      CLS_AGNOSTIC_MASK: false
      CONV_DIM: 256
      NAME: MaskRCNNConvUpsampleHead
      NORM: ''
      NUM_CONV: 4
      POOLER_RESOLUTION: 14
      POOLER_SAMPLING_RATIO: 0
      POOLER_TYPE: ROIAlignV2
      RPN:
      BATCH_SIZE_PER_IMAGE: 256
      BBOX_REG_LOSS_TYPE: smooth_l1
      BBOX_REG_LOSS_WEIGHT: 1.0
      BBOX_REG_WEIGHTS:
    • 1.0
    • 1.0
    • 1.0
    • 1.0
      BOUNDARY_THRESH: -1
      CONV_DIMS:
    • -1
      HEAD_NAME: StandardRPNHead
      IN_FEATURES:
    • p2
    • p3
    • p4
    • p5
    • p6
      IOU_LABELS:
    • 0
    • -1
    • 1
      IOU_THRESHOLDS:
    • 0.3
    • 0.7
      LOSS_WEIGHT: 1.0
      NMS_THRESH: 0.7
      POSITIVE_FRACTION: 0.5
      POST_NMS_TOPK_TEST: 1000
      POST_NMS_TOPK_TRAIN: 2000
      PRE_NMS_TOPK_TEST: 1000
      PRE_NMS_TOPK_TRAIN: 2000
      SMOOTH_L1_BETA: 0.0
      SEM_SEG_HEAD:
      COMMON_STRIDE: 4
      CONVS_DIM: 128
      IGNORE_VALUE: 255
      IN_FEATURES:
    • p2
    • p3
    • p4
    • p5
      LOSS_WEIGHT: 1.0
      NAME: SemSegFPNHead
      NORM: GN
      NUM_CLASSES: 10
      VIT:
      DROP_PATH: 0.1
      IMG_SIZE:
    • 224
    • 224
      NAME: layoutlmv3_base
      OUT_FEATURES:
    • layer3
    • layer5
    • layer7
    • layer11
      POS_TYPE: abs
      WEIGHTS:
      OUTPUT_DIR:
      SCIHUB_DATA_DIR_TRAIN: /mnt/petrelfs/share_data/zhaozhiyuan/publaynet/layout_scihub/train
      SEED: 42
      SOLVER:
      AMP:
      ENABLED: true
      BACKBONE_MULTIPLIER: 1.0
      BASE_LR: 0.0002
      BIAS_LR_FACTOR: 1.0
      CHECKPOINT_PERIOD: 2000
      CLIP_GRADIENTS:
      CLIP_TYPE: full_model
      CLIP_VALUE: 1.0
      ENABLED: true
      NORM_TYPE: 2.0
      GAMMA: 0.1
      GRADIENT_ACCUMULATION_STEPS: 1
      IMS_PER_BATCH: 32
      LR_SCHEDULER_NAME: WarmupCosineLR
      MAX_ITER: 20000
      MOMENTUM: 0.9
      NESTEROV: false
      OPTIMIZER: ADAMW
      REFERENCE_WORLD_SIZE: 0
      STEPS:
  • 10000
    WARMUP_FACTOR: 0.01
    WARMUP_ITERS: 333
    WARMUP_METHOD: linear
    WEIGHT_DECAY: 0.05
    WEIGHT_DECAY_BIAS: null
    WEIGHT_DECAY_NORM: 0.0
    TEST:
    AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    • 400
    • 500
    • 600
    • 700
    • 800
    • 900
    • 1000
    • 1100
    • 1200
      DETECTIONS_PER_IMAGE: 100
      EVAL_PERIOD: 1000
      EXPECTED_RESULTS: []
      KEYPOINT_OKS_SIGMAS: []
      PRECISE_BN:
      ENABLED: false
      NUM_ITER: 200
      VERSION: 2
      VIS_PERIOD: 0

[07/20 17:34:52 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /home/MinerU/models/Layout/model_final.pth ...
[07/20 17:34:52 fvcore.common.checkpoint]: [Checkpointer] Loading from /home/MinerU/models/Layout/model_final.pth ...
2024-07-20 17:34:53.060 | INFO | magic_pdf.model.pdf_extract_kit:init:124 - DocAnalysis init done!
2024-07-20 17:34:53.060 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:74 - model init cost: 27.814802408218384
2024-07-20 17:34:55.182 | INFO | magic_pdf.model.pdf_extract_kit:call:135 - layout detection cost: 1.94

0: 1888x1472 2 embeddings, 86.6ms
Speed: 17.5ms preprocess, 86.6ms inference, 175.7ms postprocess per image at shape (1, 3, 1888, 1472)
2024-07-20 17:34:57.214 | INFO | magic_pdf.model.pdf_extract_kit:call:164 - formula nums: 2, mfr time: 0.8
2024-07-20 17:34:59.296 | INFO | magic_pdf.model.pdf_extract_kit:call:135 - layout detection cost: 2.08

0: 1888x1472 24 embeddings, 3 isolateds, 31.1ms
Speed: 14.2ms preprocess, 31.1ms inference, 1.1ms postprocess per image at shape (1, 3, 1888, 1472)
2024-07-20 17:35:04.152 | INFO | magic_pdf.model.pdf_extract_kit:call:164 - formula nums: 27, mfr time: 4.75
2024-07-20 17:35:06.137 | INFO | magic_pdf.model.pdf_extract_kit:call:135 - layout detection cost: 1.98

0: 1888x1472 28 embeddings, 6 isolateds, 31.0ms
Speed: 13.3ms preprocess, 31.0ms inference, 1.5ms postprocess per image at shape (1, 3, 1888, 1472)
2024-07-20 17:35:13.460 | INFO | magic_pdf.model.pdf_extract_kit:call:164 - formula nums: 34, mfr time: 7.21
2024-07-20 17:35:15.302 | INFO | magic_pdf.model.pdf_extract_kit:call:135 - layout detection cost: 1.84

0: 1888x1472 26 embeddings, 31.1ms
Speed: 12.3ms preprocess, 31.1ms inference, 0.9ms postprocess per image at shape (1, 3, 1888, 1472)
2024-07-20 17:35:16.665 | INFO | magic_pdf.model.pdf_extract_kit:call:164 - formula nums: 26, mfr time: 1.27
2024-07-20 17:35:18.519 | INFO | magic_pdf.model.pdf_extract_kit:call:135 - layout detection cost: 1.85

0: 1888x1472 8 embeddings, 31.1ms
Speed: 14.6ms preprocess, 31.1ms inference, 1.2ms postprocess per image at shape (1, 3, 1888, 1472)
2024-07-20 17:35:19.008 | INFO | magic_pdf.model.pdf_extract_kit:call:164 - formula nums: 8, mfr time: 0.42
2024-07-20 17:35:20.742 | INFO | magic_pdf.model.pdf_extract_kit:call:135 - layout detection cost: 1.73

0: 1888x1472 (no detections), 31.0ms
Speed: 15.1ms preprocess, 31.0ms inference, 0.5ms postprocess per image at shape (1, 3, 1888, 1472)
2024-07-20 17:35:20.790 | INFO | magic_pdf.model.pdf_extract_kit:call:164 - formula nums: 0, mfr time: 0.0
2024-07-20 17:35:20.790 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:92 - doc analyze cost: 27.54484724998474
2024-07-20 17:35:20.885 | INFO | magic_pdf.pdf_parse_union_core:pdf_parse_union:219 - page_id: 0, last_page_cost_time: 0.0
2024-07-20 17:35:20.935 | INFO | magic_pdf.pdf_parse_union_core:pdf_parse_union:219 - page_id: 1, last_page_cost_time: 0.05
2024-07-20 17:35:20.999 | INFO | magic_pdf.pdf_parse_union_core:pdf_parse_union:219 - page_id: 2, last_page_cost_time: 0.06
2024-07-20 17:35:21.087 | INFO | magic_pdf.pdf_parse_union_core:pdf_parse_union:219 - page_id: 3, last_page_cost_time: 0.09
2024-07-20 17:35:21.335 | INFO | magic_pdf.pdf_parse_union_core:pdf_parse_union:219 - page_id: 4, last_page_cost_time: 0.25
2024-07-20 17:35:21.467 | INFO | magic_pdf.pdf_parse_union_core:pdf_parse_union:219 - page_id: 5, last_page_cost_time: 0.13
2024-07-20 17:35:21.721 | INFO | magic_pdf.para.para_split_v2:__detect_list_lines:140 - 发现了列表,列表行数:[(24, 43)], [[24, 27, 30, 34, 39]]
2024-07-20 17:35:21.721 | INFO | magic_pdf.para.para_split_v2:__detect_list_lines:153 - 列表行的第24到第43行是列表
2024-07-20 17:35:21.724 | INFO | magic_pdf.para.para_split_v2:__detect_list_lines:140 - 发现了列表,列表行数:[(0, 59)], [[0, 4, 8, 12, 16, 21, 24, 27, 30, 34, 38, 42, 46, 49, 54, 57]]
2024-07-20 17:35:21.724 | INFO | magic_pdf.para.para_split_v2:__detect_list_lines:153 - 列表行的第0到第59行是列表
2024-07-20 17:35:21.724 | INFO | magic_pdf.para.para_split_v2:para_split:764 - 连接了第0页和第1页的段落
2024-07-20 17:35:21.725 | INFO | magic_pdf.para.para_split_v2:para_split:764 - 连接了第1页和第2页的段落
2024-07-20 17:35:21.725 | INFO | magic_pdf.para.para_split_v2:para_split:764 - 连接了第2页和第3页的段落
2024-07-20 17:35:21.725 | INFO | magic_pdf.para.para_split_v2:para_split:764 - 连接了第3页和第4页的段落
2024-07-20 17:35:22.337 | INFO | magic_pdf.pipe.UNIPipe:pipe_mk_markdown:48 - uni_pipe mk mm_markdown finished
2024-07-20 17:35:22.410 | INFO | magic_pdf.pipe.UNIPipe:pipe_mk_uni_format:43 - uni_pipe mk content list finished

myhloli (Collaborator) commented Jul 20, 2024

@SweeneyW torch has an update check; without a network connection it reports this error.

@SweeneyW

> @SweeneyW torch has an update check; without a network connection it reports this error.

So this error can simply be ignored, right?

myhloli (Collaborator) commented Jul 22, 2024

> > @SweeneyW torch has an update check; without a network connection it reports this error.
>
> So this error can simply be ignored, right?

Yes, it can.
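
For anyone who would rather silence the message than ignore it: the traceback earlier in the thread ends in albumentations/check_version.py (fetch_version_info), so the version check appears to come from albumentations. A minimal sketch follows, assuming the installed albumentations release honors the NO_ALBUMENTATIONS_UPDATE environment variable (newer releases document it, older ones may not) and that the variable is set before the library is first imported. `disable_albumentations_update_check` is a hypothetical helper name.

```python
import os


def disable_albumentations_update_check() -> None:
    """Ask albumentations to skip its online version check (assumed env var)."""
    # Assumption: the installed albumentations reads this variable at import time.
    os.environ["NO_ALBUMENTATIONS_UPDATE"] = "1"


# Call this before anything imports albumentations (directly or via magic_pdf).
disable_albumentations_update_check()
```

The same effect can be had by exporting the variable in the shell before launching magic-pdf, which avoids touching any Python code.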

bridgeW commented Aug 4, 2024

How can I fix this error?

2024-08-04 22:38:22.072 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 0, text_len: 49168, cid_chars_radio: 0.0
2024-08-04 22:38:22.080 | ERROR | magic_pdf.user_api:parse_pdf:85 - string indices must be integers
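
The log line does not show where inside parse_pdf the failure happens, so the following is only a generic illustration of what this Python message means, not a diagnosis of MinerU. "string indices must be integers" is the TypeError raised when a str is indexed with a string key, which typically happens when code expects a dict but receives unparsed text such as raw JSON. The `raw` value and the `"pages"` key are made up for the example.

```python
import json

# Hypothetical payload: JSON text that was never parsed into a dict.
raw = '{"pages": []}'

try:
    raw["pages"]  # indexing a str with a str key
except TypeError as exc:
    message = str(exc)  # begins with "string indices must be integers"

# Parsing first yields a dict, and the same lookup then succeeds.
parsed = json.loads(raw)
pages = parsed["pages"]
```

Checking the type of the object being indexed at the failing line (for example with a full traceback) is usually the quickest way to find which input was passed as text.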
