paddle.utils.run_check() raises an error #63872

Closed
wikithink opened this issue Apr 25, 2024 · 0 comments

Issue Description

Environment:
Docker 24.0
Ubuntu 20.04
4 x 3060 GPUs
CUDA 11.8
cuDNN 8.9
Python 3.8.19
paddle_gpu_2.6.1
NCCL 2.16.5

After installation, I ran the following in a terminal:
python -c "import paddle; paddle.utils.run_check()"
It succeeded with this output:
PaddlePaddle works well on 4 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

In a Jupyter cell I entered:
import paddle
paddle.utils.run_check()
It also succeeded with this output:
PaddlePaddle works well on 4 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

However, when I put the same two lines into a file, test.py, with nothing else in it,
and ran python test.py in the same terminal, I got the following error:

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

C++ Traceback (most recent call last):

No stack trace in paddle, may be caused by external reasons.


Error Message Summary:

FatalError: Termination signal is detected by the operating system.
[TimeInfo: *** Aborted at 1714037430 (unix time) try "date -d @1714037430" if you are using GNU date ***]
[SignalInfo: *** SIGTERM (@0xacdb) received by PID 44332 (TID 0x7f0920465740) from PID 44251 ***]

WARNING:root:PaddlePaddle meets some problem with 4 GPUs. This may be caused by:

  1. There is not enough GPUs visible on your system
  2. Some GPUs are occupied by other process now
  3. NVIDIA-NCCL2 is not installed correctly on your system. Please follow instruction on https://github.com/NVIDIA/nccl-tests
    to test your NCCL, or reinstall it following https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html
WARNING:root:
Original Error is: Process 1 terminated with exit code 1.
PaddlePaddle is installed successfully ONLY for single GPU! Let's start deep learning with PaddlePaddle now.
Traceback (most recent call last):
  File "/root/paddle/root/dsti/paddle_env_test.py", line 5, in
    paddle.utils.run_check()
  File "/usr/local/lib/python3.8/site-packages/paddle/utils/install_check.py", line 302, in run_check
    raise e
  File "/usr/local/lib/python3.8/site-packages/paddle/utils/install_check.py", line 283, in run_check
    _run_parallel(device_list)
  File "/usr/local/lib/python3.8/site-packages/paddle/utils/install_check.py", line 210, in _run_parallel
    paddle.distributed.spawn(train_for_run_parallel, nprocs=len(device_list))
  File "/usr/local/lib/python3.8/site-packages/paddle/distributed/spawn.py", line 614, in spawn
    while not context.join():
  File "/usr/local/lib/python3.8/site-packages/paddle/distributed/spawn.py", line 423, in join
    self._throw_exception(error_index)
  File "/usr/local/lib/python3.8/site-packages/paddle/distributed/spawn.py", line 435, in _throw_exception
    raise Exception(
Exception: Process 1 terminated with exit code 1.
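
The RuntimeError text above suggests a likely cause that has nothing to do with NCCL. The traceback shows run_check() calling paddle.distributed.spawn, which starts one worker process per GPU. If those workers are created with Python's "spawn" start method (an assumption, not confirmed in this thread), each child re-imports the main script. With an unguarded test.py, that re-import would call run_check() again inside the child before it has finished bootstrapping. python -c and a Jupyter cell have no user script file for the child to re-run, which would explain why they succeed. Below is a minimal standard-library sketch of the guarded layout; train_stub is a hypothetical stand-in for the per-GPU work, not a Paddle function.

```python
# test.py: sketch of a script that is safe to run under the "spawn" start method.
import multiprocessing as mp


def train_stub(rank):
    """Hypothetical stand-in for the per-GPU check that run_check() launches."""
    return rank


def main():
    # In the script from this issue, this body would instead be:
    #     import paddle
    #     paddle.utils.run_check()
    # The "spawn" context is assumed here to mirror paddle.distributed.spawn.
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=train_stub, args=(rank,)) for rank in range(2)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    # Exit code 0 means the child imported this module and ran cleanly.
    return [proc.exitcode for proc in procs]


# Without this guard, each spawned child would re-run the process-starting
# code while importing the main module and raise the RuntimeError shown above.
if __name__ == "__main__":
    print(main())
```

Applied to the script in this issue, the same layout would keep `import paddle` at module level and move the `paddle.utils.run_check()` call under the `if __name__ == "__main__":` guard.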

What I have tried:
1. Reinstalled NCCL and symlinked libnccl.so:
ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2.16.5 /usr/lib64/libnccl.so
ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2.16.5 /usr/local/bin/libnccl.so
2. Checked the environment variables:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/bin:/usr/lib64
3. Asked ChatGPT, which did not solve the problem.
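
To confirm whether the symlinks and LD_LIBRARY_PATH from steps 1 and 2 actually make NCCL loadable, one option is to load the library from Python and ask it for its version. This is a sketch under assumptions: nccl_version is a hypothetical helper, it relies on the standard ctypes lookup rather than on whatever mechanism Paddle itself uses to locate NCCL, and it calls the NCCL C function ncclGetVersion(int*).

```python
import ctypes
import ctypes.util


def nccl_version(lib_name="nccl"):
    """Return NCCL's packed integer version code, or None if it cannot be loaded.

    Hypothetical helper: it checks the system loader's view of the library,
    which may differ from the copy Paddle ends up using.
    """
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None  # the dynamic loader cannot find lib<name>.so
    try:
        lib = ctypes.CDLL(path)
        get_version = lib.ncclGetVersion
    except (OSError, AttributeError):
        return None  # library failed to load or lacks the expected symbol
    get_version.argtypes = [ctypes.POINTER(ctypes.c_int)]
    get_version.restype = ctypes.c_int
    version = ctypes.c_int(0)
    status = get_version(ctypes.byref(version))
    # ncclSuccess is 0; any other status is treated as a failure here.
    return version.value if status == 0 else None


if __name__ == "__main__":
    print("NCCL version code:", nccl_version())
```

If this prints a version code, the loader can already find NCCL, which makes the library path an unlikely cause of the failure in test.py.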

None of this helped and the error persists. Any help would be appreciated, thanks!

Version & Environment Information

Environment:
Docker 24.0
Ubuntu 20.04
4 x 3060 GPUs
CUDA 11.8
cuDNN 8.9
Python 3.8.19
paddle_gpu_2.6.1
NCCL 2.16.5

@wikithink wikithink added status/new-issue 新建 type/build 编译/安装问题 labels Apr 25, 2024
@paddle-bot paddle-bot bot added status/close 已关闭 and removed status/new-issue 新建 labels Apr 25, 2024