
BUG Issue #28

Closed
zhangheng408 opened this issue Mar 26, 2020 · 6 comments
Labels
status: needs information (Further information is requested)

Comments

zhangheng408 commented Mar 26, 2020

Environment

1. OS: CentOS 7.4
2. MegEngine version: 0.3.1
3. Python version: 3.6

Steps to reproduce

  1. cd official/vision/classification/resnet
  2. Run the resnet18 or resnet50 model:
    python3 -u train.py --data /mnt/lustre/share/images --arch resnet18 --batch-size 32 --learning-rate 0.0125 --ngpus 8 --save .
  3. The following error occurs: ERR cudaGetDeviceCount failed: CUDA driver version is insufficient for CUDA runtime version (err 35)
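One way to tell whether err 35 is specific to MegEngine is to call cudaGetDeviceCount directly through the CUDA runtime, outside MegEngine. The sketch below does that with ctypes. It assumes the CUDA 10.1 runtime is loadable under the soname libcudart.so.10.1 (adjust LIBCUDART for your install), and it names only a few cudaError_t values. libcudart finds the driver through the normal dynamic loader rather than through MegEngine's cuda-stub. If this call succeeds while MegEngine still fails, the stub is probably picking up a different libcuda.so.

```python
import ctypes

# Assumed soname of the CUDA 10.1 runtime library; adjust to your installation.
LIBCUDART = "libcudart.so.10.1"

# A few cudaError_t values from the CUDA 10.x runtime API (not a complete list).
KNOWN_ERRORS = {
    0: "cudaSuccess",
    35: "cudaErrorInsufficientDriver",
    100: "cudaErrorNoDevice",
}


def error_name(code: int) -> str:
    """Map a cudaError_t value to its enum name, if it is one listed above."""
    return KNOWN_ERRORS.get(code, "unknown cudaError_t %d" % code)


def device_count(libcudart_path: str):
    """Call cudaGetDeviceCount directly, bypassing MegEngine and its cuda-stub."""
    lib = ctypes.CDLL(libcudart_path)  # raises OSError if the library is missing
    count = ctypes.c_int(0)
    err = lib.cudaGetDeviceCount(ctypes.byref(count))
    return err, count.value


if __name__ == "__main__":
    try:
        err, count = device_count(LIBCUDART)
    except OSError as exc:
        print("could not load %s: %s" % (LIBCUDART, exc))
    else:
        print("cudaGetDeviceCount ->", error_name(err), "devices:", count)
```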

Please provide the key code snippets to help track down the problem


Please provide the complete log and error messages

@zhangheng408 (Author)

I have already set the LIBCUDA_PATH environment variable correctly. Are there any other options or settings I need to pay attention to?

@zhangheng408 (Author)

Also, under normal circumstances, once getLastError() has been called, a second getLastError() call should not report the error again.

@MegEngine MegEngine deleted a comment Mar 27, 2020
@zhangheng408 (Author)

My environment is CUDA 10.1.168, driver 418.67, and a V100 GPU, which meets the requirements listed on the official website.

@megvii-mge (Member)

Thank you for your interest in the MegEngine project. You can use the ldd command to check which CUDA paths the shared libraries (.so files) under _internal resolve to.
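The ldd check can be scripted so that only the CUDA-related lines are shown for each library. The sketch below is a minimal helper. It assumes MegEngine is importable in the current Python environment and keeps its shared libraries in a _internal directory next to its __init__.py, as described above. It simply runs the system ldd on each .so file and is written to also run on Python 3.6.

```python
import importlib.util
import pathlib
import subprocess
from typing import List


def cuda_dependencies(ldd_output: str) -> List[str]:
    """Return the ldd output lines that mention a CUDA library."""
    return [line.strip() for line in ldd_output.splitlines() if "cuda" in line.lower()]


def internal_shared_objects() -> List[pathlib.Path]:
    """List the .so files in megengine/_internal, or [] if MegEngine is not installed."""
    spec = importlib.util.find_spec("megengine")
    if spec is None or spec.origin is None:
        return []
    internal_dir = pathlib.Path(spec.origin).parent / "_internal"
    return sorted(internal_dir.glob("*.so"))


if __name__ == "__main__":
    for so_file in internal_shared_objects():
        # stdout=PIPE and universal_newlines keep this compatible with Python 3.6
        result = subprocess.run(["ldd", str(so_file)], stdout=subprocess.PIPE,
                                universal_newlines=True)
        print(so_file.name)
        for line in cuda_dependencies(result.stdout):
            print("   ", line)
```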

@megvii-mge megvii-mge added the "status: needs information" label Mar 30, 2020
@windreamer

@zhangheng408 Since you know about the LIBCUDA_PATH environment variable, you have probably already looked at the cuda-stub code.

So that MegEngine can be packaged as a wheel conforming to the manylinux2010 standard without bundling the libcuda.so driver, MegEngine ships a cuda-stub .so that forwards the relevant calls to the system's libcuda.so.

For the details, see: https://github.com/MegEngine/MegEngine/blob/master/dnn/cuda-stub/src/libcuda.cpp

So please check whether your driver (libcuda.so) is at the location given by LIBCUDA_PATH, or in one of the standard paths that cuda-stub searches.
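The lookup described above can be modelled in a few lines of Python to see which file would win on a given machine. This is only a sketch. It assumes that LIBCUDA_PATH holds the full path to the library file and that the stub then falls back to a fixed list of locations. DEFAULT_LIBCUDA_PATHS below is a hypothetical example list, not the stub's actual one. The authoritative search order is in libcuda.cpp, and a bare library name passed to dlopen would additionally be resolved through the normal loader search (LD_LIBRARY_PATH, ld.so cache).

```python
import os
from typing import Callable, List, Mapping, Optional

# Hypothetical fallback locations for illustration only; the real list is in cuda-stub.
DEFAULT_LIBCUDA_PATHS = [
    "/usr/lib64/libcuda.so",
    "/usr/lib/x86_64-linux-gnu/libcuda.so",
    "/usr/local/nvidia/lib64/libcuda.so",
]


def candidate_paths(env: Mapping[str, str]) -> List[str]:
    """LIBCUDA_PATH first (if set and non-empty), then the fallback locations."""
    candidates = []
    override = env.get("LIBCUDA_PATH")
    if override:
        candidates.append(override)
    candidates.extend(DEFAULT_LIBCUDA_PATHS)
    return candidates


def resolve_libcuda(env: Mapping[str, str],
                    exists: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    """Return the first candidate that exists, or None if nothing is found."""
    for path in candidate_paths(env):
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print("libcuda.so would be loaded from:", resolve_libcuda(os.environ))
```

Injecting `exists` keeps the resolution logic testable without touching the filesystem. A LIBCUDA_PATH that points at a missing file silently falls through to the defaults here, which is exactly the situation where an older driver elsewhere on the system could be picked up.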

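Judging from the current return value (35, i.e. cudaErrorInsufficientDriver), cuda-stub did find a driver, but that driver is too old, so you may need to check whether your environment is configured correctly.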
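To see which CUDA version a particular libcuda.so actually supports, you can call the driver API function cuDriverGetVersion on that exact file. The minimal ctypes sketch below does this. It assumes the path comes from LIBCUDA_PATH (falling back to libcuda.so.1), compares against the CUDA 10.1 runtime reported in this issue, and uses driver_is_sufficient as a simplified reading of what error 35 means. A 418.67 driver would be expected to report 10010 (CUDA 10.1). A lower value means a different, older libcuda.so is being loaded.

```python
import ctypes
import os
from typing import Tuple

# Runtime version of the CUDA 10.1 toolkit reported in this issue (1000 * 10 + 10 * 1).
CUDA_10_1_RUNTIME = 10010


def decode_cuda_version(version: int) -> Tuple[int, int]:
    """CUDA encodes versions as 1000 * major + 10 * minor, e.g. 10010 -> (10, 1)."""
    return version // 1000, (version % 1000) // 10


def driver_version(libcuda_path: str) -> int:
    """Load one specific libcuda.so and ask it which CUDA version it supports."""
    lib = ctypes.CDLL(libcuda_path)
    version = ctypes.c_int(0)
    status = lib.cuDriverGetVersion(ctypes.byref(version))
    if status != 0:  # 0 is CUDA_SUCCESS
        raise RuntimeError("cuDriverGetVersion failed with CUresult %d" % status)
    return version.value


def driver_is_sufficient(driver: int, runtime: int) -> bool:
    """Simplified rule: error 35 means the driver is older than the runtime."""
    return driver >= runtime


if __name__ == "__main__":
    path = os.environ.get("LIBCUDA_PATH", "libcuda.so.1")
    try:
        drv = driver_version(path)
    except (OSError, AttributeError, RuntimeError) as exc:
        print("could not query %s: %s" % (path, exc))
    else:
        major, minor = decode_cuda_version(drv)
        print("%s supports CUDA %d.%d" % (path, major, minor))
        print("sufficient for the 10.1 runtime:", driver_is_sufficient(drv, CUDA_10_1_RUNTIME))
```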

We would also appreciate it if you could provide more logs to help us analyze the problem.

@xxr3376 (Member)

xxr3376 commented Sep 25, 2020

Closed due to inactivity.

@xxr3376 xxr3376 closed this as completed Sep 25, 2020

4 participants