improve error handling and faq.md #2976

Open. Wants to merge 15 commits into base: main
6 changes: 5 additions & 1 deletion docs/en/faq.md
@@ -10,7 +10,7 @@ Feel free to enrich the list if you find any frequent issues and have ways to he
The registry mechanism will be triggered only when the file of the module is imported.
So you need to import that file somewhere. More details can be found at [KeyError: "MaskRCNN: 'RefineRoIHead is not in the models registry'"](https://github.com/open-mmlab/mmdetection/issues/5974).
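
A minimal sketch of why the import matters, assuming a hand-rolled `Registry` class (hypothetical, not mmcv's actual implementation). A class only enters the registry when its decorator executes, and that happens when the defining file is imported. `import_head_module` stands in for that import.

```python
class Registry:
    """Hypothetical stand-in for a name-to-class registry."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        def decorator(cls):
            # Registration happens only when this decorator executes.
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def get(self, key):
        if key not in self._modules:
            raise KeyError(f"'{key} is not in the {self.name} registry'")
        return self._modules[key]


MODELS = Registry('models')


def import_head_module():
    """Stands in for importing the file that defines the class."""

    @MODELS.register_module()
    class RefineRoIHead:
        pass


def is_registered(key):
    try:
        MODELS.get(key)
        return True
    except KeyError:
        return False


before = is_registered('RefineRoIHead')  # defining file not imported yet
import_head_module()
after = is_registered('RefineRoIHead')   # decorator has now run
print(before, after)  # False True
```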

- "No module named 'mmcv.ops'"; "No module named 'mmcv.\_ext'"; "ImportError: DLL load failed while importing \_ext"

1. Uninstall existing mmcv in the environment using `pip uninstall mmcv`
2. Install mmcv-full following the [installation instruction](https://mmcv.readthedocs.io/en/latest/get_started/installation.html) or [Build MMCV from source](https://mmcv.readthedocs.io/en/latest/get_started/build.html)
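
Before reinstalling, it can help to confirm which piece is missing. The sketch below uses only the standard library; `module_available` and `diagnose` are hypothetical helper names. A positive result only means the extension file can be located, so it may still fail to load (for example with the DLL error above).

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located.

    Parent packages are imported during the lookup; the module itself is not.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ModuleNotFoundError is raised when a parent package is missing.
        return False


def diagnose():
    if not module_available('mmcv'):
        return 'mmcv is not installed'
    if not module_available('mmcv._ext'):
        return 'mmcv is installed, but the compiled extension mmcv._ext is missing'
    return 'mmcv._ext can be located'


print(diagnose())
```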
@@ -91,3 +91,7 @@ Feel free to enrich the list if you find any frequent issues and have ways to he
- "RuntimeError: Trying to backward through the graph a second time"

`GradientCumulativeOptimizerHook` and `OptimizerHook` are both set, which causes `loss.backward()` to be called twice, so a `RuntimeError` is raised. Use only one of them. More details at [Trying to backward through the graph a second time](https://github.com/open-mmlab/mmcv/issues/1379).
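
As an illustration, assuming an mmcv 1.x style config in which `optimizer_config` defaults to `OptimizerHook` when no `type` is given, the conflict can arise when the cumulative hook is added separately through `custom_hooks`. `count_optimizer_hooks` is a hypothetical checker and `cumulative_iters=4` is a placeholder value.

```python
OPTIMIZER_HOOKS = {'OptimizerHook', 'GradientCumulativeOptimizerHook'}

# Problematic: two hooks that each call loss.backward() in one iteration.
conflicting = dict(
    optimizer_config=dict(grad_clip=None),  # assumed to build OptimizerHook
    custom_hooks=[
        dict(type='GradientCumulativeOptimizerHook', cumulative_iters=4),
    ],
)

# One possible fix: select the cumulative hook through optimizer_config only.
fixed = dict(
    optimizer_config=dict(
        type='GradientCumulativeOptimizerHook', cumulative_iters=4),
)


def count_optimizer_hooks(cfg):
    """Count the hooks in `cfg` that would run a backward pass."""
    count = 1 if cfg.get('optimizer_config') is not None else 0
    count += sum(
        hook.get('type') in OPTIMIZER_HOOKS
        for hook in cfg.get('custom_hooks', []))
    return count


print(count_optimizer_hooks(conflicting))  # 2
print(count_optimizer_hooks(fixed))        # 1
```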

- "RuntimeError: xxx: implementation for device cuda:0 not found."

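This error usually indicates that mmcv was installed without CUDA op support. Uninstall mmcv and reinstall it from a prebuilt index, for example `pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html` (this index targets CUDA 12.1 and PyTorch 2.1; pick the one that matches your environment).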
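
A small diagnostic sketch for this case, assuming `torch` and `mmcv` may or may not be importable. It collects the versions that need to agree. `describe_build` is a hypothetical helper, while `get_compiling_cuda_version` and `get_compiler_version` are imported from `mmcv.ops`.

```python
def describe_build():
    """Collect version facts relevant to a missing CUDA op implementation."""
    info = {}
    try:
        import torch
        info['torch'] = torch.__version__
        info['torch_cuda'] = torch.version.cuda  # None for CPU-only builds
        info['cuda_available'] = torch.cuda.is_available()
    except Exception as exc:
        info['torch_error'] = repr(exc)
    try:
        from mmcv.ops import get_compiler_version, get_compiling_cuda_version
        info['mmcv_cuda'] = get_compiling_cuda_version()
        info['mmcv_compiler'] = get_compiler_version()
    except Exception as exc:
        info['mmcv_ops_error'] = repr(exc)
    return info


for key, value in describe_build().items():
    print(f'{key}: {value}')
```

If the reported mmcv CUDA version is unavailable or differs from the one PyTorch was built with, reinstalling from a matching index is the likely remedy.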
6 changes: 5 additions & 1 deletion docs/zh_cn/faq.md
@@ -9,7 +9,7 @@

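The registry mechanism is triggered only when the file containing the module is imported, so you need to import that file somewhere. For more details, see [KeyError: "MaskRCNN: 'RefineRoIHead is not in the models registry'"](https://github.com/open-mmlab/mmdetection/issues/5974).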

- "No module named 'mmcv.ops'"; "No module named 'mmcv.\_ext'"; "ImportError: DLL load failed while importing \_ext"

1. Uninstall the mmcv in your environment using `pip uninstall mmcv`
2. Install mmcv-full following the [installation instruction](https://mmcv.readthedocs.io/en/latest/get_started/installation.html) or [Build MMCV from source](https://mmcv.readthedocs.io/en/latest/get_started/build.html)
@@ -89,3 +89,7 @@
- "RuntimeError: Trying to backward through the graph a second time"

`GradientCumulativeOptimizerHook` and `OptimizerHook` must not both be set. Doing so causes `loss.backward()` to be called twice, so the program raises a `RuntimeError`. Set only one of them. More details at [Trying to backward through the graph a second time](https://github.com/open-mmlab/mmcv/issues/1379).

- "RuntimeError: xxx: implementation for device cuda:0 not found."

This error may occur because mmcv was installed without CUDA op support. You can uninstall mmcv and install it with the following command: `pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html`.
4 changes: 3 additions & 1 deletion mmcv/ops/csrc/common/pytorch_device_registry.hpp
@@ -119,7 +119,9 @@ auto Dispatch(const R& registry, const char* name, Args&&... args) {
              " vs ", GetDeviceStr(device).c_str(), "\n")
  auto f_ptr = registry.Find(device.type());
  TORCH_CHECK(f_ptr != nullptr, name, ": implementation for device ",
              GetDeviceStr(device).c_str(), " not found.\n",
              "For more information, see ",
              "https://github.com/open-mmlab/mmcv/blob/main/docs/en/faq.md \n")
  return f_ptr(std::forward<Args>(args)...);
}
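
The lookup-then-check pattern in this hunk can be sketched in Python as a rough analogue. `DeviceRegistry`, `dispatch`, and the `add_one` op are hypothetical names chosen for illustration and are not part of mmcv.

```python
FAQ_URL = 'https://github.com/open-mmlab/mmcv/blob/main/docs/en/faq.md'


class DeviceRegistry:
    """Hypothetical Python analogue of a per-device function table."""

    def __init__(self):
        self._impls = {}

    def register(self, device_type, func):
        self._impls[device_type] = func

    def find(self, device_type):
        # Returns None when no implementation was registered for the device.
        return self._impls.get(device_type)


def dispatch(registry, name, device, *args):
    device_type = device.split(':')[0]  # 'cuda:0' -> 'cuda'
    impl = registry.find(device_type)
    if impl is None:
        raise RuntimeError(
            f'{name}: implementation for device {device} not found.\n'
            f'For more information, see {FAQ_URL}')
    return impl(*args)


registry = DeviceRegistry()
registry.register('cpu', lambda xs: [x + 1 for x in xs])

print(dispatch(registry, 'add_one', 'cpu', [1, 2]))  # [2, 3]
try:
    dispatch(registry, 'add_one', 'cuda:0', [1, 2])
except RuntimeError as err:
    print(err)
```

A build without CUDA ops corresponds to the case where nothing was ever registered under `'cuda'`, which is why the message points to the installation FAQ.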

69 changes: 68 additions & 1 deletion mmcv/utils/ext_loader.py
@@ -8,9 +8,76 @@
import torch

if torch.__version__ != 'parrots':
    """Three subclasses of ImportError are defined in order to help users solve
    the following errors.

    1. DLL load failed while importing _ext
    https://github.com/open-mmlab/mmcv/issues/2937

    2. undefined symbol
    https://github.com/open-mmlab/mmcv/issues/2904

    3. No module named 'mmcv._ext'
    https://github.com/open-mmlab/mmcv/issues/2929
    """

    class ExtImportError(ImportError):

        def __init__(self, arg):
            print(arg)
            print(
                'mmcv is installed incorrectly.',
                '1. Uninstall existing mmcv',
                '2. Install mmcv-full',
                'For more information, see',
                'https://github.com/open-mmlab/mmcv/blob/main/docs/en/faq.md',
                sep='\n')

    class UndefinedSymbolError(ImportError):

        def __init__(self, arg):
            print(arg)
            print(
                '1. For CUDA/C++ symbols, '
                'check whether the CUDA/GCC runtimes are the same '
                'as those used for compiling mmcv. ',
                '2. For Pytorch symbols, '
                'check whether the Pytorch version is the same '
                'as that used for compiling mmcv.',
                'For more information, see '
                'https://github.com/open-mmlab/mmcv/blob/main/docs/en/faq.md',
                sep='\n')

    class ExtNotFoundError(ModuleNotFoundError):

        def __init__(self, arg):
            print(arg)
            print(
                'mmcv is installed incorrectly.',
                '1. Uninstall existing mmcv',
                '2. Install mmcv-full',
                'For more information, see',
                'https://github.com/open-mmlab/mmcv/blob/main/docs/en/faq.md',
                sep='\n')

    def load_ext(name, funcs):
        try:
            ext = importlib.import_module('mmcv.' + name)
        except Exception as e:
            exception_inf = str(e)

            message_error = 'DLL load failed while importing _ext'
            if message_error in exception_inf:
                raise ExtImportError(exception_inf)

            message_error = 'undefined symbol'
            if message_error in exception_inf:
                raise UndefinedSymbolError(exception_inf)

            message_error = "No module named 'mmcv._ext'"
            if message_error in exception_inf:
                raise ExtNotFoundError(exception_inf)

            # Re-raise anything unrecognized so that `ext` is never left
            # unbound below.
            raise

        for fun in funcs:
            assert hasattr(ext, fun), f'{fun} miss in module {name}'
        return ext
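
An alternative shape for the same idea, offered as a sketch rather than as the PR's code: the hint travels in the exception message instead of being printed from `__init__`, and failures that match no known message are re-raised unchanged. `ExtLoadError`, `HINTS`, and the `importer` parameter are hypothetical names, and `importer` exists only so the behaviour can be exercised without mmcv installed.

```python
import importlib

FAQ_URL = 'https://github.com/open-mmlab/mmcv/blob/main/docs/en/faq.md'

REINSTALL = 'mmcv is installed incorrectly: uninstall it, then install mmcv-full.'
HINTS = (
    ('DLL load failed while importing _ext', REINSTALL),
    ('undefined symbol',
     'Check that the CUDA/GCC runtimes and the PyTorch version match '
     'those used for compiling mmcv.'),
    ("No module named 'mmcv._ext'", REINSTALL),
)


class ExtLoadError(ImportError):
    """Hypothetical error type that carries the hint in its message."""


def load_ext(name, funcs, importer=importlib.import_module):
    try:
        ext = importer('mmcv.' + name)
    except ImportError as exc:
        text = str(exc)
        for needle, hint in HINTS:
            if needle in text:
                raise ExtLoadError(f'{text}\n{hint}\nSee {FAQ_URL}') from exc
        raise  # unrecognized failure: keep the original error
    for fun in funcs:
        assert hasattr(ext, fun), f'{fun} miss in module {name}'
    return ext


def broken_importer(module_name):
    # Simulates a binary built against a different PyTorch version.
    raise ImportError('_ext.so: undefined symbol: some_symbol')


try:
    load_ext('_ext', ['nms'], importer=broken_importer)
except ExtLoadError as err:
    print(err)
```

Chaining with `from exc` keeps the original traceback available, and because `ExtLoadError` subclasses `ImportError`, callers that already catch `ImportError` keep working.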