**KAGGLE** --- mmagic error - undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationESs #2904

Open
MasterHM-ml opened this issue Aug 14, 2023 · 7 comments

MasterHM-ml commented Aug 14, 2023

          cd mmcv && git checkout 2.x

Originally posted by @uniyushu in #2660 (comment)

I'm using mmcv==2.0.1 and still facing the same issue. I installed mmcv via mim. Here is how I installed it on Kaggle:

!pip3 install -U openmim
!mim install 'mmcv>=2.0.0'
!mim install 'mmengine'

%cd /kaggle/working
!rm -rf mmagic
!git clone https://github.com/open-mmlab/mmagic.git
%cd mmagic
!pip3 install -e . -v

!python -c "import mmagic; print(mmagic.__version__)"

No error in installation.

But I get an error when calling !python3 tools/train.py "configs/edsr/edsr_x2c64b16_1xb16-300k_UCMerced.py" --auto-scale-lr
Here is the stack trace, trimmed to the last calls.

After printing the logs, it first shows some warnings:

/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.5
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/__init__.py:98: UserWarning: unable to load libtensorflow_io_plugins.so: unable to open file: libtensorflow_io_plugins.so, from paths: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so']
caused by: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so: undefined symbol: _ZN3tsl6StatusC1EN10tensorflow5error4CodeESt17basic_string_viewIcSt11char_traitsIcEENS_14SourceLocationE']
  warnings.warn(f"unable to load libtensorflow_io_plugins.so: {e}")
/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/__init__.py:104: UserWarning: file system plugins are not loaded: unable to open file: libtensorflow_io.so, from paths: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io.so']
caused by: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io.so: undefined symbol: _ZTVN10tensorflow13GcsFileSystemE']
  warnings.warn(f"file system plugins are not loaded: {e}")

and then an error:

...
...
...
 /opt/conda/lib/python3.10/site-packages/mmcv/utils/ext_loader.py:13 in       │
│ load_ext                                                                     │
│                                                                              │
│   10 if torch.__version__ != 'parrots':                                      │
│   11 │                                                                       │
│   12 │   def load_ext(name, funcs):                                          │
│ ❱ 13 │   │   ext = importlib.import_module('mmcv.' + name)                   │
│   14 │   │   for fun in funcs:                                               │
│   15 │   │   │   assert hasattr(ext, fun), f'{fun} miss in module {name}'    │
│   16 │   │   return ext                                                      │
│                                                                              │
│ /opt/conda/lib/python3.10/importlib/__init__.py:126 in import_module         │
│                                                                              │
│   123 │   │   │   if character != '.':                                       │
│   124 │   │   │   │   break                                                  │
│   125 │   │   │   level += 1                                                 │
│ ❱ 126 │   return _bootstrap._gcd_import(name[level:], package, level)        │
│   127                                                                        │
│   128                                                                        │
│   129 _RELOADING = {}    
ImportError: /opt/conda/lib/python3.10/site-packages/mmcv/_ext.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationESs

Any solution to the problem, or any clue for debugging it, would be very helpful and appreciated. Thank you.
The same code works fine in Colab.
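
The undefined symbol in the ImportError is a mangled C++ name, and demangling it can give a first clue. The sketch below is only an illustration: `demangle` is a hypothetical helper name, and it assumes the `c++filt` tool from binutils is on the PATH (it returns the input unchanged otherwise). For this symbol the output should name a `c10::Error` constructor taking a `c10::SourceLocation` and a `std::string`. The trailing `Ss` is the Itanium ABI abbreviation for the pre-C++11 `std::string`, which suggests, but does not prove, that the mmcv extension was compiled against a different PyTorch build than the one installed.

```python
import shutil
import subprocess


def demangle(symbol: str) -> str:
    """Demangle a C++ symbol with c++filt.

    Returns the input unchanged when c++filt is not installed
    (assumption: binutils may be missing in some environments).
    """
    tool = shutil.which("c++filt")
    if tool is None:
        return symbol
    result = subprocess.run(
        [tool, symbol], capture_output=True, text=True, check=False
    )
    output = result.stdout.strip()
    # Fall back to the raw symbol if the tool produced nothing.
    return output or symbol


if __name__ == "__main__":
    print(demangle("_ZN3c105ErrorC2ENS_14SourceLocationESs"))
```

If PyTorch imports cleanly, `torch.compiled_with_cxx11_abi()` reports which string ABI the installed PyTorch was built with, which can be compared against what the symbol implies.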

@zengyh1900
Collaborator

Hi @MasterHM-ml, it seems that mmcv was not installed successfully. Can you reinstall mmcv and check whether the installation succeeds? You may refer to https://mmcv.readthedocs.io/en/latest/get_started/installation.html# to install mmcv.

@MasterHM-ml
Author

Hello @zengyh1900, thanks for the update. I installed mmcv according to the official documentation guidelines. Here is the gist with the complete, detailed stack trace.

@zengyh1900
Collaborator

Hi @zhouzaida, I think the error comes from https://gist.github.com/MasterHM-ml/619dee045ce44c5184cd93cb833328b1#file-gistfile1-txt-L1120, where the code tries to import ops from mmcv. Is it caused by installing the wrong version of mmcv for this platform? Do you have any ideas?

@MasterHM-ml
Author

Any update?

zengyh1900 transferred this issue from open-mmlab/mmagic Aug 21, 2023
zengyh1900 assigned zhouzaida and unassigned zengyh1900 Aug 21, 2023
@tomarvimal

I am also facing the same issue!

@uniyushu
Contributor

Try mmagic with Docker?

Or maybe it is caused by the PyTorch 2.x version. Try 1.x:
conda install pytorch=1.10

@VadimShabashov

For those who are still struggling to install and use mmcv:
I tried the officially recommended approach (https://mmcv.readthedocs.io/zh-cn/latest/get_started/installation.html#install-with-mim-recommended) as well as the instructions from this comment (open-mmlab/mmdetection#10401 (comment)).
Neither worked for me.
However, I noticed that there is no error when running in CPU-only mode on Kaggle, so I suspected a conflict with the latest CUDA (I had CUDA 12.1 in my environment).
After I downgraded CUDA to 11.3 (by finding an old notebook with a pinned environment), everything started to work.
Here is a notebook with the pinned environment (CUDA 11.3), where no errors appear in mmcv:
https://www.kaggle.com/code/vadimshabashov/mmdetection-startup-on-kaggle?scriptVersionId=180583679
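
The suggestions in this thread all point at a mismatch between the installed PyTorch/CUDA build and the compiled mmcv extension. Before reinstalling anything, it may help to print what is actually installed. The following is a minimal sketch, not an official mmcv tool: `package_version` and `collect_environment` are hypothetical helper names. It deliberately avoids importing `mmcv.ops`, since that import is what fails here.

```python
import importlib.metadata
import importlib.util


def package_version(name: str):
    """Return the installed distribution version, or None if it is absent."""
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return None


def collect_environment() -> dict:
    """Gather the versions relevant to an mmcv / PyTorch / CUDA mismatch."""
    info = {
        "mmcv": package_version("mmcv"),
        "mmengine": package_version("mmengine"),
        "torch": package_version("torch"),
        "torch_cuda_build": None,  # CUDA version PyTorch was built with
        "cuda_available": None,    # whether a GPU is visible at runtime
    }
    # Import torch only if it is installed; torch.version.cuda is None
    # for CPU-only builds.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["torch_cuda_build"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Running this in both the failing Kaggle session and a working one (Colab, or the pinned notebook above) makes it easy to see which of the PyTorch or CUDA versions differ.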
