load_from conflicts with resume #245

Closed · serser opened this issue Nov 5, 2022 · 1 comment

serser commented Nov 5, 2022

Prerequisite

🐞 Describe the bug

I am training a COCO-pretrained YOLOv5m model (specified via load_from=xxx in the config) on a custom dataset. After training for a while, the run stopped with an exception, so I tried --resume to continue. However, it apparently loads the load_from checkpoint instead of the saved checkpoint under work_dir, and I currently have to remove load_from from the config before resuming works. What is the suggested way to do this?

Traceback (most recent call last):
  File "tools/train.py", line 106, in <module>
    main()
  File "tools/train.py", line 102, in main
    runner.train()
  File "/anaconda3/envs/mm2_cu11/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1653, in train
    self.load_or_resume()
  File "/anaconda3/envs/mm2_cu11/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1599, in load_or_resume
    self.resume(resume_from)
  File "/anaconda3/envs/mm2_cu11/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1954, in resume
    self.message_hub.load_state_dict(checkpoint['message_hub'])
KeyError: 'message_hub'
/anaconda3/envs/mm2_cu11/lib/python3.8/site-packages/mmengine/runner/runner.py:1948: UserWarning: The dataset metainfo from the resumed checkpoint is different from the current training dataset, please check the correctness of the checkpoint or the training dataset.
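
For context, a minimal sketch of the config fields involved in this report (the checkpoint path and work_dir below are placeholders, not the values from the actual run):

load_from = 'checkpoints/yolov5_m_coco_pretrained.pth'  # COCO-pretrained init (placeholder path)
work_dir = './work_dirs/yolov5_m_custom'                # checkpoints are saved here (placeholder)
resume = True   # presumably set by passing --resume to tools/train.py

# With both fields set, the runner passes the load_from checkpoint to
# Runner.resume(); a plain pretrained checkpoint carries no 'message_hub'
# state, which produces the KeyError shown in the traceback above.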

Environment

sys.platform: linux
Python: 3.8.13 (default, Oct 21 2022, 23:50:54) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2,3: Tesla V100-SXM2-32GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.0, V11.0.221
GCC: gcc (GCC) 5.4.0
PyTorch: 1.7.1+cu110
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.0
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.8.2+cu110
OpenCV: 4.6.0
MMEngine: 0.3.0
MMCV: 2.0.0rc2
MMDetection: 3.0.0rc2
MMYOLO: 0.1.2+dc3377b

Additional information

No response

serser changed the title from "Loading COCO pretrained conflict with resume" to "load_from conflicts with resume" on Nov 5, 2022
RangeKing (Collaborator) commented Nov 5, 2022

According to the MMEngine docs, if load_from and resume=True are both set, only load_from takes effect, which is why resuming picks up the pretrained checkpoint instead of the latest one in work_dir. This could be more convenient; I have already sent feedback to the MMEngine developers, and they will discuss it later.
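
A minimal sketch of the workaround described in this thread, assuming the precedence documented by MMEngine (load_from wins when both are set, while resume = True with load_from = None auto-resumes from the latest checkpoint under work_dir); all paths are placeholders:

# First run: initialize from the COCO-pretrained weights (placeholder path).
load_from = 'checkpoints/yolov5_m_coco_pretrained.pth'
resume = False

# Resuming an interrupted run: clear load_from so the runner falls back to
# the latest checkpoint saved under work_dir instead of the pretrained file.
load_from = None
resume = True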
