
[Bug] Default process group has not been initialized with autoslim search #74

Status: Closed
twmht opened this issue Feb 10, 2022 · 8 comments
Labels: bug (Something isn't working)

twmht (Contributor) commented Feb 10, 2022

I tried to search subnets from supernet with autoslim.

python ./tools/mmcls/search_mmcls.py \
  configs/pruning/autoslim/autoslim_mbv2_search_8xb1024_in1k.py \
  your_pre-training_checkpoint_path \
  --work-dir your_work_dir

(screenshot: traceback ending in "Default process group has not been initialized")

Do I need to use distributed mode when searching?

twmht added the bug label Feb 10, 2022
twmht changed the title from "[Bug] Default process group has not been initialized" to "[Bug] Default process group has not been initialized with autoslim search" Feb 10, 2022

HIT-cwh (Collaborator) commented Feb 10, 2022

Thank you for your issue.
At present, distributed mode is required when searching, even if only one GPU is used. This is hacky, and we are refactoring the search part; the new version will no longer have this problem.

twmht closed this as completed Feb 10, 2022
tanghy2016 commented:

Has this problem been fixed in the current version? I'm running into the same issue.

HIT-cwh (Collaborator) commented Apr 15, 2022

You can avoid this by using distributed mode.

Also, using English is appreciated, for better community discussion around the world.

tanghy2016 commented:

Where do I do the setup you mentioned?

HIT-cwh (Collaborator) commented Apr 15, 2022

> where to do the setup you said

You can set the job launcher to one of `pytorch`, `slurm`, or `mpi` (see the linked reference) to use distributed mode.
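For a single GPU, one way to do this (a sketch reusing the placeholder paths from the original report; only the launch wrapper and `--launcher pytorch` flag are the point here) is to start the script through `torch.distributed.launch`, which exports the distributed environment variables for each worker:

```shell
# Sketch: launch one worker through torch.distributed.launch so that
# RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are exported before
# mmcv's init_dist() reads them. Paths below are placeholders.
python -m torch.distributed.launch --nproc_per_node=1 \
    ./tools/mmcls/search_mmcls.py \
    configs/pruning/autoslim/autoslim_mbv2_search_8xb1024_in1k.py \
    your_pre-training_checkpoint_path \
    --work-dir your_work_dir \
    --launcher pytorch
```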

tanghy2016 commented:
$ python ./tools/mmcls/search_mmcls.py \
>   configs/pruning/autoslim/autoslim_mbv2_search_8xb1024_ci10.py \
>   output/epoch_50.pth \
>   --work-dir output \
>   --launcher pytorch
/home/tanghuayang/venv_torch/lib/python3.6/site-packages/mmrazor/utils/setup_env.py:33: UserWarning: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  f'Setting OMP_NUM_THREADS environment variable for each process '
/home/tanghuayang/venv_torch/lib/python3.6/site-packages/mmrazor/utils/setup_env.py:43: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  f'Setting MKL_NUM_THREADS environment variable for each process '
Traceback (most recent call last):
  File "./tools/mmcls/search_mmcls.py", line 181, in <module>
    main()
  File "./tools/mmcls/search_mmcls.py", line 99, in main
    init_dist(args.launcher, **cfg.dist_params)
  File "/home/tanghuayang/venv_torch/lib64/python3.6/site-packages/mmcv/runner/dist_utils.py", line 18, in init_dist
    _init_dist_pytorch(backend, **kwargs)
  File "/home/tanghuayang/venv_torch/lib64/python3.6/site-packages/mmcv/runner/dist_utils.py", line 29, in _init_dist_pytorch
    rank = int(os.environ['RANK'])
  File "/usr/lib64/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'RANK'

Is it necessary to configure cfg.dist_params? And how should it be configured?
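For context on where the `KeyError` comes from (a minimal sketch, not mmcv's actual code): the `_init_dist_pytorch` path in the traceback reads environment variables that a distributed launcher such as `torch.distributed.launch` exports for every worker, so starting the script with plain `python` leaves them unset:

```python
import os

# Minimal sketch of the failure mode: these variables are exported by the
# launcher (torch.distributed.launch / torchrun), one set per worker.
# Plain `python script.py` never sets them, hence KeyError('RANK').
def read_dist_env(environ=os.environ):
    rank = int(environ.get('RANK', 0))            # 0 if no launcher set it
    world_size = int(environ.get('WORLD_SIZE', 1))  # 1 if no launcher set it
    return rank, world_size

print(read_dist_env({'RANK': '3', 'WORLD_SIZE': '8'}))  # (3, 8)
```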

tanghy2016 commented:

> (quoting the traceback from the previous comment)

It's running now; I used the following command:

$ RANK=0 WORLD_SIZE=1 MASTER_ADDR=127.0.0.1 MASTER_PORT=1692 python ./tools/mmcls/search_mmcls.py \
  configs/pruning/autoslim/autoslim_mbv2_search_8xb1024_ci10.py \
  output/epoch_50.pth \
  --work-dir output \
  --launcher pytorch
tanghy2016 commented:

> (quoting the command from the previous comment)

But how do I write these configuration parameters into cfg.dist_params?
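As the traceback itself shows (`init_dist(args.launcher, **cfg.dist_params)`), `dist_params` only carries keyword arguments for process-group initialization; RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT are read from the process environment per worker, so they cannot be moved into the config. A hypothetical config fragment illustrating the split:

```python
# Hypothetical mmcv-style config fragment: dist_params is forwarded as
# keyword arguments to init_dist (e.g. the communication backend).
# RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT are environment variables
# set per-process by the launcher and cannot be supplied here.
dist_params = dict(backend='nccl')
```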

humu789 pushed a commit to humu789/mmrazor that referenced this issue Feb 13, 2023