
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group. #809

Closed
AKMourato opened this issue Aug 22, 2021 · 3 comments
@AKMourato

Greetings.
I'm getting the error below at the start of training a U-Net on a custom dataset.

2021-08-23 00:02:42,352 - mmseg - INFO - Loaded 1408 images
2021-08-23 00:02:45,215 - mmseg - INFO - Loaded 245 images
2021-08-23 00:02:45,216 - mmseg - INFO - load checkpoint from ./mmsegmentation/deeplabv3_unet_s5-d16_256x256_40k_hrf_20201226_094047-3a1fdf85.pth
2021-08-23 00:02:45,217 - mmseg - INFO - Use load_from_local loader
2021-08-23 00:02:45,305 - mmseg - INFO - Start running, host: amourato@cslave, work_dir: /home/amourato/VM/UNET
2021-08-23 00:02:45,306 - mmseg - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) PolyLrUpdaterHook                  
(NORMAL      ) CheckpointHook                     
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) PolyLrUpdaterHook                  
(LOW         ) IterTimerHook                      
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_iter:
(VERY_HIGH   ) PolyLrUpdaterHook                  
(LOW         ) IterTimerHook                      
(LOW         ) EvalHook                           
 -------------------- 
after_train_iter:
(ABOVE_NORMAL) OptimizerHook                      
(NORMAL      ) CheckpointHook                     
(LOW         ) IterTimerHook                      
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) CheckpointHook                     
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_epoch:
(LOW         ) IterTimerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_epoch:
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
2021-08-23 00:02:45,307 - mmseg - INFO - workflow: [('train', 1)], max: 500 iters

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-b622679dccf6> in <module>
     16 mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
     17 train_segmentor(model, datasets, cfg, distributed=False, validate=True, 
---> 18                 meta=dict())

~/VM/mmsegmentation/mmseg/apis/train.py in train_segmentor(model, dataset, cfg, distributed, validate, timestamp, meta)
    118     elif cfg.load_from:
    119         runner.load_checkpoint(cfg.load_from)
--> 120     runner.run(data_loaders, cfg.workflow)

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py in run(self, data_loaders, workflow, max_iters, **kwargs)
    131                     if mode == 'train' and self.iter >= self._max_iters:
    132                         break
--> 133                     iter_runner(iter_loaders[i], **kwargs)
    134 
    135         time.sleep(1)  # wait for some hooks like loggers to finish

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py in train(self, data_loader, **kwargs)
     58         data_batch = next(data_loader)
     59         self.call_hook('before_train_iter')
---> 60         outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
     61         if not isinstance(outputs, dict):
     62             raise TypeError('model.train_step() must return a dict')

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py in train_step(self, *inputs, **kwargs)
     65 
     66         inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
---> 67         return self.module.train_step(*inputs[0], **kwargs[0])
     68 
     69     def val_step(self, *inputs, **kwargs):

~/VM/mmsegmentation/mmseg/models/segmentors/base.py in train_step(self, data_batch, optimizer, **kwargs)
    136                 averaging the logs.
    137         """
--> 138         losses = self(**data_batch)
    139         loss, log_vars = self._parse_losses(losses)
    140 

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py in new_func(*args, **kwargs)
     95                                 'method of nn.Module')
     96             if not (hasattr(args[0], 'fp16_enabled') and args[0].fp16_enabled):
---> 97                 return old_func(*args, **kwargs)
     98 
     99             # get the arg spec of the decorated method

~/VM/mmsegmentation/mmseg/models/segmentors/base.py in forward(self, img, img_metas, return_loss, **kwargs)
    106         """
    107         if return_loss:
--> 108             return self.forward_train(img, img_metas, **kwargs)
    109         else:
    110             return self.forward_test(img, img_metas, **kwargs)

~/VM/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py in forward_train(self, img, img_metas, gt_semantic_seg)
    137         """
    138 
--> 139         x = self.extract_feat(img)
    140 
    141         losses = dict()

~/VM/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py in extract_feat(self, img)
     63     def extract_feat(self, img):
     64         """Extract features from images."""
---> 65         x = self.backbone(img)
     66         if self.with_neck:
     67             x = self.neck(x)

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/VM/mmsegmentation/mmseg/models/backbones/unet.py in forward(self, x)
    406         enc_outs = []
    407         for enc in self.encoder:
--> 408             x = enc(x)
    409             enc_outs.append(x)
    410         dec_outs = [x]

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/container.py in forward(self, input)
    137     def forward(self, input):
    138         for module in self:
--> 139             input = module(input)
    140         return input
    141 

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/VM/mmsegmentation/mmseg/models/backbones/unet.py in forward(self, x)
     83             out = cp.checkpoint(self.convs, x)
     84         else:
---> 85             out = self.convs(x)
     86         return out
     87 

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/container.py in forward(self, input)
    137     def forward(self, input):
    138         for module in self:
--> 139             input = module(input)
    140         return input
    141 

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/cnn/bricks/conv_module.py in forward(self, x, activate, norm)
    198                 x = self.conv(x)
    199             elif layer == 'norm' and norm and self.with_norm:
--> 200                 x = self.norm(x)
    201             elif layer == 'act' and activate and self.with_activation:
    202                 x = self.activate(x)

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py in forward(self, input)
    729             if self.process_group:
    730                 process_group = self.process_group
--> 731             world_size = torch.distributed.get_world_size(process_group)
    732             need_sync = world_size > 1
    733 

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py in get_world_size(group)
    746         return -1
    747 
--> 748     return _get_group_size(group)
    749 
    750 

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py in _get_group_size(group)
    272     """
    273     if group is GroupMember.WORLD or group is None:
--> 274         default_pg = _get_default_group()
    275         return default_pg.size()
    276     if group not in _pg_group_ranks:

~/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py in _get_default_group()
    356     """
    357     if not is_initialized():
--> 358         raise RuntimeError("Default process group has not been initialized, "
    359                            "please make sure to call init_process_group.")
    360     return GroupMember.WORLD

RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

This is strange: the process seems to assume distributed data parallel even though distributed=False and gpu_ids is range(1).

settings:

cfg = Config.fromfile('./mmsegmentation/configs/unet/deeplabv3_unet_s5-d16_256x256_40k_hrf.py')
from mmseg.apis import set_random_seed
path = '/home/amourato/VM/New_COCO_DPD/openmm/' 
cfg.data_root = path
cfg.data.train.dataset.data_root = path
cfg.data.train.dataset.img_dir = path+'img_dir/train'
cfg.data.train.dataset.ann_dir = path+'ann_dir/train'
cfg.data.samples_per_gpu = 1
cfg.data.workers_per_gpu = 4
cfg.data.val.data_root = './New_COCO_DPD/openmm/'
cfg.data.val.img_dir = 'img_dir/test'
cfg.data.val.ann_dir = 'ann_dir/test'
cfg.data.test.data_root = './New_COCO_DPD/openmm/'
cfg.data.test.img_dir = 'img_dir/test'
cfg.data.test.ann_dir = 'ann_dir/test'
cfg.load_from = './mmsegmentation/deeplabv3_unet_s5-d16_256x256_40k_hrf_20201226_094047-3a1fdf85.pth'
cfg.work_dir = './UNET'
cfg.optimizer.lr = 0.003
cfg.runner.max_iters = 500
cfg.log_config.interval = 100
cfg.evaluation.interval = 500
cfg.checkpoint_config.interval = 500
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)

config:

Config:
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained=None,
    backbone=dict(
        type='UNet',
        in_channels=3,
        base_channels=64,
        num_stages=5,
        strides=(1, 1, 1, 1, 1),
        enc_num_convs=(2, 2, 2, 2, 2),
        dec_num_convs=(2, 2, 2, 2),
        downsamples=(True, True, True, True),
        enc_dilations=(1, 1, 1, 1, 1),
        dec_dilations=(1, 1, 1, 1),
        with_cp=False,
        conv_cfg=None,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        act_cfg=dict(type='ReLU'),
        upsample_cfg=dict(type='InterpConv'),
        norm_eval=False),
    decode_head=dict(
        type='ASPPHead',
        in_channels=64,
        in_index=4,
        channels=16,
        dilations=(1, 12, 24, 36),
        dropout_ratio=0.1,
        num_classes=2,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=128,
        in_index=3,
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=2,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    train_cfg=dict(),
    test_cfg=dict(mode='slide', crop_size=(256, 256), stride=(170, 170)))
dataset_type = 'HRFDataset'
data_root = '/home/amourato/VM/New_COCO_DPD/openmm/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
img_scale = (2336, 3504)
crop_size = (256, 256)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(2336, 3504), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=(256, 256), cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=(256, 256), pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2336, 3504),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=4,
    train=dict(
        type='RepeatDataset',
        times=40000,
        dataset=dict(
            type='HRFDataset',
            data_root='/home/amourato/VM/New_COCO_DPD/openmm/',
            img_dir='/home/amourato/VM/New_COCO_DPD/openmm/img_dir/train',
            ann_dir='/home/amourato/VM/New_COCO_DPD/openmm/ann_dir/train',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations'),
                dict(
                    type='Resize',
                    img_scale=(2336, 3504),
                    ratio_range=(0.5, 2.0)),
                dict(
                    type='RandomCrop',
                    crop_size=(256, 256),
                    cat_max_ratio=0.75),
                dict(type='RandomFlip', prob=0.5),
                dict(type='PhotoMetricDistortion'),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size=(256, 256), pad_val=0, seg_pad_val=255),
                dict(type='DefaultFormatBundle'),
                dict(type='Collect', keys=['img', 'gt_semantic_seg'])
            ])),
    val=dict(
        type='HRFDataset',
        data_root='./New_COCO_DPD/openmm/',
        img_dir='img_dir/test',
        ann_dir='ann_dir/test',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2336, 3504),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='HRFDataset',
        data_root='./New_COCO_DPD/openmm/',
        img_dir='img_dir/test',
        ann_dir='ann_dir/test',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2336, 3504),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
log_config = dict(
    interval=100, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = './mmsegmentation/deeplabv3_unet_s5-d16_256x256_40k_hrf_20201226_094047-3a1fdf85.pth'
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.003, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=0.0001, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=500)
checkpoint_config = dict(by_epoch=False, interval=500)
evaluation = dict(interval=500, metric='mDice', pre_eval=True)
work_dir = './UNET'
seed = 0
gpu_ids = range(0, 1)

running:

from mmseg.datasets import build_dataset
from mmseg.models import build_segmentor
from mmseg.apis import train_segmentor


datasets = [build_dataset(cfg.data.train)]
datasets[0].CLASSES = ('background', 'ss')


model = build_segmentor(
    cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
# Add an attribute for visualization convenience
model.CLASSES = datasets[0].CLASSES


mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_segmentor(model, datasets, cfg, distributed=False, validate=True, 
                meta=dict())

Environment

sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA GeForce GTX 1070
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.4.r11.4/compiler.30188945_0
GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PyTorch: 1.9.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.10.0
OpenCV: 4.5.3
MMCV: 1.3.10
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMSegmentation: 0.16.0+e235c1a

Any solution?

@MengzhangLI
Contributor

Hi, it seems this is caused by your local single-GPU environment, while the default MMSegmentation configs assume 4 GPUs.

Try changing norm_cfg = dict(type='SyncBN', requires_grad=True) to norm_cfg = dict(type='BN', requires_grad=True).
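
For example, a minimal sketch of that override applied to the cfg object from the settings above (field names taken from the posted config; rebuild the model afterwards so the new norm layers are picked up):

bn_cfg = dict(type='BN', requires_grad=True)

# Replace SyncBN with plain BN everywhere the posted config uses it,
# so single-GPU, non-distributed training never touches torch.distributed.
cfg.norm_cfg = bn_cfg
cfg.model.backbone.norm_cfg = bn_cfg
cfg.model.decode_head.norm_cfg = bn_cfg
cfg.model.auxiliary_head.norm_cfg = bn_cfg

# Rebuild the segmentor after the override, before calling train_segmentor.
model = build_segmentor(
    cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))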

Please let us know whether this fixes your problem.

Best,

@Junjun2016
Collaborator

If you use SyncBN, you should use distributed training.

Refer to #772.
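
For reference, a minimal sketch of what initializing the default process group would look like if SyncBN is kept on a single GPU (an assumed workaround, not the fix recommended above; with world_size == 1, SyncBN has nothing to synchronize and falls back to ordinary BatchNorm behaviour):

import torch

# Assumed workaround sketch: create a one-process default group so that
# SyncBN's call to torch.distributed.get_world_size() succeeds. With
# world_size == 1 there is no cross-GPU sync, so SyncBN behaves like BN.
if not torch.distributed.is_initialized():
    torch.distributed.init_process_group(
        backend='nccl',                       # matches dist_params in the config
        init_method='tcp://127.0.0.1:29500',  # any free local port
        rank=0,
        world_size=1)

For actual multi-GPU runs, launching through tools/dist_train.sh sets the process group up for you.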

@AKMourato
Author

Quite right, my lapse.
It's running now, thanks.

MengzhangLI self-assigned this on Aug 23, 2021
MengzhangLI added the FAQ and good first issue labels on Aug 23, 2021