Classification references does not work without distributed setup #6529

pmeier · 2022-09-01T08:26:46Z

If you don't set the respective env vars

vision/references/classification/utils.py

Lines 255 to 258 in d5bd8b7

    
    else:
        print("Not using distributed mode")
        args.distributed = False
        return

training will not be distributed and in turn the backend will not be initialized. However, during evaluation we check

vision/references/classification/train.py

Line 88 in d5bd8b7

and torch.distributed.get_rank() == 0

unguarded, which then fails with

RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

cc @datumbox


pmeier · 2022-09-12T15:32:11Z

Same for segmentation:

vision/references/segmentation/train.py

Line 84 in cac4e22

and torch.distributed.get_rank() == 0

YosuaMichael · 2022-09-14T16:10:25Z

I think this case is implicitly guarded by

vision/references/classification/train.py

Line 87 in d5bd8b7

and len(data_loader.dataset) != num_processed_samples

since len(data_loader.dataset) != num_processed_samples shouldn't be true in a non-distributed setting.

Do you get the error during non-distributed training, @pmeier?
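The implicit guard relies on Python's short-circuiting `and`: the right-hand operand is evaluated only when the left-hand one is true. A small sketch of that behavior follows. The function names and sample counts are made up for illustration, and `get_rank_without_process_group` merely imitates the failing call.

```python
def get_rank_without_process_group():
    # Imitates torch.distributed.get_rank() when no process group exists.
    raise RuntimeError("Default process group has not been initialized")


def needs_warning(dataset_len, num_processed_samples):
    # Same shape as the condition in train.py: the rank lookup is the
    # right-hand operand of `and`, so it runs only if the counts differ.
    return (
        dataset_len != num_processed_samples
        and get_rank_without_process_group() == 0
    )


# Counts match: the rank lookup is skipped and nothing is raised.
print(needs_warning(100, 100))  # False

# Counts differ: the rank lookup runs and the error surfaces.
try:
    needs_warning(100, 96)
    raised = False
except RuntimeError:
    raised = True
print(raised)  # True
```

So the guard holds only as long as the two counts are equal. Any non-distributed run in which they differ would still reach the unguarded call.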

pmeier added bug module: reference scripts labels Sep 1, 2022

datumbox added the help wanted label Sep 4, 2022
