Bug in forward #3

Closed

Haochen-Wang409 opened this issue Oct 13, 2021 · 6 comments

@Haochen-Wang409 commented Oct 13, 2021

Hi, I am trying to run your code and found a bug.

At the line below, the batch size of images_unsup_strong is 1, which BatchNorm2d does not allow when the model is in train mode:
https://github.com/hzhupku/SemiSeg-AEL/blob/main/train.py#L351

The error message is:

File "../../train.py", line 497, in <module>
    main()
  File "../../train.py", line 134, in main
    labeled_epoch, model_teacher, trainloader_unsup, criterion_cons, class_criterion, cutmix_bank)
  File "../../train.py", line 344, in train
    preds_student_unsup = model(images_unsup_strong)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.4/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/lustre/wanghaochen/semiseg/semseg/models/model_helper.py", line 48, in forward
    pred_head = self.decoder([f1, f2,feat1, feat2])
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/lustre/wanghaochen/semiseg/semseg/models/decoder.py", line 54, in forward
    aspp_out = self.aspp(x4)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/lustre/wanghaochen/semiseg/semseg/models/base.py", line 46, in forward
    feat1 = F.upsample(self.conv1(x), size=(h, w), mode='bilinear', align_corners=True)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.4/lib/python3.6/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.4/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 539, in forward
    bn_training, exponential_average_factor, self.eps)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.4/lib/python3.6/site-packages/torch/nn/functional.py", line 2147, in batch_norm
    _verify_batch_size(input.size())
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.4/lib/python3.6/site-packages/torch/nn/functional.py", line 2114, in _verify_batch_size
    raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])
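
For reference, here is a minimal, self-contained sketch (not taken from this repo) that triggers the same check with the input shape shown in the traceback:

```python
import torch
import torch.nn as nn

# In train mode, BatchNorm2d computes batch statistics, so it needs more than
# one value per channel. An input of shape (1, 256, 1, 1) has exactly one value
# per channel, which is what raises the ValueError above.
bn = nn.BatchNorm2d(256).train()
x = torch.randn(1, 256, 1, 1)
bn(x)  # ValueError: Expected more than 1 value per channel when training, ...
```
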
@hzhupku (Owner) commented Oct 13, 2021

Did you change the batch size?

@Haochen-Wang409 (Author)

I changed the batch size in config.yaml to 8.

@wqhIris commented Jan 28, 2022

Hi @Haochen-Wang409,
I also ran into this bug. Did you manage to solve it?

@Amos1109

Hi, did you manage to solve this issue?

@Haochen-Wang409 (Author)

Oh, I solved the problem by using multiple GPUs, i.e., launching the training with python -m torch.distributed.launch.
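
For example, something along these lines (a sketch only; the number of processes and whatever arguments train.py expects depend on your setup):

```bash
# --nproc_per_node sets the number of processes (one per GPU).
# The arguments to train.py are omitted here; pass whatever your config requires.
python -m torch.distributed.launch --nproc_per_node=4 train.py
```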

@Amos1109

@Haochen-Wang409 Thank you! Is there a way to fix this without using distributed training?
