Question about the dimensionality of the mask. #3

Closed
MsWik opened this issue Aug 28, 2021 · 4 comments
Comments

@MsWik

MsWik commented Aug 28, 2021

Thank you for your work.

It would be nice to see the actual performance of the models in FPS on specific hardware, particularly on devices like the Jetson.

Training requires the mask to be grayscale, yet in the file that describes the dataset, PALETTE has a dimension of 3. Can you tell me what the dimensionality of PALETTE should be (for example, should my labels be (1, 1, 1), or just 1, etc.)?

@sithu31296
Owner

Hello,
First, a speed comparison will be added in the near future.

About your question, the dimension of PALETTE should be (num_classes, 3). Each row is an (R, G, B) color for the corresponding categorical value in the 2-dimensional label image.
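
For example, a palette for a hypothetical 3-class dataset could look like the sketch below (class names and colors are made up, not taken from this repo):

```python
import torch

# Hypothetical 3-class example: each row of PALETTE is the (R, G, B) color
# for one class index, so PALETTE has shape (num_classes, 3).
CLASSES = ['background', 'road', 'car']
PALETTE = torch.tensor([
    [0, 0, 0],        # class 0 -> black
    [128, 64, 128],   # class 1 -> purple
    [0, 0, 142],      # class 2 -> dark blue
])

# The label image itself stays 2D: each pixel holds a class index.
label = torch.randint(0, len(CLASSES), (256, 256))   # dummy (H, W) label map
color_vis = PALETTE[label]                            # (H, W, 3) color image for visualization
```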

@MsWik
Author

MsWik commented Aug 30, 2021

The thing is, when I train with my own dataset, I get the following error:

{'DATASET': {'NAME': 'ade20k',
'ROOT': '/content/data/ADEChallenge/ADEChallengeData2016'},
'DEVICE': 'cuda',
'EVAL': {'IMAGE_SIZE': [256, 256],
'MODEL_PATH': 'checkpoints/pretrained/segformer/segformer.b0.ade.pth',
'MSF': {'ENABLE': False,
'FLIP': True,
'SCALES': [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]}},
'LOSS': {'CLS_WEIGHTS': True, 'NAME': 'ohemce', 'THRESH': 0.7},
'MODEL': {'NAME': 'segformer',
'PRETRAINED': '/content/semantic-segmentation/mit_b0.pth',
'VARIANT': 'B0'},
'OPTIMIZER': {'LR': 0.001, 'NAME': 'adamw', 'WEIGHT_DECAY': 0.01},
'SAVE_DIR': 'output',
'SCHEDULER': {'NAME': 'warmuppolylr',
'POWER': 0.9,
'WARMUP': 10,
'WARMUP_RATIO': 0.1},
'TEST': {'FILE': 'assests/ade',
'IMAGE_SIZE': [256, 256],
'MODEL_PATH': 'checkpoints/pretrained/segformer/segformer.b0.ade.pth',
'OVERLAY': False},
'TRAIN': {'AMP': True,
'BATCH_SIZE': 64,
'DDP': False,
'EPOCHS': 20,
'EVAL_INTERVAL': 10,
'IMAGE_SIZE': [256, 256]}}
Found 16443 training images.
Found 5481 validation images.
Epoch: [1/20] Iter: [0/256] LR: 0.00100000 Loss: 0.00000000: 0% 0/256 [00:02<?, ?it/s]
Traceback (most recent call last):
File "/content/semantic-segmentation/tools/train.py", line 132, in
main(cfg, gpu, save_dir)
File "/content/semantic-segmentation/tools/train.py", line 76, in main
loss = loss_fn(logits, lbl)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "./utils/losses.py", line 49, in forward
return self._forward(preds, labels)
File "./utils/losses.py", line 38, in _forward
loss = self.criterion(preds, labels).view(-1)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py", line 1121, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2824, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: 1only batches of spatial targets supported (3D tensors) but got targets of size: : [64, 3, 256, 256]

From which I conclude that the mask must be an [m x m] matrix...
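
As a sanity check (a minimal sketch, not code from this repo), PyTorch's cross-entropy indeed accepts only class-index targets of shape [B, H, W]:

```python
import torch
import torch.nn.functional as F

B, num_classes, H, W = 2, 150, 64, 64
logits = torch.randn(B, num_classes, H, W)

# Works: spatial targets must be class indices with shape [B, H, W].
target_ok = torch.randint(0, num_classes, (B, H, W))
print(F.cross_entropy(logits, target_ok))

# Fails with the RuntimeError above: an RGB mask loaded as [B, 3, H, W].
target_rgb = torch.randint(0, 255, (B, 3, H, W))
# F.cross_entropy(logits, target_rgb)  # "only batches of spatial targets supported (3D tensors)..."
```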

I would also like to understand why the learning rate increases over time rather than decreasing. I also get strange results:

Found 16443 training images.
Found 5481 validation images.
Epoch: [1/20] Iter: [0/256] LR: 0.00100000 Loss: 0.00000000: 0% 0/256 [00:00<?, ?it/s]/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Epoch: [0/20] Iter: [256/256] LR: 0.00019000 Loss: 2.17269479: 100% 256/256 [03:07<00:00, 1.36it/s]
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Epoch: [1/20] Iter: [256/256] LR: 0.00028000 Loss: 1.29710598: 100% 256/256 [03:06<00:00, 1.37it/s]
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Epoch: [2/20] Iter: [256/256] LR: 0.00037000 Loss: 1.13825008: 100% 256/256 [03:06<00:00, 1.37it/s]
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Epoch: [3/20] Iter: [256/256] LR: 0.00046000 Loss: 1.02070744: 100% 256/256 [03:07<00:00, 1.36it/s]
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Epoch: [4/20] Iter: [256/256] LR: 0.00055000 Loss: 1.02786795: 100% 256/256 [03:06<00:00, 1.38it/s]
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Epoch: [5/20] Iter: [256/256] LR: 0.00064000 Loss: 0.94620133: 100% 256/256 [03:06<00:00, 1.37it/s]
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Epoch: [6/20] Iter: [256/256] LR: 0.00073000 Loss: 0.93045863: 100% 256/256 [03:06<00:00, 1.37it/s]
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Epoch: [7/20] Iter: [256/256] LR: 0.00082000 Loss: 1.36724860: 100% 256/256 [03:05<00:00, 1.38it/s]
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Epoch: [8/20] Iter: [256/256] LR: 0.00091000 Loss: 1.74576994: 100% 256/256 [03:07<00:00, 1.37it/s]
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Epoch: [9/20] Iter: [256/256] LR: 0.00100000 Loss: 1.71283879: 100% 256/256 [03:05<00:00, 1.38it/s]
Evaluating...
0% 0/5481 [00:00<?, ?it/s][W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
100% 5481/5481 [01:10<00:00, 78.11it/s]
Current mIoU: 0.4781 Best mIoU: 0.48
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Epoch: [10/20] Iter: [111/256] LR: 0.00096089 Loss: 1.36222436: 43% 111/256 [01:21<01:44, 1.39it/s]

The loss begins to increase dramatically after 5-7 epochs.

@sithu31296
Owner

The mask should have shape [B, H, W]; during training, each value is a categorical value in range(0, num_classes). Only then can we use a cross-entropy-based loss.
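
If it helps, one possible way to convert an RGB-encoded mask into that categorical [H, W] form using the palette (a sketch with assumed names, not code from this repo):

```python
import torch

def rgb_to_index(mask_rgb: torch.Tensor, palette: torch.Tensor) -> torch.Tensor:
    """Map an (H, W, 3) RGB mask to an (H, W) tensor of class indices.

    Pixels whose color is not in `palette` keep the value 255, which is
    commonly used as the ignore index. Function name and details are
    illustrative only.
    """
    label = torch.full(mask_rgb.shape[:2], 255, dtype=torch.long)
    for cls_idx, color in enumerate(palette):
        matches = (mask_rgb == color).all(dim=-1)   # (H, W) boolean mask
        label[matches] = cls_idx
    return label
```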

The learning rate will increase until the warmup epochs end (defined in SCHEDULER > WARMUP) and then it will decrease. You can see the learning rate behavior by running scheduler.py. The loss increasing around the warmup epochs is actually normal; it will decrease later.
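
A standalone sketch of that schedule (not the repo's scheduler.py, just the same idea, with parameter names mirroring the config above):

```python
# Illustrative warmup-poly schedule: the LR ramps up linearly for the first
# `warmup` steps, then decays polynomially. Defaults mirror the config above
# (LR 0.001, WARMUP 10, WARMUP_RATIO 0.1, POWER 0.9).
def warmup_poly_lr(step, max_steps, base_lr=1e-3, warmup=10,
                   warmup_ratio=0.1, power=0.9):
    if step < warmup:
        alpha = step / warmup
        return base_lr * (warmup_ratio + (1 - warmup_ratio) * alpha)
    progress = (step - warmup) / max(1, max_steps - warmup)
    return base_lr * (1 - progress) ** power

for s in range(20):
    print(s, round(warmup_poly_lr(s, max_steps=20), 6))
# The LR rises for the first 10 steps and falls afterwards,
# roughly matching the pattern in the training log above.
```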

About the warning on thread-pool, see this issue pytorch/pytorch#57273.

@MsWik
Author

MsWik commented Aug 30, 2021

Thank you.

@MsWik MsWik closed this as completed Aug 30, 2021