Sparse transfer learning training error: AssertionError: min nan should be less than max nan #915
Description
Describe the bug
After the first training epoch completes, just before validation starts, training stops with:

assert min_val <= max_val, "min {} should be less than max {}".format(
AssertionError: min nan should be less than max nan
Expected behavior
After the first epoch, validation should run to completion and training should continue through the remaining epochs without errors.
Environment
Include all relevant environment information:
- OS: Ubuntu 18.04
- Python version: 3.8
- SparseML version or commit hash: not provided
- ML framework (PyTorch) version: 1.9.1+cu102
- Other Python package versions (e.g. SparseZoo, DeepSparse, numpy, ONNX): not provided
- Other relevant environment information (e.g. hardware, CUDA version): not provided
To Reproduce
python3 train.py --data x-data.yaml --cfg ../models_v5.0/yolov5l.yaml --weights zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/pruned_quant-aggressive_95?recipe_type=transfer --hyp data/hyps/hyp.finetune.yaml --recipe ../recipes/yolov5.transfer_learn_pruned_quantized.md --batch-size -1 --img 1024 --epochs 10
Errors
Epoch gpu_mem box obj cls labels img_size
0/51 6.06G 0.02741 0.01176 0 2 1024: 100%|██████████| 10211/10211 [1:48:05<00:00, 1.57it/s]
Class Images Labels P R mAP@.5 mAP@.5:.95: 0%| | 0/116 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 681, in <module>
    main(opt)
  File "train.py", line 577, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 402, in train
    results, maps, _ = val.run(data_dict,
  File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/mate/Developer/recog/yolov5-neural/sparseml/integrations/ultralytics-yolov5/yolov5/val.py", line 192, in run
    out, train_out = model(im) if training else model(im, augment=augment, val=True)  # inference, loss outputs
  File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mate/Developer/recog/yolov5-neural/sparseml/integrations/ultralytics-yolov5/yolov5/models/yolo.py", line 128, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "/home/mate/Developer/recog/yolov5-neural/sparseml/integrations/ultralytics-yolov5/yolov5/models/yolo.py", line 151, in _forward_once
    x = m(x)  # run
  File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mate/Developer/recog/yolov5-neural/sparseml/integrations/ultralytics-yolov5/yolov5/models/common.py", line 208, in forward
    return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))
  File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mate/Developer/recog/yolov5-neural/sparseml/integrations/ultralytics-yolov5/yolov5/models/common.py", line 47, in forward
    return self.act(self.bn(self.conv(x)))
  File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/quantization/stubs.py", line 56, in forward
    X = self.quant(X)
  File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1076, in _call_impl
    hook_result = hook(self, input, result)
  File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/quantization/quantize.py", line 83, in _observer_forward_hook
    return self.activation_post_process(output)
  File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/quantization/fake_quantize.py", line 131, in forward
    _scale, _zero_point = self.calculate_qparams()
  File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/quantization/fake_quantize.py", line 126, in calculate_qparams
    return self.activation_post_process.calculate_qparams()
  File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/quantization/observer.py", line 410, in calculate_qparams
    return self._calculate_qparams(self.min_val, self.max_val)
  File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/quantization/observer.py", line 250, in _calculate_qparams
    assert min_val <= max_val, "min {} should be less than max {}".format(
AssertionError: min nan should be less than max nan
Additional context
This is a single-class training run; the dataset .yaml has been updated accordingly.
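For context on why the assertion fires: the quantization observer tracks running min/max statistics over activations, and once a NaN reaches it (typically because the network itself is producing NaN activations, e.g. from an unstable loss or a bad batch), both statistics become NaN. Any comparison involving NaN evaluates to False, so the `min_val <= max_val` check always fails. A minimal stand-in in plain Python (not the actual torch.quantization code; the function here just mimics the failing check) illustrates this:

```python
import math

def calculate_qparams(min_val: float, max_val: float):
    # Mimics the check in torch/quantization/observer.py: any
    # comparison with NaN is False, so NaN min/max statistics
    # always trip this assertion.
    assert min_val <= max_val, "min {} should be less than max {}".format(
        min_val, max_val
    )
    return min_val, max_val

# Healthy running statistics pass the check:
calculate_qparams(-1.0, 1.0)

# NaN statistics reproduce the reported failure:
try:
    calculate_qparams(math.nan, math.nan)
except AssertionError as e:
    print(e)  # min nan should be less than max nan
```

This suggests the observer itself is behaving as designed and the real problem is upstream: NaN activations entering the quantization stubs during the first validation pass.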