This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Sparse transfer learning training error: AssertionError: min nan should be less than max nan #915

@mate-hegedus

Describe the bug
After the first training epoch completes, training stops as the validation pass begins, with the following error:
assert min_val <= max_val, "min {} should be less than max {}".format(
AssertionError: min nan should be less than max nan

Expected behavior
A clear and concise description of what you expected to happen.

Environment
Include all relevant environment information:

  1. Ubuntu 18.04
  2. Python version 3.8
  3. SparseML version or commit hash [e.g. 0.1.0, f7245c8]:
  4. ML framework version: PyTorch 1.9.1+cu102
  5. Other Python package versions [e.g. SparseZoo, DeepSparse, numpy, ONNX]:
  6. Other relevant environment information [e.g. hardware, CUDA version]:

To Reproduce
python3 train.py --data x-data.yaml --cfg ../models_v5.0/yolov5l.yaml --weights zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/pruned_quant-aggressive_95?recipe_type=transfer --hyp data/hyps/hyp.finetune.yaml --recipe ../recipes/yolov5.transfer_learn_pruned_quantized.md --batch-size -1 --img 1024 --epochs 10

Errors
Epoch gpu_mem box obj cls labels img_size
0/51 6.06G 0.02741 0.01176 0 2 1024: 100%|██████████| 10211/10211 [1:48:05<00:00, 1.57it/s]
Class Images Labels P R mAP@.5 mAP@.5:.95: 0%| | 0/116 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 681, in <module>
main(opt)
File "train.py", line 577, in main
train(opt.hyp, opt, device, callbacks)
File "train.py", line 402, in train
results, maps, _ = val.run(data_dict,
File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/mate/Developer/recog/yolov5-neural/sparseml/integrations/ultralytics-yolov5/yolov5/val.py", line 192, in run
out, train_out = model(im) if training else model(im, augment=augment, val=True) # inference, loss outputs
File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/mate/Developer/recog/yolov5-neural/sparseml/integrations/ultralytics-yolov5/yolov5/models/yolo.py", line 128, in forward
return self._forward_once(x, profile, visualize) # single-scale inference, train
File "/home/mate/Developer/recog/yolov5-neural/sparseml/integrations/ultralytics-yolov5/yolov5/models/yolo.py", line 151, in _forward_once
x = m(x) # run
File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/mate/Developer/recog/yolov5-neural/sparseml/integrations/ultralytics-yolov5/yolov5/models/common.py", line 208, in forward
return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))
File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/mate/Developer/recog/yolov5-neural/sparseml/integrations/ultralytics-yolov5/yolov5/models/common.py", line 47, in forward
return self.act(self.bn(self.conv(x)))
File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/quantization/stubs.py", line 56, in forward
X = self.quant(X)
File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1076, in _call_impl
hook_result = hook(self, input, result)
File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/quantization/quantize.py", line 83, in _observer_forward_hook
return self.activation_post_process(output)
File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/quantization/fake_quantize.py", line 131, in forward
_scale, _zero_point = self.calculate_qparams()
File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/quantization/fake_quantize.py", line 126, in calculate_qparams
return self.activation_post_process.calculate_qparams()
File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/quantization/observer.py", line 410, in calculate_qparams
return self._calculate_qparams(self.min_val, self.max_val)
File "/home/mate/Developer/recog/yolov5-neural/venv/lib/python3.8/site-packages/torch/quantization/observer.py", line 250, in _calculate_qparams
assert min_val <= max_val, "min {} should be less than max {}".format(
AssertionError: min nan should be less than max nan
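
A note on the message itself: any ordered comparison involving NaN evaluates to false, so a `min_val <= max_val` check fails as soon as either value is NaN, and the text "min nan should be less than max nan" only means that both tracked values were already NaN when the check ran. The sketch below shows this in plain Python, without PyTorch. The helper names `check_range` and `is_valid_range` are hypothetical and are not part of PyTorch or SparseML; the sketch does not identify where the NaN values originate in this training run.

```python
import math


def check_range(min_val: float, max_val: float) -> None:
    # Hypothetical helper with the same shape of check and message as the
    # assertion shown in the traceback above.
    assert min_val <= max_val, "min {} should be less than max {}".format(
        min_val, max_val
    )


def is_valid_range(min_val: float, max_val: float) -> bool:
    # Hypothetical helper: a range is usable only if both ends are finite
    # and correctly ordered.
    return math.isfinite(min_val) and math.isfinite(max_val) and min_val <= max_val


nan = float("nan")
message = ""
try:
    # NaN <= NaN is False, so the assertion fails.
    check_range(nan, nan)
except AssertionError as err:
    message = str(err)

print(message)
print(is_valid_range(nan, nan), is_valid_range(-1.0, 1.0))
```

An explicit finiteness check like `is_valid_range` separates "values are NaN" from "values are out of order", which the bare comparison cannot do.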

Additional context
This is a single-class training run; the .yaml has been changed accordingly.

Labels: bug