
Structured pruning for YOLOv5 #925

Closed
hawrot opened this issue Jun 29, 2022 · 10 comments

hawrot commented Jun 29, 2022

I have been using SparseML for pruning YOLOv5 recently and I can see a big improvement in inference time; however, the model size stays the same. I have realised this happens because of unstructured pruning, which only zeroes out the weights rather than removing them.

I was wondering whether it is possible to implement structured pruning for YOLOv5.

@hawrot hawrot added the enhancement New feature or request label Jun 29, 2022
@dbogunowicz dbogunowicz self-assigned this Jun 30, 2022

dbogunowicz commented Jun 30, 2022

Hey @hawrot

Regarding model size:
Yes, this is true; the model size will not drop significantly with unstructured pruning (for both the .pth and ONNX representations, I believe). However, the model becomes much more efficient to compress, if that is something you care about (a large number of zero weights --> lower entropy of the data --> easier to compress).
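
As a minimal illustration (plain PyTorch and gzip, not SparseML APIs; the layer size and ~80% sparsity level are just placeholders), zeroing weights leaves the serialized size unchanged but makes the file far more compressible:

    import gzip
    import io
    import torch

    def saved_size(tensor):
        # Return (raw bytes, gzip-compressed bytes) of a serialized tensor.
        buf = io.BytesIO()
        torch.save(tensor, buf)
        raw = buf.getvalue()
        return len(raw), len(gzip.compress(raw))

    dense = torch.randn(512, 512)                  # dense weight matrix
    pruned = dense * (torch.rand(512, 512) > 0.8)  # ~80% of weights zeroed (unstructured)

    print("dense :", saved_size(dense))   # raw and gzip sizes are roughly the same
    print("pruned:", saved_size(pruned))  # raw size unchanged, gzip size much smaller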

Structured pruning of YOLOv5 is possible, but we haven't looked into it yet. It may be a bit tricky due to the architecture of YOLO (e.g. the "long" skip connections may force the network to prune the same channels along the depth of the network, which can be slightly problematic). If you are interested in trying, here are the structured pruning modifiers that you can incorporate into your recipe. We can provide further advice if needed.
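
To make the structured vs. unstructured distinction concrete, here is a generic PyTorch sketch (torch.nn.utils.prune, not the SparseML modifiers mentioned above; the layer shape and 50% amount are arbitrary) of filter-structured pruning, which zeroes whole output filters rather than individual weights:

    import torch
    import torch.nn.utils.prune as prune

    conv = torch.nn.Conv2d(64, 128, kernel_size=3)

    # Structured pruning: zero out entire output filters (dim=0), ranked by L2 norm,
    # instead of zeroing individual weights as unstructured pruning does.
    prune.ln_structured(conv, name="weight", amount=0.5, n=2, dim=0)

    # Whole filters are now all-zero, so in principle they could be physically removed,
    # shrinking this layer and the matching input channels of the layers that consume it.
    zeroed = (conv.weight.detach().abs().sum(dim=(1, 2, 3)) == 0).sum().item()
    print(f"{zeroed} of {conv.weight.shape[0]} filters zeroed")

Actually removing those filters is the step that YOLOv5's concatenations and skip connections make tricky, since the same channels have to disappear consistently everywhere they are reused.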

If size is what you care about, we would recommend using YOLO quantization (if not already employed), not only for a further speedup but also for a significant reduction in model file size.
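
As rough back-of-the-envelope arithmetic (the ~7.2M parameter count for YOLOv5s is approximate; check your own model), INT8 quantization cuts the weight storage by about 4x compared to FP32:

    params = 7.2e6               # ~7.2M parameters in YOLOv5s (approximate)
    fp32_mb = params * 4 / 1e6   # ~28.8 MB of weights at 32 bits each
    int8_mb = params * 1 / 1e6   # ~7.2 MB at 8 bits each, roughly a 4x reduction
    print(fp32_mb, int8_mb)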

@soohwanlim

I also pruned YOLOv5 with SparseML, but it didn't seem to have much effect.
Did you use:

python train.py --cfg ../models_v5.0/yolov5s.yaml --weights PATH_TO_COCO_PRETRAINED_WEIGHTS --data coco.yaml --hyp data/hyps/hyp.scratch.yaml --recipe ../recipes/yolov5s.pruned.md

I want to know how to get a big effect. Can you tell me?

@dbogunowicz

@OSMasterSoohwan what do you mean by "big effect"? Inference speedup, right?

@soohwanlim

> @OSMasterSoohwan what do you mean by "big effect"? Inference speedup, right?

Yes, I am wondering how it got so much faster.


hawrot commented Jul 6, 2022

> @OSMasterSoohwan what do you mean by "big effect"? Inference speedup, right?
>
> Yes, I am wondering how it got so much faster.

Did you try running the pruned model through the DeepSparse engine?


dbogunowicz commented Jul 6, 2022

@OSMasterSoohwan, @hawrot makes a very good point.
The command you brought up:

python train.py --cfg ../models_v5.0/yolov5s.yaml --weights PATH_TO_COCO_PRETRAINED_WEIGHTS --data coco.yaml --hyp data/hyps/hyp.scratch.yaml --recipe ../recipes/yolov5s.pruned.md

would launch a training run that takes the original, dense YOLOv5s and applies the sparsification recipe, which prunes the weights of the network.

Once this is complete and you want to use the sparsified model for fast inference, you would need to export the trained weights to an ONNX file and then compile the model with the DeepSparse engine.
For YOLOv5, you should probably take a look at this doc: https://github.com/neuralmagic/deepsparse/blob/main/src/deepsparse/yolo/README.md
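
As a minimal sketch of that second step (the ONNX filename and the 1x3x640x640 float32 input are assumptions; the linked doc covers the YOLOv5-specific export and pipeline), compiling and running the exported model with the DeepSparse Python API looks roughly like this:

    import numpy as np
    from deepsparse import compile_model

    onnx_path = "yolov5s-pruned.onnx"  # hypothetical path to the exported ONNX file
    engine = compile_model(onnx_path, batch_size=1)

    # YOLOv5 typically takes a 1x3x640x640 float32 input; match whatever your export produced.
    dummy_input = [np.random.rand(1, 3, 640, 640).astype(np.float32)]
    outputs = engine.run(dummy_input)
    print([o.shape for o in outputs])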

If you have any pending questions, feel free to follow up.


ithmz commented Jul 13, 2022

@dbogunowicz can the exported ONNX model be converted to a Qualcomm DLC model to run on Qualcomm hardware-accelerated devices and still get the inference speed benefit?

@dbogunowicz

@tsangz189 I am not aware of any support on our side. We should probably ask the Qualcomm team whether their accelerator:

  • supports converting from ONNX to their supported format
  • if so, whether it also supports pruned/quantized models

@dbogunowicz

Closing due to inactivity.

@SunHaozhe

> I have been using SparseML for pruning YOLOv5 recently and I can see a big improvement in inference time; however, the model size stays the same. I have realised this happens because of unstructured pruning, which only zeroes out the weights rather than removing them.

@hawrot If I understand correctly, you saw reduced inference time even when doing unstructured pruning, right?

Does this inference time reduction happen only when you run the pruned model with the DeepSparse engine? Does it also happen when you run the pruned model with the regular PyTorch runtime?
