
Structured pruning for YOLOv5 #925

Closed
hawrot opened this issue Jun 29, 2022 · 10 comments

hawrot commented Jun 29, 2022

I have been using SparseML for pruning YOLOv5 recently and I can see a big improvement in inference time; however, the model size stays the same. I have realised this happens because of unstructured pruning, which only zeroes out the weights rather than removing them.

I was wondering whether it is possible to implement structured pruning for YOLOv5.

@hawrot hawrot added the enhancement New feature or request label Jun 29, 2022
@dbogunowicz dbogunowicz self-assigned this Jun 30, 2022

dbogunowicz commented Jun 30, 2022

Hey @hawrot

Regarding model size:
Yes, this is true; the model size will not drop significantly with unstructured pruning (for both the .pth and ONNX representations, I believe). However, the model becomes much more efficient to compress, if that is something you care about (a large number of zero weights --> lower entropy of the data --> easier to compress).
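
As a minimal illustration (plain PyTorch and gzip, not SparseML APIs; the layer size and ~80% sparsity level are just placeholders), zeroing weights leaves the serialized size unchanged but makes the file far more compressible:

    import gzip
    import io
    import torch

    def saved_size(tensor):
        # Return (raw bytes, gzip-compressed bytes) of a serialized tensor.
        buf = io.BytesIO()
        torch.save(tensor, buf)
        raw = buf.getvalue()
        return len(raw), len(gzip.compress(raw))

    dense = torch.randn(512, 512)                  # dense weight matrix
    pruned = dense * (torch.rand(512, 512) > 0.8)  # ~80% of weights zeroed (unstructured)

    print("dense :", saved_size(dense))   # raw and gzip sizes are roughly the same
    print("pruned:", saved_size(pruned))  # raw size unchanged, gzip size much smaller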

Structured pruning of YOLOv5 is possible, but we haven't looked into it yet. It may be a bit tricky due to the architecture of YOLO (e.g. the "long" skip connections may force the network to prune the same channels along the depth of the network, which can be slightly problematic). If you are interested in trying, here are the structured pruning modifiers that you can incorporate into your recipe. We can provide further advice if needed.
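
To make the structured vs. unstructured distinction concrete, here is a generic PyTorch sketch (torch.nn.utils.prune, not the SparseML modifiers mentioned above; the layer shape and 50% amount are arbitrary) of filter-structured pruning, which zeroes whole output filters rather than individual weights:

    import torch
    import torch.nn.utils.prune as prune

    conv = torch.nn.Conv2d(64, 128, kernel_size=3)

    # Structured pruning: zero out entire output filters (dim=0), ranked by L2 norm,
    # instead of zeroing individual weights as unstructured pruning does.
    prune.ln_structured(conv, name="weight", amount=0.5, n=2, dim=0)

    # Whole filters are now all-zero, so in principle they could be physically removed,
    # shrinking this layer and the matching input channels of the layers that consume it.
    zeroed = (conv.weight.detach().abs().sum(dim=(1, 2, 3)) == 0).sum().item()
    print(f"{zeroed} of {conv.weight.shape[0]} filters zeroed")

Actually removing those filters is the step that YOLOv5's concatenations and skip connections make tricky, since the same channels have to disappear consistently everywhere they are reused.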

If size is what you care about, we would recommend using YOLO quantization (if not already employed), not only for a further speedup but also for a significant reduction in model file size.
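
As rough back-of-the-envelope arithmetic (the ~7.2M parameter count for YOLOv5s is approximate; check your own model), INT8 quantization cuts the weight storage by about 4x compared to FP32:

    params = 7.2e6               # ~7.2M parameters in YOLOv5s (approximate)
    fp32_mb = params * 4 / 1e6   # ~28.8 MB of weights at 32 bits each
    int8_mb = params * 1 / 1e6   # ~7.2 MB at 8 bits each, roughly a 4x reduction
    print(fp32_mb, int8_mb)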

@soohwanlim

I also pruned YOLOv5 with SparseML, but it didn't seem to have much effect.
Did you use:

python train.py --cfg ../models_v5.0/yolov5s.yaml --weights PATH_TO_COCO_PRETRAINED_WEIGHTS --data coco.yaml --hyp data/hyps/hyp.scratch.yaml --recipe ../recipes/yolov5s.pruned.md

I want to know how to get a big effect. Can you tell me?

@dbogunowicz

@OSMasterSoohwan what do you mean by "big effect"? Inference speedup, right?

@soohwanlim

> @OSMasterSoohwan what do you mean by "big effect"? Inference speedup, right?

Yes, I am wondering how it got so much faster.


hawrot commented Jul 6, 2022

> @OSMasterSoohwan what do you mean by "big effect"? Inference speedup, right?
>
> Yes, I am wondering how it got so much faster.

Did you try running the pruned model through the DeepSparse engine?


dbogunowicz commented Jul 6, 2022

@OSMasterSoohwan, @hawrot makes a very good point.
The command you brought up:

python train.py --cfg ../models_v5.0/yolov5s.yaml --weights PATH_TO_COCO_PRETRAINED_WEIGHTS --data coco.yaml --hyp data/hyps/hyp.scratch.yaml --recipe ../recipes/yolov5s.pruned.md

would launch a training run that takes the original, dense YOLOv5s and applies the sparsification recipe, which prunes the weights of the network.

Once this is complete and you want to use the sparsified model for fast inference, you would need to export the trained weights to an ONNX file and then compile the model with the DeepSparse engine.
For YOLOv5, you should probably take a look at this doc: https://github.com/neuralmagic/deepsparse/blob/main/src/deepsparse/yolo/README.md
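
As a minimal sketch of that second step (the ONNX filename and the 1x3x640x640 float32 input are assumptions; the linked doc covers the YOLOv5-specific export and pipeline), compiling and running the exported model with the DeepSparse Python API looks roughly like this:

    import numpy as np
    from deepsparse import compile_model

    onnx_path = "yolov5s-pruned.onnx"  # hypothetical path to the exported ONNX file
    engine = compile_model(onnx_path, batch_size=1)

    # YOLOv5 typically takes a 1x3x640x640 float32 input; match whatever your export produced.
    dummy_input = [np.random.rand(1, 3, 640, 640).astype(np.float32)]
    outputs = engine.run(dummy_input)
    print([o.shape for o in outputs])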

If you have any pending questions, feel free to follow up.


ithmz commented Jul 13, 2022

@dbogunowicz can the exported ONNX model be converted to a Qualcomm DLC model to run on Qualcomm hardware-accelerated devices and still get the inference speed benefit?

@dbogunowicz

@tsangz189 I am not aware of any support on our side. We should probably ask the Qualcomm team whether their accelerator:

  • supports converting from ONNX to their supported format
  • if so, whether it also supports pruned/quantized models

@dbogunowicz

Closing due to inactivity.

@SunHaozhe

> I have been using SparseML for pruning YOLOv5 recently and I can see a big improvement in inference time; however, the model size stays the same. I have realised this happens because of unstructured pruning, which only zeroes out the weights rather than removing them.

@hawrot If I understand correctly, you saw reduced inference time even when doing unstructured pruning, right?

Does this inference time reduction happen only when you run the pruned model with the DeepSparse engine? Does it also happen when you run the pruned model with the regular PyTorch runtime?
