qiaolian9/Pruner


Pruner: An Efficient Cross-Platform Tensor Compiler with Dual Awareness

A repo for Pruner: An Efficient Cross-Platform Tensor Compiler with Dual Awareness

  • The pipeline of Pruner is shown in the following figure.

[Figure: pipeline]

  • This repo is based on a fork of Tenset.

Installation

  1. Build and install this repo following the guide.

  2. For version information, refer to here.

  3. Register the device abstraction in Pruner, as described in abstraction.

Using Pruner on NVIDIA GPUs

For a quick tutorial on using Pruner on an NVIDIA A100, refer to tutorial.

Online cost model mode

Steps

  1. Search with Pruner w/o MTL (tuning 2,000 trials)
python3 tune_network.py --network resnet_50 --n-trials 2000 --cost-model pam --target "cuda --model=a100" --psa a100_40
  2. Search with Pruner (tuning 2,000 trials)
python3 tune_network.py --network resnet_50 --n-trials 2000 --cost-model pam-siamese-update --load-model pam_k80_1500.pkl --target "cuda --model=a100" --psa a100_40
  3. Search with Ansor (tuning 2,000 trials)
# Using TensetMLP code
python3 tune_network.py --network resnet_50 --n-trials 2000 --cost-model mlp --target "cuda --model=a100"

Offline cost model mode

Steps

  1. Pretrain or fine-tune a model; refer to tutorial.

  2. Search with Pruner w/ finetuned model (tuning 2,000 trials)

python3 tune_network.py --network resnet_50 --n-trials 2000 --cost-model pam-no-update --load-model pam_finetune.pkl --target "cuda --model=a100" --psa a100_40
  3. Search with TensetMLP (tuning 2,000 trials)
# Using TensetMLP code
python3 tune_network.py --network resnet_50 --n-trials 2000 --cost-model mlp-no-update --load-model mlp_finetune.pkl --target "cuda --model=a100"
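The five commands in the online and offline sections differ only in `--cost-model`, `--load-model`, and `--psa`. As a convenience, the sketch below rebuilds each command line from a small table so the runs can be scripted. It is not part of this repo: the names `CONFIGS`, `build_command`, and `format_command` are hypothetical, and only flags and values that appear in the commands above are used.

```python
import shlex

# Flags shared by every command shown in this README.
COMMON = ["python3", "tune_network.py", "--network", "resnet_50", "--n-trials", "2000"]
TARGET = "cuda --model=a100"

# Hypothetical lookup table: method -> (cost model, model file or None, --psa value or None).
# All values are copied from the commands in this README.
CONFIGS = {
    "Pruner w/o MTL": ("pam", None, "a100_40"),
    "Pruner": ("pam-siamese-update", "pam_k80_1500.pkl", "a100_40"),
    "Ansor": ("mlp", None, None),
    "Pruner w/ finetuned model": ("pam-no-update", "pam_finetune.pkl", "a100_40"),
    "TensetMLP": ("mlp-no-update", "mlp_finetune.pkl", None),
}


def build_command(name):
    """Return the argv list for one method, in the same flag order as the README."""
    cost_model, load_model, psa = CONFIGS[name]
    argv = COMMON + ["--cost-model", cost_model]
    if load_model is not None:
        argv += ["--load-model", load_model]
    argv += ["--target", TARGET]
    if psa is not None:
        argv += ["--psa", psa]
    return argv


def format_command(name):
    """Return the command as a single shell-quoted string."""
    return shlex.join(build_command(name))


if __name__ == "__main__":
    for name in CONFIGS:
        print(f"# {name}")
        print(format_command(name))
```

To launch a run instead of printing it, the list returned by `build_command` can be passed to `subprocess.run(argv, check=True)` from the directory that contains `tune_network.py`.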

Summary

| Method (tuning 2,000 trials) | Ansor | Pruner w/o MTL | Pruner | TensetMLP | Pruner w/ finetuned model |
| --- | --- | --- | --- | --- | --- |
| Update mode | online | online | online | offline | offline |
| Search time (s) | 6,691 | 5,563 | 4,978 | 5,469 | 4,212 |
| Estimated total latency (ms) | 1.592 | 1.476 | 1.457 | 1.621 | 1.469 |

Note: Details are reported in './scripts/res/resnet_50-nodeicml'
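To read the summary numbers as relative gains, the following sketch computes the search-time speedup and the latency reduction of one method against a baseline. The figures are copied from the summary above. The helper names and the choice of baselines (Ansor for the online methods, TensetMLP for the offline one) are assumptions made for illustration, not something this repo prescribes.

```python
# Search time (s) and estimated total latency (ms) from the summary
# (ResNet-50, 2,000 trials, A100).
RESULTS = {
    "Ansor": {"mode": "online", "search_s": 6691, "latency_ms": 1.592},
    "Pruner w/o MTL": {"mode": "online", "search_s": 5563, "latency_ms": 1.476},
    "Pruner": {"mode": "online", "search_s": 4978, "latency_ms": 1.457},
    "TensetMLP": {"mode": "offline", "search_s": 5469, "latency_ms": 1.621},
    "Pruner w/ finetuned model": {"mode": "offline", "search_s": 4212, "latency_ms": 1.469},
}


def search_speedup(method, baseline):
    """Baseline search time divided by the method's search time (>1 means faster)."""
    return RESULTS[baseline]["search_s"] / RESULTS[method]["search_s"]


def latency_reduction_pct(method, baseline):
    """Percentage by which the method lowers estimated latency relative to the baseline."""
    base = RESULTS[baseline]["latency_ms"]
    return 100.0 * (base - RESULTS[method]["latency_ms"]) / base


if __name__ == "__main__":
    # Assumed pairing: each Pruner variant against the baseline with the same update mode.
    pairs = [
        ("Pruner w/o MTL", "Ansor"),
        ("Pruner", "Ansor"),
        ("Pruner w/ finetuned model", "TensetMLP"),
    ]
    for method, baseline in pairs:
        speedup = search_speedup(method, baseline)
        reduction = latency_reduction_pct(method, baseline)
        print(f"{method} vs {baseline}: {speedup:.2f}x search speedup, "
              f"{reduction:.1f}% lower estimated latency")
```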

  • The ResNet-50 tuning curves for the different methods are shown below.

[Figure: R50_a100]

Tuning results for end-to-end workload benchmarks

  • For detailed tuning results, refer to E2E_Tuning_Comparison.

  • The following figure shows the search time Pruner requires to reach the performance that each of the other approaches achieves after tuning for 2,000 trials on the A100.

[Figure: compilertime_a100]
