Pruner: An Efficient Cross-Platform Tensor Compiler with Dual Awareness

A repo for Pruner: An Efficient Cross-Platform Tensor Compiler with Dual Awareness

The pipeline of Pruner is shown in the following figure.

This repo is based on a fork of Tenset.

Installation

Build and install this repo by following the guide.
For version information, refer to here.
Register the device abstraction on Pruner as described in abstraction.

Using Pruner on NVIDIA GPUs

For a quick tutorial on using Pruner on an NVIDIA A100, refer to tutorial.

Online cost model mode

Steps

Search with Pruner w/o MTL (tuning 2,000 trials)

python3 tune_network.py --network resnet_50 --n-trials 2000 --cost-model pam --target "cuda --model=a100" --psa a100_40

Search with the Pruner (tuning 2,000 trials)

python3 tune_network.py --network resnet_50 --n-trials 2000 --cost-model pam-siamese-update --load-model pam_k80_1500.pkl --target "cuda --model=a100" --psa a100_40

Search with Ansor (tuning 2,000 trials)

# Using TensetMLP code
python3 tune_network.py --network resnet_50 --n-trials 2000 --cost-model mlp --target "cuda --model=a100"

Offline cost model mode

Steps

Pretrain or fine-tune a model (refer to tutorial)
Search with the Pruner w/ finetuned model (tuning 2,000 trials)

python3 tune_network.py --network resnet_50 --n-trials 2000 --cost-model pam-no-update --load-model pam_finetune.pkl --target "cuda --model=a100" --psa a100_40

Search with the TensetMLP (tuning 2,000 trials)

# Using TensetMLP code
python3 tune_network.py --network resnet_50 --n-trials 2000 --cost-model mlp-no-update --load-model mlp_finetune.pkl --target "cuda --model=a100"
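
The five invocations listed for the online and offline modes differ only in the cost model, the optional pretrained checkpoint, and the optional `--psa` value. As a rough sketch, they can be assembled from one table of settings. The flags and values below are copied from the commands in this README; the dictionary keys and the `build_command` helper are names invented for this illustration and are not part of the repo.

```python
import shlex

# Arguments shared by every command in this README.
COMMON = ["python3", "tune_network.py", "--network", "resnet_50", "--n-trials", "2000"]
TARGET = "cuda --model=a100"

# Keys are hypothetical labels; the values mirror the README commands.
CONFIGS = {
    # Online cost model mode
    "pruner_wo_mtl": {"cost_model": "pam", "load_model": None, "psa": "a100_40"},
    "pruner": {"cost_model": "pam-siamese-update", "load_model": "pam_k80_1500.pkl", "psa": "a100_40"},
    "ansor": {"cost_model": "mlp", "load_model": None, "psa": None},
    # Offline cost model mode
    "pruner_finetuned": {"cost_model": "pam-no-update", "load_model": "pam_finetune.pkl", "psa": "a100_40"},
    "tensetmlp": {"cost_model": "mlp-no-update", "load_model": "mlp_finetune.pkl", "psa": None},
}


def build_command(name):
    """Return the argument list for one configuration (hypothetical helper)."""
    cfg = CONFIGS[name]
    cmd = COMMON + ["--cost-model", cfg["cost_model"]]
    if cfg["load_model"]:
        cmd += ["--load-model", cfg["load_model"]]
    cmd += ["--target", TARGET]
    if cfg["psa"]:
        cmd += ["--psa", cfg["psa"]]
    return cmd


if __name__ == "__main__":
    # Dry run: print each command instead of launching a tuning job.
    for name in CONFIGS:
        print(f"{name}: {shlex.join(build_command(name))}")
```

The argument lists can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `tune_network.py` to launch the runs one after another; printing them first makes it easy to compare against the commands above.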

Summary

Method (tuning 2,000 trials)	Ansor	Pruner w/o MTL	Pruner	TensetMLP	Pruner w/ finetuned model
Update mode	online	online	online	offline	offline
Search time (s)	6,691	5,563	4,978	5,469	4,212
Estimated total latency (ms)	1.592	1.476	1.457	1.621	1.469

Note: Details are reported in './scripts/res/resnet_50-nodeicml'
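
To read the summary as ratios rather than raw numbers, the short sketch below divides a baseline's search time and latency by each method's values. The figures are copied from the summary table; the `relative_to` helper and the choice of baselines (Ansor for the online methods, TensetMLP for the offline ones) are assumptions made for this illustration.

```python
# (search time in seconds, estimated total latency in ms), copied from the
# summary table for ResNet-50 with 2,000 trials on A100.
RESULTS = {
    "Ansor": (6691, 1.592),
    "Pruner w/o MTL": (5563, 1.476),
    "Pruner": (4978, 1.457),
    "TensetMLP": (5469, 1.621),
    "Pruner w/ finetuned model": (4212, 1.469),
}


def relative_to(baseline, results=RESULTS):
    """Return (search-time ratio, latency ratio) of the baseline over each method.

    A value above 1.0 means the method is faster than the baseline.
    """
    base_time, base_latency = results[baseline]
    return {
        name: (base_time / time_s, base_latency / latency_ms)
        for name, (time_s, latency_ms) in results.items()
    }


if __name__ == "__main__":
    for baseline in ("Ansor", "TensetMLP"):
        print(f"Relative to {baseline}:")
        for name, (time_ratio, latency_ratio) in relative_to(baseline).items():
            print(f"  {name}: search {time_ratio:.2f}x, latency {latency_ratio:.2f}x")
```

With these numbers, Pruner's search finishes in roughly 1.34x less time than Ansor's, and Pruner with the finetuned model in roughly 1.30x less time than TensetMLP's.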

The ResNet-50 tuning curves for the different methods are shown below.

Tuning result for end-to-end workload benchmark

For detailed tuning results, refer to E2E_Tuning_Comparison.
The following figure shows the search time Pruner requires to reach the performance that each of the other approaches attains after tuning for 2,000 trials on A100.
