ssprofile

Work in progress. Proceed with caution!

This repo is an implementation of the ASP-DAC20 paper Black Box Search Space Profiling for Accelerator-Aware Neural Architecture Search.

This repo works closely with the aw_nas NAS framework. It takes the aw_nas YAML configuration files as inputs and outputs the optimized search space in cell_shared_primitives. aw_nas is required to make use of the YAML configuration files, but is not required for running the program in this repo.

Requires:

Python, >= 3.6, for f-strings
PyTorch
PyYAML, for loading and dumping YAML files
requests, for sending DPU ELF files to DPU

Usage

Demo

Set DEBUG=1 to see a demo of the program running, or set DEBUG_ACC=1 and DEBUG_LAT=1 to partially skip accuracy or latency profiling.

$ DEBUG=1 python ssprofile/main.py test.yaml
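
The sketch below shows one common way to read switches like these in Python. It is only an illustration: the helper name `read_debug_flags` and the rule that a flag is on only when its value is exactly `1` are assumptions, not taken from ssprofile/main.py.

```python
import os


def read_debug_flags(environ=None):
    """Return the three debug switches as booleans (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Assumption: a flag counts as enabled only when its value is exactly "1".
    return {
        name: env.get(name, "0") == "1"
        for name in ("DEBUG", "DEBUG_ACC", "DEBUG_LAT")
    }
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to exercise without touching the real environment.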

Not Demo

First, start the auto_deploy backend on your DPU server.

$ git clone http://192.168.3.224:8081/toolchain/auto_deploy.git # only works at Novauto
$ cd auto_deploy && python3 manage.py runserver 0.0.0.0:8055

You will need the Xilinx Vitis AI GPU docker image to quantize and compile models for DPU latency profiling.

On your working machine, inside the root folder of this repo, run

$ nvidia-docker run -ti -v `pwd`:`pwd` -w `pwd` \
        -p 127.0.0.1:80:8080/tcp \
        xilinx/vitis-ai:latest bash

Inside the Docker container

$ conda activate vitis-ai-caffe
$ conda install requests pyyaml # (add pytorch if it's not installed)

Check the command line options by

$ python ssprofile/main.py --help

Run the profiler

$ python ssprofile/main.py <path to your YAML> --gpu 0 --profile-dir <path to profile dir>

Workflow

Generate all SSBN models. Train or finetune them to get accuracy_table. Trained/finetuned PyTorch module state dicts and text representations are saved to <profile dir>/checkpoints.
Convert PyTorch models to caffemodels using pytorch2caffe, and save prototxts and caffemodels to <profile dir>/caffemodels.
Quantize (vai_q_caffe) and compile (vai_c_caffe) caffemodels. Quantized models are saved to <profile dir>/vitis/XXX_quantize. Compiled ELF files are saved to <profile dir>/vitis/XXX_compile.
Post ELF files to DPU server via HTTP and save the responses to <profile dir>/vitis/XXX.latency.txt.
Read latency results and get latency_table.
Get cell_shared_primitives from accuracy_table and latency_table.
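
To make the on-disk layout of the steps above easier to follow, here is a minimal sketch that maps a model name to the artifact locations the workflow mentions. The function name `profile_paths` is hypothetical, and it assumes that the `XXX` placeholder stands for a per-model name; the repo's own code may organize this differently.

```python
from pathlib import Path


def profile_paths(profile_dir, model_name):
    """Map a model name to the artifact locations listed in the workflow.

    Assumption: the XXX placeholder in the workflow is a per-model name.
    """
    root = Path(profile_dir)
    vitis = root / "vitis"
    return {
        "checkpoints": root / "checkpoints",               # state dicts and text representations
        "caffemodels": root / "caffemodels",               # prototxts and caffemodels
        "quantize": vitis / f"{model_name}_quantize",      # vai_q_caffe output
        "compile": vitis / f"{model_name}_compile",        # vai_c_caffe output (ELF files)
        "latency": vitis / f"{model_name}.latency.txt",    # saved DPU server response
    }
```

For example, `profile_paths("my_profile", "model0")["latency"]` points at `my_profile/vitis/model0.latency.txt`, which is where the saved response for that model would be expected under these assumptions.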

Known Issues:

pytorch2caffe -> vai_q_caffe -> vai_c_caffe toolchain still fails sometimes, especially for VGG-style networks.
Calibration images only have 32x32 resolution right now (inside calib_data, Vitis docs).
Other issues that I haven't noticed.
