Work in progress. Proceed with caution!
This repo is an implementation of the ASP-DAC20 paper "Black Box Search Space Profiling for Accelerator-Aware Neural Architecture Search".
This repo works closely with the aw_nas NAS framework. It takes the aw_nas YAML configuration files as inputs and outputs the optimized search space in cell_shared_primitives. aw_nas is required to make use of the YAML configuration files, but is not required for running the program in this repo.
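As a rough illustration of that input/output contract, the sketch below writes a profiled result back into a config. It assumes the config has already been loaded into a plain dict (for example with yaml.safe_load) and that cell_shared_primitives sits under a search_space_cfg key. The key path, the helper name set_cell_shared_primitives, and the sample primitive names are assumptions for illustration, not taken from this repo.

```python
import copy


def set_cell_shared_primitives(config, primitives_per_cell):
    """Return a copy of `config` with the optimized search space filled in.

    `config` is a dict such as yaml.safe_load would return for a config file.
    The key path search_space_cfg -> cell_shared_primitives is an assumption.
    """
    new_config = copy.deepcopy(config)  # leave the caller's config untouched
    space_cfg = new_config.setdefault("search_space_cfg", {})
    # One list of primitive names per cell group.
    space_cfg["cell_shared_primitives"] = [list(p) for p in primitives_per_cell]
    return new_config


# Hypothetical config fragment and profiling result.
config = {"search_space_type": "cnn", "search_space_cfg": {"num_cell_groups": 2}}
result = set_cell_shared_primitives(
    config,
    [["sep_conv_3x3", "skip_connect"], ["sep_conv_5x5", "max_pool_3x3"]],
)
```

The returned dict could then be written back to a file with yaml.safe_dump.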
Requires:
- Python, >= 3.6, for f-strings
- PyTorch
- PyYAML, for loading and dumping YAML files
- requests, for sending DPU ELF files to the DPU server
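A quick way to check these requirements before running is sketched below. torch, yaml and requests are the import names of the packages listed above; the helper name missing_requirements is made up for this example and is not part of this repo.

```python
import importlib.util
import sys

# Import names of the packages listed above (PyYAML is imported as `yaml`).
REQUIRED_MODULES = ("torch", "yaml", "requests")


def missing_requirements(modules=REQUIRED_MODULES, min_python=(3, 6)):
    """Return a list of human-readable problems; empty if all checks pass."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python >= %d.%d is required" % min_python)
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append("missing module: " + name)
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```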
You can set DEBUG=1 to see a demo run of the program. Alternatively, set DEBUG_ACC=1 or DEBUG_LAT=1 to partially skip accuracy or latency profiling, respectively.
$ DEBUG=1 python ssprofile/main.py test.yaml
First, start the auto_deploy backend on your DPU server.
$ git clone http://192.168.3.224:8081/toolchain/auto_deploy.git # only works at Novauto
$ cd auto_deploy && python3 manage.py runserver 0.0.0.0:8055
You will need the Xilinx Vitis AI GPU docker image to quantize and compile models for DPU latency profiling.
On your working machine, inside the root folder of this repo, run:
$ nvidia-docker run -ti -v `pwd`:`pwd` -w `pwd` \
-p 127.0.0.1:80:8080/tcp \
xilinx/vitis-ai:latest bash
Inside the docker container:
$ conda activate vitis-ai-caffe
$ conda install requests pyyaml # (add pytorch if it's not installed)
Check the command line options with:
$ python ssprofile/main.py --help
Run the profiler:
$ python ssprofile/main.py <path to your YAML> --gpu 0 --profile-dir <path to profile dir>
The profiler then runs the following steps:
- Generate all SSBN models. Train or finetune them to get accuracy_table. Trained/finetuned PyTorch module state dicts and text representations are saved to <profile dir>/checkpoints.
- Convert PyTorch models to caffemodels using pytorch2caffe, and save prototxts and caffemodels to <profile dir>/caffemodels.
- Quantize (vai_q_caffe) and compile (vai_c_caffe) caffemodels. Quantized models are saved to <profile dir>/vitis/XXX_quantize. Compiled ELF files are saved to <profile dir>/vitis/XXX_compile.
- Post ELF files to the DPU server via HTTP and save the responses to <profile dir>/vitis/XXX.latency.txt.
- Read latency results and get latency_table.
- Get cell_shared_primitives from accuracy_table and latency_table.
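The output locations named in the steps above can be summarized in a small helper. This is only a sketch: the function name artifact_paths and the dictionary keys are made up here, and XXX is taken to stand for a model name, which is an assumption about that placeholder.

```python
from pathlib import Path


def artifact_paths(profile_dir, model_name):
    """Map each pipeline stage to where its output for one model is saved.

    Hypothetical helper; `model_name` replaces the XXX placeholder.
    """
    root = Path(profile_dir)
    vitis = root / "vitis"
    return {
        "checkpoints": root / "checkpoints",
        "caffemodels": root / "caffemodels",
        "quantize": vitis / (model_name + "_quantize"),
        "compile": vitis / (model_name + "_compile"),
        "latency": vitis / (model_name + ".latency.txt"),
    }


paths = artifact_paths("profile", "ssbn_0")
for stage, path in paths.items():
    print(stage, "->", path.as_posix())
```

Keeping the layout in one place like this makes it easier to check which stage failed by looking for the first missing output.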
Known issues:
- The pytorch2caffe -> vai_q_caffe -> vai_c_caffe toolchain still fails sometimes, especially for VGG-style networks.
- Calibration images only have 32x32 resolution right now (inside calib_data; see the Vitis docs).
- Other issues that I haven't noticed.