Existing work on activation compressed training (ACT) relies on searching for the optimal bit-width during DNN training to reduce quantization noise, which makes the procedure complicated and less transparent.
In our project, we make an instructive observation: DNN backward propagation mainly utilizes the low-frequency component (LFC) of the activation maps, while most of the memory is spent caching the high-frequency component (HFC) during training. This indicates that the HFC of activation maps is highly redundant and compressible during DNN training. To this end, we propose a simple and transparent framework to reduce the memory cost of DNN training, Dual ActIVation PrecISION (DIVISION). During training, DIVISION preserves a high-precision copy of the LFC and compresses the HFC into a lightweight copy with low numerical precision. This significantly reduces the memory cost without degrading the precision of DNN backward propagation, so the model maintains competitive accuracy.
The framework of DIVISION is shown in the following figure. After the feed-forward operation of each layer, DIVISION estimates the LFC and compresses the HFC into a low-precision copy, so the total memory cost is significantly reduced after compression. Before the backward propagation of each layer, the low-precision HFC is decompressed and combined with the LFC to reconstruct the activation map.
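The compress/decompress steps above can be illustrated with a minimal NumPy sketch. This is not the repository's implementation (which uses fused CUDA kernels); it assumes blockwise average pooling as the LFC estimator and uniform quantization of the residual as the low-precision HFC, both of which are illustrative choices.

```python
import numpy as np

def compress(act, block=4, bits=2):
    """Split an activation map into a full-precision LFC
    and a low-precision (quantized) HFC residual."""
    h, w = act.shape
    # LFC estimate: blockwise average pooling, kept in full precision
    lfc = act.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    # HFC: residual between the activation and the upsampled LFC
    lfc_up = np.repeat(np.repeat(lfc, block, axis=0), block, axis=1)
    hfc = act - lfc_up
    # uniform quantization of the HFC to 2**bits levels (here: 4 levels in uint8)
    lo, hi = hfc.min(), hfc.max()
    scale = (hi - lo) / (2 ** bits - 1) or 1.0
    q = np.round((hfc - lo) / scale).astype(np.uint8)
    return lfc, q, lo, scale

def decompress(lfc, q, lo, scale, block=4):
    """Reconstruct the activation map before backward propagation."""
    lfc_up = np.repeat(np.repeat(lfc, block, axis=0), block, axis=1)
    return lfc_up + (q.astype(np.float32) * scale + lo)

act = np.random.rand(8, 8).astype(np.float32)
lfc, q, lo, scale = compress(act)
rec = decompress(lfc, q, lo, scale)
# the LFC is exact, so the reconstruction error is bounded by the
# quantization step of the HFC residual
assert np.abs(rec - act).max() <= scale + 1e-6
```

The cached footprint drops from one float per element to one low-bit code per element plus a small pooled LFC, which is the source of DIVISION's memory saving.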
Compared with existing frameworks that integrate searching into learning, DIVISION has a much simpler compressor and decompressor, speeding up the ACT procedure. More importantly, it reveals which factor is compressible (HFC) and which is not (LFC) during DNN training, improving the transparency of ACT.
python >= 3.6
torch >= 1.10.2+cu113
torchvision >= 0.11.2+cu113
lmdb >= 1.3.0
pyarrow >= 6.0.1
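The `+cu113` wheels are hosted on the PyTorch wheel index rather than PyPI, so a plain `pip install torch` will not fetch them. Assuming a CUDA 11.3 machine, an installation along these lines should satisfy the requirements:

```shell
pip install torch==1.10.2+cu113 torchvision==0.11.2+cu113 \
    -f https://download.pytorch.org/whl/cu113/torch_stable.html
pip install "lmdb>=1.3.0" "pyarrow>=6.0.1"
```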
First, download the ImageNet dataset from image-net.org. Then, generate the LMDB-format ImageNet dataset by running:
cd data
python folder2lmdb.py -f [Your ImageNet folder] -s train
python folder2lmdb.py -f [Your ImageNet folder] -s split
cd ../
Converting to the LMDB format reduces the data-loading cost; training on the original folder-format dataset also works.
Generate the "*.so" file by running:
cd cpp_extension
python setup.py build_ext --inplace
cd ../
You should find "backward_func.cpython-36m-x86_64-linux-gnu.so", "calc_precision.cpython-36m-x86_64-linux-gnu.so", "minimax.cpython-36m-x86_64-linux-gnu.so", and "quantization.cpython-36m-x86_64-linux-gnu.so" in the "cpp_extension" folder (the "cpython-36m-x86_64-linux-gnu" tag depends on your Python version and platform).
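If the built files do not match the names above exactly, you can check the extension suffix your interpreter actually produces; `setuptools` names compiled extensions with the `EXT_SUFFIX` config variable, which encodes the CPython version and platform:

```python
import sysconfig

# e.g. ".cpython-36m-x86_64-linux-gnu.so" on CPython 3.6 / Linux x86-64;
# a different Python version or platform yields a different tag
suffix = sysconfig.get_config_var("EXT_SUFFIX")
print("expected extension filename:", "quantization" + suffix)
```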
Train a DNN using DIVISION by running one of the bash commands below:
bash scripts/resnet18_cifar10_division.sh
bash scripts/resnet164_cifar10_division.sh
bash scripts/densenet121_cifar100_division.sh
bash scripts/resnet164_cifar100_division.sh
bash scripts/resnet50_division.sh
bash scripts/densenet161_division.sh
Check the model accuracy and training log files.
Dataset | Architecture | Top-1 Validation Accuracy (DIVISION) | Top-1 Validation Accuracy (Normal Training) | Log File |
---|---|---|---|---|
CIFAR-10 | ResNet-18 | 94.7 | 94.9 | LOG |
CIFAR-10 | ResNet-164 | 94.5 | 94.9 | LOG |
CIFAR-100 | DenseNet-121 | 79.5 | 79.8 | LOG |
CIFAR-100 | ResNet-164 | 76.9 | 77.3 | LOG |
ImageNet | ResNet-50 | 75.9 | 76.2 | LOG |
ImageNet | DenseNet-161 | 77.6 | 77.6 | LOG |
The LMDB-format data loading is developed based on the open-source repo of Efficient-PyTorch. The CUDA kernel for activation map quantization is developed based on the open-source repo of ActNN.
Thanks to these teams for their contributions to the ML community!