
DIVISION: Memory Efficient Training via Dual Activation Precision

About This Work

Research Motivation

Existing work on activation compressed training (ACT) relies on searching for the optimal bit-width during DNN training to reduce quantization noise, which makes the procedure complicated and less transparent.

In our project, we make an instructive observation: DNN backward propagation mainly utilizes the low-frequency component (LFC) of the activation maps, while the majority of training memory goes to caching the high-frequency component (HFC). This indicates that the HFC of activation maps is highly redundant and compressible during DNN training. To this end, we propose a simple and transparent framework to reduce the memory cost of DNN training, Dual ActIVation PrecISION (DIVISION). During training, DIVISION preserves a high-precision copy of the LFC and compresses the HFC into a lightweight copy with low numerical precision. This significantly reduces the memory cost without harming the precision of DNN backward propagation, so DIVISION maintains competitive model accuracy.

DIVISION Framework

The framework of DIVISION is shown in the figure below. After the feed-forward operation of each layer, DIVISION estimates the LFC and compresses the HFC into a low-precision copy, so that the total memory cost is significantly reduced. Before the backward propagation of each layer, the low-precision HFC is decompressed and combined with the LFC to reconstruct the activation map.
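For intuition, here is a minimal PyTorch sketch of the compress/decompress idea. It assumes average pooling as the LFC estimator and uniform low-bit quantization of the HFC residual; the repository implements these steps as fused CUDA kernels (see the cpp_extension folder below).

```python
import torch
import torch.nn.functional as F

def compress(h, block_size=4, bits=2):
    """Split an activation map h (N, C, H, W) into a high-precision LFC
    and a low-precision (quantized) HFC copy."""
    # LFC: per-block average, kept in full precision but 1/block_size^2 the size.
    lfc = F.avg_pool2d(h, block_size)
    # HFC: residual between the activation and the upsampled LFC.
    hfc = h - F.interpolate(lfc, scale_factor=block_size, mode="nearest")
    # Uniform b-bit quantization of the HFC residual.
    qmax = 2 ** bits - 1
    mn, mx = hfc.min(), hfc.max()
    scale = (mx - mn).clamp(min=1e-8) / qmax
    q = torch.round((hfc - mn) / scale).to(torch.uint8)  # lightweight copy
    return lfc, q, mn, scale

def decompress(lfc, q, mn, scale, block_size=4):
    """Reconstruct the activation map before backward propagation."""
    hfc = q.float() * scale + mn
    return F.interpolate(lfc, scale_factor=block_size, mode="nearest") + hfc
```

With block_size=4 and bits=2, the LFC costs the equivalent of 2 bits per element (FP32 at 1/16 the spatial size) and the HFC logically 2 bits per element, versus 32 bits per element for normal training. Note the sketch stores the quantized values as uint8 for simplicity; a bit-packed layout would realize the full compression.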

Advantages of DIVISION

Compared with existing frameworks that integrate bit-width searching into learning, DIVISION has a much simpler compressor and decompressor, which speeds up the ACT procedure. More importantly, it reveals which factor of the activation maps is compressible (the HFC) and which is not (the LFC) during DNN training, improving the transparency of ACT.

Dependency

python >= 3.6
torch >= 1.10.2+cu113
torchvision >= 0.11.2+cu113
lmdb >= 1.3.0
pyarrow >= 6.0.1
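For reference, the following pip commands install compatible versions (the CUDA 11.3 wheel index is the standard PyTorch one for torch 1.10.2; adjust it to your CUDA setup):

```bash
pip install torch==1.10.2+cu113 torchvision==0.11.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
pip install "lmdb>=1.3.0" "pyarrow>=6.0.1"
```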

Run this Repo

Prepare the ImageNet dataset

First, download the ImageNet dataset from image-net.org. Then, generate the LMDB-format ImageNet dataset by running:

```bash
cd data
python folder2lmdb.py -f [Your ImageNet folder] -s train
python folder2lmdb.py -f [Your ImageNet folder] -s val
cd ../
```

Converting to LMDB format reduces the data-loading (communication) cost; using the original folder-format dataset also works.
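As a sanity check, a single record can be read back from the generated LMDB file. This sketch assumes the pyarrow-serialized (image bytes, label) record layout and integer-string keys of the Efficient-PyTorch reference implementation that folder2lmdb.py is based on; the file name train.lmdb is likewise an assumption.

```python
import io
import lmdb
import pyarrow as pa
from PIL import Image

# Open the generated database read-only (file name assumed).
env = lmdb.open("train.lmdb", readonly=True, lock=False, readahead=False)
with env.begin(write=False) as txn:
    raw = txn.get(b"0")  # key layout assumed: sample index as ASCII bytes

img_bytes, label = pa.deserialize(raw)  # record layout assumed
image = Image.open(io.BytesIO(img_bytes)).convert("RGB")
print(image.size, label)
```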

Generate the CUDA executable (*.so) files

Generate the "*.so" files by running:

```bash
cd cpp_extension
python setup.py build_ext --inplace
cd ../
```

You should find "backward_func.cpython-36m-x86_64-linux-gnu.so", "calc_precision.cpython-36m-x86_64-linux-gnu.so", "minimax.cpython-36m-x86_64-linux-gnu.so", and "quantization.cpython-36m-x86_64-linux-gnu.so" in the "cpp_extension" folder (the "cpython-36m-x86_64-linux-gnu" tag varies with your Python version and platform).
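To verify the build, the four modules can be imported from inside the cpp_extension folder (module names follow directly from the .so file names above):

```python
# Run from inside the cpp_extension folder after building.
import backward_func
import calc_precision
import minimax
import quantization

print("All four CUDA extensions imported successfully.")
```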

Train a deep neural network via DIVISION

Train a DNN using DIVISION by running one of the bash commands below:

```bash
bash scripts/resnet18_cifar10_division.sh
bash scripts/resnet164_cifar10_division.sh
bash scripts/densenet121_cifar100_division.sh
bash scripts/resnet164_cifar100_division.sh
bash scripts/resnet50_division.sh
bash scripts/densenet161_division.sh
```

Check the model accuracy and the training log files; our results are summarized below.

| Dataset | Architecture | Top-1 Validation Accuracy (DIVISION) | Top-1 Validation Accuracy (Normal Training) | Log file |
| --- | --- | --- | --- | --- |
| CIFAR-10 | ResNet-18 | 94.7 | 94.9 | LOG |
| CIFAR-10 | ResNet-164 | 94.5 | 94.9 | LOG |
| CIFAR-100 | DenseNet-121 | 79.5 | 79.8 | LOG |
| CIFAR-100 | ResNet-164 | 76.9 | 77.3 | LOG |
| ImageNet | ResNet-50 | 75.9 | 76.2 | LOG |
| ImageNet | DenseNet-161 | 77.6 | 77.6 | LOG |

Reproduce our experiment results

Result figures cover Model Accuracy, Training Memory Cost, Training Throughput, and an Overall Evaluation.

Acknowledgment

The LMDB-format data loading is developed based on the open-source Efficient-PyTorch repo. The CUDA kernel for activation map quantization is developed based on the open-source ActNN repo.

Thanks to those teams for their contributions to the ML community!
