Existing work on activation compressed training (ACT) relies on searching for the optimal bit-width during DNN training to reduce quantization noise, which makes the procedure complicated and less transparent.
In our project, we make an instructive observation: DNN backward propagation mainly utilizes the low-frequency component (LFC) of the activation maps, while most of the memory is spent caching the high-frequency component (HFC) during training. This indicates that the HFC of activation maps is highly redundant and compressible during DNN training. To this end, we propose a simple and transparent framework to reduce the memory cost of DNN training, Dual ActIVation PrecISION (DIVISION). During training, DIVISION preserves a high-precision copy of the LFC and compresses the HFC into a lightweight copy with low numerical precision. This significantly reduces the memory cost without degrading the precision of DNN backward propagation, so the model maintains competitive accuracy.
The framework of DIVISION is shown in the following figure. After the feed-forward operation of each layer, DIVISION estimates the LFC and compresses the HFC into a low-precision copy, so the total memory cost is significantly reduced after compression. Before the backward propagation of each layer, the low-precision HFC is decompressed and combined with the LFC to reconstruct the activation map.
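The compress/decompress steps above can be illustrated with a minimal NumPy sketch. This is not the repository's implementation (which uses fused CUDA kernels); it assumes blockwise average pooling as the LFC estimator and uniform quantization of the residual as the low-precision HFC, both of which are illustrative choices.

```python
import numpy as np

def compress(act, block=4, bits=2):
    """Split an activation map into a full-precision LFC
    and a low-precision (quantized) HFC residual."""
    h, w = act.shape
    # LFC estimate: blockwise average pooling, kept in full precision
    lfc = act.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    # HFC: residual between the activation and the upsampled LFC
    lfc_up = np.repeat(np.repeat(lfc, block, axis=0), block, axis=1)
    hfc = act - lfc_up
    # uniform quantization of the HFC to 2**bits levels (here: 4 levels in uint8)
    lo, hi = hfc.min(), hfc.max()
    scale = (hi - lo) / (2 ** bits - 1) or 1.0
    q = np.round((hfc - lo) / scale).astype(np.uint8)
    return lfc, q, lo, scale

def decompress(lfc, q, lo, scale, block=4):
    """Reconstruct the activation map before backward propagation."""
    lfc_up = np.repeat(np.repeat(lfc, block, axis=0), block, axis=1)
    return lfc_up + (q.astype(np.float32) * scale + lo)

act = np.random.rand(8, 8).astype(np.float32)
lfc, q, lo, scale = compress(act)
rec = decompress(lfc, q, lo, scale)
# the LFC is exact, so the reconstruction error is bounded by the
# quantization step of the HFC residual
assert np.abs(rec - act).max() <= scale + 1e-6
```

The cached footprint drops from one float per element to one low-bit code per element plus a small pooled LFC, which is the source of DIVISION's memory saving.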
Compared with existing frameworks that integrate searching into learning, DIVISION has a much simpler compressor and decompressor, speeding up the ACT procedure. More importantly, it reveals which factor is compressible (HFC) and which is not (LFC) during DNN training, improving the transparency of ACT.
python >= 3.6
torch >= 1.10.2+cu113
torchvision >= 0.11.2+cu113
lmdb >= 1.3.0
pyarrow >= 6.0.1
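The `+cu113` wheels are hosted on the PyTorch wheel index rather than PyPI, so a plain `pip install torch` will not fetch them. Assuming a CUDA 11.3 machine, an installation along these lines should satisfy the requirements:

```shell
pip install torch==1.10.2+cu113 torchvision==0.11.2+cu113 \
    -f https://download.pytorch.org/whl/cu113/torch_stable.html
pip install "lmdb>=1.3.0" "pyarrow>=6.0.1"
```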
First, download the ImageNet dataset from image-net.org. Then, generate the LMDB-format ImageNet dataset by running:
cd data
python folder2lmdb.py -f [Your ImageNet folder] -s train
python folder2lmdb.py -f [Your ImageNet folder] -s split
cd ../
Converting to the LMDB format reduces the data-loading cost; training on the original folder-format dataset also works.
Generate the "*.so" file by running:
cd cpp_extension
python setup.py build_ext --inplace
cd ../
You should find "backward_func.cpython-36m-x86_64-linux-gnu.so", "calc_precision.cpython-36m-x86_64-linux-gnu.so", "minimax.cpython-36m-x86_64-linux-gnu.so", and "quantization.cpython-36m-x86_64-linux-gnu.so" in the "cpp_extension" folder (the "cpython-36m-x86_64-linux-gnu" tag depends on your Python version and platform).
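If the built files do not match the names above exactly, you can check the extension suffix your interpreter actually produces; `setuptools` names compiled extensions with the `EXT_SUFFIX` config variable, which encodes the CPython version and platform:

```python
import sysconfig

# e.g. ".cpython-36m-x86_64-linux-gnu.so" on CPython 3.6 / Linux x86-64;
# a different Python version or platform yields a different tag
suffix = sysconfig.get_config_var("EXT_SUFFIX")
print("expected extension filename:", "quantization" + suffix)
```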
Train a DNN using DIVISION by running one of the bash commands below:
bash scripts/resnet18_cifar10_division.sh
bash scripts/resnet164_cifar10_division.sh
bash scripts/densenet121_cifar100_division.sh
bash scripts/resnet164_cifar100_division.sh
bash scripts/resnet50_division.sh
bash scripts/densenet161_division.sh
Check the model accuracy and training log files.
Dataset | Architecture | Top-1 Validation Accuracy (DIVISION) | Top-1 Validation Accuracy (Normal Training) | Log File |
---|---|---|---|---|
CIFAR-10 | ResNet-18 | 94.7 | 94.9 | LOG |
CIFAR-10 | ResNet-164 | 94.5 | 94.9 | LOG |
CIFAR-100 | DenseNet-121 | 79.5 | 79.8 | LOG |
CIFAR-100 | ResNet-164 | 76.9 | 77.3 | LOG |
ImageNet | ResNet-50 | 75.9 | 76.2 | LOG |
ImageNet | DenseNet-161 | 77.6 | 77.6 | LOG |
The LMDB-format data loading is developed based on the open-source repo of Efficient-PyTorch. The CUDA kernel for activation map quantization is developed based on the open-source repo of ActNN.
Thanks to these teams for their contributions to the ML community!