
Rethinking Semantic Image Compression: Scalable Representation with Cross-modality Transfer

This is the official code for the paper "Rethinking Semantic Image Compression: Scalable Representation with Cross-modality Transfer". The proposed framework consists of three layers: the semantic layer, the structure layer, and the signal layer. This project provides the code for all three layers.

You can find the data at this link.

Dataset: Our dataset

Preprocessing the dataset:

  1. Extract the captions of the training and testing data.
  2. Split the data into training and testing sets.
  3. Resize the images to 256x256 (you can resize and center-crop them; see the sketch below).

Note: README.md in ./dataset provides more details.
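Step 3 can be done with a short Pillow script. A minimal sketch, assuming the raw images sit in ./dataset/CUB_200_2011/images/ (both directory names are illustrative placeholders; adjust them to your layout):

# Resize the shorter side to 256, then center-crop a 256x256 patch.
import os
from PIL import Image

src_dir = './dataset/CUB_200_2011/images/'             # raw images (placeholder)
dst_dir = './dataset/CUB_200_2011/train_image_resize/' # output directory (placeholder)
os.makedirs(dst_dir, exist_ok=True)

for root, _, files in os.walk(src_dir):
    for name in files:
        if not name.lower().endswith(('.jpg', '.jpeg', '.png')):
            continue
        img = Image.open(os.path.join(root, name)).convert('RGB')
        w, h = img.size
        scale = 256.0 / min(w, h)                      # shorter side -> 256
        img = img.resize((round(w * scale), round(h * scale)), Image.BICUBIC)
        w, h = img.size
        left, top = (w - 256) // 2, (h - 256) // 2     # center crop
        img.crop((left, top, left + 256, top + 256)).save(os.path.join(dst_dir, name))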

Caltech-UCSD Birds-200-2011 (CUB-200-2011)


1. The semantic layer:

1.1 Image2Text: Image Captioning

This code only provides training on the COCO dataset, so we need to write data processing code to handle the CUB dataset.

cd layer1

1.1.1. Train the model

CUDA_LAUNCH_BLOCKING=1 python train_bird.py \
  --model_path=../../ckpt/I2T/ \
  --image_list_dir=../../dataset/CUB_200_2011/filenames_train.pickle \
  --vocab_path=../../dataset/CUB_200_2011/captions.pickle \
  --image_dir=../../dataset/CUB_200_2011/train_image_resize/ \
  --caption_path=../../dataset/CUB_200_2011/text/

Argument            Description
--model_path        The path for saving models
--image_list_dir    The path of filenames_train.pickle
--vocab_path        The path of captions.pickle
--image_dir         The path of the resized training images
--caption_path      The path of the caption files
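If you are unsure what the pickle files contain, a quick inspection sketch helps (the list-of-filenames structure shown here is what AttnGAN-style preprocessing typically produces; verify it against your own files):

import pickle

# Inspect the image-list pickle; the path matches the training command above.
with open('../../dataset/CUB_200_2011/filenames_train.pickle', 'rb') as f:
    filenames = pickle.load(f)

print(type(filenames), len(filenames))
print(filenames[:3])  # typically a list of image file stems, one per training image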

1.1.2. Test a single image

python sample_bird.py --image='./example.png'

1.1.3. Test the whole dataset

python test_all_bird.py 

Note: you need to modify the paths in test_all_bird.py.

1.2. Text2Image: AttnGAN

We use the AttnGAN model to generate images; you can reproduce this step by following the README.md provided with AttnGAN.

Note: after training AttnGAN, you need to generate images for the entire training dataset, because the second layer is trained on the outputs of the first layer.

2. The structure layer:

cd layer2

2.1. Extract the RCF structure maps

Put bsds500_pascal_model.pth in ./layer2/RCF.

cd layer2/RCF

First, we need to generate the RCF edges of the training data for training layer 2.

python test_bird.py --checkpoint=bsds500_pascal_model.pth --save-dir=../../results/layer2/RCF_train/ --dataset=../../dataset/CUB_200_2011/train_image_resize/

Then, we need to generate the RCF edges of the testing data for evaluation.

python test_bird.py --checkpoint=bsds500_pascal_model.pth --save-dir=../../results/layer2/RCF_test/ --dataset=../../dataset/CUB_200_2011/test_image_resize/

Please note that the RCF edge maps must be inverted (255 - edge value) before compression; the correct edge-map appearance for our model is shown in Figure 1 of the paper.
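A minimal inversion sketch, assuming the RCF outputs are single-channel PNGs (the output directory name is a placeholder):

# Invert RCF edge maps: edges become dark strokes on a white background.
import os
import numpy as np
from PIL import Image

edge_dir = '../../results/layer2/RCF_train/'     # RCF outputs from the command above
out_dir = '../../results/layer2/RCF_train_inv/'  # placeholder output directory
os.makedirs(out_dir, exist_ok=True)

for name in os.listdir(edge_dir):
    if name.lower().endswith('.png'):
        edge = np.array(Image.open(os.path.join(edge_dir, name)).convert('L'), dtype=np.uint8)
        Image.fromarray(255 - edge).save(os.path.join(out_dir, name))

The inverted maps are then compressed with VTM, as described in the next subsection.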

2.2 Compress the RCF structure maps

After downloading the VTM software, build it:

cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j

Compressing the RCF structure maps involves four steps (steps 1 and 4 are sketched in the Python snippet after this list):

  1. Convert the RGB format into the YUV400 format.
  2. Compress the maps via VTM with the SCC configuration.
  3. Decompress the bin files.
  4. Convert the YUV400 format back to the RGB format.
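Steps 1 and 4 are straightforward for the structure maps: since they are grayscale, a YUV400 "sequence" is just the raw 8-bit luma plane. A minimal conversion sketch (the function names are illustrative):

# PNG <-> raw YUV400 (8-bit, monochrome) for VTM.
import numpy as np
from PIL import Image

def png_to_yuv400(png_path, yuv_path):
    # YUV400 has no chroma planes, so the file is just the raw luma bytes.
    y = np.array(Image.open(png_path).convert('L'), dtype=np.uint8)
    y.tofile(yuv_path)
    return y.shape  # (height, width), needed for SourceHeight/SourceWidth in input.cfg

def yuv400_to_png(yuv_path, png_path, width, height):
    y = np.fromfile(yuv_path, dtype=np.uint8).reshape(height, width)
    Image.fromarray(y).save(png_path)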

The encoding and decoding commands:

EncoderAppStatic -c ./cfg/encoder_intra_vtm.cfg -c ./cfg/per-class/classSCC.cfg -c input.cfg -i input_path -b bin_path -q qp
DecoderAppStatic -b bin_path -o output_path

where encoder_intra_vtm.cfg and classSCC.cfg can be found in the ./cfg directory. input.cfg contains the information about the input sequence; you can start from ./cfg/per-sequence/BasketballDrill.cfg and modify SourceWidth, SourceHeight, and FramesToBeEncoded, as in the sample below.
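For 256x256 structure maps, a minimal input.cfg might look like the following. Only SourceWidth, SourceHeight, and FramesToBeEncoded come from the instructions above; the remaining lines are assumptions carried over from the BasketballDrill.cfg template, so verify them against your VTM version:

InputFile             : input.yuv   # raw YUV400 file (illustrative name)
InputBitDepth         : 8           # assumption: 8-bit structure maps
InputChromaFormat     : 400         # assumption: monochrome (YUV400) input
FrameRate             : 30          # irrelevant for one still frame; template value
SourceWidth           : 256         # width of the structure map
SourceHeight          : 256         # height of the structure map
FramesToBeEncoded     : 1           # one image per "sequence"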

You need to compress and decompress the RCF structure maps for both the training and testing datasets.

2.3 Training the structure layer

python3 train_l2_gan.py \
  --train_imgs_path=../dataset/CUB_200_2011/train_image_resize/ \
  --train_caps_path=../dataset/CUB_200_2011/text/ \
  --train_style_path=../result/layer1_for_train/ \
  --train_pickle_path=../dataset/CUB_200_2011/filenames_train.pickle \
  --train_edge_path=./dataset/CUB_200_2011/VTM_encode_result/train/de_img_down_RCF/50/ \
  --label_str=basic

# --train_style_path: the outputs of layer 1 (the AttnGAN-generated training images)
# --train_edge_path: the decoded structure maps

2.4 Testing the structure layer


We provide the pretrained model, and you can download it: model

python3 train_l2_gan.py \
  --mode=test \
  --test_imgs_path=../dataset/CUB_200_2011/test_image_resize/ \
  --test_caps_path=../dataset/CUB_200_2011/text/ \
  --test_style_path=../result/layer1_for_test/ \
  --train_pickle_path=../dataset/CUB_200_2011/filenames_train.pickle \
  --test_edge_path=./dataset/CUB_200_2011/VTM_encode_result/test/de_img_down_RCF/50/ \
  --label_str=basic \
  --ckpt=../ckpt/layer2/model/layer2.pth.tar

# --test_edge_path: the decoded structure maps of the testing dataset
# --ckpt: the pretrained model checkpoint

3. The signal layer:

python3 main_codec.py \
  --model=L3_Codec_Hyperprior \
  --train_imgs_path=../dataset/CUB_200_2011/train_image_resize/ \
  --train_base_img_path=../result/layer1_for_train/ \
  --train_pickle_path=../dataset/CUB_200_2011/filenames_train.pickle \
  --train_edge_path=./dataset/CUB_200_2011/VTM_encode_result/train/de_img_down_RCF/50/

# --train_base_img_path: the outputs of layer 1 for the training dataset
# --train_edge_path: the decoded structure maps

We provide the pretrained model, and you can download it: model

python3 main_codec.py \
  --mode=test \
  --model=L3_Codec_Hyperprior \
  --test_imgs_path=../dataset/CUB_200_2011/test_image_resize/ \
  --test_base_img_path=../result/layer1_for_test/ \
  --train_pickle_path=../dataset/CUB_200_2011/filenames_train.pickle \
  --test_edge_path=../dataset/CUB_200_2011/VTM_encode_result/test/de_img_down_RCF/50/ \
  --ckpt=../ckpt/L3_Codec_Hyperprior/basic/1/best.pth.tar

# --test_edge_path: the decoded structure maps of the testing dataset
# --ckpt: the pretrained model checkpoint

BibTeX

@ARTICLE{Learning2022Zhang,
  author={Zhang, Pingping and Wang, Shiqi and Wang, Meng and Li, Jiguo and Wang, Xu and Kwong, Sam},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Rethinking Semantic Image Compression: Scalable Representation with Cross-modality Transfer}, 
  year={2023},
  volume={},
  number={},
  pages={}}
