Image compression and reconstruction using pretrained autoencoder and VQGAN first-stage models from the latent diffusion and taming transformers repos. Model code and configs are copied from these repos with unnecessary parts removed.
Saves the autoencoding model's encoded output in a compressed format. This output is passed to the decoder on the receiver side to reconstruct a lossy compressed version of the original image. Depending on the chosen settings of the autoencoding model, either the encoded output or, in the case of VQGAN, its indices can be saved and used for reconstruction.
To save VRAM and avoid extra processing, only the encoder or the decoder weights are loaded, depending on whether the task is compression or decompression. Training code has been removed, but the scripts should still be able to load models trained with the original repos.
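Loading only one half of the checkpoint might look like the sketch below. The `encoder.`/`decoder.` key prefixes follow the usual latent-diffusion/taming layout and are assumptions, not necessarily this repo's exact code.

```python
# Sketch: keep only the needed half of a checkpoint's state dict so the
# unused half never occupies VRAM. Key prefixes are assumptions based on
# the common latent-diffusion/taming layout.

def filter_state_dict(state_dict, task):
    """Keep encoder weights for compression, decoder weights for decompression."""
    prefix = "encoder." if task == "compress" else "decoder."
    return {k: v for k, v in state_dict.items() if k.startswith(prefix)}

full = {
    "encoder.conv_in.weight": "...",
    "decoder.conv_out.weight": "...",
}
print(sorted(filter_state_dict(full, "compress")))  # ['encoder.conv_in.weight']
```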
Compressed data is saved in safetensors format. If a batch size larger than 1 is used for compression, each output file contains the encode output tensor for the whole batch.
`vq-f4`, `vq-f8`, `kl-f4`, and `kl-f8` configs provide the best reconstruction results.
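For intuition, the latent size follows directly from the downsampling factor in the config name; the z-channel counts used below are the usual latent-diffusion defaults and are assumptions:

```python
def latent_shape(img_size, factor, z_channels):
    # An f-N model downsamples each spatial side by N; e.g. kl-f8 maps a
    # 3x384x384 image to a 4x48x48 latent. z-channel counts (4 for kl-f8,
    # 3 for vq-f4) are the usual latent-diffusion defaults, assumed here.
    return z_channels, img_size // factor, img_size // factor

print(latent_shape(384, 8, 4))  # (4, 48, 48)
print(latent_shape(512, 4, 3))  # (3, 128, 128)
```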
Compressing with a VQGAN model by removing `--kl` and adding `--vq_ind` for `vq-f4` or `vq-f8` should provide the best compression ratio. Further compressing the saved output with a zip program may give better quality at a smaller file size than JPEG with quality reduced to around 60 percent. A good-quality pretrained `vq-f8` reconstruction model followed by zip compression may give the best results in terms of file size.
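The extra zip step can be sketched with the standard library; the member name is a hypothetical placeholder:

```python
import io
import zipfile

def zip_payload(payload, name="encoded.safetensors"):
    # Wrap the saved encode output in a deflate-compressed zip archive;
    # deflate often shrinks the saved file further.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()

data = bytes(4096)  # highly compressible stand-in for a real latent file
print(len(zip_payload(data)) < len(data))  # True
```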
When using `--vq_ind`, also setting `--ind_bit` to `8` gives the most compressed output, though not the best quality. It will not work with most configs, since the index values need to be in the range 0-255. Only a config with a 256-entry codebook (such as `vq-f8-n256`) and its associated model will work with `--ind_bit` set to `8`.
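The 0-255 constraint comes from storing each index in a single unsigned byte; a minimal sketch of the packing and range check (not the repo's actual code):

```python
from array import array

def pack_indices(indices, ind_bit):
    # uint8 holds values 0-255 only, so 8-bit packing works just for a
    # model whose codebook has at most 256 entries; otherwise fall back
    # to int16 (2 bytes per index).
    if ind_bit == 8:
        if not all(0 <= i <= 255 for i in indices):
            raise ValueError("indices exceed uint8 range; use --ind_bit 16")
        return array("B", indices).tobytes()
    return array("h", indices).tobytes()

idx = [0, 17, 255]
print(len(pack_indices(idx, 8)), len(pack_indices(idx, 16)))  # 3 6
```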
Run the following command in the folder containing `setup.py` before using the library:

```shell
pip install -e .
```

KL compress:

```shell
python compression.py -s "SRC_PATH" -d "DEST_PATH" --cfg "CONFIG_YAML_PATH" --ckpt "VAE_CKPT_PATH" --kl --batch 2 --img_size 384
```

VQ compress with indices:

```shell
python compression.py -s "SRC_PATH" -d "DEST_PATH" --cfg "CONFIG_YAML_PATH" --ckpt "VAE_CKPT_PATH" --batch 1 --img_size 512 --vq_ind --ind_bit 16
```

KL decompress:

```shell
python compression.py -s "SRC_PATH" -d "DEST_PATH" --cfg "CONFIG_YAML_PATH" --ckpt "VAE_CKPT_PATH" --kl --dc
```

VQ decompress with indices:

```shell
python compression.py -s "SRC_PATH" -d "DEST_PATH" --cfg "CONFIG_YAML_PATH" --ckpt "VAE_CKPT_PATH" --dc --vq_ind
```
If the `--dc` flag is provided, the script runs decompression; otherwise it compresses the input.
`--aspect` resizes the image keeping the aspect ratio, with the smaller dimension set to `--img_size`. May fail for large images that do not fit in GPU memory.
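The resize arithmetic can be sketched as follows (helper name is hypothetical):

```python
def aspect_resize_dims(width, height, img_size):
    # Scale both sides by the same factor so the smaller side lands
    # exactly on img_size; the larger side keeps the aspect ratio.
    scale = img_size / min(width, height)
    return round(width * scale), round(height * scale)

print(aspect_resize_dims(1920, 1080, 384))  # (683, 384)
```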
`--ind_bit` with possible values `8` or `16` saves VQGAN indices as uint8 or int16, reducing the compressed output file size. Only needed for compression.
`--xformers` uses xformers if available to reduce memory consumption; it may also increase speed.
`--float16` processes in float16 precision to reduce memory consumption.
Currently 3 types of data compression are available:
- With `--kl`, the KL autoencoder pretrained model's encode output is saved.
- If `--kl` is not specified, the VQGAN encode output is saved.
- If `--vq_ind` is specified, the quantized indices are saved instead. These are used to reconstruct the image.
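The flag logic above can be summarized in a small dispatch sketch (function name and labels are illustrative, not values used by the script):

```python
def compression_mode(kl, vq_ind):
    # Mirror of the flag logic described above.
    if kl:
        return "kl-latent"   # KL autoencoder encode output
    if vq_ind:
        return "vq-indices"  # quantized codebook indices
    return "vq-latent"       # VQGAN encode output

print(compression_mode(kl=False, vq_ind=True))  # vq-indices
```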
Original configs can be found here. More weights can be found in the latent diffusion repo. The ru-dalle `vq-f8-gumbel` model trained with the taming transformers repo can also be used.
For `kl-f8`, the stable diffusion VAE ckpt can be used. It gives 8x downsampling.
- https://huggingface.co/stabilityai/sd-vae-ft-ema-original/tree/main
- https://huggingface.co/stabilityai/sd-vae-ft-mse-original/tree/main
For `kl-f4` config,
For `vq-f4` config,
The following may provide better compression rates, but there may be noticeable degradation in the reconstructed images.
For `vq-f8` config,
For `vq-f8-n256` config,
For `kl-f16` config,
For `kl-f32` config,
For `vq-f8-gumbel` config,
For `vq-f8-rudalle` config,