First commit

fhkingma committed May 5, 2019
0 parents commit 2068c78
Showing 164 changed files with 36,456 additions and 0 deletions.
11 changes: 11 additions & 0 deletions .idea/bitswap (Friso Kingma's conflicted copy 2019-02-17).iml
11 changes: 11 additions & 0 deletions .idea/bitswap.iml
4 changes: 4 additions & 0 deletions .idea/encodings.xml
6 changes: 6 additions & 0 deletions .idea/inspectionProfiles/profiles_settings.xml
4 changes: 4 additions & 0 deletions .idea/misc (Friso Kingma's conflicted copy 2019-02-17).xml
4 changes: 4 additions & 0 deletions .idea/misc.xml
8 changes: 8 additions & 0 deletions .idea/modules.xml
6 changes: 6 additions & 0 deletions .idea/vcs.xml
730 changes: 730 additions & 0 deletions .idea/workspace (Friso Kingma's conflicted copy 2019-02-14).xml
634 changes: 634 additions & 0 deletions .idea/workspace (Friso Kingma's conflicted copy 2019-02-17).xml
746 changes: 746 additions & 0 deletions .idea/workspace (Friso Kingma's conflicted copy 2019-02-28).xml
815 changes: 815 additions & 0 deletions .idea/workspace (Friso Kingma's conflicted copy 2019-03-04).xml
795 changes: 795 additions & 0 deletions .idea/workspace (Friso Kingma's conflicted copy 2019-03-05).xml
966 changes: 966 additions & 0 deletions .idea/workspace (Friso Kingma's conflicted copy 2019-03-13).xml
506 changes: 506 additions & 0 deletions .idea/workspace (Friso Kingma's conflicted copy 2019-03-16).xml
890 changes: 890 additions & 0 deletions .idea/workspace.xml

169 changes: 169 additions & 0 deletions README.md
@@ -0,0 +1,169 @@
# Bit-Swap

Code for reproducing the results of [Bit-Swap: Practical Lossless Compression with Recursive Bits Back Coding](), appearing at ICML 2019.

The code is written by [Friso H. Kingma](https://www.linkedin.com/in/friso-kingma-b94496a0/). The paper is written by [Friso H. Kingma](https://www.linkedin.com/in/friso-kingma-b94496a0/), [Pieter Abbeel](https://people.eecs.berkeley.edu/~pabbeel/) and [Jonathan Ho](http://www.jonathanho.me/).

## Introduction
The "bits back" argument suggests that latent variable models can be turned into lossless compression schemes. Translating the "bits back" argument into efficient and practical lossless compression schemes for general latent variable models, however, is still an open problem. Bits-Back with Asymmetric Numeral Systems ([BB-ANS](https://github.com/bits-back/bits-back)) makes bits back coding practically feasible for latent variable models with one latent layer, but it is inefficient for hierarchical latent variable models. In the paper we propose Bit-Swap, a new compression scheme that generalizes BB-ANS and achieves strictly better compression rates for hierarchical latent variable models with Markov chain structure. Through experiments we verify that our proposed technique results in lossless compression rates that are empirically superior to existing techniques.
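
To convey the core idea, below is a toy sketch (not the coder used in this repository) of the difference in operation order between BB-ANS and Bit-Swap for a model with `nz` latent layers. The ANS state is modelled as a plain stack: decoding pops a symbol, encoding pushes one; the helper names and distribution labels are invented purely for illustration.

```
# Toy illustration of operation order only; NOT the actual ANS coder of this repository.

def decode(stack, dist):
    # stands in for ANS decoding of one symbol under distribution `dist`
    return stack.pop()

def encode(stack, symbol, dist):
    # stands in for ANS encoding of one symbol under distribution `dist`
    stack.append(symbol)

def bb_ans_encode(stack, x, nz):
    # BB-ANS decodes ALL latents z_1..z_nz up front, so the stack must already
    # hold nz symbols worth of "initial bits" before x is encoded.
    zs = [decode(stack, f"q(z{i}|...)") for i in range(1, nz + 1)]
    encode(stack, x, "p(x|z1)")
    for i in range(1, nz):
        encode(stack, zs[i - 1], f"p(z{i}|z{i+1})")
    encode(stack, zs[-1], f"p(z{nz})")

def bit_swap_encode(stack, x, nz):
    # Bit-Swap alternates decoding and encoding layer by layer, so the symbols
    # just pushed for layer i-1 can immediately be reused to decode layer i.
    z = decode(stack, "q(z1|x)")
    encode(stack, x, "p(x|z1)")
    for i in range(2, nz + 1):
        z_next = decode(stack, f"q(z{i}|z{i-1})")
        encode(stack, z, f"p(z{i-1}|z{i})")
        z = z_next
    encode(stack, z, f"p(z{nz})")

if __name__ == "__main__":
    bb_ans_encode(stack=list("abc"), x="x", nz=3)    # needs 3 initial symbols
    bit_swap_encode(stack=list("a"), x="x", nz=3)    # gets by with 1
```

In this stack picture the advantage of Bit-Swap is that far fewer initial bits are needed; see the paper for the precise analysis.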

## Overview
The repository consists of two main parts:
- Training of the variational-autoencoders
- Compression with Bit-Swap and BB-ANS using the trained models

Scripts for **training the models** on MNIST ([mnist_train.py]()), CIFAR-10 ([cifar_train.py]()) and ImageNet (32x32) ([imagenet_train.py]()) can be found in the subdirectory [model](). Scripts for **compression with Bit-Swap and BB-ANS** of MNIST ([mnist_compress.py]()), CIFAR-10 ([cifar_compress.py]()) and ImageNet (32x32) ([imagenet_compress.py]()) are in the top directory. The script for compression with the benchmark compressors ([benchmark_compress.py]()) and the script for discretization of the latent space ([discretization.py]()) are also in the top directory.
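
For orientation, a sketch of the layout implied above (data directories, checkpoints, utilities such as `utils/` and other files are omitted):

```
.
├── mnist_compress.py        # Bit-Swap / BB-ANS compression of MNIST
├── cifar_compress.py        # Bit-Swap / BB-ANS compression of CIFAR-10
├── imagenet_compress.py     # Bit-Swap / BB-ANS compression of ImageNet (32x32)
├── benchmark_compress.py    # benchmark compressors (gzip, bz2, lzma, PNG, WebP)
├── discretization.py        # discretization of the latent space
└── model/
    ├── mnist_train.py
    ├── cifar_train.py
    └── imagenet_train.py
```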

## Requirements
- Python (3.7)
- OpenMPI and Horovod (0.16.0)
- Numpy (1.15.4)
- PyTorch (1.0.0)
- Torchvision (0.2.1)
- Tensorflow (1.13.1)
- Tensorboard (1.13.1)
- TensorboardX (1.6)
- tqdm (4.28.1)
- Matplotlib (3.0.2)
- Scipy (1.1.0)
- Scikit-learn (0.20.1)

Run
```
pip install -r requirements.txt
```

Installation instructions for OpenMPI + Horovod are available on the [github page of Horovod](https://github.com/horovod/horovod).
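
As a rough sketch (the exact steps depend on your CUDA/NCCL setup, so follow the Horovod documentation), a GPU-enabled Horovod install typically looks like:

```
# assumes OpenMPI and NCCL are already installed
HOROVOD_GPU_ALLREDUCE=NCCL pip install horovod==0.16.0
```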

## Launch

### Model training
##### MNIST (on 1 GPU)
###### 8 latent layers
```
python mnist_train.py --nz=8 --width=61
```
###### 4 latent layers
```
python mnist_train.py --nz=4 --width=62
```
###### 2 latent layers
```
python mnist_train.py --nz=2 --width=63
```
###### 1 latent layer
```
python mnist_train.py --nz=1 --width=64
```
##### CIFAR-10 (on 8 GPUs with OpenMPI + Horovod)
###### 8 latent layers
```
mpiexec -np 8 python cifar_train.py --nz=8 --width=252
```
###### 4 latent layers
```
mpiexec -np 8 python cifar_train.py --nz=4 --width=254
```
###### 2 latent layers
```
mpiexec -np 8 python cifar_train.py --nz=2 --width=255
```
###### 1 latent layer
```
mpiexec -np 8 python cifar_train.py --nz=1 --width=256
```
##### ImageNet (32x32) (on 8 GPUs with OpenMPI + Horovod)
###### 4 latent layers
```
mpiexec -np 8 python imagenet_train.py --nz=4 --width=254
```
###### 2 latent layers
```
mpiexec -np 8 python imagenet_train.py --nz=2 --width=255
```
###### 1 latent layer
```
mpiexec -np 8 python imagenet_train.py --nz=1 --width=256
```
### Compression
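In each pair of commands below, `--bitswap=1` compresses with Bit-Swap and `--bitswap=0` compresses with BB-ANS on the same trained model, so the two schemes can be compared directly.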
##### MNIST
###### 8 latent layers
```
python mnist_compress.py --nz=8 --bitswap=1
```
```
python mnist_compress.py --nz=8 --bitswap=0
```
###### 4 latent layers
```
python mnist_compress.py --nz=4 --bitswap=1
```
```
python mnist_compress.py --nz=4 --bitswap=0
```
###### 2 latent layers
```
python mnist_compress.py --nz=2 --bitswap=1
```
```
python mnist_compress.py --nz=2 --bitswap=0
```
##### CIFAR-10
###### 8 latent layers
```
python cifar_compress.py --nz=8 --bitswap=1
```
```
python cifar_compress.py --nz=8 --bitswap=0
```
###### 4 latent layers
```
python cifar_compress.py --nz=4 --bitswap=1
```
```
python cifar_compress.py --nz=4 --bitswap=0
```
###### 2 latent layers
```
python cifar_compress.py --nz=2 --bitswap=1
```
```
python cifar_compress.py --nz=2 --bitswap=0
```
##### ImageNet (32x32)
###### 4 latent layers
```
python imagenet_compress.py --nz=4 --bitswap=1
```
```
python imagenet_compress.py --nz=4 --bitswap=0
```
###### 2 latent layers
```
python imagenet_compress.py --nz=2 --bitswap=1
```
```
python imagenet_compress.py --nz=2 --bitswap=0
```

### Benchmark compressors
```
python benchmark_compress.py
```

### Plots
##### Cumulative Moving Averages (CMA) of the compression results
```
python cma.py
```

##### Stack plot of the different latent layers
```
python stackplot.py
```


## Contact
Please contact Friso Kingma ([fhkingma@gmail.com](mailto:fhkingma@gmail.com)) if you have any questions.

## Credits and Acknowledgements
Empty file added __init__.py
Empty file.
110 changes: 110 additions & 0 deletions benchmark_compress.py
@@ -0,0 +1,110 @@
import io
import gzip
import bz2
import lzma
import numpy as np
from utils.torch.modules import ImageNet
import os

from torchvision import datasets, transforms
import PIL.Image as pimg

# code that applies benchmark compressors on the three datasets (MNIST, CIFAR-10 and ImageNet)
# heavily based on benchmark_compressors.py from https://github.com/bits-back/bits-back

# seed
np.random.seed(100)

class ToInt:
    def __call__(self, pic):
        return pic * 255

def mnist():
    transform_ops = transforms.Compose([transforms.ToTensor(), ToInt()])
    mnist = datasets.MNIST(root="model/data/mnist", train=False, transform=transform_ops, download=True)
    return mnist.test_data.numpy()

def cifar():
    transform_ops = transforms.Compose([transforms.ToTensor(), ToInt()])
    cifar = datasets.CIFAR10(root="model/data/cifar", train=False, transform=transform_ops, download=True)
    return cifar.test_data

def imagenet():
    transform_ops = transforms.Compose([transforms.ToTensor(), ToInt()])
    imagenet = ImageNet(root='model/data/imagenet/test', file='test.npy', transform=transform_ops)
    # np.save appends the .npy extension, so check for and load the saved file accordingly
    if not os.path.exists("bitstreams/imagenet/indices.npy"):
        randindices = np.random.choice(len(imagenet.dataset), size=(100, 100), replace=False)
        np.save("bitstreams/imagenet/indices", randindices)
    else:
        randindices = np.load("bitstreams/imagenet/indices.npy")
    randindices = randindices.reshape(-1)
    return imagenet.dataset[randindices]

def gzip_compress(images):
    images = np.packbits(images) if images.dtype is np.dtype(bool) else images
    assert images.dtype == np.dtype('uint8')
    return gzip.compress(images.tobytes())

def bz2_compress(images):
    images = np.packbits(images) if images.dtype is np.dtype(bool) else images
    assert images.dtype == np.dtype('uint8')
    return bz2.compress(images.tobytes())

def lzma_compress(images):
    images = np.packbits(images) if images.dtype is np.dtype(bool) else images
    assert images.dtype == np.dtype('uint8')
    return lzma.compress(images.tobytes())

def pimg_compress(format='PNG', **params):
    def compress_fun(images):
        compressed_data = bytearray()
        for n, image in enumerate(images):
            image = pimg.fromarray(image)
            img_bytes = io.BytesIO()
            image.save(img_bytes, format=format, **params)
            compressed_data.extend(img_bytes.getvalue())
        return compressed_data
    return compress_fun

def gz_and_pimg(images, format='PNG', **params):
    # pimg_compress returns a compression function: build it, apply it to the images, then gzip the result
    pimg_compressed_data = pimg_compress(format, **params)(images)
    return gzip.compress(pimg_compressed_data)

def bench_compressor(compress_fun, compressor_name, images, images_name):
    byts = compress_fun(images)
    n_bits = len(byts) * 8
    bitsperdim = n_bits / np.size(images)
    print(f"Dataset: {images_name}. Compressor: {compressor_name}. Rate: {bitsperdim:.2f} bits/dim.")

if __name__ == "__main__":
    # MNIST
    images = mnist()
    bench_compressor(gzip_compress, "gzip", images, 'MNIST')
    bench_compressor(bz2_compress, "bz2", images, 'MNIST')
    bench_compressor(lzma_compress, "lzma", images, 'MNIST')
    bench_compressor(
        pimg_compress("PNG", optimize=True), "PNG", images, 'MNIST')
    bench_compressor(
        pimg_compress('WebP', lossless=True, quality=100), "WebP", images, 'MNIST')
    print("")

    # CIFAR-10
    images = cifar()
    bench_compressor(gzip_compress, "gzip", images, 'CIFAR-10')
    bench_compressor(bz2_compress, "bz2", images, 'CIFAR-10')
    bench_compressor(lzma_compress, "lzma", images, 'CIFAR-10')
    bench_compressor(
        pimg_compress("PNG", optimize=True), "PNG", images, 'CIFAR-10')
    bench_compressor(
        pimg_compress('WebP', lossless=True, quality=100), "WebP", images, 'CIFAR-10')
    print("")

    # ImageNet
    images = imagenet()
    bench_compressor(gzip_compress, "gzip", images, 'ImageNet')
    bench_compressor(bz2_compress, "bz2", images, 'ImageNet')
    bench_compressor(lzma_compress, "lzma", images, 'ImageNet')
    bench_compressor(
        pimg_compress("PNG", optimize=True), "PNG", images, 'ImageNet')
    bench_compressor(
        pimg_compress('WebP', lossless=True, quality=100), "WebP", images, 'ImageNet')
Binary file added bitstreams/imagenet/indices.npy
