Skip to content

resc2801/pyDAC

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pyDAC

Upload Python Package

pyDAC (python Directly Addressable Codes) offers a variable-length encoding scheme for (unsigned) integers with random access to any element of the encoded sequence.

In terms of compression, a DAC structure is very likely to outperform standard base-128 compression schemes aka VByte, Varint, VInt, EncInt etc..

As a bonus, a DAC structure gives to random access to each and every sequence element without any decoding.

Installation

Install from PyPi using

pip install pyDAC

Usage

from pyDAC import DAC

imports the module.

import random 
from pyDAC import DAC

values = random.sample(range(2**32), 10**7)
encoded_values = DAC(iter(values))

creates a DAC structure encoded_values for the values sequence.

Access

The ith element from the original values sequence can be retrieved from a DAC structure encoded_values using the subscript operator

for i in range(len(values)):
    assert values[i] == encoded_values[i]

A DAC structure encoded_values is also iterable.

You can easily loop through the stored elements stored

dac_iter = iter(encoded_values)
while True:
    try:
        val = next(dac_iter)
    except StopIteration:
        break  # Iterator exhausted: stop the loop
    else:
        print(val)

or return all stored elements at once

assert values == list(iter(encoded_values))

Miscellaneous

A DAC structure can provide compression ratios and space_savings in comparision to the minimal fixed width representation and to the variable byte representation of the original values sequence.

For example,

values = [1, 2, 1, 8, 3, 4, 5, 9, 13, 1024, 262189]
encoded_values = DAC(iter(values))

print(encoded_values.space_savings)
>>> {'vbyte': 0.08214285714285718, 'fixed_width': 0.508133971291866}

print(encoded_values.compression_ratios)
>>> {'vbyte': 1.0894941634241246, 'fixed_width': 2.0330739299610894}

Attributions

@article{
    title = {{Algorithms and Compressed Data Structures for Information Retrieval}},
    author = {Ladra, Susana},
    type = {Phd Thesis},
    institution = {Universidade da Coru{\~{n}}a},
    pages = {272},
    year = {2011},
    isbn = {5626895531}
}
@inproceedings{
    title = {{Directly addressable variable-length codes}},
    author = {Brisaboa, Nieves R. and Ladra, Susana and Navarro, Gonzalo},
    booktitle = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
    volume = {5721 LNCS},
    doi = {10.1007/978-3-642-03784-9_12},
    isbn = {3642037836},
    issn = {03029743},
    pages = {122--130},
    publisher = {Springer, Berlin, Heidelberg},
    year = {2009}
}

License

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

About

pyDAC (python Directly Addressable Codes) offers a variable-length encoding scheme for (unsigned) integers that enables direct access to any element of the encoded sequence and obtains compact spaces.

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages