r4victor/bwt_compressor

Overview

bwt_compressor is a lossless compressor/decompressor based on the Burrows–Wheeler transform (BWT). It can be used as a CLI tool or as a Python library.

The compression of data involves three major steps: the Burrows–Wheeler transform (BWT), DC, and Huffman coding.

Warning: This project is for educational purposes only. It is written in Python and hasn't been optimized for speed or memory consumption.
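To give a feel for the transform the compressor is built on, here is a minimal, naive sketch of the BWT and its inverse. It is not the code used in this project (the requirements suggest the real implementation relies on pydivsufsort); the function names and the choice of `\x00` as the end-of-text sentinel are assumptions made for illustration only.

```python
def bwt(text: str, sentinel: str = "\x00") -> str:
    """Naive BWT: sort all rotations of text + sentinel, keep the last column."""
    if sentinel in text:
        raise ValueError("input must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "\x00") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(ch + row for ch, row in zip(last_column, table))
    # The original string is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(repr(transformed))         # 'annb\x00aa'
    print(inverse_bwt(transformed))  # banana
```

Sorting full rotations is quadratic in memory, so this sketch is only suitable for short strings.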

Requirements

  • Python (tested on 3.9)
  • numpy
  • pydivsufsort
  • pytest (for tests only)
  • bitarray (for tests only)

Installation

  1. Get the source code:
$ git clone https://github.com/r4victor/bwt_compressor && cd bwt_compressor
  2. Install the requirements:
$ python -m pip install -r requirements.txt
  3. Check that everything is ok by running tests:
$ python -m pytest tests/

Troubleshooting

If you have problems installing the pydivsufsort library with pip, consider installing it from source:

  1. Get the source:
$ git clone https://github.com/louisabraham/pydivsufsort
  2. Install from the source:
$ python -m pip install pydivsufsort/.

Usage

The program reads the input data from stdin and outputs the result of the compression to stdout. Here's how you may use it:

$ cat resources/martin_eden.txt | python -m bwt_compressor > resources/martin_eden.bwt

To decompress the data, specify the -d option:

$ cat resources/martin_eden.bwt | python -m bwt_compressor -d > resources/martin_eden_decompressed.txt
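If you want to check a round trip from Python rather than the shell, the two commands above can be chained with `subprocess`. This is a rough sketch: the helper names `pipe_through` and `round_trips` are hypothetical, and it only wraps the CLI invocations shown above, not the library API.

```python
import subprocess
import sys

# Commands taken from the usage lines above; run from the repository root.
COMPRESS_CMD = [sys.executable, "-m", "bwt_compressor"]
DECOMPRESS_CMD = [sys.executable, "-m", "bwt_compressor", "-d"]


def pipe_through(cmd, data: bytes) -> bytes:
    """Feed data to cmd on stdin and return what it writes to stdout."""
    result = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout


def round_trips(data: bytes,
                compress_cmd=COMPRESS_CMD,
                decompress_cmd=DECOMPRESS_CMD) -> bool:
    """Return True if compressing and then decompressing gives back data."""
    compressed = pipe_through(compress_cmd, data)
    return pipe_through(decompress_cmd, compressed) == data
```

For example, `round_trips(open("resources/martin_eden.txt", "rb").read())` should return `True` if the installation works.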

Limitations

At the moment, the compressor works only with ASCII text that does not contain the null byte (\x00). This limitation may be lifted in the future.
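A quick way to check whether a file falls within this limitation before compressing it is sketched below. The function name `is_supported_input` is hypothetical and not part of bwt_compressor; it simply tests the two conditions stated above.

```python
def is_supported_input(data: bytes) -> bool:
    """Return True if data is ASCII-only and contains no null byte."""
    # ASCII bytes are 0..127; excluding 0 rules out the null byte.
    return all(0 < byte < 128 for byte in data)


if __name__ == "__main__":
    with open("resources/martin_eden.txt", "rb") as f:
        print(is_supported_input(f.read()))
```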

About

A lossless BWT+DC+Huffman compressor
