This repository contains the implementation and benchmark script for LZ78 substring compression on a CDAWG.
You can use ./prepare_files.sh to prepare the dataset.
This script will download three text files (sources, dna, and english) from the Pizza&Chili corpus, create a Fibonacci string file (fib), and extract the first 128 MiB of each text.
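As a rough illustration of what the fib dataset looks like, the sketch below builds a prefix of the Fibonacci word with standard shell tools. It is not taken from prepare_files.sh, and the actual script may differ. The function name make_fib_prefix, the alphabet {a, b}, and the starting words F1 = "b", F2 = "a" are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository):
# writes the first <bytes> bytes of the Fibonacci word to <output_file>.
# Assumed convention: F1 = "b", F2 = "a", F(n) = F(n-1) F(n-2).
# Usage: make_fib_prefix <bytes> <output_file>
make_fib_prefix() {
    fib_target=$1
    fib_out=$2
    fib_tmp=$(mktemp -d) || return 1
    printf 'b' > "$fib_tmp/prev"   # holds F(n-2)
    printf 'a' > "$fib_tmp/cur"    # holds F(n-1)
    # Concatenate until the current word is at least as long as requested.
    while [ "$(wc -c < "$fib_tmp/cur")" -lt "$fib_target" ]; do
        cat "$fib_tmp/cur" "$fib_tmp/prev" > "$fib_tmp/next"
        mv "$fib_tmp/cur" "$fib_tmp/prev"
        mv "$fib_tmp/next" "$fib_tmp/cur"
    done
    # Keep only the requested prefix (head -c is widely available, though not strictly POSIX).
    head -c "$fib_target" "$fib_tmp/cur" > "$fib_out"
    rm -rf "$fib_tmp"
}
```

For example, make_fib_prefix $((128 * 1024 * 1024)) fib would write a 128 MiB prefix to a file named fib. The same head -c call can be used on its own to keep the first 128 MiB of any downloaded text.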
You can use cmake to build the code.
This project requires sdsl.
Install sdsl, then set SDSL_INCLUDE_DIR and SDSL_LIBRARY_DIR in CMakeLists.txt to the directories that contain its headers and its library.
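Before building, it can help to confirm that the two directories really contain sdsl. The following is a minimal sketch, assuming an sdsl-lite style installation with headers under <include>/sdsl/ and a static library named libsdsl.a. The function name check_sdsl and the two file names it probes are assumptions, not something this repository provides.

```shell
#!/bin/sh
# Hypothetical pre-build check (not part of this repository).
# Assumes an sdsl-lite style layout: <include>/sdsl/int_vector.hpp and <lib>/libsdsl.a.
# Usage: check_sdsl <include_dir> <library_dir>
check_sdsl() {
    sdsl_inc=$1
    sdsl_lib=$2
    if [ ! -f "$sdsl_inc/sdsl/int_vector.hpp" ]; then
        echo "sdsl headers not found under $sdsl_inc" >&2
        return 1
    fi
    if [ ! -f "$sdsl_lib/libsdsl.a" ]; then
        echo "libsdsl.a not found under $sdsl_lib" >&2
        return 1
    fi
    echo "sdsl found: include=$sdsl_inc lib=$sdsl_lib"
}
```

For example, check_sdsl "$HOME/include" "$HOME/lib" checks an installation in the home directory; pass whichever paths you intend to put into CMakeLists.txt.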
After completing the above steps, you can build the project using the following commands:
cmake . -DCMAKE_BUILD_TYPE=Release
make
To run all benchmarks, use the following script:
./run_benchmark_all.sh {path_to_executable}
To run only the compression or construction benchmarks, use the corresponding scripts:
./run_benchmark_construction.sh {path_to_executable} {filename}
./run_benchmark_compression_cdawg.sh {path_to_executable} {filename}
./run_benchmark_compression_suffixtree.sh {path_to_executable} {filename}
./run_benchmark_compression_rlbwt.sh {path_to_executable} {filename}
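Since each of these scripts takes a single file, running one benchmark over several datasets needs a loop. The wrapper below is a small sketch of that idea. The function name run_on_all is hypothetical, and the dataset file names passed to it depend on what prepare_files.sh actually produces.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of this repository):
# runs one benchmark script on several input files in turn and stops at the first failure.
# Usage: run_on_all <script> <path_to_executable> <file>...
run_on_all() {
    bench_script=$1
    bench_exe=$2
    shift 2
    for bench_file in "$@"; do
        echo "== $bench_file =="
        "$bench_script" "$bench_exe" "$bench_file" || return 1
    done
}
```

For example, run_on_all ./run_benchmark_compression_cdawg.sh {path_to_executable} sources dna english fib would run the CDAWG compression benchmark on four files, assuming the prepared datasets are stored under those names.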