Building a highly-available search engine using rqlite. Check out this blog post for full details.
You can download the test data set with the following command (tested on Linux):
curl https://storage.googleapis.com/bucket-vallified/rqlite/access-5million.log.gz >access.log.gz
Decompress the data set as follows:
gunzip access.log.gz
What results is an Apache web server access log file, containing 5 million entries.
Use the Python program in this repository to index the data. You must have at least 1 rqlite node up and running (check the Quick Start guide to get rqlite up and running). The indexing program assume rqlite is available at 127.0.0.1:4001
, but you can override this via command line options.
python indexer.py access.log
Pass -h
to the program to see full options. Depending on the hardware you use for your rqlite system, it could take a few minutes to index all the log data.