salvadorbu/quickread

Description

Memory-efficient multithreaded searcher for large files in 8-bit text encodings (e.g. UTF-8, Latin-1/ISO 8859-1). Multiple threads search through a memory-mapped (mmap) region, so the entire file is never loaded into memory at once. The Boyer-Moore algorithm is used for the string search. Users on 32-bit machines may run into issues with larger files, since the mapping is limited by the process's address space.
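
The mmap-plus-Boyer-Moore idea described above can be sketched in Python. This is an illustration only, not the project's code: it uses the Horspool simplification of Boyer-Moore (bad-character shifts only, no good-suffix rule), and the names `horspool_find_all` and `search_file` are hypothetical.

```python
import mmap


def horspool_find_all(data, pattern, start=0, end=None):
    """Return the offsets of every occurrence of pattern in data[start:end].

    Boyer-Moore-Horspool: after each comparison the window is shifted
    according to the byte aligned with the last position of the pattern.
    """
    m = len(pattern)
    if end is None:
        end = len(data)
    if m == 0 or end - start < m:
        return []

    # Bad-character table: for each byte value, how far the window may move.
    # Bytes that do not occur in pattern[:-1] allow a full shift of m.
    shift = [m] * 256
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i

    hits = []
    pos = start
    while pos + m <= end:
        if data[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift by the table entry for the last byte of the current window.
        pos += shift[data[pos + m - 1]]
    return hits


def search_file(path, pattern):
    """Memory-map a non-empty file read-only and search it without reading it fully."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return horspool_find_all(mm, pattern)
```

Because the operating system pages the mapping in on demand, only the regions actually touched by the search need to be resident at any moment. The pattern is passed as `bytes`, for example `search_file("file.ext", b"term to search")`.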

Here is an example with a 15GB file (COVID-19 vector embeddings):

Screencast.from.2023-09-15.02-35-55.mp4

Under optimal conditions, searching a 15GB file should take under 4 seconds, which corresponds to roughly 3.75GB/s of sustained read throughput. Files under 5GB should be searched almost instantly. Note that these figures are highly dependent on disk read speed.

Usage

~ ./executable -f "file.ext" -s "term to search" -t <# of threads to use (optional)>

Specify the file name, the pattern to search for, and optionally the number of threads (the default is 25). Using more than 100 threads is not recommended.
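
One way the work could be divided among threads is sketched below in Python. This is an assumption about the general technique, not a description of quickread's internals: each thread gets a contiguous byte range, extended by `len(pattern) - 1` bytes so that a match straddling a boundary is still seen exactly once. The names `chunk_ranges`, `find_in_range`, and `parallel_search` are hypothetical, and the built-in `find` stands in for Boyer-Moore to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(file_size, pattern_len, num_threads):
    """Split [0, file_size) into (start, end) ranges, one per thread.

    Every range except the last is extended by pattern_len - 1 bytes so that
    a match beginning near the end of one range is found by that range.
    """
    if pattern_len < 1:
        raise ValueError("pattern must not be empty")
    # Never use more threads than there are bytes to search.
    num_threads = max(1, min(num_threads, file_size))
    base = file_size // num_threads
    ranges = []
    for i in range(num_threads):
        start = i * base
        end = file_size if i == num_threads - 1 else (i + 1) * base
        ranges.append((start, min(file_size, end + pattern_len - 1)))
    return ranges


def find_in_range(data, pattern, start, end):
    """Collect every match that lies entirely inside data[start:end]."""
    hits = []
    pos = data.find(pattern, start, end)
    while pos != -1:
        hits.append(pos)
        pos = data.find(pattern, pos + 1, end)
    return hits


def parallel_search(data, pattern, num_threads=25):
    """Search data (bytes or an mmap object) with one worker per range."""
    ranges = chunk_ranges(len(data), len(pattern), num_threads)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        parts = pool.map(lambda r: find_in_range(data, pattern, *r), ranges)
        return sorted(p for part in parts for p in part)
```

In CPython the global interpreter lock limits how much these threads actually run in parallel, so the sketch illustrates the partitioning rather than the speedup. The partitioning also suggests why very high thread counts help little: each extra thread adds scheduling overhead while the disk remains the bottleneck.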

Installation (Linux only)

Clone the repo

git clone https://github.com/salvadorbu/quickread.git

Build with make (requires ncurses)

make
