Not 'yo PDF. A library for extracting text from a PDF. Presumably the PDF is not yours, but that doesn't matter!
libnachopdf
===========
libnachopdf (Not 'yo PDF) is a library for extracting text from a PDF. This library is quite limited and incomplete. Currently it can decompress FlateDecode (zlib) text streams, and it has a basic PS interpreter to decode text from PS-encoded streams.

pdfsearch
=========
pdfsearch is a utility built upon libnachopdf, and it can serve as an example of using the library. pdfsearch takes a PDF and a regular expression as input and tries to locate a match within the PDF. If a match is found, the page number on which it occurs is reported to the user.

Why not call it 'PDFgrep'? Well, that name is taken.

Caveat/Warning
==============
Unfortunately, the detection of spacing between words is lacking in libnachopdf. Since pdfsearch is based on this library, spaces are ignored when matching. Hey, it's a work in progress.

Building
========
Building is simple (no config is provided), just run the following:

    > make pdfsearch

If you just want to build the library:

    > make libnachopdf

Installing
==========
This project is still alpha. Build it, then copy the binary (and/or the library) wherever you wish.

Contact
=======
Matt Davis (enferex) email@example.com
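
Example: matching with spaces ignored
=====================================
The caveat about ignored spaces affects how a search pattern has to be written. The following is a minimal Python sketch of the idea, not libnachopdf's or pdfsearch's actual code or API. The names `inflate_stream` and `find_pages` are hypothetical, and the "pages" are stand-in zlib-compressed byte strings rather than real PDF content streams. It assumes each page's text has already been isolated, inflates it the way a FlateDecode (zlib) stream would be, removes whitespace, and reports the page numbers on which a regular expression matches.

```python
import re
import zlib


def inflate_stream(raw: bytes) -> bytes:
    """Decompress the bytes of a FlateDecode (zlib-format) stream."""
    return zlib.decompress(raw)


def find_pages(pages, pattern):
    """Return the 1-based numbers of pages whose text matches `pattern`.

    Whitespace is stripped from the page text before matching, which
    imitates a search that ignores the spacing between words.
    """
    regex = re.compile(pattern)
    hits = []
    for number, compressed in enumerate(pages, start=1):
        text = inflate_stream(compressed).decode("latin-1")
        squeezed = re.sub(r"\s+", "", text)  # drop all whitespace
        if regex.search(squeezed):
            hits.append(number)
    return hits


# Stand-in data (an assumption for illustration): two compressed "pages".
pages = [zlib.compress(b"Hello world"), zlib.compress(b"Not yo PDF")]

print(find_pages(pages, r"Notyo"))   # pattern written without spaces
print(find_pages(pages, r"Not yo"))  # a literal space can never match
```

Under these assumptions, a pattern should be written as if the words ran together: `Notyo` finds the second page, while `Not yo` finds nothing because the space was removed from the text before matching.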