Search PDF, DOC and DOCX files
sudo npm install -g unisearch
Search comes with some command-line binaries (bins).
pdfCat converts a PDF to plain text and writes it to standard output.
pdfCat pdf.pdf
You can use pdfSearch to read a PDF file page by page or as a whole document.
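var pdfSearch = require('unisearch').pdfSearch;

// Open the PDF named by the last command-line argument.
pdfSearch.open(process.argv[process.argv.length - 1]).then(function (pdf) {
    // Read every page, then print the page texts separated by newlines.
    pdfSearch.readAllPages().then(function (pages) {
        console.log(pages.join('\n'));
    });
});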
pdfSearch.open opens a PDF file; you can then use that same object to read the PDF.
It returns a Promise that resolves when pdfSearch is ready to read pages.
pdfSearch.readAllPages returns a Promise for an array of strings, each holding the text of one page.
pdfSearch.readPage returns a Promise for a string holding the content of the requested page.
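
Because readAllPages is described as resolving to an array with one string per page, searching the extracted text comes down to ordinary array processing. The following is a minimal sketch under that assumption. findPagesContaining is a hypothetical helper, not part of unisearch, and counting pages from 1 is only a convention of this example. The sample array stands in for real readAllPages output.

```javascript
// Hypothetical helper, not part of unisearch: given an array of page
// strings (the shape readAllPages is described as resolving to), return
// the 1-based numbers of the pages whose text contains the search term.
function findPagesContaining(pages, term) {
    var needle = term.toLowerCase();
    var hits = [];
    pages.forEach(function (text, index) {
        // Case-insensitive substring match on the page text.
        if (text.toLowerCase().indexOf(needle) !== -1) {
            hits.push(index + 1);
        }
    });
    return hits;
}

// Sample data standing in for the result of readAllPages.
var samplePages = ['Introduction to PDF', 'Nothing here', 'More about pdf files'];
console.log(findPagesContaining(samplePages, 'pdf')); // [ 1, 3 ]
```

With a real document, the same helper could be called on the pages array inside the readAllPages callback.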
pdf.js by Mozilla
Nemanja Nedeljković