Adding ability to extract text to an io.Reader #1
Thx for your PR!
Please share your specific reason for returning text.
I appreciate you taking the time to look at this pull request. Since submitting it, I have created my own library to convert PDFs to text at https://github.com/EndFirstCorp/pdf2txt. My use case is building out a pipeline for web server file uploads. On upload, you get access to a multipart.File. Rather than saving that multipart.File to a regular OS file, my goal was to minimize I/O by 1) reading in the file and converting it to text and then 2) writing the text file to disk. I wanted it to work with an io.Reader, not an io.ReaderAt as typical PDF parser implementations require. So that’s what I’ve done. If you’re still interested in this pull request, I can look at resubmitting it.
Understood. If you can resubmit your text extraction code so that it is consistent with the existing extraction code for images, fonts and content, and supply test code as well, I am happy to merge it in. Basically, that means writing out to a file rather than to a reader. Thank you.
Text extraction will definitely be an additional functionality at some point but |
Thanks for the awesome library! I thought it would be useful to be able to extract text from a PDF file easily, so I added it to pdflib. The charmap capability was a beast, but I got it working for the files I tested. Thanks for considering this pull request!
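Assuming "charmap" here refers to a font's ToUnicode CMap (the PDF mechanism that maps character codes in content streams to Unicode), the minimal Go sketch below shows the core idea: parse `bfchar` entries into a lookup table, then decode 2-byte codes through it. It is deliberately simplified and the names `parseHexCode`, `parseBfChar` and `decode` are hypothetical: it ignores `bfrange` and `codespacerange`, assumes 2-byte codes and single BMP destination characters, and maps unknown codes to U+FFFD.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseHexCode turns a CMap hex token such as "<0041>" into a 16-bit value.
func parseHexCode(tok string) (uint16, error) {
	if len(tok) < 3 || tok[0] != '<' || tok[len(tok)-1] != '>' {
		return 0, fmt.Errorf("not a hex string: %q", tok)
	}
	v, err := strconv.ParseUint(tok[1:len(tok)-1], 16, 16)
	if err != nil {
		return 0, err
	}
	return uint16(v), nil
}

// parseBfChar collects "<src> <dst>" pairs from bfchar blocks into a table.
// Simplified: bfrange, codespacerange and multi-character or surrogate-pair
// destinations are not handled.
func parseBfChar(cmap string) (map[uint16]rune, error) {
	table := make(map[uint16]rune)
	inBlock := false
	sc := bufio.NewScanner(strings.NewReader(cmap))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasSuffix(line, "beginbfchar"):
			inBlock = true
		case line == "endbfchar":
			inBlock = false
		case inBlock && line != "":
			fields := strings.Fields(line)
			if len(fields) != 2 {
				return nil, fmt.Errorf("malformed bfchar line: %q", line)
			}
			src, err := parseHexCode(fields[0])
			if err != nil {
				return nil, err
			}
			dst, err := parseHexCode(fields[1])
			if err != nil {
				return nil, err
			}
			table[src] = rune(dst)
		}
	}
	return table, sc.Err()
}

// decode maps a string of 2-byte character codes to text; codes missing from
// the table become U+FFFD, and a trailing odd byte is ignored.
func decode(table map[uint16]rune, codes []byte) string {
	var sb strings.Builder
	for i := 0; i+1 < len(codes); i += 2 {
		code := uint16(codes[i])<<8 | uint16(codes[i+1])
		if r, ok := table[code]; ok {
			sb.WriteRune(r)
		} else {
			sb.WriteRune('\uFFFD')
		}
	}
	return sb.String()
}

func main() {
	cmap := `2 beginbfchar
<0003> <0020>
<0024> <0041>
endbfchar`
	table, err := parseBfChar(cmap)
	if err != nil {
		panic(err)
	}
	fmt.Println(decode(table, []byte{0x00, 0x24, 0x00, 0x03, 0x00, 0x24})) // A A
}
```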