Hi there! Thanks for creating and sharing this :)
One quite common use case for PDF libraries is getting the text from a PDF. This is often used for things like indexing documents in a search engine. There is a Rust project called pdf-extract that does this, but I'd love to see an alternative to it (for a couple of reasons).
I noticed rspdf has a way to extract XML text from a PDF. I was wondering whether it would also be possible to extract content as plaintext? Or even better: extract it as markdown!
Perhaps this is completely out of scope for the project. Maybe I could help out with this someday (I have some plans in this regard) if you think it may be a good fit.
Cheers!
Thanks, I'm glad you're interested in this project.
This project currently focuses on extracting text and images and converting them to other formats, which for now means plain text. Markdown support is a potential future addition.
However, there are numerous bugs to address, particularly related to fonts. Consequently, the timeline for completion is uncertain.