First - what an awesome project! It really makes searching huge document libraries possible.
Currently I'm having a lot of issues with EPUB parsing: pandoc hangs forever at 100% CPU on some EPUB files, sometimes larger ones, but sometimes smaller ones as well. I don't have a good workaround for this.
I tried parsing those files that cause issues with https://github.com/kevinboone/epub2txt2 and it returns the content instantly.
Also, judging by the number of issues here about EPUB parsing, this could be a good solution for many of them.
Please consider allowing epub2txt2 to be used as a backend for EPUB extraction.
Thanks!
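For anyone hitting the same hang in the meantime, here is a rough sketch of a stopgap in Python: run the extractor under a timeout and fall back to a second tool if the first one hangs or fails. The command lines are assumptions, not something confirmed in this thread: `pandoc --from=epub --to=plain FILE` for pandoc, and `epub2txt FILE` for epub2txt2 (assuming it is installed under that name and prints to stdout). The helper names `run_with_timeout` and `extract_epub_text` are hypothetical.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> Optional[str]:
    """Run cmd and return its stdout, or None if it fails, is missing, or times out."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child process is killed when this expires
            check=True,         # a non-zero exit status raises CalledProcessError
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, FileNotFoundError):
        return None
    return result.stdout.decode("utf-8", errors="replace")


def extract_epub_text(path: str, timeout_s: float = 30.0) -> Optional[str]:
    """Try each extractor in order and return the first successful output."""
    # Assumed command lines; adjust them to match your installed tools.
    candidates = [
        ["pandoc", "--from=epub", "--to=plain", path],
        ["epub2txt", path],
    ]
    for cmd in candidates:
        text = run_with_timeout(cmd, timeout_s)
        if text is not None:
            return text
    return None


if __name__ == "__main__":
    extracted = extract_epub_text(sys.argv[1])
    if extracted is None:
        sys.exit("no extractor produced output")
    sys.stdout.write(extracted)
```

This does not fix the underlying pandoc hang; it only stops a single file from blocking everything else.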
@phiresky OMG, that would be awesome! Any idea what the configuration would look like? E.g., if I'm overriding the Docx preprocessor, how would I specify it?
Starting with 1.0.0, it's possible to add custom adapters via the config file. If someone has a good suggestion for a file type, please post it in show-your-adapter.