[FEATURE] replace pandoc with epub2txt2 for Epub search #138

Closed
mindreframer opened this issue Jul 4, 2022 · 4 comments

Comments

@mindreframer

First - what an awesome project! It really makes searching of huge document libraries possible.

I'm running into a lot of problems with EPUB parsing: pandoc hangs indefinitely at 100% CPU on some EPUB files. Often these are large files, but sometimes small ones trigger it too. I don't have a good workaround for this yet.

I tried parsing the problematic files with https://github.com/kevinboone/epub2txt2, and it returns the content instantly.
Also, judging by the number of EPUB parsing issues here, this could be a good solution for many of them.

Please consider allowing epub2txt2 to be used as the backend for EPUB extraction.

Thanks!

@phiresky
Owner

phiresky commented Jul 4, 2022

In the next version (when I or someone else finally manages to make it work), the preprocessors will be configurable per file type.

@mindreframer
Author

@phiresky OMG, that would be awesome! Any idea what the configuration would look like? E.g. if I wanted to override the Docx preprocessor, how would I specify it?

@ghost

ghost commented Oct 7, 2022

> in the next version (when i or someone finally manages to make it work), the preprocessors will be configurable per file type

Such a feature would completely eliminate the frustrating freezes when searching through folders of EPUB files.

Any idea when the next release will be out?

Thanks for the great tool, btw!

@phiresky
Owner

Starting with 1.0.0, it's possible to add custom adapters via the config file. If someone has a good adapter for a particular file type, please post it in show-your-adapter.
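For illustration, an EPUB entry in the config file might look roughly like the fragment that follows. It is an unverified sketch, not a tested adapter. The field names follow the custom-adapter example in the ripgrep-all README. It also makes several assumptions: the config file lives at `~/.config/ripgrep-all/config.jsonc` (the location varies by platform), epub2txt2 is installed on `PATH` as `epub2txt`, and `epub2txt` can read the EPUB from stdin. That last point has not been confirmed.

```jsonc
{
  "custom_adapters": [
    {
      // Hypothetical adapter entry; field names follow the rga README's custom-adapter example.
      "name": "epub2txt",
      "version": 1,
      "description": "Extract plain text from EPUB files with epub2txt2",
      "extensions": ["epub"],
      "mimetypes": ["application/epub+zip"],
      // Assumption: epub2txt is on PATH and accepts the EPUB on stdin.
      // If it does not, point "binary" at a small wrapper script (hypothetical) instead.
      "binary": "epub2txt",
      "args": [],
      "disabled_by_default": false,
      "match_only_by_mime": false
    }
  ]
}
```

The built-in pandoc adapter also handles `.epub` files. To keep it from processing those files, running rga with `--rga-adapters=-pandoc` should exclude it from the default adapter set.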
