[FEATURE] replace pandoc with epub2txt2 for Epub search #138

mindreframer · 2022-07-04T14:11:02Z

First, what an awesome project! It really makes searching huge document libraries possible.

I currently have a lot of issues with EPUB parsing: pandoc hangs forever at 100% CPU when parsing some EPUB files, sometimes on bigger ones but sometimes also on smaller ones. I don't have a good workaround for this yet.

I tried https://github.com/kevinboone/epub2txt2 on the files that cause issues, and it returns the content instantly.
Also, judging by the number of issues here about EPUB parsing, this could be a good solution for many other issues.

Please consider allowing epub2txt2 to be used as a backend for EPUB extraction.

Thanks!
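As an illustration of the workaround being asked for, here is a minimal Python sketch that runs an external extractor with a time limit and falls back to the next one if the first hangs or fails. It is not how the project itself invokes pandoc; the helper names (`run_extractor`, `extract_text`), the `{path}` placeholder, and the exact command lines in `EPUB_EXTRACTORS` are assumptions made for this example.

```python
import subprocess
from typing import Optional, Sequence


def run_extractor(command: Sequence[str], timeout_s: float) -> Optional[str]:
    """Run one extractor command; return its stdout, or None on failure or timeout."""
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child once this expires
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the binary could not be started (e.g. not installed).
        return None
    return result.stdout if result.returncode == 0 else None


def extract_text(
    path: str,
    extractors: Sequence[Sequence[str]],
    timeout_s: float = 30.0,
) -> Optional[str]:
    """Try each extractor in order and return the first successful output."""
    for template in extractors:
        # The hypothetical "{path}" placeholder marks where the file name goes.
        command = [path if arg == "{path}" else arg for arg in template]
        text = run_extractor(command, timeout_s)
        if text is not None:
            return text
    return None


# Assumed command lines (not taken from the thread): pandoc first, then the
# epub2txt binary built by the epub2txt2 project as a fallback.
EPUB_EXTRACTORS = [
    ["pandoc", "--to=plain", "{path}"],
    ["epub2txt", "{path}"],
]
```

Calling `extract_text("book.epub", EPUB_EXTRACTORS)` would then return the text from whichever tool finishes successfully within the limit, or `None` if neither does. Trying pandoc first with a timeout keeps its output for files it handles well while bounding the cost of the ones it hangs on.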

phiresky · 2022-07-04T14:25:22Z

In the next version (when I or someone else finally manages to make it work), the preprocessors will be configurable per file type.

mindreframer · 2022-07-04T15:04:12Z

@phiresky OMG, that would be awesome! Any idea what the configuration would look like? E.g., when I'm overriding the Docx preprocessor, how would I specify it?

ghost · 2022-10-07T16:32:20Z

> in the next version (when i or someone finally manages to make it work), the preprocessors will be configurable per file type

Such a feature would completely eliminate the embarrassing freezes when searching through folders of EPUB files.

Any idea when the next release will be out?

Thanks for the great tool, btw!

phiresky · 2023-05-26T14:58:41Z

Starting with 1.0.0, it's possible to add custom adapters via the config file. If someone has a good suggestion for a file type, please post it in show-your-adapter.

phiresky closed this as completed May 26, 2023
