[FR] News Downloader - html filtering #6185
Comments
@NiLuJe Better to discuss that proxy thing here. ;-) #6205 mentioned https://flak.tedunangst.com/post/miniwebproxy which looks like an interesting alternative to Proxomitron and its modern semi-clone Privoxy. Definitely an interesting program.
I don't think trying to put Go into KOReader is really the optimal solution. There is LuaXML, and it seems it would fit our use case perfectly. It is just a matter of implementing it, and I am not that good with Lua. EDIT: It seems LuaXML is already implemented, so isn't it just a matter of creating a CSS element filter list?
I didn't mean "interesting" in the "ship it" sense, quite the opposite. Apologies for any confusion. It was more thinking out loud in the "once upon a time I used Proxomitron" sense.
Probably.
After a day of research, I have finally managed to find a solution that is comparable in complexity to miniwebproxy. The luarock "htmlparser" does the task very well, and I've written a simple filter just to check whether it fits this use case.
Are you (and the other people important to the project) fine with including it in KOReader (it is licensed under LGPLv3)? If so, I could easily write the function within the next week. The algorithm is generally not very fast (I have not benchmarked it on my e-reader yet), but on my laptop it is instantaneous for simpler sites, and takes a staggering one second for a Reuters article (reducing it from 255 kiB to 11 kiB). Nonetheless, I still think this is an extremely worthy endeavour.
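For reference, a filter along these lines could be sketched with lua-htmlparser roughly as follows. This is only an illustration, not the actual filter mentioned above; the function name and the selectors are made up, and only `htmlparser.parse`, `:select`, and `:gettext` from the library are assumed:

```lua
local htmlparser = require("htmlparser")

-- Keep only the parts of a page matching a CSS selector
-- (e.g. "article" or "div.story-body"); fall back to the
-- whole page if the selector matches nothing.
local function extract_content(html, selector)
    local root = htmlparser.parse(html)
    local nodes = root:select(selector)
    if #nodes == 0 then
        return html  -- selector missed: keep the unfiltered page
    end
    local parts = {}
    for _, node in ipairs(nodes) do
        parts[#parts + 1] = node:gettext()  -- element with its own tags
    end
    return table.concat(parts, "\n")
end

-- Hypothetical usage:
-- local filtered = extract_content(raw_html, "div.article-body")
```

A user-editable filter list would then just map feed URLs to selectors like the one passed in above.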
It looks acceptable to me; how about you @NiLuJe ?
The second idea by @georgew21 is also nice, but I think it needs its own issue, as it is a vastly different problem, though imho doable.
I don't use the feature, so, can't really comment, but we certainly do bundle other stuff via luarocks, so I don't have any issue with that ;).
c.f., how lua-Spore is handled (thirdparty/lua-Spore/CMakeLists.txt & Makefile.third in koreader-base). |
I think the second suggestion deserves a new issue, as it is a whole beast unto itself and will require a widely different method (i.e. download/merge all feeds and then use the EPUB downloader, which will need a different version if we want it to filter elements and have chapters). This gazette mode (let's call it that) would be very handy for stuff like tweetRSS, where each individual piece of content is quite small.
@lich-tex Please feel free to open a new issue. I intend to close this one as the CSS selector issue. |
Feature Request
Hello dear developers,
I find the News Downloader inconvenient. In many feeds, I need to turn a lot of pages to find the text of the article (on the other hand, if I set "download full article" to false, I get only the description and not the full article). It is also annoying to go back and then wait to open each new article separately.
I would like to suggest two changes:
First, an option to download only specific tags of the webpage (for example, div "body", mainarticle, etc.), with the possibility for the user to add more tags, so that if the plugin doesn't automatically get the full article, the user can add some tags.
Second, saving all the articles of a feed in a single HTML file, with a table of contents at the beginning and hyperlinks to each article.
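To make the second suggestion concrete, here is a rough sketch of what merging a feed into one HTML file could look like. Everything here is hypothetical (the function name, and the assumption that each article arrives as a title plus an HTML body); it is not how the plugin currently works:

```lua
-- Sketch: merge a feed's articles into a single HTML page with a
-- table of contents linking to each article via in-page anchors.
-- `articles` is assumed to be a list of { title = ..., html = ... }.
local function build_gazette(feed_title, articles)
    local toc, body = {}, {}
    for i, a in ipairs(articles) do
        local anchor = "article-" .. i
        toc[#toc + 1] = string.format(
            '<li><a href="#%s">%s</a></li>', anchor, a.title)
        body[#body + 1] = string.format(
            '<h2 id="%s">%s</h2>\n%s', anchor, a.title, a.html)
    end
    return string.format(
        "<html><head><title>%s</title></head><body>" ..
        "<h1>%s</h1><ul>%s</ul>%s</body></html>",
        feed_title, feed_title,
        table.concat(toc), table.concat(body, "\n"))
end
```

The resulting single file would open once, with the contents list up front and one tap per article, instead of one download/open cycle per article.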
Have a great day!