[NewsDownloader] Added an HTML filter through a CSS selector #6228
(Good for me as far as Lua coding is concerned. I can't help much with htmlparser integration.)
…the default of 1000
Oh wait, PC World is just the feed text. 🤦 Regardless, it seems to be working on Reuters.
If you want a complex example that takes a bit longer, I recommend my /r/worldnews Atom feed with direct links to articles. It does seem to struggle with a couple of the more complex sites (interestingly, the depth has no relation to the duration of parsing; oh, the wonders of modern HTML). But in the end, except for sites that simply give up when fetched by anything other than a modern web browser, the result is quite fine.
@lich-tex I just meant that this + koreader/koreader-base#1110 seems to be functional.
Cf. <koreader#6228>
Includes:
* Update to OpenSSH 8.3p1 koreader/koreader-base#1107
* Update zsync2 koreader/koreader-base#1108
* thirdparty/harfbuzz 2.6.7 koreader/koreader-base#1109
* Add thirdparty/lua-htmlparser for newsdownloader koreader/koreader-base#1110
Works exactly as expected, but requires the htmlparser luarock, which I do not yet know how to include.
I am open to any corrections/edits. This is my first Lua code, so please be wary.
Solves #6185. I have now tested it on a Kindle PW3, and it works surprisingly quickly with the convoluted HTML of Reuters (by just throwing my local .luarocks into the rocks directory 🤣).
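
For anyone unfamiliar with the approach, here is a rough sketch of what filtering a downloaded page through a CSS selector with lua-htmlparser could look like. It is not the code from this PR: the function name `filter_html`, the loop limit of 5000, and the fallback to the unfiltered page are all assumptions made for illustration. It relies only on the library's `parse`, `select`, and `getcontent` calls.

```lua
-- Illustrative sketch only; filter_html and its fallback behaviour are
-- hypothetical and not taken from the NewsDownloader plugin.
local ok, htmlparser = pcall(require, "htmlparser")

-- Return only the content of the first element matching `selector`,
-- wrapped in a minimal HTML document. If the library is missing, no
-- selector is given, or nothing matches, return the page unchanged.
local function filter_html(html, selector, loop_limit)
    if not ok or not selector then
        return html
    end
    -- The second argument raises the parser's loop limit; 5000 is an
    -- assumed value chosen here for large news pages.
    local root = htmlparser.parse(html, loop_limit or 5000)
    for _, element in ipairs(root:select(selector)) do
        local content = element:getcontent()
        if content and content ~= "" then
            return "<!DOCTYPE html><html><head></head><body>"
                .. content .. "</body></html>"
        end
    end
    return html
end

-- Usage: keep only the <article> element of a made-up page.
local page = "<html><body><nav>menu</nav><article><p>Story</p></article></body></html>"
print(filter_html(page, "article"))
```

Falling back to the full page keeps a feed readable when a site changes its markup and the selector stops matching.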