[NewsDownloader] Added an HTML filter through a CSS selector #6228

Merged on Jun 4, 2020 (11 commits)


@hngt (Contributor) commented Jun 3, 2020

Works exactly as expected. It requires the htmlparser luarock, which I do not yet know how to include.

I am open to any corrections/edits. This is my first Lua code, so please be wary.
Solves #6185. I have now tested it on a Kindle PW3, and it works surprisingly quickly with the convoluted HTML of Reuters (I just threw my local .luarocks into the rocks directory 🤣).
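
For readers unfamiliar with the idea, filtering a downloaded page down to the element picked by a CSS selector can be sketched roughly as follows. This is a minimal Python illustration using only the standard library, not the Lua code in this PR (which relies on the htmlparser rock). The names `SelectorFilter` and `filter_html` are hypothetical, and only simple `tag.class` selectors on reasonably well-formed HTML are assumed.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def parse_simple_selector(selector):
    """Split a selector such as 'div.article' or '.body' into (tag, class)."""
    tag, _, css_class = selector.partition(".")
    return tag or None, css_class or None


class SelectorFilter(HTMLParser):
    """Hypothetical helper: keep the first element matching 'tag.class'."""

    def __init__(self, selector):
        super().__init__(convert_charrefs=False)
        self.tag, self.css_class = parse_simple_selector(selector)
        self.depth = 0      # nesting depth inside the matched element
        self.done = False   # True once the matched element is complete
        self.parts = []     # markup collected so far

    def _matches(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        return ((self.tag is None or tag == self.tag)
                and (self.css_class is None or self.css_class in classes))

    def handle_starttag(self, tag, attrs):
        if self.done or (self.depth == 0 and not self._matches(tag, attrs)):
            return
        self.parts.append(self.get_starttag_text())
        if tag in VOID_TAGS:
            # A void element that is itself the match is already complete.
            self.done = self.depth == 0
        else:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.done or self.depth == 0 or tag in VOID_TAGS:
            return
        self.parts.append(f"</{tag}>")
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append(f"&#{name};")


def filter_html(html, selector):
    """Return the markup of the first matching element, or '' if none matches."""
    parser = SelectorFilter(selector)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Calling `filter_html(page, "div.article")` returns only the matched element and its children, dropping navigation, footers and other surrounding markup. The sketch counts open and close tags, so unclosed elements such as a bare `<p>` would make it collect more than intended; a real parser handles those cases.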



@poire-z (Contributor) commented Jun 4, 2020

(Good for me as far as Lua coding is concerned. I can't help much with htmlparser integration.)

@Frenzie (Member) left a comment

Seems to be working. ;-)
[Screenshot: Screenshot_2020-06-04_12-24-23]

@Frenzie (Member) commented Jun 4, 2020

Oh wait, PC World is just the feed text. 🤦 Regardless, it seems to be working on Reuters.

@hngt (Contributor, Author) commented Jun 4, 2020

> Oh wait, PC World is just the feed text. 🤦 Regardless, it seems to be working on Reuters.

If you want a complex example that takes a bit longer, I recommend my /r/worldnews Atom feed with direct links to articles. It does seem to struggle with a couple of more complex sites (interestingly, the depth has no relation to the duration of parsing; oh, the wonders of modern HTML). In the end, though, the result is quite fine, except for sites that simply give up when downloaded by something that is not a modern web browser.

@Frenzie (Member) commented Jun 4, 2020

@lich-tex I just meant that this + koreader/koreader-base#1110 seems to be functional.

Frenzie pushed a commit to koreader/koreader-base that referenced this pull request Jun 4, 2020
Frenzie added a commit to Frenzie/koreader that referenced this pull request Jun 4, 2020
Cf. <koreader#6228>

Includes:
* Update to OpenSSH 8.3p1 koreader/koreader-base#1107
* Update zsync2 koreader/koreader-base#1108
* thirdparty/harfbuzz 2.6.7 koreader/koreader-base#1109
* Add thirdparty/lua-htmlparser for newsdownloader koreader/koreader-base#1110
@Frenzie linked an issue Jun 4, 2020 that may be closed by this pull request
plugins/newsdownloader.koplugin/feed_config.lua (3 review threads, outdated and resolved)
Frenzie added a commit that referenced this pull request Jun 4, 2020
Cf. <#6228>

Includes:
* Update to OpenSSH 8.3p1 koreader/koreader-base#1107
* Update zsync2 koreader/koreader-base#1108
* thirdparty/harfbuzz 2.6.7 koreader/koreader-base#1109
* Add thirdparty/lua-htmlparser for newsdownloader koreader/koreader-base#1110
@Frenzie merged commit b741fce into koreader:master Jun 4, 2020
Linked issue: [FR] News Downloader - html filtering
3 participants