[Feature Request] Parsing HTML5 Pages #309
Comments
You haven't actually reported a bug or requested a feature. I can guess what the point is, but please modify the text of your issue to include a feature request or a bug report. Thanks! |
Title corrected accordingly |
Thanks! So the request is: support extracting feed items directly from HTML data? |
On Thu, 19 May 2022 06:35:32 -0700 Kurt McKee ***@***.***> wrote:
> Thanks! So the request is: support extracting feed items directly
> from HTML data?
Yes, but only on certain occasions, just like Liferea.
Of course, this leaves us with limited options, because we can only
guess at <article> entries.
Some websites that don't provide feeds are still useful when
treated as feeds, so I think a very specific guessing mechanism is
worth having.
|
I think this is unacceptable. I want to close this issue (or change it). If someone has a problem with websites not providing web feeds (probably because the site owners are unaware of this technology), they should contact the web admins. That is a better solution. What do you think, @kurtmckee? |
I would still like to see this in feedparser in the future, using the h-feed spec as a guide. For now, I'm fine with closing this issue. |
I didn't know there was a specification for this. Thank you for sharing h-feed! |
It appears that the only feed reader to handle `<article/>` tags is Liferea, by Mr. Lars Windolf (@lwindolf).
Introduction:
Subscribing To Html5 Websites That Have No Feed
First commit:
Add support for subscribing to HTML5 websites without RSS/Atom feeds by extracting article titles, links and descriptions
Last commit to date:
Improve HTML5 extraction: extract […] if it exists and no article was found
Test pages:
https://miranda-ng.org/
https://www.brandenburg.de/
http://intertwingly.net/blog/
Frankly, this is one of the best features of Liferea to date, mainly because novice users don't need to do their own scraping for pages that use the `<article/>` tag.
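To make the request concrete, here is a minimal sketch of the kind of extraction described above: pulling a title, link and description out of each `<article>` element. It uses only Python's standard `html.parser` and `urllib.parse`. The names `ArticleExtractor` and `extract_items` are hypothetical, and the heuristics (first heading is the title, first link is the entry link) are my own assumptions. This is not how Liferea or feedparser implements it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ArticleExtractor(HTMLParser):
    """Collect a title, link and description for each top-level <article>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.items = []           # finished entries
        self._item = None         # entry being built, None outside <article>
        self._depth = 0           # nesting depth of <article> elements
        self._in_heading = False  # True while inside the title heading

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._depth += 1
            if self._depth == 1:
                self._item = {"title": "", "link": None, "text": []}
            return
        if self._item is None:
            return
        if tag in HEADINGS and not self._item["title"]:
            # Assumption: the first heading in the article is its title.
            self._in_heading = True
        elif tag == "a" and self._item["link"] is None:
            # Assumption: the first link in the article is the entry link.
            href = dict(attrs).get("href")
            if href:
                self._item["link"] = href

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._in_heading = False
        elif tag == "article" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                item = self._item
                self.items.append({
                    "title": " ".join(item["title"].split()),
                    "link": item["link"],
                    # Collapse runs of whitespace in the remaining text.
                    "description": " ".join(" ".join(item["text"]).split()),
                })
                self._item = None

    def handle_data(self, data):
        if self._item is None:
            return
        if self._in_heading:
            self._item["title"] += data
        else:
            self._item["text"].append(data)


def extract_items(html, base_url=""):
    """Return a list of dicts, one per <article>, with absolute links."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    for item in parser.items:
        if item["link"] is not None:
            item["link"] = urljoin(base_url, item["link"])
    return parser.items
```

Calling `extract_items(page_source, "https://example.org/")` would return one dictionary per `<article>`, which a feed reader could then map onto its usual entry fields. Pages that don't use `<article>` at all simply yield an empty list, which is the "only on certain occasions" limitation mentioned in the comments.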