Robust Atom/RSS feed generator #24

Open
1 task
novoid opened this issue Sep 30, 2018 · 8 comments
Labels
bug, helpwanted, priohigh, quickfix (small change necessary to fix this)

Comments

@novoid
Owner

novoid commented Sep 30, 2018

With my simple Atom/RSS feed generating code, I often had issues with some Atom/RSS aggregators in combination with special characters, encoding, and such. Images don't work so far within Atom/RSS feeds.

Maybe somebody wants to volunteer to add decent Atom/RSS support that generates rock-solid Atom/RSS feeds, so that people can follow the "full content" feed instead of the "link only" feed which I am currently forced to recommend.

  • remove recommendation for simple feed
@novoid
Owner Author

novoid commented Oct 4, 2020

The main issues I could identify so far were related to HTML snippets from Twitter and YouTube.

@novoid
Owner Author

novoid commented Jan 13, 2021

Current feed is broken again.
Can be verified via Thunderbird (stefan2904 reported issue) or W3C Validator:

This feed does not validate.

    line 216, column 151: XML parsing error: <unknown>:216:151: not well-formed (invalid token) [help]

        ... up/ghxrq87/?utm_source=reddit&utm_medium=web2x&amp;context=3">this reddi ...
                                                     ^

In addition, interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

    line 5, column 99: Self reference doesn't match document location [help]

        ... g-all.atom_1.0.links-and-content.xml" />
                                                     ^

    line 7, column 26: Identifier "http://Karl-Voit.at/" is not in canonical form (the canonical form would be "http://karl-voit.at/") (2 occurrences) [help]

          <id>http://Karl-Voit.at/</id>
                                  ^
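
A local well-formedness check can catch errors like the first one above before a feed is published. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function name `check_well_formed` and the sample fragments are made up for illustration and are not part of lazyblorg. Note that it only checks XML well-formedness, not Atom validity, so the two interoperability recommendations would not be detected this way.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        # ParseError.position is a (line, column) tuple from the parser.
        line, column = error.position
        return (line, column, str(error))
    return None


# Hypothetical feed fragments: the first has a bare "&" inside a URL,
# the second has it escaped as "&amp;".
broken = '<entry><link href="http://example.com/?a=1&b=2"/></entry>'
fixed = '<entry><link href="http://example.com/?a=1&amp;b=2"/></entry>'

print(check_well_formed(broken))  # a tuple describing the parse error
print(check_well_formed(fixed))   # None
```

Running such a check on the generated feed files as a last build step would at least turn a silently broken feed into a visible error.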

@novoid novoid added bug quickfix small change necessary to fix this priohigh and removed enhancement labels Jan 13, 2021
stefan2904 added a commit to stefan2904/lazyblorg that referenced this issue Jan 13, 2021
@stefan2904
Contributor

@novoid I think merging of PR #58 accidentally closed this issue ... I don't think the PR fixes the overall robustness problems. ;-)

@novoid
Owner Author

novoid commented Jan 13, 2021

Thanks @stefan2904,

I had to revert your commits because they resulted in unwanted replacements such as &lt; to &amp;lt;.

Note to myself: there is no distinction between the HTML content for the feeds and for the blog articles. Therefore, with the current approach, there are the following possibilities:

  1. Parse the generated feed data and do a smart search & replace of characters such as multiple "&" within URLs only.
    • Would require: a list of characters to replace (those which cause errors); a parser which makes only the replacements that are necessary.
  2. Another attempt to switch to CDATA for the feeds.
  3. Evaluate and integrate an external feed library (a dependency I would rather not have, if possible).
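
The first option can be approximated with a single regular expression. This is only a sketch under stated assumptions, not lazyblorg's code: the names `BARE_AMPERSAND` and `escape_bare_ampersands` and the sample string are hypothetical. It escapes every "&" that does not already start an entity, so existing "&lt;" or "&amp;" stay untouched and the unwanted "&lt;" to "&amp;lt;" replacement does not occur.

```python
import re

# Match "&" unless it already starts a named, decimal, or hex entity
# such as "&amp;", "&#38;" or "&#x26;".
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text):
    """Escape only those ampersands that are not part of an entity."""
    return BARE_AMPERSAND.sub("&amp;", text)


# Hypothetical snippet mixing a bare "&", an escaped "&amp;" and "&lt;".
sample = '<a href="https://example.com/?a=1&b=2&amp;c=3">x &lt; y</a>'
print(escape_bare_ampersands(sample))
# → <a href="https://example.com/?a=1&amp;b=2&amp;c=3">x &lt; y</a>
```

Two limitations: the pattern is applied to the whole text rather than to URLs only, and it treats anything that looks like an entity as already escaped, so HTML-only entities such as "&nbsp;" would pass through and still need separate handling in XML.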

@novoid novoid reopened this Jan 13, 2021
@novoid
Owner Author

novoid commented Dec 21, 2021

Note: maybe https://sr.ht/~brettgilio/org-webring/ could be part of the solution?

@novoid
Owner Author

novoid commented Dec 21, 2021

Another workaround is in progress (although not particularly for this issue): replacing all external content (iframes from YouTube, Mastodon, Twitter) with images and links.

Update 2021-12-29: with the most recent implementation of img images that are wrapped in an a-href link, I will start with this approach: I'll replace all my embedded Twitter/Mastodon/YouTube snippets with screenshots + links. While this doesn't fix the original issue with broken feed files, it avoids them because almost every feed issue is related to external content.

@btrummer

Current XML parse error in https://karl-voit.at/feeds/lazyblorg-all.atom_1.0.links-and-teaser.xml:

Line 310:
<a href="https://duckduckgo.com/?t=ffab&q=impfskeptik+deutschsprachig+europa&amp;ia=web">und weitere</a>

The first '&' is not replaced with '&amp;', causing an XML parse error in KDE Kontact.

@novoid
Owner Author

novoid commented Jan 2, 2022

Current XML parse error in https://karl-voit.at/feeds/lazyblorg-all.atom_1.0.links-and-teaser.xml:

Line 310: <a href="https://duckduckgo.com/?t=ffab&q=impfskeptik+deutschsprachig+europa&amp;ia=web">und weitere</a>

The first '&' is not replaced with '&amp;', causing an XML parse error in KDE Kontact.

Thanks for reporting. This is not related to the issue here. It's a new bug which is handled in #64.
