Wrong/missing encoding of ampersands in atom/rss feeds #5

Open
unriccio opened this issue May 25, 2017 · 4 comments

@unriccio

Hi,
I got an "XML Parsing Error: not well-formed" using the pluto-generated feeds available on http://blogs.openstreetmap.org/ because of a wrong re-encoding (or missing encoding) of the ampersand symbol within links/guids.

I don't know the details of that instance (version/release, environment, etc), but one of the developers suggested it should be a pluto issue.
Could you please check? Test case and details on gravitystorm/blogs.osm.org#28
(I checked past/closed issues about this but I couldn't find any)

Thanks.

@geraldb
Member

geraldb commented May 25, 2017

@unriccio Thanks for reporting. Will look into the encoding of ampersands issue. Cheers.

@geraldb
Member

geraldb commented Jan 26, 2020

I'm adding your original ticket / issue over here for easier reference:

I got some kind of "XML Parsing Error: not well-formed" using my favourite feedreader. I see the same error also when opening the atom/rss feeds with firefox and chromium.

It seems it's because of a wrong re-encoding (or missing encoding) of the ampersand symbol within links and guid. Example follows: (sorry it's a spam entry but I think the issue still applies)

Original feed:

riccio@hactar:/tmp$ wget -q http://www.openstreetmap.org/diary/rss -O - | fgrep Beer | egrep "(link|guid)"
      <link>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&amp;%20Beer%20Garden/diary/41379</link>
      <guid>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&amp;%20Beer%20Garden/diary/41379</guid>

Aggregator output:

riccio@hactar:/tmp$ wget -q https://blogs.openstreetmap.org/rss20.xml -O - | fgrep Beer | egrep "(link|guid)"
  <guid>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&%20Beer%20Garden/diary/41379</guid>
  <link>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&%20Beer%20Garden/diary/41379</link>

Note: So %20&amp;%20 gets changed to %20&%20 in the guid and link tags. Is this correct? Why does it break the XML parsing? Needs to be checked.

@unriccio
Author

unriccio commented Feb 3, 2020

Yep, correct.

The ampersand introduces entity references (as "&amp;" itself shows), so the parser tries to interpret "&%20" as the start of an entity reference and rejects the document as not well-formed.
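
To make the failure concrete, here is a minimal Ruby sketch that feeds both variants of the guid line to REXML. REXML and the helper name `well_formed?` are my own choices for illustration; the thread does not say which parser the feed readers use, so treat this as a demonstration of the XML rule rather than of any particular reader.

```ruby
require "rexml/document"

# Hypothetical helper: true if the string parses as well-formed XML.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

# The guid as emitted by the aggregator (bare "&") and by the original feed ("&amp;").
raw     = "<guid>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&%20Beer%20Garden/diary/41379</guid>"
escaped = "<guid>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&amp;%20Beer%20Garden/diary/41379</guid>"

puts well_formed?(raw)      # bare "&" does not start a valid entity reference
puts well_formed?(escaped)  # "&amp;" is a valid entity reference

# Once parsed, the text node contains the plain "&" again,
# which is why the value has to be escaped a second time on output.
puts REXML::Document.new(escaped).root.text
```

The last line is the important part: any program that parses the feed sees the decoded URL, so it has to re-escape it before writing XML again.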

@geraldb
Member

geraldb commented Feb 3, 2020

Sorry for the long wait. I finally got around to checking in detail. The error is in the feed templates (in the openstreetmap repo), which are missing the XML escapes (CGI.escapeHTML) for guid and link that turn the "unescaped" & back into an escaped &amp;. I will try to send in a pull request later today and then close this ticket. Again, thanks for reporting the error. Keep it up. Cheers. Prosit 2020!
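
For reference, a rough sketch of what escaping in a feed template could look like. The template text, the constant `ITEM_TEMPLATE`, and the method `render_item` are hypothetical and only illustrate the idea; the actual templates in the openstreetmap repo are not shown in this thread and may be structured differently.

```ruby
require "cgi"
require "erb"

# Hypothetical item template (an assumption, not the real planet template).
# Every value written into element content goes through CGI.escapeHTML.
ITEM_TEMPLATE = ERB.new(<<~XML)
  <item>
    <link><%= CGI.escapeHTML(url) %></link>
    <guid><%= CGI.escapeHTML(url) %></guid>
  </item>
XML

# Renders one feed item for the given (already decoded) URL.
def render_item(url)
  ITEM_TEMPLATE.result_with_hash(url: url)
end

# The URL as it looks after the source feed has been parsed (bare "&").
url = "http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&%20Beer%20Garden/diary/41379"
puts render_item(url)
```

With the escape in place, the bare & in the URL is written out as &amp; and the generated feed is well-formed again.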
