Simplify feed ID generation and support URL fragments #1306

Scheirle · 2014-04-14T07:16:32Z

Changes:

Simplified the unique_id generation of a feed item
- The new IDs now contain two more slashes (/): one at the beginning of the path section and one at the end.
- The new IDs now also support URL fragments (e.g. an entry located at http://test.de/index.html#fragment now gets the ID tag:test.de,{date}:/index.html/fragment instead of tag:test.de,{date}:index.html).
Updated unit test output to match the new IDs

TODO:

Update get_tag_uri: "Changes get_tag_uri so that the '/' before a fragment gets only added if there is a fragment" (feedgenerator#4)
Wait for "Add updated date to feed using modified. Fix #1933" (#1937) (no technical reason; it's just easier for everyone)
Wait for feedgenerator 1.9
Update the test output with the new ids

justinmayer · 2014-04-18T19:32:27Z

Hi @Scheirle. Thanks for the contribution. I hadn't realized that the feedgenerator library has its own get_tag_uri function. That would have made my job quite a bit easier. :^)

That said, my original implementation produces the precise tag I'm looking for:

<id>tag:hackercodex.com,2013-10-30:guide/python-development-environment-on-mac-osx/</id>

... whereas the implementation in this pull request produces:

<id>tag:hackercodex.com,2013-10-30:/guide/python-development-environment-on-mac-osx//</id>

What is the benefit of prepending and appending a slash at the beginning and end of the URL?

If the extra slashes were kept as implemented in this pull request, at the very least we'd have to ensure the double-slash behavior showcased above is somehow avoided.

Scheirle · 2014-04-19T06:51:45Z

> What is the benefit of prepending and appending a slash at the beginning and end of the URL?

Technically there is no benefit in doing so, but if you follow this guideline for creating tag URIs, you only insert the date between the domain name and the path (apart from the protocol, which gets removed). Therefore all characters of the actual URL are still there (especially the first slash).

The slash at the end comes from the lazy implementation of get_tag_uri, which always adds a slash before a possible fragment instead of only when a fragment is there.

To prevent double slashes we could strip the last characters if they are slashes:

...
unique_id=get_tag_uri(link.rstrip('/'), item.date),
...

But keep in mind the case where the URL doesn't have a slash at the end, e.g. http://test.de/index.html. This modification still produces tag:test.de,{date}:/index.html/.
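To make the two cases concrete, here is a minimal sketch. `lazy_tag_uri` is a hypothetical stand-in that only mimics the behaviour described in this comment (a slash is always added before the fragment slot); it is not the actual feedgenerator code, and the `/guide/` URL is made up for illustration.

```python
from datetime import date
from urllib.parse import urlparse


def lazy_tag_uri(url, day):
    """Hypothetical stand-in for the behaviour described in this thread.

    Always appends '/' plus the fragment, even when the fragment is empty.
    """
    bits = urlparse(url)
    return "tag:%s,%s:%s/%s" % (
        bits.hostname, day.isoformat(), bits.path, bits.fragment)


day = date(2013, 10, 30)

# URL that already ends in a slash: the ID ends in a double slash.
with_slash = lazy_tag_uri("http://test.de/guide/", day)

# Stripping trailing slashes first avoids the double slash ...
stripped = lazy_tag_uri("http://test.de/guide/".rstrip("/"), day)

# ... but a URL without a trailing slash still gains one.
no_slash = lazy_tag_uri("http://test.de/index.html".rstrip("/"), day)

print(with_slash)
print(stripped)
print(no_slash)
```

Under these assumptions the three IDs end in `/guide//`, `/guide/` and `/index.html/` respectively, which is the behaviour discussed above.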

In my opinion it is not worth stripping the slashes manually, since the only purpose of the ID is to be unique, which is the case with or without double slashes. And since one reason for this pull request is to make the ID generation as simple as possible, I think we should rely on the get_tag_uri function.

But if you want me to update the pull request with the above modification, I am happy to do so.

Scheirle · 2014-07-12T19:05:23Z

@justinmayer any thoughts on this?

The pelican comment system depends on support for URL fragments so that it can generate valid feed entry IDs.

justinmayer · 2016-04-04T19:46:16Z

@Scheirle: What do you think about modifying get_tag_uri such that it only adds a slash when it's appropriate to do so?

Please accept my sincere apologies for the absurdly long delay in responding!

Scheirle · 2016-04-04T19:59:28Z

> What do you think about modifying get_tag_uri such that it only adds a slash when it's appropriate to do so?

Yeah that is probably the right way to do it.
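A rough sketch of what "only add a slash when appropriate" could look like. `tag_uri_sketch` is a hypothetical name used for illustration, and the code is an assumption about the intended behaviour, not the merged feedgenerator change.

```python
from datetime import date
from urllib.parse import urlparse


def tag_uri_sketch(url, day):
    """Illustrative only: add '/<fragment>' solely when a fragment exists."""
    bits = urlparse(url)
    # An empty fragment contributes nothing, so no stray trailing slash.
    fragment = "/" + bits.fragment if bits.fragment else ""
    return "tag:%s,%s:%s%s" % (
        bits.hostname, day.isoformat(), bits.path, fragment)


day = date(2014, 4, 14)
print(tag_uri_sketch("http://test.de/index.html#fragment", day))
print(tag_uri_sketch("http://test.de/index.html", day))
```

With this approach a URL that already ends in a slash keeps exactly one trailing slash, so no manual `rstrip('/')` is needed on the calling side.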

justinmayer · 2016-04-07T00:57:50Z

Corresponding feedgenerator pull request merged!

justinmayer · 2016-04-08T16:16:48Z

Checked off another box in the above list. 🎊

Scheirle · 2016-09-25T09:10:37Z

Finally this can be merged!

feedgenerator 1.9 got released
I updated the test output

It would save work (avoid further merge conflicts) if this gets merged before #1989 resolves its merge conflicts.

justinmayer

Excellent work, Bernhard!

justinmayer · 2016-10-04T15:09:44Z

I have tested this, and it behaves as expected. Any other @getpelican/reviewers care to review this before it is merged?

It gives back full content in RSS feeds. This was changed to summary-only in [1]. The option was added in [2]. Both changes landed between 3.6.3 and 3.7.

The only changes in feeds between 3.6.3 (with hardcoded html5 as markdown's output format) and 3.7 are the following:

* Atom feeds: the full content of an article moved from <summary/> to <content/>, and <summary/> becomes a real summary.
* Atom/RSS feeds: strings in <id/> / <guid/> changed slightly due to [3].

The first should be fine; the second can cause some RSS/Atom readers to find fake new items when the website is updated.

[1]: getpelican/pelican#1989
[2]: getpelican/pelican#2051
[3]: getpelican/pelican#1306

This allows a smooth move from the old ID format to the new one implemented in [1].

[1]: getpelican/pelican#1306

ionelmc · 2017-08-14T13:21:25Z

Note that this causes a regression - different ids will be generated if you upgrade pelican!

ionelmc · 2017-08-14T13:24:12Z

This means all your items will get republished wherever they are consumed, an undesirable consequence (annoying feed spam).
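The effect can be illustrated with a small comparison. Both functions below are hypothetical reconstructions: the old format is inferred from the example IDs quoted earlier in this thread (leading slash dropped, fragment ignored), and neither is copied from the Pelican or feedgenerator source.

```python
from datetime import date
from urllib.parse import urlparse


def old_style_id(url, day):
    """Assumed pre-change format: leading slash dropped, fragment ignored."""
    bits = urlparse(url)
    return "tag:%s,%s:%s" % (
        bits.hostname, day.isoformat(), bits.path.lstrip("/"))


def new_style_id(url, day):
    """Assumed post-change format: path kept as-is, fragment after a '/'."""
    bits = urlparse(url)
    fragment = "/" + bits.fragment if bits.fragment else ""
    return "tag:%s,%s:%s%s" % (
        bits.hostname, day.isoformat(), bits.path, fragment)


url = "http://hackercodex.com/guide/python-development-environment-on-mac-osx/"
day = date(2013, 10, 30)

old_id = old_style_id(url, day)
new_id = new_style_id(url, day)

print(old_id)
print(new_id)
print(old_id == new_id)  # the IDs differ, so readers treat the entry as new
```

Because feed readers typically key entries on the ID, a one-character difference is enough for an already-seen entry to show up again.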

Scheirle mentioned this pull request Apr 14, 2014

[pelican_comment_system] Added Avatars, Identicons and Comment Atom Feed getpelican/pelican-plugins#190

Merged

justinmayer added the needs more work label Apr 18, 2014

Scheirle mentioned this pull request Oct 8, 2015

Question: extract_toc doesn't work if pandoc isn't available but pandoc_reader is getpelican/pelican-plugins#582

Closed

Scheirle force-pushed the master branch 2 times, most recently from 780c93d to 09cabd1 Compare April 4, 2016 20:17

Scheirle mentioned this pull request Apr 4, 2016

Changes get_tag_uri so that the '/' before a fragment gets only added if there is a fragment getpelican/feedgenerator#4

Merged

Scheirle added 2 commits September 25, 2016 10:54

Simplified feed id generation and added support for url fragments

1eee6f0

Update feed test output to match the new feed id

eb435dc

Scheirle force-pushed the master branch from 09cabd1 to eb435dc Compare September 25, 2016 09:02

justinmayer added this to the 3.7 milestone Sep 25, 2016

justinmayer added awaiting review and removed needs more work labels Sep 27, 2016

justinmayer approved these changes Oct 4, 2016


ingwinlu approved these changes Oct 4, 2016


justinmayer removed the awaiting review label Oct 4, 2016

justinmayer changed the title ~~Simplified feed id generation and added support for url fragments~~ Simplify feed ID generation and support URL fragments Oct 4, 2016

justinmayer merged commit 5a332ed into getpelican:master Oct 4, 2016

Scheirle deleted the master branch October 6, 2016 05:11

Totktonada added a commit to whatifrussian/website that referenced this pull request Nov 21, 2016

Preserve published pages/articles RSS/Atom IDs

a562e18

