Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify feed ID generation and support URL fragments #1306

Merged
merged 2 commits into from
Oct 4, 2016

Conversation

Scheirle
Copy link
Member

@Scheirle Scheirle commented Apr 14, 2014

Changes:

  • Simplified the unique_id generation of a feed item
    • The new ids contain now two / more, one at the beginning of the path section and one at the end.
    • The new ids also support now url fragments (e.g. An entry located at http://test.de/index.html#fragment gets now the id tag:test.de,{date}:/index.html/fragment instead of tag:test.de,{date}:index.html)
  • Updated unittest output to match the new ids

TODO:

@justinmayer
Copy link
Member

Hi @Scheirle. Thanks for the contribution. I hadn't realized that the feedgenerator library has its own get_tag_uri function. That would have made my job quite a bit easier. :^)

That said, my original implementation produces the precise tag I'm looking for:

<id>tag:hackercodex.com,2013-10-30:guide/python-development-environment-on-mac-osx/</id>

... whereas the implementation in this pull request produces:

<id>tag:hackercodex.com,2013-10-30:/guide/python-development-environment-on-mac-osx//</id>

What is the benefit of prepending and appending a slash at the beginning and end of the URL?

If the extra slashes were kept as implemented in this pull request, at the very least we'd have to ensure the double-slash behavior showcased above is somehow avoided.

@Scheirle
Copy link
Member Author

What is the benefit of prepending and appending a slash at the beginning and end of the URL?

Technically there is no benefit by doing so, but following this guideline to create tag uris you only insert the date between the domain name an the path (Except from the protocol which gets removed.). Therefore all characters of the actual url are still there (especially the first slash).

The slash at the end comes from the lazy implementation of get_tag_uri, which always adds a slash before a possible fragment instead of only when a fragment is there.

To prevent double slashes we could strip the last characters if they are slashes:

...
unique_id=get_tag_uri(link.rstrip('/'), item.date),
...

But keep in mind if the url doesn't have a slash at the end. E.g http://test.de/index.html This modification still produces: tag:test.de,{date}:/index.html/.

In my opinion it is not worth it to strip the slashes manually, since the only purpose of the id is to be unique which is the case with or without double slashes. And since one reason for this pull request is to make the id generation as simple as possible I think we should rely on the get_tag_uri function.

But if you want me to update the pull request with the above modification I am pleased to do so.

@Scheirle
Copy link
Member Author

@justinmayer any thoughts on this?


The pelican comment system depends on the support of url fragments, so it can generates valid feed entry ids.

@justinmayer
Copy link
Member

@Scheirle: What do you think about modifying get_tag_uri such that it only adds a slash when it's appropriate to do so?

Please accept my sincere apologies for the absurdly long delay in responding!

@Scheirle
Copy link
Member Author

Scheirle commented Apr 4, 2016

What do you think about modifying get_tag_uri such that it only adds a slash when it's appropriate to do so?

Yeah that is probably the right way to do it.

@justinmayer
Copy link
Member

Corresponding feedgenerator pull request merged!

@justinmayer
Copy link
Member

Checked off another box in the above list. 🎊

@Scheirle
Copy link
Member Author

Finally this can be merged!

  • feedgenerator 1.9 got released
  • I updated the test output

It would save work (Avoid further merge conflicts) if this gets merged before #1989 resolves it merge conflicts.

@justinmayer justinmayer added this to the 3.7 milestone Sep 25, 2016
Copy link
Member

@justinmayer justinmayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work, Bernhard!

@justinmayer
Copy link
Member

justinmayer commented Oct 4, 2016

I have tested this, and it behaves as expected. Any other @getpelican/reviewers care to review this before it is merged?

@justinmayer justinmayer changed the title Simplified feed id generation and added support for url fragments Simplify feed ID generation and support URL fragments Oct 4, 2016
@justinmayer justinmayer merged commit 5a332ed into getpelican:master Oct 4, 2016
@Scheirle Scheirle deleted the master branch October 6, 2016 05:11
Totktonada added a commit to whatifrussian/website that referenced this pull request Nov 21, 2016
It gives back full content in RSS feeds. This was changed to
summary-only in [1]. The option was added in [2]. Both between 3.6.3 and
3.7.

The only changes in feeds between 3.6.3 (with hardcoded html5 as
markdown's output format) and 3.7 are the following:

* Atom feeds: full content of article moved from <summary/> to
  </content> and <summary/> becomes really summary.
* Atom/RSS feeds: strings in <id/> / <guid/> slightly changed by [3].

The first should be good, the second can cause some RSS/Atom readers to
find fake new items when the website will updated.

[1]: getpelican/pelican#1989
[2]: getpelican/pelican#2051
[3]: getpelican/pelican#1306
Totktonada added a commit to whatifrussian/website that referenced this pull request Nov 21, 2016
This allows to smoothly move from old IDs format to new one implemented
in [1].

[1]: getpelican/pelican#1306
@ionelmc
Copy link
Contributor

ionelmc commented Aug 14, 2017

Note that this causes a regression - different ids will be generated if you upgrade pelican!

@ionelmc
Copy link
Contributor

ionelmc commented Aug 14, 2017

This means all your items will get republished wherever they are consumed, an undesirable consequence (annoying feed spam).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants