Do not require rel=self for discovery #36

Open
cweiske opened this issue May 26, 2015 · 6 comments

@cweiske
Contributor

cweiske commented May 26, 2015

The discovery phase currently requires that a document has two relation links:

  1. rel=hub
  2. rel=self

What is the reason for rel=self?

In my eyes, rel=hub should suffice since rel=self will be the URL itself. It should be made optional.

cc @aaronpk @tantek - http://indiewebcamp.com/irc/2015-03-18#t1426690743557

@themel
Contributor

themel commented May 26, 2015

The problem is canonicalization/feed aliasing. Most feeds can be accessed
under many URLs (HTTP vs HTTPS, multiple hostnames, infinite spaces of
ignored query parameters). The publisher can't/won't ping all of them when
there's an update to the feed. The self link is an explicit promise to ping
the self link topic when the feed changes, and this is the topic that
subscribers should use. If we drop the self link requirement, we can either
let subscribers that ended up on a feed via a non-canonical URL
wait for updates in vain (bad) or make the hub's job much more difficult
because it needs to understand that a ping to http://example.com/feed.xml
might also affect subscribers to https://example.com/feed.xml?foo=bar. This
fits the overall "center complexity in the hub" design approach, but it
would probably lead to a worse user experience because it's hard to do this
kind of aliasing detection reliably.

I also expect the gains from this simplification to be small since adding
two links to a feed is basically the same amount of work as adding one link.
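
To make the aliasing problem concrete, here is a minimal sketch of a hub that matches topics by exact string, which is the behaviour described above. `ToyHub` and the callback URLs are hypothetical and exist only for illustration; a real hub does far more than this.

```python
from collections import defaultdict


class ToyHub:
    """Hypothetical in-memory hub that matches topics by exact string."""

    def __init__(self):
        # topic URL -> set of subscriber callback URLs
        self.subscribers = defaultdict(set)

    def subscribe(self, topic, callback):
        self.subscribers[topic].add(callback)

    def ping(self, topic):
        # Only callbacks registered under this exact topic string are notified.
        return sorted(self.subscribers.get(topic, set()))


hub = ToyHub()
# Subscriber A used the URL the publisher actually pings.
hub.subscribe("http://example.com/feed.xml", "https://reader-a.example/callback")
# Subscriber B arrived through an alias of the same feed.
hub.subscribe("https://example.com/feed.xml?foo=bar", "https://reader-b.example/callback")

notified = hub.ping("http://example.com/feed.xml")
print(notified)
```

Subscriber B is never notified and gets no error either, unless the hub somehow learns that both URLs name the same feed.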


@cweiske
Contributor Author

cweiske commented May 26, 2015

Actually, adding the hub link in Apache is a single configuration line only:

Header append Link '<http://phubb.cweiske.de/hub.php>; rel="hub"'

Adding the self URL is difficult because it's a dynamic URL. So it's not the same amount of work; quite the contrary.

I understand the issue about the same file being available under multiple URLs. But if there is no self link, the publisher would have to take care that the feed is only available under one URL.

@tantek
Contributor

tantek commented May 26, 2015

I agree with not requiring rel=self.

re: canonicalization - there is prior art here we should be re-using, that is, rel=canonical - which is already well deployed and in use.

Thus here is a specific proposal.

Change: Publishers MUST have a rel=self link at their URL ("the URL")
To: Publishers SHOULD have a rel=self link, but MAY instead:

  • provide a rel=canonical link (which they might have already) OR
  • assume rel=self same as the URL

Thus consuming code:

  • looks for a rel=self link, if not found
  • looks for a rel=canonical link, if not found
  • uses the current URL
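
The lookup order proposed for consuming code could be sketched roughly as follows. This is an illustration under assumptions, not spec text: `parse_links` and `discover` are hypothetical helper names, the parser only handles simple `Link` header values (no commas inside quoted parameters), and links embedded in the feed or HTML body are ignored.

```python
import re

# Matches one link-value of the form: <target>; params
LINK_RE = re.compile(r'<([^>]*)>([^,]*)')
# Extracts the rel parameter, quoted or unquoted
REL_RE = re.compile(r'\brel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


def parse_links(header_value):
    """Return {rel: first target} from a Link header value (simplified parser)."""
    links = {}
    for target, params in LINK_RE.findall(header_value):
        match = REL_RE.search(params)
        if not match:
            continue
        # A rel value may list several space-separated relation types.
        for rel in match.group(1).lower().split():
            links.setdefault(rel, target)
    return links


def discover(fetched_url, link_header):
    """Return (hub, topic), or None when no hub is advertised."""
    links = parse_links(link_header)
    hub = links.get("hub")
    if hub is None:
        return None
    # Proposed fallback order: rel=self, then rel=canonical, then the URL fetched.
    topic = links.get("self") or links.get("canonical") or fetched_url
    return hub, topic


print(discover(
    "http://example.com/feed.xml?foo=bar",
    '<http://phubb.cweiske.de/hub.php>; rel="hub"',
))
```

With only a hub link present, the topic falls back to whatever URL the subscriber happened to fetch, which is exactly the case the canonicalization concern is about.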

Regarding: "since adding two links to a feed is basically the same amount of work as adding one link." - absolutely not true in my experience. Example 1: what @cweiske said. Example 2: watching numerous users try to add the TWO links required for OpenID and screw one of them up (in contrast to people trivially adding the one rel=me link required for IndieAuth).

Basically, requiring two links instead of one for the very common case unnecessarily increases publisher responsibility and fragility of the whole system.

@julien51
Member

I'm very strongly against this because it would add one more case of silent failure. There's http vs https, there are also case issues and a bunch of other examples. Feedburner is pretty famous for this: if you subscribed to http://feeds.feedburner.com/TechCrunch/ instead of http://feeds.feedburner.com/Techcrunch/, you'd never get pings.

The worst case is for redirects and in this specific case, the hub has no way of matching the ping-ed URL and the actual feed resource.

Again, this is a particularly bad idea because this will silently fail. A subscriber who subscribes to a URL different from the one that is actually pinged to the hub will never receive notifications, and never be able to tell why (because he cannot know which URL is being pinged). THAT makes the protocol fragile.

I'm all sorry for anyone working with Apache in general, but I don't think it's a good idea to base a spec on the difficulty of implementing something with a specific web server. I believe most web frameworks will make it trivial to add one Link header vs. 2 (or 100).
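
As a rough illustration of emitting both links from application code, here is a minimal WSGI sketch. It assumes the publisher knows its canonical scheme and host: `CANONICAL_BASE`, `link_header` and `app` are hypothetical names, and only the hub URL is taken from the Apache example earlier in this thread.

```python
HUB_URL = "http://phubb.cweiske.de/hub.php"  # hub URL from the Apache example
CANONICAL_BASE = "http://example.com"        # assumed canonical scheme and host


def link_header(hub_url, self_url):
    """Build one Link header value advertising the hub and the topic URL."""
    return '<{}>; rel="hub", <{}>; rel="self"'.format(hub_url, self_url)


def app(environ, start_response):
    # Build the topic from the canonical base, not from the Host header or
    # query string, so every alias of the feed advertises the same rel=self.
    self_url = CANONICAL_BASE + environ.get("PATH_INFO", "/")
    headers = [
        ("Content-Type", "application/atom+xml"),
        ("Link", link_header(HUB_URL, self_url)),
    ]
    start_response("200 OK", headers)
    return [b"<feed/>"]
```

A request for `/feed.xml` on any hostname, with any query string, would then advertise `http://example.com/feed.xml` as its topic.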

Now, if the whole debate is about whether "canonical" is better than "self", I'll let you fight over that. We can easily change the spec to tell subscribers:

  • Use self if there is one
  • Use canonical if you can't find one

And to tell publishers:

  • Put either self or canonical.

@romkatv

romkatv commented May 29, 2015

On Fri, May 29, 2015 at 9:34 AM, Julien Genestoux notifications@github.com
wrote:

Feedburner is pretty famous for this and if you subscribed to this URL
http://feeds.feedburner.com/TechCrunch/ instead of this one
http://feeds.feedburner.com/Techcrunch/, you'd never get pings.

Minor correction: subscribing to any of these will work:

This doesn't invalidate the point Julien is making. Topic aliasing is a
real problem. Correct self links are vital for ensuring that subscribers
are listening to the exact topics that the publisher is pinging.

Roman.

@julien51
Member

I stand corrected, but that was a big pain point for a long time. I'm glad you guys fixed it :)
