(post comments) Enabling PubSubHubBub for GitHub hosted blogs #11

Open
izuzak opened this issue Feb 17, 2011 · 27 comments

Comments

@izuzak
Owner

izuzak commented Feb 17, 2011

This issue is reserved for comments on the blog post Enabling PubSubHubBub for GitHub hosted blogs. Leave a comment below and it will show up on the blog post's Web page. Thanks!

@sergeylukin

As far as I can see, GitHub now sends custom URL query params in Web hooks along with its own parameters. The question is whether services like pubsubhubbub.appspot.com and pingomatic.com know how to process these requests properly. I think these services need some monitoring interface to make debugging the pings possible. Ivan, what are you using now to ping them?

@izuzak
Owner Author

izuzak commented Aug 18, 2012

Hi @sergeylukin, I'm glad you find this idea interesting!

I just tested using http://requestb.in and -- you're absolutely right! GitHub now doesn't delete user-defined query params from post-commit Web Hooks! However, this change doesn't help a lot with regard to using these post-commit Web Hooks for pinging a PSHB hub, as you correctly suspected. The problem is that the PSHB hub expects the ping parameters in the ping HTTP request's BODY, not in the URL query part (see: http://pubsubhubbub.googlecode.com/svn/trunk/pubsubhubbub-core-0.3.html#anchor9). And since there is no way to force GitHub to use a specific message body when making a request to Web Hooks, we still have to use a proxy which translates GitHub's request into a correctly formatted PSHB ping request, as I described in my blog post.

This proxy is still what I am using for pinging a PSHB hub (pubsubhubbub.appspot.com): when you (or GitHub) make a request to http://urlreq.appspot.com/pshbpinggae/http://ivanzuzak.info/atom.xml, the proxy will make a PSHB PING request (HTTP POST request to a PSHB hub) with the correct parameters in the POST request body, for my blog http://ivanzuzak.info/atom.xml. This proxy is also running on Google AppEngine so you can count on it being available almost 100% of the time.
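For readers who want to see what such a ping looks like, here is a minimal Python sketch of a PSHB publish ping with the parameters in the POST body, as the spec describes. It is not urlreq's actual code; the function name `build_publish_ping` is made up for illustration, and the hub URL is the one used in this thread.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hub endpoint mentioned in this thread; swap in another hub if needed.
HUB_URL = "http://pubsubhubbub.appspot.com/publish"


def build_publish_ping(feed_url, hub_url=HUB_URL):
    """Build (but do not send) a publish ping with form-encoded body parameters."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


ping = build_publish_ping("http://ivanzuzak.info/atom.xml")
print(ping.get_method(), ping.full_url)
print(ping.data.decode("ascii"))
# Sending is left out on purpose; urllib.request.urlopen(ping) would do it.
```

The request is only built here, so it can be inspected before anything is sent to a real hub.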

I haven't used http://pingomatic.com/ so far, so I can't give you a useful answer. I will check it out, though. What I can do is to recommend that you have a look at the documentation for my proxy service -- https://github.com/izuzak/urlreq (see the API section in the README). This proxy service is very generic and you can use it to translate any HTTP request R1 into another HTTP request R2, using parameters defined in the URL query part of R1.

Thanks again for commenting, let me know if I missed something or you need help!

@sergeylukin

Hi @izuzak, thanks for the fast and detailed reply!

I've read the specification you linked to, and you're right: it says that parameters should be passed in the body.
It also says: "If the notification was acceptable, the hub MUST return a 204 No Content response. If the notification is not acceptable for some reason, the hub MUST return an appropriate HTTP error response code (4xx and 5xx)."
So I decided to test it.

I sent the following request:

POST /publish?hub.mode=publish&hub.url=http%3A%2F%2Fsergeylukin.com%2Ffeed&something=else HTTP/1.1
User-Agent: Fiddler
Host: pubsubhubbub.appspot.com
Content-length: 0

and received

HTTP/1.1 204 No Content
Cache-Control: no-cache
Content-Type: text/plain
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Date: Sat, 18 Aug 2012 16:15:35 GMT
Server: Google Frontend
Content-Length: 0

It worked the same way for Ping-O-Matic:

Request

POST /ping/?title=The+personal+website+of+web+developer+Sergey+Lukin&blogurl=http%3A%2F%2Fsergeylukin.com&rssurl=http%3A%2F%2Fsergeylukin.com%2Ffeed&chk_weblogscom=on&chk_blogs=on&chk_feedburner=on&chk_newsgator=on&chk_myyahoo=on&chk_pubsubcom=on&chk_blogdigger=on&chk_weblogalot=on&chk_newsisfree=on&chk_topicexchange=on&chk_google=on&chk_tailrank=on&chk_postrank=on&chk_skygrid=on&chk_collecta=on&chk_superfeedr=on&something=else HTTP/1.1
User-Agent: Fiddler
Host: pingomatic.com
Content-length: 0

Response

HTTP/1.1 200 OK
Server: nginx
Date: Sat, 18 Aug 2012 16:19:42 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: close
Vary: Accept-Encoding
Vary: Accept-Encoding
Set-Cookie: blogurl=http%3A%2F%2Fsergeylukin.com; expires=Wed, 12-Dec-2012 10:06:22 GMT; path=/; domain=.pingomatic.com
Set-Cookie: rssurl=http%3A%2F%2Fsergeylukin.com%2Ffeed; expires=Wed, 12-Dec-2012 10:06:22 GMT; path=/; domain=.pingomatic.com
Set-Cookie: title=The+personal+website+of+web+developer+Sergey+Lukin; expires=Wed, 12-Dec-2012 10:06:22 GMT; path=/; domain=.pingomatic.com
Set-Cookie: pinged=a%3A16%3A%7Bi%3A0%3Bs%3A14%3A%22chk_weblogscom%22%3Bi%3A1%3Bs%3A9%3A%22chk_blogs%22%3Bi%3A2%3Bs%3A14%3A%22chk_feedburner%22%3Bi%3A3%3Bs%3A13%3A%22chk_newsgator%22%3Bi%3A4%3Bs%3A11%3A%22chk_myyahoo%22%3Bi%3A5%3Bs%3A13%3A%22chk_pubsubcom%22%3Bi%3A6%3Bs%3A14%3A%22chk_blogdigger%22%3Bi%3A7%3Bs%3A14%3A%22chk_weblogalot%22%3Bi%3A8%3Bs%3A14%3A%22chk_newsisfree%22%3Bi%3A9%3Bs%3A17%3A%22chk_topicexchange%22%3Bi%3A10%3Bs%3A10%3A%22chk_google%22%3Bi%3A11%3Bs%3A12%3A%22chk_tailrank%22%3Bi%3A12%3Bs%3A12%3A%22chk_postrank%22%3Bi%3A13%3Bs%3A11%3A%22chk_skygrid%22%3Bi%3A14%3Bs%3A12%3A%22chk_collecta%22%3Bi%3A15%3Bs%3A14%3A%22chk_superfeedr%22%3B%7D; expires=Wed, 12-Dec-2012 10:06:22 GMT; path=/; domain=.pingomatic.com

b06


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
...
<h2>Pinging complete!</h2>
<p class="nomargin"><a href="">Bookmark this page</a> and come back to it 
later to automatically re-ping.</p>
...
</body>
</html>

Ivan, what do you think? It looks like it is working, but I'm probably missing something.

Btw, I get a 503 error when trying to ping http://urlreq.appspot.com/pshbpinggae/http://ivanzuzak.info/atom.xml
Looks like it's out of quota or something.

@izuzak
Owner Author

izuzak commented Aug 19, 2012

@sergeylukin -- whoa, that's awesome! It never occurred to me to actually test pubsubhubbub.appspot.com to see if they strictly follow the specification. I'm glad you did! :). Anyway, you're not missing anything -- I guess the reason why this is working is that most AppEngine apps use the webapp2 Python framework, and the webapp2 framework automatically parses both URL query parameters and parameters in the body of a POST request (if the request is of application/x-www-form-urlencoded content type) and combines both of these parameters in the same dictionary object. See the docs on the webapp2 framework that explain this:
http://webapp-improved.appspot.com/guide/request.html -- "The request object provides a get() method that returns values for arguments parsed from the query and from POST data.". And the exact place in the pubsubhubbub.appspot.com source code where this is used is:
https://code.google.com/p/pubsubhubbub/source/browse/trunk/hub/main.py#2374

So, it looks like the proxy is not actually needed and you can just put everything into the query parameters of the GitHub Web Hook, as you originally suggested.
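To illustrate why both styles of request end up looking the same to the hub, here is a small standalone sketch of the "merge query and body parameters" behaviour. It is not webapp2's code; `merged_params` is a hypothetical helper, and letting body values override query values on a name clash is a choice of this sketch only.

```python
from urllib.parse import parse_qsl, urlsplit

FORM_TYPE = "application/x-www-form-urlencoded"


def merged_params(url, body=b"", content_type=""):
    """Collect parameters from the query string and, for form posts, the body."""
    params = dict(parse_qsl(urlsplit(url).query))
    # Only a form-encoded body is parsed; any other content type is ignored.
    if content_type.split(";")[0].strip().lower() == FORM_TYPE:
        params.update(parse_qsl(body.decode("ascii")))
    return params


encoded = "hub.mode=publish&hub.url=http%3A%2F%2Fexample.com%2Ffeed"
from_query = merged_params("http://hub.example/publish?" + encoded)
from_body = merged_params("http://hub.example/publish", encoded.encode("ascii"), FORM_TYPE)
print(from_query == from_body)
```

Both calls produce the same dictionary, which is the effect observed in the test against the hub.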

Also, thanks for letting me know that the proxy is down with an "Out of quota" -- some spammer has been draining the service without a reason. I'll blacklist him today and the service should be fully operational tomorrow.

Again, very nice work on your part and thanks for researching this! I'll update the blog post to include this info.

@sergeylukin

Yes, that is great news! ...and actually another reason for using GitHub Pages for static websites :)
However, I think it would be better if we could just modify (or at least add to) the content of the request BODY sent by WebHooks. For now, the only way to know that WebHooks send query string parameters in the POST request, and that PSHB accepts them, is to test it and hope for the best. Before running my tests I was 99% sure that they would fail :) Besides that, passing parameters in the BODY sounds more reasonable to me when sending a POST request.
But anyway, I'm glad it is working and that it is so easy to set up PSHB support for a website hosted on GitHub Pages.
And I'm glad I participated in discovering that.
Ivan, thank you so much for the inspiration; it was a pleasure to dig into the issue with you.

@izuzak
Owner Author

izuzak commented Aug 19, 2012

Great news, indeed! And I strongly agree with you -- it would be better if one could specify the content of the POST request body, e.g. using some templating language and variables. Perhaps one day GitHubbers will come to the same conclusion :).

I'm really glad you discovered this -- keep up the great work!

@bcomnes

bcomnes commented May 9, 2013

Is there a way to tell if this is working or not? I set things up over here https://github.com/bcomnes/bcomnes.github.io/blob/master/atom.xml
but I can't really tell if it's working or not.

My feed does not seem to be updating in real time over at newsblur or google reader (or what's left of it).

@izuzak
Owner Author

izuzak commented May 9, 2013

Hey @bcomnes! Glad you're trying this out.

One question first -- which approach are you using for pinging PSHB? The one which I described in the blog post (a webhook using the urlreq service as a proxy) or the one described in the comments here (webhook directly communicating with the pshb hub)?

Your atom.xml file looks alright to me, and I haven't updated my blog in a while, so I don't even know if the webhook setup is still working for my blog.

But anyway, the way I'd test if everything was working as intended is:

  1. check that the GitHub webhook is firing after you update your feed

To do this, create a request bin at http://requestb.in/. A request bin is basically a service that stores every request it receives so that you can inspect them manually. So, create a bin, and then create a GitHub Service hook with the URL of the bin as the URL of the hook (bin URLs will look like this: http://requestb.in/1f0gsjk1). After you set that up, make an update to your feed (best if you create a new blog post) and visit the inspection page for the bin to see if the webhook fired and made a request to the bin (the URL of the inspection page for the bin is the same as the URL of the bin, just append ?inspect). If you see a request there - then the webhook is firing. If not - then something is broken in GitHub's infrastructure for firing webhooks on repository updates.

  2. check that the PSHB hub is fetching your feed when the webhook notifies it

After you've made sure that webhooks are working in general, create another request bin. You will now use this bin to test if the PSHB hub is fetching feeds correctly. To do this, create another GitHub service hook, and specify this as the hook URL: http://pubsubhubbub.appspot.com/publish?hub.mode=publish&hub.url=URL_OF_THE_BIN_YOU_CREATED. Now, update your blog (create a new blog post), and the GitHub hook should trigger the PSHB hub to make a request to the bin you created (because it will think that it is fetching a feed that was just updated). So, inspect your bin, and if no request was made - then there is a problem with the PSHB hub and you should contact the maintainers of the hub.
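When filling in the hook URL from step 2, it is probably safest to percent-encode the bin (or feed) URL before putting it into hub.url. A minimal sketch, assuming the hub endpoint used in this thread; `hub_hook_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode

# Publish endpoint of the hub discussed in this thread.
PUBLISH_ENDPOINT = "http://pubsubhubbub.appspot.com/publish"


def hub_hook_url(target_url, publish_endpoint=PUBLISH_ENDPOINT):
    """Return a webhook URL that asks the hub to fetch target_url."""
    query = urlencode({"hub.mode": "publish", "hub.url": target_url})
    return publish_endpoint + "?" + query


# The bin URL is the example one from step 1.
print(hub_hook_url("http://requestb.in/1f0gsjk1"))
```

The printed URL is what would go into the URL field of the GitHub hook.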

If both of these checks pass, then there is either a problem in Google Reader (e.g. it is no longer updating in real time), or in the PSHB hub (it is no longer correctly forwarding blog updates to subscribers). Checking which of these two is true is a bit more complicated than the first two checks. I'd not put too much faith into Google Reader since it will shut down anyway.

+You can do all of these things in another repo so that you don't make a mess of your blog feed (and irritate your followers).

Hope this helps. Let me know what you find out or if I can help out somehow!

@bcomnes

bcomnes commented May 9, 2013

http://requestb.in/ is an awesome tool! Thanks for pointing that out. And thanks for the detailed tips! Awesome.

I have the following GH hooks enabled:
http://pubsubhubbub.appspot.com/publish?hub.mode=publish&hub.url=http%3A%2F%2bre.tc%2atom.xml
http://pubsubhubbub.appspot.com/publish?hub.mode=publish&hub.url=http://requestb.in/pwcg1xpw

And they seem to be working. Request bin acknowledges these requests:

GET /pwcg1xpw HTTP/1.1
User-Agent: AppEngine-Google; (+http://code.google.com/appengine; appid: s~pubsubhubbub-hrd)
Host: requestb.in
Connection: close
Accept-Encoding: gzip

Also, adding the bin URL as a hook sent out the commit information, which Request Bin collected perfectly.

I think I am going to have to assume the issue is with the RSS readers. I'll get in contact with newsblur, one of the few PuSH-enabled RSS readers I could find, to see if I can get any pointers as to why that service isn't getting PuSH updates on my feed.

It also occurred to me, since I am not pre-processing my site, there is a delay between the commit and the actual site updating. The delay is only a matter of a few seconds, but could this delay create some kind of race condition between the hook and the actual updated atom feed being available? For example, using https://pubsubhubbub.appspot.com/publish to check the feed, it never seems to reflect the most recent post! Hrrmm. Maybe this does not matter either, I'm not sure. It looks like your PuSH feed is one story behind your latest blog post, maybe indicating that it fetched the atom file before the new one was built? I might just be reading it wrong or have a misunderstanding of how this works.

@izuzak
Owner Author

izuzak commented May 10, 2013

That's an excellent point, Bret! Never thought about the possibility of a race condition happening, and now that you've mentioned it - I do think that it is possible. So, it is possible that everything is actually working, but the PSHB hub is fetching your atom.xml file before it was updated by Jekyll with the new post (if Jekyll processing is slow for some reason). I've never had this problem with my feed, though.

The only way to make sure is to create a small PSHB subscriber application that subscribes to your feed and outputs the updates received from PSHB to the console. This is not a complicated thing to do, you should be able to do it in your favorite programming language relatively easily. Anyway, if this demo subscriber app does not receive any updates after you create a new post or if it receives an update without the latest post -- you'll know what's going on. And if it receives the latest post as expected -- then it's a problem with Google Reader and newsblur.
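A rough idea of what such a demo subscriber could look like in Python, using only the standard library. This is a sketch under assumptions: the helper names, the port, and the choice of hub.verify=sync are mine, the callback URL has to be reachable by the hub from the public internet, and a real subscriber would do more checking than this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlsplit


def verification_reply(path):
    """Return (status, body) for the hub's verification GET to `path`."""
    params = parse_qs(urlsplit(path).query)
    mode = params.get("hub.mode", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if mode not in ("subscribe", "unsubscribe") or not challenge:
        return 404, b""
    # A real subscriber would also check hub.topic before confirming.
    return 200, challenge.encode("utf-8")


def subscription_body(callback_url, topic_url):
    """Form body for the subscribe request that gets POSTed to the hub."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.callback": callback_url,
        "hub.topic": topic_url,
        "hub.verify": "sync",
    })


class SubscriberHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Verification: echo the hub.challenge value back to the hub.
        status, body = verification_reply(self.path)
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Notification: print whatever feed content the hub delivered.
        length = int(self.headers.get("Content-Length", 0))
        print(self.rfile.read(length).decode("utf-8", "replace"))
        self.send_response(204)
        self.end_headers()


def run(port=8080):
    """Serve the callback endpoint until interrupted."""
    HTTPServer(("", port), SubscriberHandler).serve_forever()
```

To try it, call `run()` on a publicly reachable machine, then POST `subscription_body(callback_url, feed_url)` to the hub's subscribe endpoint and publish a new post. If the printed notification lacks the latest entry, the hub most likely fetched the feed before the rebuild finished.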

+Yeah, Request bin is brilliant :). I use it and hurl.it very often when debugging.

@sergeylukin

A race condition is indeed possible; I didn't even think about it. It all depends on how GH implemented webhook calls and Jekyll builds.
Anyway, after GH upgraded to Liquid v2.3 without providing backward compatibility, which completely broke the builds for those using {% literal %} tags, I trust local pre-processing more. I'm also looking into converting my blog to Docpad.

@bcomnes

bcomnes commented May 11, 2013

A githubber in IRC also sees the possibility that since the GH-page builds in the background after the commit is made, post commit services could end up in a race condition with the page build. If only there was a way to delay the process just long enough so that the page build always wins.

Maybe that PuSH proxy method you had going prior to this prevented this kind of issue? Who knows.

http://push-bot.appspot.com was recommended to me as a tool for debugging a new feed, and it also works quite well, as it gives you somewhat reliable feedback through the whole process of subscribing and receiving PuSH notifications. Basically, you add an XMPP bot to a chat program and tell it to push feed updates to you via IM.

I'll think about some possible solutions to this situation and run more tests.

@izuzak
Owner Author

izuzak commented May 11, 2013

Yeah, it's possible that the proxy service introduced just the right amount of delay so that everything worked out OK. That's distributed systems for ya :)

Let me know what you come up with. It seems that making a similar proxy service that has a hard-coded delay (e.g. 10-20 secs) might be an easy way out. I was also hoping that GitHub's API would have an event that signaled the completion of pages rendering, but one doesn't seem to exist: http://developer.github.com/v3/activity/events/
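The delayed-proxy idea could be sketched roughly like this. It is only an illustration of "wait, then ping", not an existing service: `delayed_ping` is a hypothetical name, the 15 second default is an arbitrary pick from the range above, and the `sleep` and `send` arguments exist only so the function can be tried without waiting or touching the network.

```python
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def delayed_ping(feed_url, hub_url, delay_seconds=15, sleep=time.sleep, send=urlopen):
    """Wait delay_seconds, then POST a publish ping for feed_url to hub_url."""
    # Give the pages build a head start before the hub fetches the feed.
    sleep(delay_seconds)
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return send(Request(hub_url, data=body, method="POST"))


# Dry run with stand-ins for sleeping and sending.
log = []
delayed_ping(
    "http://example.com/atom.xml",
    "http://hub.example/publish",
    delay_seconds=10,
    sleep=lambda seconds: log.append(("sleep", seconds)),
    send=lambda request: log.append(("send", request.full_url)),
)
print(log)
```

A fixed delay only makes the race less likely; it cannot guarantee that the build has finished.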

@bcomnes

bcomnes commented Jul 5, 2013

So indeed, after talking with more folks and playing around, this race condition is real. The webhook and the site build are triggered at the same time.

I have not done a huge amount of probing, but it looks like it sometimes requires two commits (and thus two webhooks) to get PuSH to notice your new content in some of my tests, building on the GH servers.

I emailed GH support about this. I asked:

"I was wondering if there was a way to send a webhook after a gh-page build."

And here was their response:

"That's not possible currently, but we might add it in the future. Thanks for the suggestion."

I also was in touch with someone from Superfeedr, another company that runs a popular PuSH hub. I was told that it would be difficult to implement some kind of delay parameter on the hub side due to scaling complications, but that they will look into it some more.

So here we are. GH is opaque about this (hopefully someone internally cares about this stuff) and this may or may not ever be addressed, and other hubs are reluctant to accommodate this kind of need due to potential scaling difficulties (although, understandably adding delay parameters is ultimately a hack).

I think playing around with your urlreq app sounds like the best solution at the moment, until github gets their act together when it comes to their API interacting with their pages system.

Would it be difficult to add some kind of delay parameter to urlreq? I am forking your project to try and play around with some ideas I have, but would you be interested in contributions back to the project along those lines?

@izuzak
Owner Author

izuzak commented Jul 6, 2013

Wow, really appreciate the effort you've put into digging through this issue, @bcomnes!

I believe that eventually GitHub will indeed add a webhook for gh-page builds, but it's hard to tell when that will happen since it's probably not an oft requested feature and they have other priorities. As for hub providers adding a delay - I doubt it will ever happen.

I have no problem with adding a delay to urlreq and think it would be the easiest way to resolve this for now. I'd welcome contributions for sure, if you have the time to hack on it. 👍 If not - let me know and I'll add it next week (Monday/Tuesday-ish).

Cheers! 🎆

@bcomnes

bcomnes commented Jul 9, 2013

I would like to hack on it, as it seems learning about google app engine could prove very useful for some ideas I want to experiment with, but I am a total noob at python and the app engine platform itself, so it would probably take me a while to figure something like that out. If you are willing, adding a delay parameter would be awesome and would likely get done better and faster since you already know your way around urlreq.

@izuzak
Owner Author

izuzak commented Jul 10, 2013

Sure, no problem -- I'll try to add it later this week. But one question though -- what's the expected span of the delays? If these are delays under a minute, then the solution is simpler than in the case where the delays are higher than a minute. The reason is that AppEngine normally has a 60 second limit on the time allowed to process a request. If delays higher than that are needed - I have to try out some other approaches from the AppEngine platform.

+I totally love AppEngine -- it has a nice free quota and good platform APIs. ✨

@bcomnes

bcomnes commented Jul 10, 2013

In my experience, it's a matter of seconds. But I don't have that many
posts to process either. A 60ish second window seems extremely generous.
I'll probably start with 5 or 10 seconds.

-Bret Comnes

izuzak added a commit to izuzak/urlreq that referenced this issue Jul 11, 2013
@izuzak
Owner Author

izuzak commented Jul 11, 2013

I've added support for an optional delay parameter, which accepts values from 0 to 30. This defines the amount of time (in seconds) that urlreq will "wait" before issuing the request.

Example request without delay:
http://urlreq.appspot.com/req?method=POST&url=https%3a//api.github.com/markdown/raw&body=**Hello**%20_World_&Content-Type=text/plain

Example request with 5 second delay:
http://urlreq.appspot.com/req?method=POST&url=https%3a//api.github.com/markdown/raw&body=**Hello**%20_World_&Content-Type=text/plain&delay=5

The delay parameter works across all subservices on urlreq, including PSHBping.
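If you are generating these URLs from a script, something along these lines might help. It is a sketch based only on the example requests above: `urlreq_url` is a hypothetical helper, the range check mirrors the documented 0 to 30 limit, and the escaping it produces differs slightly from the hand-written examples.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Endpoint taken from the example requests above.
URLREQ_ENDPOINT = "http://urlreq.appspot.com/req"


def urlreq_url(method, target_url, delay=None, extra=None):
    """Build a urlreq request URL, optionally with a delay in seconds."""
    params = {"method": method, "url": target_url}
    params.update(extra or {})
    if delay is not None:
        if not 0 <= delay <= 30:
            raise ValueError("delay must be between 0 and 30 seconds")
        params["delay"] = delay
    # quote (rather than quote_plus) encodes spaces as %20, as in the examples.
    return URLREQ_ENDPOINT + "?" + urlencode(params, quote_via=quote)


url = urlreq_url(
    "POST",
    "https://api.github.com/markdown/raw",
    delay=5,
    extra={"body": "**Hello** _World_", "Content-Type": "text/plain"},
)
print(url)
print(parse_qs(urlsplit(url).query)["delay"])
```

Rejecting out-of-range values locally is a design choice of this sketch; how the service itself treats such values is not covered here.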

@bcomnes

bcomnes commented Jul 11, 2013

Wow fantastic! I'll play around with it this evening :) I really appreciate it.

@bcomnes

bcomnes commented Jul 20, 2013

@izuzak, I finally got around to implementing this on my site and it works perfectly! Thank you so much for the workaround.

@izuzak
Owner Author

izuzak commented Jul 20, 2013

@bcomnes Rad, glad it's working out for you 🎆

@bcomnes

bcomnes commented Aug 8, 2013

@izuzak Now that you work at github, the first thing we are going to see is a post gh-page build webhook, right? ;)

Congratulations on the new job!

@izuzak
Owner Author

izuzak commented Aug 9, 2013

@bcomnes Thanks man! I'll see what I can do 😉 🍸

@sergeylukin

Congratulations on the new job! 👍

@izuzak
Owner Author

izuzak commented Aug 11, 2013

@sergeylukin Thanks! 🍻 ⚡

@bcomnes

bcomnes commented Mar 16, 2014

Thanks to the new webhook events, it's looking like GitHub Pages PuSH is now a reality! Super cool! No more funky hacks. Cheers to @izuzak (or whoever!) if you had something to do with this :) I wrote an updated set of instructions here: http://bret.io/2014/03/16/github-pages-pubsubhubbub-support-levels-up/
