(post comments) Enabling PubSubHubBub for GitHub hosted blogs #11
As far as I can see, GitHub now sends custom URL query params in Web hooks as
Hi @sergeylukin, I'm glad you find this idea interesting! I just tested it using http://requestb.in and -- you're absolutely right! GitHub no longer deletes user-defined query params from post-commit Web Hooks. However, as you suspected, this change doesn't help much with using these post-commit Web Hooks to ping a PSHB hub. The problem is that the PSHB hub expects the ping parameters in the BODY of the ping HTTP request, not in the URL query part (see: http://pubsubhubbub.googlecode.com/svn/trunk/pubsubhubbub-core-0.3.html#anchor9). And since there is no way to force GitHub to use a specific message body when making a request to a Web Hook, we still have to use a proxy that translates GitHub's request into a correctly formatted PSHB ping request, as I described in my blog post. This proxy is still what I use for pinging a PSHB hub (pubsubhubbub.appspot.com) for my blog: when you (or GitHub) make a request to http://urlreq.appspot.com/pshbpinggae/http://ivanzuzak.info/atom.xml, the proxy makes a PSHB PING request (an HTTP POST request to a PSHB hub) with the correct parameters in the POST request body. I haven't used http://pingomatic.com/ so far, so I can't give you a useful answer, but I will check it out. What I can do is recommend that you have a look at the documentation for my proxy service -- https://github.com/izuzak/urlreq (see the API section in the README). This proxy service is very generic: you can use it to translate any HTTP request R1 into another HTTP request R2, using parameters defined in the URL query part of R1. Thanks again for commenting, and let me know if I missed something or if you need help!
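As an illustration of the translation such a proxy performs, here is a minimal Python sketch. It assumes the PSHB 0.3 publisher ping format: a form-encoded POST whose body carries `hub.mode=publish` and `hub.url`. The function names are hypothetical, and this is not urlreq's actual code.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_publish_ping(hub_url: str, feed_url: str) -> Request:
    """Build a PSHB publisher ping: a form-encoded POST whose BODY names the updated feed."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def send_publish_ping(hub_url: str, feed_url: str) -> int:
    """Send the ping and return the HTTP status code the hub answered with."""
    with urlopen(build_publish_ping(hub_url, feed_url)) as response:
        return response.status
```

Calling `send_publish_ping("http://pubsubhubbub.appspot.com/", "http://ivanzuzak.info/atom.xml")` makes a live request, so only use it against a hub you actually intend to ping.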
Hi @izuzak, thanks for the fast and detailed reply! I've read the specification you linked to, and you're right: it says the parameters should be passed in the body. I sent the following request:
and received:
It worked the same way for Ping-O-Matic. Request:
Response:
Ivan, what do you think? It looks like it is working, but I'm probably missing something. By the way, I get a 503 error when trying to ping http://urlreq.appspot.com/pshbpinggae/http://ivanzuzak.info/atom.xml
@sergeylukin -- whoa, that's awesome! It never occurred to me to actually test whether pubsubhubbub.appspot.com strictly follows the specification. I'm glad you did! :) Anyway, you're not missing anything. I guess the reason this works is that most AppEngine apps use the webapp2 Python framework, which automatically parses both URL query parameters and the parameters in the body of a POST request (if the request has the application/x-www-form-urlencoded content type) and combines both sets in the same dictionary object. See the docs on the webapp2 framework that explain this. So it looks like the proxy is not actually needed, and you can just put everything into the query parameters of the GitHub Web Hook, as you originally suggested. Also, thanks for letting me know that the proxy is down with an "Out of quota" error -- a spammer has been draining the service for no reason. I'll blacklist him today, and the service should be fully operational tomorrow. Again, very nice work on your part, and thanks for researching this! I'll update the blog post to include this info.
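To show why a query-only ping reaches the hub intact, here is a small Python sketch that mimics the merging behavior described above. WebOb's `request.params`, which webapp2 builds on, combines query-string and form-body parameters. The sketch uses only the standard library and is not webapp2's code; `combined_params` is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlsplit

FORM_TYPE = "application/x-www-form-urlencoded"


def combined_params(url: str, body: str = "", content_type: str = "") -> dict[str, list[str]]:
    """Merge URL query parameters with form-encoded body parameters into one mapping."""
    params: dict[str, list[str]] = {}
    for key, value in parse_qsl(urlsplit(url).query):
        params.setdefault(key, []).append(value)
    # Body parameters only count when the request declares a form-encoded body.
    if content_type.split(";")[0].strip().lower() == FORM_TYPE:
        for key, value in parse_qsl(body):
            params.setdefault(key, []).append(value)
    return params
```

A ping with the parameters in the query string and one with the same parameters in a form-encoded body produce identical `hub.mode` and `hub.url` values, so a hub reading from the merged mapping cannot tell them apart.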
Yes, that's great news! And it's actually another reason to use GitHub Pages for static websites :)
Great news, indeed! And I strongly agree with you -- it would be better if one could specify the content of the POST request body, e.g. using some templating language and variables. Perhaps one day GitHubbers will come to the same conclusion :). I'm really glad you discovered this -- keep up the great work!
Is there a way to tell whether this is working? I set things up over here: https://github.com/bcomnes/bcomnes.github.io/blob/master/atom.xml My feed does not seem to be updating in real time over at NewsBlur or Google Reader (or what's left of it).
Hey @bcomnes! Glad you're trying this out. One question first -- which approach are you using for pinging PSHB? The one I described in the blog post (a webhook using the urlreq service as a proxy) or the one described in the comments here (a webhook communicating directly with the PSHB hub)? Your atom.xml file looks alright to me, and I haven't updated my blog in a while, so I don't even know if the webhook setup still works for my blog. But anyway, the way I'd test whether everything is working as intended is:
To do this, create a request bin at http://requestb.in/. A request bin is basically a service that stores every request it receives so that you can inspect them manually. So, create a bin, and then create a GitHub service hook with the bin's URL as the hook URL (bin URLs look like this: http://requestb.in/1f0gsjk1). After you set that up, make an update to your feed (best if you create a new blog post) and visit the bin's inspection page to see whether the webhook fired and made a request to the bin (the inspection page URL is the bin URL with ?inspect appended). If you see a request there, the webhook is firing. If not, something is broken in GitHub's infrastructure for firing webhooks on repository updates.
After you've made sure that webhooks are working in general, create another request bin. You will now use this bin to test whether the PSHB hub is fetching feeds correctly. To do this, create another GitHub service hook, and specify this as the hook URL: http://pubsubhubbub.appspot.com/publish?hub.mode=publish&hub.url=URL_OF_THE_BIN_YOU_CREATED. Now update your blog (create a new blog post), and the GitHub hook should trigger the PSHB hub to make a request to the bin you created (because the hub will think it is fetching a feed that was just updated). So, inspect your bin: if no request was made, there is a problem with the PSHB hub and you should contact the hub's maintainers. If both checks pass, the problem is either in Google Reader (e.g. it is no longer updating in real time) or in the PSHB hub (it is no longer correctly forwarding blog updates to subscribers). Figuring out which of these two it is is a bit more complicated than the first two checks. I wouldn't put too much faith in Google Reader, since it will shut down anyway. You can do all of these things in another repo so that you don't make a mess of your blog feed (and irritate your followers). Hope this helps. Let me know what you find out or if I can help out somehow!
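The hook URL for the second check can be assembled with a few lines of Python. This sketch percent-encodes the `hub.url` value, which the URL above does not do. Encoding keeps any `&` or `?` inside the target URL from being read as extra hub parameters, on the assumption that the hub decodes query values in the usual way. `publish_hook_url` is a hypothetical helper, and the bin URL is the example one from above.

```python
from urllib.parse import urlencode

HUB_PUBLISH_ENDPOINT = "http://pubsubhubbub.appspot.com/publish"


def publish_hook_url(target_url: str, hub_endpoint: str = HUB_PUBLISH_ENDPOINT) -> str:
    """Return a Web Hook URL that asks the hub to (re)fetch target_url."""
    query = urlencode({"hub.mode": "publish", "hub.url": target_url})
    return f"{hub_endpoint}?{query}"


# Check 2: point the hub at a request bin to see whether it fetches anything.
print(publish_hook_url("http://requestb.in/1f0gsjk1"))
```

Pass your real feed URL instead of the bin URL to get the hook for normal use.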
http://requestb.in/ is an awesome tool! Thanks for pointing that out, and thanks for the detailed tips! Awesome. I have the following GH hooks enabled: And they seem to be working. Request bin acknowledges these requests:
Also, adding the bin URL as a hook sent out the commit information, which the request bin collected perfectly. I think I am going to have to assume the issue is with the RSS readers. I'll get in contact with NewsBlur, one of the few PuSH-enabled RSS readers I could find, to see if I can get any pointers as to why that service isn't getting PuSH updates for my feed. It also occurred to me that, since I am not pre-processing my site, there is a delay between the commit and the actual site update. The delay is only a matter of a few seconds, but could it create some kind of race condition between the hook and the updated atom feed becoming available? For example, when I use https://pubsubhubbub.appspot.com/publish to check the feed, it never seems to reflect the most recent post! Hrrmm. Maybe this does not matter either, I'm not sure. It looks like your PuSH feed is one story behind your latest blog post, maybe indicating that the hub fetched the atom file before the new one was built? I might just be reading it wrong or misunderstanding how this works.
That's an excellent point, Bret! I never thought about the possibility of a race condition, and now that you've mentioned it, I do think it is possible. So it may be that everything is actually working, but the PSHB hub is fetching your atom.xml file before Jekyll has updated it with the new post (if Jekyll processing is slow for some reason). I've never had this problem with my feed, though. The only way to make sure is to create a small PSHB subscriber application that subscribes to your feed and prints the updates received from PSHB to the console. This is not a complicated thing to do; you should be able to do it in your favorite programming language relatively easily. Anyway, if this demo subscriber app does not receive any updates after you create a new post, or if it receives an update without the latest post, you'll know what's going on. And if it receives the latest post as expected, then it's a problem with Google Reader and NewsBlur. Yeah, Request bin is brilliant :). I use it and hurl.it very often when debugging.
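Here is a minimal sketch of such a demo subscriber in Python, using only the standard library. It assumes the PSHB 0.3 flow. First, the subscriber POSTs a subscription request to the hub, asking for async verification. Next, the hub verifies by sending a GET to the callback, which must echo `hub.challenge`. After that, the hub POSTs new feed content to the callback. The topic URL, callback URL, port and all function names are hypothetical, and the callback has to be publicly reachable by the hub.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request, urlopen

HUB = "http://pubsubhubbub.appspot.com/"   # the hub the feed advertises
TOPIC = "http://example.com/atom.xml"      # hypothetical: your feed URL
CALLBACK = "http://example.net:8080/push"  # hypothetical: public URL of this script


def subscribe_request(hub: str, topic: str, callback: str) -> Request:
    """Form-encoded subscription request (PSHB 0.3 requires hub.verify)."""
    body = urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,
        "hub.callback": callback,
        "hub.verify": "async",  # hub answers first and verifies later, so we don't block
    }).encode("ascii")
    return Request(hub, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


def verification_response(query: str, expected_topic: str) -> tuple[int, str]:
    """Echo hub.challenge only for a (un)subscribe check on the topic we asked for."""
    params = parse_qs(query)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if mode in ("subscribe", "unsubscribe") and topic == expected_topic and challenge:
        return 200, challenge
    return 404, ""


class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Intent verification from the hub.
        status, body = verification_response(urlsplit(self.path).query, TOPIC)
        self.send_response(status)
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

    def do_POST(self) -> None:
        # Content distribution: print the pushed feed to see which posts it contains.
        length = int(self.headers.get("Content-Length", 0))
        print(self.rfile.read(length).decode("utf-8", "replace"))
        self.send_response(204)
        self.end_headers()


def run(port: int = 8080) -> None:
    """Listen on the port, ask the hub to subscribe us, then serve until interrupted."""
    server = HTTPServer(("", port), CallbackHandler)
    urlopen(subscribe_request(HUB, TOPIC, CALLBACK)).close()
    server.serve_forever()
```

After calling `run()` and publishing a new post, you can inspect the printed feed. If the notification arrives but lacks the newest post, the hub most likely fetched the feed before the site build finished.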
A race condition is indeed possible; I didn't even think about it. It all depends on how GH implemented webhook calls and Jekyll builds.
A GitHubber in IRC also sees the possibility that, since the GH Pages site builds in the background after the commit is made, post-commit services could end up in a race condition with the page build. If only there were a way to delay the process just long enough that the page build always wins. Maybe the PuSH proxy method you were using before prevented this kind of issue? Who knows. http://push-bot.appspot.com was recommended to me as a tool for debugging a new feed, and it works quite well, since it gives you fairly reliable feedback through the whole process of subscribing and receiving PuSH notifications. Basically, you add an XMPP bot to a chat program and tell it to push your feed updates to you via IM. I'll think about some possible solutions to this situation and run more tests.
Yeah, it's possible that the proxy service introduced just the right amount of delay so that everything worked out OK. That's distributed systems for ya :) Let me know what you come up with. It seems that making a similar proxy service with a hard-coded delay (e.g. 10-20 secs) might be an easy way out. I was also hoping that GitHub's API would have an event signaling the completion of Pages rendering, but one doesn't seem to exist: http://developer.github.com/v3/activity/events/
So indeed, after talking with more folks and playing around, I can confirm this race condition is real. The webhook and the site build are triggered at the same time. I have not done a huge amount of probing, but in some of my tests, with the site building on the GH servers, it took two commits (and thus two webhooks) for PuSH to notice the new content. I emailed GH support about this, and here was their response:
I was also in touch with someone from Superfeedr, another company that runs a popular PuSH hub. I was told that it would be difficult to implement some kind of delay parameter on the hub side due to scaling complications, but that they would look into it some more. So here we are. GH is opaque about this (hopefully someone internally cares about this stuff), and this may or may not ever be addressed, and other hubs are reluctant to accommodate this kind of need due to potential scaling difficulties (although, understandably, adding delay parameters is ultimately a hack). I think playing around with your urlreq app is the best solution at the moment, until GitHub gets its act together when it comes to its API interacting with its Pages system. Would it be difficult to add some kind of delay parameter to urlreq? I am forking your project to try out some ideas I have, but would you be interested in contributions back to the project along those lines?
Wow, I really appreciate the effort you've put into digging through this issue, @bcomnes! I believe GitHub will eventually add a webhook for gh-pages builds, but it's hard to tell when that will happen, since it's probably not an often-requested feature and they have other priorities. As for hub providers adding a delay, I doubt it will ever happen. I have no problem with adding a delay to urlreq and think it would be the easiest way to resolve this for now. I'd definitely welcome contributions, if you have the time to hack on it. 👍 If not, let me know and I'll add it next week (Monday/Tuesday-ish). Cheers! 🎆
I would like to hack on it, as learning about Google App Engine could prove very useful for some ideas I want to experiment with, but I am a total noob at Python and the App Engine platform itself, so it would probably take me a while to figure something like that out. If you are willing, adding a delay parameter would be awesome, and it would likely get done better and faster since you already know your way around.
Sure, no problem -- I'll try to add it later this week. One question, though -- what's the expected span of the delays? If the delays are under a minute, the solution is simpler than if they are longer than a minute. The reason is that AppEngine normally has a 60-second limit on the time allowed to process a request. If longer delays are needed, I'll have to try out some other approaches from the AppEngine platform. I totally love AppEngine -- it has a nice free quota and good platform APIs. ✨
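Here is a Python sketch of the delayed-ping idea. It assumes the delay is clamped below AppEngine's 60-second request limit mentioned above. The names `delayed_publish_ping` and `MAX_DELAY_SECONDS` are hypothetical, and this is not how urlreq implements its delay. `send` and `sleep` are injectable, so the logic can be tested without network access or real waiting.

```python
import time
from typing import Callable, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical cap: well under AppEngine's 60 s limit, leaving time for the ping itself.
MAX_DELAY_SECONDS = 50.0


def delayed_publish_ping(
    hub_url: str,
    feed_url: str,
    delay_seconds: float,
    send: Optional[Callable[[Request], object]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Wait (giving the GitHub Pages build time to finish), then send a PSHB publish ping.

    Returns the delay actually used after clamping to [0, MAX_DELAY_SECONDS].
    """
    delay = max(0.0, min(float(delay_seconds), MAX_DELAY_SECONDS))
    sleep(delay)
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    request = Request(hub_url, data=body, method="POST",
                      headers={"Content-Type": "application/x-www-form-urlencoded"})
    (send or urlopen)(request)  # defaults to a real HTTP request
    return delay
```

Clamping the delay rather than rejecting large values keeps the request inside the platform limit. Delays longer than a minute would need a different mechanism, such as a deferred background task.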
In my experience, it's a matter of seconds. But I don't have that many -Bret Comnes On Jul 10, 2013, at 6:14 AM, "Ivan Žužak" notifications@github.com wrote:
I've added support for an optional Example request without delay: Example request with 5 second delay: The
Wow, fantastic! I'll play around with it this evening :) I really appreciate it.
@izuzak, I finally got around to implementing this on my site and it works perfectly! Thank you so much for the workaround. |
@bcomnes Rad, glad it's working out for you 🎆
@izuzak Now that you work at github, the first thing we are going to see is a post gh-page build webhook, right? ;) Congratulations on the new job!
@bcomnes Thanks man! I'll see what I can do 😉 🍸
Congratulations on the new job! 👍
@sergeylukin Thanks! 🍻 ⚡
Thanks to the new webhook events, it's looking like GitHub Pages PuSH is now a reality! Super cool! No more funky hacks. Cheers to @izuzak (or whoever!) if you had something to do with this :) I wrote an updated set of instructions here: http://bret.io/2014/03/16/github-pages-pubsubhubbub-support-levels-up/
This issue is reserved for comments on the blog post Enabling PubSubHubBub for GitHub hosted blogs. Leave a comment below and it will show up on the blog post's Web page. Thanks!