Initial import

commit dba2cd77e5c7478d3714caeb777fcfba626ac52a 0 parents
@hmarr authored
3  .gitignore
@@ -0,0 +1,3 @@
+*.pyc
+.*.swp
+results.htm
1,483 example/fixtures/df.xml
1,483 additions, 0 deletions not shown
318 example/fixtures/github.xml
@@ -0,0 +1,318 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-US">
+ <id>tag:github.com,2008:/blog</id>
+ <link type="text/html" href="http://github.com/blog" rel="alternate" />
+
+ <title>The GitHub Blog</title>
+ <updated>2010-01-22T22:51:13-08:00</updated>
+ <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/github" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="github" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><entry>
+ <id>tag:github.com,2008:Post/592</id>
+ <published>2010-01-22T22:46:12-08:00</published>
+ <updated>2010-01-22T22:51:13-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/592-broadcasts" rel="alternate" />
+ <title>Broadcasts</title>
+ <content type="html">&lt;!-- -*-Markdown-*- --&gt;
+
+
+&lt;p&gt;Over the past couple of days, you may have noticed a new piece of UI, GitHub broadcasts:&lt;/p&gt;
+
+&lt;center&gt;&lt;img src="http://share.kyleneath.com/captures/skitched-20100122-223559.jpg" alt="broadcast image" /&gt;&lt;/center&gt;
+
+
+&lt;p&gt;We'll be using this feature to announce significant new feature additions and changes to GitHub. Once you've read the broadcast, click hide broadcast and it won't show up until we post a new broadcast. If you missed broadcasts in between logging in, we'll let you know how many you missed (&lt;em&gt;view (2) new broadcasts&lt;/em&gt;) which you can always see at this link: &lt;a href="http://github.com/blog/broadcasts"&gt;http://github.com/blog/broadcasts&lt;/a&gt;&lt;/p&gt;
+
+&lt;p&gt;In case you've been living under a rock, here's some of the new stuff we've launched at GitHub over the past couple of months:&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;&lt;a href="http://github.com/blog/591-explore-github"&gt;Explore GitHub&lt;/a&gt;&lt;/li&gt;
+&lt;li&gt;&lt;a href="http://github.com/blog/584-notification-improvements"&gt;Notification Improvements&lt;/a&gt;&lt;/li&gt;
+&lt;li&gt;&lt;a href="http://github.com/blog/577-improved-commit-diffs"&gt;Improved Commit Diffs&lt;/a&gt;&lt;/li&gt;
+&lt;li&gt;&lt;a href="http://github.com/blog/571-new-repository-headers"&gt;New Repository Headers&lt;/a&gt;&lt;/li&gt;
+&lt;li&gt;&lt;a href="http://github.com/blog/559-merge-commits-are-back-and-better-than-ever"&gt;Merge commits are back (and better than ever)&lt;/a&gt;&lt;/li&gt;
+&lt;/ul&gt;
+
+
+&lt;p&gt;Fun fact: we've launched over 1,200 commits and 130,000 line changes since December. (and that's only to the main GitHub application!)&lt;/p&gt;
+</content>
+ <author>
+ <name>kneath</name>
+ </author>
+ </entry>
+ <entry>
+ <id>tag:github.com,2008:Post/591</id>
+ <published>2010-01-21T19:19:19-08:00</published>
+ <updated>2010-01-21T19:26:41-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/591-explore-github" rel="alternate" />
+ <title>Explore GitHub</title>
+ <content type="html">&lt;p&gt;This week &lt;a href="http://github.com/kneath"&gt;kneath&lt;/a&gt; and I (with some help from &lt;a href="http://thechangelog.com"&gt;The Changelog&lt;/a&gt;) rolled out &lt;a href="http://github.com/explore"&gt;Explore GitHub&lt;/a&gt; &amp;#8211; a new page showing trending repositories, repositories recently featured on The Changelog, and recent episodes of their weekly podcast.&lt;/p&gt;
+&lt;div align="center"&gt;&lt;a href="http://github.com/explore"&gt;&lt;img src="http://share.kyleneath.com/captures/Dock-20100121-192200.jpg"/&gt;&lt;/a&gt;&lt;/div&gt;
+&lt;p&gt;The trending repos are updated every 20 minutes so have fun watching projects climb the charts as they&amp;#8217;re blogged and tweeter about throughout the day.&lt;/p&gt;</content>
+ <author>
+ <name>defunkt</name>
+ </author>
+ </entry>
+ <entry>
+ <id>tag:github.com,2008:Post/590</id>
+ <published>2010-01-21T13:54:37-08:00</published>
+ <updated>2010-01-21T13:54:47-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/590-paris-git-training" rel="alternate" />
+ <title>Paris Git Training</title>
+ <content type="html">&lt;p&gt;In Paris? Our very own &lt;a href="http://github.com/schacon"&gt;Scott Chacon&lt;/a&gt;, author of &lt;a href="http://progit.org/book/"&gt;Pro Git&lt;/a&gt; and internationally renowned Git expert, will be teaching a &lt;a href="http://trainings.sensiolabs.com/en/training/git"&gt;Git class&lt;/a&gt; on February 18th, 2010.&lt;/p&gt;
+&lt;div align="center"&gt;&lt;a href="http://trainings.sensiolabs.com/en/training/git"&gt;&lt;img src="http://img.skitch.com/20100121-gamhsjqj25xjwqxx9heneiatu6.jpg"/&gt;&lt;/a&gt;&lt;/div&gt;
+&lt;p&gt;Check out &lt;a href="http://trainings.sensiolabs.com/en/training/git"&gt;the website&lt;/a&gt; for more information. We&amp;#8217;ll see you there!&lt;/p&gt;</content>
+ <author>
+ <name>defunkt</name>
+ </author>
+ </entry>
+ <entry>
+ <id>tag:github.com,2008:Post/589</id>
+ <published>2010-01-21T16:47:24-08:00</published>
+ <updated>2010-01-21T16:47:24-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/589-new-year-new-company" rel="alternate" />
+ <title>New Year, New Company</title>
+ <content type="html">&lt;p&gt;As of January 1, 2010 we&amp;#8217;re operating this site as GitHub, Inc.&lt;/p&gt;
+&lt;p&gt;What does that mean for you? In a word: nothing.&lt;/p&gt;
+&lt;p&gt;We originally incorporated Logical Awesome &lt;span class="caps"&gt;LLC&lt;/span&gt; prior to launching GitHub thinking that it would be just one of a number of brands we were going to release. GitHub took off in a serious way, so instead of continuing to confuse people with an obscure company name, we made the decision to change it to the obvious choice when we converted from a &lt;span class="caps"&gt;LLC&lt;/span&gt; to a C-Corp.&lt;/p&gt;
+&lt;p&gt;Here&amp;#8217;s to another awesome year with your favorite code host!&lt;/p&gt;</content>
+ <author>
+ <name>pjhyett</name>
+ </author>
+ </entry>
+ <entry>
+ <id>tag:github.com,2008:Post/588</id>
+ <published>2010-01-20T11:59:35-08:00</published>
+ <updated>2010-01-20T12:01:29-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/588-closing-issues-with-phpunit" rel="alternate" />
+ <title>Closing Issues with PHPUnit</title>
+ <content type="html">&lt;p&gt;&lt;a href="http://github.com/raphaelstolt"&gt;raphaelstolt&lt;/a&gt; shows how to close GitHub Issues using &lt;a href="http://github.com/sebastianbergmann/phpunit"&gt;PHPUnit&lt;/a&gt; in his blog post, &lt;a href="http://raphaelstolt.blogspot.com/2010/01/closing-and-reopening-github-issues-via.html"&gt;Closing and reopening GitHub issues via PHPUnit tests&lt;/a&gt;&lt;/p&gt;
+&lt;p&gt;The secret? Implementing GitHub_TicketListener.&lt;/p&gt;
+&lt;div align="center"&gt;&lt;a href="http://raphaelstolt.blogspot.com/2010/01/closing-and-reopening-github-issues-via.html"&gt;&lt;img src="http://img.skitch.com/20100120-er3nuye656kk77pd83wibtudw4.png"/&gt;&lt;/a&gt;&lt;/div&gt;
+&lt;p&gt;Check the &lt;a href="http://raphaelstolt.blogspot.com/2010/01/closing-and-reopening-github-issues-via.html"&gt;post&lt;/a&gt; for the full scoop!&lt;/p&gt;
+&lt;p&gt;(Hat tip: &lt;a href="http://www.phpdeveloper.org/news/13876"&gt;PHPDeveloper&lt;/a&gt;)&lt;/p&gt;</content>
+ <author>
+ <name>defunkt</name>
+ </author>
+ </entry>
+ <entry>
+ <id>tag:github.com,2008:Post/587</id>
+ <published>2010-01-19T14:20:18-08:00</published>
+ <updated>2010-01-19T14:24:30-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/587-github-drinkup-wellington-nz-tonight" rel="alternate" />
+ <title>GitHub Drinkup Wellington NZ Tonight!</title>
+ <content type="html">&lt;p&gt;If you happen to find yourself in Wellington, New Zealand (perhaps for LinuxConf AU) then stop by &lt;a href="http://www.themalthouse.co.nz"&gt;The Malt House&lt;/a&gt; tonight (Wednesday the 20th) after 7pm and join Scott Chacon and myself (Tom Preston-Werner) for some drinks. As always, first round&amp;#8217;s on us! Make sure to ask for a GitHub sticker, we have a ton to give away. See you there!&lt;/p&gt;
+&lt;p&gt;The Malt House&lt;br /&gt;
+48 Courtenay Pl&lt;br /&gt;
+Te Aro, Wellington 6011, New Zealand&lt;/p&gt;</content>
+ <author>
+ <name>mojombo</name>
+ </author>
+ </entry>
+ <entry>
+ <id>tag:github.com,2008:Post/586</id>
+ <published>2010-01-18T20:05:33-08:00</published>
+ <updated>2010-01-18T20:10:38-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/586-github-rebase-34" rel="alternate" />
+ <title>GitHub Rebase #34</title>
+ <content type="html">&lt;p&gt;It&amp;#8217;s time for Rebase &lt;a href="http://xkcd.com/305/"&gt;#34&lt;/a&gt;! As always, &lt;a href="http://rebase.github.com/howto.html"&gt;suggestions are welcome&lt;/a&gt; if you have a neat project you&amp;#8217;d like to show off.&lt;/p&gt;
+&lt;p style="text-align:center;"&gt;&lt;a href="http://www.raucousrecords.com/rockabilly-cds_21/git-it-cd_3681.aspx"&gt;&lt;img src="http://cloud.github.com/downloads/rebase/rebase.github.com/JaguarsGit.jpg" alt="" /&gt;&lt;/a&gt;&lt;/p&gt;
+&lt;h3&gt;Featured Project&lt;/h3&gt;
+&lt;p&gt;&lt;strong&gt;&lt;a href="http://github.com/jquery/jquery"&gt;jquery&lt;/a&gt;&lt;/strong&gt; is a cross-browser JavaScript library that makes &lt;span class="caps"&gt;DOM&lt;/span&gt; manipulation, simple animation effects, and &lt;span class="caps"&gt;AJAX&lt;/span&gt; simple. It&amp;#8217;s used all over the web, such as on &lt;a href="http://wikipedia.org"&gt;Wikipedia&lt;/a&gt;, &lt;a href="http://whitehouse.gov"&gt;WhiteHouse.gov&lt;/a&gt;, &lt;s&gt;&lt;a href="http://www.tonightshowwithconanobrien.com/"&gt;The Tonight Show with Conan O&amp;#8217;Brien&lt;/a&gt;&lt;/s&gt; and of course, GitHub. Just this past week, version 1.4 &lt;a href="http://jquery14.com/day-01/jquery-14"&gt;was released&lt;/a&gt; and comes packed with too many performance enhancements to list here and new features to boot. Check out the &lt;a href="http://api.jquery.com/category/version/1.4/"&gt;docs&lt;/a&gt; for everything that&amp;#8217;s been added or changed for this release, and the new dynamic &lt;a href="http://api.jquery.com/browser/"&gt;&lt;span class="caps"&gt;API&lt;/span&gt; Browser&lt;/a&gt; too. There&amp;#8217;s also tutorials and videos about the new version and getting involved with the jQuery community at the &lt;a href="http://jquery14.com/"&gt;14 Days of jQuery&lt;/a&gt; celebration site, which is now almost half over! Now that the project is on GitHub, the best part is that you can &lt;a href="http://github.com/jquery/jquery"&gt;fork it&lt;/a&gt; to get started adding your own features or fixes. &lt;code&gt;jQuery.contains(document, "awesome")&lt;/code&gt;&lt;/p&gt;
+&lt;h3&gt;Notably New Projects&lt;/h3&gt;
+&lt;p&gt;&lt;strong&gt;&lt;a href="http://github.com/thumblemonks/jubilator"&gt;jubilator&lt;/a&gt;&lt;/strong&gt; is a new slick way to browse repos that uses the &lt;a href="http://develop.github.com"&gt;GitHub &lt;span class="caps"&gt;API&lt;/span&gt;&lt;/a&gt;. It&amp;#8217;s chock full of some &lt;a href="http://rebase.github.com"&gt;previously-rebased&lt;/a&gt; projects such as &lt;a href="http://github.com/quirkey/sammy"&gt;Sammy&lt;/a&gt; and &lt;a href="http://github.com/janl/mustache.js"&gt;Mustache.js&lt;/a&gt;. The end result is a &lt;span class="caps"&gt;AJAX&lt;/span&gt; powered repo interface with &lt;a href="http://code.google.com/p/google-code-prettify/"&gt;syntax highlighting&lt;/a&gt; that&amp;#8217;s a lot easier to view files if you don&amp;#8217;t want to clone the project. &lt;a href="http://jubilator.thumblemonks.com/"&gt;Try it out&lt;/a&gt; on your repo!&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;&lt;a href="http://github.com/bhattisatish/bird-show"&gt;bird-show&lt;/a&gt;&lt;/strong&gt; is a &lt;a href="http://liftweb.net/"&gt;Lift&lt;/a&gt; web app that interacts with the &lt;a href="http://www.flickr.com/services/api/"&gt;Flickr &lt;span class="caps"&gt;API&lt;/span&gt;&lt;/a&gt; to show pictures. Simple, yes, but this is a great starting point for newcomers to Lift (and &lt;a href="http://www.scala-lang.org/"&gt;Scala&lt;/a&gt;!) and comes complete with a &lt;a href="http://www.slideshare.net/dcbriccetti/birdshow-a-lift-app-for-showing-flickr-photos-2720594"&gt;presentation&lt;/a&gt; and &lt;a href="http://www.youtube.com/watch?v=kDVaeYWUwfs"&gt;tutorial video&lt;/a&gt; on the app. You can browse some wildlife photos &lt;a href="http://briccettiphoto.com/"&gt;here on the live site&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;&lt;a href="http://github.com/kolber/stacey"&gt;stacey&lt;/a&gt;&lt;/strong&gt; is a lightweight &lt;span class="caps"&gt;PHP&lt;/span&gt; content management system that follows in &lt;a href="http://github.com/mojombo/jekyll"&gt;Jekyll&amp;#8217;s&lt;/a&gt; footsteps and works by generating static files off templates. Drop it onto your server with &lt;span class="caps"&gt;PHP&lt;/span&gt; installed, and you&amp;#8217;re all set to &lt;a href="http://staceyapp.com/documentation/creating-pages/"&gt;start making pages&lt;/a&gt;. It&amp;#8217;s baked in with a &lt;a href="http://staceyapp.com/documentation/editing-templates/"&gt;templating language&lt;/a&gt; that doesn&amp;#8217;t require any &lt;span class="caps"&gt;PHP&lt;/span&gt; knowledge to use efficiently. There&amp;#8217;s already a decent amount of &lt;a href="http://staceyapp.com/installation/"&gt;Stacey&lt;/a&gt; sites online and a &lt;a href="http://getsatisfaction.com/stacey"&gt;growing support forum&lt;/a&gt;. The codebase is quite accessible at around 500 lines, so &lt;a href="http://staceyapp.com"&gt;get hacking!&lt;/a&gt;&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;&lt;a href="http://github.com/hgimenez/vz_analysis"&gt;vz_analysis&lt;/a&gt;&lt;/strong&gt; is a small &lt;a href="http://www.r-project.org/"&gt;R&lt;/a&gt; script that runs through some of the available statistics about &lt;span class="caps"&gt;GDP&lt;/span&gt; and inflation from the author&amp;#8217;s home country of Venezuela. It&amp;#8217;s also a nice little example of using &lt;a href="http://had.co.nz/ggplot2/"&gt;ggplot2&lt;/a&gt;, a great library that can produce nearly any kind of common stats object for your graphs. Check out the &lt;a href="http://github.com/hgimenez/vz_analysis/blob/master/r/econ.r"&gt;source&lt;/a&gt; and maybe see how this can apply to some other &lt;a href="http://www.readwriteweb.com/archives/where_to_find_open_data_on_the.php"&gt;open datasets&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;&lt;a href="http://github.com/cmatei/yalfs"&gt;yalfs&lt;/a&gt;&lt;/strong&gt; is yet another Lisp from scratch. Yes, it wouldn&amp;#8217;t be a Rebase without a fun toy programming project, and this one is special since the author is &lt;a href="http://yalfs.blogspot.com/"&gt;blogging the project&amp;#8217;s goals and progress&lt;/a&gt;. This implementation is done in straight C, and it already has some basic datatypes implemented. So it might be a long way from tail recursion, but this is a neat project to keep an eye on if you&amp;#8217;re a language or functional programming geek.&lt;/p&gt;</content>
+ <author>
+ <name>qrush</name>
+ </author>
+ </entry>
+ <entry>
+ <id>tag:github.com,2008:Post/585</id>
+ <published>2010-01-18T15:29:29-08:00</published>
+ <updated>2010-01-18T15:30:00-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/585-github-for-the-rest-of-us-screencast" rel="alternate" />
+ <title>GitHub for the Rest of Us Screencast</title>
+ <content type="html">&lt;p&gt;Know someone who could use a clear, hands-on introduction to GitHub? &lt;a href="http://github.com/nettuts"&gt;nettuts&lt;/a&gt; has them covered.&lt;/p&gt;
+&lt;div align="center"&gt;&lt;a href="http://net.tutsplus.com/videos/screencasts/terminal-git-and-github-for-the-rest-of-us-screencast"&gt;&lt;img src="http://img.skitch.com/20100118-mbutj6e8irbau8ms7uach741i1.png"/&gt;&lt;/a&gt;&lt;/div&gt;
+&lt;p&gt;Their &lt;a href="http://net.tutsplus.com/videos/screencasts/terminal-git-and-github-for-the-rest-of-us-screencast/"&gt;Terminal, Git, and GitHub for the Rest of Us: Screencast&lt;/a&gt; (which compliments their excellent &lt;a href="http://net.tutsplus.com/tutorials/other/getting-the-hang-of-github/"&gt;Getting the Hang of GitHub&lt;/a&gt; tutorial) is a great introduction to Git and GitHub.&lt;/p&gt;</content>
+ <author>
+ <name>defunkt</name>
+ </author>
+ </entry>
+ <entry>
+ <id>tag:github.com,2008:Post/584</id>
+ <published>2010-01-17T19:48:37-08:00</published>
+ <updated>2010-01-21T17:55:03-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/584-notification-improvements" rel="alternate" />
+ <title>Notification Improvements</title>
+ <content type="html">&lt;!-- -*-Markdown-*- --&gt;
+
+
+&lt;p&gt;&lt;strong&gt;Update 1/19/2010:&lt;/strong&gt; Page build notifications have been added to the list of notifications you can turn on/off and have moved to the Notifications section as well. Thanks!&lt;/p&gt;
+
+&lt;p&gt;Today we rolled out updates to the messaging &amp;amp; notification systems for GitHub. We've added a couple new features and improved the existing messaging system.&lt;/p&gt;
+
+&lt;h3&gt;Notification Center&lt;/h3&gt;
+
+&lt;p&gt;If you check the &lt;a href="/account"&gt;account settings&lt;/a&gt; page, you'll notice a new tab called &lt;a href="/account/notifications"&gt;Notification Center&lt;/a&gt; that holds preferences for all the emails you get from GitHub.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src="http://share.kyleneath.com/captures/skitched-20100117-193031.gif" alt="Notification Center" /&gt;&lt;/p&gt;
+
+&lt;h3&gt;Commit Comment Notifications&lt;/h3&gt;
+
+&lt;p&gt;We've enabled two new email notifications by default: comments on your commits (where you are listed as an author or committer) and comments on commits in your repositories. We understand some commits can get noisy, so we added the ability to turn notifications off on a per-commit basis.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src="http://share.kyleneath.com/captures/skitched-20100117-221429.gif" alt="Commit Comments" /&gt;&lt;/p&gt;
+
+&lt;h3&gt;Improved email subjects&lt;/h3&gt;
+
+&lt;p&gt;All of our email notifications were previously "[GitHub] user sent you a message." They should be considerably more descriptive and thread friendly now.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src="http://share.kyleneath.com/captures/default.data-20100117-194222.gif" alt="Email Subjects" /&gt;&lt;/p&gt;
+
+&lt;h3&gt;Notifications vs Messages&lt;/h3&gt;
+
+&lt;p&gt;We've separated out pull requests and issue comments into a new category called notifications. Your inbox should now be filled solely with actual messages. Notifications will show up next to your username (in grey), while the inbox count only refers to actual messages.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src="http://share.kyleneath.com/captures/Commit_cbd162f5526b2bf0e32340964e612acc9f4d29ca_to_kneath_s_watchtower_-_GitHub-20100117-194430.gif" alt="Sample Userbox" /&gt;&lt;/p&gt;
+
+&lt;p&gt;Another small update is that we've added a &lt;em&gt;mark all as read&lt;/em&gt; button to the inbox and notification sections.&lt;/p&gt;
+
+&lt;p&gt;We know there's always room for improvement with our messaging and notifications — but hopefully this is a step in the right direction. Hope you enjoy!&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;P.S.&lt;/strong&gt;: I'd like to take a moment to thank the fine folks who made &lt;a href="http://mocksmtpapp.com"&gt;MockSMTP&lt;/a&gt;. It made testing these new email notifications &lt;em&gt;so much easier.&lt;/em&gt; If you're looking for testing app-generated emails locally, check it out.&lt;/p&gt;
+</content>
+ <author>
+ <name>kneath</name>
+ </author>
+ </entry>
+ <entry>
+ <id>tag:github.com,2008:Post/583</id>
+ <published>2010-01-15T15:38:40-08:00</published>
+ <updated>2010-01-15T15:38:48-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/583-tinymce-on-github" rel="alternate" />
+ <title>TinyMCE on GitHub</title>
+ <content type="html">&lt;p&gt;&lt;a href="http://tinymce.moxiecode.com/"&gt;TinyMCE&lt;/a&gt;, an &lt;span class="caps"&gt;HTML&lt;/span&gt; &lt;span class="caps"&gt;WYSIWYG&lt;/span&gt; editor written in JavaScript, has moved to GitHub: &lt;a href="http://github.com/tinymce"&gt;http://github.com/tinymce&lt;/a&gt;&lt;/p&gt;
+&lt;div align="center"&gt;&lt;a href="http://github.com/tinymce"&gt;&lt;img src="http://img.skitch.com/20100115-m19rkf3wb3panm2fhnw5gb6h9u.png"/&gt;&lt;/a&gt;&lt;/div&gt;
+&lt;p&gt;Their &lt;a href="http://blog.moxiecode.com/2010/01/14/tinymce-moved-to-github/"&gt;announcement&lt;/a&gt; lists the main reasons they chose GitHub:&lt;/p&gt;
+&lt;ol&gt;
+ &lt;li&gt;Community&lt;/li&gt;
+ &lt;li&gt;Speed&lt;/li&gt;
+ &lt;li&gt;Ease of access&lt;/li&gt;
+ &lt;li&gt;Flexibility&lt;/li&gt;
+&lt;/ol&gt;
+&lt;p&gt;Welcome to the party!&lt;/p&gt;</content>
+ <author>
+ <name>defunkt</name>
+ </author>
+ </entry>
+ <entry>
+ <id>tag:github.com,2008:Post/581</id>
+ <published>2010-01-14T11:19:30-08:00</published>
+ <updated>2010-01-14T11:27:29-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/581-pages-jekyll-to-v0-5-7-rewrites" rel="alternate" />
+ <title>Pages: Jekyll to v0.5.7 + Rewrites</title>
+ <content type="html">&lt;p&gt;I have a few quick &lt;a href="http://pages.github.com/"&gt;Pages&lt;/a&gt; updates worth noting. &lt;a href="http://wiki.github.com/mojombo/jekyll"&gt;Jekyll&lt;/a&gt; has been upgraded from &lt;strong&gt;v0.5.4&lt;/strong&gt; to &lt;strong&gt;v0.5.7&lt;/strong&gt;. You can find a list of changes in the &lt;a href="http://github.com/mojombo/jekyll/blob/v0.5.7/History.txt"&gt;history file&lt;/a&gt;. And, by popular demand, requests for &lt;code&gt;you.github.com/some/file&lt;/code&gt; are now rewritten to &lt;code&gt;you.github.com/some/file.html&lt;/code&gt; (e.g., &lt;a href="http://schacon.github.com/2008/10/02/a-githubber-now"&gt;this&lt;/a&gt; and &lt;a href="http://schacon.github.com/2008/10/02/a-githubber-now.html"&gt;that&lt;/a&gt;).&lt;/p&gt;</content>
+ <author>
+ <name>rtomayko</name>
+ </author>
+ </entry>
+ <entry>
+ <id>tag:github.com,2008:Post/580</id>
+ <published>2010-01-13T14:01:35-08:00</published>
+ <updated>2010-01-14T13:13:07-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/580-github-drinkup-sydney" rel="alternate" />
+ <title>GitHub Drinkup Sydney</title>
+ <content type="html">&lt;p&gt;&lt;strong&gt;&lt;span class="caps"&gt;IMPORTANT&lt;/span&gt; &lt;span class="caps"&gt;UPDATE&lt;/span&gt;:&lt;/strong&gt; &lt;em&gt;We had the wrong address listed in the original post. The event is at King St Wharf, &lt;strong&gt;not&lt;/strong&gt; Newtown.&lt;/em&gt;&lt;/p&gt;
+&lt;p&gt;Join Tom and Scott next Monday at 6pm as they stop in Sydney to bring you the joy of the American drinkup on their way to NZ. Discuss whether code flushes the same direction in the southern hemisphere and if kangaroos really do prefer Git. Crocodile wrestling and dropbear hunting likely to ensue.&lt;/p&gt;
+&lt;p&gt;Our sysadmin guys, &lt;a href="http://anchor.com.au/"&gt;Anchor&lt;/a&gt;, are co-hosting the meetup, so come say hi to them too!&lt;/p&gt;
+&lt;p&gt;The Facts:&lt;/p&gt;
+&lt;p&gt;6pm, Monday, Jan 18th&lt;br /&gt;
+&lt;a href="http://maps.google.com/maps/place?cid=14348182878035497946&amp;q=james+squire&amp;cd=1&amp;cad=src:pplink&amp;ei=BIhPS4WcFYGYowTHiu2MDQ&amp;sig2=mGRIRYeH9GZm512z8EOmXQ"&gt;James Squire Brewhouse&lt;/a&gt;&lt;br /&gt;
+22 Promenade King St Wharf, Australia&lt;/p&gt;
+&lt;p&gt;&lt;iframe width="425" height="350" frameborder="0" scrolling="no" marginheight="0" marginwidth="0" src="http://maps.google.com/maps?q=james+squire&amp;amp;cd=1&amp;amp;ei=PohPS5uPHp20tQPh0NyiDQ&amp;amp;sig2=mGRIRYeH9GZm512z8EOmXQ&amp;amp;ie=UTF8&amp;amp;hl=en&amp;amp;view=map&amp;amp;cid=14348182878035497946&amp;amp;ved=0CBcQpQY&amp;amp;hq=james+squire&amp;amp;hnear=&amp;amp;t=h&amp;amp;ll=-33.865712,151.202688&amp;amp;spn=0.006236,0.00912&amp;amp;z=16&amp;amp;iwloc=A&amp;amp;output=embed"&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;small&gt;&lt;a href="http://maps.google.com/maps?q=james+squire&amp;amp;cd=1&amp;amp;ei=PohPS5uPHp20tQPh0NyiDQ&amp;amp;sig2=mGRIRYeH9GZm512z8EOmXQ&amp;amp;ie=UTF8&amp;amp;hl=en&amp;amp;view=map&amp;amp;cid=14348182878035497946&amp;amp;ved=0CBcQpQY&amp;amp;hq=james+squire&amp;amp;hnear=&amp;amp;t=h&amp;amp;ll=-33.865712,151.202688&amp;amp;spn=0.006236,0.00912&amp;amp;z=16&amp;amp;iwloc=A&amp;amp;source=embed" style="color:#0000FF;text-align:left"&gt;View Larger Map&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;</content>
+ <author>
+ <name>luckiestmonkey</name>
+ </author>
+ </entry>
+ <entry>
+ <id>tag:github.com,2008:Post/579</id>
+ <published>2010-01-13T11:49:35-08:00</published>
+ <updated>2010-01-13T11:49:54-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/579-flash-in-javascript" rel="alternate" />
+ <title>Flash in JavaScript</title>
+ <content type="html">&lt;p&gt;This is pretty great: &lt;a href="http://github.com/tobeytailor/gordon"&gt;Gordon&lt;/a&gt; is an open source Flash implementation in JavaScript by &lt;a href="http://github.com/tobeytailor"&gt;tobeytailor&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;Demos: &lt;a href="http://paulirish.com/work/gordon/demos/"&gt;http://paulirish.com/work/gordon/demos/&lt;/a&gt;&lt;/p&gt;
+&lt;div align="center"&gt;&lt;a href="http://paulirish.com/work/gordon/demos"&gt;&lt;img src="http://img.skitch.com/20100113-gc3h7hwadx1wdqn4pb6rhbjk8p.png"/&gt;&lt;/a&gt;&lt;/div&gt;
+&lt;p&gt;It even works on the iPhone: &lt;a href="http://twitpic.com/xxmi2"&gt;http://twitpic.com/xxmi2&lt;/a&gt;&lt;/p&gt;
+&lt;p&gt;I can&amp;#8217;t wait to try it out. Thanks Tobias and Paul!&lt;/p&gt;</content>
+ <author>
+ <name>defunkt</name>
+ </author>
+ </entry>
+ <entry>
+ <id>tag:github.com,2008:Post/578</id>
+ <published>2010-01-13T11:02:54-08:00</published>
+ <updated>2010-01-13T11:26:49-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/578-random-repo" rel="alternate" />
+ <title>Random Repo</title>
+ <content type="html">&lt;p&gt;&lt;a href="http://github.com/joshthecoder"&gt;joshthecoder&lt;/a&gt; said, &amp;quot;It&amp;#8217;d be great to jump to a random repository on GitHub. Like the &amp;#8220;Random article&amp;#8221; feature on Wikipedia.&amp;quot;&lt;/p&gt;
+&lt;p&gt;I said, &amp;#8220;Yes, it would.&amp;#8221;&lt;/p&gt;
+&lt;p&gt;Tada: &lt;a href="http://github.com/repositories/random"&gt;Random Repository&lt;/a&gt;&lt;/p&gt;
+&lt;p&gt;Drag that link to your bookmark bar and have some fun.&lt;/p&gt;</content>
+ <author>
+ <name>defunkt</name>
+ </author>
+ </entry>
+ <entry>
+ <id>tag:github.com,2008:Post/577</id>
+ <published>2010-01-12T03:55:41-08:00</published>
+ <updated>2010-01-21T20:48:45-08:00</updated>
+ <link type="text/html" href="http://github.com/blog/577-improved-commit-diffs" rel="alternate" />
+ <title>Improved Commit Diffs</title>
+ <content type="html">&lt;p&gt;We recently rolled out a bunch of improvements to commit pages to make reviewing diffs a bit more pleasant.&lt;/p&gt;
+&lt;h3&gt;Diffstats&lt;/h3&gt;
+&lt;p&gt;Diffstat style histograms of insertions and deletions for each file are now displayed on commit pages. This is useful for getting a high level feel for the impact of a commit:&lt;/p&gt;
+&lt;div&gt;
+&lt;p&gt;&lt;a href="http://github.com/antirez/redis/commit/163f4b8cb25ca46c1482bb6b4ca9c4c9d0f15dd4"
+&gt;&lt;img width='546' src="http://img.skitch.com/20100110-ccencna7kdyk1xmyfbn3et82n.png" alt="sexy diffstat"&gt;&lt;/a&gt;&lt;/p&gt;
+&lt;/div&gt;
+&lt;p&gt;The diffstat display is similar in spirit to the output generated by &lt;code&gt;git diff --stat&lt;/code&gt;: a numeric representing the total number of changed lines (insertions + deletions) followed by a simple visualization of the insertion to deletion ratio.&lt;/p&gt;
+&lt;h3&gt;Rename Detection&lt;/h3&gt;
+&lt;p&gt;Git doesn&amp;#8217;t track file renames, but it does support heuristic detection of renamed files when performing &lt;code&gt;diff&lt;/code&gt; and &lt;code&gt;log&lt;/code&gt; operations. We&amp;#8217;ve enabled it. The file list now displays a single line for renames instead of separate file add/remove lines:&lt;/p&gt;
+&lt;div&gt;
+&lt;p&gt;&lt;a href="http://github.com/ry/node/commit/a5df0f6a65edda02aa70732de0321fae188d03ae"
+&gt;&lt;img width='546' src="http://img.skitch.com/20100110-ncxuu6ff5gqkb3m8sk4fsu4b61.png" alt="diffstat + rename detection"&gt;&lt;/a&gt;&lt;/p&gt;
+&lt;/div&gt;
+&lt;p&gt;While it&amp;#8217;s nice to see renames reported as such in the file list, the larger benefit comes with the actual diff. Without rename detection, commits with even a small number of renamed files can generate large and noisy diffs. The entire file contents is displayed twice: first with all deleted lines and then again with all added lines. These same diffs are reduced down to pure signal with rename detection enabled because only the lines modified between the two files are shown:&lt;/p&gt;
+&lt;div&gt;
+&lt;p&gt;&lt;a href="http://github.com/ry/node/commit/a5df0f6a65edda02aa70732de0321fae188d03ae#diff-2"
+&gt;&lt;img width='546' src="http://img.skitch.com/20100110-8ewn1kyf5qy4rtyq4w368rie77.png" alt="diffs + rename detection"&gt;&lt;/a&gt;&lt;/p&gt;
+&lt;/div&gt;
+&lt;p&gt;See the &lt;code&gt;-M&lt;/code&gt; option to &lt;code&gt;git-diff(1)&lt;/code&gt; for information on using rename detection from the command line.&lt;/p&gt;
+&lt;h3&gt;Added / Removed Files&lt;/h3&gt;
+&lt;p&gt;Previously, files added or removed in a commit were shown in the file list at the top of commit pages but the actual diffs were omitted. This was a simple guard against &lt;em&gt;Insanely Large Diffs That Crashed Browsers&lt;/em&gt; but had a few notable drawbacks:&lt;/p&gt;
+&lt;ul&gt;
+ &lt;li&gt;It was easy to miss important changes introduced by added or removed files when reviewing commits.&lt;/li&gt;
+ &lt;li&gt;It wasn&amp;#8217;t possible to comment on specific lines in added or removed files.&lt;/li&gt;
+ &lt;li&gt;It didn&amp;#8217;t always avoid large diffs. Consider cases like &lt;span class="caps"&gt;SQL&lt;/span&gt; database dumps where each line of a large generated file is modified as part of an otherwise tiny commit. Omitting added/removed files gave no guarantee that diffs would not exceed a reasonable size.&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;According to &lt;a href="http://corte.si//posts/code/devsurvey/index.html"&gt;Aldo Cortesi&amp;#8217;s GitHub project analysis&lt;/a&gt;, the average commit touches about 4 files and 19 lines of code. We felt that commit pages needed to do a better job showing all pertinent information on these common case commits, so from now on you&amp;#8217;ll see diffs for added and removed files:&lt;/p&gt;
+&lt;div&gt;
+&lt;p&gt;&lt;a href="http://github.com/defunkt/rip/commit/d21786afe30fac67f006ae033d3b69f204f39c82#bin/ripenv-list-P11"
+&gt;&lt;img width='546' src="http://img.skitch.com/20100110-f3u28yk44bgsec1aqpw191ssf.png" alt="comment on added files"&gt;&lt;/a&gt;&lt;/p&gt;
+&lt;/div&gt;
+&lt;h3&gt;Large Diffs&lt;/h3&gt;
+&lt;p&gt;Displaying added/removed files left the problem of how to deal with very large diffs. What we came up with is a set of rules for omitting portions of large diffs that ensures a sane upper bound on overall diff size. It works something like this:&lt;/p&gt;
+&lt;ul&gt;
+ &lt;li&gt;Diffs are not shown for any individual file with more than 300 changed lines (this includes modified files as well as added/removed files).&lt;/li&gt;
+ &lt;li&gt;No more than 150 total file diffs are displayed.&lt;/li&gt;
+ &lt;li&gt;No more than 3,000 total changed lines are shown across all diffs.&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;While we expect to tune these numbers over the coming weeks, the result so far has been diffs that show more of what you typically want to see and less of what you don&amp;#8217;t.&lt;/p&gt;</content>
+ <author>
+ <name>rtomayko</name>
+ </author>
+ </entry>
+</feed>
367 example/fixtures/register.atom
@@ -0,0 +1,367 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
+<id>tag:theregister.co.uk,2005:feed/theregister.co.uk/</id>
+<title>The Register</title>
+<link rel="self" type="application/atom+xml" href="http://www.theregister.co.uk/headlines.atom"/>
+<link rel="alternate" type="text/html" href="http://www.theregister.co.uk/"/>
+<rights>Copyright © 2010, Situation Publishing</rights>
+<author>
+<name>Team Register</name>
+<email>webmaster@theregister.co.uk</email>
+<uri>http://www.theregister.co.uk/odds/about/contact/</uri>
+</author>
+<icon>http://www.theregister.co.uk/Design/graphics/icons/favicon.png</icon>
+<subtitle>Biting the hand that feeds IT — sci/tech news and views for the world</subtitle>
+<logo>http://www.theregister.co.uk/Design/graphics/Reg_default/The_Register_r.png</logo>
+<updated>2010-01-24T18:59:18Z</updated>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/24/body_scanner_fail/</id>
+<updated>2010-01-24T11:02:02Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/24/body_scanner_fail/"/>
+<title type="html">Full-body scanner blind to bomb parts</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Todger, yes. Combustibles, no&lt;/h4&gt; &lt;p&gt;Most of the uproar over full-body scanners has focused on privacy concerns. There's one larger question, however, that hasn't received much scrutiny by the chattering classes: do the damnable things work?…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/696/smartprotection-whitepaper.pdf?td=rss"&gt;Offloading malware protection to the cloud&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/24/election_tech/</id>
+<updated>2010-01-24T10:02:02Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/24/election_tech/"/>
+<title type="html">Technology vs policy: Election smackdown!</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Twitterized panel says voters will put policy first&lt;/h4&gt; &lt;p&gt;Technology or policy - which will be more influential come this year’s General Election Campaign?…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/697/wp01-webthreats-080303-uk.pdf?td=rss"&gt;Web threats: Why conventional protection doesn't work&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/24/microsoft_windows_mobile_sinofsky/</id>
+<updated>2010-01-24T06:02:01Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/24/microsoft_windows_mobile_sinofsky/"/>
+<title type="html">Microsoft re-org hints at Windows and Mobile merge</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Time to join the grown-up OS&lt;/h4&gt; &lt;p&gt;Windows Mobile could be destined for life inside Microsoft's main Windows operation following a re-organization of the division it currently calls home.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/859/atth0s1n.pdf?td=rss"&gt;The power of collaboration within unified communications&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/23/uk_mobile_internet_data/</id>
+<updated>2010-01-23T10:02:02Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.reghardware.co.uk/2010/01/23/uk_mobile_internet_data/"/>
+<title type="html">Brits left cold by mobile internet</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Apart from iPhone owners&lt;/h4&gt; &lt;p&gt;More than three-quarters of Britons don't use their phones to access the internet, a study has found. Worse, almost 40 per cent of smartphone owners - the very folk you'd expect &lt;em&gt;would&lt;/em&gt; want to surf the web on the move - have never done so, or gave it a go once but won't do so again.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/892/legoland.pdf?td=rss"&gt;Case Study: WhatsUp keeps Legoland turnstyles ringing&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/23/review_sennheiser_rs_160_wireless_headphones/</id>
+<updated>2010-01-23T09:02:02Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.reghardware.co.uk/2010/01/23/review_sennheiser_rs_160_wireless_headphones/"/>
+<title type="html">Sennheiser RS 160 wireless headphones</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;The wandering audiophiles' choice?&lt;/h4&gt; &lt;p&gt;&lt;strong&gt;Review&lt;/strong&gt;  Although Sennheiser does make a few Bluetooth headsets to keep the mobile phone market happy, it has never been terribly keen on this technology. Indeed, the company has just launched a new range of headphones that use its own ‘Kleer’ wireless protocol.…&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/23/page_and_brin_stock_sell_off/</id>
+<updated>2010-01-23T01:08:19Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/23/page_and_brin_stock_sell_off/"/>
+<title type="html">'Larry and Sergey' to offload 10m Google shares</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;AKA $5.5 billion&lt;/h4&gt; &lt;p&gt;Google co-founders Sergey Brin and Larry Page each plan to sell 5 million shares of their common stock in the company over the next five years.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/814/oracle-814.pdf?td=rss"&gt;What is your recession sales strategy?&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/23/dnssec_deadline_failure/</id>
+<updated>2010-01-23T00:51:39Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/23/dnssec_deadline_failure/"/>
+<title type="html">80% of fed sites miss DNS Security deadline</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Where's your spoof protection?&lt;/h4&gt; &lt;p&gt;The vast majority of US federal agencies have failed to meet a December 31 deadline to deploy new technology that would make it significantly harder for attackers to spoof their websites, according to &lt;i&gt;Network World&lt;/i&gt;.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/814/oracle-814.pdf?td=rss"&gt;What is your recession sales strategy?&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/23/ballner_defaces_macbook_pro/</id>
+<updated>2010-01-23T00:10:12Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/23/ballner_defaces_macbook_pro/"/>
+<title type="html">Steve Ballmer defaces fanboi MacBook</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;'Need a new one?'&lt;/h4&gt; &lt;p&gt;Microsoft CEO Steve Ballmer has defaced the MacBook Pro of a young fanboi, and his sarcastic scrawling was caught on video.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/892/legoland.pdf?td=rss"&gt;Case Study: WhatsUp keeps Legoland turnstyles ringing&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/oracle_phillips_yavaughnie_wilkins/</id>
+<updated>2010-01-22T23:34:45Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/oracle_phillips_yavaughnie_wilkins/"/>
+<title type="html">Oracle prez admits 8 1/2 years with billboard woman</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Phillips married to another&lt;/h4&gt; &lt;p&gt;Oracle president Charles Phillips has admitted he had a "serious" eight-and-a-half-year relationship with the woman who appears beside him in &lt;a href="http://www.theregister.co.uk/2010/01/21/oracle_philips_poster/"&gt;billboards littering&lt;/a&gt; New York, San Francisco, and Atlanta.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/697/wp01-webthreats-080303-uk.pdf?td=rss"&gt;Web threats: Why conventional protection doesn't work&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/twitter_account_hijacking/</id>
+<updated>2010-01-22T21:35:44Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/twitter_account_hijacking/"/>
+<title type="html">Amateur goof makes Twitter account hijacking a snap</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Just add XML&lt;/h4&gt; &lt;p&gt;Twitter is sitting on an amateur configuration blunder that makes it trivial for attackers to take control of user accounts, a researcher said Friday.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/814/oracle-814.pdf?td=rss"&gt;What is your recession sales strategy?&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/nasa_global_warming_warmest_decade/</id>
+<updated>2010-01-22T21:07:17Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/nasa_global_warming_warmest_decade/"/>
+<title type="html">NASA pegs Noughties as hottest decade on record</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Global warming 'unabated,' say spacemen&lt;/h4&gt; &lt;p&gt;The past decade was the warmest ever on record, showing that global warming is "continuing unabated," according to a new report from NASA.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/696/smartprotection-whitepaper.pdf?td=rss"&gt;Offloading malware protection to the cloud&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/netalyzr_debuts/</id>
+<updated>2010-01-22T20:35:51Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/netalyzr_debuts/"/>
+<title type="html">Boffins birth uber 'net neutrality' dowser</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Eye on your ISP&lt;/h4&gt; &lt;p&gt;Researchers with the International Computer Science Institute in Berkeley, California have unveiled a completed version of their &lt;a target="_blank" href="http://netalyzr.icsi.berkeley.edu/"&gt;Netalyzr service&lt;/a&gt;, a tool designed to detect when your ISP is interfering with your net connection.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/892/legoland.pdf?td=rss"&gt;Case Study: WhatsUp keeps Legoland turnstyles ringing&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/tweets_in_space/</id>
+<updated>2010-01-22T20:33:12Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/tweets_in_space/"/>
+<title type="html">Web2.0rhea infects International Space Station</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;'We r now LIVE tweeting'&lt;/h4&gt; &lt;p&gt;A US astronaut has made a giant leap in social-networking history by sending the first tweet from space.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/859/atth0s1n.pdf?td=rss"&gt;The power of collaboration within unified communications&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/fujitsu_names_prez/</id>
+<updated>2010-01-22T19:00:54Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/fujitsu_names_prez/"/>
+<title type="html">Fujitsu puts systems chief at helm</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Be like Big Blue&lt;/h4&gt; &lt;p&gt;Since last September, struggling IT supplier Fujitsu has been trying to right itself and do so without a president, with double-duty falling on the shoulders of Michiyoshi Mazuka, the company's chairman. But Fujitsu has now tapped an executive who has run several Fujitsu divisions and who currently runs its systems business, hoping to steer the company back to profitability.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/696/smartprotection-whitepaper.pdf?td=rss"&gt;Offloading malware protection to the cloud&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/facebook_custom_data_center/</id>
+<updated>2010-01-22T18:58:02Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/facebook_custom_data_center/"/>
+<title type="html">Facebook busts ground on first custom data center</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Follows Google and Microsoft into chillerless club&lt;/h4&gt; &lt;p&gt;Facebook is building its first custom-designed data center, after years of leasing data center space from third parties.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/696/smartprotection-whitepaper.pdf?td=rss"&gt;Offloading malware protection to the cloud&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/tsa_screener_joke/</id>
+<updated>2010-01-22T18:17:52Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/tsa_screener_joke/"/>
+<title type="html">TSA screener plants powder baggie in flier's luggage</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Not everyone gets the joke&lt;/h4&gt; &lt;p&gt;A screener for the US Transportation Security Administration lost his job after pretending to plant a plastic bag of white powder in the carry-on luggage of a passenger at the Philadelphia International Airport.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/859/atth0s1n.pdf?td=rss"&gt;The power of collaboration within unified communications&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/gg2_microsoft_training_365/</id>
+<updated>2010-01-22T16:55:49Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/gg2_microsoft_training_365/"/>
+<title type="html">A Geeks Guide&lt;small&gt;&lt;sub&gt;2&lt;/sub&gt;&lt;/small&gt;... Microsoft Training 365</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Save over £580 at Reg Books&lt;/h4&gt; &lt;p&gt;&lt;img src="http://books.theregister.co.uk/gg2/images/geek-banner.png" alt="A Geeks Guide 2 Microsoft Training 365" align="right" border="0" height="100" width="292"&gt;&lt;strong class="trailer"&gt;Geeks Guide&lt;sub&gt;2&lt;/sub&gt;&lt;/strong&gt; This week we’re offering &lt;em&gt;Reg&lt;/em&gt; readers a huge introductory discount on a Microsoft Training 365 Technical Subscription. For the next 28 days you will &lt;strong&gt;save over £580&lt;/strong&gt; on this IT pro and developer training hotbed by ordering the subscription via the &lt;a href="http://books.theregister.co.uk/mst365/index.asp?advert=gg2"&gt;Register Books website&lt;/a&gt;.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/814/oracle-814.pdf?td=rss"&gt;What is your recession sales strategy?&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/aurora_exploit_known_months/</id>
+<updated>2010-01-22T16:46:06Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/aurora_exploit_known_months/"/>
+<title type="html">MS knew of Aurora exploit four months before Google attacks</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;China light on the matter&lt;/h4&gt; &lt;p&gt;Microsoft first knew of the bug used in the infamous Operation Aurora IE exploits as long ago as August, four months before the vulnerability was used in exploits against Google and other hi-tech firms in December, it has emerged.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/696/smartprotection-whitepaper.pdf?td=rss"&gt;Offloading malware protection to the cloud&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/sun_schwartz_signoff/</id>
+<updated>2010-01-22T16:43:26Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/sun_schwartz_signoff/"/>
+<title type="html">Schwartz puts comforting arm around stricken Sun</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Uses other arm for Oracle fist-pump&lt;/h4&gt; &lt;p&gt;Employees at Sun Microsystems concerned at the prospect of yet more lay-offs at the company will not have been comforted by a company-wide memo from president and CEO Jonathan Schwartz yesterday.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/814/oracle-814.pdf?td=rss"&gt;What is your recession sales strategy?&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/bt_infinity_p2p/</id>
+<updated>2010-01-22T15:55:34Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/bt_infinity_p2p/"/>
+<title type="html">BT to throttle P2P for faster broadband</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Ah well, there's always the post&lt;/h4&gt; &lt;p&gt;Hopes that BT's new faster broadband technology might improve peer-to-peer downloads have faded with the firm's confirmation that subscribers will be subject to the same restrictions as those on less expensive tariffs.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/696/smartprotection-whitepaper.pdf?td=rss"&gt;Offloading malware protection to the cloud&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/morris_film_trailer/</id>
+<updated>2010-01-22T15:49:36Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/morris_film_trailer/"/>
+<title type="html">Chris Morris jihad film good to go</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;&lt;em&gt;Four Lions&lt;/em&gt; trailer live&lt;/h4&gt; &lt;p&gt;Chris Morris, the genius-man behind &lt;cite&gt;The Day Today,&lt;/cite&gt; &lt;cite&gt;Brass Eye&lt;/cite&gt; and &lt;cite&gt;Blue Jam&lt;/cite&gt;, has spent the last few years researching and making a film about some hapless British would-be terrorists.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/892/legoland.pdf?td=rss"&gt;Case Study: WhatsUp keeps Legoland turnstyles ringing&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/nokia_strike/</id>
+<updated>2010-01-22T15:16:35Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/nokia_strike/"/>
+<title type="html">Indian Nokia workers ready to up tools</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;'Misunderstanding' puts 2000 out on strike&lt;/h4&gt; &lt;p&gt;A two day strike at Nokia India is ending with most staff heading back to work, having been asked nicely and told that management will explain.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/814/oracle-814.pdf?td=rss"&gt;What is your recession sales strategy?&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/nsa_dismissal/</id>
+<updated>2010-01-22T14:33:08Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/nsa_dismissal/"/>
+<title type="html">NSA beats warrantless wiretap rap</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Because it took in a nation of millions&lt;/h4&gt; &lt;p&gt;A Federal judge has dismissed a complaint against the National Security Agency's (NSA) Bush-era warrantless wiretapping programme, prompting suggestions the US government is now able to mount mass surveillance operations unhindered by the courts or constitution.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/892/legoland.pdf?td=rss"&gt;Case Study: WhatsUp keeps Legoland turnstyles ringing&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/virginia_doc_office_meteorite/</id>
+<updated>2010-01-22T14:30:11Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/virginia_doc_office_meteorite/"/>
+<title type="html">Crusty fireball space mango wrecks US doctor's office</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;'Fresh, pretty' meteorite blasts startled medics&lt;/h4&gt; &lt;p&gt;A "mango-sized" meteorite crashed into a doctor's office in Virginia this week at more than 200 mph, according to reports. The space rock smashed through the roof, an internal wall and an upper floor before shattering into several pieces on a concrete slab.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/814/oracle-814.pdf?td=rss"&gt;What is your recession sales strategy?&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/somerset_police_computers/</id>
+<updated>2010-01-22T14:13:06Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/somerset_police_computers/"/>
+<title type="html">Avon &amp;amp; Somerset cop computers titsup?</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Tory MP questions use of 30 temp workers&lt;/h4&gt; &lt;p&gt;A South-West Tory MP has well and truly put the cat among the proverbials with a question in the House of Commons about what he claimed is the "collapse of Avon and Somerset police's computer system".…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/892/legoland.pdf?td=rss"&gt;Case Study: WhatsUp keeps Legoland turnstyles ringing&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/number_10_paf_database_petition/</id>
+<updated>2010-01-22T13:59:24Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/number_10_paf_database_petition/"/>
+<title type="html">UK government rebuffs cries for free postcode database</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;PAF remains fee-based, despite non-profit grumbles&lt;/h4&gt; &lt;p&gt;Number 10 has responded negatively to a &lt;a target="_blank" href="http://petitions.number10.gov.uk/nfppostcodes/#detail"&gt;petition&lt;/a&gt; signed by over 2,000 people that asked the government to convince the Royal Mail to offer a free postcode database to non-profit and community websites.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/859/atth0s1n.pdf?td=rss"&gt;The power of collaboration within unified communications&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/tor_security_update/</id>
+<updated>2010-01-22T13:46:56Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/tor_security_update/"/>
+<title type="html">Tor software updated after hackers crack into systems</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Miscreants remain anonymous&lt;/h4&gt; &lt;p&gt;Privacy-conscious users of the Tor anonymiser network have been urged to upgrade their software, following the discovery of a security breach.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/697/wp01-webthreats-080303-uk.pdf?td=rss"&gt;Web threats: Why conventional protection doesn't work&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/o2_mobile_landline/</id>
+<updated>2010-01-22T13:20:46Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/o2_mobile_landline/"/>
+<title type="html">O2 halts Mobile Landline signups</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;You're through to O2... we're busy&lt;/h4&gt; &lt;p&gt;O2 has suspended its popular 'Mobile Landline' service temporarily. The service provides business with a landline number that reroutes to an O2 SIM, without connection charges, an attractive proposition for SMEs.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/696/smartprotection-whitepaper.pdf?td=rss"&gt;Offloading malware protection to the cloud&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/videogame_rickets/</id>
+<updated>2010-01-22T13:19:31Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.reghardware.co.uk/2010/01/22/videogame_rickets/"/>
+<title type="html">Rickets rise linked to excessive gaming</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Too much PlayStation, not enough sunshine&lt;/h4&gt; &lt;p&gt;The number of British kids suffering from the deficiency disease rickets is soaring, medical experts have claimed. The cause: too many hours indoors playing videogames.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/696/smartprotection-whitepaper.pdf?td=rss"&gt;Offloading malware protection to the cloud&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/airport_staff_vetting_objections/</id>
+<updated>2010-01-22T13:03:18Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/airport_staff_vetting_objections/"/>
+<title type="html">Airport scanner staff object to vetting</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Nothing to hide - but they'd rather not be checked&lt;/h4&gt; &lt;p&gt;Security staff at Heathrow airport are reportedly furious at the suggestion that any of them would ever use pics taken from the new body scanners for lewd or lascivious purposes.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/892/legoland.pdf?td=rss"&gt;Case Study: WhatsUp keeps Legoland turnstyles ringing&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/linux_developers_pay/</id>
+<updated>2010-01-22T12:51:27Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/linux_developers_pay/"/>
+<title type="html">Linux coders do it for money</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;No such thing as a free (software) launch&lt;/h4&gt; &lt;p&gt;Around 75 per cent of Linux developers raked in cash from their code crunching in the past year.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/814/oracle-814.pdf?td=rss"&gt;What is your recession sales strategy?&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/government_multicloud/</id>
+<updated>2010-01-22T12:34:11Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/government_multicloud/"/>
+<title type="html">UK.gov stacking up gang of clouds</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Ain't nothin' but a G-thang&lt;/h4&gt; &lt;p&gt;The government's CIO has said he expects the public sector to use commercial cloud computing services as well as the 'G Cloud'.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/859/atth0s1n.pdf?td=rss"&gt;The power of collaboration within unified communications&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/hulc_protonex_fuel_cell/</id>
+<updated>2010-01-22T12:32:26Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/hulc_protonex_fuel_cell/"/>
+<title type="html">Super-soldier exoskeleton to get 3-day fuel cell powerpack</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Mech-troopers will be able to plug in gadgetry, too&lt;/h4&gt; &lt;p&gt;A radical powered exoskeleton under development for use by the US military is to be fitted with fuel-cell power supplies which will increase its endurance from hours to days - and furnish juice for the burgeoning load of electronics carried by modern soldiers, too.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/814/oracle-814.pdf?td=rss"&gt;What is your recession sales strategy?&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/review_desktop_pc_hp_touchsmart_600/</id>
+<updated>2010-01-22T12:02:02Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.reghardware.co.uk/2010/01/22/review_desktop_pc_hp_touchsmart_600/"/>
+<title type="html">HP TouchSmart 600</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Touchscreen iMac wannabee&lt;/h4&gt; &lt;p&gt;&lt;strong&gt;Review&lt;/strong&gt;  Every now and again, reviewing a new PC can be a pleasure rather than a chore. The feeling is not the result of blistering performance or a full set of ticks next to a spec list, but the natural response to using high-quality kit that works exactly as you want it to. HP’s TouchSmart 600 looks beautiful and reveals an approach to product design that suggests the company has thought carefully about every component and feature, and how best to implement them.…&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/ofcom_freeview_hd_drm_consultation/</id>
+<updated>2010-01-22T11:48:42Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.reghardware.co.uk/2010/01/22/ofcom_freeview_hd_drm_consultation/"/>
+<title type="html">Ofcom opens debate on Freeview HD DRM to punters</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;The BBC wants it - do you?&lt;/h4&gt; &lt;p&gt;Ofcom has begun asking the public whether the BBC should be allowed to apply DRM to Freeview HD broadcasts.…&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/lenovo_lephone/</id>
+<updated>2010-01-22T11:48:13Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.reghardware.co.uk/2010/01/22/lenovo_lephone/"/>
+<title type="html">Lenovo talks up LePhone launch</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Into China in May - UK later this year?&lt;/h4&gt; &lt;p&gt;Lenovo has reiterated its plan - announced at the Consumer Electronics Shows earlier this month - to launch an Android-based smartphone later this year.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/697/wp01-webthreats-080303-uk.pdf?td=rss"&gt;Web threats: Why conventional protection doesn't work&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/dowsing_rod_bomb_detector_bust/</id>
+<updated>2010-01-22T11:42:04Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/dowsing_rod_bomb_detector_bust/"/>
+<title type="html">Police arrest MD of dowsing-rod 'bomb detector' firm</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Sold 'magic' bomb wands to Iraqi gov for $56,000 each&lt;/h4&gt; &lt;p&gt;A British businessman who has made millions selling dowsing-rod "explosives detectors" to the Iraqi security forces has been arrested on suspicion of fraud.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/814/oracle-814.pdf?td=rss"&gt;What is your recession sales strategy?&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/byd_e6/</id>
+<updated>2010-01-22T11:38:27Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.reghardware.co.uk/2010/01/22/byd_e6/"/>
+<title type="html">Billionaire-funded e-car gets showroom date</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;The e6, from BYD and Warren Buffet&lt;/h4&gt; &lt;p&gt;&lt;strong&gt;Leccy Tech&lt;/strong&gt;  The electric e6 MPV manufactured by Chinese firm BYD and &lt;a href="http://www.reghardware.co.uk/2008/12/20/byd_f3dm/"&gt;partly funded&lt;/a&gt; by American billionaire Warren Buffet will be launched later this year.…&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/irish_board_hack/</id>
+<updated>2010-01-22T11:32:58Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/irish_board_hack/"/>
+<title type="html">Irish board hack prompts password reset</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Users thrown into scramble to change up login credentials&lt;/h4&gt; &lt;p&gt;Popular Irish web discussion forum boards.ie has reset user passwords in response to a hack attack that compromised member login credentials.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/814/oracle-814.pdf?td=rss"&gt;What is your recession sales strategy?&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/mandybill_file_sharing_payoff/</id>
+<updated>2010-01-22T11:04:09Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/mandybill_file_sharing_payoff/"/>
+<title type="html">Lords mull Hail Mary penance for file sharers</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Peers discuss costs, ambulance chasing&lt;/h4&gt; &lt;p&gt;&lt;strong&gt;Mandybill&lt;/strong&gt;  The Lords this week discussed new compensation for copyright holders this week - including a voluntary 'Hail Mary fine' payable by file sharers, instead of suspension - but nobody noticed.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/814/oracle-814.pdf?td=rss"&gt;What is your recession sales strategy?&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/winmo_7_september/</id>
+<updated>2010-01-22T10:48:54Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.reghardware.co.uk/2010/01/22/winmo_7_september/"/>
+<title type="html">MS to release WinMo 7 to phone makers in September</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Mole also reiterates Zune, Xbox Live link claims&lt;/h4&gt; &lt;p&gt;Windows Mobile 7 will be handed over to smartphone makers in September, to allow them to ready devices based on the operating system for Q4 2010 or Q1 2011.…&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/taoist_trucker_jailed/</id>
+<updated>2010-01-22T10:36:26Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/taoist_trucker_jailed/"/>
+<title type="html">Poetic justice for HK Taoist truck driver</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Sent down to judge's words of (real) Zen master&lt;/h4&gt; &lt;p&gt;The Hong Kong truck driver who duped an aspiring model into having ritual sex with him has been jailed for six years and nine months, HK's &lt;em&gt;The Standard&lt;/em&gt; reports.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/697/wp01-webthreats-080303-uk.pdf?td=rss"&gt;Web threats: Why conventional protection doesn't work&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/boring_storage/</id>
+<updated>2010-01-22T10:35:39Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/boring_storage/"/>
+<title type="html">Storage is boring, right?</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Understanding the Sherpa layer of IT&lt;/h4&gt; &lt;p&gt;&lt;strong&gt;Workshop&lt;/strong&gt;  In the spirit of calling a spade a spade, it is fair to say that computer storage is generally perceived to be quite dull – in Douglas Adams terms it would qualify as ‘mostly harmless’.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/696/smartprotection-whitepaper.pdf?td=rss"&gt;Offloading malware protection to the cloud&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/china_google/</id>
+<updated>2010-01-22T10:27:29Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/china_google/"/>
+<title type="html">China swings at Clinton as Schmidt fudges exit plan</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;When I say pull out, what I mean is....&lt;/h4&gt; &lt;p&gt;China has hit back at criticism from Google and &lt;a href="http://www.theregister.co.uk/2010/01/21/clinton_google/" target="_blank"&gt;Hillary Clinton&lt;/a&gt; accusing them of cultural imperialism by insisting that their view of freedom of information is somehow universal.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/696/smartprotection-whitepaper.pdf?td=rss"&gt;Offloading malware protection to the cloud&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/hp_smartbook_outed/</id>
+<updated>2010-01-22T10:09:32Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.reghardware.co.uk/2010/01/22/hp_smartbook_outed/"/>
+<title type="html">FCC website outs HP smartbook</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;AirLife to feature always-on HSDPA 3G&lt;/h4&gt; &lt;p&gt;HP has received US Federal Communications Commission approval for a device dubbed the Compaq AirLife 100 and listed on the FCC website as a "smartbook".…&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/meyer_talks_sockets/</id>
+<updated>2010-01-22T09:02:01Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/meyer_talks_sockets/"/>
+<title type="html">AMD preps for two-fisted two-socket catfight</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Sock it to Intel&lt;/h4&gt; &lt;p&gt;AMD is hoping that its next move in the server space will bring it success in the massive two-socket market.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/696/smartprotection-whitepaper.pdf?td=rss"&gt;Offloading malware protection to the cloud&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/equality_bill/</id>
+<updated>2010-01-22T09:02:01Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/equality_bill/"/>
+<title type="html">The Equality Bill: Hidden agenda?</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Swerving discrimination means amendment pile-up&lt;/h4&gt; &lt;p&gt;&lt;strong&gt;Opinion&lt;/strong&gt;  Employment lawyers and HR professionals would be well advised to keep a close eye on the progress of the Equality Bill, currently being debated in the House of Lords. Initially intended as a legislative sweep-up, it now proposes major policy changes.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/696/smartprotection-whitepaper.pdf?td=rss"&gt;Offloading malware protection to the cloud&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/public_ict_pressure/</id>
+<updated>2010-01-22T08:02:02Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/public_ict_pressure/"/>
+<title type="html">Public sector ICT 'under pressure'</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;You don't say&lt;/h4&gt; &lt;p&gt;The Society of IT Management has said that public sector ICT professionals are being squeezed between a cut in budgets and increased demand for their services.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/892/legoland.pdf?td=rss"&gt;Case Study: WhatsUp keeps Legoland turnstyles ringing&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/quantum_stornext_4/</id>
+<updated>2010-01-22T08:02:02Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/quantum_stornext_4/"/>
+<title type="html">Quantum spruces up StorNext filesystem</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;DXi influence spreads&lt;/h4&gt; &lt;p&gt;Quantum has pushed out a major StorNext release, adding integrated filesystem deduplication and replication.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/696/smartprotection-whitepaper.pdf?td=rss"&gt;Offloading malware protection to the cloud&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+<entry>
+<id>tag:theregister.co.uk,2005:story/2010/01/22/springsource_oracle_sun/</id>
+<updated>2010-01-22T07:02:02Z</updated>
+<link rel="alternate" type="text/html" href="http://go.theregister.com/feed/www.theregister.co.uk/2010/01/22/springsource_oracle_sun/"/>
+<title type="html">Spring daddy looks to Oracle's MySQL commitments</title>
+<summary type="html" xml:base="http://www.theregister.co.uk/">&lt;h4&gt;Comfort guaranteed&lt;/h4&gt; &lt;p&gt;Rod Johnson, open source Java pioneer and general manager of VMware's new SpringSource division, has welcomed the commitment European regulators appear to have squeezed from Oracle over the future of MySQL.…&lt;/p&gt; &lt;p&gt;&lt;p&gt;&lt;a href="http://whitepapers.theregister.co.uk/paper/view/696/smartprotection-whitepaper.pdf?td=rss"&gt;Offloading malware protection to the cloud&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;</summary>
+</entry>
+</feed>
111 example/searchfeeds.py
@@ -0,0 +1,111 @@
+#!/usr/bin/env python
+
+import sys
+import os
+import time
+from operator import itemgetter
+from heapq import nlargest
+from mongoengine.document import Document
+from mongoengine import fields, connect
+import feedparser
+
+sys.path.insert(0, '..')
+
+import mongosearch
+
+
+class BlogPost(Document):
+    """A sample blog post document that will be indexed and searched. The title
+    is more important than the content so should be weighted higher.
+    """
+    title = fields.StringField()
+    content = fields.StringField()
+
+
+def get_feed_entries(feed_path):
+    """Parse the individual items out of a locally-stored RSS or Atom feed.
+    """
+    document = feedparser.parse(feed_path)
+
+    entries = {}
+    for entry in document.entries:
+        guid = entry.get('guid') or entry.get('link')
+        if guid in entries:
+            continue
+
+        # Use content if summary is not present
+        summary = entry.get('summary')
+        if not summary:
+            summary = entry.get('content', [{}])[0].get('value', '')
+
+        entries[guid] = (entry.title, summary)
+
+    return entries
+
+def quit_with_usage():
+    print >> sys.stderr, 'Usage: %s <query>' % sys.argv[0]
+    sys.exit(1)
+
+def main():
+    try:
+        query = ' '.join(sys.argv[1:])
+    except IndexError:
+        quit_with_usage()
+
+    if not query.strip():
+        quit_with_usage()
+
+    connect('mongosearch-example')
+
+    # Ensure that no data exists from a previous run of this example
+    BlogPost.drop_collection()
+
+    # Create an index for the blog post and add the fields to be indexed
+    index = mongosearch.SearchIndex(BlogPost)
+    index.add_field('title', html=True, weight=1.5)
+    index.add_field('content', html=True)
+
+    # In this example we are loading our test data from downloaded RSS and
+    # Atom feeds in the 'fixtures' directory
+    feeds = ['df.xml', 'register.atom', 'github.xml']
+    feed_paths = [os.path.join('fixtures', feed) for feed in feeds]
+    for feed_path in feed_paths:
+        # Parse the feed and save it to the DB
+        entries = get_feed_entries(feed_path)
+        for guid, entry in entries.items():
+            post = BlogPost(title=entry[0], content=entry[1])
+            post.save()
+
+    # Index the collection
+    t0 = time.time()
+    index.generate_index()
+    print 'Indexing took %s seconds' % (time.time() - t0)
+
+    # Query the collection
+    t0 = time.time()
+    results = index.search(query)
+    top_matches = nlargest(10, results.iteritems(), itemgetter(1))
+    time_taken = time.time() - t0
+    print 'Querying took %s seconds' % time_taken
+
+    # Write the results to results.htm as HTML
+    outfile = open('results.htm', 'w')
+    outfile.write('<html><head><style>body{font-size: 70%;}</style>')
+    outfile.write('<meta http-equiv="Content-Type" content="text/html; ')
+    outfile.write('charset=UTF-8"/>')
+    outfile.write('</head><body>')
+    outfile.write('<h1>Search results for "%s"</h1>' % query)
+    outfile.write('<p><em>Query took %s seconds</em></p>' % time_taken)
+    for doc_id, score in top_matches:
+        doc = BlogPost.objects(id=doc_id).first()
+        outfile.write('<h2>[%s] %s</h2>' % (score, doc.title.encode('utf8')))
+        outfile.write('<p>%s</p>' % doc.content.encode('utf8'))
+        outfile.write('<br />')
+    outfile.write('</body></html>')
+    outfile.close()
+    print 'Processed %s items' % BlogPost.objects.count()
+
+    print 'Results saved in results.htm'
+
+if __name__ == '__main__':
+    main()
184 mongosearch.py
@@ -0,0 +1,184 @@
+import re
+from itertools import groupby
+from operator import itemgetter
+from math import log
+
+import lxml.html
+from Stemmer import Stemmer
+from mongoengine.document import Document, EmbeddedDocument
+from mongoengine import fields
+
+STOP_WORDS = (
+    "a,able,about,across,after,all,almost,also,am,among,an,and,any,are,as,at,"
+    "be,because,been,but,by,can,cannot,could,dear,did,do,does,either,else,"
+    "ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,"
+    "i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,"
+    "my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,"
+    "say,says,she,should,since,so,some,than,that,the,their,them,then,there,"
+    "these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,"
+    "which,while,who,whom,why,will,with,would,yet,you,your"
+).split(',')
+
+
+class SearchTerm(EmbeddedDocument):
+    """A term and its weight; one is stored for each distinct term in each
+    document. The weight sums the field weights of the term's occurrences.
+    """
+    term = fields.StringField(db_field='t')
+    weight = fields.FloatField(db_field='w')
+    meta = {'allow_inheritance': False}
+
+
+class SearchIndex(object):
+
+    SEARCH_JS = """
+        function() {
+            var results = {};
+            // Iterate over each document to calculate the document's score
+            db[collection].find(query).forEach(function(doc) {
+                var score = 0;
+                // Iterate over each term in the document, calculating the
+                // score for the term, which will be added to the doc's score
+                doc[~terms].forEach(function(term) {
+                    // Only look at the term if it is part of the query
+                    if (options.queryTerms.indexOf(term[~terms.term]) != -1) {
+                        // The meat of the BM25 ranking function
+                        // (See http://en.wikipedia.org/wiki/Okapi_BM25)
+                        //
+                        // term.w (weight) is equivalent to the term's
+                        // frequency in the document
+                        //
+                        // f(qi, D) * (k1 + 1)
+                        var dividend = term[~terms.weight] * (options.k + 1);
+                        // |D| / avgdl
+                        var relDocSize = doc.length / options.avgDocLength;
+                        // (1 - b + b * |D| / avgdl)
+                        var divisor = 1.0 - options.b + options.b * relDocSize;
+                        // f(qi, D) + k1 * (1 - b + b * |D| / avgdl)
+                        divisor = term[~terms.weight] + divisor * options.k;
+                        // Divide the top half by the bottom half
+                        var termScore = dividend / divisor;
+                        // Then scale by the inverse document frequency
+                        termScore *= options.idfs[term[~terms.term]];
+                        // The document's score is the sum of its terms' scores
+                        score += termScore;
+                    }
+                });
+                results[doc[~doc_id]] = score;
+            });
+            return results;
+        }
+    """
+
+    def __init__(self, document, use_term_index=True):
+        self.document = document
+        # Make index document for the document provided
+        index_meta = {
+            'allow_inheritance': False,
+            'collection': '%sindex' % document._meta['collection'],
+        }
+        if use_term_index:
+            index_meta['indexes'] = ['terms.term']
+
+        class DocumentIndex(Document):
+            doc_id = fields.StringField(primary_key=True)
+            terms = fields.ListField(fields.EmbeddedDocumentField(SearchTerm))
+            length = fields.IntField()
+            meta = index_meta
+
+        self.document_index = DocumentIndex
+        self.fields = {}
+
+    def add_field(self, name, weight=1.0, html=False):
+        self.fields[name] = {'weight': weight, 'html': html}
+
+    def get_queryset(self, document):
+        return document.objects
+
+    def generate_index(self):
+        """Generate the index for the indexed collection. This will remove any
+        existing index, and regenerate everything from scratch.
+        """
+        # Reset the index as we are regenerating it from scratch
+        self.document_index.drop_collection()
+        # Add an index entry for each document
+        for doc in self.get_queryset(self.document):
+            self.add_to_index(doc)
+
+    def add_to_index(self, doc):
+        """Add an individual document to the index.
+        """
+        terms = []
+        for field_name, field_settings in self.fields.items():
+            # Make sure the value is actually a string
+            if isinstance(doc[field_name], basestring):
+                if field_settings['html']:
+                    field_terms = self._prepare_html(doc[field_name])
+                else:
+                    field_terms = self._prepare_text(doc[field_name])
+
+                # Add terms for this field to the document's terms
+                weight = field_settings['weight']
+                for term in field_terms:
+                    terms.append((term, weight))
+
+        doc_len = len(terms)
+
+        terms.sort(key=itemgetter(0))
+        unique_terms = []
+        for term, like_terms in groupby(terms, itemgetter(0)):
+            # Combine the weights of like terms
+            weight = sum(itemgetter(1)(t) for t in like_terms)
+            unique_terms.append(SearchTerm(term=term, weight=weight))
+
+        doc_index = self.document_index(doc_id=unicode(doc.id),
+                                        terms=unique_terms, length=doc_len)
+        doc_index.save()
+
+    def _prepare_html(self, html):
+        """Strips tags, entities, etc, then tokenizes and stems content.
+        """
+        text = lxml.html.fromstring(html).text_content()
+        return self._prepare_text(text)
+
+    def _prepare_text(self, text):
+        """Extracts and stems the words from some given text.
+        """
+        words = re.findall('[a-z0-9\']+', text.lower())
+        words = [word for word in words if word not in STOP_WORDS]
+        stemmer = Stemmer('english')
+        stemmed_words = stemmer.stemWords(words)
+        return stemmed_words
+
+    def search(self, query, html=False):
+        """Search the index using a text query.
+        """
+        # Tokenize query
+        if html:
+            query_terms = self._prepare_html(query)
+        else:
+            query_terms = self._prepare_text(query)
+
+        # Calculate the inverse document frequency for each term
+        idfs = {}
+        num_docs = self.document_index.objects.count()
+        for term in query_terms:
+            term_docs = self.document_index.objects(terms__term=term).count()
+            idfs[term] = log((num_docs - term_docs + 0.5) / (term_docs + 0.5))
+
+        # Get the average document length
+        avg_doc_length = self.document_index.objects.average('length')
+
+        # Only look for documents that actually contain the terms
+        query = self.document_index.objects(terms__term__in=query_terms)
+        options = {
+            'idfs': idfs,
+            'avgDocLength': avg_doc_length,
+            'queryTerms': query_terms,
+            # BM25 variables
+            'k': 2.0,
+            'b': 0.75,
+        }
+        results = query.exec_js(self.SEARCH_JS, 'doc_id', 'terms', **options)
+        return results
+
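Note on mongosearch.py: the per-term arithmetic in `SEARCH_JS` and the IDF computed in `search()` can be reproduced in plain Python, which makes the scoring easier to check by hand. The following is a minimal sketch, not part of this commit. The function names `bm25_idf` and `bm25_term_score` are made up for illustration; the defaults `k=2.0` and `b=0.75` are the values passed in the `options` dict.

```python
from math import log


def bm25_idf(num_docs, term_docs):
    """IDF for one term, using the same expression as SearchIndex.search()."""
    return log((num_docs - term_docs + 0.5) / (term_docs + 0.5))


def bm25_term_score(weight, doc_length, avg_doc_length, idf, k=2.0, b=0.75):
    """Score one query term against one document, mirroring SEARCH_JS.

    weight         -- the stored term weight (plays the role of f(qi, D))
    doc_length     -- number of indexed terms in the document (|D|)
    avg_doc_length -- mean doc_length over the whole index (avgdl)
    """
    # f(qi, D) * (k1 + 1)
    dividend = weight * (k + 1)
    # |D| / avgdl, forced to float so Python 2 does not truncate
    rel_doc_size = doc_length / float(avg_doc_length)
    # f(qi, D) + k1 * (1 - b + b * |D| / avgdl)
    divisor = weight + k * (1.0 - b + b * rel_doc_size)
    return idf * dividend / divisor
```

A document's score is the sum of `bm25_term_score` over the query terms it contains. One property worth knowing: this IDF expression goes negative once a term appears in more than half of the indexed documents, so matching such a term lowers a document's score.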
4 requirements.txt
@@ -0,0 +1,4 @@
+PyStemmer>=1.1.0
+lxml>=2.2.4
+feedparser>=4.1
+mongoengine>=0.3
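Note on indexing: `add_to_index` in mongosearch.py records one entry per distinct term, whose weight is the sum of the field weights of every occurrence, and a document length equal to the total number of indexed terms. A small standard-library sketch of that bookkeeping follows. It is an illustration under stated assumptions, not the commit's code: stemming and HTML stripping are left out so it runs without PyStemmer or lxml, `STOP` is a shortened stand-in for `STOP_WORDS`, a `defaultdict` replaces the sort-and-`groupby` step, and the names `tokenize` and `weighted_terms` are hypothetical.

```python
import re
from collections import defaultdict

# Shortened stand-in for STOP_WORDS in mongosearch.py (assumption)
STOP = set(["a", "an", "and", "is", "of", "the", "to"])


def tokenize(text):
    """Lowercase, split with the same pattern as _prepare_text, drop stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [word for word in words if word not in STOP]


def weighted_terms(field_values, field_weights):
    """Return ({term: summed weight}, doc_length) for one document."""
    weights = defaultdict(float)
    doc_length = 0
    for name, text in field_values.items():
        for term in tokenize(text):
            # Each occurrence contributes the weight of the field it came from
            weights[term] += field_weights[name]
            doc_length += 1
    return dict(weights), doc_length


terms, length = weighted_terms(
    {'title': 'Storage is boring', 'content': 'Storage is the Sherpa layer'},
    {'title': 1.5, 'content': 1.0})
# terms['storage'] is 2.5 (1.5 from the title plus 1.0 from the content)
# length is 5 (stop words are not counted)
```

Because the title weight of 1.5 is added per occurrence, a term in the title counts as one and a half occurrences in the content when the BM25 score is computed.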