<?xml version="1.0" encoding="UTF-8"?>
<!--
;;
;; This software is Copyright (c) 2009 A.F. Haffmans
;;
;; This file is part of cl-bliky.
;;
;; cl-bliky is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;;
;; cl-bliky is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with cl-bliky. If not, see <http://www.gnu.org/licenses/>.
;;
;;
-->
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?>
<?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
<!-- rss version='2.0' xmlns:atom="http://www.w3.org/2005/Atom" -->
<channel>
<title> Mohegan SkunkWorks </title>
<link> http://fons.github.com/ </link>
<description> This web log is mostly concerned with my interest in programming languages, algorithms, machine learning, search and other software endeavors. <BR><BR> </description>
<pubDate> Sat, 26 Jun 2010 20:01:59 EST </pubDate>
<item>
<title> finding things in a mongo database </title>
<link> http://fons.github.com/finding-things-in-a-mongo-database-.html </link>
<description> <p>The generic method <em>db.find</em> provides the basic query interface in cl-mongo. It's defined as:</p>
<pre><code>(defgeneric db.find (collection kv &key) )
</code></pre>
<p><em>db.find</em> returns documents in the collection which satisfy the query pattern
specified by the bson document kv. If you use the keyword <em>:all</em>, db.find will return all documents. If you're looking for all documents with a specific (key value) pair, you'd specify those as the query pattern :</p>
<pre><code> (db.find "foo" (kv (kv "k" 3) (kv "l" 4)))
</code></pre> </description>
<content:encoded><![CDATA[<p>The generic method <em>db.find</em> provides the basic query interface in cl-mongo. It's defined as:</p>
<pre><code>(defgeneric db.find (collection kv &key) )
</code></pre>
<p><em>db.find</em> returns documents in the collection which satisfy the query pattern
specified by the bson document kv. If you use the keyword <em>:all</em>, db.find will return all documents. If you're looking for all documents with a specific (key value) pair, you'd specify those as the query pattern :</p>
<pre><code> (db.find "foo" (kv (kv "k" 3) (kv "l" 4)))
</code></pre><p><em>db.find</em> returns a header and a list of documents. A single call to db.find returns at most 100
documents, or whatever mongodb has set as the limit to return in a single call. To get more documents
you'd need to query the database with the iterator object returned by <em>db.find</em>.
The convenience function <em>iter</em> does exactly that. So, in order to return <em>all</em> documents
in collection "foo" you'd call :</p>
<pre><code> (pp (iter (db.find "foo" :all)))
</code></pre>
<p><em>pp</em> is a pretty printer provided by <em>cl-mongo</em>. Alternatively, you can use the <em>docs</em> function to
convert the results of the query into a list of documents for further processing. Both <em>pp</em> and <em>docs</em>
will properly clean up the iterator object. </p>
<p><em>db.find</em> accepts a number of keywords. The keyword <em>:limit</em> defines the maximum number
of documents returned, with 1 as its default. However, when the query pattern is <em>:all</em> it will in fact
be 0, and hence <em>db.find</em> will return the maximum number of documents allowable by mongodb.
In all other cases, since the default value of <em>:limit</em> is 1, db.find
is the equivalent of <em>findOne</em> in the mongo documentation.</p>
<p>Unsurprisingly, you can specify the number of documents to skip in this query with <em>:skip</em>.
Its default is 0.</p>
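<p>As a sketch (the collection name "foo" and the values 10 and 20 are illustrative), <em>:limit</em> and <em>:skip</em> can be combined to page through a collection :</p>
<pre><code> ;; skip the first 20 documents, then return the next 10
 (pp (db.find "foo" :all :limit 10 :skip 20))
</code></pre>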
<p>The <em>:selector</em> keyword allows you to select which keys to return. For example,
if you just want to return the <em>"_id"</em> field for the objects in foo, you'd specify :</p>
<pre><code> (pp (iter (db.find "foo" :all :selector "_id" )))
</code></pre>
<p>The keyword <em>:mongo</em> allows you to specify a connection other than the default connection.</p>
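<p>For instance (a hypothetical sketch; :alt names a connection previously registered with the <em>mongo</em> generic function), a query can be run over a non-default connection like so :</p>
<pre><code> (db.find "foo" :all :mongo (mongo :name :alt))
</code></pre>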
<p>The keyword <em>:options</em> is used to specify query options and will be covered elsewhere.</p> ]]></content:encoded>
<guid> http://fons.github.com/finding-things-in-a-mongo-database-.html </guid>
<pubDate> Sat, 26 Jun 2010 20:00:53 EST </pubDate>
</item>
<item>
<title> key-value pairs in cl-mongo </title>
<link> http://fons.github.com/key-value-pairs-in-cl-mongo.html </link>
<description> <p>In a key-value database like <a href="">mongodb</a> the fundamental data element is the key-value pair.
Key-value pairs in languages like javascript or python have a separate representation like :</p>
<pre><code>
{ key : value }
</code></pre>
<p>In lisp a natural enough equivalent is the association list, or alternatively the dotted list.
Since the javascript expression also creates a key-value object, I wanted to mimic the same semantics in lisp.
In order to stay as close as possible to the javascript syntax, and to also create a distinct type
I could switch on when dispatching generic function calls, I created two types.</p> </description>
<content:encoded><![CDATA[<p>In a key-value database like <a href="">mongodb</a> the fundamental data element is the key-value pair.
Key-value pairs in languages like javascript or python have a separate representation like :</p>
<pre><code>
{ key : value }
</code></pre>
<p>In lisp a natural enough equivalent is the association list, or alternatively the dotted list.
Since the javascript expression also creates a key-value object, I wanted to mimic the same semantics in lisp.
In order to stay as close as possible to the javascript syntax, and to also create a distinct type
I could switch on when dispatching generic function calls, I created two types.</p><p>The first type is a pair structure :</p>
<pre><code>(defstruct (pair
(:conc-name pair-)
(:constructor pair (key value)))
key value)
</code></pre>
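<p>The constructor and accessors defined by this defstruct can be used like so (the key "k" and value 3 are illustrative) :</p>
<pre><code> (pair-key (pair "k" 3))   ;; returns "k"
 (pair-value (pair "k" 3)) ;; returns 3
</code></pre>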
<p>In addition I created a pair container :</p>
<pre><code> (defclass kv-container ()
((container :initform (make-array 2 :fill-pointer 0 :adjustable t) :accessor container)))
</code></pre>
<p>The interface is the kv generic function defined below :</p>
<pre><code>(defgeneric kv (a &rest rest)
(:documentation " This is a helper function for key-value pairs and sets of key-value pairs.
In a key-value pair like (kv key value) the key has to be a string and the
value something which is serializable.
key-value pairs can be combined using kv as well : (kv (kv key1 val1) (kv key2 val2)).
This combination of key-value pairs is equivalent to a document without a unique id.
The server will assign a unique id if a list of key-value pairs is saved."))
</code></pre>
<p>In addition, the following macro provides an even shorter name :</p>
<pre><code>(defmacro $ (&rest args)
`(kv ,@args))
</code></pre>
<p>The generic function <em>kv</em> or the <em>$</em> macro provides the interface to create sets of key-value pairs.
For example, to find all items with a particular key-value pair, I'd use :</p>
<pre><code>(db.find "foo" (kv "key" "value") :limit 0)
</code></pre>
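<p>Since <em>$</em> simply expands to <em>kv</em>, the same query can be written more tersely :</p>
<pre><code>(db.find "foo" ($ "key" "value") :limit 0)
</code></pre>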
<p>Looking for all elements with a value for "key" larger than 3, I'd use :</p>
<pre><code>(pp (db.find "foo" (kv "key" (kv "$gt" 3))))
</code></pre> ]]></content:encoded>
<guid> http://fons.github.com/key-value-pairs-in-cl-mongo.html </guid>
<pubDate> Wed, 23 Jun 2010 16:20:59 EST </pubDate>
</item>
<item>
<title> literate programming intro to cl-mongo </title>
<link> http://fons.github.com/literate-programming-intro-to-cl-mongo.html </link>
<description> <p>After the jump you'll find the source code of the
<a href="http://www.github.com/fons/cl-mongo">cl-mongo</a>
<a href="http://blip.tv/file/3680363">demo</a>
I gave at the <a href="http://www.10gen.com/event_mongony_10may21">mongonyc</a> event.
The code is in
<a href="http://en.wikipedia.org/wiki/Literate_programming">literate programming</a>
style and available as <a href="http://gist.github.com/445859">gist</a>.</p> </description>
<content:encoded><![CDATA[<p>After the jump you'll find the source code of the
<a href="http://www.github.com/fons/cl-mongo">cl-mongo</a>
<a href="http://blip.tv/file/3680363">demo</a>
I gave at the <a href="http://www.10gen.com/event_mongony_10may21">mongonyc</a> event.
The code is in
<a href="http://en.wikipedia.org/wiki/Literate_programming">literate programming</a>
style and available as <a href="http://gist.github.com/445859">gist</a>.</p><script src="http://gist.github.com/445859.js?file=mongonyc-example.cl"></script> ]]></content:encoded>
<guid> http://fons.github.com/literate-programming-intro-to-cl-mongo.html </guid>
<pubDate> Sun, 20 Jun 2010 10:35:02 EST </pubDate>
</item>
<item>
<title> mongonyc presentation. </title>
<link> http://fons.github.com/mongonyc-presentation.html </link>
<description> On May 21st this year the folks at <a href="http://www.10gen.com/">10gen</a> hosted
<a href="http://www.10gen.com/event_mongony_10may21">mongonyc</a>.
This conference was the first mongodb conference
here in New York. I gave a brief <a href="http://blip.tv/file/3680363">lightning talk</a>
on <a href="http://www.github.com/fons/cl-mongo">cl-mongo</a>. </p>
<p><img src="http://github.com/fons/blog-images/raw/master/20100521/badge-mongonyc-large.png" alt="mongonyc presenter badge" align="right">
The contents of the presentation can be found
<a href="http://github.com/fons/presentations/tree/master/20100521/">here</a>.</p>
<p>It was great to see folks from various start-ups here in the City turn up for this.
I was especially interested in how mongodb was used. </description>
<content:encoded><![CDATA[On May 21st this year the folks at <a href="http://www.10gen.com/">10gen</a> hosted
<a href="http://www.10gen.com/event_mongony_10may21">mongonyc</a>.
This conference was the first mongodb conference
here in New York. I gave a brief <a href="http://blip.tv/file/3680363">lightning talk</a>
on <a href="http://www.github.com/fons/cl-mongo">cl-mongo</a>. </p>
<p><img src="http://github.com/fons/blog-images/raw/master/20100521/badge-mongonyc-large.png" alt="mongonyc presenter badge" align="right">
The contents of the presentation can be found
<a href="http://github.com/fons/presentations/tree/master/20100521/">here</a>.</p>
<p>It was great to see folks from various start-ups here in the City turn up for this.
I was especially interested in how mongodb was used. There's obviously its use at <a href="http://blip.tv/file/3704098">foursquare</a>, but there was also a short talk on a
<a href="http://blip.tv/file/3680611">clever way</a> to use mongodb to monitor distributed log files at a hedge fund.</p> ]]></content:encoded>
<guid> http://fons.github.com/mongonyc-presentation.html </guid>
<pubDate> Fri, 18 Jun 2010 15:31:29 EST </pubDate>
</item>
<item>
<title> cl-mongo now on multiple lisps. </title>
<link> http://fons.github.com/cl-mongo-now-on-multiple-lisps.html </link>
<description> <p><a href="http://github.com/fons/cl-mongo">cl-mongo</a> now runs in the following lisp images : <a href="http://www.sbcl.org/">sbcl</a>, <a href="http://clisp.cons.org/">clisp</a>, <a href="http://www.franz.com/products/allegrocl/">allegro common lisp</a> and <a href="http://trac.clozure.com/ccl">clozure common lisp</a>. I was not able to get it working with <a href="http://common-lisp.net/project/armedbear/">armed bear common lisp</a>. I originally used sbcl to develop cl-mongo, but it is my goal to be able to run it on most lisps. <br> <br> To get <a href="http://github.com/fons/cl-mongo">cl-mongo</a> to run with clisp I needed to make only one fairly minor change in the networking code.<br> <br> When you send an insert to mongodb, it does not respond back. When you send a query request you obviously do get a response back. Since I'm using the same socket connection for writes and reads, I need to have some mechanism in place to wait for a response when one is expected. A blocking read on the socket is out of the question. That would destroy performance. </p> </description>
<content:encoded><![CDATA[<p><a href="http://github.com/fons/cl-mongo">cl-mongo</a> now runs in the following lisp images : <a href="http://www.sbcl.org/">sbcl</a>, <a href="http://clisp.cons.org/">clisp</a>, <a href="http://www.franz.com/products/allegrocl/">allegro common lisp</a> and <a href="http://trac.clozure.com/ccl">clozure common lisp</a>. I was not able to get it working with <a href="http://common-lisp.net/project/armedbear/">armed bear common lisp</a>. I originally used sbcl to develop cl-mongo, but it is my goal to be able to run it on most lisps. <br> <br> To get <a href="http://github.com/fons/cl-mongo">cl-mongo</a> to run with clisp I needed to make only one fairly minor change in the networking code.<br> <br> When you send an insert to mongodb, it does not respond back. When you send a query request you obviously do get a response back. Since I'm using the same socket connection for writes and reads, I need to have some mechanism in place to wait for a response when one is expected. A blocking read on the socket is out of the question. That would destroy performance. </p><p>I'm using <a href="http://common-lisp.net/project/usocket/api-docs.shtml">usocket</a> as my socket api. usocket's api description recommends <a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_listen.htm">listening</a> on the socket stream. This worked fine under sbcl. <br><a href="http://clisp.cons.org/impnotes/non-block-io.html">clisp's implementation of listen</a>, however, does not support this sort of functionality for binary streams. Instead I use <a href="http://clisp.cons.org/impnotes/socket.html#so-status">clisp's socket status</a>.<br> <br> clozure common lisp was not happy with my use of <a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_call_n.htm">call-next-method</a> in cl-mongo's db.find implementation. Rather than rely on clos to string subsequent calls together, I used an explicit call to db.find.
Lastly, clozure has its own implementation of cwd, so I'm no longer exporting this for clozure. I use cwd to return the name of the current database in cl-mongo.<br> <br> Armed bear common lisp doesn't compile the <a href="http://common-lisp.net/project/babel/">babel package</a>. I use this package to generate utf-8 compatible strings, and that's pretty essential as mongodb expects all strings to be utf-8 encoded.<br> <br> Finally, allegro lisp had no problems compiling and running cl-mongo. </p> ]]></content:encoded>
<guid> http://fons.github.com/cl-mongo-now-on-multiple-lisps.html </guid>
<pubDate> Sat, 13 Mar 2010 17:39:02 EST </pubDate>
</item>
<item>
<title> Adding clojure to an existing slime setup in emacs </title>
<link> http://fons.github.com/adding-clojure-to-an-existing-slime-setup-in-emacs.html </link>
<description> <p>The current <a href="http://github.com/technomancy/swank-clojure">recommended setup of emacs and slime with clojure</a> is to have <a href="http://tromey.com/elpa/">elpa</a> handle all the dependencies. As an alternative, you can start a swank server using either the <a href="http://github.com/technomancy/leiningen/tree/master/lein-swank/">swank plug-in</a> for the <a href="http://github.com/technomancy/leiningen">leiningen</a> build tool, or the <a href="http://github.com/talios/clojure-maven-plugin">swank plug-in</a> for the <a href="http://maven.apache.org/">maven</a> build tool. <br> <br> All of this advice is good, but I've been using slime with sbcl and emacs for years and I don't want to start from scratch just to add clojure. In addition, rather than hand things off to a tool like elpa, I'd like to install things myself, so I get to understand how the various pieces work together.<br> <br> I'm going to show how you too can use the <a href="http://common-lisp.net/project/slime/">current cvs head for slime</a>, and the current git repos for <a href="http://github.com/richhickey/clojure">clojure</a>, <a href="http://richhickey.github.com/clojure-contrib/">clojure-contrib</a> and <a href="git://github.com/technomancy/swank-clojure.git">swank clojure</a> to run clojure with slime and lisp. I'll provide a few helpful links to get more information on slime and swank. As it turns out, there's currently a bit of incompatibility between clojure and the slime package, but it's minor and easy to work around. </p> </description>
<content:encoded><![CDATA[<p>The current <a href="http://github.com/technomancy/swank-clojure">recommended setup of emacs and slime with clojure</a> is to have <a href="http://tromey.com/elpa/">elpa</a> handle all the dependencies. As an alternative, you can start a swank server using either the <a href="http://github.com/technomancy/leiningen/tree/master/lein-swank/">swank plug-in</a> for the <a href="http://github.com/technomancy/leiningen">leiningen</a> build tool, or the <a href="http://github.com/talios/clojure-maven-plugin">swank plug-in</a> for the <a href="http://maven.apache.org/">maven</a> build tool. <br> <br> All of this advice is good, but I've been using slime with sbcl and emacs for years and I don't want to start from scratch just to add clojure. In addition, rather than hand things off to a tool like elpa, I'd like to install things myself, so I get to understand how the various pieces work together.<br> <br> I'm going to show how you too can use the <a href="http://common-lisp.net/project/slime/">current cvs head for slime</a>, and the current git repos for <a href="http://github.com/richhickey/clojure">clojure</a>, <a href="http://richhickey.github.com/clojure-contrib/">clojure-contrib</a> and <a href="git://github.com/technomancy/swank-clojure.git">swank clojure</a> to run clojure with slime and lisp. I'll provide a few helpful links to get more information on slime and swank. As it turns out, there's currently a bit of incompatibility between clojure and the slime package, but it's minor and easy to work around. </p><h2>A few words on slime's architecture</h2><p><a href="http://bc.tech.coop/blog/081209.html">Bill Clementson's blog entry on slime</a> brings together quite a few resources on slime. It has this illuminating illustration of <a href="http://common-lisp.net/project/slime/">slime's</a> architecture taken from Tobias Rittweiler's <a href="http://common-lisp.net/~trittweiler/talks/slime-talk-2008.pdf">slime talk</a>.
<img src="http://bc.tech.coop/blog/images/slime-swank.jpg" alt="Slime/Swank Architecture" align="right"> </p><p>As you can see, slime has a client-server architecture. Each lisp needs to implement the swank protocol in order to talk to the emacs client. <a href="git://github.com/technomancy/swank-clojure.git">Swank clojure</a> provides a swank implementation for clojure. Typically you start a 'swank' session of your lisp through emacs. But you can also choose to start up a lisp, load its swank module, start a swank server and connect to it from emacs. That's the method used by the leiningen and maven plugins mentioned earlier. This picture underscores that there are various pieces that work together through a shared protocol. </p><h2>Emacs and Slime</h2><p>I installed <a href="http://common-lisp.net/project/slime/">slime from its cvs repository</a> in ~/Tools/slime. Slime's <em>./contrib</em> sub-directory contains addons to enhance slime's basic functionality. In particular <em>slime-fancy.el</em> groups a large set of these enhancements together so you don't have to initialize each and every one of them. I'll get back to this later on. By the way, <em>./contrib</em> also contains swank implementations for ruby, mit-scheme and kawa. Unfortunately none of these work satisfactorily.<br><br> The <a href="http://common-lisp.net/project/slime/doc/html/">slime manual</a> has a section on how to <a href="http://common-lisp.net/project/slime/doc/html/Multiple-Lisps.html#Multiple-Lisps">set up slime for use with multiple lisps</a>. This involves adding entries like this <br />
</p><pre><code> (name (program program-args ...) &key coding-system init init-function env)
</code></pre><p>to <em>slime-lisp-implementations</em>. <em>name</em> is the symbol used to identify the program. <em>(program program-args ...)</em> is used to start up the lisp in question. The function specified by the keyword :init is used to instruct the lisp to start its swank server. </p><p>An entry in the emacs initialization file to enable multiple lisps with slime might look like this: </p><pre><code> (add-to-list 'load-path "/home/fons/Tools/slime/")
(add-to-list 'load-path "/home/fons/Tools/slime/contrib")
(require 'slime)
(slime-setup '(slime-fancy slime-asdf))
(setq slime-multiprocessing t)
(set-language-environment "UTF-8")
(setq slime-net-coding-system 'utf-8-unix)
(setq slime-lisp-implementations
'((clisp ("/usr/bin/clisp" "-K full"))
(sbcl ("/usr/bin/sbcl"))
(abcl ("~/Tools/bin/abcl"))
(ccl ("~/Tools/bin/ccl"))))
(setf slime-default-lisp 'sbcl)
(global-set-key [f6] 'slime)
</code></pre><p>First slime's directories are added to emacs' <em>load-path</em>. After loading slime and adding a bunch of useful add-ons by loading slime-fancy, a few lisps are added to <em>slime-lisp-implementations</em>. Some of them run from standard linux locations. Others are started with a shell script. <em>slime-default-lisp</em> is set to sbcl, so that's the default lisp that's going to be used when I just do <code>M-x slime</code> (or hit f6 since I've bound this command to this key). </p><p>The other lisps are started by <a href="http://common-lisp.net/project/slime/doc/html/Multiple-Lisps.html#Multiple-Lisps">invoking slime with a negative prefix argument</a>, <code>M-- M-x slime</code> and selecting the name of the lisp in the <em>slime-lisp-implementations</em> list. </p><p>If you've never really worked with slime it might be a good idea to set up at least a few lisps just to test out your setup and get you in the right mood for the additional hacking which needs to be done.<br />
</p><h2>swank-clojure</h2><p>I cloned the <a href="git://github.com/technomancy/swank-clojure.git">swank clojure</a> git repository and built it. I put all the clojure-specific components in a directory called "~/Tools/swank-clojure-enablers". The swank-clojure-autoload file is obviously not part of the git repository and needs to be generated. After adding this </p><pre><code>(defun swank-clojure-autoloads nil
(interactive)
(let ((generated-autoload-file "~/Tools/swank-clojure-enablers/swank-clojure-autoloads.el"))
(update-directory-autoloads "~/Tools/swank-clojure-enablers"))) </code></pre><p>to my init file, <code>M-x swank-clojure-autoloads</code> can be used to generate swank-clojure's <a href="http://www.gnu.org/software/emacs/elisp/html_node/Autoload.html">autoload</a> file. As you can see in the source file for swank-clojure.el, <a href="http://www.gnu.org/s/emacs/manual/html_node/elisp/Defining-Advice.html#Defining-Advice">defadvice</a> is used to add clojure to <em>slime-lisp-implementations</em> : </p><pre><code>(defadvice slime-read-interactive-args (before add-clojure)
;; Unfortunately we need to construct our Clojure-launching command
;; at slime-launch time to reflect changes in the classpath. Slime
;; has no mechanism to support this, so we must resort to advice.
(require 'assoc)
(aput 'slime-lisp-implementations 'clojure
(list (swank-clojure-cmd) :init 'swank-clojure-init)))
</code></pre><p>What's basically happening here is that the keyword 'clojure is added to <em>slime-lisp-implementations</em>. The function <em>swank-clojure-cmd</em> starts up clojure and the function <em>swank-clojure-init</em> instructs clojure to start up a swank server. </p><p>All the heavy lifting as far as customizing your clojure environment is done in the <em>swank-clojure-init</em> function. It basically sets up the class path, and <em>swank-clojure.el</em> provides hooks to customize the start up phase. </p><h2>Setting the class-path through swank-clojure-classpath.</h2><p>The first time clojure is started it checks to see if the jar files are installed so that clojure can be run with swank. If it doesn't 'see' the jars installed it will download a precompiled version of <em>clojure.jar</em>, <em>clojure-contrib.jar</em> and <em>clojure-swank.jar</em> to <em>~/.swank-clojure</em>. <br> <br> My preference is to use the latest and greatest versions of the various libraries. I also don't like to have jar files put in locations where I can't really see them. I'd rather know explicitly what my dependencies are. Luckily the class-path can be customized through the <em>swank-clojure-classpath</em> variable. After I put some links in <em>~/Tools/swank-clojure-enablers/</em> to the required jar files, I added this to my emacs initialization file : </p><pre><code> (setq swank-clojure-classpath (directory-files "~/Tools/swank-clojure-enablers" t ".jar$"))
</code></pre><p>This highlights an interesting difference with the other lisps. I use asdf to load libraries into an already running lisp image. I don't need to set asdf's paths before starting the lisp repl; in fact I can change the asdf search path after the repl has started. </p><p>With clojure you're obviously a bit constrained in that you can't change the class path or load a library into the jvm at run time. <br> <br> <em>swank-clojure</em> provides <em>swank-clojure-project</em> as an alternative way to load your class dependencies at startup. It will kill your clojure repl, set the class path for your project and restart the clojure swank server. </p><h2>Dealing with the hanging repl</h2><p>If you start up clojure at this stage and enter something like <code>(+ 1 2)</code> at the cursor you'll notice that the repl "hangs". </p><p>Swank-clojure cannot handle some recent changes introduced in one of the slime addons. The details can be found<br />
<a href="http://groups.google.com/group/swank-clojure/browse_thread/thread/6736c851f23d81d8#">in the swank-clojure group</a> and on the <a href="http://thread.gmane.org/gmane.lisp.slime.devel/9178">slime mailing list</a>. The bottom line seems to be that the clojure reader doesn't handle the same collection of characters as other lisp readers. </p><p>Luckily this is limited to the <em>autodoc</em> component in <em>.../slime/contrib</em>. <em>autodoc</em> provides fancy markup in the echo area for the arguments of lisp functions. </p><p>If you can live without that, you can unload the autodoc component after slime-fancy is loaded. Alternatively you can just use the slime-repl package when you initialize slime. </p><h2>To summarize</h2><p>This then is more or less my setup : </p><pre><code> (add-to-list 'load-path "/home/fons/Tools/swank-clojure-enablers")
(add-to-list 'load-path "/home/fons/Tools/slime/")
(add-to-list 'load-path "/home/fons/Tools/slime/contrib")
(require 'swank-clojure-autoloads)
(require 'slime)
(slime-setup '(slime-fancy slime-asdf))
(unload-feature 'slime-autodoc t)
(setq slime-multiprocessing t)
(set-language-environment "UTF-8")
(setq slime-net-coding-system 'utf-8-unix)
(setq slime-lisp-implementations
'((clisp ("/usr/bin/clisp" "-K full"))
(sbcl ("/usr/bin/sbcl"))
(abcl ("~/Tools/bin/abcl"))
(ccl ("~/Tools/bin/ccl"))))
(setq swank-clojure-classpath (directory-files "~/Tools/swank-clojure-enablers" t ".jar$"))
(setf slime-default-lisp 'sbcl)
(global-set-key [f6] 'slime)
(defun swank-clojure-autoloads nil
(interactive)
(let ((generated-autoload-file "~/Tools/swank-clojure-enablers/swank-clojure-autoloads.el"))
(update-directory-autoloads "~/Tools/swank-clojure-enablers")))
</code></pre><p>Notice that I unload <em>autodoc</em> after slime is loaded. </p> ]]></content:encoded>
<guid> http://fons.github.com/adding-clojure-to-an-existing-slime-setup-in-emacs.html </guid>
<pubDate> Fri, 12 Mar 2010 15:03:57 EST </pubDate>
</item>
<item>
<title> Connections in CL-MONGO </title>
<link> http://fons.github.com/connections-in-cl-mongo.html </link>
<description> <p>I revamped the way connections to a mongo database are handled in cl-mongo. In the new implementation each connection is referenced by a unique name. Each connection is stored in a connection registry. Database calls in cl-mongo default to using the :default connection. The connection parameters used by the default connection for host, port and db are accessible through the <em>mongo-default-..</em> special variables. <br> <br><code>defgeneric mongo (&key host port db name)</code> gets the connection referred to by the :name keyword. The default for :name is :default. If no connection with that name exists, a new connection will be created.<br> <br> <code>defun mongo-show ()</code> will show all connections currently in the registry. <br> <br><code>defgeneric mongo-swap (left right)</code> will swap two connections. This is useful if you want to use a different <em>default</em> connection, or want to change the parameters on an existing named connection. <br> <br><code>defgeneric mongo-close ( name )</code> is used to close a connection. The special keyword :all can be used to close all connections in the registry. <br> <br> </p> </description>
<content:encoded><![CDATA[<p>I revamped the way connections to a mongo database are handled in cl-mongo. In the new implementation each connection is referenced by a unique name. Each connection is stored in a connection registry. Database calls in cl-mongo default to using the :default connection. The connection parameters used by the default connection for host, port and db are accessible through the <em>mongo-default-..</em> special variables. <br> <br><code>defgeneric mongo (&key host port db name)</code> gets the connection referred to by the :name keyword. The default for :name is :default. If no connection with that name exists, a new connection will be created.<br> <br> <code>defun mongo-show ()</code> will show all connections currently in the registry. <br> <br><code>defgeneric mongo-swap (left right)</code> will swap two connections. This is useful if you want to use a different <em>default</em> connection, or want to change the parameters on an existing named connection. <br> <br><code>defgeneric mongo-close ( name )</code> is used to close a connection. The special keyword :all can be used to close all connections in the registry. <br> <br> </p><h2>Opening connections</h2><p><br><br> The generic function <em>mongo</em> is used to create a connection. <em>mongo</em> takes four keywords as an argument. <em>:name</em> is the <em>name</em> of the connection and the other three parameter keywords <em>:host :port</em> and <em>:db</em> are used to establish a connection to a mongodb instance. Each of the four keywords has a default value which will be used if none is provided by the caller. <br> <br> The first thing <em>mongo</em> does is see whether a connection with <em>name</em> exists in the <em>connection registry</em>. If not, it will create one with the parameters provided through the <em>:host :port</em> and <em>:db</em> keywords. 
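</p><p>As a quick sketch of the above (the port and database name below are only illustrative values, not taken from this post): </p><pre><code>;; create (or look up) a connection named :alt
(mongo :name :alt :host "localhost" :port 27017 :db "test")
;; a later call with the same name returns that same connection;
;; the other keywords are then ignored
(mongo :name :alt)</code></pre><p>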
The <em>connection registry</em> is a hash table with the name as key and the mongo connection object as the value.<br> <br> The default for <em>:name</em> is the keyword <em>:default</em>. The default connection parameters can be set through the dynamic variables *mongo-default-host*, *mongo-default-port* and *mongo-default-db*. They are currently set to "localhost", the default mongo port and "admin" respectively. So the first time <code>(mongo)</code> is executed a connection using default values for the connection parameters is created. <br> <br> Unless a mongo connection is provided through the :mongo keyword, database operations like <em>db.use</em> or <em>db.find</em> all execute (mongo) to get a connection. So a call to these functions creates a default connection as a side-effect.<br> <br> There is no restriction on the <em>type</em> of the :name parameter. Typically I would expect a symbol or a unique id to be used. Strings can be used as well. Each connection is assigned a unique id. This id, in contrast to the name, cannot be changed. <br> <br> It's important to realize that if a connection name exists in the registry the associated connection object is returned and the other keywords are ignored. So if a connection named <em>:alt</em> exists in the *mongo-registry*, the :port keyword in <code>(mongo :name :alt :port 12388 )</code> is ignored. In addition, the network parameters of the connection object returned by *(mongo :name &lt;name&gt; )* are read-only. The reason they're read-only is that changing these parameters doesn't really change the actual connection. I also wanted a <em>mongo</em> call to be fast. Using the <em>(mongo..)</em> method to manage connections would impact the speed of the look-up. <em>(mongo-swap..)</em> is the method to use when you want to change connection parameters. </p><h2>Swapping connections</h2><p>Using mongo over the LISP REPL is different from, say, a java-script client. 
A java-script client is typically tied to a specific mongo connection. Connections to multiple servers can be established using the REPL by calling <em>(mongo..)</em> with different connection parameters. However, there's only one <em>:default</em> connection. The <em>:default</em> connection is privileged in that it is used "by default" in all other mongo database calls. You could pass in an alternative connection using the <em>:mongo</em> keyword, but that is obviously a bit cumbersome. The alternative is to swap connections and tie a different connection to the <em>:default</em> keyword.<br> <br><code>(mongo-swap left right)</code> provides this functionality. <em>(mongo-swap left right)</em> swaps the names of the connections and returns them in the order they were passed in. So if you want to make your <em>:alt</em> connection the default connection, you call <code>(mongo-swap :default :alt)</code> and <em>:alt</em> is now referred to by the keyword <em>:default</em>, while the connection previously referred to by keyword <em>:default</em> is now referred to by the keyword <em>:alt</em>.<br> <br> Another use of <em>(mongo-swap..)</em> is to change the parameters of a particular connection. Suppose you want to have the connection bound to :alt refer to a different host. You can do this : (mongo-close (mongo-swap :alt (mongo :name :tmp :host &lt;newhost&gt; ))) </p><p>This will create a mongo connection named :tmp which is swapped with the current :alt connection, and the old :alt connection object (now bound to :tmp) is closed. </p><h2>Inspecting connections</h2><p>There are two ways to take a look at the <em>connection registry</em> : </p><pre><code> (mongo-show)
(show :connections) </code></pre><p>The second command is basically wrapped around the first one. <em>show</em> is a general <em>shell</em> command which takes a variety of keywords and returns database data like the server status. </p><h2>Closing connections</h2><p>Use <code>(mongo-close &lt;name&gt; )</code> to close a connection named &lt;name&gt;. If you pass in the keyword <em>:all</em>, all connections will be closed. <em>(mongo-close..)</em> will also remove the connection from the registry. </p> ]]></content:encoded>
<guid> http://fons.github.com/connections-in-cl-mongo.html </guid>
<pubDate> Sun, 28 Feb 2010 19:18:52 EST </pubDate>
</item>
<item>
<title> Generating Documentation for cl-mongo </title>
<link> http://fons.github.com/generating-documentation-for-cl-mongo-.html </link>
<description> <p>I want to be able to keep the documentation of <a href="http://www.github.com/fons/cl-mongo">cl-mongo</a> current when I am adding new features to the package. <br> <br> The original documentation was generated 'by hand'. I went through the code and made sure I documented the exported classes, methods and functions. This is not really satisfactory. Ideally you want the documentation and the code tightly integrated. Javadoc style documentation, which can be generated from source code annotations, would be ideal. <br> <br> There are various lisp packages available which will pull out the source code comments for the public interface defined in the packages file.<br> One such package is Edi Weitz's <a href="http://weitz.de/documentation-template/">documentation template</a>. This package generates an HTML file with the API description based on embedded comments, which is exactly what I was looking for. <br> The package does have its quirks/features. It doesn't seem to like embedded HTML or markdown formatting in the lisp code comments, so the API descriptions appear somewhat 'flat'. <br> In addition I can't define an order on the way the API components are presented and consequently things jump around a bit.<br> It also hard-codes licensing information and URLs which are not appropriate for me. The way I dealt with this was to take the generated HTML file and to search and replace the licensing information and URLs. <br> <br> The next challenge was to integrate the HTML file generated by <em>documentation-template</em> with the <em>README.md</em>. This I accomplished by stripping out the HTML header and appending the resulting HTML file to a <em>markdown</em> formatted file.<br> <br> The result doesn't look bad and more importantly gives me a way to easily keep my documentation up-to-date. </p> </description>
<content:encoded><![CDATA[<p>I want to be able to keep the documentation of <a href="http://www.github.com/fons/cl-mongo">cl-mongo</a> current when I am adding new features to the package. <br> <br> The original documentation was generated 'by hand'. I went through the code and made sure I documented the exported classes, methods and functions. This is not really satisfactory. Ideally you want the documentation and the code tightly integrated. Javadoc style documentation, which can be generated from source code annotations, would be ideal. <br> <br> There are various lisp packages available which will pull out the source code comments for the public interface defined in the packages file.<br> One such package is Edi Weitz's <a href="http://weitz.de/documentation-template/">documentation template</a>. This package generates an HTML file with the API description based on embedded comments, which is exactly what I was looking for. <br> The package does have its quirks/features. It doesn't seem to like embedded HTML or markdown formatting in the lisp code comments, so the API descriptions appear somewhat 'flat'. <br> In addition I can't define an order on the way the API components are presented and consequently things jump around a bit.<br> It also hard-codes licensing information and URLs which are not appropriate for me. The way I dealt with this was to take the generated HTML file and to search and replace the licensing information and URLs. <br> <br> The next challenge was to integrate the HTML file generated by <em>documentation-template</em> with the <em>README.md</em>. This I accomplished by stripping out the HTML header and appending the resulting HTML file to a <em>markdown</em> formatted file.<br> <br> The result doesn't look bad and more importantly gives me a way to easily keep my documentation up-to-date. </p> ]]></content:encoded>
<guid> http://fons.github.com/generating-documentation-for-cl-mongo-.html </guid>
<pubDate> Wed, 17 Feb 2010 19:04:03 EST </pubDate>
</item>
<item>
<title> submitting my blog to technorati.com </title>
<link> http://fons.github.com/submitting-my-blog-to-technoraticom.html </link>
<description> <p>The purpose of this post is to claim my blog on <a href="http://www.technorati.com">technorati</a>. For that to work I need to include the claim code they provided to me. This is the automatically-generated email: </p><pre><code>Thank you for submitting your blog claim on Technorati. Technorati will need to verify that you
are an author of the site http://www.mohegan-skunkworks.com/ by looking for a unique code.
We have just assigned the claim token 2624RGHKBMNQ to this claim.
Please visit http://technorati.com/account/ for more details, including how to use the claim token.
Thank you.
</code></pre><p>Let's see if this works. </p> </description>
<content:encoded><![CDATA[<p>The purpose of this post is to claim my blog on <a href="http://www.technorati.com">technorati</a>. For that to work I need to include the claim code they provided to me. This is the automatically-generated email: </p><pre><code>Thank you for submitting your blog claim on Technorati. Technorati will need to verify that you
are an author of the site http://www.mohegan-skunkworks.com/ by looking for a unique code.
We have just assigned the claim token 2624RGHKBMNQ to this claim.
Please visit http://technorati.com/account/ for more details, including how to use the claim token.
Thank you.
</code></pre><p>Let's see if this works. </p> ]]></content:encoded>
<guid> http://fons.github.com/submitting-my-blog-to-technoraticom.html </guid>
<pubDate> Sat, 13 Feb 2010 10:29:17 EST </pubDate>
</item>
<item>
<title> cl-mongo </title>
<link> http://fons.github.com/cl-mongo.html </link>
<description> <P><A HREF="http://www.mongodb.org">mongo</A> is a scalable, high-performance, open source, schema-free, document-oriented database. I was introduced to mongo at the <A HREF="http://www.meetup.com/mysqlnyc/messages/8346727/">new-york mysql meetup</A>. Two things made mongo look attractive: inter-operability and document centric storage. <BR> <BR> I'm familiar with the <A HREF="http://common-lisp.net/project/elephant/">elephant</A> persistence framework in lisp. However elephant objects are not readable (as far as I know) in languages other than lisp. That makes inter-operating with other platforms difficult. A traditional rdbms requires some sort of schema if you want to use it effectively. Mongo on the other hand is optimized for the kind of free form document storage I'm looking for. <BR> <BR> Mongo comes with a set of drivers, but a lisp driver was missing. This looked like a good project to get better acquainted with lisp and mongo.<BR> <BR> So I set out to try to write one and the result is <A HREF="http://github.com/fons/cl-mongo">cl-mongo</A>. At this stage it's close to having the capabilities I'm looking for, but <A HREF="http://github.com/fons/cl-mongo">cl-mongo</A> is obviously a work in progress. </P> </description>
<content:encoded><![CDATA[<P><A HREF="http://www.mongodb.org">mongo</A> is a scalable, high-performance, open source, schema-free, document-oriented database. I was introduced to mongo at the <A HREF="http://www.meetup.com/mysqlnyc/messages/8346727/">new-york mysql meetup</A>. Two things made mongo look attractive: inter-operability and document centric storage. <BR> <BR> I'm familiar with the <A HREF="http://common-lisp.net/project/elephant/">elephant</A> persistence framework in lisp. However elephant objects are not readable (as far as I know) in languages other than lisp. That makes inter-operating with other platforms difficult. A traditional rdbms requires some sort of schema if you want to use it effectively. Mongo on the other hand is optimized for the kind of free form document storage I'm looking for. <BR> <BR> Mongo comes with a set of drivers, but a lisp driver was missing. This looked like a good project to get better acquainted with lisp and mongo.<BR> <BR> So I set out to try to write one and the result is <A HREF="http://github.com/fons/cl-mongo">cl-mongo</A>. At this stage it's close to having the capabilities I'm looking for, but <A HREF="http://github.com/fons/cl-mongo">cl-mongo</A> is obviously a work in progress. </P><P><BR>Mongo stores documents in a collection in a database. Internally, as far as I can tell from the protocol, the combination of database and collection name makes up a unique namespace on the server in which documents can be stored.<BR> <BR> Each document itself is a set of key-value pairs, with the keys typed as utf8 encoded strings. The value side supports a variety of types, from the usual primitives (float, int, Boolean) up to regular expressions and code. <BR> <BR> A document has a unique id, associated with the reserved key word "_id". If this id is not client generated the server will provide one. The server supports a whole host of database commands. 
These are also structured as sets of key-value pairs, but such commands and their response aren't necessarily documents. <BR> <BR> One of the main design ideas is to recognize different layers at which the communication operates: </P><UL><LI>layer 0 : byte-level at which applications exchange data.</LI><LI>layer 1 : mongo serialization protocol and mongo specific types.</LI><LI>layer 2 : native code types.</LI><LI>layer 3 : mongo container types and operations on them.</LI><LI>layer 4 : syntactic sugar.</LI></UL><P>An example of a mongo specific type is mongo's representation of an array as a collection of key value pairs with the keys being stringified indexes like so:<BR>
<CODE> { '0' : elem1 } , { '1' : elem2 } </CODE> An obvious choice for its native code type counterpart is the list. <BR> <BR> Since mongo documents are collections of key-value pairs, an associative array like a hash-table serves quite well as a mongo container type. In addition to the hash-table there is a mongo document, which is just a hash table with a unique id attached. <BR> <BR> As far as the basic api was concerned I wanted to stay as close as possible to the db shell for the java-script client, as detailed in the mongo documentation. That way it's possible to use the commands from the mongo reference manual when using cl-mongo from the repl. </P> ]]></content:encoded>
<guid> http://fons.github.com/cl-mongo.html </guid>
<pubDate> Tue, 02 Feb 2010 18:59:51 EST </pubDate>
</item>
<item>
<title> Fourth NYSA Machine Learning Seminar </title>
<link> http://fons.github.com/fourth-nysa-machine-learning-seminar.html </link>
<description> <P>Friday I attended the 4<SUP> th</SUP> Machine Learning Symposium organized by the New York Academy of Sciences <A HREF="http://www.nyas.org/">(NYSA)</A>. <BR> <BR> The <A HREF="http://www.nyas.org/events/Detail.aspx?cid=533f8dfe-d778-4c52-ba1b-3241bc9c8ca2">Symposium program</A> consisted of four main talks given by local experts in the area of machine learning, interspersed with four graduate student talks, a poster session and a terrific lunch. <BR> <BR> Since I'm not really hindered by any overwhelming expertise in this area I'll confine myself to a few breezy impressions of the main talks. <BR> <BR> The first one was given by Bob Bell, from AT&T Bell Labs and a member of the team which won the <A HREF="http://www.netflixprize.com/">Netflix prize.</A><BR> <BR> </P><P>What made the contest challenging was not only the huge size of the data set but also the fact that 99 % of the data was missing. In addition there were significant differences between training and test data. Regardless of whether a 10 % improvement of a movie rating system should be worthy of a million dollar prize, it provided a great way to test classifiers against real world data.<BR> <BR> One thing that stood out for me was that a relatively small number of users was responsible for 'most' of the ratings. He mentioned that they identified one user responsible for 5400 ratings on one particular day. This <EM>could</EM> be a data error on the Netflix side, where the time stamp was somehow misapplied. On the other hand it sounds like someone was trying to deliberately affect a large swath of ratings. <BR> <BR> The final classifier incorporated breakthroughs made by different teams in the earlier stages of this multi-year competition. One such breakthrough was to consider the previous genres of the movies someone has rated to determine future recommendations. That must seem rather obvious in retrospect. 
The other was a clever method called <A HREF="http://portal.acm.org/citation.cfm?id=1557072">Collaborative Filtering</A> which takes into account the time-dependency of people's movie preferences. <BR> <BR> An ensemble of previously validated classifiers was used to construct the final classifier, and the calculation to get the final result submitted to Netflix took almost a month, primarily because a power failure forced a restart of the calculation engine. In fact the use of an ensemble of classifiers was mentioned as one of the main lessons learned from the contest. The other was the power of matrix factorization (i.e. treating users and preferences as independent parameters and using a matrix to link the two) as a computational tool. </P> </description>
<content:encoded><![CDATA[<P>Friday I attended the 4<SUP> th</SUP> Machine Learning Symposium organized by the New York Academy of Sciences <A HREF="http://www.nyas.org/">(NYSA)</A>. <BR> <BR> The <A HREF="http://www.nyas.org/events/Detail.aspx?cid=533f8dfe-d778-4c52-ba1b-3241bc9c8ca2">Symposium program</A> consisted of four main talks given by local experts in the area of machine learning, interspersed with four graduate student talks, a poster session and a terrific lunch. <BR> <BR> Since I'm not really hindered by any overwhelming expertise in this area I'll confine myself to a few breezy impressions of the main talks. <BR> <BR> The first one was given by Bob Bell, from AT&T Bell Labs and a member of the team which won the <A HREF="http://www.netflixprize.com/">Netflix prize.</A><BR> <BR> </P><P>What made the contest challenging was not only the huge size of the data set but also the fact that 99 % of the data was missing. In addition there were significant differences between training and test data. Regardless of whether a 10 % improvement of a movie rating system should be worthy of a million dollar prize, it provided a great way to test classifiers against real world data.<BR> <BR> One thing that stood out for me was that a relatively small number of users was responsible for 'most' of the ratings. He mentioned that they identified one user responsible for 5400 ratings on one particular day. This <EM>could</EM> be a data error on the Netflix side, where the time stamp was somehow misapplied. On the other hand it sounds like someone was trying to deliberately affect a large swath of ratings. <BR> <BR> The final classifier incorporated breakthroughs made by different teams in the earlier stages of this multi-year competition. One such breakthrough was to consider the previous genres of the movies someone has rated to determine future recommendations. That must seem rather obvious in retrospect. 
The other was a clever method called <A HREF="http://portal.acm.org/citation.cfm?id=1557072">Collaborative Filtering</A> which takes into account the time-dependency of people's movie preferences. <BR> <BR> An ensemble of previously validated classifiers was used to construct the final classifier, and the calculation to get the final result submitted to Netflix took almost a month, primarily because a power failure forced a restart of the calculation engine. In fact the use of an ensemble of classifiers was mentioned as one of the main lessons learned from the contest. The other was the power of matrix factorization (i.e. treating users and preferences as independent parameters and using a matrix to link the two) as a computational tool. </P><P><BR><BR>Avrim Blum, from Carnegie Mellon, followed with a discussion of a new clustering method he has discovered. Unsupervised learning is obviously an important area of machine learning. We don't always have the benefit of fully analyzed training data. It would be a real breakthrough to have data 'speak for itself' in a meaningful way, without a priori constraints. Humans (or perhaps all animals in general) are good at classifying data across large conceptual categories. We obviously have no problem taking a pile of articles and splitting them into different piles based just on a few keywords.<BR><BR>One approach to detecting clusters using a machine would be to find a point which is at a minimum distance from a set of data points. Clearly there would be many such points and each would correspond to the center of gravity of a cluster. Computationally this is a very hard problem (I think the speaker mentioned it was in fact <A HREF="http://en.wikipedia.org/wiki/NP-hard">NP-hard</A>).<BR><BR> <BR>
From what I understood (which is always a good qualification), Blum's work states that large data sets can always be broken down into clusters by considering a characteristic distance d<SUB> crit</SUB> , such that all points within d<SUB> crit</SUB> will belong to the same cluster and points belonging to another cluster will be around 5 × d<SUB> crit</SUB> away. This would partition the data set into (say) two clusters. You can break these down also, using the same algorithm (but obviously with different values of d<SUB> crit</SUB> ). This way you end up with a hierarchy of clusters. <BR> <BR> Ok, all done, right? Just tell me how to find d<SUB>crit</SUB> and I'm on my way to constructing the giant cluster break-down of the universe. Unfortunately, Prof. Blum was silent on the actual construction of d<SUB>crit</SUB>. So I think his result is that if we <EM>have</EM> evidence of a characteristic measure along the lines described above, we in fact have a robust partitioning of the data space. That's not an insignificant result. In addition I think it shouldn't be too difficult to take his work and apply it to a large corpus of text and try to establish d<SUB>crit</SUB> for it. <BR><BR><BR>
After lunch and the poster session, <A HREF="http://svmlight.joachims.org">Thorsten Joachims</A> from Cornell continued with a talk on the application of Support Vector Machines (SVM's) for predicting structured outputs. SVM's are a form of supervised learning where a characteristic data set is used to 'train' a classifier. A simple classifier would label data in only one of two ways. In this talk, SVM's are used to classify data across multiple categories. Using such an approach the results of ambiguous search terms for search engines (like SVM or Windows) could be grouped into categories of similar results. The breakthrough here, I believe, is a way to handle the computational complexity inherent in the use of SVM's for multi-classification. In addition the algorithm is structured such that only domain specific pieces need to be plugged in. <BR> <BR> The last talk of the day was given by Phil Long from Google, on "On noise-tolerant learning using Linear classifiers". The speaker's uncompromising mathematical rigor obscured somewhat the obvious practical implications of his research, at least for me. On the other hand, the applause at the end of his talk appeared to me to be almost as boisterous as it had been sedate for previous speakers. This leaves me with the impression that at least the rest of the audience had thoroughly enjoyed his talk.<BR> <BR> What I was able to rescue from my notes and memory was that noise in data can be identified by assuming that the distribution of noisy data points is in fact either not random, or does not follow the same distribution as the real data (so-called malicious noise). In addition boosting schemes like AdaBoost, which focus on misclassified data, can be very sensitive to noise. <BR> <BR> All in all not a bad way to spend time away from the office while watching Yankee fans lay siege to downtown Manhattan. </P> ]]></content:encoded>
<guid> http://fons.github.com/fourth-nysa-machine-learning-seminar.html </guid>
<pubDate> Sat, 07 Nov 2009 09:49:51 EST </pubDate>
</item>
<item>
<title> Embedding Equations in a Blog Post </title>
<link> http://fons.github.com/embedding-equations-in-a-blog-post.html </link>
<description> <P>I'm using a <A HREF="http://github.com/fons/cl-bliky">home-grown blogging engine</A> which converts pages formatted in <A HREF="http://daringfireball.net/projects/markdown/">markdown</A> to static html pages served from my <A HREF="http://github.com/fons/fons.github.com">github account.</A> If I want to include mathematical equations in my blog post my options are to use inline html code or to use one of the online <A HREF="http://www.latex-project.org/">Latex</A><A HREF="http://www.google.com/search?q=online+latex+equation+editor"> equation editors.</A><BR> <BR> My requirements are simple: I want to be able to use the usual cast of mathematical symbols inlined in my main text as well as format large equation blocks. In this post I'll compare and contrast inline html with the online editors provided by <A HREF="http://www.sitmo.com/latex/">SITMO</A>, <A HREF="http://www.codecogs.com/components/equationeditor/equationeditor.php">CodeCogs</A> and <A HREF="http://www.texify.com/">Texify</A>. <BR> <BR> To save you the trouble of having to wade through miles of text, I'll start off with my </P><H2>Conclusions</H2><P>Use HTML for inlining symbols and equations. Although those will never look as good in html as in Latex, the overall format of your text will suffer less.<BR> <BR> For equation blocks you have the choice between two online editors : <A HREF="http://www.sitmo.com/latex/">SITMO</A> and <A HREF="http://www.codecogs.com/components/equationeditor/equationeditor.php">CodeCogs</A>. When it comes to embedding latex code into your html both editors are comparable. When you're looking for more variety with regard to fonts or markup languages CodeCogs is your only alternative.<BR> <BR> If you want to use the url link to embed a large code block you probably would want to use a url shortener. 
<A HREF="http://tinyurl.com/">Tiny url</A> is a good option here. As an alternative both editors can generate a png image for you to embed.<BR> <BR><A HREF="http://www.texify.com/links.php">Texify</A> has a nice clean interface, but I can't embed the links it generates. My only alternative would appear to be to run my own instance of this service, which is obviously not something I need or want to do. Furthermore, although it cleverly provides a shortened url, it uses <A HREF="http://bit.ly/">bit.ly</A> which unfortunately doesn't handle complex latex urls well. <BR> <BR> Read on to find out how I reached these conclusions. </P> </description>
<content:encoded><![CDATA[<P>I'm using a <A HREF="http://github.com/fons/cl-bliky">home-grown blogging engine</A> which converts pages formatted in <A HREF="http://daringfireball.net/projects/markdown/">markdown</A> to static html pages served from my <A HREF="http://github.com/fons/fons.github.com">github account.</A> If I want to include mathematical equations in my blog post my options are to use inline html code or to use one of the online <A HREF="http://www.latex-project.org/">Latex</A><A HREF="http://www.google.com/search?q=online+latex+equation+editor"> equation editors.</A><BR> <BR> My requirements are simple: I want to be able to use the usual cast of mathematical symbols inlined in my main text as well as format large equation blocks. In this post I'll compare and contrast inline html with the online editors provided by <A HREF="http://www.sitmo.com/latex/">SITMO</A>, <A HREF="http://www.codecogs.com/components/equationeditor/equationeditor.php">CodeCogs</A> and <A HREF="http://www.texify.com/">Texify</A>. <BR> <BR> To save you the trouble of having to wade through miles of text, I'll start off with my </P><H2>Conclusions</H2><P>Use HTML for inlining symbols and equations. Although those will never look as good in html as in Latex, the overall format of your text will suffer less.<BR> <BR> For equation blocks you have the choice between two online editors : <A HREF="http://www.sitmo.com/latex/">SITMO</A> and <A HREF="http://www.codecogs.com/components/equationeditor/equationeditor.php">CodeCogs</A>. When it comes to embedding latex code into your html both editors are comparable. When you're looking for more variety with regard to fonts or markup languages CodeCogs is your only alternative.<BR> <BR> If you want to use the url link to embed a large code block you probably would want to use a url shortener. 
<A HREF="http://tinyurl.com">Tiny url</A> is a good option here. As an alternative both editors can generate a png image for you to embed.<BR> <BR><A HREF="http://www.texify.com/links.php">Textify</A> has a nice clean interface, but I can't embed the links it generates. My only alternative would appear to be to run my own instance of this service, which is obviously not something I need or want to do. Furthermore, although it cleverly provides a shortened url, it uses <A HREF="http://bit.ly">bit.ly</A> which unfortunately doesn't handle complex latex urls well. <BR> <BR> Read on to find out how I reached these conclusions. </P><H2>HTML</H2><P>HTML supports <A HREF="http://www.chami.com/tips/internet/050798i.html">special characters</A> as well as <A HREF="http://cat.xula.edu/tutorials/html/subandsup">subscripts and superscripts.</A> Let's give it a shot, starting with embedded equations. </P><H4>Embedding</H4><P>Here's an example of the use of HTML to embed equations and symbols. I'll be using the same example throughout.<BR> <BR> </P><P> ... a classification error ε<SUB>i</SUB> is the average of all instances of the test data where h<SUB> t</SUB> (x<SUB> i</SUB> ) ≠ y<SUB> i</SUB> . This yields the ratio <SUP> n</SUP> ⁄ <SUB> N </SUB> ..... where we integrate ∫ <SUB> 0</SUB> <SUP>π</SUP>... </P> <H4>Block Equation</H4><P>This is the latex of the block equation I'm going to try to render in html : </P><PRE><CODE> D_{t+1} = \frac{D_{t}}{Z_{t}} \times
\begin{cases}
& e^{-\alpha_{t} } \text{ if } y_{i} = h_{t}(x_{i}) \\
& e^{\alpha_{t} } \text{ if } y_{i} \neq h_{t}(x_{i})
\end{cases}
= \frac{D_{t}}{Z_{t}} \times e^{(-\alpha_{t} y_{i} h_{t}(x_{i}))}
</CODE></PRE><P><BR><BR> It's the equation used in the <A HREF="http://en.wikipedia.org/wiki/AdaBoost">AdaBoost</A> algorithm. <BR> Here's what the equation looks like when I try to render it directly in HTML : <BR><BR>
</P><P> D<SUB> t+1</SUB> = <SUP> D<SUB> t</SUB> </SUP> ⁄ <SUB> Z<SUB> t</SUB> </SUB> × {<SUP> e <SUP> (-α <SUB> t</SUB> )</SUP> if y<SUB> i</SUB> = h<SUB> t</SUB> (x<SUB> i</SUB> )</SUP> <SUB> e <SUP> (α <SUB> t</SUB> )</SUP> if y<SUB> i</SUB> ≠ h<SUB> t</SUB> (x<SUB> i</SUB> ) </SUB> = <SUP> D<SUB> t</SUB> </SUP> ⁄ <SUB> Z<SUB> t</SUB> </SUB> × e<SUP> ( -α <SUB> t</SUB> y<SUB> i</SUB> h<SUB> t</SUB> (x<SUB> i</SUB> ))</SUP> </P> <BR> <BR> <H4>Conclusion</H4><P>For quick inline equations html is certainly good enough. For block equations the html rendered version does not look nearly as good as the Latex one. In the original text the html tags start to overwhelm the content and it becomes hard to see the mathematical trees in the html forest. This makes correcting or updating complicated code blocks difficult. </P><H2><A HREF="http://www.sitmo.com/latex/">SITMO</A></H2><P>According to its web site, SITMO is a 'quant' company located in Delft, The Netherlands. The company provides an easy to use latex editor which it also makes available as a Google gadget. </P><P>You have the option of downloading an image or of embedding links to the rendered latex directly into your html. </P><H4>Embedding</H4><P>Typically you can embed the link provided by <A HREF="http://www.sitmo.com/latex/">SITMO</A> straight into your html, like so : </P><PRE><CODE> <IMG SRC=" http://www.sitmo.com/gg/latex/latex2png.2.php?z=100&eq=\int_{0}^{\pi}%20"></CODE></PRE><P>This is what I've done below in order to inline special symbols and simple equations : <BR> <BR> </P><P> ... a classification error <IMG SRC="http://www.sitmo.com/gg/latex/latex2png.2.php?z=100&eq=\epsilon_{i}"> is the average of all instances of the test data where <IMG SRC="http://www.sitmo.com/gg/latex/latex2png.2.php?z=100&eq=h_{t}%20\neq%20y_{i}"> . This yields a ratio <IMG SRC="http://www.sitmo.com/gg/latex/latex2png.2.php?z=80&eq=\frac{n}{N}"> .... 
where we integrate <IMG SRC="http://www.sitmo.com/gg/latex/latex2png.2.php?z=100&eq=\int_{0}^{\pi}%20"> ... </P><H4>Block Equation</H4><P>First I tried to generate a link just as I did for the inline symbols and equations. Here's the url generated by the editor : <BR> </P><P><CODE> http://www.sitmo.com/gg/latex/latex2png.2.php?z=100&eq=%20D_ %20%3D%20\frac 0} 0}%20\times%20%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20\begin %20%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%26%20e^ %20 0}%20%20%20\text %20%20y_ %20%3D%20h_ (x_ )%20\%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%26%20e^ %20 0}%20%20%20\text %20%20y_ %20\neq%20%20h_ (x_ )%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20\end %20%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%3D%20\frac 0} 0}%20\times%20e^ %20y_ %20h_ (x_ )) 0}%0A </CODE> </P> <BR>
<P>This obviously needs some serious url-shortening. First up is New York favorite <A HREF="http://bit.ly">bit.ly</A> which shortens the url to : http://bit.ly/x6uke. <BR> <BR> </P><P><IMG SRC="http://bit.ly/x6uke"></P> Obviously we've lost some accuracy here. <BR> <BR> Next up is the link generated by venerable <A HREF="http://tinyurl.com">tinyurl</A> which shortens the url to a manageable http://tinyurl.com/yjx54h6 : <BR> <BR> <P><IMG SRC="http://tinyurl.com/yjx54h6"></P><BR> <BR> No complaints here. <BR> <BR> The <A HREF="http://www.sitmo.com/latex/">SITMO editor</A> can also be used to generate a png image, which you can then embed yourself. Here's the result : <BR> <BR> <P><IMG SRC="http://github.com/fons/blog-images/raw/master/20091003/sitmo-adaboost-equation.png"></P><BR> <BR> <H4>Conclusion</H4><P>Take a look at the embedded symbols and equations for this section. If you're a stickler for consistent layout, using image links to inline equations creates a few challenges. You need to handle the integration with your background color as well as the alignment within the main text. I found the use of embedded links more cumbersome than using inlined html because these embedded links don't integrate well with the overall layout without tweaking. On the other hand the rendered latex looks better than the embedded html, and using both html (for inline equations) and latex (for blocks) in your post may create its own formatting discontinuities. <BR> <BR> </P><P>The block equation is rendered beautifully by the Latex editor. The challenge is dealing with the ungodly long url if you want to embed the link generated by the SITMO editor. Clearly, not all url shorteners are able to handle this. The alternative is to embed an image in your document. 
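</P><P>As an aside, the embedded SITMO links used throughout this section are just url-encoded latex. Here's a sketch of building such a link programmatically; the helper name is my own invention, and only the base url is taken from the editor's output shown above:</P>

```python
from urllib.parse import quote

def sitmo_img_url(latex, zoom=100):
    # Percent-encode a latex snippet into the image-rendering query
    # string used by the SITMO links shown above.
    base = "http://www.sitmo.com/gg/latex/latex2png.2.php"
    return "%s?z=%d&eq=%s" % (base, zoom, quote(latex))

print(sitmo_img_url(r"\int_{0}^{\pi}"))
# → http://www.sitmo.com/gg/latex/latex2png.2.php?z=100&eq=%5Cint_%7B0%7D%5E%7B%5Cpi%7D
```

<P>The long block-equation url is this same encoding applied to a multi-line latex snippet, which is what makes it so unwieldy.<P>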
</P><H2><A HREF="http://www.codecogs.com/components/equationeditor/equationeditor.php">CodeCogs</A></H2><P>This editor is provided by <A HREF="http://www.codecogs.com/">CodeCogs engineering</A>. Please consult the <A HREF="http://www.codecogs.com/components/equationeditor/equation_install.php">install page</A> for more information. This editor seems to be quite popular on the web. CodeCogs provides an equation editor which generates a url on their website you can embed or a gif image you can host yourself. CodeCogs provides a variety of formats like html snippets, a url and code to embed into wiki pages, among other things. <BR> Here's an example of an html snippet : </P><P><CODE> <A HREF="http://www.codecogs.com/eqnedit.php?latex=\epsilon_{i}" TARGET="_blank"><IMG SRC="http://latex.codecogs.com/gif.latex?\epsilon_{i}" TITLE="\epsilon_{i}"></A></CODE></P><H4>Embedding</H4><P>Embedding the url provided by <A HREF="http://www.codecogs.com/components/equationeditor/equationeditor.php">CodeCogs</A> into your html is straightforward : </P><PRE><CODE> <IMG SRC="http://latex.codecogs.com/gif.latex?\epsilon_{i}"></CODE></PRE><P>This generates a default image size of 110 dpi. When you compare this to the SITMO url you'll notice that it doesn't have a size tag. CodeCogs does allow you to vary the size through a drop box. This adds a dpi tag to the url : </P><PRE><CODE> <IMG SRC="http://latex.codecogs.com/gif.latex?\300dpi&space;\epsilon_{i}"></CODE></PRE><P><BR><BR> Here's how the embedded equations are rendered, using image urls : <BR> <BR> </P><P> ... a classification error <IMG SRC="http://latex.codecogs.com/gif.latex?\epsilon_{i}"> is the average of all instances of the test data where <IMG SRC="http://latex.codecogs.com/gif.latex?h_{t}&space;\neq&space;y_{i}"> . 
This yields a ratio <IMG SRC="http://latex.codecogs.com/gif.latex?\frac{n}{N}"> .... where we integrate <IMG SRC="http://latex.codecogs.com/gif.latex?\int_{0}^{\pi}"> ... </P><BR> <BR> <H4>Block Equation</H4><P>I would have liked to have been able to test the HTML snippet generated by the CodeCogs editor. However, as I mentioned in the intro, my <A HREF="http://github.com/fons/cl-bliky">blog engine</A> runs pages through a <A HREF="http://common-lisp.net/project/cl-markdown">markdown transformer</A> and chokes on the html generated by CodeCogs. <BR> <BR> As an alternative I've run the url through two url shorteners. Here's the rendering using the url generated by <A HREF="http://bit.ly">bit.ly</A> 'http://bit.ly/aKH31' : <BR> <BR> </P><P><IMG SRC="http://bit.ly/aKH31"></P><BR> <BR> Again, something obviously got lost in translation. <BR> <BR> Here's the rendering from the link generated by <A HREF="http://tinyurl.com">tinyurl</A> : 'http://tinyurl.com/yhv9xqx' <BR> <BR> <P><IMG SRC="http://tinyurl.com/yhv9xqx"></P><BR> <BR> <P>CodeCogs can also be used to generate an image file. You can choose between several image formats, like png, gif, pdf and others, and several fonts. By the way, this applies to the url as well. The image below is generated in comic sans serif at 150 dpi : <BR> <BR> </P><P><IMG SRC="http://github.com/fons/blog-images/raw/master/20091003/codecogs-adaboost-equation.png"></P><BR> <BR> <H4>Conclusions</H4><P>I found the <A HREF="http://www.codecogs.com/components/equationeditor/equationeditor.php">CodeCogs editor</A> easy to use. It allows equations to be saved in a wide variety of image formats, sizes and fonts and provides code to embed them in a variety of markup formats. <BR> <BR> </P><P>My observations about inlining symbols when discussing the SITMO editor apply here as well. 
<BR> <BR> I was unable to test the html snippet generated for the block equation, due to limitations (bugs !) in my blog engine. Nevertheless, running the url provided by the editor through a url shortener provides a good alternative. In addition, you can save an image of the equation in a variety of formats. </P><H2><A HREF="http://www.texify.com/links.php">Textify</A></H2><P>Textify is a product designed by <A HREF="http://www.forkosh.com/">John Forkosh Associates, Inc.</A> Textify does not provide a Latex editor so it assumes at least some Latex proficiency on the user's part. In fact, its web page can best be described as no-frills, which I happen to think of as a compliment. </P><P>Ok, so let's get to it. </P><H4>Embedding</H4><P>Textify provides results that can be used in web pages, email or forums. Whereas the previous two services provide a web based api, Textify generates an html snippet around a gif, which can be embedded in your web page : </P><PRE><CODE> <IMG ALT="\epsilon_{i}" SRC="http://www.texify.com/img/%5CLARGE%5C%21%5Cepsilon_%7Bi%7D.gif" ALIGN="center" BORDER="0"></CODE></PRE><P>There are two alternative sizes for the image provided on the page. You don't have the ability to customize the size or font of the image. <BR> Let's see how it works out : <BR> <BR> </P><P> ... a classification error <IMG ALT="\epsilon_{i}" SRC="http://www.texify.com/img/%5CLARGE%5C%21%5Cepsilon_%7Bi%7D.gif" ALIGN="center" BORDER="0"> is the average of all instances of the test data where .......... </P><BR> <BR> Ok, ooopsee.. As an alternative, I tried to use the other image files that were generated : <BR> <BR> <P> ... a classification error <IMG SRC="http://www.texify.com/img/%5Cnormalsize%5C%21%5Cepsilon_%7Bi%7D.gif"> is the average of all instances of the test data where <IMG> This yields a ratio <IMG> ... where we integrate <IMG> ... 
</P> <BR> <BR> Clearly not very useful for my purposes, but it does allow you to use links in emails or google docs. <H4>Block Equation</H4><P>Textify automatically generates a shortened url for use in twitter. Unfortunately, it's using <A HREF="http://bit.ly">bit.ly</A> which proves problematic : </P><P>The shortened url used here is 'http://bit.ly/1Jcm0e' and it exhibits the same pathologies I mentioned above for SITMO and CodeCogs. </P><H4>Conclusion</H4><P>I like the clean look of Textify. Unfortunately, I can't use it as use of the server is restricted. In addition, providing shortened urls for equations is clearly a smart thing, but bit.ly doesn't seem to handle the latex urls properly. <BR> <BR> </P> ]]></content:encoded>
<guid> http://fons.github.com/embedding-equations-in-a-blog-post.html </guid>
<pubDate> Fri, 06 Nov 2009 06:32:19 EST </pubDate>
</item>
<item>
<title> Toy Problem: Simple One Dimensional Least Squares Learner. </title>
<link> http://fons.github.com/toy-problem-simple-one-dimensional-least-squares-learner.html </link>
<description> <P>In chapter two of Hastie, Tibshirani and Friedman's <A HREF="http://www-stat.stanford.edu/~tibs/ElemStatLearn/">'The Elements of Statistical Learning'</A> the authors discuss the use of least-squares regression to construct a data classifier for linearly separable data. <BR> <BR> A set of training data together with the least-squares method is used to construct a hyper-plane in the data space. The classification of a data point depends on what side of the hyper-plane you end up on.<BR> <BR> The example in Hastie uses two data classes in a two dimensional parameter space. I didn't grok the example immediately, and I thought it would be helpful to try to construct my own much simpler example by staying in one dimension and using a simple normal distribution. The rest of this post describes the details. </P> </description>
<content:encoded><![CDATA[<P>In chapter two of Hastie, Tibshirani and Friedman's <A HREF="http://www-stat.stanford.edu/~tibs/ElemStatLearn/">'The Elements of Statistical Learning'</A> the authors discuss the use of least-squares regression to construct a data classifier for linearly separable data. <BR> <BR> A set of training data together with the least-squares method is used to construct a hyper-plane in the data space. The classification of a data point depends on what side of the hyper-plane you end up on.<BR> <BR> The example in Hastie uses two data classes in a two dimensional parameter space. I didn't grok the example immediately, and I thought it would be helpful to try to construct my own much simpler example by staying in one dimension and using a simple normal distribution. The rest of this post describes the details. </P><H2>Data Classes and Least Squares</H2><P>My data points are generated by two <A HREF="http://en.wikipedia.org/wiki/Normal_distribution">normal distributions</A> with means on the interval [0,1].<BR> <BR> Class 1 is classified as -1 and class 2 as 1. In my example, class 1 will always have the smaller mean, and hence will lie to the left of class 2. As in Hastie, <A HREF="http://en.wikipedia.org/wiki/Linear_regression">linear regression</A> is used to find a linear classifier for a set of training data for these two classes.<BR> <BR> The data space is one-dimensional and the classification boundary will be a point on the interval [0,1] somewhere between the means of the two data classes. 
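</P><P>The construction just described — fit a line to training points labelled -1 and +1 by least squares, then take the point where the line crosses zero as the classification boundary — can be sketched in a few lines. This is my own illustration in Python, independent of the R script presented later in this post; the class parameters mirror the examples below:</P>

```python
import random

def fit_line(xs, ys):
    # Ordinary least squares for y = M*x + Q.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    M = sxy / sxx
    Q = my - M * mx
    return M, Q

random.seed(1)
# Class 1 (label -1): mean 0.35; class 2 (label +1): mean 0.65; sd 0.15.
xs = [random.gauss(0.35, 0.15) for _ in range(20)] + \
     [random.gauss(0.65, 0.15) for _ in range(20)]
ys = [-1.0] * 20 + [1.0] * 20
M, Q = fit_line(xs, ys)
S = -Q / M  # boundary: where the regression line crosses the x-axis
print("boundary estimate:", round(S, 3))
```

<P>With these symmetric classes the estimated boundary lands near 0.5, which is what the discussion below leads you to expect. <P>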
For a one dimensional data space linear regression models the classification response function as a line: y = Mx + Q </P><P>The least-squares model (LSM) tries to find the slope M and intercept Q such that the responses it generates have the smallest 'residual sum of squared errors' (RSS). The RSS is the sum of the squares of the differences between the actual responses and those predicted by the LSM equation : </P><PRE><CODE>RSS = SUM ( (y_obs - y_calc) ^2 ) </CODE></PRE><P><EM>x</EM> represents the input and is going to be generated by the normal distributions associated with one of the two data classes. <EM>y</EM> represents the response and the only values are -1 or 1, depending on whether <EM>x</EM> was generated by the first or second class distribution respectively. </P><P>The observed classifications, <EM>y_obs</EM>, can have only one of two values : -1 or 1. The values generated by the LSM, <EM>y_calc</EM>, range from Q (for <EM>x</EM> = 0) to M + Q (for <EM>x</EM> = 1). </P><P>The LSM separates the space into two pieces, one for each class. In this case the data space is the line piece [0,1], and the 'hyper-plane' separating that space is the point where the linear regression line crosses the data space, yielding a classification boundary of : </P><PRE><CODE> S = -Q / M </CODE></PRE><P>The LSM estimate of the classifier is then : </P><PRE><CODE>x < S -> class 1
x > S -> class 2
</CODE></PRE><H2>A simple R script</H2><P>I've put together a quick <A HREF="http://www.r-project.org/">R script</A> implementing these ideas. </P><P>This <A HREF="http://github.com/fons/blog-code/blob/master/1d-classification-toy/sl-1d-regression.R">R script</A> has two main functions : <EM>train</EM> and <EM>sim</EM>. The script is loaded on the R command prompt like so: </P><PRE><CODE>source('sl-1d-regression.R') </CODE></PRE><P>The <EM>train</EM> function is used to generate a graph showing two data classes as well as the line generated by the LSM separating both classes. The example discussed below was generated using train as follows : </P><PRE><CODE>> train(20, 0.35, 0.15, 0.65, 0.15)
coefficients : -1.916889 3.973720
classifier : 0.4824
prob of misclassification of class 1 : 0.16
prob of misclassification of class 2 : 0.16
</CODE></PRE><P>The <EM>sim</EM> function returns a vector of monte-carlo simulations of the separation boundary. For example, here's 10 values of the boundary generated by <EM>sim</EM> : </P><PRE><CODE>> sim(10, 0.35, 0.15, 0.65, 0.15)
(Intercept) (Intercept) (Intercept) (Intercept) (Intercept) (Intercept)
0.4495022 0.5185030 0.4978533 0.5887321 0.4547277 0.5029925
(Intercept) (Intercept) (Intercept) (Intercept)
0.4792027 0.5665940 0.4952061 0.5337139 </CODE></PRE><P>The following plots the density function for 400 values of the classification boundary, for the same class parameters as the <EM>train</EM> example above. </P><PRE><CODE>plot(density(sim(400, 0.35, 0.15, 0.65, 0.15)), xlim=c(0,1), xlab="", ylab="")
</CODE></PRE><P>Obviously, since these are monte-carlo simulations, subsequent runs are going to be different as the random values generated by <EM>rnorm</EM> are going to be different each time the function is run. </P><H2>Classification Results</H2><P><A NAME="narrow1d"><IMG SRC="http://github.com/fons/blog-images/raw/master/20091101/narrow1d.png"></A></P><P><A HREF="#narrow1d">This graph</A> shows two classes and the resulting least-squares model. Both distributions have the same standard deviation of 0.15. One class, marked with the red inverted triangles, has a mean of 0.35. The other class, marked with the blue squares, has a mean of 0.65. The size of each training class is 20. </P><P>Each class shows up twice. Once as part of the x-axis which is the data space for this problem. The second time I show them as data points (x,y) used in the linear regression. Obviously the red triangles all have y = -1, and the blue squares all have y = 1. </P><P>As you can see, there is some overlap between the two data sets, because of the variance of the normal distribution. </P><P>The green line represents the best fit of the data points according to the least-squares model (LSM). The classification boundary is the point where the green line crosses the x-axis. As you can see that's somewhere around 0.5. </P><P>You would expect 0.5 to be a good estimate of the classification boundary S because the normal distribution is symmetric around the mean and both classes are equi-distant from 0.5. </P><P>The R-script generates the probability of misclassification, defined as the probability of ending up on the wrong side of the estimated classification boundary : </P><PRE><CODE> S_est = 0.5 * (mean_1 + mean_2)
P_mis_classified = Prob( x > S_est) for class 1
= Prob( x < S_est) for class 2
</CODE></PRE><P>This is obviously determined in large part by the variance (for a given mean), and for this example the probability is around 16 % for both classes. </P><P>The <EM>sim</EM> function can be used to generate a whole set of estimates of the classification boundary. This is in fact sampling the space of estimators. The mean of this sample should be very close to the actual value thanks to the <A HREF="http://en.wikipedia.org/wiki/Central_limit_theorem">central limit theorem.</A> Here's a typical run : </P><PRE><CODE> >mean(sim(400, 0.35, 0.15, 0.65, 0.15))
[1] 0.4977804
> mean(sim(400, 0.35, 0.15, 0.65, 0.15))
[1] 0.4990484
> mean(sim(400, 0.35, 0.15, 0.65, 0.15))
[1] 0.5024554 </CODE></PRE><P>The R function <EM>mean</EM> is used to find the mean of the values of the vector of simulation results. </P><P><A NAME="samplemean1d"><IMG SRC="http://github.com/fons/blog-images/raw/master/20091101/samplemean1d.png"></A></P><P><A HREF="#samplemean1d">This graph</A> shows a plot of the density of the sample distribution for the mean. The graph shows two distributions, one for a standard deviation of 0.15 in blue and one for 0.35 in red. </P><P>Notice how wide the distribution for sd=0.35 is around 0.5. It turns out that about 33 % of your data will be misclassified because there will be significant overlap between the two classes. </P><P><A NAME="wide1d"><IMG SRC="http://github.com/fons/blog-images/raw/master/20091101/wide1d.png"></A></P><P><A HREF="#wide1d">This graph</A> is generated using a standard deviation of 0.35 for both classes and with 50 data points in each class. As you can see there's significant overlap between the two classes. In fact, it would be hard to just visually pick out a good classification boundary. </P> ]]></content:encoded>
<guid> http://fons.github.com/toy-problem-simple-one-dimensional-least-squares-learner.html </guid>
<pubDate> Sun, 01 Nov 2009 10:30:37 EST </pubDate>
</item>
<item>
<title> Processing the Sieve in Python </title>
<link> http://fons.github.com/processing-the-sieve-in-python.html </link>
<description> <P>In a <A HREF="http://www.prognotes.com/threading-the-sieve-in-python.html">previous post</A> I discussed four methods to multi-thread the <A HREF="http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes">Sieve of Eratosthenes</A> in Python. I concluded that multi-threading didn't increase performance, and in fact could have a significant adverse effect. The <A HREF="http://www.dabeaz.com/python/GIL.pdf">global interpreter lock (GIL)</A> prevents threads from running concurrently and thus limits the upside of threading. The use of locks or avoiding the use of shared data can then decrease performance quite a bit. <BR> <BR> In this section I'll be using Python's <A HREF="http://docs.python.org/library/multiprocessing.html">multiprocessing</A> module to 'multi-thread' the <EM>Sieve</EM>. <BR> <BR> The multiprocessing module spawns a number of processes and distributes the calculation amongst them. There is no equivalent to the GIL so I should be able to see some gain in performance as the number of processes increases. On the other hand, spawning processes means that there is startup overhead which may offset any performance gain due to the distribution of its execution across multiple processes. However, I should still be able to investigate how performance scales with the number of processes, and whether the <A HREF="http://docs.python.org/library/multiprocessing.html">multiprocessing module</A> is able to take advantage of multiple cores. 
In this post I'll discuss four approaches to distributing the <EM>Sieve</EM> algorithm, basically following the approaches I discussed <A HREF="http://www.prognotes.com/threading-the-sieve-in-python.html">earlier</A> when using the multi-threading package. The various approaches differ in the way the load is balanced and whether the state of the sieve is shared.<BR> <BR> The source for the code discussed here and in the <A HREF="http://www.prognotes.com/threading-the-sieve-in-python.html">previous post</A> can be found in <EM>prime_share.py</EM> in the <A HREF="git://github.com/fons/blog-code.git">blog-code package</A> on github. </P> </description>
<content:encoded><![CDATA[<P>In a <A HREF="http://www.prognotes.com/threading-the-sieve-in-python.html">previous post</A> I discussed four methods to multi-thread the <A HREF="http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes">Sieve of Eratosthenes</A> in Python. I concluded that multi-threading didn't increase performance, and in fact could have a significant adverse effect. The <A HREF="http://www.dabeaz.com/python/GIL.pdf">global interpreter lock (GIL)</A> prevents threads from running concurrently and thus limits the upside of threading. The use of locks or avoiding the use of shared data can then decrease performance quite a bit. <BR> <BR> In this section I'll be using Python's <A HREF="http://docs.python.org/library/multiprocessing.html">multiprocessing</A> module to 'multi-thread' the <EM>Sieve</EM>. <BR> <BR> The multiprocessing module spawns a number of processes and distributes the calculation amongst them. There is no equivalent to the GIL so I should be able to see some gain in performance as the number of processes increases. On the other hand, spawning processes means that there is startup overhead which may offset any performance gain due to the distribution of its execution across multiple processes. However, I should still be able to investigate how performance scales with the number of processes, and whether the <A HREF="http://docs.python.org/library/multiprocessing.html">multiprocessing module</A> is able to take advantage of multiple cores. 
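</P><P>For reference, the plain single-process Sieve that all of the variants below distribute can be sketched like this. This is my own minimal version, not the code from <EM>prime_share.py</EM>:</P>

```python
def sieve_count(n):
    # Classic Sieve of Eratosthenes: keep a list of 0/1 flags and
    # cross off the multiples of every prime up to sqrt(n).
    L = [1] * (n + 1)
    L[0] = L[1] = 0
    k = 2
    while k * k <= n:
        if L[k]:
            for m in range(k * k, n + 1, k):
                L[m] = 0
        k += 1
    return sum(L)  # number of primes <= n

print(sieve_count(10000))  # → 1229
```

<P>The distributed versions split either the crossing-off work or the flag list itself across processes; the question is what that buys you once process startup and result merging are paid for. <P>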
In this post I'll discuss four approaches to distributing the <EM>Sieve</EM> algorithm, basically following the approaches I discussed <A HREF="http://www.prognotes.com/threading-the-sieve-in-python.html">earlier</A> when using the multi-threading package. The various approaches differ in the way the load is balanced and whether the state of the sieve is shared.<BR> <BR> The source for the code discussed here and in the <A HREF="http://www.prognotes.com/threading-the-sieve-in-python.html">previous post</A> can be found in <EM>prime_share.py</EM> in the <A HREF="git://github.com/fons/blog-code.git">blog-code package</A> on github. </P><H2>Sieve of Eratosthenes</H2><P>A more extensive discussion can be found <A HREF="http://www.prognotes.com/threading-the-sieve-in-python.html">in the first post</A>. To find the number of primes below a threshold the <EM>Sieve</EM> algorithm uses a list of booleans (or 0/1 values). It's the sharing of this list which is a distinction between the approaches discussed below. </P><H2>Test Configuration</H2><P>The test configuration is the same as in the <A HREF="http://www.prognotes.com/threading-the-sieve-in-python.html">previous post</A> : a Toshiba A215-S4747 with an AMD Turion 64 X2 (dual-core) processor running Ubuntu 8.04.3 and Python 2.6.2. I won't discuss the performance under Jython or IronPython as they don't support the multiprocessing module as of yet. </P><H2>Various Implementations</H2><P>In this section I'll discuss four different implementations of the distributed <EM>Sieve</EM>. Each implementation is labeled with the name of the implementation in <EM>prime_share.py</EM> which is part of <A HREF="git://github.com/fons/blog-code.git">the blog-code package</A> on github. 
</P><UL><LI>main_smp: no shared data; split the work evenly between the processes.</LI><LI>main_smp_alt: no shared data; split the sieve between the processes.</LI><LI>main_smp_shared: share the sieve between the processes.</LI><LI>main_smp_shared_2: share the sieve between the processes using shared memory.</LI></UL><P>All implementations, except the last, use the <A HREF="http://docs.python.org/library/multiprocessing.html#multiprocessing.pool.multiprocessing.Pool.map">map</A> method of the <A HREF="http://docs.python.org/library/multiprocessing.html#module-multiprocessing.pool">Pool</A> class in the <A HREF="http://docs.python.org/library/multiprocessing.html">multiprocessing package.</A> The distributed map method has the same interface as the map built-in function : <CODE>map(func, iterable)</CODE>. When the sieve list is not shared some post processing is required. I use the built-in reduce function so we have a <A HREF="http://www.cs.vu.nl/~ralf/MapReduce/">map-reduce</A> pattern.<BR> <BR> In the last two examples the sieve list is shared amongst the processes. In the first of these the <A HREF="http://docs.python.org/library/multiprocessing.html#module-multiprocessing.managers">manager</A> class in the <A HREF="http://docs.python.org/library/multiprocessing.html">multiprocessing package</A> is used. This class manages a server process from which the shared data is proxied to the processes accessing it. The alternative is using shared memory, which is done in the last example. </P><H3>main_smp : no shared data; split the work evenly</H3><P>This is the first implementation which uses the <EM>Pool</EM> class in the multiprocessing module. The <EM>Pool</EM> class's <EM>map</EM> method is used to distribute the <EM>Sieve</EM> across various processes. The results of each calculation are captured and returned in a list by the map function. </P><PRE><CODE>def main_smp(top, nthreads) :
    # (imports needed by this excerpt: import math; import multiprocessing)
    n = int(top)
    nthreads = int(nthreads)
    B = smp_load_balance(nthreads, n)
    p = multiprocessing.Pool(nthreads)
    K = p.map(dowork_smp, map(lambda lst : (n, lst, nthreads), B))
    PR = transpose(K)
    prime = p.map(reduce_chunk, PR)
    return count_primes(prime)

def dowork_smp(args) :
    n, nexti_smp, chunks = args
    nk = 0
    ops = 0
    k = nexti_smp[0]
    L = ( n + 1) * [1]
    lim = len(nexti_smp)
    while 1 :
        k = nexti_smp[nk]
        if L[k] == 1 :
            r = n / k
            for i in range(nexti_smp[0], r+1) :
                ops += 1
                L[i*k] = 0
        nk += 1
        if nk >= lim : break
    len_L = n + 1
    split = len_L / chunks
    K = range(0, len_L - split + 1, split)+[len_L]
    Z = [ L[k[0]:k[1]] for k in zip(K, K[1:]) ]
    return Z

def smp_load_balance(th , n) :
    def operations(t) :
        return int((n / t) + 1 - t)
    def find_min(thr_alloc) :
        min, lst = thr_alloc[0]
        if min == 0 :
            return 0
        midx = 0
        for index in range(1, len(thr_alloc)) :
            count, lst = thr_alloc[index]
            if count < min :
                min = count
                midx = index
        return midx
    lim = int(math.sqrt(n)) + 1
    nexti_lb = range(2, lim, 1)
    if th < 2 :
        return [nexti_lb]
    thr_allocs = map(lambda i : (0, [] ), range(th))
    Z = map(operations, nexti_lb)
    L = zip(map(operations, nexti_lb), nexti_lb)
    for i in L :
        ops, index = i
        mindex = find_min(thr_allocs)
        cnt, lst = thr_allocs[mindex]
        cnt += ops
        lst.append(index)
        thr_allocs[mindex] = (cnt, lst)
    return map(lambda p: p[1], thr_allocs)

def list_reduce(l1, l2) :
    return map(lambda p : p[0]*p[1], zip(l1,l2))

def reduce_chunk(C) :
    return reduce(lambda x, y : x + y, reduce(list_reduce, C))

def transpose(K) :
    nthreads = len(K)
    chunks = len(K[0])
    X = [ (l, k) for k in range(0, chunks) for l in range(0, nthreads)]
    len_X = len(X)
    S = [ X[k:k+nthreads] for k in range(0, len_X, nthreads)]
    PR = [ [ K[p[0]][p[1]] for p in S[s]] for s in range(0, chunks) ]
    return PR
</CODE></PRE><P>Each process in the pool works on an independent set of sieve indices. The indices to work on are chosen such that the amount of work done by each process is roughly the same. This is done in the <EM>smp_load_balance</EM> procedure, which was discussed in more detail in my <A HREF="http://www.prognotes.com/threading-the-sieve-in-python.html">previous post.</A> The statement </P><PRE><CODE>K = p.map(dowork_smp, map(lambda lst : (n, lst, nthreads), B)) </CODE></PRE><P>is where the work is distributed between the processes. The results are returned in K. Since the 'sieve list' is not shared, the results of each process in the pool need to be combined to form the final result. <BR> <BR> </P><P>This is done by examining each list in the result set. If an index is flagged as a composite number in any one of them, the resulting sieve location is also flagged as a composite number. In other words, each index value across the result lists is treated as a boolean and the results in each list are 'or-ed' together to arrive at the final value. This is expressed as a reduce operation on the results of the map operation. Needless to say, this reduction is going to be time consuming. So I'm distributing this reduction across the pool processes used in the map phase. <BR> <BR> In order to speed up the reduction process, the results are not returned as one list, but as a list of partitionings or chunks. Each partitioning corresponds to a contiguous part of the results list. The number of partitionings is equal to the number of processes. These partitionings are then combined with the equivalent partitionings returned by the other processes in the pool. This is done in the <EM>transpose</EM> function. <BR> <BR> Say we have two processes, and the first one returns [[1,1,1],[1,0,1]] and the second one returns [[1,1,1],[1,1,1]]. First of all, if we ignore the chunking, the combination (reduction) of the two results is in fact [1,1,1,1,0,1]. 
What <EM>transpose</EM> does is generate a list of equivalent pieces whose reduction can be distributed: [[[1,1,1], [1,1,1]], [[1,0,1], [1,1,1]]]. This reduction is distributed amongst the processes in the pool: </P><PRE><CODE> prime = p.map(reduce_chunk, PR) </CODE></PRE><P>The distributed reduction processes work on a subset of the results returned by the mapping operation earlier. Each process in the distributed reduction returns the number of primes it found in the chunks it received as arguments. In this example that would be [3, 2], resulting in a determination of three primes (zero and one are excluded). </P><P><A NAME="smp-results"><IMG SRC="http://github.com/fons/blog-images/raw/master/20090927/smp_cmp.png"></A></P> This graph shows the results for this example, labelled as 'smp', and the next one, labelled as 'smp alt'. The x-axis shows the number of processes that are part of the pool. The y-axis shows the execution time for forty repeats of the <EM>Sieve</EM> algorithm to determine the number of primes less than 10000. I used Python's <A HREF="http://docs.python.org/library/timeit.html">timeit package</A> to determine the run time. <BR><BR> For each example, the graph also shows the time taken in the <EM>startup</EM> phase, the <EM>map</EM> phase and the time taken to complete the calculation. <EM>startup</EM> is considered everything up to and including the load balancing. The <EM>map</EM> phase includes the <EM>startup</EM> phase and the first <EM>Pool map</EM> operation. <BR><BR> The increase in <EM>startup</EM> time is proportional to the number of processes that need to be started. The time the <EM>map</EM> phase takes is primarily a function of the <EM>startup</EM> time. However, if you look closely you'll notice that the <EM>map</EM> phase rises somewhat more steeply than the <EM>startup</EM> phase. This is due to the fact that the number of operations increases slightly as the number of processes increases. 
This is explained in my <A HREF="http://www.prognotes.com/threading-the-sieve-in-python.html">previous post</A> where I used a similar implementation when multi-threading the sieve. Lastly, the total time taken rises much more steeply than either of the two previous phases. This is entirely due to the reduction phase. As the number of processes increases, the number of chunks to be reduced increases as well. This reduction phase seems to be somewhat quadratic (after all, it involves transposing a matrix), driving down performance even more. <BR> <BR> To summarize, there is no gain to be had in distributing the algorithm this way, for two reasons: The increase in startup time due to the larger processing pool is not offset by an increase in processing speed. In addition, the lack of shared data introduces a very inefficient reduction phase.<BR>
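<P>The chunked map-reduce combination described above can be sketched in a few lines of Python 3. This is a simplified stand-in for the <EM>transpose</EM> and <EM>reduce_chunk</EM> routines (list comprehensions replace the Python 2 map/reduce idiom), run on the two-process example from the text:</P>

```python
from functools import reduce

def list_and(l1, l2):
    # combine two equivalent sieve chunks element-wise: a slot stays 1
    # (prime) only if it is 1 in every worker's chunk
    return [a * b for a, b in zip(l1, l2)]

def reduce_chunk(chunks):
    # collapse the equivalent chunks from all workers, then count survivors
    return sum(reduce(list_and, chunks))

def transpose(K):
    # K[w][c] is chunk c produced by worker w; regroup so that each inner
    # list holds the same chunk from every worker
    return [list(rows) for rows in zip(*K)]

# worker 1 returns its sieve in two chunks, and so does worker 2
K = [[[1, 1, 1], [1, 0, 1]],
     [[1, 1, 1], [1, 1, 1]]]
counts = [reduce_chunk(chunk) for chunk in transpose(K)]
print(counts)           # [3, 2]
print(sum(counts) - 2)  # 3 primes; slots 0 and 1 are excluded
```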
<H3>main_smp_alt: no shared data; split the sieve evenly</H3><P>The second implementation I looked at is similar to the one called <EM>main_nolocks_alt</EM> discussed in the<BR>
<A HREF="http://www.prognotes.com/threading-the-sieve-in-python.html">threading post.</A> The sieve is split and each member of the multiprocessing pool receives a piece to process. The code is shown here: </P><PRE><CODE>def main_smp_alt(top, nthreads) :
    n = int(top)
    nthreads = int(nthreads)
    ind = range(2, n, 1)
    B = load_balance(ind, nthreads)
    p = multiprocessing.Pool(nthreads)
    K = p.map(dowork_smp_alt, map(lambda lst : (n, lst), B))
    prime_smp_alt = reduce(lambda l,r : l + r, K)
    return count_primes(prime_smp_alt)

def dowork_smp_alt(args) :
    n, irange = args
    k = 2
    lim = int(math.sqrt(n)) + 1
    istart, iend = irange
    L = ( n + 1) * [1]
    ifrom = 999999
    ito = -1
    while 1 :
        if not (k < lim) : break
        if not (k < iend) : break
        if k < istart :
            s = (istart / k ) + 1
            r = (iend / k) + 1
            for i in range(s, r) :
                index = i * k
                if ifrom > index :
                    ifrom = index
                if ito < index :
                    ito = index
                L[i*k] = 0
        elif L[k] == 1 :
            s = 2
            r = (iend / k) + 1
            for i in range(s, r) :
                index = i * k
                if ifrom > index :
                    ifrom = index
                if ito < index :
                    ito = index
                L[i*k] = 0
        k = k + 1
    if ifrom == 4 :
        ifrom = 0
    return L[ifrom: ito + 1]
</CODE></PRE><P>The reduction phase is very simple. The function <EM>dowork_smp_alt</EM> returns the part of the sieve it worked on. Consequently the Sieve is the concatenation of the lists returned by the map process. The results are shown <A HREF="#smp-results"> here. </A><BR> <BR> Notice that the <EM>startup</EM> times of this and the previously discussed implementation are roughly in line. The difference is probably due to a slight difference in load on the box, as these tests were done at different times. <BR> <BR> The second thing to notice is that the performance increases quite a bit from one to two processes. This is not real, but an artifact of the way the test is done. </P><P><A NAME="smp-alt-startup"><IMG SRC="http://github.com/fons/blog-images/raw/master/20090927/smp_alt_startup.png"></A></P><P><A HREF="#smp-alt-startup">In this graph</A> I show two runs where I varied the number of processes in the pool from high to low. I start the run with a pool size of fifteen and nine processes respectively, and repeat the calculation as I lower the number of processes in the pool. Any startup effect would then be seen on the right-hand side of the graph, and the results on the left-hand side would be more consistent. You can see this startup effect clearly in the graph. I have no explanation for it, but it doesn't invalidate the main conclusion, which is that the performance of this approach decreases as the number of processes increases. <BR><BR> The reason for this decrease in performance is that the number of operations increases as the number of processes in the pool increases. All processes, except the one working on the first part of the Sieve, have to zero out positions in the Sieve without being able to take advantage of the prime numbers 'discovered' in the first part of the Sieve. The same observation was made when the multi-threaded implementation was discussed. Not having shared data basically hurts performance.<BR>
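</P><P>The extra work can be made concrete with a small single-process Python 3 sketch of the same idea. <CODE>sieve_slice</CODE> here is a hypothetical, simplified stand-in for <EM>dowork_smp_alt</EM>, not the author's code: every slice owner above the first one has to strike out multiples of <EM>all</EM> indices below its range, prime or not, which is where the extra operations come from.</P>

```python
import math

def sieve_slice(n, istart, iend):
    # sieve only the slots [istart, iend) of a sieve for primes up to n
    L = (n + 1) * [1]
    for k in range(2, min(int(math.sqrt(n)) + 1, iend)):
        if k < istart:
            # k lies below our slice; we cannot see whether k is prime,
            # so its multiples must be struck out unconditionally
            start = max(2, (istart + k - 1) // k)
        elif L[k] == 1:
            start = 2
        else:
            continue
        for i in range(start, iend // k + 1):
            if istart <= i * k < iend:
                L[i * k] = 0
    return L[istart:iend]

def count_primes_split(n, nworkers):
    # the reduction phase is a plain concatenation of the slices
    size = n + 1
    bounds = [size * w // nworkers for w in range(nworkers + 1)]
    full = []
    for w in range(nworkers):
        full += sieve_slice(n, bounds[w], bounds[w + 1])
    return sum(full) - 2  # slots 0 and 1 are not primes

print(count_primes_split(100, 1), count_primes_split(100, 3))  # 25 25
```

<P>Both calls find the same 25 primes, but the three-way split performs more striking operations in total, which is the behaviour visible in the graph.</P><P>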
</P><H3>main_smp_shared : share the sieve between the processes.</H3><P>In this version the Sieve is shared between the processes in the pool. The shared sieve is created using the <A HREF="http://docs.python.org/library/multiprocessing.html#module-multiprocessing.managers">manager class </A> in the <A HREF="http://docs.python.org/library/multiprocessing.html">multiprocessing package.</A></P><PRE><CODE>def main_smp_shared(top, nthreads) :
    n = int(top)
    nthreads = int(nthreads)
    manager = multiprocessing.Manager()
    prime_s = (n + 1) * [1]
    B = smp_load_balance(nthreads, n)
    p = multiprocessing.Pool(nthreads)
    L_m = manager.list(prime_s)
    K = p.map(dowork_smp_shared, map(lambda lst : (n, lst, L_m), B))
    return count_primes(L_m)

def dowork_smp_shared(args) :
    n, nexti_shared, L = args
    nk = 0
    ops = 0
    k = nexti_shared[0]
    lim = len(nexti_shared)
    while 1 :
        k = nexti_shared[nk]
        if L[k] == 1 :
            r = n / k
            for i in range(nexti_shared[0], r+1) :
                ops += 1
                L[i*k] = 0
        nk += 1
        if nk >= lim : break
    return []
</CODE></PRE><P>Here's how the shared sieve is created: </P><PRE><CODE> prime_s = (n + 1) * [1]
 L_m = manager.list(prime_s) </CODE></PRE><P>The load balancing was discussed before. There is no reduction phase. The Sieve is shared and the primes can simply be determined by inspection. </P><P><A NAME="smp-shared"><IMG SRC="http://github.com/fons/blog-images/raw/master/20090927/smp_shared.png"></A></P><P>The <EM>startup</EM> phase includes everything up to and including the creation of the process pool. <EM>phase 1</EM> adds the creation of the shared list through the list manager. Clearly, initializing the <EM>manager</EM> takes quite a bit of processing time, so I've reduced the number of repetitions from forty to five. Nevertheless the run times are significantly higher than in the cases discussed previously, due to the use of the manager to generate a shared data structure. According to the documentation the <EM>manager</EM> adds an additional process to manage the shared data structure. This would be useful if the calculation were distributed across multiple machines, but it's clearly overkill here. <BR> <BR> That said, notice that performance improves as the number of processes increases from one to two and three. The AMD Turion processor is a two-core processor, so you'd expect that. Beyond three processes the returns diminish as processes start interfering with each other. Because the sieve is shared between processes, the number of operations remains the same regardless of the size of the process pool. Therefore, the increase in performance is entirely due to the fact that the multiprocessing module takes advantage of the multiple cores on a machine. <BR> <BR> Still, in absolute terms the performance here is not particularly good.
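</P><P>For reference, the same pattern in current Python 3 looks like this. This is a minimal hypothetical sketch, not the code used in the tests: the split of the index range is hard-coded instead of load-balanced, and the fork start method is assumed (Unix) so the example can run at module level.</P>

```python
import multiprocessing

def mark_multiples(args):
    # strike out multiples of each index in the proxied list; every read
    # and every write here is a round-trip to the manager's server process
    n, indices, shared = args
    for k in indices:
        if shared[k] == 1:
            for i in range(2, n // k + 1):
                shared[i * k] = 0

ctx = multiprocessing.get_context("fork")  # assumes a Unix platform
n = 50
manager = ctx.Manager()              # the data lives in a separate server process
sieve = manager.list((n + 1) * [1])  # what we hold is a proxy, not the list
chunks = [[2, 5, 7], [3, 4, 6]]      # hard-coded split of the range 2..7
with ctx.Pool(2) as pool:
    pool.map(mark_multiples, [(n, chunk, sieve) for chunk in chunks])
total = sum(sieve[:]) - 2            # slots 0 and 1 are not primes
print(total)                         # 15 primes below 50
```

<P>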
So let's move on to the next and final approach. </P><H3>main_smp_shared_2 : share the sieve between the processes using shared memory.</H3><P>The other way to share data between processes is through shared <A HREF="http://docs.python.org/library/multiprocessing.html#shared-ctypes-objects">ctypes objects.</A> <EM>ctypes</EM> objects use shared memory which, according to the documentation, 'can be inherited by child processes'. In order to use them, we need to abandon <EM>Pool.map</EM> and rewrite the process slightly using the <A HREF="http://docs.python.org/library/multiprocessing.html#the-process-class">Process class.</A></P><P>If you try to use shared <EM>ctypes</EM> objects with a <EM>Pool</EM>, this exception is thrown: </P><PRE><CODE> RuntimeError: SynchronizedArray objects should only be shared between processes through inheritance </CODE></PRE><P>The rewrite is trivial and in fact looks very similar to the map approach used in the previous examples. </P><PRE><CODE>def main_smp_shared_2(top, nthreads) :
    n = int(top)
    nthreads = int(nthreads)
    prime = (n + 1) * [1]
    B = smp_load_balance(nthreads, n)
    arr = multiprocessing.Array('i', prime)
    procs = map(create_process, map(lambda lst : (n, lst, arr), B))
    map(lambda p : p.start(), procs)
    map(lambda p : p.join(), procs)
    prime = arr[:]
    return count_primes(prime)

def create_process(argv) :
    return multiprocessing.Process(target=dowork_smp_shared_2, args=(argv,))

def dowork_smp_shared_2(args) :
    n, nexti_sh2, L = args
    nk = 0
    ops = 0
    k = nexti_sh2[0]
    lim = len(nexti_sh2)
    while 1 :
        k = nexti_sh2[nk]
        if L[k] == 1 :
            r = n / k
            for i in range(nexti_sh2[0], r+1) :
                ops += 1
                L[i*k] = 0
        nk += 1
        if nk >= lim : break
    return L
</CODE></PRE><P>The results of the test are shown here. </P><P><A NAME="smp-shared"><IMG SRC="http://github.com/fons/blog-images/raw/master/20090927/smp_shared_2_rmst.png"></A></P> I should note that I ran the test by starting at a higher process count than shown here and counting down. I dropped the results for the higher process counts, as they clearly showed the startup effect mentioned in the discussion of the results of the <EM>main_smp_alt</EM> implementation. The <EM>startup</EM> phase in this example includes everything up to and including the creation of the shared array. <EM>phase 1</EM> adds the process creation. <BR> <BR> There is no reduction phase, as the processes share the sieve list. Therefore any change in performance is due to the distribution of the calculation across multiple instances. Notice the increase in performance when the number of processes increases from one to two. Performance starts to decrease after that, as adding more processes creates more load on the box. <P><A NAME="smp-shared"><IMG SRC="http://github.com/fons/blog-images/raw/master/20090927/smp_shared_2_mp.png"></A></P> In this graph I show the performance as a function of the number of processes from 1 to 20. Again, I changed the number of processes from high to low, to remove startup effects on the lower end of the graph. You can clearly see the improvement in performance as you change from one to two and three processes, but that improvement diminishes rapidly. <H2>Conclusions</H2><P>This post and the <A HREF="http://www.prognotes.com/threading-the-sieve-in-python.html">previous post</A> use the distribution of the <A HREF="http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes">Sieve of Eratosthenes</A> as a way to explore multi-threading and multi-processing in Python. <BR> <BR> I explored two simple ways to avoid the use of shared data. Both approaches lead to an increase in the total number of operations required to get the final <EM>Sieve</EM>. 
In both cases additional processing steps are required which can add significant processing time, offsetting any potential performance gain due to the distribution of the calculation, regardless of whether I use multi-threading or multi-processing. <BR> <BR> When shared data is used, the <A HREF="http://www.dabeaz.com/python/GIL.pdf">global interpreter lock (GIL)</A> in CPython puts a hard floor on any potential performance increase when multi-threading is used. <BR> <BR> The multiprocessing module does show a performance gain when the number of processes is roughly equal to the number of cores. However, an algorithm as simple as the <EM>Sieve</EM> doesn't allow for amortization of the startup cost, and the performance is significantly worse when multi-threading is used. </P><P>However, in cases where the startup costs can be amortized successfully, the multiprocessing module may well lead to a gain in performance. </P> ]]></content:encoded>
<guid> http://fons.github.com/processing-the-sieve-in-python.html </guid>
<pubDate> Sun, 27 Sep 2009 14:52:50 EST </pubDate>
</item>
<item>
<title> Threading the Sieve in Python </title>
<link> http://fons.github.com/threading-the-sieve-in-python.html </link>
<description> <P>This is the first of two posts on threading and multiprocessing in Python. In this post I'll explore the thread module and in the second post I'll look at Python's multiprocessing module. My starting point is the multi-threaded implementation of the <A HREF="http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes">Sieve of Eratosthenes</A> found in this <A HREF="http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf">tutorial on multi-threading in Python (pdf).</A><BR> <BR> Threading a compute-bound algorithm like the <EM>Sieve</EM> consists of subdividing the main task into autonomous sub-tasks which share as little state as possible. Having no shared state eliminates the overhead that inevitably comes with locking. It turns out that Python is not very good at multi-threading compute-bound processes. <A HREF="http://www.dabeaz.com/blog/dablog.html">This </A><A HREF="http://ttimo.vox.com/library/post/python-gil-threading-and-multicore-hardware.html">is </A><A HREF="http://www.grouplens.org/node/244">not a </A><A HREF="http://blog.ianbicking.org/gil-of-doom.html">surprise.</A> CPython has a global interpreter lock <A HREF="http://www.dabeaz.com/python/GIL.pdf">(GIL)</A> which prevents threads from running concurrently. <BR> <BR> Regardless, there are other lessons I learned when multi-threading the <EM>Sieve</EM> algorithm. One is that sharing state between threads may be unavoidable to achieve reasonable performance. In fact, if you <EM>don't</EM> share state, performance can become predictably <EM>worse</EM> as the number of threads of execution increases. <BR> <BR> The other is that locking can have a surprising impact on performance. It's not just the cost of locking per se, but the effect locking has on the distribution of work between the various threads. </P> </description>
<content:encoded><![CDATA[<P>This is the first of two posts on threading and multiprocessing in Python. In this post I'll explore the thread module and in the second post I'll look at Python's multiprocessing module. My starting point is the multi-threaded implementation of the <A HREF="http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes">Sieve of Eratosthenes</A> found in this <A HREF="http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf">tutorial on multi-threading in Python (pdf).</A><BR> <BR> Threading a compute-bound algorithm like the <EM>Sieve</EM> consists of subdividing the main task into autonomous sub-tasks which share as little state as possible. Having no shared state eliminates the overhead that inevitably comes with locking. It turns out that Python is not very good at multi-threading compute-bound processes. <A HREF="http://www.dabeaz.com/blog/dablog.html">This </A><A HREF="http://ttimo.vox.com/library/post/python-gil-threading-and-multicore-hardware.html">is </A><A HREF="http://www.grouplens.org/node/244">not a </A><A HREF="http://blog.ianbicking.org/gil-of-doom.html">surprise.</A> CPython has a global interpreter lock <A HREF="http://www.dabeaz.com/python/GIL.pdf">(GIL)</A> which prevents threads from running concurrently. <BR> <BR> Regardless, there are other lessons I learned when multi-threading the <EM>Sieve</EM> algorithm. One is that sharing state between threads may be unavoidable to achieve reasonable performance. In fact, if you <EM>don't</EM> share state, performance can become predictably <EM>worse</EM> as the number of threads of execution increases. <BR> <BR> The other is that locking can have a surprising impact on performance. It's not just the cost of locking per se, but the effect locking has on the distribution of work between the various threads. </P><H2>Sieve of Eratosthenes</H2><P>The Sieve of Eratosthenes is a way to find all the prime numbers smaller than N. 
You start with an array of size N, with all slots initialized to 1 (i.e. to 'true'). <BR>
You start moving down the array, and if the value of the slot at your current position is 'true' (i.e. 1), you set the slots at multiples of your current position index to false (i.e. 0). At each position there is only one transition, from true (1) to false (0), and not vice-versa. The largest multiplier for a position with index i is N/i. You're done when you hit the slot with index equal to the square root of N. Note that as you make your way through the array, you zero out fewer and fewer positions. </P><P>For example, if you want to find all the primes up to 10, you start at position index 2, and zero out 4, 6, 8 and so on. You move to 3, and zero out 6 and 9. Since 3 is the largest index not exceeding the square root of 10, you stop here. (The next non-zero position would have been 5, as 4 was already set to false previously.) All slots still flagged as true (1) are primes: 2, 3, 5 and 7. </P><H2>Testing Platform</H2><P>All tests are performed on a Toshiba A215-S4747 with an AMD Turion 64 X2 (dual-core) processor. The operating system is Ubuntu 8.04.3. I replaced the Python version shipped with Ubuntu with version 2.6.2. </P><P>For the Jython test I installed Jython 2.5, from the Jython web site, in stand-alone mode. IronPython 1.1.1 was installed as well. </P><P>Each implementation I'm going to discuss below is part of <EM>prime_share.py</EM> in the <A HREF="git://github.com/fons/blog-code.git">blog-code package</A> on github. </P><P><EM>prime_share.py</EM> is designed to perform side-by-side comparisons of the various implementations. It allows you to specify a range of threads used in the distribution of the calculation. Timing is done using Python's timeit module, with garbage collection turned off. </P><H2>Various Implementations</H2><P>This section discusses four different multi-threaded implementations of the <EM>Sieve</EM>. 
</P><UL><LI><P>main_orig: From the <A HREF="http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf">tutorial (pdf).</A></P></LI><LI><P>main_nolocks: Drastically reduces locking.</P></LI><LI><P>main_nolocks_alt: Splits the sieve across threads.</P></LI><LI><P>main_nolocks_alt2: Distributes the work more equitably amongst the threads.</P></LI></UL><P>The main difference between the implementations is the amount of locking, and the way work is distributed across multiple threads. </P><P>I'll start out <A HREF="#allresults"> by showing </A> the result of a performance test of each of these algorithms. The x-axis shows the number of threads used in the run. Each run calculates the number of primes up to 10000. The time it took to run this calculation 40 times in succession is shown on the y-axis. Each data set is labelled 'main_xyz' where 'xyz' identifies the implementation. </P><P><A NAME="allresults"><IMG SRC="http://github.com/fons/blog-images/raw/master/thread.png"></A></P><P>You would expect the run time to decrease as the number of threads increases. That's clearly not the case. The performance of the <EM>main_orig</EM> implementation is extremely erratic, and the performance of the <EM>main_nolocks_alt</EM> implementation <EM>decreases</EM> steadily as the number of threads increases.<BR>
The performance of the remaining two implementations is unaffected by the number of threads. </P><P>What's the reason for this? Well, below are four sections discussing these four implementations. </P><H3>main_orig : the original implementation</H3><P>As I mentioned before, the implementation of the <EM>Sieve</EM> as shown in the <A HREF="http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf">tutorial</A> is the starting point for the subsequent implementations. The full code I used in my test is shown below: </P><PRE><CODE>def count_primes(prime) :
    p = reduce(lambda x, y: x + y, prime) - 2
    return p

def dowork_orig(tn): # thread number tn
    global n,prime_global,nexti_global,nextilock,nstarted,nstartedlock,donelock
    donelock[tn].acquire()
    nstartedlock.acquire()
    nstarted += 1
    nstartedlock.release()
    lim = math.sqrt(n)
    while 1:
        nextilock.acquire()
        k = nexti_global
        nexti_global += 1
        nextilock.release()
        if k > lim: break
        if prime_global[k]:
            r = n / k
            for i in range(2,r+1):
                prime_global[i*k] = 0
    donelock[tn].release()

def main_orig(top, nthreads):
    global n,prime_global,nexti_global,nextilock,nstarted,nstartedlock,donelock
    n = int(top)
    nthreads = int(nthreads)
    prime_global = (n+1) * [1]
    nstarted = 0
    nexti_global = 2
    nextilock = thread.allocate_lock()
    nstartedlock = thread.allocate_lock()
    donelock = []
    for i in range(nthreads):
        d = thread.allocate_lock()
        donelock.append(d)
        thread.start_new_thread(dowork_orig,(i,))
    while nstarted < nthreads: pass
    for i in range(nthreads):
        donelock[i].acquire()
    return count_primes(prime_global) </CODE></PRE><P>The variable <EM>n</EM> is the upper limit of the search and <EM>nthreads</EM> is the number of threads. The global variable <EM>prime_global</EM> is the sieve, and <EM>nexti_global</EM> is its index. </P><P>The function <EM>dowork_orig</EM> implements the sieve algorithm along the lines mentioned earlier. </P><P>All the global variables are shared amongst the threads. There are three locking variables.<BR>
The first one is <EM>nstartedlock</EM>, which is used to set a counter. The counter is used in the main routine to wait for all threads to start. </P><P>The second one is the <EM>donelock</EM> array. This is used to implement a mechanism to wait for all the threads to finish, similar to 'join' in other threading packages. The number of entries is equal to the number of threads. </P><P>The last lock is <EM>nextilock</EM>. Its purpose is to protect access to the global index variable so that each thread processes a unique index.<BR>
</P><P>Access to the 'sieve' variable <EM>prime_global</EM> is not protected by a lock. There is no need, since the value of each slot can only go from 1 (true) to 0 (false). If two threads access the same slot simultaneously they can't reach conflicting conclusions about its final state. Obviously, the GIL effectively prevents any of this from happening, but it's good to know that if it didn't, the algorithm wouldn't break. Even out-of-order execution, where a higher index is processed first, is not going to lead to a different final state (provided all indexes are processed). </P><P>As you can see in the <A HREF="#allresults"> side-by-side comparison </A> the performance of this implementation is very erratic, and shows no obvious dependence on the number of threads. </P><P><IMG SRC="http://github.com/fons/blog-images/raw/master/thread_repeat.png"></P><P> Here I show five data sets generated by running the <EM>main_orig</EM> implementation with the same input parameters as in the <A HREF="#allresults"> side-by-side comparison </A> above. Note how the speed varies dramatically in each data set, and between data sets. </P><P>For a single thread the performance of all data sets is roughly the same. The small differences are probably due to other activity on the box taking some time away from the test run. In general the calculation time increases as the number of threads increases, except for the first ('red') data set. </P><P>The reason for the wide variation in performance, I believe, is the combination of <EM>nstartedlock</EM> and <EM>nextilock</EM>. </P><P>Each thread tries to acquire <EM>nstartedlock</EM> at startup. My suspicion is that this lock acquisition varies<BR>
the start of the threads sufficiently to change the way work is balanced between the threads for each subsequent run, leading to the dramatic variation in speed. That's because the amount of work a thread performs depends on the index it's processing. For example, let's say that we have a run with three threads. The first thread is active, and the other two threads are dormant. The first thread will process indices two and three, and therefore do most of the work. When the other threads wake up, they'll do less work. On a second run, a context switch occurs sooner and the second thread works on index three, leading to a more equitable distribution of the work. </P><P>In addition, there is contention on <EM>nextilock</EM>. As I pointed out, for higher index values less work is done. So when the prime check returns false (as it would most of the time for these values), the thread will try to acquire the lock again, which puts it in contention with other threads. The way the lock is hit by a particular thread probably varies a bit from run to run, and this results in the erratic performance profile. </P><P>The main difference between this implementation and the three others discussed below is the removal of this lock, and the lock around the index variable. As you <A HREF="#allresults"> can see </A> this has a dramatic impact on performance. </P><H3>main_nolocks: remove locking; load balance naively</H3><P>In this implementation I've removed the <EM>nstartedlock</EM> as well as the <EM>nextilock</EM> lock. The <EM>donelock</EM> array is kept in place, since the thread package has no 'join' mechanism. </P><P>Here's the complete code for this variation. The other two main_nolocks* versions, which I'm discussing below, are very similar. </P><PRE><CODE>def dowork2(n, nexti_ns, prime_nl) :
    k = nexti_ns[0]
    lim = nexti_ns[1]
    if nexti_ns[0] > nexti_ns[1] :
        raise Exception("boundaries out-of-order")
    while 1 :
        if not (k < lim) : break
        if prime_nl[k] == 1 :
            r = n / k
            for i in range(2, r+1) :
                prime_nl[i*k] = 0
        k = k + 1
    return prime_nl

def dowork_th(tn, donelock, n, nexti_ns) :
    global prime_nl
    prime_nl = dowork2(n, nexti_ns, prime_nl)
    donelock.release()

def load_balance(s, th) :
    len_s = len(s)
    if len_s == 0 :
        return [ s ]
    base = len_s / th
    rem = len_s - base * th
    K = map(lambda i : i * base, range(1, th+1))
    t = range(1, rem + 1 ) + (th - rem )*[rem]
    K = map(lambda p : p[0] + p[1], zip(K, t))
    K = zip([0] + K, K)
    last = s[len_s - 1]
    s.append(last+1)
    K = map(lambda p : (s[p[0]], s[p[1]]), K)
    return K

def start_th(fn, args) :
    return thread.start_new_thread(fn, args)

def main_nolocks(top, nthreads) :
    global prime_nl
    n = int(top)
    nthreads = int(nthreads)
    prime_nl = (n + 1) * [1]
    donelock = map(lambda l : l.acquire() and l,
                   map(lambda i : thread.allocate_lock(), range(nthreads)))
    lim = int(math.sqrt(n)) + 1
    nexti_ns = range(2, lim, 1)
    B = load_balance(nexti_ns, nthreads)
    map(lambda i : start_th(dowork_th, (i, donelock[i], n, B[i])),
        range(nthreads))
    map(lambda i : donelock[i].acquire(), range(nthreads) )
    return count_primes(prime_nl) </CODE></PRE><P>In the original implementation the <EM>nexti_global</EM> variable was shared between the threads and used to balance the load between them. In this version that's replaced by the <EM>load_balance</EM> routine. </P><P><EM>load_balance</EM> takes as its input an array and the number of threads, and returns a list of pairs which represent the start and end points of the sub-arrays each thread will work on. </P><P>For example, if we want to know the number of primes smaller than 101, the indexes we consider in the sieve algorithm would run from 2 to 10. If the number of threads is 3, <EM>load_balance</EM> would return the following array : <CODE>[ (2, 5), (5, 8), (8, 11) ]</CODE> </P><P>This is naive, because the thread working on the first pair will do a lot more work than the one assigned the last pair. </P><P>The difference in performance with the original implementation is <A HREF="#allresults"> quite dramatic. </A></P><P>The big difference in performance between this and the previous version, even for a single thread, must be due to the difference in locking. </P><P>The original implementation acquires a lock at the startup of each thread and uses locks to control access to the global index counter. In this implementation these locks have been removed. The benefit of this removal is quite obvious from the graph. </P><P>The effect of the GIL is also quite obvious: The performance does not depend on the number of threads at all. </P><H3>main_nolocks_alt: remove locking, split the sieve</H3><P>This version is a minor variation of the previous one. Previously the <EM>range</EM> of indices from 2 to sqrt(N) was split up between the threads. </P><P>In this version each thread processes the full index range, but the <EM>sieve array</EM> is split between the threads: One thread works on the first part of the sieve, the second on a second part and so on. 
</P><P>For example, if the size of the sieve is 100, and we have three threads, the first thread works on the sieve up to index 33, the second thread takes indexes 34 to 66, and the third thread takes the rest. </P><P>The <EM>dowork</EM> routine changes a little bit: </P><PRE><CODE>def dowork3(n, irange, prime_nla) :
    k = 2
    lim = int(math.sqrt(n)) + 1
    istart, iend = irange
    while 1 :
        if not (k < lim) : break
        if not (k < iend) : break
        if k < istart :
            s = (istart / k ) + 1
            r = (iend / k) + 1
            for i in range(s, r) :
                prime_nla[i*k] = 0
        elif prime_nla[k] == 1 :
            assert k >= istart and k <= iend
            s = 2
            r = (iend / k) + 1
            for i in range(s, r) :
                prime_nla[i*k] = 0
        k = k + 1
return prime_nla </CODE></PRE><P>If the index is smaller than the starting index of the part of the sieve array under consideration, we can just zero out multiples of the index. If that's not case we proceed as before, but stop at the upper index of the array if it's smaller than sqrt(N). </P><P>If you look <A HREF="#allresults"> at the results </A> , notice that the single-threaded performance of this and the previous version are the same. That's to be expected as both versions perform the same amount of work. Less expected is that the performance of this version decreases steadily as the number of threads increases. </P><P>The reason is that the amount of work performed across all threads (i.e. on the process level) increases as the number of threads increases. That's because threads working on the 'higher' part of the sieve can't take advantage of the primes 'discovered' by the thread working on the first, 'lower' part of the sieve. </P><P>Suppose we start with N = 100, and two threads. The first thread works on the sieve range from 0 to 50. It performs the basic sieve operations for the case N = 50. The second thread basically zeros out multiples of values in the range 2 to 11. This thread can't probe the lower part of the sieve to see if it's working with a prime, so it needs to calculate multiples of every element in the range. The more disconnected pieces, the more extra work is done. </P><P>In the absence of the GIL, the additional work per thread could be balanced by the performance gain from distributing the load across multiple threads. I"ll revisit this when tmultiprocessing module is discussed, as it doesn't use a GIL. </P><H3>main_nolocks_alt2: remove locking, load balance more accurately</H3><P>In this last version I try to distribute the work more evenly amongst the threads. This is in contrast with to just splitting the range in pieces, and assigning each thread a piece. 
This leads to the thread assigned the lowest index range doing most of the work. </P><P>To accomplish a more equitable distribution, I estimate the number of operations per index i. I then assign each thread a set of indices so that the total number of operations per thread varies very little. </P><P>An upper limit for the number of operations for index i for a sieve of length N is : </P><PRE><CODE>1 + N/i - i </CODE></PRE><P>This assumes that the sieve array is not shared amongst the threads. If it is, then we're over-counting the number of operations, since it also counts operations on indices that are multiples of each other. </P><P>For example, when N = 200, for i = 2 the number of operations is 99. For i = 4 that number is in fact 0, as all multiples of four are also multiples of two. </P><P>Here's the code fragment where the load balancing takes place : </P><PRE><CODE>def smp_load_balance(th, n) :
    def operations(t) :
        return int((n / t) + 1 - t)

    def find_min(thr_alloc) :
        min, lst = thr_alloc[0]
        if min == 0 :
            return 0
        midx = 0
        for index in range(1, len(thr_alloc)) :
            count, lst = thr_alloc[index]
            if count < min :
                min = count
                midx = index
        return midx

    lim = int(math.sqrt(n)) + 1
    nexti_lb = range(2, lim, 1)
    if th < 2 :
        return [nexti_lb]
    thr_allocs = map(lambda i : (0, [] ), range(th))
    L = zip(map(operations, nexti_lb), nexti_lb)
    for i in L :
        ops, index = i
        mindex = find_min(thr_allocs)
        cnt, lst = thr_allocs[mindex]
        cnt += ops
        lst.append(index)
        thr_allocs[mindex] = (cnt, lst)
    return map(lambda p: p[1], thr_allocs) </CODE></PRE><P>The distribution between the threads is done as follows: for each index, estimate the number of operations using the equation above. Then assign the index to the thread with the least amount of work already assigned to it. </P><P>For example, for N = 200 the estimated number of operations per index, assuming no sharing, is shown in <A HREF="#smpdist"> this graph </A> . </P><P><A NAME="smpdist"><IMG SRC="http://github.com/fons/blog-images/raw/master/smp_dist.png"></A></P><P>For three threads of execution the distribution of the sieve indices given by <EM>smp_load_balance</EM> is : </P><UL><LI><P>thread 1 : [2, 9, 12, 14], with a total number of operations of 120.</P></LI><LI><P>thread 2 : [3, 6, 8, 11], with a total number of operations of 119.</P></LI><LI><P>thread 3 : [4, 5, 7, 10, 13], with a total number of operations of 123.</P></LI></UL><P>When the sieve is shared, the amount of work remains unbalanced. Consider the example above. Thread 3 is assigned index 4, but - when the sieve is shared - there's no work to be done. </P><P>The <A HREF="#allresults"> timing test </A> shows that there is no difference<BR>
when using naive load balancing as is done in <EM>main_nolocks</EM>. In fact, the performance of both is comparable. </P><P>That's because the total number of operations across all threads of execution remains the same, and the GIL effectively prevents real parallel execution of the threads. So it really doesn't matter how unbalanced the work is. </P><H3>What about Jython and IronPython ?</H3><P>I repeated the runs <A HREF="#allresults"> shown earlier </A> on Jython 2.5. I ran Jython in stand-alone mode. </P><P><IMG SRC="http://github.com/fons/blog-images/raw/master/threads_jython.png"></P><P>Note that all run times are about a factor of three higher than under CPython. The Jython run times are pretty much the same across versions. However, you can still see that the locking used in the original implementation creates a performance hit. In addition, the performance doesn't depend on the number of threads either. </P><P>I attempted to repeat this experiment with IronPython. However, IronPython will not run the timing tests with garbage collection disabled. </P><H2>Conclusions</H2><P>The thread module in Python is not well suited for multi-threading compute-bound algorithms. That's nothing new. </P><P>There are other lessons I learned from the various ways the <EM>Sieve</EM> can be distributed across threads: avoid locking, consider sharing state, and be aware of how much work each thread does. </P><P>Let's start with locking. It's obvious from the <A HREF="#allresults"> results </A> that locking comes at a huge cost. Removing the locks proved to be the single most effective performance booster. </P><P>The <EM>Sieve</EM> algorithm is interesting in that shared state is quite beneficial to performance. Look at the <A HREF="#allresults"><EM>main_nolocks_alt</EM> implementation. </A> That was an attempt to eliminate shared state amongst the threads. Each thread proceeds based on its local data only. 
But because the prime numbers discovered by one of the threads couldn't be shared with the others, they ended up doing more work, and performance decreased. When real parallel processing is possible, this may be balanced out by the performance gain due to better parallelism. That however remains to be seen. </P><P>An alternative approach is to remove the global sieve variable in <EM>main_nolocks_alt</EM>. Each thread would work independently on a local sieve array, resulting in a partially processed sieve. The 'partial' sieves need to be combined to get the final sieve array. </P><P>One way to think of this is as a 'map' operation over the inputs, where each 'map' operation works independently. The results of the mapping phase are then combined (i.e. 'reduced') to get the final result, hence <A HREF="http://www.cs.vu.nl/~ralf/MapReduce/">map-reduce</A>. </P><P>The amount of work for each index in the <EM>Sieve</EM> algorithm varies quite a bit. So care should be taken to make sure that the work load is properly balanced amongst the threads. The effect of the (lack of) balance wasn't so obvious here, since the GIL limits parallelism.<BR>
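</P><P>As a toy illustration of the map-reduce view described above (my own sketch, not code from the post): each 'map' step crosses out multiples of its share of the indices in a private sieve, and the 'reduce' step combines the partial sieves element-wise.</P>

```python
import math

def partial_sieve(n, ks):
    # 'map' step: cross out multiples of each index in ks
    sieve = [1] * (n + 1)
    for k in ks:
        for m in range(2 * k, n + 1, k):
            sieve[m] = 0
    return sieve

def combine(a, b):
    # 'reduce' step: an entry survives only if no worker crossed it out
    return [x & y for x, y in zip(a, b)]

n = 100
lim = int(math.sqrt(n)) + 1              # 11, so indices run from 2 to 10
parts = [partial_sieve(n, ks) for ks in (range(2, 7), range(7, lim))]
sieve = combine(parts[0], parts[1])
primes = [i for i in range(2, n + 1) if sieve[i] == 1]
# primes now holds the 25 primes below 100
```
<P>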
</P><P>These last two points are going to play a part in my next post, where I'll be using the <EM>Sieve</EM> to explore the multiprocessing module. </P> ]]></content:encoded>
<guid> http://fons.github.com/threading-the-sieve-in-python.html </guid>
<pubDate> Sat, 12 Sep 2009 17:03:23 EST </pubDate>
</item>
<item>
<title> Simplified command line processing with dyn-options.py </title>
<link> http://fons.github.com/simplified-command-line-processing-with-dyn-optionspy.html </link>
<description> <P>Am I the only one in the world who feels that using Python's <EM>getopt</EM> is a bit of a struggle ? It involves a lot of boilerplate. Tedious refactoring is required each time you add or change an option. This is not specific to Python, as most languages have a similar facility to parse the command line, which is similarly annoying. <BR> <BR> I decided to create an easier way to process command line options, by transforming the command line into an immutable (read-only) object. The result is <A HREF="http://github.com/fons/dyn_options/tree/master">dyn_options</A>. <BR> <BR> <A HREF="http://github.com/fons/dyn_options/tree/master">dyn_options</A> considers every string on the command line which starts with either - or -- (i.e. a single or double dash) an option flag. The value of the option flag is a concatenation of everything that follows it, until the next flag is encountered. A simple option flag is one without explicit values and is considered a boolean flag, set to <EM>True</EM>. <A HREF="http://github.com/fons/dyn_options/tree/master">dyn_options</A> creates a read-only object, with attributes and values set to the command line option flags and values respectively. <BR> <BR> So, '--opt4 hello world' will be converted to an option flag called <EM>opt4</EM>, with a value of <EM>hello world</EM>. This makes dealing with spaces on the command line a lot easier. </P> </description>
<content:encoded><![CDATA[<P>Am I the only one in the world who feels that using Python's <EM>getopt</EM> is a bit of a struggle ? It involves a lot of boilerplate. Tedious refactoring is required each time you add or change an option. This is not specific to Python, as most languages have a similar facility to parse the command line, which is similarly annoying. <BR> <BR> I decided to create an easier way to process command line options, by transforming the command line into an immutable (read-only) object. The result is <A HREF="http://github.com/fons/dyn_options/tree/master">dyn_options</A>. <BR> <BR> <A HREF="http://github.com/fons/dyn_options/tree/master">dyn_options</A> considers every string on the command line which starts with either - or -- (i.e. a single or double dash) an option flag. The value of the option flag is a concatenation of everything that follows it, until the next flag is encountered. A simple option flag is one without explicit values and is considered a boolean flag, set to <EM>True</EM>. <A HREF="http://github.com/fons/dyn_options/tree/master">dyn_options</A> creates a read-only object, with attributes and values set to the command line option flags and values respectively. <BR> <BR> So, '--opt4 hello world' will be converted to an option flag called <EM>opt4</EM>, with a value of <EM>hello world</EM>. This makes dealing with spaces on the command line a lot easier. </P><H3>Using <A HREF="http://github.com/fons/dyn_options/tree/master">dyn_options</A></H3><P>Here is how an option object is created : </P><PRE><CODE> import dyn_options
option = dyn_options.create_option(argv, option_defaults())
</CODE></PRE><P>If you have defaults, <EM>option_defaults()</EM> should return a dictionary of key-value pairs, with the key corresponding to the option, and its value to the desired default value. </P><P>An easy way to check whether an option is set is to do something like : </P><PRE><CODE> if option.some_flag :
     do_something
     .......
</CODE></PRE><H3>A few examples</H3><P> You can play around with <EM> example.py</EM> to test how various options are handled. Here's the source: </P> <PRE><CODE> #!/usr/bin/env python
import sys
import dyn_options

def option_defaults() :
    return dict( [("opt1", "opt1_default"), ("help", False)])

def main(argv) :
    option = dyn_options.create_option(argv, option_defaults())
    print "using defaults :", option
    option = dyn_options.create_option(argv)
    print "no defaults :", option
    if option.opt4 :
        print "opt4 is set :", option.opt4
    else :
        print "opt4 is not set"

if __name__ == '__main__':
    sys.exit(main(sys.argv))
</CODE></PRE><P>I create two different <EM>option</EM> objects. The first one has defaults; The second one doesn't. The output for </P><PRE><CODE> ./example.py --opt2 --opt4 hello world </CODE></PRE><P>is this: </P><PRE><CODE> using defaults : options :
#) help ==> False
#) program ==> ./example.py
#) opt4 ==> hello world
#) opt1 ==> opt1_default
#) opt2 ==> True
no defaults : options :
#) program ==> ./example.py
#) opt4 ==> hello world
#) opt2 ==> True
opt4 is set : hello world </CODE></PRE><P>When <EM> option</EM> is initialized with the dictionary returned by <EM> option_defaults()</EM> ,<BR> <EM> opt1</EM> is set to the default value specified for it in the dictionary. In the second case, when no defaults are supplied, it's not set. </P> <P> Here's the output for : <CODE> ./example.py --opt1 new_value</CODE> </P> <PRE><CODE> using defaults : options :
#) help ==> False
#) program ==> ./example.py
#) opt1 ==> new_value
no defaults : options :
#) program ==> ./example.py
#) opt1 ==> new_value
opt4 is not set </CODE></PRE><P>As you can see, the value of <EM>opt1</EM> is now the one provided on the command line, rather than the default. <BR> <BR><BR>
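</P><P>The parsing rule described at the top of this post - a token starting with - or -- opens a flag, everything up to the next flag is concatenated into its value, and a bare flag reads as <EM>True</EM> - can be sketched in a few lines. This is a reconstruction for illustration, not the actual <A HREF="http://github.com/fons/dyn_options/tree/master">dyn_options</A> source:</P>

```python
def parse_argv(argv):
    # sketch of the parsing rule described in the post; not the real
    # dyn_options implementation
    opts = {"program": argv[0]}
    flag = None
    for token in argv[1:]:
        if token.startswith("-"):
            flag = token.lstrip("-")
            opts[flag] = True              # a bare flag reads as True
        elif flag is not None:
            val = opts[flag]
            # concatenate everything that follows until the next flag
            opts[flag] = token if val is True else val + " " + token
    return opts

# parse_argv(["./example.py", "--opt2", "--opt4", "hello", "world"])
# -> {"program": "./example.py", "opt2": True, "opt4": "hello world"}
```
<P>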
</P><H3>Immutable/Read-Only</H3><P>This is one of the tests in <EM>dyn_options_test.py</EM>. It verifies that the <EM>option</EM> remains unchanged after it's been created. </P> <PRE><CODE> def test4() :
"""
option is immutable
"""
L=['./dyn_options.py', '--opt1', 'opt1_value', '-opt2', 'opt2_value', '-opt3']
option = dyn_options.create_option(L, option_defaults())
try :
assert option.opt1 == "opt1_value"
assert option.opt2 == "opt2_value"
assert option.opt3 == True
assert option.help == False
#Try to override...
option.help = True
assert option.help == False
option.opt2 == "new_opt2_value"
assert option.opt2 == "opt2_value"
#Try to add new attribute
option.opt55 = "opt55_value"
assert option.opt55 == False
except AssertionError :
traceback.print_exc()
print "Failed test4 : parsing ", str(L)
print "generated : ", option
print "internals : ", option.__repr__()
return -1
print "pass test4"
    return 0 </CODE></PRE><P> You can't set additional attributes, nor override existing ones. This seems reasonable to me. Once your options are set, they should remain so. Notice that I don't throw an exception when you try to override the value of option <EM>opt2</EM>. <BR> <BR><BR>
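</P><P>Here is a minimal sketch of how such a read-only object can be built by overriding the attribute hooks; this is my reconstruction, and the actual dyn_options internals may differ:</P>

```python
class FrozenOptions(object):
    # hypothetical sketch of a read-only options object; not the
    # actual dyn_options implementation
    def __init__(self, d):
        # seed the internal dictionary directly, bypassing __setattr__
        self.__dict__.update(d)

    def __getattr__(self, name):
        # only called for *missing* attributes: report them as False
        return False

    def __setattr__(self, name, value):
        # silently ignore writes, keeping the object read-only
        pass

opt = FrozenOptions({"opt1": "opt1_value", "help": False})
opt.opt1 = "other"                 # silently ignored
assert opt.opt1 == "opt1_value"
assert opt.missing is False        # unknown flags read as False
```
<P>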
</P> <H3> Internals</H3><P> How does all of this work ? Very simple : The command line is converted to a dictionary, which in turn is used to initialize the internal dictionary <STRONG>__dict__</STRONG> of the <EM>options</EM> object. I also override the <STRONG>__getattr__</STRONG> and <STRONG>__setattr__</STRONG> methods. Those are used to 'get' and 'set' elements of the internal dictionary, and need to be overridden to make the object read-only. </P> <P> Enjoy. </P> ]]></content:encoded>
<guid> http://fons.github.com/simplified-command-line-processing-with-dyn-optionspy.html </guid>
<pubDate> Sat, 29 Aug 2009 19:15:49 EST </pubDate>
</item>
<item>
<title> Factorials, Tail Recursion and CPS ... in C </title>
<link> http://fons.github.com/factorials-tail-recursion-and-cps--in-c.html </link>
<description> <P>Recursive algorithms are elegant. However, if the recursion is not a <A HREF="http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-453.pdf">tail call</A> the growth of the stack leads to a stack overflow. </P><P>Tail call recursion is a technique whereby the last call in a recursive function does not depend on the variables pushed on the stack. In other words the function returns the value of its additional (recursive) call. </P><P>Functional languages like Haskell or Lisp are designed to support the use of tail recursive algorithms. The JVM - although now the target platform of a Lisp like <A HREF="http://www.clojure.org">clojure</A> or a hybrid functional language like <A HREF="http://www.scala-lang.org">scala</A> - <A HREF="http://blogs.sun.com/jrose/entry/tail_calls_in_the_vm">does not support tail recursion at all</A>. In C/C++ the compiler can in fact replace tail recursive calls with a simple loop, thereby eliminating the allocation of additional stack frames altogether. In this post I'll consider various implementations of the humble factorial to illustrate some of these things. </P> </description>
<content:encoded><![CDATA[<P>Recursive algorithms are elegant. However, if the recursion is not a <A HREF="http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-453.pdf">tail call</A> the growth of the stack leads to a stack overflow. </P><P>Tail call recursion is a technique whereby the last call in a recursive function does not depend on the variables pushed on the stack. In other words the function returns the value of its additional (recursive) call. </P><P>Functional languages like Haskell or Lisp are designed to support the use of tail recursive algorithms. The JVM - although now the target platform of a Lisp like <A HREF="http://www.clojure.org">clojure</A> or a hybrid functional language like <A HREF="http://www.scala-lang.org">scala</A> - <A HREF="http://blogs.sun.com/jrose/entry/tail_calls_in_the_vm">does not support tail recursion at all</A>. In C/C++ the compiler can in fact replace tail recursive calls with a simple loop, thereby eliminating the allocation of additional stack frames altogether. In this post I'll consider various implementations of the humble factorial to illustrate some of these things. </P><P><BR> The factorial of an integer n is defined as: </P><PRE><CODE>n! = n * (n-1)!
1! = 1
</CODE></PRE><P>Here's a straightforward implementation using a while loop : </P><PRE><CODE>fact = 1;
while (n > 0) {
    fact *= n;
    n--;
}
</CODE></PRE><P>A naive recursive implementation looks like this : </P><PRE><CODE> int fact(int n) {
    if (n == 1) return 1;
    return n * fact(n-1);
} </CODE></PRE><P><EM>fact</EM> cannot return a value until all previous values have been popped off the stack, causing the stack to grow. A tail recursive version of the same algorithm uses an accumulator to remove the multiplication on the last line : </P><PRE><CODE>int fact(int n, int accum) {
    if (n == 1) return accum;
    return fact(n-1, n*accum);
}
</CODE></PRE><P>When discussing recursion the <A HREF="http://en.wikipedia.org/wiki/Fixed_point_combinator">y-combinator</A> cannot be far behind. </P><P><A HREF="www.dreamsongs.com/Files/WhyOfY.pdf">This document</A> discusses the reduction of a factorial in scheme to an application of the y-combinator. When I read this, a third possibility of calculating the factorial sprang to mind: pass the function as part of the recursion. This is very much like a <A HREF="http://library.readscheme.org/page6.html">CPS transformation</A>: each subsequent call is passed a continuation of the previous calculation. </P><P>C doesn't support continuations. The role of the continuation is played by the stack, but that is not accessible as a first-class object in C. </P><P>A first approach is to rewrite the previous expressions as : </P><PRE><CODE>int fact(Func f, int n, int accum) {
    if (n == 1) return accum;
    return f(f, n-1, n*accum);
} </CODE></PRE><P>What's the signature of f ? It's a function which takes as its first argument a function, which takes as its first argument a function, which.... In other words we now need to explicitly type the recursion, which is not possible in C. Never mind then ? Well, we can now take advantage of C's lack of typing and change the implementation just a little: </P><PRE><CODE> int fact(void* f, int val, int acc) {
    if (val == 1) return acc;
    Func* fptr = (Func *) f;
    return fptr((void *) f, val-1, acc*val);
} </CODE></PRE><P>This does the trick. </P><HR><P>The file <EM>factorial.c</EM> in <A HREF="http://github.com/fons/c_fact/tree/master">c_fact</A> contains all three implementations discussed above, as well as the 'loop' based calculation. I'm compiling on a 64 bit machine (AMD Turion dual core), running Ubuntu. </P><P>It's instructive to look at the assembly code generated by the compiler for each implementation, at various levels of optimization. </P><P>First, I compile a non-optimized debug version, like so : </P><PRE><CODE>gcc -m32 -g -c -o factorial.o factorial.c
gcc -m32 factorial.o -o factorial
objdump -D ./factorial </CODE></PRE><P>The -m32 will force gcc to generate 32 bit code. The 64 bit code is not going to be materially different. <STRONG>objdump -D</STRONG> is used to disassemble the resulting executable. </P><P>Here is the dump for the first variation of the factorial function : </P><PRE><CODE> 0804845e <factorial>:
804845e: 55 push %ebp
804845f: 89 e5 mov %esp,%ebp
8048461: 83 ec 08 sub $0x8,%esp
8048464: 83 7d 08 00 cmpl $0x0,0x8(%ebp)
8048468: 75 09 jne 8048473 <factorial+0x15>
804846a: c7 45 fc 01 00 00 00 movl $0x1,-0x4(%ebp)
8048471: eb 17 jmp 804848a <factorial+0x2c>
8048473: 8b 45 08 mov 0x8(%ebp),%eax
8048476: 83 e8 01 sub $0x1,%eax
8048479: 89 04 24 mov %eax,(%esp)
804847c: e8 dd ff ff ff call 804845e <factorial>
8048481: 89 c2 mov %eax,%edx
8048483: 0f af 55 08 imul 0x8(%ebp),%edx
8048487: 89 55 fc mov %edx,-0x4(%ebp)
804848a: 8b 45 fc mov -0x4(%ebp),%eax
804848d: c9 leave
804848e: c3 ret
</CODE></PRE><P>Notice that in position <EM>804847c</EM> the factorial function calls itself again. </P><P>The growth of the stack is quite obvious in the debugger: </P><PRE><CODE>#0 factorial (val=3) at factorial.c:23
#1 0x08048481 in factorial (val=4) at factorial.c:24
#2 0x08048481 in factorial (val=5) at factorial.c:24
#3 0x08048481 in factorial (val=6) at factorial.c:24
#4 0x08048481 in factorial (val=7) at factorial.c:24
#5 0x08048481 in factorial (val=8) at factorial.c:24
#6 0x08048481 in factorial (val=9) at factorial.c:24
#7 0x08048481 in factorial (val=10) at factorial.c:24
#8 0x0804861e in main (argc=2, argv=0xfffe8364) at factorial.c:66 </CODE></PRE><P>The assembly code generated for the other two implementations is not substantially different. </P><PRE><CODE>0804848f <factorial2>:
804848f: 55 push %ebp
8048490: 89 e5 mov %esp,%ebp
8048492: 83 ec 0c sub $0xc,%esp
8048495: 83 7d 08 01 cmpl $0x1,0x8(%ebp)
8048499: 75 08 jne 80484a3 <factorial2+0x14>
804849b: 8b 45 0c mov 0xc(%ebp),%eax
804849e: 89 45 fc mov %eax,-0x4(%ebp)
80484a1: eb 1c jmp 80484bf <factorial2+0x30>
80484a3: 8b 45 08 mov 0x8(%ebp),%eax
80484a6: 0f af 45 0c imul 0xc(%ebp),%eax
80484aa: 8b 55 08 mov 0x8(%ebp),%edx
80484ad: 83 ea 01 sub $0x1,%edx
80484b0: 89 44 24 04 mov %eax,0x4(%esp)
80484b4: 89 14 24 mov %edx,(%esp)
80484b7: e8 d3 ff ff ff call 804848f <factorial2>
80484bc: 89 45 fc mov %eax,-0x4(%ebp)
80484bf: 8b 45 fc mov -0x4(%ebp),%eax
80484c2: c9 leave
80484c3: c3 ret
080484c4 <factorial3>:
80484c4: 55 push %ebp
80484c5: 89 e5 mov %esp,%ebp
80484c7: 83 ec 28 sub $0x28,%esp
80484ca: 83 7d 0c 01 cmpl $0x1,0xc(%ebp)
80484ce: 75 08 jne 80484d8 <factorial3+0x14>
80484d0: 8b 45 10 mov 0x10(%ebp),%eax
80484d3: 89 45 ec mov %eax,-0x14(%ebp)
80484d6: eb 29 jmp 8048501 <factorial3+0x3d>