More old posts and a new one

commit 295a87b6ac880ad6b3a7df3992c4f7017f9219dd 1 parent 9a7c1d3
@mitsuhiko authored
346 2007/5/21/getting-started-with-wsgi.rst
@@ -0,0 +1,346 @@
+public: yes
+tags: [wsgi, python]
+summary: |
+ A friendly introduction to getting started with WSGI on Python.
+
+Getting Started with WSGI
+=========================
+
+I finally finished the written matura and have some more time to work on
+projects and write articles. One of the things I wanted to write for a
+long time is a WSGI tutorial that does not require a specific framework
+or implementation. So here we go.
+
+.. image:: http://dev.pocoo.org/~mitsuhiko/wsgi-snake.png
+ :alt: Getting started with WSGI
+
+What's WSGI?
+~~~~~~~~~~~~
+
+Basically WSGI is lower level than CGI, which you probably know. But
+unlike CGI, WSGI scales and works in both multithreaded and
+multiprocess environments, because it's a specification that doesn't
+mind how it's implemented. In fact WSGI is not CGI, because it sits
+between your web application and the webserver layer, which can be CGI,
+mod_python, FastCGI or a webserver that implements WSGI in the core,
+like the Python stdlib standalone WSGI server called wsgiref.
+
+WSGI is specified in `PEP 333
+<http://www.python.org/dev/peps/pep-0333/>`_ and adopted by various
+frameworks, including the well known Django and Pylons.
+
+If you are too lazy to read PEP 333, here's a short summary:
+
+* WSGI applications are callable Python objects (functions or classes
+ with a `__call__` method) that are passed two arguments: a WSGI
+ environment as first argument and a function that starts the response.
+* The application has to start a response using the function provided
+ and return an iterable where each yielded item means writing and
+ flushing.
+* The WSGI environment is like a CGI environment, just with some
+ additional keys that are provided either by the server or by a
+ middleware.
+* You can add middlewares to your application by wrapping it.
+
+Because that's a lot of information let's ignore it for now and have a
+look at a basic WSGI application:
+
+Extended Hello World
+~~~~~~~~~~~~~~~~~~~~
+
+Here is a simple, but not too simple, example of a WSGI application
+that says `Hello World!`, where World can be specified via a URL
+parameter.
+
+.. sourcecode:: python
+
+ from cgi import parse_qs, escape
+
+ def hello_world(environ, start_response):
+     parameters = parse_qs(environ.get('QUERY_STRING', ''))
+     if 'subject' in parameters:
+         subject = escape(parameters['subject'][0])
+     else:
+         subject = 'World'
+     start_response('200 OK', [('Content-Type', 'text/html')])
+     return ['''<html>
+ <head><title>Hello %(subject)s</title></head>
+ <body><p>Hello %(subject)s!</p></body>
+ </html>''' % {'subject': subject}]
+
+As you can see, the `start_response` function takes two arguments: a
+status string and a list of tuples that represent the response headers.
+What you cannot see, because it's not used here or anywhere else, is
+that the `start_response` function returns something. It returns a
+`write` function that writes directly to the webserver's output stream.
+Because it bypasses middlewares (we'll cover those later) it's a
+terribly bad idea to use that function. For debugging purposes however
+it can be useful.
+
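+For illustration only, here is a minimal sketch of how that `write`
+callable could be used (don't do this in real applications):
+
+.. sourcecode:: python
+
+ def hello_world_write(environ, start_response):
+     # start_response returns a write callable that pushes data
+     # directly to the server, bypassing the middleware chain
+     write = start_response('200 OK', [('Content-Type', 'text/plain')])
+     write('Hello World!\n')
+     return []
+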
+But how do we start that application now? A webserver doesn't know how
+to handle it, and neither does Python, because nothing calls that
+function. Because we're lazy we won't set up a server with WSGI support
+now, but use the `wsgiref` WSGI standalone server bundled with Python
+2.5 and higher. (You can also download it for Python 2.3 or 2.4.)
+
+Just add this to your file:
+
+.. sourcecode:: python
+
+ if __name__ == '__main__':
+     from wsgiref.simple_server import make_server
+     srv = make_server('localhost', 8080, hello_world)
+     srv.serve_forever()
+
+When you now run the file you should get `Hello John!` at
+`http://localhost:8080/?subject=John`.
+
+Path Dispatching
+~~~~~~~~~~~~~~~~
+
+You probably worked with CGI or PHP before. If you did, you know that
+most of the time you have multiple public files (`.pl` / `.php`) a user
+can access and that do something. Not so in WSGI: there you only have
+one file which consumes all paths. Thus, if you still have the server
+from the previous example running, you should get the same content on
+`http://localhost:8080/foo?subject=John`.
+
+The accessed path is saved in the `PATH_INFO` variable in the WSGI
+environment, the real path to the application in `SCRIPT_NAME`. In case
+of the development server `SCRIPT_NAME` will be empty, but if you have a
+wiki that is mounted on `http://example.com/wiki`, the `SCRIPT_NAME`
+variable would be `/wiki`. This information can now be used to serve
+multiple independent pages with nice URLs.
+
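+For example, for a request to `http://example.com/wiki/Main_Page` with
+the wiki mounted at `/wiki`, the environment would contain roughly this
+(a sketch; the exact values depend on the server):
+
+.. sourcecode:: python
+
+ environ['SCRIPT_NAME']   # '/wiki'      -> where the app is mounted
+ environ['PATH_INFO']     # '/Main_Page' -> the path below the mount
+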
+In this example we have a bunch of regular expressions and match the
+current request against them:
+
+.. sourcecode:: python
+
+ import re
+ from cgi import escape
+
+ def index(environ, start_response):
+     """This function will be mounted on "/" and display a link
+     to the hello world page."""
+     start_response('200 OK', [('Content-Type', 'text/html')])
+     return ['''<html>
+ <head><title>Hello World Application</title></head>
+ <body>
+  <p>This is the Hello World application:</p>
+  <p><a href="hello/">continue</a></p>
+ </body>
+ </html>''']
+
+ def hello(environ, start_response):
+     """Like the example above, but it uses the name specified in the
+     URL."""
+     # get the name from the url if it was specified there.
+     args = environ['myapp.url_args']
+     if args:
+         subject = escape(args[0])
+     else:
+         subject = 'World'
+     start_response('200 OK', [('Content-Type', 'text/html')])
+     return ['''<html>
+ <head><title>Hello %(subject)s</title></head>
+ <body><p>Hello %(subject)s!</p></body>
+ </html>''' % {'subject': subject}]
+
+ def not_found(environ, start_response):
+     """Called if no URL matches."""
+     start_response('404 NOT FOUND', [('Content-Type', 'text/plain')])
+     return ['Not Found']
+
+ # map urls to functions
+ urls = [
+     (r'^$', index),
+     (r'hello/?$', hello),
+     (r'hello/(.+)$', hello)
+ ]
+
+ def application(environ, start_response):
+     """
+     The main WSGI application. Dispatch the current request to
+     the functions from above and store the regular expression
+     captures in the WSGI environment as `myapp.url_args` so that
+     the functions from above can access the url placeholders.
+
+     If nothing matches call the `not_found` function.
+     """
+     path = environ.get('PATH_INFO', '').lstrip('/')
+     for regex, callback in urls:
+         match = re.search(regex, path)
+         if match is not None:
+             environ['myapp.url_args'] = match.groups()
+             return callback(environ, start_response)
+     return not_found(environ, start_response)
+
+Now that's a bunch of code, but you should get the idea of how URL
+dispatching works. Basically, if you now visit
+`http://localhost:8080/hello/John` you should get the same as above, but
+with a nicer URL, and an error 404 page if you enter a wrong URL. Now
+you could improve this further by encapsulating `environ` in a request
+object and replacing the `start_response` call and the return iterator
+with a response object. This is also what WSGI libraries like `Werkzeug
+<http://werkzeug.pocoo.org/>`_ and `Paste
+<http://www.pythonpaste.org/>`_ do.
+
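+A minimal sketch of what such an encapsulation could look like (the
+class here is made up for illustration; it's not the actual Werkzeug or
+Paste API):
+
+.. sourcecode:: python
+
+ class Response(object):
+     """Collects status, headers and body and acts as WSGI app."""
+
+     def __init__(self, body, status='200 OK',
+                  content_type='text/html'):
+         self.body = body
+         self.status = status
+         self.headers = [('Content-Type', content_type)]
+
+     def __call__(self, environ, start_response):
+         start_response(self.status, self.headers)
+         return [self.body]
+
+ def hello(environ, start_response):
+     return Response('<p>Hello World!</p>')(environ, start_response)
+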
+By adding something to the environment we did something middlewares
+normally do. So let's try to write one that catches exceptions and
+renders them in the browser:
+
+.. sourcecode:: python
+
+ # import the helper functions we need to get and render tracebacks
+ from sys import exc_info
+ from traceback import format_tb
+
+ class ExceptionMiddleware(object):
+     """The middleware we use."""
+
+     def __init__(self, app):
+         self.app = app
+
+     def __call__(self, environ, start_response):
+         """Call the application and catch exceptions."""
+         appiter = None
+         # just call the application and send the output back
+         # unchanged but catch exceptions
+         try:
+             appiter = self.app(environ, start_response)
+             for item in appiter:
+                 yield item
+         # if an exception occurs we get the exception information
+         # and prepare a traceback we can render
+         except:
+             e_type, e_value, tb = exc_info()
+             traceback = ['Traceback (most recent call last):']
+             traceback += format_tb(tb)
+             traceback.append('%s: %s' % (e_type.__name__, e_value))
+             # we might not have started a response by now. try to
+             # start one with status code 500 or ignore a raised
+             # exception if the application already started one.
+             try:
+                 start_response('500 INTERNAL SERVER ERROR', [
+                     ('Content-Type', 'text/plain')])
+             except:
+                 pass
+             yield '\n'.join(traceback)
+
+         # wsgi applications might have a close function. If it exists
+         # it *must* be called.
+         if hasattr(appiter, 'close'):
+             appiter.close()
+
+So how can we use that middleware now? If our WSGI application is called
+`application` like in the previous example, all we have to do is wrap
+it:
+
+.. sourcecode:: python
+
+ application = ExceptionMiddleware(application)
+
+Now all occurring exceptions will be caught and displayed in the
+browser. Of course you don't have to write that yourself, because there
+are many libraries that do exactly that, with more features.
+
+Deployment
+~~~~~~~~~~
+
+Now that the application is "finished", it must be installed on the
+production server somehow. You can of course use wsgiref behind
+mod_proxy, but there are also more sophisticated solutions available.
+Many people, for example, prefer running WSGI applications on top of
+FastCGI. If you have `flup <http://trac.saddi.com/flup>`_ installed, all
+you have to do is define a `myapplication.fcgi` with this code in it:
+
+.. sourcecode:: python
+
+ #!/usr/bin/python
+ from flup.server.fcgi import WSGIServer
+ from myapplication import application
+ WSGIServer(application).run()
+
+The apache config then could look like this:
+
+.. sourcecode:: apache
+
+ <VirtualHost *>
+     ServerName www.example.com
+     Alias /public /path/to/the/static/files
+     ScriptAlias / /path/to/myapplication.fcgi/
+ </VirtualHost>
+
+As you can see, there is also a clause for static files. If you are in
+development mode and want to serve static files from your WSGI
+application, there are a couple of middlewares available (Werkzeug and
+Paste, as well as "static" from Luke Arno's tools, provide one).
+
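+As a sketch, this is roughly how you would wrap the application with
+Werkzeug's `SharedDataMiddleware` (the exact import path may differ
+between Werkzeug versions):
+
+.. sourcecode:: python
+
+ from werkzeug import SharedDataMiddleware
+
+ application = SharedDataMiddleware(application, {
+     '/public': '/path/to/the/static/files'
+ })
+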
+NIH / DRY
+~~~~~~~~~
+
+Avoid the "Not Invented Here" problem and don't repeat yourself. Use
+the libraries that exist and their utilities! But there are so many!
+Which one to use? Here are my suggestions:
+
+Frameworks
+^^^^^^^^^^
+
+Since Ruby on Rails appeared on the web, everybody has been talking
+about frameworks. Python has two major ones too: one that abstracts
+things very much, called `Django <http://www.djangoproject.com/>`_, and
+another that is much closer to WSGI, called `Pylons
+<http://www.pylonshq.com/>`_. Django is an awesome framework, but only
+as long as you don't want to distribute your application; it's great if
+you have to create a web page in no time. Pylons, on the other hand,
+requires more developer interaction, and your applications are a lot
+easier to deploy.
+
+There are other frameworks too but **my** experiences with them are quite
+bad or the community is too small.
+
+Utility Libraries
+^^^^^^^^^^^^^^^^^
+
+For many situations you don't want a full blown framework, either
+because it's too big for your application or because your application
+is too complex to solve with a framework. (You can build any
+application with a framework, but it could be that the way you have to
+solve it is a lot more complex than without the "help" of the
+framework.)
+
+For that some utility libraries exist:
+
+* `Paste <http://www.pythonpaste.org/>`_ — used by pylons behind the scenes.
+ Implements request and response objects. Ships many middlewares.
+
+* `Werkzeug <http://werkzeug.pocoo.org/>`_ — minimal WSGI library we wrote
+ for pocoo. Ships unicode-aware request and response objects as well as
+ an advanced URL mapper and an interactive debugger.
+
+* `Luke Arno's WSGI helpers <http://lukearno.com/projects/>`_ —
+ various WSGI helpers in independent modules by Luke Arno.
+
+There are also many middlewares out there. Just look for them at
+the `Cheeseshop
+<http://cheeseshop.python.org/pypi?:action=search&term=wsgi>`_.
+
+Template Engines
+^^^^^^^^^^^^^^^^
+
+Here is a list of template engines I often use and recommend:
+
+
+* `Genshi <http://genshi.edgewall.org/>`_ — the world's best XML template
+ engine. But quite slow, so if you need really good performance you have
+ to go with something else.
+
+* `Mako <http://www.makotemplates.org/>`_ — stupidly fast text based
+ template engine. It's a mix of ERB, Mason and Django templates.
+
+* `Jinja2 <http://jinja.pocoo.org/>`_ — sandboxed, designer friendly and
+ quite fast, text based template engine. Of course my personal choice :D
+
+Conclusion
+~~~~~~~~~~
+
+WSGI rocks. You can simply create your own personal stack. If you think
+it's too complicated, have a look at Werkzeug and Paste; they make
+things a lot easier without limiting you.
+
+I hope this article was useful.
108 2008/1/1/python-template-engine-comparison.rst
@@ -0,0 +1,108 @@
+public: yes
+tags: [python, jinja, mako, genshi]
+summary: |
+ A comparison of three of the most popular template engines for Python
+ and why they are different.
+
+Python Template Engine Comparison
+=================================
+
+I was small-talking with `zzzeek <http://techspot.zzzeek.org/>`_ about
+some things when I told him that I use Jinja, Genshi and Mako,
+depending on what I'm doing. He told me that it's unusual to switch
+tools like that, but I don't think it's that unusual.
+
+All three template engines are totally different but have one thing in
+common: all three are the "second generation" of template engines.
+Genshi is the formal successor of Kid, Mako somewhat replaced Myghty,
+and Jinja was inspired by the Django templates. All three of them are
+framework agnostic, use unicode internally and have a cool API you can
+use in WSGI applications without scratching your head. But what
+inspired those template engines, and which template engine should you
+choose for which situation?
+
+I often used PHP in the past to do simple header/footer inclusion. But
+what always drove me nuts was that I had to use mod_rewrite to get nice
+URLs, or use a bunch of folders with index.php files, or use files and
+folders and drop the extension in the apache config. While this is
+nice, it is not that portable, you can't have dynamic parts in the URL,
+and once you want some more dynamic stuff such as RSS feeds etc. you
+notice that you made a mistake by choosing PHP. Some days ago I started
+working on the website for TextPress (not yet online) and wanted to try
+something new: I wrote a tiny WSGI application (about 50 lines of code)
+that just uses werkzeug's routing system and uses template names as
+endpoints. These templates are then loaded with Mako, rendered and
+returned as responses. This is not possible in the same way with Jinja
+because you don't have python blocks, and it's not so simple and
+straightforward with Genshi because you have to think about XML or use
+a rather limited text based template engine. Another very cool feature
+of Mako is that you can do dynamic inheritance, which is not possible
+in Jinja.
+
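+A rough sketch of that idea (treat this as an illustration: the exact
+Werkzeug and Mako APIs may differ slightly between versions, and the
+template names are made up):
+
+.. sourcecode:: python
+
+ from mako.lookup import TemplateLookup
+ from werkzeug.routing import Map, Rule
+ from werkzeug.exceptions import HTTPException
+
+ lookup = TemplateLookup(directories=['templates'])
+ url_map = Map([
+     Rule('/', endpoint='index.html'),
+     Rule('/about', endpoint='about.html')
+ ])
+
+ def application(environ, start_response):
+     urls = url_map.bind_to_environ(environ)
+     try:
+         # the endpoint is simply the name of a Mako template
+         endpoint, args = urls.match()
+         body = lookup.get_template(endpoint).render(**args)
+     except HTTPException, e:
+         return e(environ, start_response)
+     start_response('200 OK', [('Content-Type', 'text/html')])
+     return [body]
+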
+Mako is a great template engine if you know Python, if you need some
+logic in templates (and you know: logic in templates is not bad in every
+situation) and if you need the best performance. Without a doubt Mako is
+one of the fastest template engines for Python and the fastest template
+engine that does not use a C extension.
+
+Then there is Jinja, which is also a text based template engine like
+Mako. However the focus is on a completely different level. If Mako is
+like PHP, Jinja is like Smarty (even though Mako is a million times
+better than PHP as a template engine). When I started working with
+Python as programming language for web applications I stumbled upon
+Django. I looked at the template engine and thought: WTF is that? The
+syntax seemed odd and the restrictions ridiculous. Later on I loved the
+syntax (and apparently others do too: the mini template engine by Ian
+Bicking (tempita if I recall correctly) and the Genshi text templates
+use that syntax or a similar one), but some of the restrictions still
+seem ridiculous. When I looked at all those Django templates I had
+created over time, I noticed that I often moved calculations into
+template tags that could have been function calls, that I did other
+calculations in the view functions that did not belong there, and, even
+more important, that you could replace 95% of the custom template tags
+with function calls, or function calls with an enclosed template block,
+if the template engine had proper expressions. This led to the
+development of what is now known as Jinja. The syntax, the fact that
+it's sandboxed and the designer friendliness are still very similar to
+Django, but unlike Django, Python-like expressions are possible in
+Jinja.
+
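+A made-up example of what that means in practice: where Django would
+push you towards a custom template tag, Jinja evaluates the expression
+inline (shown here with the jinja2 package):
+
+.. sourcecode:: python
+
+ from jinja2 import Template
+
+ template = Template('{{ items|length }} items, '
+                     'first: {{ items[0].upper() }}')
+ # subscripts and method calls are plain expressions in Jinja
+ print template.render(items=['foo', 'bar'])
+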
+I'm using Jinja wherever I think web designers want to work on later on.
+For example as template engine for TextPress or other applications that
+should be styled by third party web designers.
+
+Genshi on the other hand is an XML template engine. As a result it's
+slower, but also "context aware". It knows when it's processing a
+CDATA section, it knows when it's inside a tag or an attribute, etc.
+This makes it possible to defend against XSS in an automatic way. By
+default Genshi inserts text into the output stream as text and not as
+markup. That means all the HTML entities are automatically escaped for
+you. And because it's stream based you can rewrite streams during the
+rendering process. This makes it possible to fill form fields
+automatically, use XInclude for simple layout templates and a lot more.
+You can even translate your XML based templates into HTML4 on the fly,
+so you can use your XML tool chain internally and output HTML4, getting
+the best of both worlds. But because of this high flexibility Genshi
+also has some problems to fight: you need XML knowledge to use it. No
+problem if you are a programmer, but not that good if you are a web
+designer doing fancy layouts. You are also forced to use XML templates
+everywhere. It's true that Genshi has text templates too to fill the
+gaps, but they are not comparable with real text template engines, and
+you are still operating on an XML stream, you just don't see it. And
+lastly: this whole stream processing makes Genshi slow. Not so slow
+that you can't use it for big applications, but noticeably slower than
+Mako or Jinja.
+
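+A tiny sketch of the automatic escaping (assuming the `genshi` package;
+the template source is made up):
+
+.. sourcecode:: python
+
+ from genshi.template import MarkupTemplate
+
+ tmpl = MarkupTemplate('<p>Hello $name</p>')
+ # $name is inserted as text, so the markup is escaped:
+ # this prints <p>Hello &lt;script&gt;</p>
+ print tmpl.generate(name='<script>').render('html')
+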
+If you are using XML in your application anyway, Genshi is a very good
+idea; the same applies if your template designers know XML, or if
+performance is not that much of a problem. Most of the time the
+bottleneck is the database anyway. I never had real problems with
+Genshi's performance so far.
+
+I hope this post sums up why I'm using all three template engines and
+why I think we should be happy that we can choose between a couple of
+template engines :-) Why am I not covering other template engines like
+Cheetah or SimpleTAL? Mostly because I only looked at them, tried them
+out and never used them for something big: Mako looks a lot nicer than
+Cheetah to me, and SimpleTAL is far too far away from Python for me.
+
136 2008/1/28/mercurial-for-subversion-users.rst
@@ -0,0 +1,136 @@
+public: yes
+tags: [hg]
+summary: |
+ A short introduction to mercurial from the perspective of a Subversion
+ user.
+
+Mercurial for Subversion Users
+==============================
+
+More and more projects are switching over to `mercurial
+<http://www.selenic.com/mercurial/>`_ or similar DVCSes. Great as
+mercurial is, it's hard to get started if you are used to Subversion,
+because the concepts behind Subversion (svn) and mercurial (hg) are
+fundamentally different. This article should help you understand how
+mercurial and similar systems work and how you can use it to contribute
+patches to the pocoo projects.
+
+If you compare Subversion to mercurial you won't find that many
+similarities beside the command arguments. Subversion works like FTP,
+whereas mercurial is like BitTorrent. In Subversion the server is
+special: it keeps all the revision log, and all operations require a
+connection to this server. In mercurial I can take down the central
+repository, if there is one, and all developers will still be able to
+exchange changes. All the revision information is available to
+everyone, and there is absolutely no difference between server and
+clients.
+
+This fundamental design decision means that there are dozens of
+separate branches of the code. hg makes it easy to merge and branch;
+it's developed exactly for that. In Subversion branching and merging is
+painful, and often people just don't branch and don't commit their
+changes until the testsuite etc. passes again, which of course results
+in huge changesets. But let's step right into it!
+
+The first thing you do in Subversion is either creating a repository on
+the server or checking it out on the client. In hg there is no
+difference between server and client, so the process of creating a
+repository is available to everybody. Creating a repository is just as
+simple as typing "hg init name_of_the_repository". If that folder does
+not exist yet, it will create an empty folder and initialize it as the
+root of the repository; otherwise it will create the repository inside
+the existing folder.
+
+The process of checking out is a bit different from Subversion because
+it's effectively the same as creating a branch. Say you want to check
+out the current Pygments version to do some changes. The first thing
+you will do is look for a way to access this repository. There are
+three very common ways to access it: filesystem, HTTP or SSH. Pygments
+is available over SSH and HTTP, but for non-core developers only HTTP
+is available. Interestingly, quite a few people have problems locating
+the checkout URL, which is not very surprising because hgweb handles
+that. hgweb is the standard mercurial web interface, which not only
+provides a way to look at the changesets and tree but also handles
+patch exchange. In the case of Pygments this command should give you a
+fresh checkout into the new folder "pygments" in a few seconds:
+
+::
+
+ hg clone http://dev.pocoo.org/hg/pygments-main pygments
+
+One thing you will notice is that it's incredibly fast, and even though
+the repository contains the whole history the checkout is pretty small.
+At the time I'm writing this blog post the Pygments sourcecode
+including the unittests and example sourcecode, but without the
+revision history, is 2.5MB. A complete mercurial checkout is only 5MB,
+even though it includes 486 changesets.
+
+After you have got your very own repository by cloning the Pygments
+one, you will notice that all the subversion-like commands ("hg ci",
+"hg add", "hg up", ...) work locally only. You commit into your local
+version of the repository, and "hg up" won't incorporate remote
+changes. One of the things that happen on "hg clone" is that mercurial
+sets the path of the repository you cloned from in the hgrc of the
+newly created repository. This file (".hg/hgrc") is used to store
+per-repository configuration like the paths of remote repositories,
+the name used for checkins, plugins that are only enabled for this
+repository, and more. Executing "hg pull" will automatically pull
+changes from this remote repository and put them into the current
+repository as a second branch. To see what "hg pull" would pull from
+that remote repository you can execute "hg incoming", and it will print
+a list of changesets that are in the remote repository but not yet in
+the local one. After you have pulled you have to update the repository
+with "hg up" so that you can actually see the changes. If there were
+remote changes that require merging you have to "hg merge" them and
+"hg ci" the merge.
+
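+In commands, a typical synchronization session looks like this:
+
+::
+
+    hg incoming        # list what would be pulled
+    hg pull            # pull the remote changesets
+    hg up              # update the working copy
+    hg merge           # only needed if there are two heads now
+    hg ci -m "merge"   # commit the merge
+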
+Because this process is very common there are ways to simplify it. "hg
+pull && hg update" can be written as "hg pull -u". All the steps (pull,
+update, merge and commit if required) can be handled in one go using
+"hg fe". This command however is part of a plugin which is disabled by
+default. If you want to use it you have to add the following lines to
+the repository's hgrc or your personal one:
+
+::
+
+ [extensions]
+ hgext.fetch=
+
+The other important difference to Subversion is how you push your
+changes back to the server. In open source projects usually only a
+small number of developers has access to the main repository, and
+contributors create patches using "diff" or "svn diff" and mail them to
+one of the persons with commit rights, or attach them to a ticket in
+the project's tracker. If you are a person with push privileges you can
+do "hg push", and it will push the changesets which are not yet on the
+server (you can look at them using "hg outgoing"). If you don't have
+push access you can create a bundle of changes and attach that to a
+ticket rather than a patch. A bundle stores multiple changesets in one
+file, and it also preserves the correct author information and
+timestamps. Another way is mailing the changes to a different developer
+using the patchbomb extension (I won't cover that here, just google
+it). Or you can let other people pull from your repository. For that
+you either have to configure your apache to serve a hgweb instance, or
+you just call "hg serve" and it will spawn a server on localhost:8000
+that everybody can pull from.
+
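+Creating such a bundle of your outgoing changesets (compared against
+the default remote repository) is a one-liner:
+
+::
+
+    hg bundle mychanges.hg
+
+The receiver can then apply the file with "hg unbundle mychanges.hg".
+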
+Once a developer has decided to put your changes into the central
+repository and pushed them, your changes will appear there unaltered
+and with the same revision hashes. What will be different is the local
+number the changeset is given. If the revision was called 42:deadbeef
+locally, it could be called 52:deadbeef on the server because different
+changesets were applied first.
+
+All the commands that interact with remote repositories ("hg pull",
+"hg push", "hg fe", ...) also accept a path other than the default path
+from the hgrc as argument. This allows you to pull changes from
+repositories shared over the web.
+
+A cool example of what mercurial allows you to do is our last
+ubuntuusers webteam meetup. There we used my notebook to store the
+central repository, and everybody pushed their changes to it every once
+in a while. Additionally, some people exchanged patches for
+not-yet-working features among each other, so that the code on the
+central repo was seldom broken. When I left, everybody had all the
+changes locally because they had pulled, so I could remove my notebook
+and everybody continued working on their way home. When we met again on
+IRC I copied my repo to the server and everybody pushed their local
+changes to it.
+
58 2008/7/17/deploying-python-web-applications.rst
@@ -0,0 +1,58 @@
+public: yes
+tags: [python, fabric]
+summary: |
+ Short tutorial on how to use fabric to deploy Python web applications.
+
+Deploying Python Web Applications
+=================================
+
+Every once in a while I'm really impressed by a library I stumble upon.
+A while back that was `virtualenv
+<http://lucumr.pocoo.org/cogitations/2008/07/05/virtualenv-to-the-rescue/>`_,
+now I stumbled upon `fabric <http://www.nongnu.org/fab/>`_. I was using
+capistrano for a `project I was working on <http://www.plurk.com/>`_,
+which was kinda okay, but somehow I wasn't sold on it.
+
+Yesterday however `apollo13 <http://djangopeople.net/apollo13/>`_
+stumbled upon fabric, which is capistrano in Python, with a working
+put command and less annoying in general.
+
+In combination with a custom virtualenv bootstrapping script Python web
+application deployment is a charm. One “fab bootstrap” later the servers
+are creating a virtual python environment, compiling all dependencies,
+checking out all eggs and initializing the application environment.
+Updates are just one “fab production deploy” away.
+
+And the best part is that fabric is not limited to Python. You can use
+it to deploy anything you can control over ssh.
+
+Here is an example fabfile (the file that controls the deployment):
+
+.. sourcecode:: python
+
+ set(
+     fab_hosts = ['srv1.example.com', 'srv2.example.com']
+ )
+
+ def deploy():
+     """Deploy the latest version."""
+     # pull all changes from mercurial and touch the wsgi file to
+     # tell the apache to reload the application.
+     run("hg pull -u; touch application.wsgi")
+
+ def bootstrap():
+     """Asks for a list of servers and bootstraps the application there."""
+     set(fab_hosts=[x.strip() for x in raw_input('Servers: ').split()])
+     run("hg clone http://repository.example.com/application")
+     local("./generate-wsgi-file.py > /tmp/application.wsgi")
+     put("/tmp/application.wsgi", "application.wsgi")
+
+Saved as fabfile.py, “fab bootstrap” then asks for some servers and
+bootstraps the application there; after changes in the repository you
+can “fab deploy” the latest version. Of course that's just a very basic
+made-up example, but it shows how you can use fabric.
+
+I'm currently using makefiles to execute common tasks for various
+Python projects (like releasing code, running unittests and much more);
+I suppose fabric could also do that for me. And that would have the
+advantage that it works for Windows users too.
245 2009/4/8/c++-pitfalls.rst
@@ -0,0 +1,245 @@
+public: yes
+tags: [c++]
+summary: |
+ A concise list of some common C++ pitfalls.
+
+C++ Pitfalls
+============
+
+I just recently started using C++ for university (about two months ago)
+and still have a hard time accepting some of the weird syntax rules and
+semantics. For someone who mainly does Python development, C++ feels
+very unnatural. In Python the syntax is clean and there are no
+ambiguities. C++ is drastically different in that regard. I know there
+are tons of resources on the net about C++ pitfalls already, but I
+thought I should add my own for people switching to C++ with a
+background in Python and/or C.
+
+Private is the new Public
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In Python you usually don't have to worry too much about this topic
+because the language is very dynamic, but it's somewhat of an issue in
+C. I'm talking about hiding implementation details behind an interface
+in a way that lets you change the implementation in a later version of
+a library without breaking backwards compatibility. If you have an
+object in C, or something that looks/works like an object, you usually
+have some kind of typedef to a struct or `void*`, and functions to
+create, delete and manipulate it. The reason a lot of code does it that
+way is that the size and position of the struct members these functions
+access are not stored in the calling code. So you can safely change the
+size of the struct for later versions of the library, and code that
+compiled against the older version of the library continues to work.
+
+If you look at C++ classes, you will sooner or later notice that the
+“new” operator bakes the size of the allocated structure into the code
+where the operator is called. That means that if you change the size of
+your class later (by adding a new private member for example) you have
+to recompile the code; otherwise new would not allocate enough memory
+and would most likely crash in your constructor call. I don't really
+know how C++ libraries solve that problem, but I suppose they provide
+wrapper classes that contain the constructor call and proxy all the
+calls.
+
+So be warned. If you add private members existing code will no longer
+work without recompilation.
+
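+A common technique for this problem (an addition to the post, not part
+of the original list of pitfalls) is the “pimpl” idiom: the public
+class only holds a pointer to a private implementation struct, so its
+size never changes:
+
+.. sourcecode:: c++
+
+ // widget.h -- the public header; sizeof(Widget) stays constant
+ class WidgetImpl;  // only declared here, defined in widget.cpp
+
+ class Widget {
+ public:
+     Widget();
+     ~Widget();
+     void frob();
+ private:
+     WidgetImpl *m_impl;  // members can be added to WidgetImpl freely
+ };
+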
+It's a Constructor call, no, it's a Function
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This one is pretty obvious if you know C, but it's still something that
+could baffle a newcomer. If you have a class with a constructor that
+accepts arguments and you want to allocate it on the stack, you would
+usually do it like this:
+
+.. sourcecode:: c++
+
+ MyClass obj(param1, param2, param3);
+
+However, if you have a default constructor without argument, you *have*
+to create the instance without the parentheses:
+
+.. sourcecode:: c++
+
+ MyClass obj;
+
+The reason for this is, of course, that with parentheses after `obj`
+you would declare a function called obj that takes no arguments and
+returns a `MyClass` object by value.
+
+There are more cases like this; the others are harder to spot. This
+code for example fails to compile because `foo` is declared as a
+function that returns a `Foo` object and accepts an unnamed function
+returning a `std::string` object and taking no parameters:
+
+.. sourcecode:: c++
+
+ #include <iostream>
+ #include <string>
+
+ class Foo {
+ public:
+     Foo(const std::string msg) : m_msg(msg) {}
+     void display() { std::cout << m_msg << std::endl; }
+ private:
+     std::string m_msg;
+ };
+
+ int main()
+ {
+     Foo foo(std::string());
+     foo.display();
+ }
+
+The correct way to create a `foo` object in that situation is using
+the long version of the initialization syntax:
+
+.. sourcecode:: c++
+
+ Foo foo = Foo(std::string());
+
+So whenever you get an error message that contains something that
+looks like a function pointer where you expected to have an object,
+you probably stumbled upon that limitation in the syntax.
+
+More Constructor / Destructor Fun
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+But there is more fun with constructors and destructors. C++ creates
+some of them for you if you don't do it yourself. Basically the C++
+compiler adds a default constructor if you did not declare any
+constructor for the class, and it will add a copy constructor if you
+did not declare one. The same thing happens for the infamous
+`operator=`, which is created by default as well.
+
+This becomes a problem if you have pointers in your class which must
+not be blindly copied. So what most people do is declare some operators
+and constructors as private and don't implement them. That way the
+compiler will give you errors if you try to create copies of the
+objects:
+
+.. sourcecode:: c++
+
+ class MyClass {
+ private:
+     MyClass(const MyClass &);
+     MyClass &operator=(const MyClass &);
+ };
+
+Also, if you plan to subclass your class, you *have* to declare the
+destructor virtual, otherwise deleting a subclass instance through a
+base class pointer will not run the subclass destructor. The compiler
+will not warn you about that, so be warned.
+
+If you want your class to be copyable and you have subclasses, don't
+forget to call the `operator=` of the parent class. Because `operator=`
+nearly works like a copy constructor, you can easily forget to call the
+operator of the parent class; but if you don't do that, the parent's
+members are not copied.
+
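+A minimal sketch of that (class names made up):
+
+.. sourcecode:: c++
+
+ class Base {
+ public:
+     Base &operator=(const Base &other) {
+         m_id = other.m_id;
+         return *this;
+     }
+ private:
+     int m_id;
+ };
+
+ class Derived : public Base {
+ public:
+     Derived &operator=(const Derived &other) {
+         Base::operator=(other);  // easily forgotten; without it the
+                                  // Base members are not copied
+         m_extra = other.m_extra;
+         return *this;
+     }
+ private:
+     int m_extra;
+ };
+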
+Rules for Operator Overloading
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you do operator overloading, there are some rules you have to
+follow. They are not that hard to remember, but not following them
+will cause memory leaks and headaches.
+
+`operator=`
+ … returns a reference to `this`
+
+`operator+` and friends
+ … return the newly constructed object *by value*. Do not use
+ “new”!
+
+`operator[]`
+ … returns a reference. Otherwise it's not possible to add/change
+ items.
+
+`operator bool` and friends
+ … are declared without return value!
+
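+In code, the canonical signatures look roughly like this (a sketch of a
+made-up class, not a complete implementation):
+
+.. sourcecode:: c++
+
+ #include <cstddef>
+
+ class Sequence {
+ public:
+     Sequence &operator=(const Sequence &other);       // returns *this
+     Sequence operator+(const Sequence &other) const;  // by value
+     float &operator[](std::size_t index);             // a reference
+     operator bool() const;                            // no return type
+ };
+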
+Pointers VS Exceptions
+~~~~~~~~~~~~~~~~~~~~~~
+
+Take a look at this code:
+
+.. sourcecode:: c++
+
+ #include <cstdio>
+
+ int main()
+ {
+     FILE *file = fopen("myfile.txt", "r");
+     if (!file) {
+         fputs("Could not open file\n", stderr);
+         return 1;
+     }
+     do_something_with(file);
+     fclose(file);
+     return 0;
+ }
+
+While this code would work perfectly fine in C, it's very dangerous in
+C++ because `do_something_with` could raise an exception. Even if *you*
+don't raise one there, something else could still raise one (like, for
+example, “new”). The correct solution for this particular problem would
+of course be using streams, but if you need to work with pointers, wrap
+them in something that closes the resource in its destructor:
+
+.. sourcecode:: c++
+
+ #include <cstdio>
+
+ class File {
+ public:
+     File(const char *filename) : m_handle(fopen(filename, "r")) {}
+     ~File() { if (m_handle) fclose(m_handle); }
+     FILE *get() { return m_handle; }
+     bool operator!() { return !m_handle; }
+
+ private:
+     FILE *m_handle;
+     File(const File &);
+     File &operator=(const File &);
+ };
+
+ int main()
+ {
+     File file("myfile.txt");
+     if (!file) {
+         fputs("Could not open file\n", stderr);
+         return 1;
+     }
+     do_something_with(file.get());
+     return 0;
+ }
+
+Now when the object goes out of scope, the destructor is called and the
+file is properly closed if it was open.
+
+More Syntax Problems
+~~~~~~~~~~~~~~~~~~~~
+
+Because C++ is based on an older version of C it continues to carry
+some of C's problems around. One is the preprocessor, which does not
+play well with templates, for example. If you plan to create a
+`FOREACH` macro, chances are high that the following code won't work:
+
+.. sourcecode:: c++
+
+ FOREACH(Pair<int, int> item, items) { ... }
+
+The preprocessor does not know that `Pair<int, int>` belongs together
+and will try to split it at the comma inside the template argument
+list.
+
+Another common problem seems to be that nested template definitions
+often end in “>>”, which the parser interprets as a right-shift even
+though you actually wanted to close two templates. In this case you
+have to add some whitespace:
+
+.. sourcecode:: c++
+
+ // wrong
+ std::list<shared_ptr<Foo>>
+
+ // correct
+ std::list<shared_ptr<Foo> >
+
+I suppose there is more I missed, but these are the ones that caused me
+some headaches already. I'll update the post when I discover more.
45 2009/7/14/free-vs-free.rst
@@ -0,0 +1,45 @@
+public: yes
+tags: [licensing]
+summary: |
+ Why the BSD licenses work for me.
+
+free VS free
+============
+
+Seems like my favourite `discussion
+<http://zedshaw.com/blog/2009-07-13.html>`_ `is back
+<http://jacobian.org/writing/gpl-questions>`_. In the ring, two guys:
+Zed Shaw, the developer of lamson and mongrel; on the other side we
+have Jacob Kaplan-Moss, Django's BDFL.
+
+This time the discussion seems to be entitled "Because the only thing
+better than an arbitrarily restrictive license is an ambiguously
+restrictive license" [`via twitter
+<http://twitter.com/jacobian/status/2598708129>`_]. I won't warm up the
+discussion with new arguments (promised), but what I found most
+interesting about it is `Zed's blog post
+<http://zedshaw.com/blog/2009-07-13.html>`_ on why he's using the
+(A/L)GPL. Basically what he's saying is that he does not want to be
+burned again like he was with Mongrel, and uses the GPL to force people
+to contribute.
+
+I'm not exactly sure how that supports freedom. I might be idealistic
+here, but what motivates me the most about the open source libraries I
+work on is how they are used. I got mails from developers in many
+companies that are using various `Pocoo <http://dev.pocoo.org/>`_
+libraries internally and cannot contribute patches due to restrictions
+in the company structure. Every once in a while I get patches those
+developers craft in their free time, and very often I don't get any.
+The point, however, is that I can see people using my stuff, and that
+motivates me.
+
+I'm not making money with my libraries, but that's probably because I'm
+not a friend of selling code. I love to give away the stuff I'm working
+on, and get paid for support if someone needs it. And so far this has
+worked flawlessly for me.
+
+Forcing people to freedom is not exactly my definition of being free.
+
+**So dear users: use my stuff, have fun with it. Letting me know that
+you're doing so is the best reward I can think of. And if you can
+contribute patches, that's even better.**
+
204 2009/8/5/pro-cons-about-werkzeug-webob-and-django.rst
@@ -0,0 +1,204 @@
+public: yes
+tags: [werkzeug, django, python]
+summary: |
+ Why Werkzeug and WebOb exist when Django is already around.
+
+Pro/Cons about Werkzeug, WebOb and Django
+=========================================
+
+Yesterday I had a discussion with `Ben Bangert
+<http://twitter.com/benbangert>`_ from the Pylons team, `Philip Jenvey
+<http://twitter.com/pjenvey>`_ and zepolen from the pylons IRC channel.
+The topic of the discussion was why we have Request and Response objects
+in `Werkzeug <http://werkzeug.pocoo.org/>`_, `WebOb
+<http://pythonpaste.org/webob/>`_ and `Django
+<http://djangoproject.com/>`_ and what we could do to improve the
+situation a bit.
+
+We decided on writing down what we like or dislike about these three
+systems in order to find out in which direction to go, so this is my
+attempt. Please keep in mind that these are my opinions only!
+
+WebOb
+~~~~~
+
+Let's start with WebOb which is the smallest of the three libraries in
+question. WebOb really just sticks to the basics and provides request
+and response objects and some data structures required.
+
+The philosophy of WebOb is to stay as compatible to paste as possible
+and to have modifications on the request object appear in the WSGI
+environment. That basically means that when you modify anything on the
+request object and later create another request object from the same
+environment, you will see your modifications again.
+
+This is without doubt something that neither Werkzeug nor Django does.
+Both Werkzeug and Django consider the incoming request something you
+should not modify; after all, it came from the client. If you need to
+create a request or WSGI environment in Werkzeug you get a separate
+utility for that, designed for exactly that purpose.
+
+While I have to admit that the idea of a reflecting request object is
+tempting, I don't think it's a good idea. Using the WSGI environment as
+a communication channel seems wrong to me. The main problem with it is
+that WebOb cannot achieve what it's doing with standard environment
+keys alone. There are currently five WebOb keys in the environment for
+“caching” purposes, and for compatibility with paste it also
+understands a couple of paste environment keys.
+
+The idea is that other applications can get a request again at a
+completely different point, but I'm not sure if WSGI is the correct
+solution for that particular problem. Reusable applications based on
+the complex WSGI middleware system seem to be the wrong layer to me.
+
+Some other parts where I don't agree with the WebOb concepts:
+
+* The parsing of the data is implemented either in private functions
+ or directly in the request object. I strongly prefer giving the user
+ the choice to access the parsers separately. Sometimes you really just
+ need a cookie parsed; why create a full request object then? (A sketch
+ of that kind of standalone parsing follows after this list.)
+* WebOb uses `request.GET` and `request.POST` for URL parameters and
+ form data. Because you can have URL parameters in non-GET requests as
+ well, the former is misleading; the latter is wrong as well, because
+ form data is available in more than just POST requests. Accessing
+ `request.POST` to get form data in a PUT request seems wrong.
+* WebOb still uses `cgi.FieldStorage`, and not only internally: it also
+ puts those objects into the `POST` dict. This is not the best idea,
+ for multiple reasons. First of all, users are encouraged to trust their
+ submitted data and blindly expect a field storage object if they have
+ an upload field in their form. One could easily cause trouble by
+ sending forged requests to the application; if logging is set up, the
+ administrator is instantly sent tons of error mails. I strongly prefer
+ storing uploaded files in a separate dictionary like Django and
+ Werkzeug do. The other problem with using `FieldStorage` as parser is
+ that it's not WSGI compliant, by requiring a size argument on the
+ readline function, and that it has a weird API. You can't easily tell
+ it to not accept more than n bytes in memory, or to switch between
+ in-memory uploading and a temporary file based on the length of the
+ transmitted data. Also, `cgi.FieldStorage` supports nested files, which
+ no browser supports, and which could cause internal server errors as
+ well, because very few developers know that a) nested uploads exist and
+ b) the field storage object behaves differently if a nested uploaded
+ file is transmitted.
+* Also, WebOb barks on invalid cookies and throws all of them away if
+ one is broken. This is especially annoying if you're dealing with
+ cookies outside of your control that use invalid characters (stuff
+ such as advertisement cookies).
+
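+For illustration, this is roughly what such standalone parsing looks
+like in Werkzeug (the flat import worked in the 0.x releases; newer
+versions may have moved the function):
+
+.. sourcecode:: python
+
+ from werkzeug import parse_cookie
+
+ def application(environ, start_response):
+     # parse just the cookie header, no request object needed
+     cookie = parse_cookie(environ)
+     start_response('200 OK', [('Content-Type', 'text/plain')])
+     return ['Session: %s' % cookie.get('session_id', 'none')]
+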
+Now to the parts where WebOb wins over Django and Werkzeug:
+
+* Unlike Django and Werkzeug, WebOb provides not only a unicode API but
+ also a bytestring based API. This could help existing applications
+ that are not unicode-ready yet. The downside is that with Graham's
+ current plans for WSGI on Python 3 there do not seem to be ways to
+ support it on Python 3.
+* WebOb supports the HTTP range feature.
+* The charset can be switched on the fly in WebOb; in Werkzeug you set
+ the charset for your request/response object and from that point
+ onwards it's used no matter what. In Django the charset is application
+ wide.
+
+An interesting thing is that WebOb uses `datetime` objects with
+timezone information. The tzinfo attribute is set to a tzinfo object
+with a UTC offset of zero. That's different from Werkzeug and Django,
+which use offset-naive `datetime` objects, and it matters because
+Python treats the two kinds differently and does not support operations
+that mix them. Unfortunately the `datetime` module makes it hard to
+decide what to do. Personally I decided to use `datetime` objects that
+have no tzinfo set and only dates in UTC.
+
+Werkzeug
+~~~~~~~~
+
+In terms of code base size, Werkzeug is next. The problem with Werkzeug
+certainly is that it does not really know what belongs into it and what
+doesn't. That situation will slightly improve with the next version,
+when some deprecated interfaces go away and the debugger is moved
+into a new library together with all sorts of debugging tools such as
+profilers, leak finders and more (enter `flickzeug
+<http://dev.pocoo.org/projects/flickzeug/>`_).
+
+Werkzeug is based on the principle that things should have a nice API,
+but at the same time allow you to use the underlying functions. For
+example you can easily access `request.form` to get a dict of submitted
+form data, but at the same time you can call `werkzeug.parse_form_data`
+to parse the stuff into a multidict yourself. You can even go a layer
+down and tell Werkzeug not to use the multidict and provide a custom
+container, or a standard dict, list, whatever.
+
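+A sketch of those two layers side by side (assuming a Werkzeug 0.5-era
+API, where `parse_form_data` returns the remaining stream plus form and
+file multidicts):
+
+.. sourcecode:: python
+
+ from werkzeug import Request, parse_form_data
+
+ def app_high(environ, start_response):
+     # high level: the request object parses lazily on access
+     request = Request(environ)
+     start_response('200 OK', [('Content-Type', 'text/plain')])
+     return ['Hello %s' % request.form.get('name', 'World')]
+
+ def app_low(environ, start_response):
+     # low level: call the parser yourself, no request object
+     stream, form, files = parse_form_data(environ)
+     start_response('200 OK', [('Content-Type', 'text/plain')])
+     return ['Hello %s' % form.get('name', 'World')]
+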
+Also, Werkzeug has a slightly different goal than WebOb. WebOb focuses
+on the request and response objects only; Werkzeug provides all kinds
+of useful helpers for web applications. The idea is that if there is a
+function you can use, you are more likely to use it than to reimplement
+it. For example many applications take the uploaded file name and just
+create a file with the same name. This however turns out to be a
+security problem, so Werkzeug gives you a function
+(`werkzeug.secure_filename`) you can use to get a secure version of the
+filename that is also limited to ASCII characters.
+
+So obviously there is a lot of stuff in Werkzeug you probably would not
+expect there.
+
+So here some of the things I like especially about Werkzeug:
+
+* The request/response objects. They are designed to be lightweight
+ and can be extended using mixins. Werkzeug also provides full-featured
+ request objects that implement all shipped mixins. Also the
+ request/response objects are not doing any parsing or dumping, that is
+ all available through separate functions as well which makes the code
+ readable and easy to extend.
+* It fixes many problems with the standard library or reimplements
+ broken features. It does not depend on the `cgi.FieldStorage` since
+ 0.5, allows you to limit the uploaded data before it's consumed. That
+ way an attacker cannot exhaust server resources.
+* The data structures provide handy helpers such as raising key errors
+ that are also bad request exceptions, so that if you're not catching
+ them you are at least not generating internal server errors, as long
+ as the base `HTTPException` is caught.
+* Werkzeug uses a non-data descriptor for the properties on the
+ request and response objects. The first time you access such a
+ property its code is executed and the result is stuffed into the
+ instance dict. After that there is no runtime penalty when accessing
+ the attribute. (A sketch of that pattern follows after this list.)
+
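+That last point is essentially the classic `cached_property` pattern.
+A minimal sketch of how such a non-data descriptor works (not the
+literal Werkzeug implementation):
+
+.. sourcecode:: python
+
+ class cached_property(object):
+     """Non-data descriptor: computes the value once; afterwards
+     the instance dict shadows the descriptor on lookups."""
+
+     def __init__(self, func):
+         self.func = func
+         self.__name__ = func.__name__
+         self.__doc__ = func.__doc__
+
+     def __get__(self, obj, type=None):
+         if obj is None:
+             return self
+         value = self.func(obj)
+         # store under the same name; because this class defines
+         # no __set__, the instance attribute wins from now on
+         obj.__dict__[self.__name__] = value
+         return value
+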
+And of course here the list of things that are not that nice:
+
+* It's too large for a library that only wants to implement request
+ and response objects.
+* There is no support for if-range and friends.
+* The response stream is of little use because each `write()` ends up
+ as a separate “item” in the application iterator. Because each item is
+ followed by a flush, that makes the response stream essentially
+ useless.
+* The `MultiDict` is unordered which means that some information is
+ lost.
+* The response object modifies itself on `__call__`. This allows some
+ neat things like automatically fixing the location header, but in
+ general that should happen temporarily when called as WSGI application
+ instead of modifying the object.
+
+Django
+~~~~~~
+
+Now Django isn't exactly a reusable library for WSGI applications but it
+does have a request and response object with an API, so here my thoughts
+on it:
+
+* URL arguments are called `request.GET` like in WebOb, but files and
+ form data were split up into `request.POST` and `request.FILES`.
+* The request object is unicode only and the encoding can be set
+ dynamically.
+* Problem is, they don't work with non-Django WSGI applications.
+
+Chances on a common Request Object?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+WebOb and Werkzeug will stick around, and the chances that Django
+starts depending on external libraries for the Request object are very,
+very low. However it could be possible to share the implementation of
+the HTTP parsers etc.
+
+To be honest, I would not want to break Werkzeug into two libraries for
+utilities and request/response objects and parsers, because of the
+current packaging situation. A lot of the small stuff I work on works
+perfectly fine with nothing but what Werkzeug provides, which is pretty
+handy. So yes, it's selfish not to break it up, but that's how I feel
+about the situation currently.
450 2010/11/24/collections-in-c.rst
@@ -0,0 +1,450 @@
+public: yes
+tags: [c]
+summary: |
+ Using the C preprocessor to achieve basic generic collection types.
+
+Collections in C
+================
+
+Inspired by Minecraft (like countless other people out there) I decided
+to give a clone a try to see how this can be accomplished. My last few
+adventures into the 3D world went through C++, but that always
+presented me with a problem: the C++ code I write does not scale well
+to computer games. This time I thought I could probably work around the
+problem by forcing myself to use plain C.
+
+This worked really well, up to the point where I needed two kinds of
+lists: one that accepts floats and another one that works with
+arbitrary pointers.
+
+The C++ way to do this is obvious: use templates. That's also one of
+the best features in C++ and works surprisingly well for the (I
+imagine) great hack it is. However, in C we don't have templates; the
+closest thing to templates in C (or to code generation in general) is
+the preprocessor.
+
+A warning upfront: all these examples use very generic names. This is a
+very bad idea. If you want to use something like this in your own code,
+make sure to prefix all macros, functions, types etc. with a unique
+prefix. (For instance instead of ``CAT`` name it ``MYLIB_CAT``).
+
+Abusing the Preprocessor
+------------------------
+
+Generally there are two ways to do code generation for C. One involves
+external tools that create new C files, the other involves the C
+preprocessor which is normally intended to expand macros, include other
+files or remove comments before the compiler goes over your code.
+
+I normally like to avoid tools that generate new C files, for the very
+simple reason that these usually generate ugly looking code I then have
+to look at, which is annoying, or at least require yet another tool in
+my toolchain, which causes headaches when compiling code on more than
+one operating system. A simple Python script that generates a C file
+sounds simple, but it stops being simple if you also want that thing to
+work as part of your Windows installation, where Python is usually not
+available or works differently.
+
+So for my collections I went straight with the C preprocessor. When you
+limit yourself to the C preprocessor there are two common usage
+patterns:
+
+- The most common is generating code in a macro. This has the downside
+  that macros are always expanded into a single line, which brings
+  nearly useless error messages if you make a mistake in the expanded
+  code.
+
+  The second downside of this approach is that macros generally are
+  really unfriendly to write. They must be in a single line, and when
+  you need more than one you have to backslash-escape the newline. If
+  you forget about that, welcome to interesting compiler errors.
+
+  Macros are fine if they are small, but looking at endless lines of
+  generated C code from the preprocessor can be hairy and frustrating.
+- The second approach is to move your implementation into a standard C
+ file and use macros to replace the dynamic parts of the
+ implementation. (Such as the function names, storage types and other
+ things).
+
+ This is what I ended up doing for my collection classes.
+
+Features of the Preprocessor
+----------------------------
+
+Before we head over to the details of the implementation, let's have a
+look at some of the features of the C preprocessor.
+
+``#include "file"``
+ This is the best known feature of the preprocessor. It includes
+ another file from the search path and inserts it at the same location.
+ Unless you protect your file from multiple inclusions with include
+ guards (macros that are set and detected) you can include a file more
+ than once. This is actually quite helpful in our case.
+
+``#error "message"``
+ With this directive you can emit an error message that aborts the
+ compilation. You can combine it with preprocessor conditionals to
+ notify the user about unsupported configurations, platforms etc. In
+ our case it can help us give the user feedback about missing
+ defines.
+
+``#ifdef / #ifndef``
+ Compiles code only if a macro of the given name is defined (or not
+ defined, respectively). This does not support checking for more than
+ one macro, so often it makes more sense to use ``#if defined(MACRO)``.
+
+``#if / #elif / #else``
+ Can test arbitrary preprocessor conditions. It can perform basic
+ arithmetic and check if other macros are defined (``defined(X)``).
+
+``#define MACRO value``
+ Defines a new simple macro. From that point onwards each occurrence
+ of `MACRO` will be replaced with `value`. In fact, after `MACRO` more
+ than one C token of any form can be placed if necessary. You can let
+ a macro be replaced by a full function definition.
+
+``#define MACRO(X) X``
+ Macros can also have parameters that are simple tokens. Whenever that
+ token appears in the list of tokens that will act as replacement, that
+ token is replaced with the actual token that was passed to the macro.
+
+ For example this is a very simple and stupid way to specify an `abs()`
+ macro that will take a value and return the absolute value:
+
+ .. sourcecode:: c
+
+ #define ABS(X) ((X) < 0 ? -(X) : (X))
+
+ The preprocessor will then expand the macro upon usage:
+
+ .. sourcecode:: c
+
+ int x = ABS(-42);
+
+ /* this is expanded to this: */
+
+ int x = ((-42) < 0 ? -(-42) : (-42));
+
+ The additional parentheses are there to avoid ambiguities in case
+ there are operators involved in the passed expression.
+
+ Because macro arguments work by replacing tokens I always use
+ uppercase letters as first letter of a macro argument. The reason for
+ this is that nothing in my C code is written in camelcase and thus
+ there is no way this could clash with an actual token that might be in
+ use.
+
+``#``
+ Inside macro expressions the ``#`` operator can be used to convert the
+ following macro argument token passed into a string. Please keep in
+ mind that this only works for macro arguments, not arbitrary tokens.
+ This is very helpful if you want to implement things like `assert()`
+ and have helpful error messages:
+
+ .. sourcecode:: c
+
+ #define assert(Expr) do { \
+     if (!(Expr)) fail_with_message("Assertion failed: " #Expr, \
+                                    __LINE__, __FILE__); \
+ } while (0)
+
+ This also showcases two other things you have to keep in mind when
+ using the preprocessor:
+
+    1. The macro might be used in the body of an if statement and a
+       sole `if` there might cause the dangling else problem. As a
+       simple workaround, always wrap your macros in a loop that only
+       runs once (``do { ... } while (0)``). Also make sure to not
+       include a trailing semicolon. The user of the macro should add
+       the semicolon, not the author of the macro.
+
+    2. If a macro spans more than one line you have to escape the
+       newlines by adding a backslash in front of them. Be sure not to
+       add any whitespace after the backslash or this will break.
+
+``##``
+    The ``##`` operator can be used to concatenate a macro argument
+    token with any other token. Again, this only works if a macro
+    argument is involved; it will not work on arbitrary tokens.
+
+ This can for example be used to dynamically generate functions that
+ are prefixed with something else:
+
+ .. sourcecode:: c
+
+ #define TEST(TestName) int mylib_##TestName(void)
+
+ TEST(foo)
+ {
+ assert(foo == 42);
+ }
+
+Preprocessor Utilities
+----------------------
+
+Now that we know the basics of the preprocessor we can also infer what
+problems might exist. Mainly, the interesting operators for code
+generation (``#`` and ``##``) can only operate on macro arguments. This
+is not a problem for the former, but it will become somewhat of a
+limitation in case of the latter. Thankfully this can be countered
+nicely with another macro:
+
+.. sourcecode:: c
+
+ #define _CAT(A, B) A##B
+ #define CAT(A, B) _CAT(A, B)
+
+Why do we need two macros here? Wouldn't the first macro be enough to
+concatenate tokens? Unfortunately not, because arguments of the ``##``
+operator are not macro-expanded before the concatenation happens. A
+nested `CAT()` call would be pasted verbatim instead of being expanded
+first. Look here:
+
+.. sourcecode:: c
+
+ #define CAT(A, B) A##B
+
+ int
+ main(void)
+ {
+ int CAT(foo, CAT(bar, baz));
+ }
+
+This would generate the following C code:
+
+.. sourcecode:: c
+
+    int
+    main(void)
+    {
+        int fooCAT(bar, baz);
+    }
+
+The extra indirection solves this problem nicely.
+
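+With the two-step version the inner `CAT()` in the argument is
+expanded before the concatenation happens, so the same snippet now
+produces what one would expect:
+
+.. sourcecode:: c
+
+    #define _CAT(A, B) A##B
+    #define CAT(A, B) _CAT(A, B)
+
+    int
+    main(void)
+    {
+        int CAT(foo, CAT(bar, baz));   /* expands to: int foobarbaz; */
+    }
+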
+The second macro I like to declare for code generation is an ``UCAT``
+macro that concatenates two tokens with an underscore instead of
+concatenating them directly:
+
+.. sourcecode:: c
+
+ #define UCAT(A, B) CAT(A, CAT(_, B))
+
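+For example the inner `CAT(_, new)` expands to `_new` first, and the
+outer concatenation then yields a single token:
+
+.. sourcecode:: c
+
+    UCAT(list, new)   /* expands to: list_new */
+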
+
+Creating a List Header
+----------------------
+
+Now we have everything to get started implementing a simple list type.
+For this we first create a header where we declare all list types we want
+to use. In my case I am interested in a list for pointers and floats.
+The header looks like this:
+
+.. sourcecode:: c
+
+    #ifndef _INC_LIST_H_
+    #define _INC_LIST_H_
+
+    #include <stddef.h>   /* for size_t used in the generated structs */
+
+    /* list of pointers */
+    #define _COLLECTION_TYPE void *
+    #define _COLLECTION_NAME list
+    #include "_list.h"
+
+    /* list of floats */
+    #define _COLLECTION_TYPE float
+    #define _COLLECTION_NAME floatlist
+    #include "_list.h"
+
+    #endif
+
+As you can see we have a standard include guard and then we include
+another header in there twice (once for each list type we want to have).
+Before including that header, we also define the type for the list and the
+name we want to use.
+
+That header then declares the struct for the list and the methods we want
+to have. For this to work we will need another header that is used both
+by this header as well as the implementation C file. Let's call this
+header `_collection_pre.inc`. Because we have a `pre` header we will
+also need a `post` header (`_collection_post.inc`). The purpose of the
+`pre` header is to declare some helper macros that produce function
+names prefixed with the necessary name, and the idea of the `post`
+header is to get rid of these macros again to allow the inclusion of
+this header another time (for the next type).
+
+This is what these headers look like:
+
+`_collection_pre.inc`:
+
+.. sourcecode:: c
+
+    /* include the header that declares CAT and UCAT */
+    #include "pputils.h"
+
+    /* ensure that the includer set type and name */
+    #if !defined(_COLLECTION_TYPE) || !defined(_COLLECTION_NAME)
+    #  error "Includer has to set _COLLECTION_TYPE and _COLLECTION_NAME"
+    #endif
+
+    /* helper macros to declare types and methods */
+    #define _COLLECTION_TYPENAME UCAT(_COLLECTION_NAME, t)
+    #define _COLLECTION_METHOD(Name) UCAT(_COLLECTION_NAME, Name)
+
+
+`_collection_post.inc`:
+
+.. sourcecode:: c
+
+    /* get rid of everything declared in _collection_pre.inc and the includer */
+ #undef _COLLECTION_NAME
+ #undef _COLLECTION_TYPE
+ #undef _COLLECTION_TYPENAME
+ #undef _COLLECTION_METHOD
+
+Now we finally have everything in place to implement our `_list.h` header
+that declares the types and methods. This is what it can look like:
+
+.. sourcecode:: c
+
+ #include "_collection_pre.inc"
+
+ typedef struct {
+ size_t size;
+ size_t allocated;
+ _COLLECTION_TYPE *items;
+ } _COLLECTION_TYPENAME;
+
+ /* creates a new list */
+ _COLLECTION_TYPENAME *_COLLECTION_METHOD(new)(void);
+
+ /* frees the list */
+ void _COLLECTION_METHOD(free)(_COLLECTION_TYPENAME *self);
+
+ /* appends a new item to the list */
+ int _COLLECTION_METHOD(append)(_COLLECTION_TYPENAME *self, _COLLECTION_TYPE item);
+
+ /* removes the last item from the list */
+ _COLLECTION_TYPE _COLLECTION_METHOD(pop)(_COLLECTION_TYPENAME *self);
+
+ #include "_collection_post.inc"
+
+The preprocessor will then use this to generate a `list_t`, `floatlist_t`,
+`list_new()`, `floatlist_new()` etc.
+
+Implementing the List
+---------------------
+
+The actual implementation of the list (`list.c`) looks similar to our
+`list.h` header, except that it includes `_list.inc` instead of
+`_list.h`. In both cases we are using the same tricks as we did
+with our header files:
+
+`list.c`:
+
+.. sourcecode:: c
+
+    #include <stdlib.h>   /* malloc, realloc, free */
+    #include "list.h"
+
+    /* list of pointers */
+    #define _COLLECTION_TYPE void *
+    #define _COLLECTION_NAME list
+    #include "_list.inc"
+
+    /* list of floats */
+    #define _COLLECTION_TYPE float
+    #define _COLLECTION_NAME floatlist
+    #include "_list.inc"
+
+`_list.inc`:
+
+.. sourcecode:: c
+
+ #include "_collection_pre.inc"
+
+ _COLLECTION_TYPENAME *
+ _COLLECTION_METHOD(new)(void)
+ {
+ _COLLECTION_TYPENAME *rv = malloc(sizeof(_COLLECTION_TYPENAME));
+ if (!rv)
+ return NULL;
+ rv->size = 0;
+ rv->allocated = 32;
+ rv->items = malloc(sizeof(_COLLECTION_TYPE) * rv->allocated);
+ if (!rv->items) {
+ free(rv);
+ return NULL;
+ }
+ return rv;
+ }
+
+ void
+ _COLLECTION_METHOD(free)(_COLLECTION_TYPENAME *self)
+ {
+ if (!self)
+ return;
+ free(self->items);
+ free(self);
+ }
+
+ int
+ _COLLECTION_METHOD(append)(_COLLECTION_TYPENAME *self, _COLLECTION_TYPE item)
+ {
+ if (self->size >= self->allocated) {
+ size_t new_size = (size_t)(self->allocated * 1.33f);
+ _COLLECTION_TYPE *rv = realloc(self->items,
+ sizeof(_COLLECTION_TYPE) * new_size);
+ if (!rv)
+ return 0;
+ self->allocated = new_size;
+ self->items = rv;
+ }
+ self->items[self->size++] = item;
+ return 1;
+ }
+
+ _COLLECTION_TYPE
+ _COLLECTION_METHOD(pop)(_COLLECTION_TYPENAME *self)
+ {
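+        /* note: popping from an empty list is undefined behavior;
+           callers must ensure that size > 0 */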
+ return self->items[--self->size];
+ }
+
+ #include "_collection_post.inc"
+
+Usage
+-----
+
+And this is then how you would use that list:
+
+.. sourcecode:: c
+
+    #include <assert.h>   /* for the assertions used below */
+    #include "list.h"
+
+ int
+ main(void)
+ {
+ floatlist_t *list = floatlist_new();
+ floatlist_append(list, 42.0f);
+ floatlist_append(list, 23.0f);
+ assert(list->size == 2);
+ assert(list->items[0] == 42.0f);
+ assert(list->items[1] == 23.0f);
+ assert(floatlist_pop(list) == 23.0f);
+ floatlist_free(list);
+ }
+
+Language Limits
+---------------
+
+On top of that general concept you can then implement arbitrary data
+structures. The main problem with this compared to the template system
+from C++ is not only that it needs more files and has no virtual
+functions, but that it requires you to explicitly list the types you
+want in the header and implementation files so that specific typedefs
+and functions can be generated for them. There is really nothing you
+can do to change this; it is how the language works.
+
+Another problem is that you can't use the preprocessor to generate
+other macros. So if you want to declare a type specific macro that
+returns an item from the list after doing a size assertion, you are out
+of luck. However all modern compilers support inline functions, so what
+you want to do is to create a static, inline function in the header
+instead of a macro.
+
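+A minimal sketch of what that could look like in `_list.h` (the `get`
+accessor is a made-up example, not part of the code above; it assumes
+`<assert.h>` is available):
+
+.. sourcecode:: c
+
+    /* bounds-checked item access; a static inline function because
+       the preprocessor cannot generate a type specific macro here */
+    static inline _COLLECTION_TYPE
+    _COLLECTION_METHOD(get)(_COLLECTION_TYPENAME *self, size_t idx)
+    {
+        assert(idx < self->size);
+        return self->items[idx];
+    }
+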
+Generally speaking though, this is probably good enough to cover the
+majority of use cases and small applications. It did the trick for me at
+least.
View
296 2010/2/11/porting-to-python-3-a-guide.rst
@@ -0,0 +1,296 @@
+public: yes
+tags: [python]
+summary: |
+ Various notes on how to port libraries and applications over to Python 3
+ based on my experiences with the Jinja2 port.
+
+Porting to Python 3 — A Guide
+=============================
+
+The latest `Jinja 2 <http://jinja.pocoo.org/2/>`_ release came with
+basic support for Python 3. It was surprisingly painless to port the
+library over, but it did require a substantial amount of tweaks and
+code changes in order to get it running. For everyone else out there
+who is interested in getting started, I decided to share my
+experiences:
+
+Changing APIs
+~~~~~~~~~~~~~
+
+Before you start porting the library you have to decide how the
+interfaces will behave in Python 3. The biggest issue here is obviously
+unicode, but there are others as well. Regarding string behavior in
+Python 2 there are roughly four kinds of libraries you might encounter:
+
+* libraries that only accept unicode and only produce unicode,
+* libraries that only accept byte-strings and produce byte-strings,
+  but operate on textual data,
+* libraries that operate on either type, where whatever has been fed
+  into them comes back out of them, and
+* libraries that operate on one of the two types but also accept the
+  other as long as it is limited to the default encoding (ASCII).
+
+First you have to find out what your library does, what it is supposed
+to do, and how you want to deal with that in Python 3. Because
+byte-strings no longer exist in Python 3 and were replaced by a `bytes`
+object that works similarly but has an incompatible API, it is very
+unlikely that your code will be able to support both at the same time
+(or that this is something you would desire).
+
+Byte-Based Libraries
+~~~~~~~~~~~~~~~~~~~~
+
+This might be the trickiest case if you are aiming for Python 2.5
+support or lower and you are operating on bytes directly. The issue is
+that the way you operate on bytes changed fundamentally from Python 2.x
+to 3.x and 2to3 is not really able to pick it up. Worse, it will try to
+convert all your bytestring literals to unicode! The official solution
+is, as far as I know, to explicitly prefix the byte strings in the 2.x
+code with a leading `b` to indicate bytes. Unfortunately that means no
+support for 2.5 and lower. I am not completely sure what to do in that
+situation, but at least I found a way to trick Python into operating on
+bytes: if you have code like this:
+
+.. sourcecode:: python
+
+ magic = 'M23\x01'
+
+If you want to ensure that it does not end up being a `str` in 3.x,
+add a dummy encode:
+
+.. sourcecode:: python
+
+ magic = u'M23\x01'.encode('iso-8859-1')
+
+The only downside is that the encode happens at runtime, so it will slow
+down execution a bit.
+
+Text Based Libraries
+~~~~~~~~~~~~~~~~~~~~
+
+The second kind of library is a library that operates on text. In 2.x
+there were multiple ways to implement such libraries and it basically
+came down to what data type was used internally and what was accepted
+for input and output. There are the libraries that operate exclusively
+on either bytestrings or unicode. These are the easiest ones to port,
+because 2to3 was written with nearly exactly that case in mind. If your
+library was only accepting bytestrings in 2.x it will (after a 2to3
+run) only be accepting a Python 3 `str` type, which is unicode based.
+This works well as long as you do not intend to use some kind of IO in
+your library. Once you start doing that, you will need to make sure you
+can somehow specify the encoding to be used when opening files. In that
+case, make sure you open the file in byte mode (*not* in text mode!) and
+do the decoding/encoding yourself. This is the only way your IO code
+will work the same in both 2.x and 3.x. But more on IO later.
+
+What 2to3 does out of the box is converting calls from `unicode` to
+`str` automatically. Unfortunately it does not change the special
+`__unicode__` method to `__str__`. You can easily do that in a custom
+fixer though, so it should be easy to accomplish. If your library
+however supports both `__str__` *and* `__unicode__` you are in a
+trickier situation. Let me show you an example of the kind of classes
+I deal with in Jinja 2:
+
+.. sourcecode:: python
+
+ class MyObject(object):
+
+ def __init__(self):
+ self.value = u'some value'
+
+ def __str__(self):
+ return unicode(self).encode('utf-8')
+
+ def __unicode__(self):
+ return self.value
+
+The big problem here is that 2to3 will convert it to this:
+
+.. sourcecode:: python
+
+ class MyObject(object):
+
+ def __init__(self):
+ self.value = 'some value'
+
+ def __str__(self):
+ return str(self).encode('utf-8')
+
+ def __unicode__(self):
+ return self.value
+
+If you call `str()` on your instance now, it will die with a runtime
+error because it recurses infinitely. Even if it did not recurse, it
+would try to return a bytes object from the `__str__` method because of
+the encode call. My plan was to write a custom fixer that, if it detects
+a `__str__` that just calls into `__unicode__` and encodes, will drop
+the `__str__` method and rename `__unicode__` to `__str__`.
+Unfortunately the tree you are dealing with in 2to3 does not appear to
+be designed for removing code, so what I do instead of removing the
+`__str__` is just renaming the `__unicode__` to `__str__` and letting
+Python override the dummy `__str__` with the correct one. The fixer I
+use for that looks like this:
+
+.. sourcecode:: python
+
+ from lib2to3 import fixer_base
+ from lib2to3.fixer_util import Name
+
+ class FixRenameUnicode(fixer_base.BaseFix):
+ PATTERN = r"funcdef< 'def' name='__unicode__' parameters< '(' NAME ')' > any+ >"
+
+ def transform(self, node, results):
+ name = results['name']
+ name.replace(Name('__str__', prefix=name.prefix))
+
+After conversion with this fixer in place, the class from above will
+then look like this:
+
+.. sourcecode:: python
+
+ class MyObject(object):
+
+ def __init__(self):
+ self.value = 'some value'
+
+ def __str__(self):
+ return str(self).encode('utf-8')
+
+ def __str__(self):
+ return self.value
+
+But where to put those fixers? Edit 2to3 directly? And do I have to
+provide two source packages for 2.x and 3.x? This is where `distribute
+<http://pypi.python.org/pypi/distribute>`_ comes in.
+
+2to3 through distribute
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Distutils itself already has the ability to run 2to3 for you, but what
+it cannot do is add custom fixers without a lot of custom code.
+distribute on the other hand not only gives you built-in 2to3 support
+as a single keyword argument to `setup()`, but can also pass custom
+fixers to 2to3, which is very helpful. Because these new keyword
+arguments would cause a warning if the setup script was executed with
+setuptools instead of distribute, you should only pass them to the
+setup function if invoked from Python 3. The setup script then looks
+like this:
+
+.. sourcecode:: python
+
+ import sys
+
+ from setuptools import setup
+
+ # if we are running on python 3, enable 2to3 and
+ # let it use the custom fixers from the custom_fixers
+ # package.
+ extra = {}
+ if sys.version_info >= (3, 0):
+ extra.update(
+ use_2to3=True,
+ use_2to3_fixers=['custom_fixers']
+ )
+
+
+ setup(
+ name='Your Library',
+ version='1.0',
+ classifiers=[
+ # make sure to use :: Python *and* :: Python :: 3 so
+ # that pypi can list the package on the python 3 page
+ 'Programming Language :: Python',
+ 'Programming Language :: Python :: 3'
+ ],
+ packages=['yourlibrary'],
+ # make sure to add custom_fixers to the MANIFEST.in
+ include_package_data=True,
+ **extra
+ )
+
+Now all you have to do is to put the custom 2to3 fixers (written in
+Python 3!) into the `custom_fixers` package next to your real library
+and they will be added automatically. For examples of fixers, look into
+the `lib2to3/fixes` package of your Python 3 installation. If you run
+`python3 setup.py build` it will run 2to3 on your files and put the
+output into the build folder for you to test.
+
+Input/Output
+~~~~~~~~~~~~
+
+So in Python 3 there is a completely new input/output system. It is
+very Java-ish and is able to deal with unicode. The downside is that
+you either don't have it in 2.x or the implementation there is too
+slow, so you will want to create an abstraction layer for it.
+
+If your library was unicode based in older Python versions you probably
+just did `file.read().decode(encoding)` or something similar. This still
+works on 3.x and I strongly recommend doing that, but be sure to open
+the file in binary mode, otherwise on Python 3 the decode will attempt
+to decode an already decoded unicode string, which does not make any
+sense. If you *need* normalized newlines (windows newlines converted
+to `'\n'`) you would have to post-process the string by hand, but most
+applications and libraries are able to deal with any kind of newline
+anyways.
+
+You could also just create an IO helper module that calls the builtin
+open on 3.x and `codecs.open` on 2.x. Unfortunately `codecs.open` has
+worse performance than the built-in open on 2.x, so you might want to
+check how you are dealing with files, whether high performance is
+necessary and so forth. Most of the time, opening the file in binary
+mode is what you want to do.
+
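+A minimal sketch of such a helper module (`open_text` is a made-up
+name; it assumes you always want text reading with an explicit
+encoding):
+
+.. sourcecode:: python
+
+    import sys
+
+    if sys.version_info >= (3, 0):
+        def open_text(filename, encoding='utf-8', errors='strict'):
+            # the 3.x builtin open decodes for us
+            return open(filename, 'r', encoding=encoding, errors=errors)
+    else:
+        import codecs
+
+        def open_text(filename, encoding='utf-8', errors='strict'):
+            # codecs.open is slower but encoding-aware on 2.x
+            return codecs.open(filename, 'r', encoding=encoding,
+                               errors=errors)
+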
+If your library was byte based in 2.x and you opened files in the
+library, instead of just working on open file objects, you will have to
+change your API slightly in order to take the charset and error mode
+into account. If you previously had a function like this:
+
+.. sourcecode:: python
+
+ def read_file_contents(filename):
+ with open(filename) as f:
+ return f.read()
+
+You will have to change it to something like this now:
+
+.. sourcecode:: python
+
+ def read_file_contents(filename, charset='utf-8', errors='strict'):
+ with open(filename, 'rb') as f:
+ return f.read().decode(charset, errors)
+
+And then make sure that the user can actually pass these arguments to
+the function. This means that whatever calls it would also have to
+accept these arguments, and so forth. Not everyone is using utf-8;
+there might be legacy files in iso-8859-1 a user might still want to be
+able to open. With a proper error handling system, it might even be
+possible to fall back to another encoding if the data does not decode
+as utf-8 properly.
+
+Last but not least, the 3.x `StringIO` is a "string IO", not something
+that accepts binary data. If you have a lot of unittests that deal
+with binary data in such objects, you will have to use `io.BytesIO`
+instead. If it does not exist, you are running 2.x, and you can safely
+fall back to `cStringIO.StringIO`.
+
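+A sketch of that fallback, placed for example in a small compatibility
+module of your library (the module layout is up to you):
+
+.. sourcecode:: python
+
+    try:
+        from io import BytesIO
+    except ImportError:
+        # no io module means an old 2.x release, where cStringIO
+        # can safely stand in for a bytes buffer
+        from cStringIO import StringIO as BytesIO
+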
+Unit-Testing
+~~~~~~~~~~~~
+
+Now the biggest problem I had with switching to 3.x: The unittests.
+First of all: **do not use doctest**. There is a doctest converter in
+2to3, but it does not give you much. Error messages changed, reprs
+changed (which it cannot properly pick up), nested tracebacks cause a
+lot of grief and they are hard to debug. I was playing with the idea of
+writing a tool that automatically converts doctests to unittests, but I
+was too lazy and converted the few I had in my code to unittests by
+hand. Furthermore, the few doctests left (used as code examples in the
+documentation) are only tested if the testsuite is invoked from Python
+2.x.
+
+Nosetest has 3.x support in a separate branch, py.test has shipped
+with 3.x support for a while now, and the builtin unittest does the
+trick as well. I personally converted all my Jinja 2 tests to unittest
+lately. If you are using unittest you can point distribute to your
+test suite function and it will run the tests for you if you write
+`python setup.py test`. This even runs 2to3 for you if you execute it
+with Python 3. Very helpful.
+
+Hope that helps you porting your libraries to Python 3. Would love to
+hear about your experiences, because even if Python 3 did not work out
+as some of us hoped, it is very important that we continue to port
+libraries over to 3.x.
View
188 2010/4/3/april-1st-post-mortem.rst
@@ -0,0 +1,188 @@
+public: yes
+tags: [flask, python]
+summary: |
+    My conclusions from an April fools' day joke I made this year that
+    later led to the development of the Flask micro framework.
+
+April 1st Post Mortem
+=====================
+
+This year I decided to finally do what I had planned for quite some
+time: an April fools' joke. (I did contribute a bit to `PEP 3117
+<http://www.python.org/dev/peps/pep-3117/>`_, but that does not
+count.) I made a little joke about Python microframeworks
+(micro-web-frameworks?), wrote a little thing, and created a website
+and screencast for it: `denied.immersedcode.org
+<http://denied.immersedcode.org/>`_.
+
+I did expect some responses to that, but I was a little bit surprised
+by some of them. So here is my full disclosure of the April fools'
+prank, what people thought of it and what my conclusion is.
+
+The Motivation
+~~~~~~~~~~~~~~
+
+It seems like everybody likes microframeworks. Not sure what caused
+that, but there are plenty of them. web.py (Python) and camping (Ruby)
+were the first of their kind, I think. Later others followed and it
+seemed that people love the idea of software that does not have
+dependencies and comes in a single file. So I thought: I can do the
+same and make fun of it, so let's just create a framework based on
+existing technology and throw everything together in a large single
+file. denied was born. I just bundled Werkzeug, simplejson and Jinja2
+into a single file and added a bit of code that glues them together.
+
+The Implementation
+~~~~~~~~~~~~~~~~~~
+
+Denied consists of 160 lines of code implementing a very basic WSGI
+application based on Werkzeug and Jinja2 that incorporates some really
+stupid ideas:
+
+* it `stores state in the module
+  <http://lucumr.pocoo.org/2009/7/24/singletons-and-their-problems-in-python>`_
+  and uses implicitly defined data structures
+* there is a function that accepts either a template filename or a
+  template source string in the same parameter and guesses which one
+  it got based on the contents of the string.
+* it introspects the interpreter frame to figure out the name of the
+ function that called a template render function to automagically guess
+ the name of the template.
+* it uses automatic function registration and decorators to register
+ URL rules.
+
+I don't want to go into detail about why I hate everything there; that
+would be a blog post of its own. But I want to point out that nearly
+all of these "features" were inspired by existing microframeworks.
+
+I did not expect anyone to detect from these things alone that the
+framework was an April fools' joke, but I thought that the obfuscated
+sourcecode and the fact that it was basically just a zipfile would be
+obvious. However I got more than one mail asking me to release the
+sourcecode because people wanted to hack on it. Right now it has more
+than 50 followers and 6 forks on github, which is insane if you keep
+in mind that Jinja2 and Werkzeug have fewer than 30 followers on
+bitbucket.
+
+Thinking about it a bit more made me realize that camping back in the
+day was in fact delivered as an obfuscated 2K file of Ruby code. Not
+sure why _why did that, but he was a man of mystery, so probably just
+because he thought it was fun.
+
+The Screencast
+~~~~~~~~~~~~~~
+
+To make the joke more obvious I created a screencast that would showcase
+the framework and do pretty much everything wrong. For that I created a
+persona called "Eirik Lahavre" that implemented the framework and did
+the screencast. Originally I wanted that person to be a Norwegian web
+developer, but unfortunately the designated speaker disappeared, so I
+had to ask a friend of mine (Jeroen Ruigrok van der Werven) to record
+it for me. He told me he can't do a Norwegian accent, so he went with
+French and Eirik Lundbergh became Eirik Lahavre. I lay flat on the
+floor when I listened to the recording for the first time because he's
+actually Dutch :)
+
+The Website
+~~~~~~~~~~~
+
+For the website I collected tongue-in-cheek fake endorsements from
+popular Python programmers and added one for myself that was just
+bashing the quality of the code. I'm afraid I sort of made myself
+popular by bashing other people's web frameworks; at least reading
+reddit, hacker news and various mailinglists leaves that impression, so
+I thought it would be fun to emphasize that a bit more on the website.
+This also comes very close to the website of web.py, which shows a few
+obviously bad comments from popular Python hackers.
+
+Furthermore the website shows a useless and short hello world example
+which shows nothing about how the framework works. This was inspired by
+every other microframework website out there. It claims RESTfulness and
+super scaling capabilities, kick-ass performance, and describes the
+developer of the project (the fictional Eirik Lahavre) as a god of
+Python code coming from a professional company.
+
+The Details
+~~~~~~~~~~~
+
+For everything in the joke I did what I would never do. I even went so
+far as to write the HTML of the website against my own code style, to
+use deprecated HTML tags in the presentation, and to claim XHTML even
+though the doctype and mimetype were wrong. The screencast also claims
+that flat files are a scalable NoSQL database and that missing form
+helpers are something positive because it means full flexibility.
+
+The Impact
+~~~~~~~~~~
+
+The screencast was downloaded over 10,000 times and the website got
+more than 50,000 hits. The link is still being tweeted, and I never
+got that many retweets for anything related to my projects so far. The
+fake project on github has more than 50 followers and 6 forks. Quite a
+few people took the project seriously, judging from the comments on
+reddit and the emails I got.
+
+What I learned
+~~~~~~~~~~~~~~
+
+* It does not matter how well intended or well written a project is,
+  bold marketing is king. Being present on github is *huge*. As much
+  as I love bitbucket and mercurial, there is an immense difference
+  between having your project on github or bitbucket, and I'm afraid
+  that no matter what bitbucket does or what the mercurial people do,
+  they will never even come close to github in terms of user base,
+  people following your code and contributing.
+* Small snippets of code on the website are killer. Werkzeug tries to
+ be honest by not showcasing a small "Hello World" application but
+ something more complex to show the API, but that does not attract
+  users. Jinja2 does not even try to show anything at all; you have to
+  look at the documentation to see what it looks like. That drives
+  potential users away.
+* Don't be honest: be bold. Nobody will check your claims anyway and
+  if they don't live up to the promise, you can still say that your
+  test setup or your understanding of the problem was different.
+* There is no such thing as a "bad endorsement". People took it as a
+ good sign that I did not give the project my blessing.
+
+The Small Library
+~~~~~~~~~~~~~~~~~
+
+I'm currently trying to learn everything about game development and 3D
+graphics I possibly can. I found out that the best way to learn that is
+to write a minimal engine from scratch. Right now I'm doing that by
+looking at other source code and reading books and writing the most
+minimal code I can. I always try to prove to myself: existing code is
+way to complex, that has to be easier. After the third refactoring and
+improvements I usually end up with something as complex as the original
+code or the explanation from the book.
+
+There is a reason why things are as complex as they are and not easier.
+I think the same is true for microframeworks. The reason why everybody
+is so crazy about having a single file implement whatever is necessary
+for a web application is that you can claim it's easy and that you can
+understand it. However things are not that easy in reality. I am
+pretty sure that other framework developers will agree.
+
+web.py is the perfect example for that. It started as a library in 1000
+lines of code in a single file, and look at what it became. It's not
+that simple any more. Many of the initial design decisions that were
+plain wrong were reverted, such as abusing the print statement for
+outputting values to the browser. There were good reasons why nobody
+before web.py used print to output strings, yet web.py did it that
+way. And a few versions later it disappeared again for good.
+
+What will Change?
+~~~~~~~~~~~~~~~~~
+
+For one I will put small example snippets on the Werkzeug and Jinja2
+website. Also for the fun of it I will publish one of the projects on
+github just to see how that works out. In general though, I will try to
+keep things low profile because I just feel more comfortable with that.
+
+Obviously, denied will stay the April fools' joke it was and not get
+further attention. The "promised" documentation will not come :) However
+I will probably blog about "how to create your own microframework based
+on Werkzeug" because right now people base their microframeworks on the
+standard library which I think is a terrible idea. One dependency might
+not be as good as no dependency, but with Tarek Ziade's tremendous work
+on packaging with Python that should not be a problem in the near
+future.
View
323 2010/5/25/wsgi-on-python-3.rst
@@ -0,0 +1,323 @@
+public: yes
+tags: [wsgi, python]
+summary: |
+ A short summary about the current state of WSGI on Python 3.
+
+WSGI on Python 3
+================
+
+Yesterday after my talk about WSGI on Python 3 I announced an OpenSpace
+about WSGI. However only two people showed up there, which was quite
+disappointing. On the bright side however: it was in parallel to some
+interesting lightning talks and I did not explain too well what the
+purpose of this OpenSpace was.
+
+In order to do better this time around, I want to summarize the current
+situation of WSGI on Python 3, what the options are and why I'm at the
+moment thinking of going back to an earlier proposal that was dismissed
+already.
+
+So here we go again:
+
+Language Changes
+~~~~~~~~~~~~~~~~
+
+There are a couple of changes in the Python language that are relevant
+to WSGI because they make certain things harder to implement and others
+easier. In Python 2.x bytestrings and unicode strings shared many
+methods and Python would do a lot to make it easy for you to implicitly
+switch between the two types. The unicode decode and unicode encode
+errors everybody knows in Python are often caused by this implicit
+conversion.
+
+Now in Python 3 the whole thing looks a lot different. There are only
+unicode strings now and the bytestrings got replaced by things that are
+more like arrays than strings. Take this Python 2 example:
+
+.. sourcecode:: pycon
+
+ >>> 'foo' + u'bar'
+ u'foobar'
+ >>> 'foo %s' % 42
+ 'foo 42'
+ >>> print 'foo'
+ foo
+ >>> list('foo')
+ ['f', 'o', 'o']
+
+Now compare that to the very same example on Python 3, just with the
+syntax adjusted to the new rules:
+
+.. sourcecode:: pycon
+
+ >>> b'foo' + 'bar'
+ Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+ TypeError: can't concat bytes to str
+ >>> b'foo %s' % 42
+ Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+ TypeError: unsupported operand type(s) for %: 'bytes' and 'int'
+ >>> print(b'foo')
+ b'foo'
+ >>> list(b'foo')
+ [102, 111, 111]
+
+There are ways to convert these bytes to unicode strings and the other
+way round, and there are also string methods like `title()` and
+`upper()` and everything you know from a string, but it still does not
+behave like a string. Keep this in mind when reading the rest of this
+article, because it explains why the straightforward approach does not
+work out too well at the moment.
+
+Something about Protocols
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+WSGI, like HTTP or URIs, is based on ASCII or an encoding like latin1,
+or even a mix of encodings; none of them is based on a single encoding
+that represents unicode. In Python 2 the unicode situation for
+web applications was fixed pretty quickly by all frameworks in the same
+way: you as the framework/application know the encoding, so decode
+incoming request data from the given charset and operate on unicode
+internally. If you go to the database, back to HTTP or something else
+that does not operate on unicode, encode to the target encoding, which
+you know.
+
+This is painless; some libraries like Django make it even less painful
+by having special helpers that can convert between utf-8 encoded
+strings and actual unicode objects at any point. Here is a small
+selection of web related libraries operating on unicode: Django,
+Pylons, TurboGears 2, WebOb, Werkzeug, Jinja, SQLAlchemy, Genshi,
+simplejson, feedparser, and the list goes on.
+
+What these libraries have that a protocol like WSGI does not is
+knowledge of the encoding used. Why? Because in practice (not on
+paper) encodings on the web are very simple and driven by the
+application: the encoding the application sends out is the encoding
+that comes back. It's as simple as that. However WSGI does not have
+that knowledge, because how would you tell WSGI what encoding to
+assume? There is no configuration for WSGI, so the only thing we could
+do is force a specific charset for WSGI applications on Python 3 if we
+want to get unicode onto that layer. Like utf-8 for everything except
+headers, which should be latin1 for RFC compliance.
+
+Byte Based WSGI
+~~~~~~~~~~~~~~~
+
+On Python 2 WSGI is based on bytes. If we went with bytes on Python 3
+as well, the specification for Python 3 would look like this:
+
+1. WSGI `environ` keys are unicode
+2. WSGI `environ` values that contain incoming request data are
+ bytes
+3. headers, chunks in the response iterable as well as status
+ code are bytes as well
+
+If we ignore everything else that makes this approach hard on Python
+3 and only look at the bytes object, which just does not behave like a
+standard string any more, a WSGI library based on the standard
+library's functions and the bytes type is quite complex compared to
+the Python 2 counterpart. Take the very simple code commonly used to
+reproduce a URL from the WSGI environment on Python 2:
+
+.. sourcecode:: python
+
+ def get_host(environ):
+ if 'HTTP_HOST' in environ:
+ return environ['HTTP_HOST']
+ result = environ['SERVER_NAME']
+ if (environ['wsgi.url_scheme'], environ['SERVER_PORT']) not \
+ in (('https', '443'), ('http', '80')):
+ result += ':' + environ['SERVER_PORT']
+ return result
+
+ def get_current_url(environ):
+ rv = '%s://%s/%s%s' % (
+ environ['wsgi.url_scheme'],
+ get_host(environ),
+ urllib.quote(environ.get('SCRIPT_NAME', '').strip('/')),
+ urllib.quote('/' + environ.get('PATH_INFO', '').lstrip('/'))
+ )
+ qs = environ.get('QUERY_STRING')
+ if qs:
+ rv += '?' + qs
+ return rv
+
+This depends on many string operations and is entirely based on bytes
+(like URLs are). So what has to be changed to make this code work on
+Python 3? Here is an untested version of the same code adapted to
+theoretically run on a byte based WSGI implementation for Python 3.
+
+The `get_host()` function is easy to port because it only concatenates
+bytes. This works exactly the same on Python 3, and we could
+theoretically even improve it by switching to bytearrays, which are
+mutable bytes objects and in theory give us better memory management.
+But here is the straightforward port:
+
+.. sourcecode:: python
+
+ def get_host(environ):
+ if 'HTTP_HOST' in environ:
+ return environ['HTTP_HOST']
+ result = environ['SERVER_NAME']
+ if (environ['wsgi.url_scheme'], environ['SERVER_PORT']) not \
+ in ((b'https', b'443'), (b'http', b'80')):
+ result += b':' + environ['SERVER_PORT']
+ return result
+
+The port of the actual `get_current_url()` function is a little
+different because the string formatting feature used in the Python
+2 implementation is no longer available:
+
+.. sourcecode:: python
+
+    def get_current_url(environ):
+        rv = (
+            environ['wsgi.url_scheme'] + b'://' +
+            get_host(environ) + b'/' +
+            urllib.quote(environ.get('SCRIPT_NAME', b'').strip(b'/')) +
+            urllib.quote(b'/' + environ.get('PATH_INFO', b'').lstrip(b'/'))
+        )
+        qs = environ.get('QUERY_STRING')
+        if qs:
+            rv += b'?' + qs
+        return rv
+
+The example did not necessarily become harder, but it became a little
+bit more low level. When the developers of the standard library ported
+over some of the functions and classes related to web development they
+decided to introduce unicode in places where it does not really
+belong. It's an understandable decision based on how byte strings work
+on Python 3, but it does cause some problems. Here is a list of places
+where we now have unicode where we previously did not. I am not
+judging whether the decision to introduce unicode there was right or
+wrong, just noting that it happened:
+
+* All the HTTP functions and servers in the standard library are
+ now operating on latin1 encoded headers. The header parsing
+ functions will assume latin1 there and pass unicode to you.
+ Unfortunately right now, Python 3 does not support non *ASCII*
+ headers at all which I think is a bug in the implementation.
+* The `FieldStorage` object is assuming a utf-8 encoded input
+  stream as far as I understand, which currently breaks binary file
+  uploads. This apparently is also an issue with the email package,
+  which internally is based on a common mime parsing library.
+* `urllib` also got unicode forcibly integrated. It assumes
+  utf-8 encoded strings in many places and does not support other
+  encodings for some functions, which is something that can be fixed.
+  Ideally it would also support operations on bytes, which is
+  currently only the case for unquoting but none of the more complex
+  operations.
+
+The about-to-be Spec
+~~~~~~~~~~~~~~~~~~~~
+
+There are some other places as well where unicode appeared, but
+these are the ones causing the most trouble, besides bytes not
+being a string. What most of WEB-SIG later agreed on, and what
+Graham ultimately implemented for `mod_wsgi`, is a fake unicode
+approach. What does this mean? All the information is stored as
+unicode, but not decoded with the proper encoding (which WSGI would
+not know); latin1 is assumed instead. If latin1 is not what the
+application expects, it can encode back to latin1 and decode from
+utf-8. (As far as I know, this round-trip is loss-less.)
+
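+For illustration, this is roughly what that dance looks like from the
+application side (a sketch, assuming the application knows the real
+charset is utf-8):
+
+.. sourcecode:: python
+
+    # the server decoded the raw bytes as latin1, but the browser
+    # actually sent utf-8, so transcode to get the real text back
+    path_info = environ['PATH_INFO']
+    path_info = path_info.encode('latin1').decode('utf-8')
+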
+Here is what the current specification, which is about to be crafted
+into a PEP, looks like:
+
+1. The application is passed an instance of a Python dictionary
+ containing what is referred to as the WSGI environment. All keys
+ in this dictionary are native strings. For CGI variables, all
+ names are going to be ISO-8859-1 and so where native strings are
+ unicode strings, that encoding is used for the names of CGI
+ variables.
+2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
+ environment, the value of the variable should be a native
+ string.
+3. For the CGI variables contained in the WSGI environment, the
+ values of the variables are native strings. Where native strings
+ are unicode strings, ISO-8859-1 encoding would be used such that
+ the original character data is preserved and as necessary the
+ unicode string can be converted back to bytes and thence decoded
+ to unicode again using a different encoding.
+4. The WSGI input stream 'wsgi.input' contained in the WSGI
+ environment and from which request content is read, should yield
+ byte strings.
+5. The status line specified by the WSGI application should be a
+ byte string. Where native strings are unicode strings, the
+ native string type can also be returned in which case it would
+ be encoded as ISO-8859-1.
+6. The list of response headers specified by the WSGI
+ application should contain tuples consisting of two values,
+ where each value is a byte string. Where native strings are
+ unicode strings, the native string type can also be returned in
+ which case it would be encoded as ISO-8859-1.
+7. The iterable returned by the application and from which
+ response content is derived, should yield byte strings. Where
+ native strings are unicode strings, the native string type can
+ also be returned in which case it would be encoded as
+ ISO-8859-1.
+8. The value passed to the 'write()' callback returned by
+ 'start_response()' should be a byte string. Where native strings
+ are unicode strings, a native string type can also be supplied,
+ in which case it would be encoded as ISO-8859-1.
+
+Why I'm Unhappy again
+~~~~~~~~~~~~~~~~~~~~~
+
+I did some tests lately, toying around and starting to work on a
+port of Werkzeug, but the more I worked with it, the more I disliked
+it. WSGI in Python 2 was already a protocol that was far more
+complex than it should have been and some parts of it just don't
+make any sense (like the input stream having readline without size),
+but it was something you could get started with quickly and the
+basics were simple. Middlewares, the area where WSGI was already far
+too complex, now just become more complex because they have to
+encode unicode strings before they can operate on them, even if it's
+just for comparing.
+
+It just feels like the more I play with it, the more unhappy I
+become with how the bytes object works and how the standard library
+behaves. And I doubt I will be the only one here. It's just that
+playing with the actual code shows problems you wouldn't spot on
+paper, so I would love to see a wider crowd of people toying with
+both the language and specification to make sure WSGI stays a
+specification everybody is happy with.
+
+Right now I'm a little bit afraid we will end up with a specification
+that requires us to do the encode/decode/encode/decode dance just
+because the standard library and a limitation on the bytes object
+make us do so. Because one thing is for certain: ASCII and bytes are
+here to stay. Nobody can change the protocols that are in use, and
+even new ones would at the very bottom have to be based on bytes. And
+if the tools to work with them are not good enough in Python 3 we
+will see the problems with that on multiple levels, not just WSGI
+(databases, email, and more).
+
+What I currently have in mind is a bit more than what was ever on
+discussion for WSGI which is why I don't expect anything like that
+to be implemented, but it can't harm sharing:
+
+* Support basic string formatting for bytes
+* Support bytes in more places of the standard library (urllib,
+ cgi module etc.)
+* use bytes for values (not keys) in the WSGI spec for Python 3,
+ just like in Python 2
+* use bytes for headers, status codes and everything for Python 3
+
+I am happy to accept quasi-unicode support as well and will port
+Werkzeug over to it. But it's probably still the time to improve the
+specification *and* the language so that everybody is happy. Right
+now it looks like not a lot of people are playing with the
+specification, the language and the implications of all that. The
+reason why Python 3 is not as good as it could be is that far too
+few people look at it. It is clear that the future of Python will be
+Python 3 and that there are no intentions of making releases beyond
+Python 2.7, so to make the process less painful it's necessary to
+start playing with it now.
+
+So I encourage everyone to play with Python 3, the spec and the
+standard library so that there is more input. Maybe the bytes issue
+is as bad as I think it is, maybe it's not. But if only four
+people are discussing the issue, there is too little input to make
+rational decisions.
+
View
196 2010/6/14/opening-the-flask.rst
@@ -0,0 +1,196 @@
+public: yes
+tags: [flask, python]
+summary: |
+    First part of a series of articles about how Flask works internally
+    and how you can create micro frameworks with Werkzeug.
+
+Opening The Flask
+=================
+
+Howdy everybody. This is the first part of a multi-part blog post about
+creating web frameworks based on `Werkzeug
+<http://werkzeug.pocoo.org/>`_ and other existing code. This is
+obviously based on my `Flask <http://flask.pocoo.org/>`_ microframework.
+So it probably makes sense to head over to the documentation first to
+look at some example code. But before we get started, let's first
+discuss why you should create your own framework.
+
+Why create your own Framework
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is quite unpopular these days to build your own framework;
+everybody quickly shouts "reinventing the wheel" and points you to one
+of the tons of existing web frameworks out there. But it is actually a
+really good idea to create a framework for an application and not go
+with a stock one. Why? Because you are a lot more flexible and your
+application might require something that does not exist yet. For an
+application I wrote in the past in the very early Django days the
+development process looked a lot like this:
+
+Step 1: download Django. Step 2: get started and feel happy. Step 3:
+encounter problems in the framework design and start modifying the
+core. Step 4: phase more and more Django code out and end up with a
+completely new package that everybody hates.
+
+Turns out: Django, like every other framework out there, is improving
+quickly, but often not in the areas you might be interested in. Then
+you start modifying it yourself, and when Django improves sideways,
+you suddenly end up in the situation where it becomes nearly
+impossible or too painful to upgrade to a newer Django version.
+Obviously Django has greatly improved since then, but a few things
+continue to work differently than I want them to work. For one I
+personally don't like the template engine too much and also would love
+the ORM to ensure that objects with the same primary key are actually
+the same objects and that queries are sent less often. These are
+things that are very unlikely to change in Django, and there are
+really good reasons why they will not change which are totally fine,
+but certainly not what I want.
+
+Another reason to roll your own framework is that you know everything
+and you can fix it quickly yourself.
+
+End Result
+~~~~~~~~~~
+
+This is what should work at the end of the day:
+
+.. sourcecode:: python
+
+ from yourflask import YourFlask
+ app = YourFlask()
+
+ @app.route('/')
+ def index(request):
+ return 'Hello World'
+
+ if __name__ == '__main__':
+ app.run()
+
+Looks a lot like a simplified version of Flask, which is exactly what
+it should be. Not yet as capable, but easier to dive into and to
+understand the concepts.
+
+In a nutshell: 1) create an application, 2) register functions on that
+application that listen on a specific path (or URL rule), 3) these
+functions return response objects or strings. We also pass the request
+object explicitly to the function for now because that's easier to
+understand and implement.
+
+The Code
+~~~~~~~~
+
+The following code implements the full framework for this blog post. As
+I said, it's a very simplified Flask but it is capable of producing
+simple web applications and to run the example from above:
+
+.. sourcecode:: python
+
+ from werkzeug import Request, Response, run_simple
+ from werkzeug.exceptions import HTTPException
+ from werkzeug.routing import Map, Rule
+
+ class YourFlask(object):
+
+ def __init__(self):
+ self.url_map = Map()
+ self.views = {}
+
+ def route(self, string, **options):
+ def decorator(f):
+ options['endpoint'] = f.__name__
+ self.views[f.__name__] = f
+ self.url_map.add(Rule(string, **options))
+ return f
+ return decorator
+
+ def run(self, **options):
+ return run_simple('localhost', 5000, self, **options)
+
+ def make_response(self, rv):
+ if isinstance(rv, basestring):
+ return Response(rv, mimetype='text/html')
+ return rv
+
+ def __call__(self, environ, start_response):
+ request = Request(environ)
+ adapter = self.url_map.bind_to_environ(environ)
+ try:
+ endpoint, values = adapter.match()
+ response = self.make_response(self.views[endpoint](request, **values))
+ except HTTPException, e:
+ response = e
+ return response(environ, start_response)
+
+So how exactly does it work and what does it do? The following list is
+the summary of the above code:
+
+* We create a class called `YourFlask` that implements a WSGI
+ application and provides methods to register callback functions and
+ binds them to a Werkzeug URL map.
+* The `route()` method can be used as a decorator to register new view
+  functions. It does this by taking a string with the URL rule as
+  first argument and some more keyword arguments that are forwarded
+  unchanged. The routing system uses an opaque string to identify
+  functions. This is called the endpoint. In this example we will use
+  the function name as endpoint (something Flask does as well for
+  simple setups). See the usage example after this list.
+* The `run()` method just starts the internal development server that
+ comes with Werkzeug. That's just a nice shortcut.
+* `make_response()` is called with the return value from the view
+ function. If it's a string, we create a response object. That's just a
+ nice shortcut.
+* In the `__call__()` method we implement the full WSGI application.
+ First a request object is created from the WSGI environment and then
+ the URL map is used to create an adapter. This adapter is basically
+ bound to the WSGI environment and can be used to match the current
+ URL. If a match is found the endpoint and values are returned (the
+ values are variable parts in the rule as dictionary). In case nothing
+ matched, a `NotFound` exception is raised which incidentally is also
+ an `HTTPException`. If all works out we look up the view function and
+ pass it the values and the request object.
+* The return value of the function is passed to our `make_response()`
+ method so that we can ensure it's a response object.
+* If an `HTTPException` is raised we catch it and use it as response
+ object. It's not exactly a response object but close enough to one
+ that we can do the same with it.
+* Either way, the response is invoked as WSGI application and the
+ application iterator is returned.
+
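+To see the endpoint and value passing in action, here is a small
+hypothetical view with a variable part in the URL rule (using
+Werkzeug's `<name>` placeholder syntax):
+
+.. sourcecode:: python
+
+    @app.route('/hello/<name>')
+    def hello(request, name):
+        # `name` is the variable part matched from the URL
+        return 'Hello %s' % name
+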
+Where WSGI fits in
+~~~~~~~~~~~~~~~~~~
+
+So what we created is a WSGI application. How exactly does it work and
+where is the WSGI part? The majority of the pain is handled for us by
+Werkzeug. WSGI itself looks like this:
+
+1. There is a thing that can be called. It's passed a WSGI
+ environment (which is basically a dict with incoming data) and a
+ function that is used to start the response.
+2. What the callable returns is an iterable of data sent back to the
+   browser; it has to call the response starting function first.
+
+If you look closely, we are doing that in our `__call__()` method.
+Well, it's not really visible, but it happens. When we invoke the
+response thingy, internally Werkzeug will call the response starting
+function and all that for us. We also use the WSGI environment when we
+create the request object.
+
+The request object itself gives us access to all the stuff that is
+incoming from the browser: where the request went, what values were
+transmitted, what browser is used, the cookies etc. We will focus on
+that with the next blog post.
+
+Coming up Next
+~~~~~~~~~~~~~~
+
+Now that all is working fine we should focus on these things next:
+
+1. explore the concept of thread / context local objects to avoid
+   passing the request object (not saying that's necessarily a good
+   idea, but it is crucial for understanding web frameworks in
+   general; even if you think Django does not use them, it does: the
+   i18n and database systems are powered by thread local objects).
+2. add support for a template engine and serving up static files
+3. add more helper functions for URL building, rendering templates
+ and aborting requests with errors.
+
+Stay tuned :)
View
238 2010/8/17/git-and-mercurial-branching.rst
@@ -0,0 +1,238 @@
+public: yes
+tags: [hg, git]
+summary: |
+    Comparison of mercurial's and git's branching systems and why one of
+    them works better for me than the other.
+
+Git and Mercurial Branching
+===========================
+
+People using my stuff will have noticed a trend: I started using git
+(and github) for some of my newer projects. There are two parts to
+that story. One is obviously github, which is currently unbeatable.
+Not necessarily due to the size of the community, but because it gives
+you a very neat way to look at what patches are available all over the
+place (I'm referring to the fork queue). But the other reason is that
+despite all the hate I have for git, it has grown on me as the tool
+that does the job better.
+
+This however is a shame, because if you remove the github component
+and some other addons like the hub command, hg is the superior tool.
+If you downloaded both tools and compared what they can do out of the
+box with the stuff they ship, hg has the better user interface and
+tools for collaboration. These tools are essentially `hg incoming`, `hg