More old posts and a new one

mitsuhiko · Nov 23, 2010 · 295a87b · 295a87b
1 parent 9a7c1d3
commit 295a87b
Show file tree

Hide file tree

Showing 16 changed files with 2,861 additions and 10 deletions.
diff --git a/2007/5/21/getting-started-with-wsgi.rst b/2007/5/21/getting-started-with-wsgi.rst
@@ -0,0 +1,346 @@
+public: yes
+tags: [wsgi, python]
+summary: |
+  A friendly introduction into getting started with WSGI on Python.
+
+Getting Started with WSGI
+=========================
+
+I finally finished the written matura and have some more time to work on
+projects and write articles. One of the things I wanted to write for a
+long time is a WSGI tutorial that does not require a specific framework
+or implementation. So here we go.
+
+.. image:: http://dev.pocoo.org/~mitsuhiko/wsgi-snake.png
+   :alt: Getting started with WSGI
+
+What's WSGI?
+~~~~~~~~~~~~
+
+Basically WSGI is lower level than CGI which you probably know. But in
+difference to CGI, WSGI does scale and can work in both multithreaded
+and multi process environments because it's a specification that doesn't
+mind how it's implemented. In fact WSGI is not CGI because it's between
+your web application and the webserver layer which can be CGI,
+mod_python, FastCGI or a webserver that implements WSGI in the core like
+the python stdlib standalone WSGI server called wsgiref. 
+
+WSGI is specified in the `PEP 333
+<http://www.python.org/dev/peps/pep-0333/>`_ and adapted by various
+frameworks including the well known frameworks django and pylons. 
+
+If you are too lazy to read the pep 333 here's a short summary: 
+
+* WSGI application are callable python objects (functions or classes
+  with a `__call__` method that are passed two arguments: a WSGI
+  environment as first argument and a function that starts the response.
+* the application has to start a response using the function provided
+  and return an iterable where each yielded item means writing and
+  flushing.
+* The WSGI environment is like a CGI environment just with some
+  additional keys that are either provided by the server or a
+  middleware.
+* you can add middlewares to your application by wrapping it.
+
+Because that's a lot of information let's ignore it for now and have a
+look at a basic WSGI application: 
+
+Extended Hello World
+~~~~~~~~~~~~~~~~~~~~
+
+Here a simple, but not too simple example of a WSGI application that
+says `Hello World!` where World can be specified via url parameter. 
+
+.. sourcecode:: python
+
+    from cgi import parse_qs, escape
+
+    def hello_world(environ, start_response):
+        parameters = parse_qs(environ.get('QUERY_STRING', ''))
+        if 'subject' in parameters:
+            subject = escape(parameters['subject'][0])
+        else:
+            subject = 'World'
+        start_response('200 OK', [('Content-Type', 'text/html')])
+        return ['''Hello %(subject)s
+        Hello %(subject)s!
+
+    ''' % {'subject': subject}]
+
+As you can see the `start_response` function takes two arguments. A
+status string and a list of tuples that represent the response headers.
+What you cannot see because it's not used here and nowhere else is that
+the `start_response` function returns something. It returns a `write`
+function that directly writes to the webserver output stream. Because it
+bypasses middlewares (we'll cover that later) it's a terrible bad idea
+to use that function. For debugging purposes however it can be useful. 
+
+But how to start that application now? A webserver doesn't know how to
+handle that and neither does python because nothing calls that function.
+Because we're lazy we don't setup a server with WSGI support now but use
+the `wsgiref` WSGI standalone server bundled with python2.5 and higher.
+(You can also download it for python2.3 or 2.4) 
+
+Just add this to your file: 
+
+.. sourcecode:: python
+
+    if __name__ == '__main__':
+        from wsgiref.simple_server import make_server
+        srv = make_server('localhost', 8080, hello_world)
+        srv.serve_forever()
+
+When you now start the file you should be able to get a `Hello John!` on
+`http://localhost:8080/?subject=John`. 
+
+Path Dispatching
+~~~~~~~~~~~~~~~~
+
+You probably worked with CGI or PHP before. If you did so you know that
+you most of the time have multiple public files (`.pl` / `.php`) a user
+can access and that do something. Not so in WSGI. There you only have
+one file which consumes all paths. Thus if you have your server from the
+previous example still running you should get the same content on
+`http://localhost:8080/foo?subject=John`. 
+
+The accessed path is saved in the `PATH_INFO` variable in the WSGI
+environment, the real path to the application in `SCRIPT_NAME`. In case
+of the development server `SCRIPT_NAME` will be empty, but if you have a
+wiki that is mounted on `http://example.com/wiki` the `SCRIPT_NAME`
+variable would be `/wiki`. This information can now be used to serve
+multiple indepentent pages with nice URLs. 
+
+In this example we have a bunch of regular expressions and match the
+current request against that: 
+
+.. sourcecode:: python
+
+    import re
+    from cgi import escape
+
+    def index(environ, start_response):
+        """This function will be mounted on "/" and display a link
+        to the hello world page."""
+        start_response('200 OK', [('Content-Type', 'text/html')])
+        return ['''Hello World Application
+                   This is the Hello World application:
+
+    `continue <hello/>`_
+
+    ''']
+
+    def hello(environ, start_response):
+        """Like the example above, but it uses the name specified in the
+    URL."""
+        # get the name from the url if it was specified there.
+        args = environ['myapp.url_args']
+        if args:
+            subject = escape(args[0])
+        else:
+            subject = 'World'
+        start_response('200 OK', [('Content-Type', 'text/html')])
+        return ['''Hello %(subject)s
+                Hello %(subject)s!
+
+    ''' % {'subject': subject}]
+
+    def not_found(environ, start_response):
+        """Called if no URL matches."""
+        start_response('404 NOT FOUND', [('Content-Type', 'text/plain')])
+        return ['Not Found']
+
+    # map urls to functions
+    urls = [
+        (r'^$', index),
+        (r'hello/?$', hello),
+        (r'hello/(.+)$', hello)
+    ]
+
+    def application(environ, start_response):
+        """
+        The main WSGI application. Dispatch the current request to
+        the functions from above and store the regular expression
+        captures in the WSGI environment as  `myapp.url_args` so that
+        the functions from above can access the url placeholders.
+
+        If nothing matches call the `not_found` function.
+        """
+        path = environ.get('PATH_INFO', '').lstrip('/')
+        for regex, callback in urls:
+            match = re.search(regex, path)
+            if match is not None:
+                environ['myapp.url_args'] = match.groups()
+                return callback(environ, start_response)
+        return not_found(environ, start_response)
+
+Now that's a bunch of code. But you should get the idea how URL
+dispatching works. Basically if you now visit
+`http://localhost:8080/hello/John` you should get the same as above but
+with a nicer URL and a error 404 page if you enter the wrong url. Now
+you could improve that further by encapsulating `environ` in a request
+object and replacing the `start_response` call and the return iterator
+with a response objects. This is also what WSGI libraries like `Werkzeug
+<http://werkzeug.pocoo.org/>`_ and `Paste
+<http://www.pythonpaste.org/>`_ do. 
+
+By adding something to the environment we did something normally
+middlewares do. So let's try to write one that catches exceptions and
+renders them in the browser: 
+
+.. sourcecode:: python
+
+    # import the helper functions we need to get and render tracebacks
+    from sys import exc_info
+    from traceback import format_tb
+
+    class ExceptionMiddleware(object):
+        """The middleware we use."""
+
+        def __init__(self, app):
+            self.app = app
+
+        def __call__(self, environ, start_response):
+            """Call the application can catch exceptions."""
+            appiter = None
+            # just call the application and send the output back
+            # unchanged but catch exceptions
+            try:
+                appiter = self.app(environ, start_response)
+                for item in appiter:
+                    yield item
+            # if an exception occours we get the exception information
+            # and prepare a traceback we can render
+            except:
+                e_type, e_value, tb = exc_info()
+                traceback = ['Traceback (most recent call last):']
+                traceback += format_tb(tb)
+                traceback.append('%s: %s' % (e_type.__name__, e_value))
+                # we might have not a stated response by now. try
+                # to start one with the status code 500 or ignore an
+                # raised exception if the application already started one.
+                try:
+                    start_response('500 INTERNAL SERVER ERROR', [
+                                   ('Content-Type', 'text/plain')])
+                except:
+                    pass
+                yield '\n'.join(traceback)
+
+            # wsgi applications might have a close function. If it exists
+            # it *must* be called.
+            if hasattr(appiter, 'close'):
+                appiter.close()
+
+So how can we use that middleware now? If our WSGI application is called
+`application` like in the previous example all we have to do is to wrap
+it: 
+
+.. sourcecode:: python
+
+    application = ExceptionMiddleware(application)
+
+Now all occouring exceptions will be catched and displayed in the
+browser. Of course you don't have to do that because there are many
+libraries that do exactly that and with more features. 
+
+Deployment
+~~~~~~~~~~
+
+Now where the application is "finished" it must be installed on the
+production server somehow. You can of course use wsgiref behind
+mod_proxy but there are also more sophisticated solutions available.
+Many people for example prefer using WSGI applications on top of
+FastCGI. If you have `flup <http://trac.saddi.com/flup>`_ installed all
+you have to do is to defined a `myapplication.fcgi` with this code in: 
+
+.. sourcecode:: python
+
+    #!/usr/bin/python
+    from flup.server.fcgi import WSGIServer
+    from myapplication import application
+    WSGIServer(application).run()
+
+The apache config then could look like this: 
+
+.. sourcecode:: apache
+
+    <ServerName www.example.com>
+        Alias /public /path/to/the/static/files
+        ScriptAlias / /path/to/myapplication.fcgi/
+    </ServerName>
+
+As you can see there is also a clause for static files. If you are in
+development mode and want to serve static files in your WSGI application
+there are a couple of middlewares (werkzeug and paste as well as "static"
+from Luke Arno's tools provide that) available.
+
+NIH / DRY
+~~~~~~~~~
+
+Avoid the "Not Invented Here" problem and don't repeat yourself. Use
+the libraries that exist and their utilities! But there are so many!
+Which one to use? Here my suggestions:
+
+Frameworks
+^^^^^^^^^^
+
+Since Ruby on Rails appeared on the web everybody is talking about
+frameworks. Python has two major ones too. One that abstracts stuff very
+much and is called `Django <http://www.djangoproject.com/>`_ and the other
+that is much nearer to WSGI and called `pylons
+<http://www.pylonshq.com/>`_. Django is an awesome framework but only as
+long as you don't want to distribute your application. It's if you have to
+create a webpage in no time.  Pylons on the other hand requires more
+developer interaction and your applications are a lot easier to deploy.
+
+There are other frameworks too but **my** experiences with them are quite
+bad or the community is too small.
+
+Utility Libraries
+^^^^^^^^^^^^^^^^^
+
+For many situations you don't want a full blown framework. Either because
+it's too big for your application or your application is too complex that
+you can solve it with a framework. (You can solve any application with a
+framework but it could be that the way you have to solve it is a lot more
+complex than without the "help" of the framework)
+
+For that some utility libraries exist:
+
+* `Paste <http://www.pythonpaste.org/>`_ — used by pylons behind the scenes.
+  Implements request and response objects. Ships many middlewares.
+
+* `Werkzeug <http://werkzeug.pocoo.org/>`_ — minimal WSGI library we wrote
+  for pocoo. Ships unicode away request and response objects as well as an
+  advanced URL mapper and a interactive debugger.
+
+* `Luke Arno's WSGI helpers <http://lukearno.com/projects/>`_ —
+  various WSGI helpers in independent modules by Luke Arno.
+
+There are also many middlewares out there. Just look for them at
+the `Cheeseshop
+<http://cheeseshop.python.org/pypi?:action=search&term=wsgi>`_.
+
+Template Engines
+^^^^^^^^^^^^^^^^
+
+Here a list of template engines I often use and recommend:
+
+
+* `Genshi <http://genshi.edgewall.org/>`_ — the world's best XML template
+  engine. But quite slow, so if you need a really good performance you have
+  to go with something else.
+
+* `Mako <http://www.makotemplates.org/>`_ — stupidely fast text based
+  template engine. It's a mix of ERB, Mason and django templates.
+
+* `Jinja2 <http://jinja.pocoo.org/>`_ — sandboxed, designer friendly and
+  quite fast, text based template engine. Of course my personal choice :D
+
+Conclusion
+~~~~~~~~~~
+
+WSGI rocks. You can simply create your own personal stack. If you think
+it's too complicated have a look at werkzeug and paste, they make things a
+lot easier without limiting you.
+
+I hope this article was useful.