Architectural Design

Alice Zoë Bevan–McGregor edited this page Jun 30, 2020 · 4 revisions

There are quite a number of possible approaches to web application development in Python. Luckily, some standardization has occurred, resulting in the definition of the Web Server Gateway Interface (WSGI) for Python, and with this clear definition of the roles and responsibilities of web servers, and applications, the ability to define middleware which sits between the two, supplying a wide variety of functionality.

As developers embraced WSGI, a problem arose. Many of these "middleware components" were, essentially, only providing some form of prepared value or API interface relating to the request, or transformation of the outbound response. Invariably the former would end up stuffed into the WSGI environment dictionary to be later "picked up" (extracted) and used, while the latter suffered from problems (and complexity) interposing the start_response HTTP status code and header emitter function passed to the WSGI application, let alone the complexity of interposing a return value, or calls to the body writer returned by start_response. No major categorical standards resulted, e.g. an agreed-upon standardized API to implement sessions, or database connectivity, or template rendering, or serialization, though the capability was there to support such standardization.
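To make both halves of that complaint concrete, here is a minimal sketch of such a middleware component. Every name in it (request_id_middleware, the example.request_id key, the X-Request-Id header) is made up for illustration and comes from no particular library.

```python
def request_id_middleware(app):
	"""Hypothetical middleware: stash a value in environ, add a header on the way out."""

	def wrapper(environ, start_response):
		# Ingress: the "prepared value" gets stuffed into the shared environ dict.
		environ["example.request_id"] = "abc123"

		def patched_start_response(status, headers, exc_info=None):
			# Egress: interpose start_response to alter the outbound headers.
			headers = list(headers) + [("X-Request-Id", environ["example.request_id"])]
			return start_response(status, headers, exc_info)

		return app(environ, patched_start_response)

	return wrapper


def inner(environ, start_response):
	# The application has to know the magic key to "pick up" the value.
	body = environ["example.request_id"].encode("ascii")
	start_response("200 OK", [("Content-Type", "text/plain")])
	return [body]


captured = {}


def fake_start_response(status, headers, exc_info=None):
	captured["status"] = status
	captured["headers"] = headers


wrapped = request_id_middleware(inner)
chunks = list(wrapped({}, fake_start_response))
```

Note how much ceremony is needed just to contribute one value and one header, and that nothing but convention ties the key written by the middleware to the key read by the application.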

As a result, each framework has developed its own incompatible mechanisms for extending and utilizing otherwise standardized components. For example, Flask utilizes, internally, the Jinja2 template engine (which makes sense, given the authors of Flask wrote that, too) and exposes, for hosted application use, a flask.render_template function. Django provides the URL registry "helper" of TemplateView.as_view(template_name="…"), speaking its own Django Template Language (or Jinja2). Etc. Want to use your own? Go nuts, but the amount of support offered by your framework of choice may be minimal to none, and depending on the prevalence of front-end components such as "automated administration areas", you might never be able to remove the default, or secure it, if those shared templates rely on default behaviours. (Case example: Django. It has potentially highly dangerous default behaviours for missing variables and interpolation of None values; that is, they are treated as empty strings. Django Admin breaks badly if this is made more explicitly explody. Errors should not pass silently, according to the Zen of Python, and referencing an undefined symbol should absolutely be an error.)

WebCore is… not particularly different, on the surface. The difference lies in the weight of opinion and choices of default. WebCore has none for most components you might think of. Bring your own template engine, database layer, really think if "arbitrary unbounded key-value storage" is reasonable for "sessions"—it's probably not—etc.

Note: Links and code references to WebCore code in this document will refer to the 2.0.3 release version for the sake of consistency. Don't want the bookmarked and highlighted line numbers getting off when new versions are released!

A note on terminology: MVC, MVP, MVT, oh my!

For some time there has been a craze, the drive towards Model-View-Controller ("MVC") application design and separation above all else. Even if it makes no damn sense. Some frameworks have pushed back against this trend, such as Django claiming to be a "model, view, template" framework. (Where endpoints are the view.) WebCore, as may have been noticeable already, refers to the executable code invoked in response to an incoming request as the "endpoint".

Let's break down the terms:

  • Model

    A lot of developers, and especially as a trap for new players, think of a model only as the data model, a description of the attributes relevant to a given resource. In actuality, it's everything related to that data, including most forms of manipulation, integrity, and validation. It's the business rules about your data, and the shape of your data. Not just that it has a state that is a string with one of several possible values, but the code describing exactly what happens as a result of that value changing in specific ways. A classic example is that of "approving" an invoice; you set its state to approved. The act of doing so ought to do (or enqueue for later doing) whatever additional work is needed as a result of that change.

    As such, regardless of what you end up calling them, the endpoints are primarily there as a bridge between the external (typically HTTP) interface and that model. They translate the restricted form of incoming data (it's all just Unicode text) to the more nuanced back-end API specification, then present the result of that work back to the user agent in some form, even if it's just to say "It's all OK, boss."

  • View

    There seems to be some competing thought about this one. Many developers think of a view and immediately their brain slides over to the template used to generate a view. More generally, a view is a representation of a resource. This view may be user-centric, such as an HTML template with accompanying CSS and JS for front-end use, but representations aren't restricted to that. JSON is a representation, the code transforming a resource (object) into that form… the view.

    This is partially why Django just tossed the term "controller" out the door entirely; their endpoints really are views, applying changes to the response directly, or indirectly through invocation of template support functions.

  • Controller

    Seen far more often in "native app" development and front-end, true controllers are objects whose existence is persistent. They continue to exist so that they may handle (with reasonable speed) events sent from the view. Classical example being "clicking a button".

    Client-side frameworks can legitimately claim use of controllers; server-side frameworks can do so only in very specific circumstances, such as many WebSocket integrations where the handler persists much longer.

  • Presenter

    On the web, browsers are a bit smarter than the average puppy. They know what to do, and have predefined behaviour for clicking on many things, thus there is no need whatsoever for a discrete controller to persist for every page (one is already provided by the user agent).

    This also helps avoid the delays involved in networked round-trips; if every movement of the mouse required confirmation server-side before your cursor visually updated, and your browser only knew what you clicked on after the server got back to it to resolve the surface under your cursor, your pointing device would be rendered absolutely, frustratingly useless. (Try X11 over a 3G or 2G connection some time for a keen demonstration of this!) It also reduces server-side overhead, as connections (and the objects involved in resolving requests) can be ephemeral and short-lived: accept, service, respond, die. This makes the web economical!

    Back-end HTTP endpoints are Meeseeks, to whom existence is pain, wanting nothing more than to complete their given, focused task, then cease being.

    This makes them presenters by definition.

  • Endpoint

    An endpoint is what WebCore calls the object (typically a callable function or method, but static values are allowed, too) resolved by dispatch as the handler for the given request. It is the code that is passed input from the request, such as query string and form body as gathered by extension collect callbacks, and returns a value to be applied to the response by a type-associated view. Such views may then additionally utilize Accept header negotiation to determine specific serialization.

    It is generally best practice to ensure your endpoints are as isolated from the whole concept of HTTP as possible. This permits easier isolated unit testing, as well as permits use as a "native Python API", possibly exposed via other mechanisms, such as via command-line script.
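To illustrate the split between model and endpoint described in the terms above, here is a rough sketch. The Invoice class, the INVOICES lookup table, and the approve_invoice function are all hypothetical stand-ins; nothing here is WebCore API, and that is rather the point.

```python
class Invoice:
	"""Hypothetical model: the state plus the rules about changing it."""

	def __init__(self, number):
		self.number = number
		self.state = "draft"
		self.queued = []  # Stand-in for a real task queue.

	def approve(self):
		# The business rule lives with the data, not in the endpoint.
		if self.state != "draft":
			raise ValueError("Only draft invoices may be approved.")
		self.state = "approved"
		self.queued.append(("notify_accounting", self.number))


INVOICES = {"1001": Invoice("1001")}  # Stand-in for a database.


def approve_invoice(number):
	"""Endpoint: translate text input, call the model, return a plain value."""
	invoice = INVOICES[str(number)]
	invoice.approve()
	return {"number": invoice.number, "state": invoice.state}


result = approve_invoice("1001")
```

Because approve_invoice never touches a request or response object, it can be called from a unit test, a command-line script, or a REPL exactly as it would be called on behalf of an HTTP request.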

There are a few additional terms we make use of that will be important to understand:

  • Ingress The "inbound" side of the request/response cycle. Data coming in from the user agent.

  • Egress The "outbound" side of the request/response cycle. Data going out to the user agent.

Contexts

There are two primary scopes, or "contexts" currently utilized:

  • Application Context
    The global state of the application, outside of any given client request. Essentially, this is the information (and APIs) that are stable between requests, populated once by WebCore extension objects on application startup.

    It is not common to access this from application code, as everything represented within will be exposed (or, at least, accessible) within the request context. Extension authors, however, will be provided this instance on application-level callbacks such as startup and shutdown. Note that the RequestContext attribute of the Application instance is the promoted-to-class version of this cooperatively populated mapping.

    A key feature to this is the ability for extensions to define, on startup, descriptor attributes that can be utilized during the request. Essentially, "lazy attributes", saving the need to eagerly populate those attributes on every request, regardless of use. Without the capability of "promotion to a class", descriptors would be entirely non-functional.

  • Request Context
    The request-specific state of the application, incorporating the WSGI environment, and otherwise populated by WebCore extension objects that "contribute" to this shared namespace. On each request, a new instance of the pre-built RequestContext object (representing the prepared application context) is instantiated, then registered extension callbacks invoked to populate.

    Attributes not overridden by an extension during the prepare phase that were present in the ApplicationContext will remain accessible, by virtue of class-based inheritance.
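The relationship between the two contexts can be sketched in a few lines of plain Python. This is an illustration of the idea only, assuming a hypothetical Context class with a promote method and a hypothetical lazy descriptor; the real WebCore implementation differs in its details.

```python
class Context:
	"""Hypothetical attribute-access namespace that can be promoted to a class."""

	def __init__(self, **values):
		self.__dict__.update(values)

	def promote(self, name):
		# Build a subclass whose class attributes are this instance's attributes.
		return type(name, (type(self),), dict(self.__dict__))


class lazy:
	"""Non-data descriptor: compute on first access, then cache on the instance."""

	def __init__(self, func):
		self.func = func
		self.name = func.__name__

	def __get__(self, instance, owner=None):
		if instance is None:
			return self
		value = instance.__dict__[self.name] = self.func(instance)
		return value


calls = []


def session(context):
	calls.append(context.path)  # Record which request paid for the work.
	return {"user": None}


# Application startup: extensions populate the application context once.
app_context = Context(debug=True, session=lazy(session))
RequestContext = app_context.promote("RequestContext")

# Per request: a fresh instance inheriting everything prepared at startup.
first = RequestContext(path="/a")
second = RequestContext(path="/b")
```

Reading first.session triggers the calculation once for that request and caches the result; second never pays for it unless it asks. On app_context itself, session is just the raw descriptor object sitting in an instance dictionary, since Python only honours descriptors found on a class.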

Other contexts are possible, including contexts completely divorced from the interface that is the web. Examples include:

  • Deferred Context
    A context of execution divorced from the initial request that triggered its use. Used for background task execution, possibly even in an entirely different thread or process (or server) than processed the originating request.

  • Command-Line Context
    Used when processing "requests" triggered through command-line invocation.

  • Interactive Shell Context
    When invoking an interactive REPL shell, the "active request" occurring during that interactive session is a shell context, noting that this, like the command-line context and possibly like many deferred contexts, is entirely divorced from HTTP communication.

  • Debugger Context
    In the event of an interactive debugging session caused by an uncaught exception, or by explicit request for a web-based REPL, a "debugger context" may be constructed.

Python Web App Basics: WSGI Application

At the lowest level of communication exists the functional invocation interface of WSGI. Somehow, some form of web server has accepted and parsed a request, and is prepared to ask your application for the appropriate response. To do so, it invokes your "WSGI application", a callable (e.g. function or "callable object", one having a __call__ method), passing two arguments: the environment dictionary and a callback to start the response. Your callable eventually invokes that second argument, passing the status line and list of headers to transmit back to the client, then (to keep things simple) returns the response body iterable. The most basic example:

def hello(environ, start_response):
	# Under Python 3 (PEP 3333) the status and headers are native strings; only the body is bytes.
	start_response("200 OK", [("Content-Type", "text/plain")])
	yield b"Hello world!"

This is a generator function, whose invocation results in an iterable object. You could also straight return an iterable, such as a list of the body chunks. I generally prefer to not whole-buffer responses before delivering anything at all to the client. It improves apparent responsiveness, and, in some cases, can mean the difference between the client getting a response, or not getting one at all, if timeout limits are reached during generation of that concrete list.
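For comparison, a sketch of the "straight return an iterable" form, together with a tiny harness that drives a WSGI application by hand the way a server would. The names hello_list and call are made up for this example; setup_testing_defaults comes from the standard library's wsgiref.util module.

```python
from wsgiref.util import setup_testing_defaults


def hello_list(environ, start_response):
	start_response("200 OK", [("Content-Type", "text/plain")])
	return [b"Hello ", b"world!"]  # The whole body exists before anything is sent.


def call(app):
	"""Drive a WSGI application by hand, the way a server would."""
	environ = {}
	setup_testing_defaults(environ)  # Fill in the keys the specification requires.
	seen = {}

	def start_response(status, headers, exc_info=None):
		seen["status"] = status
		seen["headers"] = headers

	body = b"".join(app(environ, start_response))
	return seen["status"], seen["headers"], body


status, headers, body = call(hello_list)
```

The same harness works unchanged with a generator function, since all it does is iterate whatever the application hands back.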

The WebCore Application Object

Like other frameworks, WebCore features a specific and singular "entry point" for configuration and request processing: the Application class, representing a WSGI application. Your WSGI application. In your __main__.py you might have:

from web.core import Application


app = Application("Hi there.")


if __name__ == '__main__':
	app.serve('wsgiref', host='127.0.0.1', port=8080)

Now, you may notice there's not much in the way of "code", here. This is almost literally the smallest possible definition: a single line if you really try and don't mind it being ugly and non-conditional. (This form works better for hosting under environments like uWSGI, which need access to that application instance.) On each request, WebCore sees the string literal "Hi there." and just hands it back, as if you had a functional endpoint which always returned that value when invoked, instead.

Let's explore exactly how that happens; welcome to your tour.

Request Processing Lifecycle

Formulating a Request

The user clicks a link or enters an address targeting one of your application's publicly accessible URLs. Their web browser ("user agent") constructs a request comprised, primarily, of the HTTP request line and collection of request headers, such as cookies, a version string for the user agent, the allowable return types, acceptable natural languages, and possibly "cache headers" if the resource has been requested in the past, etc.

I'm being careful to not always refer to the user agent as a "browser", as many user agents are not graphical browsers. Spiders, some screen readers, and application API clients may speak HTTP to your application but might not have a visual user interface of any kind.

An example HTTP request might be as simple as: (literally everything other than this is optional)

GET / HTTP/1.1
Host: example.com

The "front-end load balancer" (FELB) that is the initial responder to user requests for your application takes this request, parses the HTTP method, path, and version, parses the remaining headers, then uses this information to dispatch (or route) the request internally. For example, the Host header will likely be used to look up the relevant virtual server declaration. Eventually this will get directed at a "reverse proxy" (speaking HTTP to your application process), FastCGI proxy (if speaking FastCGI), uWSGI, or other.
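As a rough sketch of that first parsing step, assuming a well-formed request and a made-up VIRTUAL_HOSTS table standing in for real virtual server configuration (a production front-end does far more validation than this):

```python
RAW = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"

VIRTUAL_HOSTS = {"example.com": "127.0.0.1:8080"}  # Hypothetical backend table.


def parse_request(raw):
	"""Split a raw HTTP/1.x request head into its request line and headers."""
	head, _, _body = raw.partition(b"\r\n\r\n")
	request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
	method, path, version = request_line.split(" ", 2)

	headers = {}
	for line in header_lines:
		name, _, value = line.partition(":")
		headers[name.strip().lower()] = value.strip()  # Header names are case-insensitive.

	return method, path, version, headers


method, path, version, headers = parse_request(RAW)
backend = VIRTUAL_HOSTS[headers["host"]]  # Route on the Host header.
```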

Invoking in Response

Ultimately, the parsed HTTP headers and some aspects of the initial HTTP request line are placed in a dictionary, and normalized to conform to the WSGI specification for key naming. When this task is completed (by a Python HTTP server, Python FastCGI host, uWSGI, etc.) the application object is invoked (called) passing this dictionary and a callback function to start the response.
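The key naming normalization is simple enough to sketch. The build_environ helper below is hypothetical and deliberately incomplete: a real server also supplies SCRIPT_NAME, SERVER_NAME, SERVER_PORT, and the various wsgi.* keys the specification requires.

```python
def build_environ(method, target, version, headers):
	"""Translate a parsed request into (part of) a WSGI environment dictionary."""
	path, _, query = target.partition("?")

	environ = {
		"REQUEST_METHOD": method,
		"PATH_INFO": path,
		"QUERY_STRING": query,
		"SERVER_PROTOCOL": version,
	}

	for name, value in headers.items():
		key = name.upper().replace("-", "_")
		# Content-Type and Content-Length are the two headers stored without a prefix.
		if key not in ("CONTENT_TYPE", "CONTENT_LENGTH"):
			key = "HTTP_" + key
		environ[key] = value

	return environ


environ = build_environ(
	"GET", "/?q=1", "HTTP/1.1",
	{"Host": "example.com", "Content-Type": "text/plain"},
)
```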

In WebCore, the Application class is this WSGI application. Within your own code, as can be seen in the demonstration above, you instantiate Application and pass it your application root; that is, the entry point into the "endpoint hierarchy" for your application, what many other frameworks call your controllers or controller tree. In back-end Python frameworks "controllers" are better called presenters (it's MVP, not MVC!) as the instance itself does not persist to process events from the view it… presents to the requesting client.

Ignoring most of the configuration processes contained within the class for now, the actual WSGI application callable is the application method, equivalent to our trivial single-function hello example above. During initialization, this, or rather, the outermost WSGI middleware layer (or the application method itself, if no middleware is utilized), becomes the __call__ method. This makes the instance itself invokable as if it were a function. The process implementing this is surprisingly simple, starting on line 94:

# Handle WSGI middleware wrapping by extensions and point our __call__ at the result.
app = self.application
for ext in exts.signal.middleware: app = ext(context, app)
self.__call__ = app

We track the "current" WSGI application callable, starting with our application method, then iterate the configured extension objects which declare middleware callbacks, calling each one with the current callable and updating that reference with the value returned as we go. Extensions may choose not to wrap or apply middleware (i.e. they can return the original, unmodified callable) based on their own configuration or environment of execution, or may do anything any other Python WSGI middleware layer can do, such as ignoring the provided inner WSGI application object and using something else entirely during the request.
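A stand-alone sketch of that wrapping loop, outside of the Application class, might look like the following. The two extension classes and the wrap helper are hypothetical; the only thing borrowed from the snippet above is the calling convention of passing the context and the current callable.

```python
def application(environ, start_response):
	start_response("200 OK", [("Content-Type", "text/plain")])
	return [b"inner"]


class UppercaseExtension:
	"""Hypothetical extension that wraps the application."""

	def __call__(self, context, app):
		def middleware(environ, start_response):
			return [chunk.upper() for chunk in app(environ, start_response)]

		return middleware


class DisabledExtension:
	"""Hypothetical extension that declines to wrap anything."""

	def __call__(self, context, app):
		return app  # Hand back the callable we were given, untouched.


def wrap(context, app, extensions):
	# Each extension receives the current callable and returns its replacement.
	for ext in extensions:
		app = ext(context, app)
	return app


handler = wrap(None, application, [DisabledExtension(), UppercaseExtension()])
body = handler({}, lambda status, headers, exc_info=None: None)
```

The last extension to wrap ends up outermost, so it sees the request first and the response last.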

The primary component of WebCore that utilizes the capabilities and functionality of middleware is something that is exceptionally difficult to implement any other way: the interactive debugger and uncaught exception handler. WSGI applications may explode for all sorts of reasons—including faulty framework code!—and we definitely want to be able to catch and diagnose those errors. If it were up to the core framework to handle this, then only application-generated exceptions would likely be covered, not ones internal to the framework itself, unless very carefully built. Ultimately, doing such ourselves would have added greatly to the size and complexity of the codebase.

For the rest, WebCore intentionally avoids the over-broad brush of middleware for providing smaller, less invasive components. This has had a dramatic impact on the modularity and testability of suddenly and surprisingly independent components. We will explore some of these aspects as we continue processing our hypothetical request through the code. Most applications will have the debug middleware called by the web server or bridge, which will in turn call the application method of the Application object, wrapped in a "safety net" (post-mortem) debugger.
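To show why a safety net is naturally a middleware concern, here is a deliberately tiny sketch of a catch-all wrapper. It is not the debugger WebCore uses; safety_net, broken, and record are names invented for this example, and a real debugger would offer far more than a plain-text traceback.

```python
import sys
import traceback


def safety_net(app):
	"""Hypothetical catch-all: turn any uncaught exception into a 500 response."""

	def wrapper(environ, start_response):
		try:
			# Consume the body here so errors raised mid-iteration are caught, too.
			return list(app(environ, start_response))
		except Exception:
			report = traceback.format_exc().encode("utf-8")
			# Passing exc_info permits replacing headers the application already set.
			start_response(
				"500 Internal Server Error",
				[("Content-Type", "text/plain; charset=utf-8")],
				sys.exc_info(),
			)
			return [report]

	return wrapper


def broken(environ, start_response):
	start_response("200 OK", [("Content-Type", "text/plain")])
	yield b"partial"
	raise RuntimeError("boom")


statuses = []


def record(status, headers, exc_info=None):
	statuses.append(status)


body = safety_net(broken)({}, record)
```

Because the wrapper sits outside everything else, it catches failures from the framework as well as from application code. The trade-off in this sketch is that it buffers the whole body, giving up streaming in exchange for being able to report errors raised part-way through.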

1. Preparation

Starting on line 214 the application method immediately starts off by preparing the "request context", WebCore API's representation for the state of processing during a request. WebCore does not utilize thread locals, sometimes known as "superglobals", objects you can import from framework modules which magically become the "current" version of themselves during a request. We don't use this mechanism any more, as the difficulties involved in use outside of the context of a request (e.g. concurrent.futures), and when you have multiple applications you wish to host within one process, should be easy to imagine.

We used to provide web.core.request, web.core.response, etc. Today, all of these per-request variables are tracked within a single mapping object, the RequestContext instance for the current request. An optional extension is provided to continue to provide these "superglobals", if really needed in your own application code. No shame if it's a real need! We just prefer to avoid it to keep things simpler and more directly obvious.
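If the difficulties are not so easy to imagine, here is a small demonstration using only the standard library. The local object stands in for a thread-local "superglobal", and the plain dictionary stands in for an explicitly passed context; neither is WebCore code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

local = threading.local()  # Stand-in for a "superglobal" request object.


def read_superglobal():
	# Looks up whatever the *current thread* has stored, if anything.
	return getattr(local, "request", None)


def read_explicit(context):
	# Works anywhere, because the state travels with the call.
	return context["request"]


local.request = "GET /"  # Set in the thread handling the request.
context = {"request": "GET /"}

with ThreadPoolExecutor(max_workers=1) as pool:
	from_local = pool.submit(read_superglobal).result()
	from_context = pool.submit(read_explicit, context).result()
```

The worker thread has its own, empty thread-local storage, so the "current request" silently vanishes there, while the explicitly passed context arrives intact.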

Where things get a little strange or hard-to-follow is that Context objects are not normal mappings, like dictionaries, but are attribute access dictionaries that extensively exploit Python's class-based inheritance and method resolution order (MRO)—providing the core ability for an instance to be promoted to a new class, for later instantiation and re-use. The purpose of promotion to a class is not just to exploit inheritance, but also to make use of Python's expressive (data) descriptor protocol for attributes contributed by extensions. (Simple example: "lazily loaded" or otherwise "calculated" values.)

The more Python you learn, the more you realize the object model is pretty much dictionaries all the way down. At least, until you hit the lowest level C-backed objects/types or ones that declare __slots__—essentially "packed instances"—a boost to efficiency of storage by eliminating the dynamism of dictionary backing.
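A quick way to see the difference for yourself, using two throwaway classes invented for the purpose:

```python
class Plain:
	def __init__(self):
		self.x = 1


class Packed:
	__slots__ = ("x",)  # Fixed storage; no per-instance dictionary.

	def __init__(self):
		self.x = 1


plain = Plain()
packed = Packed()

plain.y = 2  # Lands in the per-instance dictionary.

try:
	packed.y = 2  # There is nowhere to put an undeclared attribute.
except AttributeError:
	rejected = True
else:
	rejected = False
```

Here vars(plain) is an ordinary dictionary holding both attributes, while the packed instance has no __dict__ at all and refuses attributes it did not declare.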

2. Dispatch

After all extensions with a prepare callback have been executed, in the appropriate order based on their declared dependencies, the bulk of the operation can begin.

https://github.com/marrow/WebCore/blob/2.0.3/web/core/application.py?ts=4#L220
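The dependency-based ordering of prepare callbacks mentioned above can be approximated with the standard library's graphlib module (Python 3.9 and later). The Extension class here, including its provides and needs attributes and the ordered helper, is a hypothetical simplification for illustration and should not be read as WebCore's actual extension API or sorting code.

```python
from graphlib import TopologicalSorter  # Python 3.9+


class Extension:
	"""Hypothetical extension declaring what it offers and what it requires."""

	def __init__(self, name, provides=(), needs=()):
		self.name = name
		self.provides = set(provides)
		self.needs = set(needs)

	def prepare(self, context):
		context.setdefault("order", []).append(self.name)


def ordered(extensions):
	# Map each feature tag to the extension that provides it.
	providers = {tag: ext for ext in extensions for tag in ext.provides}
	# Each extension depends on the providers of the tags it needs.
	graph = {ext: {providers[tag] for tag in ext.needs} for ext in extensions}
	return list(TopologicalSorter(graph).static_order())


extensions = [
	Extension("session", provides={"session"}, needs={"database"}),
	Extension("database", provides={"database"}),
]

context = {}
for ext in ordered(extensions):
	ext.prepare(context)
```

Regardless of the order in which the extensions were listed, the database extension prepares the context before the session extension that relies on it.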

3. Mutation (Argument Collection)

https://github.com/marrow/WebCore/blob/2.0.3/web/core/application.py?ts=4#L177

4. Invocation

https://github.com/marrow/WebCore/blob/2.0.3/web/core/application.py?ts=4#L193

5. Result Transformation

https://github.com/marrow/WebCore/blob/2.0.3/web/core/application.py?ts=4#L199

6. View Application

https://github.com/marrow/WebCore/blob/2.0.3/web/core/application.py?ts=4#L244

7. Request/Response Cycle Cleanup

https://github.com/marrow/WebCore/blob/2.0.3/web/core/application.py?ts=4#L257

8. Response Streaming

https://github.com/marrow/WebCore/blob/2.0.3/web/core/application.py?ts=4#L267

9. Post-Response Activity

https://github.com/marrow/WebCore/blob/2.0.3/web/core/application.py?ts=4#L263