Traps for New Players

Alice Zoë Bevan–McGregor edited this page Jul 3, 2020 · 28 revisions

These are issues that have arisen and been discussed, collected together and polished up for collective benefit. Almost a "frequently asked questions", but more problem-driven.

Contents

  1. The data coming in appears missing or mangled. TL;DR

  2. The extension system is difficult to follow and understand, making input processing difficult.

  3. View logic appears hard-coded within WebCore and was confusing to understand and extend.


The data coming in appears missing or mangled.

A common pattern when processing large amounts of data is to "glob" all keyword arguments into a dictionary for processing within the endpoint. This can be a useful shortcut when describing an entire form structure using an argument list specification is tedious or excessive, when you truly accept an unrestricted potential set of key-value pairs, or for when you are utilizing a third-party tool for form processing and just need "all of the submitted data at once" to process.

The pattern in its simplest functional form appears as:

def endpoint(**data):
	...  # Do something with `data`.

This will collect all (named) keyword arguments into a dictionary; see the example function declarations from the tutorial for demonstrations of use.
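
As a minimal, framework-free sketch of what the glob produces (the argument names and values here are invented purely for illustration), calling such a function directly shows that `data` is an ordinary dictionary:

```python
def endpoint(**data):
	# `data` is a plain dict holding every named argument that was passed in.
	return data


collected = endpoint(name="Alice", age="27")
print(type(collected).__name__, collected)
```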

Dictionary Interactions

The common trap comes from a frequent misunderstanding as to how to interact with dictionaries (and mappings more generally) in Python, especially for developers coming from languages such as JavaScript where there is essentially no differentiation between an object (in relation to method of access) and a dictionary or mapping. Use of obj as a temporary variable name, actually containing a dictionary, is a common warning sign. This can lead to an attempt to utilize object introspection and manipulation tools against a vastly different fundamental type:

def endpoint(**data):
	for k in dir(data):
		getattr(data, k)

Despite not really doing anything, this demonstrates the signature of the problem: use of dir() to enumerate the attributes of an object, and use of getattr() to retrieve the value of an attribute by name. Fundamental types (scalars and containers) in Python are not dual-natured, so unless you are explicitly using a dictionary-like object advertised as an "attribute access dictionary", data elements are not attributes.
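
To make the mismatch concrete, here is a small sketch using a hand-built dictionary (the keys are hypothetical stand-ins for submitted form fields). It shows that dir() reports the methods of the dict type rather than the submitted keys:

```python
data = {"name": "Alice", "age": "27"}

# dir() lists attributes of the dictionary object (its methods), not its keys.
attributes = dir(data)
print("name" in attributes)  # The submitted key is not an attribute.
print("keys" in attributes)  # The .keys() method is.

# getattr() performs attribute lookup too, so the submitted key is not found.
print(getattr(data, "name", "<missing>"))

# Index-based access is how elements are actually retrieved.
print(data["name"])
```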

Deeper into Python Datatypes

The constant 27 is the number twenty-seven to humans, an instance of an integer containing the value twenty-seven, and a label used to retrieve the integer value twenty-seven. Many scalar types (integers, floats, and so forth) are understandably "read-only", in that you can't redefine the value of two. Strings are also read-only structures; they are not mutable arrays of characters as in C. Lists are technically packed arrays of object references, over-allocated to leave room for growth (in CPython), and dictionaries are, of course, hash maps.

"Objects" in Python are, in general, built on top of dictionaries. ("Turtles all the way down.") Most user-defined classes and instances will have a __dict__ attribute that is a dictionary mapping of values defined at that scope. The core types do not, as they are C-backed and not dynamically allocated. (For further reading, look into the difference between class __init__ and __new__, and __slots__.) As a result, a number of standardized protocols have arisen to implement higher-level functionality such as the syntax of dictionary and list indexing/dereferencing (__getitem__) and generalized iteration (__iter__).
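
The protocols mentioned above can be sketched with a small class. `Form` is a hypothetical example class, not part of WebCore; it only illustrates how `__getitem__` and `__iter__` hook into indexing and iteration, and that instance attributes live in `__dict__`:

```python
class Form:
	"""A hypothetical container implementing the indexing and iteration protocols."""

	def __init__(self, **fields):
		self.fields = fields

	def __getitem__(self, name):
		# Invoked by the `form[name]` syntax.
		return self.fields[name]

	def __iter__(self):
		# Invoked by `for name in form`, `list(form)`, and similar.
		return iter(self.fields)


form = Form(name="Alice", age="27")

print(form.__dict__)  # Instance attributes are stored in a plain dictionary.
print(form["name"])  # Indexing goes through __getitem__.
print(list(form))  # Iteration goes through __iter__.
print(hasattr(27, "__dict__"))  # Built-in integers carry no instance __dict__.
```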

This is all a very long-winded way of approaching, while explaining the how and the why of it, the simple idea that just as you can "iterate a list" to iterate the values contained within that container type, you can "iterate a dictionary" to iterate the keys contained within that dictionary.

Dictionary Iteration TL;DR

Python dictionaries are not JavaScript objects, which permit simultaneous attribute-based and index-dereferenced access. Do not iterate attributes via dir(); iterate the dictionary itself. The distinction is similar to the difference between for…in and for…of in JavaScript.

def endpoint(**data):
	for key in data:
		data[key]

Additionally, there is a .keys() method returning an iterable view upon those keys, but simple iteration is all you need the vast majority of the time. See also .values() and .items() for iterating the values themselves, or the combined set as a series of 2-tuples:

def endpoint(**data):
	for key, value in data.items():
		...  # Do something with the key and value.

As a side-effect of being iterable, you may pass a dictionary to the constructor of other iterable container types, such as list(data), tuple(data), or set(data), in order to get a list, tuple, or set of the keys, as appropriate, or pass it to any function or method expecting an iterable.
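
A brief sketch of the above, again using an invented dictionary in place of real submitted data:

```python
data = {"name": "Alice", "age": "27"}

# Container constructors consume the dictionary's keys.
keys_list = list(data)
keys_tuple = tuple(data)
keys_set = set(data)

# Any function expecting an iterable also receives the keys.
longest = max(data, key=len)
joined = ", ".join(sorted(data))

print(keys_list, keys_tuple, keys_set, longest, joined)
```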

The extension system is difficult to follow and understand, making input processing difficult.

The logic for validating and preprocessing inputs is spread across multiple extensions.

Yes, this is by design; however, they are all organized into a single module. Not every application requires that the context be added to pure functional endpoints, especially if you are using the thread-local superglobal to access it instead. Ricers may wish to disable the ValidateArgumentsExtension, since the preparatory getcallargs invocation is expensive. If you don't want unprocessed path elements used as priority positional arguments, don't include RemainderArgsExtension.

Some extensions are loaded by default and must be overridden to be disabled.

The sum total of the extension configuration during startup:

# We really need this to be there.
if 'extensions' not in config: config['extensions'] = list()

if not any(isinstance(ext, BaseExtension) for ext in config['extensions']):
	# Always make sure the BaseExtension is present since request/response objects are handy.
	config['extensions'].insert(0, BaseExtension())

if not any(isinstance(ext, arguments.ArgumentExtension) for ext in config['extensions']):
	# Prepare a default set of argument mutators.
	config['extensions'].extend([
			arguments.ValidateArgumentsExtension(),
			arguments.ContextArgsExtension(),
			arguments.RemainderArgsExtension(),
			arguments.QueryStringArgsExtension(),
			arguments.FormEncodedKwargsExtension(),
			arguments.JSONKwargsExtension(),
		])

config['extensions'].append(self)  # Allow the application object itself to register callbacks.

Broken down, if there is no BaseExtension already configured and included in the set of extensions passed in, instantiate one with default configuration:

if not any(isinstance(ext, BaseExtension) for ext in config['extensions']):
	# Always make sure the BaseExtension is present since request/response objects are handy.
	config['extensions'].insert(0, BaseExtension())

Then, check for the existence of any ArgumentExtension subclass within the already configured set of extensions; there will be none except those explicitly configured by the application utilizing the framework:

if not any(isinstance(ext, arguments.ArgumentExtension) for ext in config['extensions']):

Only if none are already configured will the default set be loaded, with default configurations. If you override any input data processing behavior, WebCore steps back and trusts your take on it. In the upcoming WebCore 3 next branch, things get more… lazy? The application object declares "needs" flags, and WebCore works out the extension graph needed to fulfill the declared requirements.
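
The effect of that conditional can be sketched without WebCore installed. Every class below is an empty, hypothetical stand-in that borrows the names used above (the real classes live in WebCore, and the default list is abbreviated), and `prepare` is an invented helper mirroring only the two checks:

```python
class BaseExtension: pass
class ArgumentExtension: pass
class ValidateArgumentsExtension(ArgumentExtension): pass
class JSONKwargsExtension(ArgumentExtension): pass
class MyFormExtension(ArgumentExtension): pass  # Hypothetical application-supplied extension.


def prepare(extensions):
	"""Apply the two default-loading checks to a list of configured extensions."""
	extensions = list(extensions)

	if not any(isinstance(ext, BaseExtension) for ext in extensions):
		extensions.insert(0, BaseExtension())

	if not any(isinstance(ext, ArgumentExtension) for ext in extensions):
		# Abbreviated default set, for illustration only.
		extensions.extend([ValidateArgumentsExtension(), JSONKwargsExtension()])

	return [type(ext).__name__ for ext in extensions]


defaults = prepare([])
custom = prepare([MyFormExtension()])
print(defaults)
print(custom)
```

Configuring a single ArgumentExtension subclass is enough to suppress the entire default set, which is why the sketch's `custom` result contains no validation or JSON handling.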

For example the ArgumentExtension must be overridden…

If you simply wish to contribute additional datasources, your extension does not need to subclass ArgumentExtension and trigger that conditional. You can cohabit without exclusion trivially. This can be accomplished without even writing a dedicated, discrete "extension". Instead, you can subclass Application and add any extension callback methods you wish there, which is an excellent choice for any truly application-specific and non-reusable processes.

Dependency execution ordering is also specifiable, but distributed across the ext files via dependency lists.

In WebCore 2 the needs, uses, and provides "feature flags" are only used for sorting (via Tarjan "strongly connected component" sorting), not automatic inclusion. WebCore 3 extends this: declaring a need or extension as the string name of an extension will enable that extension if it is not otherwise configured and it can be enabled.
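
As a rough illustration of ordering by feature flags, the sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+), which is a plain topological sort and not the Tarjan-based implementation WebCore itself uses. The extension names and flags are hypothetical, and only "needs" and "provides" are modelled:

```python
from graphlib import TopologicalSorter

# Hypothetical extensions: name -> (provides, needs).
extensions = {
	"base": ({"request", "response"}, set()),
	"session": ({"session"}, {"request"}),
	"auth": ({"auth"}, {"session"}),
}

# Map each feature flag to the extension that provides it.
providers = {
	flag: name
	for name, (provides, _) in extensions.items()
	for flag in provides
}

# Each extension depends on whichever extensions provide the flags it needs.
graph = {
	name: {providers[flag] for flag in needs}
	for name, (_, needs) in extensions.items()
}

order = list(TopologicalSorter(graph).static_order())
print(order)
```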

…it's difficult to understand what is going on and when.

I highly recommend PUDB or WDB step debugging. Being familiar with a step debugger is an empowering experience: it lets you REPL at any line of execution at will and literally walk through the code statement by statement as it processes, to see what's happening.

WDB additionally lets you do this remotely. I also use PUDB with Pytest for interactive debugging of test failures via pytest-pudb.

View logic and error page representations appear hard-coded within WebCore and were confusing to understand and extend.

The processing of context.view._map prevents overriding context.response serialization.
For example, HTTPException inherits webob.Response and is therefore not differentiable in the serialization handler selection.

This possibly sounds like an unusual requirement or XY problem?

Returning an HTTPException subclass is intended to operate identically to returning a "full" / "custom" Response instance, as that's what it is, a response. (It's literally a subclass.)

Returning from an endpoint is the non-exceptional way of making use of these tools for expected states. That is, return an HTTPNotFound instance if this state is expected; raise it if it is surprising or exceptional that this is the case, or possibly in situations where you need to escape multiple levels of stack. Escaping the stack this way is discouraged in favor of restructuring the code to make it unnecessary.

It helps to convey intent / surprise. Additionally, since these are used for non-normal responses (non-200), most of the time the response code will be caught by a front-end load balancer and internally directed at an appropriate "attractive" error page, rather than having the returned response used bare. Reference any 404 or 500 on Google services for examples.

Sample nginx.conf Directives
error_page 404 /404.html;
error_page 500 502 503 504 /50x.html;

# More complex and interesting arrangements are possible.
# Please reference the documentation, linked above.

Without a front-end load balancer (FELB), this type of exceptional status → page/secondary request mapping can be accomplished using WSGI middleware. A FELB is recommended, both for this and to optimize delivery of static assets.

Where it happens…

The view mapping is a "multi-dict". The framework looks up the list of possible views from the mapping of views keyed on the literal type (class) of the object returned by the endpoint. Essentially and almost literally:

yield from available_views[type(result)]

Except using .getall() to avoid dict-compatibility with single-value mappings. (We actually do want all possibilities for that key, not just the first.) This "short-circuits" because it doesn't require expensive iterative isinstance() calls to validate. type(), look up, done.

The fallback requires iterating all registered views and testing each with isinstance(result, candidate_view_type) to see if it might work, yielding only those that match. This covers the cases for "abstract" classes and such, where isinstance() is supported, but nothing will ever be an actual subclass or instance of the class, nor will anything ever have a type() of that abstract class.
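
The two-stage lookup described above might be sketched as follows. This is a simplified, hypothetical model using a dict of lists in place of the real multi-dict; `registry`, `register`, and `candidates` are invented names, and views are represented by plain strings:

```python
from collections import OrderedDict
from collections.abc import Mapping

registry = {}  # Maps a type to the list of views registered for it.


def register(kind, view):
	registry.setdefault(kind, []).append(view)


def candidates(result):
	"""Yield exact-type matches first, then any isinstance() matches."""
	kind = type(result)

	# Direct lookup: no isinstance() checks required.
	yield from registry.get(kind, [])

	# Fallback: test every other registered type, covering abstract classes and superclasses.
	for candidate_type, views in registry.items():
		if candidate_type is kind:
			continue  # Already yielded above.
		if isinstance(result, candidate_type):
			yield from views


register(Mapping, "mapping view")  # An abstract base class.
register(dict, "dict view")

exact = list(candidates({}))
fallback = list(candidates(OrderedDict()))
unmatched = list(candidates(42))
print(exact, fallback, unmatched)
```

Note that the exact match for a plain dict is offered first even though the abstract Mapping view was registered earlier, while an OrderedDict (with no view of its own) only reaches views through the isinstance() fallback.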

…difficulty understanding which handler will be called when.

The _map can be probed and examined; its operation should be quite straight-forward, even if this pattern seems unusual or new. It's based entirely on a) the type() of the value you return, if possible, and b) which view (keys) match the isinstance() test, to cover abstract cases, subclasses, etc.

Extending WebCore Views

Endpoints are the handlers for a given dispatched URI, usually to retrieve or manipulate a resource identified by that URI. They are generally intended to return (present) the resource they represent, or for more RPC-like APIs, the result of the RPC call. It is thus highly encouraged to either define standard serialization protocol methods on your resource classes ("models", such as __json__ or __html__) or to register custom views which can apply your model to the response.

You can create a full custom extension to contain these, however, for views and other extension functionality that are application-specific, a shortcut is provided. Instead of using web.core:Application and directly instantiating it, you can subclass it and attach extension callback methods to it directly, then use your customized subclass. For example:

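from web.core import Application
from web.core.context import Context  # Assumed import path; adjust to your WebCore version.

class FrobozzCoFribulator(Application):
	def start(self, context:Context) -> None:
		# Register the view at startup. Fribulator is your own model class, defined elsewhere.
		context.view.register(Fribulator, self.render_frib)
	
	def render_frib(self, context:Context, result:Fribulator) -> bool:
		# WebOb rejects a str assigned to `body`; use `text` for Unicode content.
		context.response.text = f"This is a {result.__class__.__name__}™ by FrobozzCo."
		return True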

Even if Fribulator is a subclass of another type with a registered view, registering the subclass specifically should utilize the "short circuit" for direct lookup, and not match the superclass' view.


"Trap for new players" stolen without hesitation from EEVblog, as a fairly frequently utilized term.