# Descriptors: secret weapon of frameworks

A talk presented at [Python Cerrado / Plone Conference 2024](https://2024.ploneconf.org/en/schedule/talks/descriptors-secret-weapon-of-frameworks)

## Motivating example

<img src="bulk_food_by_Dan_Bruell.jpg" alt="Bulk food bins, by Dan Bruell" width="600"> 

Bulk food bins, by Dan Bruell (CC BY-NC-SA 2.0)

### Class for an item in an order of bulk food

Imagine an app for a store that sells organic food in bulk,
where customers can order nuts, dried fruit, or cereals by weight.
In that system, each order would hold a sequence of line items,
and each line item could be represented by an instance of a
class like this:

In [1]:
class LineItem:

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight
        self.price = price

    def subtotal(self):
        return self.weight * self.price

That's nice and simple:

In [2]:
raisins = LineItem('Golden raisins', 10, 6.95)
raisins.subtotal()

69.5

But there's a problem:

In [3]:
raisins.weight = -20
raisins.subtotal()

-139.0

This is a toy example, but not as fanciful as you may think.
Here is a story from the early days of Amazon.com:

>  We found that customers could order a negative quantity of books! And we would
  credit their credit card with the price and, I assume, wait around for them to ship the
  books.<br>
  — Jeff Bezos, founder and CEO of Amazon.com, interviewed by Wall Street Journal in “Birth of a Salesman” (October 15, 2011)

### Type checkers won't save us

In [None]:
from dataclasses import dataclass

@dataclass
class LineItem:
    description: str
    weight: float
    price: float

    def subtotal(self):
        return self.weight * self.price

There is no way to specify that `weight` must be a `float` greater than zero.

Only [dependent types](https://en.wikipedia.org/wiki/Dependent_type) could fix this,
but that's a feature that exists only in obscure, academic functional languages like Agda and Idris.
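
To make the limitation concrete, here is a small sketch that repeats the dataclass above and builds an item with a negative weight. The value `-20.0` is a valid `float`, so the annotations are satisfied and nothing is checked at runtime either.

```python
from dataclasses import dataclass

@dataclass
class LineItem:
    description: str
    weight: float
    price: float

    def subtotal(self):
        return self.weight * self.price

# -20.0 is a perfectly good float, so neither a type checker
# nor the generated __init__ complains about it.
bad = LineItem('Golden raisins', -20.0, 6.95)
print(bad.subtotal())  # a negative subtotal
```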

### Why not getters and setters?

The traditional way of controlling attribute access in C++ or Java is to use _getters_ and _setters_: methods that get and set values for attributes that are declared private or protected in some way.

But that is verbose, and produces what the famous Pythonista Alex Martelli calls "goofy idioms" like this:

```python
item.set_quantity(item.get_quantity() + 1)
```

Using public attributes, properties, or descriptors, the ugliness of getters and setters is hidden from the users of your class, and they can accomplish the same result by writing this:

```python
item.quantity += 1
```

### Properties to the rescue

> The crucial importance of properties is that their existence makes it perfectly safe and
 indeed advisable for you to expose public data attributes as part of your class’s public
 interface. <br>
  — Martelli, Ravenscroft, and Holden, “Why properties are important”, _Python in a Nutshell, 2nd edition_


Properties enable the implementation of getters and setters without
changing the public interface of a class that previously
allowed reading and writing public attributes via `item.quantity` notation.

In [4]:
class LineItem:

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight  # (1)
        self.price = price

    def subtotal(self):
        return self.weight * self.price

    @property  # (2)
    def weight(self):  # (3)
        return self.__weight  # (4)

    @weight.setter  # (5)
    def weight(self, value):
        if value > 0:
            self.__weight = value  # (6)
        else:
            raise ValueError('weight must be > 0')  # (7)
            
    @property
    def price(self):
        return self.__price

    @price.setter  # (5)
    def price(self, value):
        if value > 0:
            self.__price = value
        else:
            raise ValueError('price must be > 0')

Here is a demonstration:

In [6]:
raisins = LineItem('Golden raisins', -10, 6.95)

ValueError: weight must be > 0
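
The validation is not limited to `__init__`. The sketch below is a trimmed version of the class above (only `weight` is a property, to keep it short). It shows that the public interface is unchanged, so `+=` still works, and that a bad assignment after construction is rejected too.

```python
class LineItem:

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight  # goes through the property setter below
        self.price = price

    @property
    def weight(self):
        return self.__weight

    @weight.setter
    def weight(self, value):
        if value > 0:
            self.__weight = value
        else:
            raise ValueError('weight must be > 0')

raisins = LineItem('Golden raisins', 10, 6.95)
raisins.weight += 1       # reads via the getter, writes via the setter
print(raisins.weight)

try:
    raisins.weight = -20  # rejected after construction as well
except ValueError as exc:
    error = str(exc)
    print(error)
```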

## Descriptors to the rescue

The previous example is fine, but there's a lot of repetition.
A descriptor class allows reuse of attribute validation logic.

By the way, `@property` itself is implemented as a descriptor class.
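
A quick way to check that claim is to look for the descriptor special methods on the `property` class. The `Circle` class below is a made-up example, used only to get hold of a property object.

```python
# A class is a descriptor if it defines __get__, __set__ or __delete__.
for method in ('__get__', '__set__', '__delete__'):
    print(method, hasattr(property, method))

class Circle:
    @property
    def area(self):
        return 3

# Looking in the class __dict__ bypasses the descriptor protocol
# and shows the property object itself.
prop = vars(Circle)['area']
print(type(prop))
# Calling __get__ by hand gives the same result as Circle().area
print(prop.__get__(Circle(), Circle))
```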

### `Quantity`: a validation descriptor

We'll implement a descriptor class named `Quantity` that enforces the rule that a numeric attribute must be greater than zero.

Here's how we want to write `LineItem` with `Quantity`:

```python
class LineItem:
    weight = Quantity()
    price = Quantity()

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight
        self.price = price

    def subtotal(self):
        return self.weight * self.price
```

### A descriptor class and a managed class

Before looking at the implementation of `Quantity`, consider just these lines of `LineItem`:

```python

class LineItem:
    weight = Quantity()
    price = Quantity()
```

What is happening here?

Both `weight` and `price` are attributes of the `LineItem` **class**.

Also, they are two separate instances of `Quantity`.
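
Both points can be verified with a stand-in `Quantity` class that has no behavior at all. It is a placeholder for this check, not the real implementation.

```python
class Quantity:
    """Stand-in with no behavior, just to inspect where instances live."""

class LineItem:
    weight = Quantity()
    price = Quantity()

class_attrs = vars(LineItem)
print(type(class_attrs['weight']).__name__)           # stored in the class
print(class_attrs['weight'] is class_attrs['price'])  # two separate instances

item = LineItem()
print(vars(item))  # nothing is stored in the instance yet
```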

Here is a UML class diagram for that:

<img src="flpy_2301.png" alt="UML class diagram showing relationship between `LineItem` and `Quantity`" width="800"> 


Metaprogramming may involve unusual relationships between classes and instances that may be hard to read in UML.

So, I invented the Mills & Gizmos Notation to enhance UML diagrams. Here is a sample of MGN:

<img src="flpy_2303.png" alt="Classes and instances drawn as mills and gizmos." width="800"> 


Here is that same UML diagram annotated with Mills & Gizmos:

<img src="flpy_2302.png" alt="UML class diagram showing relationship between `LineItem` and `Quantity` with mills and gizmos" width="800"> 


### Implementing `Quantity`, the simplest way

Descriptor classes are usually implemented in frameworks like Django or SQLAlchemy, not in application code.

For example, this could be part of a `validators.py` module in a framework:

In [7]:
# part of a validators.py module

class Quantity:  # (1)

    def __init__(self, storage_name):  # (2)
        self.storage_name = storage_name

    def __set__(self, instance, value):   # (3)
        if value > 0:
            instance.__dict__[self.storage_name] = value  # (4)
        else:
            msg = f'{self.storage_name} must be > 0'  # (5)
            raise ValueError(msg)

**(1)** A descriptor is a protocol-based feature: there's no need to subclass anything, just implement one of the relevant special methods.

**(2)** The `storage_name` is the name of the attribute in each managed class instance that will hold the value of the attribute controlled by the descriptor.

**(3)** Implementing `__set__` makes a class behave as a descriptor. `self` is the descriptor instance, `instance` is the managed instance, and `value` is the value being set.

**(4)** If we have a valid `value`, then we save it in the `instance.__dict__`. We can't use `setattr(instance, self.storage_name, value)` because that would trigger the descriptor `__set__` again, leading to uncontrolled recursion.

**(5)** We build a user-friendly message explaining how to fix the problem, and raise `ValueError`.
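
The recursion warning is easy to reproduce. `BrokenQuantity` below is a hypothetical variant written only for this demonstration. It differs from `Quantity` in a single line, using `setattr` instead of writing to `instance.__dict__`.

```python
class BrokenQuantity:  # hypothetical name, for illustration only

    def __init__(self, storage_name):
        self.storage_name = storage_name

    def __set__(self, instance, value):
        if value > 0:
            # setattr on the managed attribute calls this __set__ again
            setattr(instance, self.storage_name, value)
        else:
            raise ValueError(f'{self.storage_name} must be > 0')

class LineItem:
    weight = BrokenQuantity('weight')

item = LineItem()
try:
    item.weight = 10
except RecursionError:
    outcome = 'RecursionError'
else:
    outcome = 'stored'
print(outcome)
```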


Here is the application code that would use that:

In [8]:
# from validators import Quantity

class LineItem:
    weight = Quantity('weight')
    price = Quantity('price')

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight
        self.price = price

    def subtotal(self):
        return self.weight * self.price

In [10]:
granola = LineItem('Bacon granola', 10, 0)

ValueError: price must be > 0

Now consider this example. Can you spot the bug?

In [11]:
class Dog:
    weight = Quantity('weigth')

    def __init__(self, name, weight):
        self.name = name
        self.weight = weight


rex = Dog('Rex', 8.5)
rex.weight

<__main__.Quantity at 0x73cdb06eeb70>
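
The bug is the misspelled storage name, `'weigth'`. The sketch below repeats the code above and inspects the instance. The value ends up under the wrong key, so reading `rex.weight` finds nothing in the instance `__dict__` and falls back to the class attribute, which is the descriptor itself.

```python
class Quantity:  # same as the first version above

    def __init__(self, storage_name):
        self.storage_name = storage_name

    def __set__(self, instance, value):
        if value > 0:
            instance.__dict__[self.storage_name] = value
        else:
            raise ValueError(f'{self.storage_name} must be > 0')

class Dog:
    weight = Quantity('weigth')  # typo kept on purpose

    def __init__(self, name, weight):
        self.name = name
        self.weight = weight

rex = Dog('Rex', 8.5)
print(vars(rex))                         # value stored under the misspelled key
print(isinstance(rex.weight, Quantity))  # reading returns the descriptor
```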

### `Quantity` take #2: automatic storage_name

In [12]:
# part of a validators.py module

class Quantity:

    # (1)
    
    def __set_name__(self, owner, name):  # (2)
        self.storage_name = name          # (3)

    def __set__(self, instance, value):   # (4)
        if value > 0:
            instance.__dict__[self.storage_name] = value
        else:
            msg = f'{self.storage_name} must be > 0'
            raise ValueError(msg)

**(1)** We don't need the `__init__` anymore in this example.

**(2)** `__set_name__` is the newest special method of the descriptor protocol: `self` is the descriptor instance, `owner` is the managed class (where the descriptor is instantiated as a class attribute) and `name` is the name of the managed class attribute to which the descriptor is assigned, for example `"weight"` or `"price"` in the `LineItem` example.

**(3)** Here we set `self.storage_name`, which we previously did in `__init__`.

**(4)** The rest of the code is the same as before.

The `__set_name__` special method is called by the `type` metaclass, the factory of all Python classes, when it builds a class that has a descriptor assigned to a class attribute. It was added to the descriptor protocol in Python 3.6. Solving the problem of the `storage_name` used to be one of the main use cases for custom metaclasses. Now we don't need them for this reason.
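
The timing of that call can be observed by recording the arguments. In this sketch the `calls` list and the `discount` attribute are additions made for the demonstration. Note that assigning a descriptor to a class that already exists does not trigger `__set_name__`.

```python
calls = []

class Quantity:
    def __set_name__(self, owner, name):
        calls.append((owner.__name__, name))
        self.storage_name = name

class LineItem:
    weight = Quantity()
    price = Quantity()

print(calls)  # recorded by the time the class statement finished

# Assigning a descriptor after the class exists does not trigger the hook.
LineItem.discount = Quantity()
print(len(calls))
print(hasattr(vars(LineItem)['discount'], 'storage_name'))
```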

Uncomment to see a test that will generate a validation error due to a negative `weight`:

In [None]:
# raisins = LineItem('Golden raisins', -10, 6.95)

### `Quantity` is an overriding descriptor

Any descriptor that implements the `__set__` or `__delete__` methods is an overriding descriptor,
because although it is a class attribute, that descriptor will override
attempts to assign or delete the corresponding instance attribute.

The terminology about descriptors varied across the Python documentation and books over the years.
I adopted the terminology of _Python in a Nutshell_.

| current term | alternative terms | definition | examples |
| :- | :- | :- | :- |
| **overriding descriptor** | data descriptor, enforced descriptor | A descriptor with `__set__` or `__delete__` | validators |
| **non-overriding descriptor** | non-data descriptor, shadowable descriptor | A descriptor with `__get__` only | methods, caches |
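
The difference between the two kinds shows up when an instance attribute with the same name exists. The classes below are minimal, made-up examples, not part of any framework.

```python
class Overriding:
    def __get__(self, instance, owner):
        return 'from descriptor'

    def __set__(self, instance, value):
        print('Overriding.__set__ called with', value)

class NonOverriding:
    def __get__(self, instance, owner):
        return 'from descriptor'

class Managed:
    over = Overriding()
    non_over = NonOverriding()

obj = Managed()

obj.over = 1                     # handled by Overriding.__set__
obj.__dict__['over'] = 'sneaky'  # bypass the descriptor on purpose
print(obj.over)                  # the descriptor still wins on reads

obj.non_over = 2                 # a plain instance attribute is created
print(obj.non_over)              # the instance attribute shadows the descriptor
```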


**TIP**: Properties are also **overriding descriptors**:
if you don’t decorate a setter function,
the default `__set__` from the `property` class will
raise `AttributeError` to signal that the
attribute is read-only.
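
A short sketch of that behavior, using a made-up `Circle` class with a getter and no setter:

```python
class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):  # getter only: there is no @radius.setter
        return self._radius

c = Circle(2)
try:
    c.radius = 10
except AttributeError:
    outcome = 'read-only'
print(outcome, c.radius)
```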

## A non-overriding descriptor for caching

In [13]:
from time import sleep

class Cached:

    def __set_name__(self, owner, name):
        self.storage_name = name

    def __get__(self, instance, owner):
        if instance is None:  # accessed through the class, e.g. Question.answer
            return self
        value = instance.__dict__.get(self.storage_name, ...)
        if value is ...:
            sleep(5)  # pretend to do a lot of work
            value = 42
            # no __set__ here, so this creates a plain instance attribute
            setattr(instance, self.storage_name, value)
        return value

class Question:
    answer = Cached()

In [14]:
%%time
q = Question()
q.answer

CPU times: user 1.06 ms, sys: 2.24 ms, total: 3.31 ms
Wall time: 5 s


42

In [15]:
%%time
q.answer

CPU times: user 4 µs, sys: 1 µs, total: 5 µs
Wall time: 8.11 µs


42

`Cached` is a **non-overriding** descriptor.

The `__get__` method sets the attribute named `self.storage_name`, which is `"answer"` in this example.

When the attribute `answer` exists in the managed instance `__dict__`, its value is returned directly,
_without using the descriptor logic_.

That's why it's a **non-overriding** descriptor. It does not override the instance attribute with the same name.

The `@functools.cached_property` decorator from the standard library
produces a non-overriding descriptor.
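
Here is a sketch of the same caching idea with `cached_property`. The `computations` counter is an addition made only to show that the method body runs once.

```python
from functools import cached_property

class Question:
    computations = 0  # counts how many times the body runs

    @cached_property
    def answer(self):
        Question.computations += 1
        return 42

q = Question()
print(q.answer, q.answer)     # the body runs only on the first access
print(Question.computations)
print(vars(q))                # the cached value lives in the instance
print(hasattr(cached_property, '__set__'))  # no __set__: non-overriding
```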

## There's also a `__delete__` special method

If present, it makes an **overriding descriptor**. The `__delete__` method handles attempts to delete an attribute:

```python
del item.weight
```

`del` and `__delete__` are rarely used in practice, but if you need them, now you know they exist. 😉
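
As a minimal sketch, here is a descriptor that refuses deletion. The name `Protected` and the error message are invented for this example.

```python
class Protected:  # hypothetical descriptor, for illustration only

    def __set_name__(self, owner, name):
        self.storage_name = name

    def __set__(self, instance, value):
        instance.__dict__[self.storage_name] = value

    def __delete__(self, instance):
        # called by: del instance.<name>
        raise AttributeError(f'{self.storage_name} cannot be deleted')

class LineItem:
    weight = Protected()

item = LineItem()
item.weight = 10
try:
    del item.weight
except AttributeError as exc:
    error = str(exc)
    print(error)
print(item.weight)  # still there
```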

## Functions are descriptors

Every Python function has a `__get__` method.

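Recall the signature of `__get__`, as seen in the `Cached` descriptor: `__get__(self, instance, owner)`. For a function, `self` is the function object, `instance` is the object through which the function was accessed (or `None` when accessed through the class), and `owner` is the class.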



In [None]:
import collections

class Text(collections.UserString):
    def __repr__(self):
        return 'Text({!r})'.format(self.data)
    def reverse(self):
        return self[::-1]

word = Text('forward')
word

In [None]:
word.reverse()

In [None]:
Text.reverse(Text('backward'))

In [None]:
Text.reverse, word.reverse

In [None]:
type(Text.reverse), type(word.reverse)

In [None]:
list(map(Text.reverse, ['repaid', (10, 20, 30), Text('stressed')]))

In [None]:
word.reverse.__self__

In [None]:
word.reverse.__func__
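
The binding seen in those cells can be reproduced by calling `__get__` by hand. This is a sketch of roughly what attribute access does for a method, using a trimmed copy of `Text`.

```python
import collections

class Text(collections.UserString):
    def reverse(self):
        return self[::-1]

word = Text('forward')
func = vars(Text)['reverse']      # the plain function stored in the class

print(hasattr(func, '__get__'))   # functions are descriptors
bound = func.__get__(word, Text)  # roughly what word.reverse does
print(bound.__self__ is word)
print(bound.__func__ is func)
print(bound())                    # same result as word.reverse()

# Accessed through the class (instance is None), we get the function back.
print(func.__get__(None, Text) is func)
```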

## Descriptor usage tips

### Use property to keep it simple

The `property` built-in creates overriding descriptors implementing both `__set__` and
`__get__`, even if you do not define a setter method. The default `__set__` of a
property raises `AttributeError`, so a property is the
easiest way to create a read-only attribute, avoiding the issue described next.

### Read-only descriptors require `__set__`

If you use a descriptor class to implement a read-only attribute, you must
remember to code both `__get__` and `__set__`, otherwise setting a namesake
attribute on an instance will shadow the descriptor.

The `__set__` method of a
read-only attribute should just raise `AttributeError` with a suitable message.
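
The sketch below contrasts the two cases. `GetOnly`, `ReadOnly`, and `Config` are made-up names. Without `__set__`, an assignment silently creates an instance attribute that shadows the descriptor. With `__set__`, the assignment is rejected.

```python
class GetOnly:  # hypothetical: __get__ only, so it can be shadowed

    def __init__(self, value):
        self.value = value

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.value

class ReadOnly(GetOnly):  # hypothetical: adds __set__ to block assignment

    def __set__(self, instance, value):
        raise AttributeError('read-only attribute')

class Config:
    shadowable = GetOnly('1.0')
    version = ReadOnly('1.0')

cfg = Config()
cfg.shadowable = '2.0'   # no error: creates an instance attribute
print(cfg.shadowable)    # the instance attribute shadows the descriptor

try:
    cfg.version = '2.0'
except AttributeError:
    outcome = 'rejected'
print(outcome, cfg.version)
```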

### Validation descriptors can work with `__set__` only

In a descriptor designed only for validation, the `__set__` method should check
the value argument it gets, and if valid, set it directly in the instance `__dict__`
using the name of the managed attribute as key. That way, reading the attribute with
the same name from the instance will be as fast as possible, because it will not
require a `__get__`.

### Caching can be done efficiently with `__get__` only

If you code just the `__get__` method, you have a non-overriding descriptor.
These are useful to make some expensive computation and then
cache the result by setting an attribute by the same name on the instance.
The namesake instance attribute will shadow the descriptor,
so subsequent access to that attribute will fetch it
directly from the instance `__dict__` and not trigger the descriptor
`__get__` anymore.