# Descriptors: secret weapon of frameworks

A talk presented at [Python Cerrado / Plone Conference 2024](https://2024.ploneconf.org/en/schedule/talks/descriptors-secret-weapon-of-frameworks)

## Motivating example

<img src="bulk_food_by_Dan_Bruell.jpg" alt="Bulk food bins, by Dan Bruell" width="600"> 

### Class for an item in an order of bulk food

Imagine an app for a store that sells organic food in bulk,
where customers can order nuts, dried fruit, or cereals by weight.
In that system, each order would hold a sequence of line items,
and each line item could be represented by an instance of a
class like this:

In [1]:
class LineItem:

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight
        self.price = price

    def subtotal(self):
        return self.weight * self.price

That's nice and simple:

In [2]:
raisins = LineItem('Golden raisins', 10, 6.95)
raisins.subtotal()

69.5

But there's a problem:

In [3]:
raisins.weight = -20
raisins.subtotal()

-139.0

This is a toy example, but not as fanciful as you may think.
Here is a story from the early days of Amazon.com:

>  We found that customers could order a negative quantity of books! And we would
  credit their credit card with the price and, I assume, wait around for them to ship the
  books.<br>
  — Jeff Bezos, founder and CEO of Amazon.com, interviewed by Wall Street Journal in “Birth of a Salesman” (October 15, 2011)

### Type checkers won't save us

In [4]:
from dataclasses import dataclass

@dataclass
class LineItem:
    description: str
    weight: float
    price: float

    def subtotal(self):
        return self.weight * self.price

With type hints alone, there is no way to specify that `weight` must be a `float` greater than zero.

Only [dependent types](https://en.wikipedia.org/wiki/Dependent_type) could fix this,
but that's a feature that exists only in obscure, academic functional languages like Agda and Idris.
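
To make the limitation concrete, here is a minimal sketch. A negative number is a perfectly valid `float`, so neither the annotations nor the dataclass machinery reject it. The value `-20.0` is only an illustration.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    description: str
    weight: float
    price: float

    def subtotal(self):
        return self.weight * self.price


# A negative weight satisfies the `float` annotation,
# so nothing stops this item from being created.
raisins = LineItem('Golden raisins', -20.0, 6.95)
print(raisins.subtotal())  # prints a negative subtotal
```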

### Why not getters and setters?

The traditional way of controlling attribute access in C++ or Java is to use _getters_ and _setters_: methods that get and set values for attributes that are declared private or protected in some way.

But that is verbose, and produces what the famous Pythonista Alex Martelli calls "goofy idioms" like this:

```python
item.set_quantity(item.get_quantity() + 1)
```

Using public attributes, properties, or descriptors, the ugliness of getters and setters is hidden from the users of your class, and they can accomplish the same result by writing this:

```python
item.quantity += 1
```
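
As a rough illustration of what happens behind that one-liner when `quantity` is a property, the sketch below uses a hypothetical `Item` class that records which accessor runs. The `calls` list and the `_quantity` storage attribute are assumptions made for this example only.

```python
class Item:
    """Hypothetical class, only to trace what `+=` does."""

    def __init__(self, quantity):
        self.calls = []            # records which accessor ran
        self.quantity = quantity   # goes through the setter below

    @property
    def quantity(self):
        self.calls.append('get')
        return self._quantity

    @quantity.setter
    def quantity(self, value):
        self.calls.append('set')
        self._quantity = value


item = Item(1)
item.calls.clear()          # ignore the 'set' recorded by __init__
item.quantity += 1          # reads via the getter, then writes via the setter
trace = list(item.calls)    # copy before touching the property again
print(trace)                # ['get', 'set']
```

The getter and setter still run, but the user of the class never has to spell them out.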

### Properties to the rescue

> The crucial importance of properties is that their existence makes it perfectly safe and
 indeed advisable for you to expose public data attributes as part of your class’s public
 interface. <br>
  — Martelli, Ravenscroft, and Holden, “Why properties are important”


Properties enable the implementation of getters and setters without
changing the public interface of a class that previously
allowed reading and writing public attributes via `item.quantity` notation.

In [5]:
class LineItem:

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight  # (1)
        self.price = price

    def subtotal(self):
        return self.weight * self.price

    @property  # (2)
    def weight(self):  # (3)
        return self.__weight  # (4)

    @weight.setter  # (5)
    def weight(self, value):
        if value > 0:
            self.__weight = value  # (6)
        else:
            raise ValueError('weight must be > 0')  # (7)
            
    @property
    def price(self):
        return self.__price

    @price.setter
    def price(self, value):
        if value > 0:
            self.__price = value
        else:
            raise ValueError('price must be > 0')

Uncomment to see a demonstration:

In [6]:
# raisins = LineItem('Golden raisins', -10, 6.95)
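
Here is a trimmed-down sketch of that demonstration, catching the exception so the cell runs to completion. Only `weight` is validated here, to keep the example short.

```python
class LineItem:

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight   # already goes through the property setter
        self.price = price

    @property
    def weight(self):
        return self.__weight

    @weight.setter
    def weight(self, value):
        if value > 0:
            self.__weight = value
        else:
            raise ValueError('weight must be > 0')


try:
    LineItem('Golden raisins', -10, 6.95)
except ValueError as exc:
    message = str(exc)
    print(message)         # weight must be > 0

raisins = LineItem('Golden raisins', 10, 6.95)
try:
    raisins.weight = -20   # later assignments are validated too
except ValueError:
    pass
print(raisins.weight)      # 10
```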

## Descriptors to the rescue

The previous example is fine, but there's a lot of repetition.
A descriptor class allows reuse of attribute validation logic.

By the way, `@property` itself is implemented as a descriptor class.
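
A quick way to check that claim in the interpreter is sketched below. The `Demo` class is hypothetical and exists only for inspection.

```python
class Demo:
    """Hypothetical class used only for inspection."""

    @property
    def answer(self):
        return 42


# `property` implements the descriptor special methods...
has_get = hasattr(property, '__get__')
has_set = hasattr(property, '__set__')
print(has_get, has_set)          # True True

# ...and the decorated method becomes a `property` instance
# stored as a class attribute.
prop = vars(Demo)['answer']
print(type(prop) is property)    # True

# Attribute access on an instance is routed through __get__.
demo = Demo()
same = prop.__get__(demo, Demo) == demo.answer
print(same)                      # True
```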

### Quantity: a validation descriptor

We'll implement a descriptor class named `Quantity` that enforces the rule that a numeric attribute must be greater than zero.

Here's how we want to write `LineItem` with `Quantity`:

```python
class LineItem:
    weight = Quantity()
    price = Quantity()

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight
        self.price = price

    def subtotal(self):
        return self.weight * self.price
```

### A descriptor class and a managed class

Before looking at the implementation of `Quantity`, consider just these lines of `LineItem`:

```python

class LineItem:
    weight = Quantity()
    price = Quantity()
```

What is happening here?

Both `weight` and `price` are attributes of the `LineItem` **class**.

Also, they are two separate instances of `Quantity`.
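
Both points can be verified by inspecting the class namespace. This sketch uses a placeholder `Quantity` whose behavior is deliberately left out, since only the class/instance relationship matters here.

```python
class Quantity:
    """Placeholder descriptor; validation is not the point of this sketch."""

    def __set__(self, instance, value):
        raise NotImplementedError


class LineItem:
    weight = Quantity()
    price = Quantity()


# The descriptors live in the class namespace, not in instances.
managed = sorted(name for name, attr in vars(LineItem).items()
                 if isinstance(attr, Quantity))
print(managed)        # ['price', 'weight']

# Each class attribute is bound to its own Quantity instance.
distinct = vars(LineItem)['weight'] is not vars(LineItem)['price']
print(distinct)       # True
```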

Here is a UML class diagram for that:

<img src="flpy_2301.png" alt="UML class diagram showing relationship between `LineItem` and `Quantity`" width="800"> 


Metaprogramming may involve unusual relationships between classes and instances that may be hard to read in UML.

So, I invented the Mills & Gizmos Notation to enhance UML diagrams. Here is a sample of MGN:

<img src="flpy_2303.png" alt="Classes and instances drawn as mills and gizmos." width="800"> 


Here is that same UML diagram annotated with Mills & Gizmos:

<img src="flpy_2302.png" alt="UML class diagram showing relationship between `LineItem` and `Quantity` with mills and gizmos" width="800"> 


### Implementing Quantity, take #1

Descriptor classes are typically implemented in frameworks like Django or SQLAlchemy, not in application code.

For example, this could be part of a `validators.py` module in a framework:

In [7]:
# part of a validators.py module

class Quantity:  # (1)

    def __init__(self, storage_name):  # (2)
        self.storage_name = storage_name

    def __set__(self, instance, value):   # (3)
        if value > 0:
            instance.__dict__[self.storage_name] = value  # (4)
        else:
            msg = f'{self.storage_name} must be > 0'  # (5)
            raise ValueError(msg)

**(1)** A descriptor is a protocol-based feature: there's no need to subclass anything, just implement one or more of the relevant special methods.

**(2)** The `storage_name` is the name of the attribute in each managed class instance that will hold the value of the attribute controlled by the descriptor.

**(3)** Implementing `__set__` makes a class behave as a descriptor. `self` is the descriptor instance, `instance` is the managed instance, and `value` is the value being set.

**(4)** If we have a valid `value`, then we save it in the `instance.__dict__`. We can't use `setattr(instance, self.storage_name, value)` because that would trigger the descriptor `__set__` again, leading to uncontrolled recursion.

**(5)** We build a user-friendly message explaining how to fix the problem, and raise `ValueError`.


Here is the application code that would use that:

In [8]:
# from validators import Quantity

class LineItem:
    weight = Quantity('weight')
    price = Quantity('price')

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight
        self.price = price

    def subtotal(self):
        return self.weight * self.price
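
Because the take #1 `__init__` requires a `storage_name`, each `Quantity` has to be told the name of the attribute it manages. Here is a self-contained sketch of both pieces working together, with the error caught so the cell runs to completion.

```python
class Quantity:

    def __init__(self, storage_name):
        self.storage_name = storage_name

    def __set__(self, instance, value):
        if value > 0:
            instance.__dict__[self.storage_name] = value
        else:
            raise ValueError(f'{self.storage_name} must be > 0')


class LineItem:
    weight = Quantity('weight')   # the name is repeated by hand
    price = Quantity('price')

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight
        self.price = price

    def subtotal(self):
        return self.weight * self.price


raisins = LineItem('Golden raisins', 10, 6.95)
print(raisins.subtotal())   # 69.5

try:
    raisins.price = 0
except ValueError as exc:
    message = str(exc)
    print(message)          # price must be > 0
```

Repeating each name as a string is error-prone: a typo would silently store the value under the wrong key.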

### Quantity, take #2: automatic storage_name

In [9]:
# part of a validators.py module

class Quantity:

    def __set_name__(self, owner, name):  # (1)
        self.storage_name = name          # (2)

    def __set__(self, instance, value):   # (3)
        if value > 0:
            instance.__dict__[self.storage_name] = value
        else:
            msg = f'{self.storage_name} must be > 0'
            raise ValueError(msg)

Uncomment to see a test that will generate a validation error due to a negative `weight`:

In [10]:
# raisins = LineItem('Golden raisins', -10, 6.95)
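
The following self-contained sketch puts the take #2 descriptor and a shortened `LineItem` in one cell, so the automatic naming and the validation error can both be observed. The exception is caught only so that the cell runs to completion.

```python
class Quantity:

    def __set_name__(self, owner, name):
        self.storage_name = name

    def __set__(self, instance, value):
        if value > 0:
            instance.__dict__[self.storage_name] = value
        else:
            raise ValueError(f'{self.storage_name} must be > 0')


class LineItem:
    weight = Quantity()   # no argument needed any more
    price = Quantity()

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight
        self.price = price


# __set_name__ already ran when the class was created.
auto_name = vars(LineItem)['weight'].storage_name
print(auto_name)          # weight

try:
    LineItem('Golden raisins', -10, 6.95)
except ValueError as exc:
    message = str(exc)
    print(message)        # weight must be > 0

# Valid values end up in the instance __dict__ under the same names.
raisins = LineItem('Golden raisins', 10, 6.95)
print(vars(raisins))
```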

### Quantity, line by line

In [11]:
# part of a validators.py module

class Quantity:

    def __set_name__(self, owner, name):  # (1)
        self.storage_name = name          # (2)

    def __set__(self, instance, value):   # (3)
        if value > 0:
            instance.__dict__[self.storage_name] = value  # (4)
        else:
            msg = f'{self.storage_name} must be > 0'  # (5)
            raise ValueError(msg)

**(1)** The `__set_name__` special method was introduced in Python 3.6. It's called by the `type` metaclass, the factory of all Python classes, when a descriptor is assigned to a class attribute while the managed class is being created. `self` is the descriptor instance, `owner` is the managed class (where the descriptor is instantiated as a class attribute), and `name` is the name of the managed class attribute to which the descriptor is assigned, for example `"weight"` or `"price"` in the `LineItem` example.

**(2)** The `storage_name` is the name of the attribute in each managed class instance that will hold the value of the attribute managed by the descriptor.

**(3)** Implementing `__set__` and/or `__get__` is what makes a class behave as a descriptor; `self` is the descriptor instance, `instance` is the managed instance, and `value` is the value being set.

**(4)** If we have a valid `value`, then we save it in the `instance.__dict__`. We can't use `setattr(instance, self.storage_name, value)` because that would trigger the descriptor `__set__` again, leading to uncontrolled recursion.

**(5)** We build a user-friendly message explaining how to fix the problem, and raise `ValueError`.
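
To see the recursion described in note (4), here is a deliberately broken variant. `RecursiveQuantity` is a hypothetical name, and validation is left out so that only the storage step differs from `Quantity`.

```python
class RecursiveQuantity:
    """Hypothetical broken variant: uses setattr() instead of __dict__."""

    def __set_name__(self, owner, name):
        self.storage_name = name

    def __set__(self, instance, value):
        # setattr() looks up the attribute on the class, finds this
        # descriptor again, and calls __set__ once more, without end.
        setattr(instance, self.storage_name, value)


class LineItem:
    weight = RecursiveQuantity()


item = LineItem()
try:
    item.weight = 10
except RecursionError:
    outcome = 'RecursionError'
else:
    outcome = 'no error'
print(outcome)  # RecursionError
```

Writing directly to `instance.__dict__` bypasses the attribute lookup on the class, which is why the working version does not loop.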