# 01 - Descriptors

Descriptors are the mechanism underpinning properties, methods, slots, and even functions!

Suppose we want a `Point2D` class whose coordinates must always be **integers**.

Since plain attributes cannot guarantee this, we typically use `property` with getter and setter methods. After adding multiple properties, we end up with a lot of very similar boilerplate code.

What we would like is a class, e.g. `IntegerValue`, that defines these get and set methods the way we want. Then we could use several `IntegerValue` instances within our class and have them **bound** to our instances just like attributes.

The solution is the **descriptor protocol**.

There are 4 main methods that make up the protocol - not all are required.
- `__get__` -> `p.x`
- `__set__` -> `p.x = 100`
- `__delete__` -> `del p.x`
- `__set_name__` -> new in Python 3.6 - we'll come back to this later.
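To see which method each operation triggers, here is a minimal sketch of a descriptor that implements all four methods and prints when each one runs. The class name `Traced` is hypothetical. It stores values in the instance dictionary, which is only one of the storage options discussed later.

```python
class Traced:
    """Hypothetical descriptor that reports which protocol method runs."""

    def __set_name__(self, owner_class, name):
        # called once, when the owning class is created
        print(f'__set_name__: {owner_class.__name__}.{name}')
        self.name = name

    def __get__(self, instance, owner_class):
        print('__get__')
        if instance is None:  # accessed through the class itself
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        print('__set__')
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        print('__delete__')
        instance.__dict__.pop(self.name, None)


class Point2D:
    x = Traced()  # __set_name__ runs here, at class creation


p = Point2D()
p.x = 100      # triggers __set__
value = p.x    # triggers __get__
del p.x        # triggers __delete__
```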

There are two types of descriptors:

- Non-data descriptors: implement `__get__` **only** (and optionally `__set_name__`). What `__get__` returns is up to us. Functions are the classic example: accessed from an instance they return a bound method, and accessed from the class they return the function itself.
- Data descriptors: implement `__set__` and/or `__delete__` (and often `__get__`).
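Functions being non-data descriptors can be checked directly. The sketch below uses a hypothetical `Greeter` class and calls the function's `__get__` by hand, which is roughly what Python does when we write `g.hello`.

```python
class Greeter:  # hypothetical example class
    def hello(self):
        return 'hi'


g = Greeter()
func = Greeter.__dict__['hello']  # the raw function stored on the class

# functions define __get__ but not __set__, so they are non-data descriptors
print(hasattr(func, '__get__'), hasattr(func, '__set__'))  # True False

bound = func.__get__(g, Greeter)  # roughly what g.hello does behind the scenes
print(bound())                    # hi
print(func.__get__(None, Greeter) is func)  # True: no instance, so the function itself
```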

This distinction affects the order in which Python looks up attributes, as we'll see later.

#### Example 1

Let's create a simple non-data descriptor (don't worry about `instance` and `owner_class` for now):

In [10]:
from datetime import datetime

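class TimeUTC:
    def __get__(self, instance, owner_class):
        # note: datetime.utcnow() is deprecated since Python 3.12;
        # datetime.now(timezone.utc) is the recommended alternative (its isoformat() adds '+00:00')
        return datetime.utcnow().isoformat()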

So `TimeUTC` is a class that implements the `__get__` method only, and is therefore considered a non-data descriptor.

We can now use it to create properties in other classes:

In [11]:
class Logger:
    current_time = TimeUTC()

Note that `current_time` is a class attribute:

In [12]:
l = Logger()
l.current_time

'2024-08-05T14:09:20.516021'

At first sight this is odd: we would expect `l.current_time` simply to return our `TimeUTC` instance (shown via its `repr`).

Instead it **calls** the `__get__` method.

This also works when we access the class attribute through the class itself:

In [13]:
Logger.current_time

'2024-08-05T14:09:20.875894'

#### Example 2

Let's create a `Deck` class that returns a **random** card from 2 to Ace and a **random** suit from the 4 suits.

Since both attributes, card and suit, effectively make a random choice from a sequence, we can cut down on the repeated code using a descriptor.

We'll implement this example both the traditional way and the descriptor way to compare the two.

##### Traditional

In [14]:
from random import seed, choice

In [22]:
seed(0)

class Deck:
    @property
    def card(self):
        return choice(tuple('23456789JQKA') + ('10',))

    @property
    def suit(self):
        return choice(('Spade', 'Heart', 'Diamond', 'Club'))

d = Deck()

for _ in range(5):
    print(d.card, d.suit)

8 Club
2 Diamond
J Club
8 Diamond
9 Diamond


##### Non-data Descriptor

In [24]:
seed(0)

class Choice:
    def __init__(self, *choices):
        self.choices = choices

    def __get__(self, instance, owner_class):
        return choice(self.choices)

class Deck:
    card = Choice(*'23456789JQKA', '10')
    suit = Choice('Spade', 'Heart', 'Diamond', 'Club')

d = Deck()

for _ in range(5):
    print(d.card, d.suit)

8 Club
2 Diamond
J Club
8 Diamond
9 Diamond


So **non-data descriptors** give us `property`-like behaviour without a lot of the boilerplate.

**Non-data descriptors** are therefore very useful when several properties share very similar logic. The end result is a lot less code in the target class.

# 02 - Getters and Setters

In the previous subsection, we saw the two ways of accessing the attribute:

#### The `__get__` method

In [29]:
class TimeUTC:
    def __get__(self, instance, owner_class):
        return datetime.utcnow().isoformat()

class Logger:
    current_time = TimeUTC()

l = Logger()

In [30]:
l.current_time, Logger.current_time

('2024-08-05T14:59:26.162925', '2024-08-05T14:59:26.162925')

When `__get__` is called, we may want to know:

- which **instance** was used or `None` if called from the class.
- what class owns the `TimeUTC` (descriptor) instance. In our case, it belongs to the `Logger` class.

This is why we have the **signature**: `__get__(self, instance, owner_class)`

(Note: A **signature** refers to the structure and components of a function or method definition, detailing how it can be called.)

These components are passed into `__get__` when it's called, which means we can choose what to return based on whether it was
- called from the class.
- called from the instance.

Very often, we choose to:
- return the descriptor `TimeUTC` **instance** when called from the **class** (`Logger`). This gives us an easy handle to the descriptor instance.
- return the attribute **value** (here, `datetime.utcnow().isoformat()`) when called from an instance of the class (`l`).

In [31]:
class TimeUTC:
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return datetime.utcnow().isoformat()

class Logger:
    current_time = TimeUTC()

l = Logger()

In [32]:
l.current_time, Logger.current_time

('2024-08-05T14:59:28.312074', <__main__.TimeUTC at 0x2a476fbb6a0>)

Returning the descriptor instance when called from the class is consistent with how `property` works:

In [33]:
class Logger:
    @property
    def current_time(self):
        return datetime.utcnow().isoformat()

Logger.current_time

<property at 0x2a4771dd530>

So `property` **implements** the **descriptor protocol**. It might be a little easier to see if it's not used as a decorator:

In [37]:
class Logger:
    def current_time(self):
        return datetime.utcnow().isoformat()

    current_time = property(fget=current_time)

l = Logger()

Logger.current_time, l.current_time

(<property at 0x2a4771fc900>, '2024-08-05T15:32:03.094962')

#### Caveat

There's an important caveat with these descriptors, although, as we'll soon see, it is less of an issue than it first appears.

You'll notice that these descriptors are class attributes: `current_time = TimeUTC()`.

Since only one instance of `TimeUTC` is being made, all `Logger` instances will share this one instance. 

For this particular example there's no issue, since the descriptor keeps no state: it just returns the current time.

But what if we want to "store" and "retrieve" instance-specific data using `__set__`? After all, setting a value should only affect that instance.

In fact, this is manageable. Both `__get__` and `__set__` receive the `instance`, and we can use it to keep the data separate for each instance.

Let's demonstrate the issue with another example:

In [55]:
class Countdown:
    'non-data descriptor'
    def __init__(self, start):
        self.start = start + 1

    def __get__(self, instance, owner):
        if instance is None:
            return self
        self.start -= 1
        return self.start

In [56]:
class Rocket:
    countdown = Countdown(10)

In [57]:
rocket1 = Rocket()
rocket2 = Rocket()

In [58]:
rocket1.countdown, rocket1.countdown, rocket1.countdown, rocket1.countdown, rocket1.countdown,

(10, 9, 8, 7, 6)

In [59]:
rocket2.countdown

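5

`rocket2` picks up where `rocket1` left off, because both rockets share the single `Countdown` instance stored on the `Rocket` class.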
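As a preview of the techniques in the following subsections, here is a minimal sketch of a `Countdown` that keeps a separate count for each instance. It keys its state by instance in a `weakref.WeakKeyDictionary` (covered in "04 - Strong and Weak References"). It assumes the owner's instances are hashable and weak-referenceable.

```python
from weakref import WeakKeyDictionary


class Countdown:
    'non-data descriptor that keeps a separate count per instance'

    def __init__(self, start):
        self.start = start
        self.remaining = WeakKeyDictionary()  # instance -> next value to return

    def __get__(self, instance, owner):
        if instance is None:
            return self
        value = self.remaining.get(instance, self.start)
        self.remaining[instance] = value - 1
        return value


class Rocket:
    countdown = Countdown(10)


rocket1 = Rocket()
rocket2 = Rocket()

rocket1_values = [rocket1.countdown for _ in range(5)]
rocket2_first = rocket2.countdown
print(rocket1_values)  # [10, 9, 8, 7, 6]
print(rocket2_first)   # 10: rocket2 has its own countdown
```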

#### The `__set__` method

The signature is as follows: `__set__(self, instance, value)`
- `self`: this references the descriptor instance, just like we had for the `__get__` example (e.g. `TimeUTC()`).
- `instance`: the instance whose attribute is being set. Unlike in `__get__`, this is never `None`: assigning through the class (e.g. `Point2D.x = 5`) simply replaces the descriptor in the class dictionary instead of calling `__set__`.
- `value`: the value we want to assign to the attribute.

Why is there no `owner_class` like we have in `__get__`?

Setters (and deleters) are **always** called from instances. We never want to set a class attribute from an instance; we only want to affect the instance.
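A quick sketch shows that assigning through the class never reaches `__set__`: it just rebinds the class attribute and discards the descriptor.

```python
class IntegerValue:
    def __set__(self, instance, value):
        print(f'__set__ called, instance={instance}, value={value}')


class Point2D:
    x = IntegerValue()


p = Point2D()
p.x = 100      # routed to IntegerValue.__set__
Point2D.x = 5  # no __set__ call: the class attribute is simply rebound
print(type(Point2D.__dict__['x']))  # <class 'int'>: the descriptor is gone
print(p.x)     # 5
```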

In [60]:
class IntegerValue:    
    def __set__(self, instance, value):
        print(f'__set__ called, instance={instance}, value={value}')

class Point2D:
    x = IntegerValue()

p = Point2D()
p.x = 100

__set__ called, instance=<__main__.Point2D object at 0x000002A4773ACFA0>, value=100


I haven't elaborated on how to actually store the value for the instance because it's *not* straightforward.

Currently, we are suffering from the caveat outlined earlier where different instances are sharing the same `IntegerValue` instance.

We might naively think to set the value on the instance with `instance.x = value`, but we don't have access to the symbol `x`. Even if we did, `instance.x = value` would just call `__set__` again, recursing forever. So what symbol would we use? And what if our class uses slots? Then we can't store things in the instance dictionary because we don't *have* an instance dictionary.

There are plenty of other issues to consider.

The next subsection will explore the various solutions.

# 03 - Using as Instance Properties

#### Storing in descriptor's local dictionary

We could store the values in a dictionary on the descriptor itself, where:

- key = the instance (a problem if the instance is not hashable)
- value = the attribute value

In [1]:
class IntegerValue:
    def __init__(self):
        self.data = {}

    def __set__(self, instance, value):
        self.data[instance] = int(value)

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.data.get(instance)

In [2]:
class Point2D:
    x = IntegerValue()
    y = IntegerValue()


p1 = Point2D()

p1.x = 100.1
p1.y = 200.2
p1.x, p1.y

(100, 200)

We now essentially have one dictionary per descriptor (i.e. per attribute), shared by all instances of the class and keyed by instance.

In [7]:
p2 = Point2D()
p2.x = 1000.1

p3 = Point2D()
p3.x = 10000.1

Point2D.x.data

{<__main__.Point2D at 0x2489f9cff10>: 100,
 <__main__.Point2D at 0x2489f9cf700>: 1000,
 <__main__.Point2D at 0x2489f9cffd0>: 10000}

But there's a subtle issue with this. 

After the line `p1 = Point2D()`, we have **1 reference** to the new object.

After `p1.x = 100.1`, the instance is stored as a key in our dictionary, so we have **2 references** to the object.

If we `del p1`, we delete the local symbol but **not** the reference in the dictionary. The object is never freed, so we have a **memory leak**.
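The leak can be seen with a small self-contained sketch of the dictionary-based descriptor above. After `del p1`, the object is still alive as a dictionary key.

```python
class IntegerValue:
    def __init__(self):
        self.data = {}  # instance -> value; the keys are STRONG references

    def __set__(self, instance, value):
        self.data[instance] = int(value)

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.data.get(instance)


class Point2D:
    x = IntegerValue()


p1 = Point2D()
p1.x = 100.1
del p1                      # removes the name p1 only
print(len(Point2D.x.data))  # 1: the dictionary key keeps the Point2D object alive
```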

The solution is to use **weak references**...

# 04 - Strong and Weak References

Throughout this entire course, we have only used **strong** references.

If we have:
```python
p1 = Person()
p2 = p1
```
we have the following arrangement: `p1 ---> [Person Object] <--- p2`

When we `del p1`, at least one **strong** reference remains, so the object is *not* garbage collected.

A **weak** reference is another type of object in Python. It **does not affect the reference count as far as the memory manager is concerned**.

Using a weak reference, we will have the following arrangement: `p1 --[strong]--> [Person Object] <--[weak]-- p2`

`del p1` will result in **garbage collection** and `p2` will be considered **"dead"**.


So the solution in our code is to store a **weak** reference to the object as the key, rather than the object itself.

Here's how to **create** a weak reference.

In [10]:
class Person:
    pass

In [11]:
import weakref

p1 = Person()
p2 = weakref.ref(p1)
p2

<weakref at 0x0000024F2A82BD30; to 'Person' at 0x0000024F295CC3A0>

Our weak reference is a **callable**. Calling it returns the object if it is still alive, and `None` otherwise.

In [12]:
p2()

<__main__.Person at 0x24f295cc3a0>
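The other half of that behaviour, the reference going "dead", looks like this in a minimal sketch. It assumes CPython, where reference counting frees the object as soon as the last strong reference is deleted. Other implementations may need a `gc.collect()` first.

```python
import weakref


class Person:
    pass


p1 = Person()
p2 = weakref.ref(p1)
print(p2() is p1)  # True: the object is still alive
del p1             # drop the only strong reference
print(p2())        # None: the referent is gone, so the weakref is "dead"
```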

Going back to our original problem, we can now store all of these **weak** references in a dictionary. 

But `weakref` provides a `WeakKeyDictionary` just for that!

This is far more convenient to use for two reasons:

1. We don't need to create the weak reference ourselves. Instead, we pass our strong reference as the key and this dictionary will **automatically convert** it into a **weak** reference.
2. The item is **automatically removed** from the weak key dictionary once the object is **dead**.

In [13]:
from weakref import WeakKeyDictionary

p1 = Person()
d = WeakKeyDictionary()
d[p1] = 'some value'

While `d.keys()` returns the actual objects, we can get the **weak** references via `d.keyrefs()`. (I am not running `list(d.keys())` because that list would hold *another* strong reference to `p1`, which would interfere with the reference counting later.)

In [14]:
d.keyrefs()

[<weakref at 0x0000024F293F6020; to 'Person' at 0x0000024F2A7C64A0>]

In [15]:
import weakref
print(weakref.getweakrefcount(p1))

1


Where is Python getting this information?

It's actually stored in the object itself! But this is an internal implementation detail (a doubly-linked list), so we can't do much with it.

In [16]:
p1.__weakref__

<weakref at 0x0000024F293F6020; to 'Person' at 0x0000024F2A7C64A0>

Let's delete this reference just to confirm that the `WeakKeyDictionary` handles the cleanup:

In [17]:
del p1

In [18]:
print(list(d.keys()))
print(d.keyrefs())

[]
[]


**Caveat**
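- If we use **slots**, instances get no `__weakref__` slot by default, so they cannot be weakly referenced unless we add `'__weakref__'` to `__slots__`.

- The object must be hashable to be a key in a `WeakKeyDictionary`. So if we implement `__eq__`, we must remember to implement `__hash__` too, because defining `__eq__` alone sets `__hash__` to `None`.

- We cannot create weak references to instances of several built-in types, including `str`, `list`, `int` and `dict`.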
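The slots caveat can be checked with a short sketch. The class names `Plain` and `WeakReferenceable` are hypothetical.

```python
import weakref


class Plain:
    __slots__ = ('x',)                # no '__weakref__' slot


class WeakReferenceable:
    __slots__ = ('x', '__weakref__')  # opt back in to weak references


try:
    weakref.ref(Plain())
    plain_ok = True
except TypeError as exc:
    plain_ok = False
    print(exc)

obj = WeakReferenceable()
print(weakref.ref(obj)() is obj)  # True
```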



In [19]:
l = [1, 2, 3]
w = weakref.ref(l)

TypeError: cannot create weak reference to 'list' object

In [20]:
d = WeakKeyDictionary()
d['python'] = 'some value'

TypeError: cannot create weak reference to 'str' object

# 05 - Back to Instance Properties

#### Penultimate "Good Enough" Solution

This is our **Penultimate Approach** to implementing **data descriptors** that:
- have **instance-specific** storage
- do not use the instance itself for storage (`__slots__` problem), although a slotted class still needs a `'__weakref__'` slot
- handle cleanup after objects get **finalised** (i.e. garbage collected)

but cannot:
- handle **non-hashable** instances.

This approach will be fine for the vast majority of cases.
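The non-hashable limitation is easy to trigger. In the sketch below, `Point2D` defines `__eq__` without `__hash__`, so its instances are unhashable. Storing one as a `WeakKeyDictionary` key then raises a `TypeError`. The exact error message varies between Python versions, so it is only printed here.

```python
from weakref import WeakKeyDictionary


class IntegerValue:
    def __init__(self):
        self.data = WeakKeyDictionary()

    def __set__(self, instance, value):
        self.data[instance] = int(value)  # requires hash(instance)

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.data.get(instance)


class Point2D:
    x = IntegerValue()

    def __init__(self, x):
        self.x = x  # calls IntegerValue.__set__

    def __eq__(self, other):  # defining __eq__ alone sets __hash__ to None
        return isinstance(other, Point2D) and self.x == other.x


try:
    Point2D(100.1)
    failed = False
except TypeError as exc:
    failed = True
    print(exc)
```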

In [3]:
from weakref import WeakKeyDictionary

class IntegerValue:
    def __init__(self):
        self.data = WeakKeyDictionary()

    def __set__(self, instance, value):
        self.data[instance] = int(value)

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.data.get(instance)

In [4]:
class Point2D:
    x = IntegerValue()
    
    def __init__(self, x):
        self.x = x  # this is calling x.__set__

p1 = Point2D(100.1)
p1.x

100

In [5]:
del p1

In [6]:
list(Point2D.x.data)

[]

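**Note** that there is an alternative improvement: implement `__set_name__` and store the value directly in the instance dictionary under the attribute's own name. This removes the need for weak references altogether, at the cost of not working with `__slots__` classes (which have no instance dictionary). See subsection 06 for how `__set_name__` works in more detail.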

In [12]:
class IntegerValue:
    def __set__(self, instance, value):
        instance.__dict__[self.property_name] = int(value)

    def __set_name__(self, owner_class, property_name):
        self.property_name = property_name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.property_name, None)

In [15]:
class Point2D:
    x = IntegerValue()
    
    def __init__(self, x):
        self.x = x  # this is calling x.__set__

p1 = Point2D(100.1)
p1.x

100

#### Final Perfect Solution

Before we can implement the perfect solution, we need to learn about one extra thing.

`weakref.ref()` accepts an optional second argument: a **callback**.

This **callback** receives the **weak reference**, *not* the object it refers to. It is called when the object is about to be finalised (garbage collected), provided the weak reference itself is still alive. As long as at least one **strong** reference to the object remains, the callback *won't* be called.

This is essentially how `WeakKeyDictionary` knows when to remove an entry: each key is stored as a weak reference whose callback deletes that entry from the underlying dictionary. We will use the same mechanism to remove **dead** references from a regular dictionary instead of using a `WeakKeyDictionary`.

Let's show a simple demonstration of how it works:

In [20]:
import weakref

def obj_destroyed(weak_ref):
    # the callback receives the (now dead) weak reference, not the object
    print(f'{weak_ref} has just been destroyed.')

p = Point2D(100.1)
p_weak = weakref.ref(p, obj_destroyed)

In [21]:
del p

<weakref at 0x000001E1F2666D40; dead> has just been destroyed.


Now we can move on to our **Perfect Approach** to implementing **data descriptors** that:
- have **instance-specific** storage
- do not use the instance itself for storage (`__slots__` problem)
- handle cleanup after objects get **finalised** (i.e. garbage collected)
- handle **non-hashable** instances.

To implement this we need to:
1. Use a **regular** dictionary instead of a **`WeakKeyDictionary`**.
2. Use the `id` of an instance as the key, not a weak reference to it. An `id` is unique among objects that are alive at the same time, but Python may reuse it once an object is freed, which is one more reason to clean up dead entries. Note that the `id` value of an object is **not** a reference to the object itself.
3. Instead of storing just the value, store a `tuple` containing a **weak reference** *and* the value. The **weak reference** is created with a callback, `self._finalise_instance`, which will eventually remove the associated key from the dictionary upon **finalisation**.
4. Create a method called `_finalise_instance`. It receives the weak reference and iterates through all values in our dictionary until it finds the tuple containing that **same** weak reference (a reverse lookup). Once found, it deletes the associated key.
5. Get the desired value back via the instance's `id`. This returns a `tuple`, so make sure to return its 2nd element (or `None` if nothing has been stored yet).
6. If your class (e.g. `Point2D`) uses **slots**, make sure to add `'__weakref__'` to `__slots__`. This enables us to create **weak** references to instances of the class.

In [41]:
import weakref

class IntegerValue:
    def __init__(self):
        self.data = {}

    def __set__(self, instance, value):
        self.data[id(instance)] = (weakref.ref(instance, self._finalise_instance), int(value))

    def __get__(self, instance, owner):
        if instance is None:
            return self
        value_tuple = self.data.get(id(instance))
        # return None if nothing has been stored for this instance yet
        return None if value_tuple is None else value_tuple[1]

    def _finalise_instance(self, weak_ref):
        for key, value in self.data.items():
            if value[0] is weak_ref:
                del self.data[key]
                break


In [42]:
class Point2D:
    __slots__ = '__weakref__',
    x = IntegerValue()
    
    def __init__(self, x, y=None):
        self.x = x  # this is calling x.__set__

    def __eq__(self, other):  # class no longer hashable as __hash__ hasn't been implemented.
        return isinstance(other, Point2D) and self.x == other.x

p1 = Point2D(100.1)
p1.x

100

In [43]:
del p1

In [44]:
list(Point2D.x.data)

[]
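To confirm that this version really copes with unhashable instances, here is a self-contained sketch that repeats the descriptor above (with the `None` guard in `__get__`). It checks that `hash()` fails, that get-before-set returns `None`, and that the entry disappears when the instance is deleted. The immediate cleanup assumes CPython's reference counting.

```python
import weakref


class IntegerValue:
    def __init__(self):
        self.data = {}  # id(instance) -> (weak reference, value)

    def __set__(self, instance, value):
        self.data[id(instance)] = (weakref.ref(instance, self._finalise_instance), int(value))

    def __get__(self, instance, owner):
        if instance is None:
            return self
        value_tuple = self.data.get(id(instance))
        return None if value_tuple is None else value_tuple[1]

    def _finalise_instance(self, weak_ref):
        # reverse lookup: find the entry holding this weak reference and drop it
        for key, value in self.data.items():
            if value[0] is weak_ref:
                del self.data[key]
                break  # stop right away, since the dict just changed size


class Point2D:
    __slots__ = ('__weakref__',)
    x = IntegerValue()

    def __eq__(self, other):  # __eq__ without __hash__ makes instances unhashable
        return isinstance(other, Point2D) and self.x == other.x


p = Point2D()
try:
    hash(p)
    is_hashable = True
except TypeError:
    is_hashable = False

unset_value = p.x                    # None: nothing stored for p yet
p.x = 7.9
stored_value = p.x                   # 7 (converted with int())
entries_alive = len(Point2D.x.data)  # 1
del p                                # in CPython the callback runs immediately
entries_after = len(Point2D.x.data)  # 0
print(is_hashable, unset_value, stored_value, entries_alive, entries_after)
```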

#### Practical Example

Below is an example that will demonstrate how descriptors result in much simpler code compared to creating numerous properties which all call the same refactored method:

In [1]:
import weakref

class ValidString:
    def __init__(self, min_length=0, max_length=255):
        self.data = {}
        self._min_length = min_length
        self._max_length = max_length
        
    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError('Value must be a string.')
        if len(value) < self._min_length:
            raise ValueError(
                f'Value should be at least {self._min_length} characters.'
            )
        if len(value) > self._max_length:
            raise ValueError(
                f'Value cannot exceed {self._max_length} characters.'
            )
        self.data[id(instance)] = (weakref.ref(instance, self._finalize_instance), value)
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            value_tuple = self.data.get(id(instance))
            # return None if nothing has been stored for this instance yet
            return None if value_tuple is None else value_tuple[1]
        
    def _finalize_instance(self, weak_ref):
        for key, value in self.data.items():
            if value[0] is weak_ref:
                del self.data[key]
                break

We can now use `ValidString` as many times as we need:

In [2]:
class Person:
    __slots__ = '__weakref__',
    
    first_name = ValidString(1, 100)
    last_name = ValidString(1, 100)
    
    def __eq__(self, other):
        return (
            isinstance(other, Person) and 
            self.first_name == other.first_name and 
            self.last_name == other.last_name
        )

In [3]:
p1, p2 = Person(), Person()

p1.first_name, p1.last_name = 'Guido', 'van Rossum'
p2.first_name, p2.last_name = 'Raymond', 'Hettinger'

In [4]:
class BankAccount:
    __slots__ = '__weakref__',
    
    account_number = ValidString(10, 255)
    
    def __eq__(self, other):
        return (
            isinstance(other, BankAccount) and 
            self.account_number == other.account_number
        )

In [5]:
b1, b2 = BankAccount(), BankAccount()
b1.account_number = 'tooshort'

ValueError: Value should be at least 10 characters.

In [6]:
del p1
del p2
del b1
del b2

In [7]:
Person.first_name.data

{}

# 06 - The `__set_name__` Method

Let's go back to our penultimate "good enough" solution from the last section. The `__set_name__` method (Python 3.6+) is especially useful for validation-based descriptors.

The `__set_name__` method tells the descriptor which attribute name it was assigned to (e.g. `x`), so we can store the value in the **instance dictionary** under that same name. For example, for the `Point2D` class, the instance dictionary might hold `{'x': 10}`.

Wait a minute! Wouldn't the instance dictionary *shadow* the class attribute? After all, these data descriptors are instantiated as class attributes when the class is created, not when its instances are created. So shouldn't reading `p.x` find the value in the instance dictionary and completely sidestep the data descriptor?

*Not always.*

Now we need to clearly distinguish between **non-data** descriptors and **data** descriptors.

Let's first look at when `__set_name__` is called. It gets called once, when the class containing the descriptor is created (at the end of the `class` statement). It receives the owner class and the property name as arguments.

Let's see a simple example illustrating this:

In [1]:
class ValidString:
    def __set_name__(self, owner_class, property_name):
        print(f'__set_name__ called: owner={owner_class}, prop={property_name}')

In [2]:
class Person:
    name = ValidString()

__set_name__ called: owner=<class '__main__.Person'>, prop=name


As you can see, `__set_name__` was called when the `Person` class was created, receiving the `owner_class` and `property_name`. This is the **only** time it gets called.

To reiterate, `__set_name__` was called automatically *because* the descriptor instance was assigned as an attribute in a class body.
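A corollary worth knowing: if a descriptor is attached to a class *after* the class has been created, `__set_name__` is not called automatically. The sketch below shows this, and in that case we call `__set_name__` ourselves.

```python
class ValidString:
    def __set_name__(self, owner_class, property_name):
        print(f'__set_name__ called: owner={owner_class}, prop={property_name}')
        self.property_name = property_name


class Person:
    pass


descriptor = ValidString()
Person.name = descriptor  # attached AFTER the class was created: nothing is printed

called_automatically = hasattr(descriptor, 'property_name')
print(called_automatically)  # False

descriptor.__set_name__(Person, 'name')  # in this case we must call it ourselves
print(descriptor.property_name)          # name
```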

The main advantage of this is that we can capture the property name:

In [2]:
class ValidString:
    def __set_name__(self, owner_class, property_name):
        print(f'__set_name__ called: owner={owner_class}, prop={property_name}')
        self.property_name = property_name
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            print(f'__get__ called for property {self.property_name} '
                  f'of instance {instance}')

In [3]:
class Person:
    first_name = ValidString()
    last_name = ValidString()

__set_name__ called: owner=<class '__main__.Person'>, prop=first_name
__set_name__ called: owner=<class '__main__.Person'>, prop=last_name


Now watch what happens when we get the property from an instance:

In [4]:
p = Person()

In [5]:
p.first_name

__get__ called for property first_name of instance <__main__.Person object at 0x000001664B06D240>


Here's the chronology:

1. When the `Person` class body finishes executing, `__set_name__` is called on each descriptor defined in it.
2. For the first `ValidString` instance, `self.property_name` is set to `"first_name"`.
3. For the second, separate `ValidString` instance, `self.property_name` is set to `"last_name"`.
4. When we create an instance of `Person` and access `p.first_name`, Python calls `__get__` on the first `ValidString` instance, whose `self.property_name` is `"first_name"`.

This solves the problem we faced in "02 - Getters and Setters" (the `__set__` method), where we had no way of getting a handle on the property name.

Now let's walk through 2 examples in this section. 

In the first example, we will prepend attributes with an `_` so that we're certain we aren't clashing instance attributes with class attributes.

In the second example, we will **not** prepend attributes with `_`. 

#### Approach 1: Attributes prepended with `_`

In [6]:
class ValidString():
    def __init__(self, min_length):
        self.min_length = min_length
        
    def __set_name__(self, owner_class, property_name):
        self.property_name = property_name

    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError(f'{self.property_name} must be a string.')
        if len(value) < self.min_length:
            raise ValueError(f'{self.property_name} must be at least {self.min_length} characters')
        key = '_' + self.property_name
        setattr(instance, key, value)
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            key = '_' + self.property_name
            return getattr(instance, key, None)

In [7]:
class Person:
    first_name = ValidString(1)
    last_name = ValidString(2)

In [10]:
p = Person()
p.first_name = 'Alex'
p.first_name

'Alex'

In [11]:
p.last_name = 'M'

ValueError: last_name must be at least 2 characters

All looks fine so far. Let's look at the instance dictionary to see how the successfully set attribute was saved:

In [12]:
p.__dict__

{'_first_name': 'Alex'}

This is perfectly fine except that we run the risk of overwriting another attribute that happens to be called `_first_name`. This issue won't exist for the second approach.

#### Approach 2: Attributes *not* prepended with `_`

In [22]:
class ValidString:
    def __init__(self, min_length):
        self.min_length = min_length
        
    def __set_name__(self, owner_class, property_name):
        self.property_name = property_name

    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError(f'{self.property_name} must be a string.')
        if len(value) < self.min_length:
            raise ValueError(f'{self.property_name} must be at least '
                             f'{self.min_length} characters'
                            )
        instance.__dict__[self.property_name] = value
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            print(f'calling __get__ for {self.property_name}')
            return instance.__dict__.get(self.property_name, None)

In [23]:
class Person:
    first_name = ValidString(1)
    last_name = ValidString(2)

In [24]:
p = Person()

Firstly, you might be wondering why we had to use `instance.__dict__[self.property_name] = value` instead of `setattr(instance, self.property_name, value)`.

The reason is that `setattr(instance, self.property_name, value)` (or equivalently `instance.<property_name> = value`) is intercepted by the data descriptor itself. It would call `__set__` again, which would call `setattr` again, and so on, leading to infinite recursion.
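Here is a minimal sketch of that failure mode. The class name `Recursive` is hypothetical. Its `__set__` uses `setattr` with the descriptor's own name, and Python eventually gives up with a `RecursionError`.

```python
class Recursive:
    def __set_name__(self, owner_class, property_name):
        self.property_name = property_name

    def __set__(self, instance, value):
        # setattr on the SAME name goes straight back through this __set__
        setattr(instance, self.property_name, value)

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return instance.__dict__.get(self.property_name)


class Person:
    first_name = Recursive()


p = Person()
try:
    p.first_name = 'Alex'
    hit_recursion = False
except RecursionError:
    hit_recursion = True
    print('RecursionError: setattr kept re-entering __set__')
```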

In [25]:
p.first_name = 'Alex'

In [26]:
p.__dict__

{'first_name': 'Alex'}

Now, what will happen when we call `p.first_name`? Will we:

(A) Look in the instance dictionary and immediately return `'Alex'` - sidestepping the entire descriptor, or

(B) Retrieve the descriptor associated with `first_name` and call `__get__`?

We've added a print statement in the `__get__` to see which:

In [28]:
p.first_name

calling __get__ for first_name


'Alex'

This is *very odd*...

Since we've been assuming that Python always looks in the instance dictionary before class attributes, option (A) is what we would expect.

In fact, option (A) is *chosen* sometimes. We'll see how Python makes its choice in the next section.

# 07 - Property Lookup Resolution

The choice between (A) and (B) is actually quite simple.

- Option (A) - **_prioritise_ the instance dictionary** - for **non-data** descriptors. If the name is not in the instance dictionary, fall back to the descriptor's `__get__`.
- Option (B) - **override the instance dictionary** - for **data** descriptors.
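The rule can be seen side by side in a small sketch with two hypothetical descriptors, one non-data and one data. We write a value for each name directly into the instance dictionary, bypassing any `__set__`, and then read both attributes back.

```python
class NonData:
    def __get__(self, instance, owner_class):
        return 'from NonData.__get__'


class Data:
    def __get__(self, instance, owner_class):
        return 'from Data.__get__'

    def __set__(self, instance, value):
        raise AttributeError('read-only')


class Demo:
    nd = NonData()
    d = Data()


obj = Demo()
# write directly into the instance dictionary, bypassing any __set__
obj.__dict__['nd'] = 'from instance dict'
obj.__dict__['d'] = 'from instance dict'

print(obj.nd)  # from instance dict  (option A: instance dict beats a non-data descriptor)
print(obj.d)   # from Data.__get__   (option B: a data descriptor beats the instance dict)
```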

We showed option (B) in action in the last subsection. Now let's see option (A) in action:

In [60]:
class ValidString:
    def __set_name__(self, owner_class, property_name):
        self.property_name = property_name

    def __get__(self, instance, owner):
        print('calling __get__')
        if instance is None:
            return self
        return instance.__dict__.get(self.property_name, None)
    
class Person:
    first_name = ValidString()

p = Person()
p.first_name = 'Alex'
p.__dict__

{'first_name': 'Alex'}

Since `ValidString` is a **non-data** descriptor, Python will look inside the instance dictionary first:

In [61]:
p.first_name

'Alex'

Now, let's remove `first_name` from the instance dictionary and show that Python will default to the descriptor in this case:

In [62]:
del p.__dict__['first_name']
p.first_name

calling __get__


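**Conclusion**: with a **data** descriptor, we have no real need to prepend property names with `_`, because the descriptor takes precedence over an instance dictionary entry of the same name.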

# 08 - Properties and Descriptors

With everything that we know now, we can finally fully understand how `property` works.

Let's say we have a property called `age`. We'll use the non-decorator approach to make it clearer.
We might have something like this:
```python
age = property(fget=get_age, fset=set_age)
```
When we do `p.age`, that calls the property's `__get__`, which **delegates** to `get_age(p)`.

When we do `p.age = 10`, that calls the property's `__set__`, which **delegates** to `set_age(p, 10)`. If `fset` is not defined, `__set__` is still called but raises an `AttributeError` instead. This is why `property` is always a **data** descriptor, even when it is read-only.
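A quick sketch confirms that even a getter-only `property` behaves as a data descriptor. An entry of the same name in the instance dictionary does not shadow it, and assignment still goes through `__set__`, which refuses. The exact `AttributeError` message differs between Python versions, so it is only printed.

```python
class Person:
    @property
    def age(self):  # getter only: no fset supplied
        return 42


p = Person()
p.__dict__['age'] = 0  # sneak a value directly into the instance dictionary
print(p.age)           # 42: property is a data descriptor, so it beats the instance dict

try:
    p.age = 10         # property.__set__ still runs, and refuses because fset is None
    assigned = True
except AttributeError as exc:
    assigned = False
    print(type(exc).__name__, exc)
```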

We can now make our very own version of the `property` class. We'll call it `MakeProperty`:

In [7]:
class MakeProperty:
    def __init__(self, fget=None, fset=None):
        self.fget = fget
        self.fset = fset
        
    def __set_name__(self, owner_class, prop_name):
        self.prop_name = prop_name
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        if self.fget is None:
            raise AttributeError(f'{self.prop_name} is not readable.')
        return self.fget(instance)  # looking at the Person class, `first_name` takes the arg `self` which refers to a Person instance
                                    # `instance` on this line refers to the exact same thing (while `self` here refers to the descriptor instance).
            
    def __set__(self, instance, value):
        if self.fset is None:
            raise AttributeError(f'{self.prop_name} is not writable.')
        self.fset(instance, value)
        
    def setter(self, fset):
        self.fset = fset
        return self
        

In [8]:
class Person:
    @MakeProperty
    def first_name(self):
        return getattr(self, '_first_name', None)
    
    @first_name.setter
    def first_name(self, value):
        self._first_name = value
        
    @MakeProperty
    def last_name(self):
        return getattr(self, '_last_name', None)
    
    @last_name.setter
    def last_name(self, value):
        self._last_name = value

In [9]:
p1 = Person()

In [10]:
p1.first_name = 'Raymond'
p1.__dict__

{'_first_name': 'Raymond'}

In [11]:
p1.last_name = 'Hettinger'
p1.__dict__

{'_first_name': 'Raymond', '_last_name': 'Hettinger'}

# 09 - Application - Example 1

We have already seen that data validation works well with descriptors.

For example, we may want some of our objects' attributes to only accept values of a particular type:

In [1]:
class Int:
    def __set_name__(self, owner_class, prop_name):
        self.prop_name = prop_name
        
    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise ValueError(f'{self.prop_name} must be an integer.')
        instance.__dict__[self.prop_name] = value
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return instance.__dict__.get(self.prop_name, None)
            

In [2]:
class Float:
    def __set_name__(self, owner_class, prop_name):
        self.prop_name = prop_name
        
    def __set__(self, instance, value):
        if not isinstance(value, float):
            raise ValueError(f'{self.prop_name} must be a float.')
        instance.__dict__[self.prop_name] = value
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return instance.__dict__.get(self.prop_name, None)

In [3]:
class List:
    def __set_name__(self, owner_class, prop_name):
        self.prop_name = prop_name
        
    def __set__(self, instance, value):
        if not isinstance(value, list):
            raise ValueError(f'{self.prop_name} must be a list.')
        instance.__dict__[self.prop_name] = value
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return instance.__dict__.get(self.prop_name, None)
        
    

We can now use these descriptors in multiple class definitions, and as many times as we want in each class:

In [4]:
class Person:
    age = Int()
    height = Float()
    tags = List()
    favourite_foods = List()

In [13]:
p = Person()
p.age = 20
p.height = 1.74
p.tags = ['a', 'b']
p.favourite_foods = ['Pizza', 'Chocolate']
p.__dict__

{'age': 20,
 'height': 1.74,
 'tags': ['a', 'b'],
 'favourite_foods': ['Pizza', 'Chocolate']}
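Each descriptor stores its value in the instance's own `__dict__`, so every `Person` keeps separate data even though the descriptor object is shared by the class. Reads still go through `__get__` because a data descriptor takes priority over the instance `__dict__` on lookup. A minimal sketch, reusing the `Int` descriptor from above trimmed to a single attribute, also shows what a bad assignment does:

```python
class Int:
    def __set_name__(self, owner_class, prop_name):
        self.prop_name = prop_name

    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise ValueError(f'{self.prop_name} must be an integer.')
        # Stored per instance, under the same name as the descriptor.
        instance.__dict__[self.prop_name] = value

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return instance.__dict__.get(self.prop_name, None)


class Person:
    age = Int()


p1, p2 = Person(), Person()
p1.age = 20
p2.age = 35
print(p1.age, p2.age)   # 20 35

try:
    p1.age = 'twenty'
except ValueError as ex:
    print(ex)           # age must be an integer.

print(p1.age)           # 20 (the rejected value never reached __dict__)
```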

If we wanted to be even more generic and avoid all this copying and pasting, we could write a single descriptor class that takes the required type as an argument and performs the same validation:

In [14]:
class ValidType:
    def __init__(self, type_):
        self._type = type_
        
    def __set_name__(self, owner_class, prop_name):
        self.prop_name = prop_name
        
    def __set__(self, instance, value):
        if not isinstance(value, self._type):
            raise ValueError(f'{self.prop_name} must be of type '
                             f'{self._type.__name__}'
                            )
        instance.__dict__[self.prop_name] = value
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return instance.__dict__.get(self.prop_name, None)

And now we can achieve the same functionality as before:

In [21]:
import numbers

class Person:
    age = ValidType(int)
    height = ValidType(numbers.Real)
    tags = ValidType(list)
    favourite_foods = ValidType(list)
    name = ValidType(str)

In [22]:
p = Person()
p.age = 20
p.height = 1.74
p.tags = ['a', 'b']
p.favourite_foods = ['Pizza', 'Chocolate']
p.__dict__

{'age': 20,
 'height': 1.74,
 'tags': ['a', 'b'],
 'favourite_foods': ['Pizza', 'Chocolate']}

One other small change we've made here is to use `numbers.Real` instead of `float` for `height`: a value such as `1` is not a `float`, but it is a `Real` number, so it now passes validation.
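A quick way to see the difference is to validate the same value against both types. The `Measurement` class below is hypothetical, used only to compare the two checks side by side:

```python
import numbers


class ValidType:
    def __init__(self, type_):
        self._type = type_

    def __set_name__(self, owner_class, prop_name):
        self.prop_name = prop_name

    def __set__(self, instance, value):
        if not isinstance(value, self._type):
            raise ValueError(f'{self.prop_name} must be of type {self._type.__name__}')
        instance.__dict__[self.prop_name] = value

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return instance.__dict__.get(self.prop_name, None)


class Measurement:  # hypothetical class, for comparison only
    strict = ValidType(float)
    relaxed = ValidType(numbers.Real)


print(isinstance(1, float), isinstance(1, numbers.Real))  # False True

m = Measurement()
m.relaxed = 1          # accepted: int counts as a numbers.Real
try:
    m.strict = 1       # rejected: 1 is an int, not a float
except ValueError as ex:
    print(ex)          # strict must be of type float
```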

# 10 - Application - Example 2

Suppose we have a `Polygon` class with a `vertices` property that needs to be a sequence of `Point2D` instances. So here, not only do we want the `vertices` attribute of our `Polygon` to be a sequence of some kind, we also want its elements to all be instances of the `Point2D` class. In turn, we'll want to make sure that the coordinates of each `Point2D` are non-negative integer values (as might be expected for computer screen coordinates).

Let's start by defining the `Point2D` class, but we'll need a descriptor for the coordinates to ensure they are integer values, possibly bounded between min and max values:

In [24]:
class Int:
    def __init__(self, min_value=None, max_value=None):
        self.min_value = min_value
        self.max_value = max_value
        
    def __set_name__(self, owner_class, name):
        self.name = name
        
    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise ValueError(f'{self.name} must be an int.')
        if self.min_value is not None and value < self.min_value:
            raise ValueError(f'{self.name} must be at least {self.min_value}')
        if self.max_value is not None and value > self.max_value:
            raise ValueError(f'{self.name} cannot exceed {self.max_value}')
        instance.__dict__[self.name] = value
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return instance.__dict__.get(self.name, None)

In [25]:
class Point2D:
    x = Int(min_value=0, max_value=800)
    y = Int(min_value=0, max_value=400)
    
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def __repr__(self):
        return f'Point2D(x={self.x}, y={self.y})'
    
    def __str__(self):
        return f'({self.x}, {self.y})'
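To see the bounds at work, here is a self-contained sketch that repeats the two classes above and tries a few coordinates. Each invalid value is rejected by `Int.__set__` before it reaches the instance `__dict__`:

```python
class Int:
    def __init__(self, min_value=None, max_value=None):
        self.min_value = min_value
        self.max_value = max_value

    def __set_name__(self, owner_class, name):
        self.name = name

    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise ValueError(f'{self.name} must be an int.')
        if self.min_value is not None and value < self.min_value:
            raise ValueError(f'{self.name} must be at least {self.min_value}')
        if self.max_value is not None and value > self.max_value:
            raise ValueError(f'{self.name} cannot exceed {self.max_value}')
        instance.__dict__[self.name] = value

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return instance.__dict__.get(self.name, None)


class Point2D:
    x = Int(min_value=0, max_value=800)
    y = Int(min_value=0, max_value=400)

    def __init__(self, x, y):
        self.x = x   # each assignment goes through Int.__set__
        self.y = y

    def __repr__(self):
        return f'Point2D(x={self.x}, y={self.y})'

    def __str__(self):
        return f'({self.x}, {self.y})'


for args in [(10, 20), (900, 0), (-1, 0), (1.5, 0), (0, 500)]:
    try:
        print(Point2D(*args))
    except ValueError as ex:
        print(f'{args}: {ex}')

# (10, 20)
# (900, 0): x cannot exceed 800
# (-1, 0): x must be at least 0
# (1.5, 0): x must be an int.
# (0, 500): y cannot exceed 400
```

Because `__init__` assigns `x` before `y`, a bad `x` is reported first.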

Next, let's create a validator that checks that we have a sequence (mutable or immutable, it does not matter) of `Point2D` objects.

To check if something is a sequence, we can use the abstract base classes defined in the `collections.abc` module:

In [26]:
import collections.abc

In [27]:
isinstance([1, 2, 3], collections.abc.Sequence)

True

In [28]:
isinstance([1, 2, 3], collections.abc.MutableSequence)

True

In [29]:
isinstance((1, 2, 3), collections.abc.Sequence)

True

In [30]:
isinstance((1, 2, 3), collections.abc.MutableSequence)

False
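Note that the `Sequence` check is about the *type* of container, not its contents. A short sketch of which common built-ins pass it:

```python
from collections.abc import Sequence

candidates = {
    'str': 'abc',
    'range': range(3),
    'set': {1, 2, 3},
    'dict': {'a': 1},
    'generator': (n for n in range(3)),
}

for label, obj in candidates.items():
    print(f'{label:<10} {isinstance(obj, Sequence)}')

# str        True
# range      True
# set        False
# dict       False
# generator  False
```

So a generator of points would be rejected by the validator below, while a string would pass the sequence check and only fail later, on the per-item `Point2D` check.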

So let's write the validator:

In [31]:
class Point2DSequence:
    def __init__(self, min_length=None, max_length=None):
        self.min_length = min_length
        self.max_length = max_length
        
    def __set_name__(self, cls, name):
        self.name = name
        
    def __set__(self, instance, value):
        if not isinstance(value, collections.abc.Sequence):
            raise ValueError(f'{self.name} must be a sequence type.')
        if self.min_length is not None and len(value) < self.min_length:
            raise ValueError(f'{self.name} must contain at least '
                             f'{self.min_length} elements'
                            )
        if self.max_length is not None and len(value) > self.max_length:
            raise ValueError(f'{self.name} cannot contain more than '
                             f'{self.max_length} elements'
                            )
        for index, item in enumerate(value):
            if not isinstance(item, Point2D):
                raise ValueError(f'Item at index {index} is not a Point2D instance.')
                
        # value passes checks - want to store it as a mutable sequence so we can 
        # append to it later
        instance.__dict__[self.name] = list(value)
        
    def __get__(self, instance, cls):
        if instance is None:
            return self
        else:
            if self.name not in instance.__dict__:
                # current point list has not been defined,
                # so let's create an empty list
                instance.__dict__[self.name] = []
            return instance.__dict__.get(self.name)

And now we can use this for our `Polygon` class:

In [32]:
class Polygon:
    vertices = Point2DSequence(min_length=3)
    
    def __init__(self, *vertices):
        self.vertices = vertices

In [15]:
try:
    p = Polygon()
except ValueError as ex:
    print(ex)

vertices must contain at least 3 elements


In [33]:
p = Polygon(Point2D(0,0), Point2D(0, 1), Point2D(1, 0))

In [34]:
p.vertices

[Point2D(x=0, y=0), Point2D(x=0, y=1), Point2D(x=1, y=0)]
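Two other behaviours of the validator are worth checking: `*vertices` arrives as a tuple but is stored as a list, and a non-`Point2D` element is reported by its index. The sketch below is self-contained. As assumptions for brevity, `Point2D` is a simplified stand-in without coordinate validation, and `__get__` uses `dict.setdefault`, which is equivalent to the `if ... not in` check above.

```python
import collections.abc


class Point2D:
    # Simplified stand-in: the coordinate validation is omitted for brevity.
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f'Point2D(x={self.x}, y={self.y})'


class Point2DSequence:
    def __init__(self, min_length=None, max_length=None):
        self.min_length = min_length
        self.max_length = max_length

    def __set_name__(self, owner_class, name):
        self.name = name

    def __set__(self, instance, value):
        if not isinstance(value, collections.abc.Sequence):
            raise ValueError(f'{self.name} must be a sequence type.')
        if self.min_length is not None and len(value) < self.min_length:
            raise ValueError(f'{self.name} must contain at least {self.min_length} elements')
        if self.max_length is not None and len(value) > self.max_length:
            raise ValueError(f'{self.name} cannot contain more than {self.max_length} elements')
        for index, item in enumerate(value):
            if not isinstance(item, Point2D):
                raise ValueError(f'Item at index {index} is not a Point2D instance.')
        instance.__dict__[self.name] = list(value)   # always stored as a mutable list

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return instance.__dict__.setdefault(self.name, [])


class Polygon:
    vertices = Point2DSequence(min_length=3)

    def __init__(self, *vertices):
        self.vertices = vertices   # a tuple at this point


p = Polygon(Point2D(0, 0), Point2D(0, 1), Point2D(1, 0))
print(type(p.vertices))   # <class 'list'>

try:
    Polygon(Point2D(0, 0), Point2D(0, 1), (1, 0))
except ValueError as ex:
    print(ex)             # Item at index 2 is not a Point2D instance.
```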

OK, so, for completeness, let's write a method that we can use to append new points to the vertices list (that's why we made it a mutable sequence type!). We'll also implement the **sequence protocol** while we're at it.

To append, we need to know the `max_length`, which is stored on the descriptor. Luckily, `__get__` has been written to return the descriptor itself (`self`) when no instance is provided, i.e. when we access it through the class *instead of* an instance. So `Polygon.vertices` gives us the descriptor, and `Polygon.vertices.max_length` gives us the limit.

But hard-coding `Polygon` causes a problem if we subclass it, e.g. to create a `Triangle` with `min_length=3` and `max_length=3`: `Polygon.vertices` would still return the parent's descriptor, so we would not see the subclass's `max_length` or `min_length`.

So we should use `type(self)` instead of `Polygon`, since it refers to whichever class the instance was actually created from.
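The difference is easy to see in isolation. In this sketch, `Limits` is a hypothetical stand-in for `Point2DSequence` that only carries a `max_length`; `limit_hardcoded` reads it through `Polygon`, while `limit_dynamic` reads it through `type(self)`:

```python
class Limits:
    """Hypothetical stand-in for Point2DSequence that only stores a max_length."""

    def __init__(self, max_length=None):
        self.max_length = max_length

    def __set_name__(self, owner_class, name):
        self.name = name

    def __get__(self, instance, owner_class):
        if instance is None:
            return self                      # accessed via the class: return the descriptor
        return instance.__dict__.get(self.name, [])


class Polygon:
    vertices = Limits()

    def limit_hardcoded(self):
        return Polygon.vertices.max_length     # always the parent's descriptor

    def limit_dynamic(self):
        return type(self).vertices.max_length  # the descriptor of the instance's actual class


class Triangle(Polygon):
    vertices = Limits(max_length=3)


t = Triangle()
print(t.limit_hardcoded())   # None
print(t.limit_dynamic())     # 3
```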

In [43]:
class Polygon:
    vertices = Point2DSequence(min_length=3)
    
    def __init__(self, *vertices):
        self.vertices = vertices
        
    def append(self, pt):
        if not isinstance(pt, Point2D):
            raise ValueError('Can only append Point2D instances.')
        max_length = type(self).vertices.max_length
        if max_length is not None and len(self.vertices) >= max_length:
            # cannot add more points!
            raise ValueError(f'Vertices length is at max ({max_length})')
        self.vertices.append(pt)
                
    def __len__(self):
        return len(self.vertices)
        
    def __getitem__(self, idx):
        return self.vertices[idx]

In [44]:
p = Polygon(Point2D(0,0), Point2D(1,0), Point2D(0,1))

In [45]:
p.vertices

[Point2D(x=0, y=0), Point2D(x=1, y=0), Point2D(x=0, y=1)]

In [46]:
p.append(Point2D(10, 10))

In [47]:
p.vertices

[Point2D(x=0, y=0), Point2D(x=1, y=0), Point2D(x=0, y=1), Point2D(x=10, y=10)]

Now look how simple our subclasses are: they inherit all the methods (including `__init__`):

In [48]:
class Triangle(Polygon):
    vertices = Point2DSequence(min_length=3, max_length=3)

In [49]:
t = Triangle(Point2D(0,0), Point2D(1,0), Point2D(0,1))
t.vertices

[Point2D(x=0, y=0), Point2D(x=1, y=0), Point2D(x=0, y=1)]

In [51]:
len(t)

3

In [52]:
t[1:3]

[Point2D(x=1, y=0), Point2D(x=0, y=1)]
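Since `Triangle` inherits `append`, the `type(self)` lookup picks up `max_length=3`, so a fourth point is refused. Implementing `__getitem__` also makes the object iterable, because `iter()` falls back to calling `__getitem__` with 0, 1, 2, ... until an `IndexError` is raised. A self-contained sketch, again assuming a simplified `Point2D` for brevity:

```python
import collections.abc


class Point2D:
    # Simplified stand-in: the coordinate validation is omitted for brevity.
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f'Point2D(x={self.x}, y={self.y})'


class Point2DSequence:
    def __init__(self, min_length=None, max_length=None):
        self.min_length = min_length
        self.max_length = max_length

    def __set_name__(self, owner_class, name):
        self.name = name

    def __set__(self, instance, value):
        if not isinstance(value, collections.abc.Sequence):
            raise ValueError(f'{self.name} must be a sequence type.')
        if self.min_length is not None and len(value) < self.min_length:
            raise ValueError(f'{self.name} must contain at least {self.min_length} elements')
        if self.max_length is not None and len(value) > self.max_length:
            raise ValueError(f'{self.name} cannot contain more than {self.max_length} elements')
        for index, item in enumerate(value):
            if not isinstance(item, Point2D):
                raise ValueError(f'Item at index {index} is not a Point2D instance.')
        instance.__dict__[self.name] = list(value)

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return instance.__dict__.setdefault(self.name, [])


class Polygon:
    vertices = Point2DSequence(min_length=3)

    def __init__(self, *vertices):
        self.vertices = vertices

    def append(self, pt):
        if not isinstance(pt, Point2D):
            raise ValueError('Can only append Point2D instances.')
        max_length = type(self).vertices.max_length   # subclass-aware lookup
        if max_length is not None and len(self.vertices) >= max_length:
            raise ValueError(f'Vertices length is at max ({max_length})')
        self.vertices.append(pt)

    def __len__(self):
        return len(self.vertices)

    def __getitem__(self, idx):
        return self.vertices[idx]


class Triangle(Polygon):
    vertices = Point2DSequence(min_length=3, max_length=3)


t = Triangle(Point2D(0, 0), Point2D(1, 0), Point2D(0, 1))
try:
    t.append(Point2D(10, 10))
except ValueError as ex:
    print(ex)                 # Vertices length is at max (3)

# __getitem__ alone also makes the object iterable (the legacy sequence protocol).
print([pt.x for pt in t])     # [0, 1, 0]
```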

# 11 - Functions and Descriptors

Have you ever wondered how Python knows that a particular function is defined inside a class, and therefore gives us a bound method when we access it through an instance? Is there something special happening at the C level?

No! Functions are objects that implement the **non-data descriptor protocol**. Like any other non-data descriptor:
- They have a `__get__` method (but no `__set__`).
- Depending on whether `__get__` is called with an instance, it returns either the function itself or a bound method. This is how Python differentiates between the two.

In [1]:
def add(a, b):
    return a + b

hasattr(add, '__get__')

True
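Functions define `__get__` but not `__set__`, which is what makes them *non-data* descriptors. One observable consequence, sketched below with a hypothetical `Greeter` class: an attribute stored in the instance `__dict__` shadows a method of the same name, which would not happen with a data descriptor.

```python
def add(a, b):
    return a + b

print(hasattr(add, '__get__'), hasattr(add, '__set__'))  # True False


class Greeter:
    def hello(self):
        return 'from the class'


g = Greeter()
# Functions are non-data descriptors, so an entry in the instance __dict__
# takes priority over the method of the same name defined on the class.
g.hello = lambda: 'from the instance'
print(g.hello())           # from the instance
print(Greeter.hello(g))    # from the class
```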

We know that `__get__` takes 3 parameters. Here's what they mean when we call it on a regular function:
- `self`: the function object itself, e.g. `add`, since we are calling `add.__get__(...)`.
- `instance`: the object the function is accessed through, or `None` if there is none. A module-level function has no instance, so we pass `None`. (Strictly speaking, looking up `add` as a global never calls `__get__` at all, because descriptors only take effect as class attributes; but we can think of our `__main__` module as the 'class' that houses the function.)
- `owner_class`: the 'class' the function belongs to; here we will pass our `__main__` module.

Let's confirm that the function itself is returned when we call `__get__` manually with these arguments:

In [2]:
import sys
me = sys.modules['__main__']

f = add.__get__(None, me)  # `self` (the function) is passed implicitly; instance=None, owner_class=me
f

<function __main__.add(a, b)>

Now, let's compare this to accessing a function through a class. Here `instance` is `None`, so we expect the same result as above: the plain function.

In [3]:
class Operation:
    def add(self, a, b):
        return a + b

Operation.add

<function __main__.Operation.add(self, a, b)>
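We can confirm that no binding happens here: the object we get back is the very same function stored in the class `__dict__`, so we have to pass a value for `self` ourselves when calling it:

```python
class Operation:
    def add(self, a, b):
        return a + b


raw = Operation.__dict__['add']               # the plain function stored on the class
print(Operation.add is raw)                   # True
print(raw.__get__(None, Operation) is raw)    # True
print(Operation.add(None, 2, 3))              # 5 (no binding, so we supply `self` ourselves)
```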

And when we access it through an instance, `instance` is `o`, so `__get__` returns a **bound method**:

In [4]:
o = Operation()
o.add

<bound method Operation.add of <__main__.Operation object at 0x000002A36A571690>>

We can get the exact same thing by doing this manually:

In [10]:
bound_method = Operation.add.__get__(o, Operation)

bound_method == o.add, bound_method is o.add

(True, False)

Note that `__get__` creates a **new** bound method object each time it's called, which is why `bound_method` is not **identical** to `o.add` (although the two are equal).

Where does the bound method keep the function it wraps? In its `__func__` attribute:

In [11]:
bound_method.__func__

<function __main__.Operation.add(self, a, b)>
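The bound method also holds the instance in `__self__`, and calling it simply calls `__func__` with `__self__` inserted as the first argument. A short sketch:

```python
class Operation:
    def add(self, a, b):
        return a + b


o = Operation()
bound = o.add
print(bound.__self__ is o)                           # True
print(bound.__func__ is Operation.__dict__['add'])   # True
# Calling a bound method is the same as calling __func__ with __self__ inserted first.
print(bound(2, 3), bound.__func__(bound.__self__, 2, 3))  # 5 5
```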

Just for some points of clarification:

When we retrieve the attribute `Operation.add`, Python in fact calls `Operation.__dict__['add'].__get__(None, Operation)`, i.e. with `instance` set to `None` and `owner_class` set to `Operation`. With no instance to bind to, the function's `__get__` returns the function itself.

When we retrieve the attribute `o.add`, Python in fact calls `Operation.__dict__['add'].__get__(o, Operation)`, i.e. with `instance` set to `o` and `owner_class` set to `Operation`. This time `__get__` returns a bound method, which is a `types.MethodType` object.
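To make this concrete, here is a pure-Python sketch of that behaviour. `Function` is a hypothetical wrapper class, not CPython's actual implementation (the built-in function type does this in C). Its `__get__` returns itself for class access and a `types.MethodType` for instance access, so it behaves like a regular method:

```python
import types


class Function:
    """Hypothetical pure-Python stand-in mimicking how functions implement __get__."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __get__(self, instance, owner_class=None):
        if instance is None:
            return self                            # class access: the descriptor itself
        return types.MethodType(self, instance)    # instance access: bind to the instance


class Operation:
    @Function
    def add(self, a, b):
        return a + b


o = Operation()
print(Operation.add is Operation.__dict__['add'])   # True
print(type(o.add) is types.MethodType)              # True
print(o.add(2, 3))                                  # 5
```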

Here's a trimmed snippet on `MethodType` from `help`:
```
class method(object)
 |  method(function, instance)
 |  
 |  Create a bound instance method object.
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __func__
 |      the function (or other callable) implementing a method
 |  
 |  __self__
 |      the instance to which a method is bound
```

Now we can mimic a method without defining it inside the class, by creating a `MethodType` object directly:

In [4]:
import types

def say_hello(self):
    if self and hasattr(self, 'name'):
        return f'{self.name} says hello!'
    else:
        return 'Hello!'

class Person:
    def __init__(self, name):
        self.name = name

In [5]:
p = Person('Alex')
m = types.MethodType(say_hello, p)

In [6]:
m

<bound method say_hello of <__main__.Person object at 0x000002A6E67198A0>>

In [7]:
m()

'Alex says hello!'
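`types.MethodType(say_hello, p)` produces the same thing the function's own `__get__` would. In the sketch below, the two bound methods compare equal, and attaching one to the instance makes it callable like a method, but only on that one instance:

```python
import types


def say_hello(self):
    if self and hasattr(self, 'name'):
        return f'{self.name} says hello!'
    return 'Hello!'


class Person:
    def __init__(self, name):
        self.name = name


p = Person('Alex')
m1 = types.MethodType(say_hello, p)
m2 = say_hello.__get__(p, Person)        # ask the function's own descriptor for a bound method
print(type(m2) is types.MethodType)      # True
print(m1 == m2)                          # True
print(m2())                              # Alex says hello!

# Attaching the bound method to this instance makes it callable like a method,
# but other instances of Person do not get it.
p.say_hello = m1
print(p.say_hello())                         # Alex says hello!
print(hasattr(Person('Sam'), 'say_hello'))   # False
```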

This information can be particularly useful when writing decorator classes in the metaprogramming section.