# 01 - Descriptors

Descriptors are the mechanism underpinning properties, methods, slots, and even functions!

Suppose we want a `Point2D` class whose coordinates must always be **integers**.

Since plain attributes cannot guarantee this, we typically use `property` with getter and setter methods. After adding multiple properties, we end up with a lot of very similar boilerplate code.

What we would like is a class, e.g. `IntegerValue`, that defines the get and set behaviour once. We could then use several `IntegerValue` instances within our class and have them **bound** to our instances just like attributes.

The solution is the **descriptor protocol**.

There are 4 main methods that make up the protocol - not all are required.
- `__get__` -> `p.x`
- `__set__` -> `p.x = 100`
- `__delete__` -> `del p.x`
- `__set_name__` -> new in Python 3.6 - we'll come back to this later.
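To make the mapping above concrete, here is a minimal sketch of a descriptor that records which protocol method each operation triggers. The `Traced` and `Point` names and the `calls` list are invented for illustration only.

```python
class Traced:
    """Hypothetical descriptor that records which protocol method ran."""

    def __init__(self):
        self.calls = []

    def __set_name__(self, owner_class, name):
        # Called once, when the owning class body has been executed.
        self.calls.append(f'__set_name__({name!r})')

    def __get__(self, instance, owner_class):
        self.calls.append('__get__')
        return 42

    def __set__(self, instance, value):
        self.calls.append(f'__set__({value!r})')

    def __delete__(self, instance):
        self.calls.append('__delete__')


class Point:
    x = Traced()


p = Point()
p.x          # triggers __get__
p.x = 100    # triggers __set__
del p.x      # triggers __delete__

# Read the descriptor from the class dictionary so that we do not
# trigger another __get__ call while inspecting it.
calls = Point.__dict__['x'].calls
print(calls)
```

Note that nothing is actually stored or deleted here; the descriptor only logs the calls.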

There are two types of descriptors:

- Non-data descriptors: Implement `__get__` **only** (and optionally `__set_name__`).
- Data descriptors: Implement `__set__` and/or `__delete__` (and often `__get__`).

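This distinction affects attribute lookup: a data descriptor takes precedence over an entry of the same name in the instance dictionary, whereas a non-data descriptor can be shadowed by one.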
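A small sketch of how the two types behave differently when the instance dictionary holds an entry with the same name. The class names here are made up for the demonstration, and the values are written straight into `__dict__` to bypass any `__set__`.

```python
class NonData:
    def __get__(self, instance, owner_class):
        return 'from non-data descriptor'


class Data:
    def __get__(self, instance, owner_class):
        return 'from data descriptor'

    def __set__(self, instance, value):
        raise AttributeError('read-only')


class Demo:
    a = NonData()
    b = Data()


d = Demo()
# Write straight into the instance dictionary, bypassing any __set__.
d.__dict__['a'] = 'from instance dict'
d.__dict__['b'] = 'from instance dict'

result_a = d.a  # the instance dictionary shadows the non-data descriptor
result_b = d.b  # the data descriptor wins over the instance dictionary
print(result_a, '|', result_b)
```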

#### Example 1

Let's create a simple non-data descriptor (don't worry about `instance` and `owner_class` for now):

In [10]:
from datetime import datetime

class TimeUTC:
    def __get__(self, instance, owner_class):
        return datetime.utcnow().isoformat()

So `TimeUTC` is a class that implements the `__get__` method only, and is therefore considered a non-data descriptor.

We can now use it to create properties in other classes:

In [11]:
class Logger:
    current_time = TimeUTC()

Note that `current_time` is a class attribute:

In [12]:
l = Logger()
l.current_time

'2024-08-05T14:09:20.516021'

This should seem quite odd. All `l.current_time` should do is return the (`repr` of our) `TimeUTC` instance. 

Instead it **calls** the `__get__` method.

This also works when we access the class attribute through the class itself:

In [13]:
Logger.current_time

'2024-08-05T14:09:20.875894'
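Roughly speaking, the attribute access is translated into an explicit call to the descriptor's `__get__`. The sketch below spells that call out by hand; it uses `datetime.now(timezone.utc)` instead of `utcnow()` only to avoid a deprecation warning on recent Python versions.

```python
from datetime import datetime, timezone


class TimeUTC:
    def __get__(self, instance, owner_class):
        return datetime.now(timezone.utc).isoformat()


class Logger:
    current_time = TimeUTC()


l = Logger()

# Fetch the raw descriptor object from the class dictionary.
# This does NOT trigger __get__.
descriptor = Logger.__dict__['current_time']

# Approximately what l.current_time does behind the scenes:
via_instance = descriptor.__get__(l, Logger)

# Approximately what Logger.current_time does behind the scenes:
via_class = descriptor.__get__(None, Logger)

print(type(descriptor).__name__, via_instance, via_class)
```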

#### Example 2

Let's create a `Deck` class that returns a **random** card from 2 to Ace and a **random** suit from the 4 suits.

Since both attributes, card and suit, are effectively making a random choice from an iterable, we can cut down on the repeated code using a descriptor.

We'll do this example both the traditional and the descriptor way to demonstrate their similarities/differences.

##### Traditional

In [14]:
from random import seed, choice

In [22]:
seed(0)

class Deck:
    @property
    def card(self):
        return choice(tuple('23456789JQKA') + ('10',))

    @property
    def suit(self):
        return choice(('Spade', 'Heart', 'Diamond', 'Club'))

d = Deck()

for _ in range(5):
    print(d.card, d.suit)

8 Club
2 Diamond
J Club
8 Diamond
9 Diamond


##### Non-data Descriptor

In [24]:
seed(0)

class Choice:
    def __init__(self, *choices):
        self.choices = choices

    def __get__(self, instance, owner_class):
        return choice(self.choices)

class Deck:
    card = Choice(*'23456789JQKA', '10')
    suit = Choice('Spade', 'Heart', 'Diamond', 'Club')

d = Deck()

for _ in range(5):
    print(d.card, d.suit)

8 Club
2 Diamond
J Club
8 Diamond
9 Diamond


So **non-data descriptors** give us the same result as `property` without a lot of the boilerplate.

**Non-data descriptors** are therefore very useful if we have several read-only properties with very similar logic. The end result is a lot less code in the target class.

# 02 - Getters and Setters

In the previous subsection, we saw two ways of accessing the attribute: through an instance and through the class.

#### The `__get__` method

In [29]:
class TimeUTC:
    def __get__(self, instance, owner_class):
        return datetime.utcnow().isoformat()

class Logger:
    current_time = TimeUTC()

In [30]:
l = Logger()  # re-create the instance so it uses the class defined above
l.current_time, Logger.current_time

('2024-08-05T14:59:26.162925', '2024-08-05T14:59:26.162925')

When `__get__` is called, we may want to know:

- which **instance** was used or `None` if called from the class.
- what class owns the `TimeUTC` (descriptor) instance. In our case, it belongs to the `Logger` class.

This is why we have the **signature**: `__get__(self, instance, owner_class)`

(Note: A **signature** refers to the structure and components of a function or method definition, detailing how it can be called.)
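To see exactly what gets passed in, here is a small sketch with a hypothetical `Inspect` descriptor that simply returns the arguments it receives. The `SubLogger` subclass is also an addition for illustration.

```python
class Inspect:
    def __get__(self, instance, owner_class):
        # Hand back the arguments so we can look at them.
        return instance, owner_class


class Logger:
    probe = Inspect()


class SubLogger(Logger):
    pass


l = Logger()
from_instance = l.probe          # (l, Logger)
from_class = Logger.probe        # (None, Logger)
from_subclass = SubLogger.probe  # (None, SubLogger)

print(from_instance, from_class, from_subclass)
```

One detail worth noticing: when the attribute is reached through a subclass, `owner_class` is the class used for the access (`SubLogger`), not necessarily the class in whose body the descriptor was defined.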

These components are passed into the `__get__` method when it's called. This means we can control the return based on whether it was 
- called from the class.
- called from the instance.

Very often, we choose to:
- return the descriptor `TimeUTC` **instance** when called from the **class** (`Logger`). This gives us an easy handle to the descriptor instance.
- return the attribute **value** (`datetime.utcnow().isoformat()`) when called from an instance of the class (`l`).

In [31]:
class TimeUTC:
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return datetime.utcnow().isoformat()

class Logger:
    current_time = TimeUTC()

In [32]:
l = Logger()  # re-create the instance so it uses the class defined above
l.current_time, Logger.current_time

('2024-08-05T14:59:28.312074', <__main__.TimeUTC at 0x2a476fbb6a0>)

Returning the descriptor instance when called from the class is consistent with how `property` works:

In [33]:
class Logger:
    @property
    def current_time(self):
        return datetime.utcnow().isoformat()

Logger.current_time

<property at 0x2a4771dd530>

So `property` **implements** the **descriptor protocol**. It might be a little easier to see if it's not used as a decorator:

In [37]:
class Logger:
    def current_time(self):
        return datetime.utcnow().isoformat()

    current_time = property(fget=current_time)

l = Logger()

Logger.current_time, l.current_time

(<property at 0x2a4771fc900>, '2024-08-05T15:32:03.094962')

#### Caveat

There's an important caveat with these descriptors, although we'll soon see that it's less of an issue than it first appears.

You'll notice that these descriptors are class attributes: `current_time = TimeUTC()`.

Since only one instance of `TimeUTC` is being made, all `Logger` instances will share this one instance. 

For this particular example, where the returned value does not depend on the instance, there's no issue.

But what if we want to "store" and "retrieve" instance-specific data using `__set__`? After all, setting a value should be specific to the instance.

In fact, this is not an issue, because both the `__get__` and `__set__` methods receive the `instance`, and we can use this information to store instance-specific data.

Let's demonstrate the issue with another example:

In [55]:
class Countdown:
    'non-data descriptor'
    def __init__(self, start):
        self.start = start + 1

    def __get__(self, instance, owner):
        if instance is None:
            return self
        self.start -= 1
        return self.start

In [56]:
class Rocket:
    countdown = Countdown(10)

In [57]:
rocket1 = Rocket()
rocket2 = Rocket()

In [58]:
rocket1.countdown, rocket1.countdown, rocket1.countdown, rocket1.countdown, rocket1.countdown,

(10, 9, 8, 7, 6)

In [59]:
rocket2.countdown

5
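For contrast, here is one possible sketch of a countdown that keeps its state per instance by using the `instance` argument. It stores the counter in the instance dictionary under a made-up key (`'_countdown'`), so it assumes the instances actually have a `__dict__`; it is not meant as a general solution.

```python
class Countdown:
    """Non-data descriptor keeping one counter per instance (sketch)."""

    def __init__(self, start):
        self.start = start

    def __get__(self, instance, owner):
        if instance is None:
            return self
        # '_countdown' is a hypothetical storage key chosen for this demo.
        current = instance.__dict__.get('_countdown', self.start + 1) - 1
        instance.__dict__['_countdown'] = current
        return current


class Rocket:
    countdown = Countdown(10)


rocket1, rocket2 = Rocket(), Rocket()

first = (rocket1.countdown, rocket1.countdown, rocket1.countdown)
second = rocket2.countdown  # unaffected by the reads on rocket1

print(first, second)
```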

#### The `__set__` method

The signature is as follows: `__set__(self, instance, value)`
- `self`: this references the descriptor instance, just like we had for the `__get__` example (e.g. `TimeUTC()`).
- `instance`: the instance that the descriptor is *bound* to. Unlike in `__get__`, this is never `None`, because `__set__` is only called when assigning through an instance.
- `value`: the value we want to assign to the attribute.

Why is there no `owner_class` like we have in `__get__`?

Setters (and deleters) are **always** called from instances, so the class is always available as `type(instance)` if we need it. We never want to set a class attribute from an instance; we only want to affect the instance.
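A quick sketch to check this: assigning through an instance calls `__set__`, whereas assigning through the class just rebinds the class attribute and replaces the descriptor. The `set_calls` counter is an addition for this demonstration.

```python
class IntegerValue:
    def __init__(self):
        self.set_calls = 0  # hypothetical counter, for illustration only

    def __set__(self, instance, value):
        self.set_calls += 1


class Point2D:
    x = IntegerValue()


# Keep a handle to the descriptor before we overwrite it.
descriptor = Point2D.__dict__['x']

p = Point2D()
p.x = 100                      # goes through IntegerValue.__set__
calls_after_instance_set = descriptor.set_calls

Point2D.x = 5                  # does NOT call __set__; rebinds the name
calls_after_class_set = descriptor.set_calls
replaced = Point2D.__dict__['x']

print(calls_after_instance_set, calls_after_class_set, replaced)
```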

In [60]:
class IntegerValue:    
    def __set__(self, instance, value):
        print(f'__set__ called, instance={instance}, value={value}')

class Point2D:
    x = IntegerValue()

p = Point2D()
p.x = 100

__set__ called, instance=<__main__.Point2D object at 0x000002A4773ACFA0>, value=100


The reason why I haven't elaborated on how to actually set the value for the instance is because it's *not* straightforward.

Currently, we are suffering from the caveat outlined earlier where different instances are sharing the same `IntegerValue` instance.

We might naively think to set the value on the instance like `instance.x = value`, but we don't have access to the symbol `x`. So what symbol would we use? What if our class uses slots? Then we can't store things in the instance dictionary because we don't *have* an instance dictionary.
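The slots concern can be checked with a short sketch. `SlottedPoint` is a made-up class for this demonstration.

```python
class SlottedPoint:
    __slots__ = ('x',)


p = SlottedPoint()
p.x = 1  # fine: 'x' is a declared slot

# Instances of a slotted class (with no dict-bearing base) have no __dict__.
has_dict = hasattr(p, '__dict__')

try:
    p.y = 2  # 'y' is not a slot and there is no __dict__ to fall back on
    error = None
except AttributeError as exc:
    error = exc

print(has_dict, type(error).__name__)
```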

There are plenty of other issues to consider.

The next subsection will explore the various solutions.

# 03 - Using as Instance Properties

#### Storing in descriptor's local dictionary

We could store the values in a dictionary on the descriptor itself, where:

- key = the instance (a problem if the instance is not hashable)
- value = the attribute value

In [1]:
class IntegerValue:
    def __init__(self):
        self.data = {}

    def __set__(self, instance, value):
        self.data[instance] = int(value)

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.data.get(instance)

In [2]:
class Point2D:
    x = IntegerValue()
    y = IntegerValue()


p1 = Point2D()

p1.x = 100.1
p1.y = 200.2
p1.x, p1.y

(100, 200)

We now essentially have one dictionary per descriptor (one for `x`, one for `y`), each shared across all instances of this class.

In [7]:
p2 = Point2D()
p2.x = 1000.1

p3 = Point2D()
p3.x = 10000.1

Point2D.x.data

{<__main__.Point2D at 0x2489f9cff10>: 100,
 <__main__.Point2D at 0x2489f9cf700>: 1000,
 <__main__.Point2D at 0x2489f9cffd0>: 10000}

But there's a subtle issue with this. 

After the line `p1 = Point2D()`, we have **1 reference** to the object.

After `p1.x = 100.1`, the instance is stored as a key in our dictionary, so we have **2 references** to it.

If we `del p1`, we delete the local symbol but **not** the reference in the dictionary, so the object is never garbage collected - we have a **memory leak**.
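The leak can be observed directly with a short, self-contained sketch: after deleting our only name for the instance, the descriptor's dictionary still holds it.

```python
class IntegerValue:
    def __init__(self):
        self.data = {}

    def __set__(self, instance, value):
        self.data[instance] = int(value)

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.data.get(instance)


class Point2D:
    x = IntegerValue()


p1 = Point2D()
p1.x = 100.1
del p1  # removes the name, but the dictionary key is still a strong reference

leftover = len(Point2D.x.data)
print(leftover)  # the entry (and therefore the object) is still there
```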

The solution is to use **weak references**...

# 04 - Strong and Weak References

Throughout this entire course, we have only used **strong** references.

If we have:
```python
p1 = Person()
p2 = p1
```
we have the following arrangement: `p1 ---> [Person Object] <--- p2`

When we `del p1`, at least one **strong** reference remains, so the object is *not* garbage collected.

A **weak** reference is another type of object in Python. It **does not affect the reference count as far as the memory manager is concerned**.

Using a weak reference, we will have the following arrangement: `p1 --[strong]--> [Person Object] <--[weak]-- p2`

`del p1` will result in **garbage collection** and `p2` will be considered **"dead"**.


So the solution in our code is to store a **weak** reference to the object as the key, rather than the object itself.

Here's how to **create** a weak reference.

In [10]:
class Person:
    pass

In [11]:
import weakref

p1 = Person()
p2 = weakref.ref(p1)
p2

<weakref at 0x0000024F2A82BD30; to 'Person' at 0x0000024F295CC3A0>

Our weak reference is a **callable**. When we call it, Python checks whether the object is still around: if so, it returns the object; otherwise, it returns `None`.

In [12]:
p2()

<__main__.Person at 0x24f295cc3a0>
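To see the `None` case as well, here is a minimal sketch. It relies on CPython's reference counting, which destroys the object as soon as the last strong reference goes away; other implementations may delay this.

```python
import weakref


class Person:
    pass


p1 = Person()
p2 = weakref.ref(p1)

alive = p2() is p1    # the weak reference resolves to the same object
del p1                # drop the only strong reference
dead = p2() is None   # on CPython the weak reference is now dead

print(alive, dead)
```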

Going back to our original problem, we can now store all of these **weak** references in a dictionary. 

But, `weakref` has a `WeakKeyDictionary` just for that! 

This is far more convenient to use for two reasons:

1. We don't need to create the weak reference ourselves. Instead, we pass our strong reference as the key and this dictionary will **automatically convert** it into a **weak** reference.
2. The item is **automatically removed** from the weak key dictionary once the object is **dead**.

In [13]:
from weakref import WeakKeyDictionary

p1 = Person()
d = WeakKeyDictionary()
d[p1] = 'some value'

While `d.keys()` returns the actual objects, we can find the **weak** references via `d.keyrefs()`: (I am not running `list(d.keys())` because that list will store *another* reference to `p1` which will mess up reference counting later.)

In [14]:
d.keyrefs()

[<weakref at 0x0000024F293F6020; to 'Person' at 0x0000024F2A7C64A0>]

In [15]:
import weakref
print(weakref.getweakrefcount(p1))

1


Where is Python getting this information?

It's actually stored in the object itself! But this is an internal implementation (as a doubly-linked list) so we can't do much with it.

In [16]:
p1.__weakref__

<weakref at 0x0000024F293F6020; to 'Person' at 0x0000024F2A7C64A0>

Let's delete our only strong reference to confirm that the `WeakKeyDictionary` handles the cleanup:

In [17]:
del p1

In [18]:
print(list(d.keys()))
print(d.keyrefs())

[]
[]


**Caveat**
- If we use **slots**, instances will not have a `__weakref__` attribute unless we list `'__weakref__'` in `__slots__`, so by default we cannot create weak references to them.

- The object must be hashable, so if we implement `__eq__`, we must remember to implement `__hash__` too.

- We cannot create weak references to instances of many built-in types. This includes: `str`, `list`, `int`, `dict`.



In [19]:
l = [1, 2, 3]
w = weakref.ref(l)

TypeError: cannot create weak reference to 'list' object

In [20]:
d = WeakKeyDictionary()
d['python'] = 'some value'

TypeError: cannot create weak reference to 'str' object
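The hashability caveat can be checked in a similar way. In this sketch, `Unhashable` is a made-up class that defines `__eq__` without `__hash__`, which makes its instances unhashable and therefore unusable as `WeakKeyDictionary` keys.

```python
from weakref import WeakKeyDictionary


class Unhashable:
    # Defining __eq__ without __hash__ sets __hash__ to None.
    def __eq__(self, other):
        return isinstance(other, Unhashable)


d = WeakKeyDictionary()
obj = Unhashable()

try:
    d[obj] = 'some value'
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```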

# 05 - Back to Instance Properties

#### Penultimate "Good Enough" Solution

This is our **Penultimate Approach** of implementing **data descriptors** that:
- have **instance specific** storage
- do not use the instance itself for storage (`__slots__` problem)
- handle cleanup after objects get **finalised** (i.e. garbage collected)

but cannot:
- handle **non-hashable** instances.

This approach will be fine for 99% of cases.

In [3]:
from weakref import WeakKeyDictionary

class IntegerValue:
    def __init__(self):
        self.data = WeakKeyDictionary()

    def __set__(self, instance, value):
        self.data[instance] = int(value)

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.data.get(instance)

In [4]:
class Point2D:
    x = IntegerValue()
    
    def __init__(self, x):
        self.x = x  # this is calling x.__set__

p1 = Point2D(100.1)
p1.x

100

In [5]:
del p1

In [6]:
list(Point2D.x.data)

[]

#### Final Perfect Solution

Before we can implement the perfect solution, we need to learn about one extra thing.

`weakref.ref()` has an additional parameter, that takes a **callable**. 

This **callable** receives the **weak reference** itself, *not* the object it refers to, and is called once the object has no strong references left and is being destroyed; while **at least one** strong reference remains, the callable *won't* be called. This is how the `WeakKeyDictionary` knows when to remove an entry - presumably its callback goes into the dictionary and removes the dead key. We will use this mechanism to remove **dead** references from our regular dictionary instead of using a `WeakKeyDictionary`.

Let's show a simple demonstration of how it works:

In [20]:
import weakref

def obj_destroyed(obj):
    print(f'{obj} has just been destroyed.')

p = Point2D(100.1)
p_weak = weakref.ref(p, obj_destroyed)

In [21]:
del p

<weakref at 0x000001E1F2666D40; dead> has just been destroyed.


Now we can move onto our **Perfect Approach** of implementing **data descriptors** that:
- have **instance specific** storage
- do not use the instance itself for storage (`__slots__` problem)
- handle cleanup after objects get **finalised** (i.e. garbage collected)
- handles **non-hashable** instances.

To implement this we need to:
1. Use a **regular** dictionary instead of a `WeakKeyDictionary`.
2. Use the `id` of an instance as the key (it's unique among objects that are alive at the same time), not a weak reference to it. Note that the `id` value of an object is **not** a strong reference to the object itself.
3. Instead of storing just the value, we will store a `tuple` of the form `(weak reference, value)`. The **weak reference** is created with `self._finalise_instance` as its callback, which will be responsible for eventually removing the associated key from the dictionary upon **finalisation**.
4. Create a method called `self._finalise_instance` which will receive the weak reference and then iterate through all values in our dictionary until it finds the tuple containing the **same** weak reference (reverse lookup). Once this has been found, delete the associated key.
5. Get the desired value back via the instance's `id`. This will return a `tuple`, so ensure you return the 2nd item in it.
6. If your class `Point2D` uses **slots**, make sure to add `'__weakref__'` to the `__slots__` attribute. This will enable us to make **weak** references to instances of this class.
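The last point can be verified with a short sketch. Both class names are invented for the demonstration.

```python
import weakref


class NoWeakref:
    __slots__ = ('x',)


class WithWeakref:
    __slots__ = ('x', '__weakref__')


# Without a '__weakref__' slot, weak references cannot be created.
try:
    weakref.ref(NoWeakref())
    failed = False
except TypeError:
    failed = True

# With the slot, weak references work as usual.
obj = WithWeakref()
ref = weakref.ref(obj)
resolves = ref() is obj

print(failed, resolves)
```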

In [41]:
import weakref

class IntegerValue:
    def __init__(self):
        self.data = {}

    def __set__(self, instance, value):
        self.data[id(instance)] = (weakref.ref(instance, self._finalise_instance), int(value))

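    def __get__(self, instance, owner):
        if instance is None:
            return self
        # Return None, rather than raising TypeError, if no value has been set yet.
        value_tuple = self.data.get(id(instance))
        return None if value_tuple is None else value_tuple[1]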

    def _finalise_instance(self, weak_ref):
        for key, value in self.data.items():
            if value[0] is weak_ref:
                del self.data[key]
                break


In [42]:
class Point2D:
    __slots__ = '__weakref__',
    x = IntegerValue()
    
    def __init__(self, x, y=None):
        self.x = x  # this is calling x.__set__

    def __eq__(self, other):  # class no longer hashable as __hash__ hasn't been implemented.
        return isinstance(other, Point2D) and self.x == other.x

p1 = Point2D(100.1)
p1.x

100

In [43]:
del p1

In [44]:
list(Point2D.x.data)

[]
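As an aside, the reverse lookup in `_finalise_instance` scans the whole dictionary. One possible alternative, sketched below and not the approach used in this section, is `weakref.finalize`, which lets us register a cleanup call that already knows which key to remove. Registering it only on the first assignment is a choice made for this sketch.

```python
import weakref


class IntegerValue:
    def __init__(self):
        self.data = {}

    def __set__(self, instance, value):
        key = id(instance)
        if key not in self.data:
            # Register the cleanup once per instance. pop() with a default
            # never raises, even if the key has already been removed.
            weakref.finalize(instance, self.data.pop, key, None)
        self.data[key] = int(value)

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return self.data.get(id(instance))


class Point2D:
    __slots__ = ('__weakref__',)
    x = IntegerValue()

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):  # makes the class unhashable
        return isinstance(other, Point2D) and self.x == other.x


p1, p2 = Point2D(100.1), Point2D(200.9)
values = (p1.x, p2.x)

del p1  # on CPython, the finalizer runs here and removes p1's entry
remaining = len(Point2D.x.data)

print(values, remaining)
```

Here the dictionary stores only the value, since the finalizer holds the weak reference for us.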

#### Practical Example

Below is an example that will demonstrate how descriptors result in much simpler code compared to creating numerous properties which all call the same refactored method:

In [1]:
import weakref

class ValidString:
    def __init__(self, min_length=0, max_length=255):
        self.data = {}
        self._min_length = min_length
        self._max_length = max_length
        
    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError('Value must be a string.')
        if len(value) < self._min_length:
            raise ValueError(
                f'Value should be at least {self._min_length} characters.'
            )
        if len(value) > self._max_length:
            raise ValueError(
                f'Value cannot exceed {self._max_length} characters.'
            )
        self.data[id(instance)] = (weakref.ref(instance, self._finalize_instance), value)
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
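            value_tuple = self.data.get(id(instance))
            # Return None, rather than raising TypeError, if nothing was set yet.
            return None if value_tuple is None else value_tuple[1]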
        
    def _finalize_instance(self, weak_ref):
        for key, value in self.data.items():
            if value[0] is weak_ref:
                del self.data[key]
                break

We can now use `ValidString` as many times as we need:

In [2]:
class Person:
    __slots__ = '__weakref__',
    
    first_name = ValidString(1, 100)
    last_name = ValidString(1, 100)
    
    def __eq__(self, other):
        return (
            isinstance(other, Person) and 
            self.first_name == other.first_name and 
            self.last_name == other.last_name
        )

In [3]:
p1, p2 = Person(), Person()

p1.first_name, p1.last_name = 'Guido', 'van Rossum'
p2.first_name, p2.last_name = 'Raymond', 'Hettinger'

In [4]:
class BankAccount:
    __slots__ = '__weakref__',
    
    account_number = ValidString(10, 255)
    
    def __eq__(self, other):
        return (
            isinstance(other, BankAccount) and 
            self.account_number == other.account_number
        )

In [5]:
b1, b2 = BankAccount(), BankAccount()
b1.account_number = 'tooshort'

ValueError: Value should be at least 10 characters.

In [6]:
del p1
del p2
del b1
del b2

In [7]:
Person.first_name.data

{}

# 06 - The __set_name__ Method

# 07 - Property Lookup Resolution

# 08 - Properties and Descriptors

# 09 - Application - Example 1

# 10 - Application - Example 2

# 11 - Functions and Descriptors