# Understanding the Python Data Model: Objects, attributes and types

## Python Objects
- _"Objects are Python’s abstraction for data. All data in a Python program is represented by objects or by relations between objects"_
- Every `object` has:
    + **Identity**: it never changes once the object has been created (_~= memory address_)
    + **Type**: a reference to another object (its `class` or `type`) which determines the operations that the object supports. In theory, it shouldn't change after creation
    + **Value**: usually just a mapping from strings to (counted) references to other objects, but it might be something else (e.g. builtins and slotted classes)
    
```cpp
// Rough approximation in C++
using PyRef = std::shared_ptr<PyObject>;
   
// Python object
struct PyObject {
    PyRef __class__;                         // Type  
    std::map<std::string, PyRef> __dict__;   // Value
};
```

In [1]:
class Foo:
    pass

foo = Foo()
print(f"{foo.__class__=}, {foo.__class__ is Foo=}")
print(f"{foo.__dict__=}\n")

# The builtin `vars()` is a shortcut to `obj.__dict__`
print(f"{vars(foo) is foo.__dict__=}")

foo.__class__=<class '__main__.Foo'>, foo.__class__ is Foo=True
foo.__dict__={}

vars(foo) is foo.__dict__=True
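
The "something else" case for an object's value can be seen with a slotted class. The following is a minimal sketch; the `Point` class and its fields are hypothetical and only serve as illustration.

```python
class Point:
    # __slots__ replaces the per-instance __dict__ with fixed storage
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
has_dict = hasattr(p, "__dict__")   # slotted instances have no mapping

try:
    p.z = 3                         # 'z' is not declared in __slots__
    could_add = True
except AttributeError:
    could_add = False

print(f"{has_dict=}, {could_add=}, {p.x=}, {p.y=}")
```

The values still live in the instance, but there is no hash map to add arbitrary keys to.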


- We can add new attributes to the instance at any time
    + In the end it's just adding key-value pairs to a hash map
- For Python, the _type_ of the object is the object stored in `__class__`
    - _Usually_ `type(foo) == foo.__class__`
    
    - `foo.__class__ == Foo` => `isinstance(foo, Foo) == True`

In [2]:
class Foo:
    pass

foo = Foo()
try:
    print(foo.a)
except AttributeError:
    print("There isn't an 'a' in 'foo'!")

foo.__dict__ = {'a': 10, 'b': -1.0}
print(f"{vars(foo)=}")
print(f"{foo.a=}\n")

print(f"{type(foo)=}")
print(f"{type(foo) == foo.__class__=}")
print(f"{isinstance(foo, Foo)=}")

There isn't an 'a' in 'foo'!
vars(foo)={'a': 10, 'b': -1.0}
foo.a=10

type(foo)=<class '__main__.Foo'>
type(foo) == foo.__class__=True
isinstance(foo, Foo)=True
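
The word _usually_ matters: `__class__` is an ordinary attribute lookup, so a class can override it, while `type()` always reports the real type. A small sketch, assuming a hypothetical `FakeInt` class (mocking libraries use a similar trick):

```python
class FakeInt:
    # Overriding __class__ with a property makes instances *claim* to be ints
    __class__ = property(lambda self: int)

fake = FakeInt()
print(f"{type(fake) is FakeInt=}")     # the real type is unaffected
print(f"{fake.__class__ is int=}")     # the attribute returns the claimed type
print(f"{isinstance(fake, int)=}")     # isinstance() also consults __class__
```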


### *Note*: `type()` has two different uses in Python

1. Return the type of an object
    ```python
    # Fake pseudo-code
    class type():
        @overload    
        def __call__(object) -> object:
            return object.__class__
    ```

2. Create new types
    ```python
    # Fake pseudo-code
    class type():
        @overload
        def __call__(name, bases, dict, **kwds) -> object:
            new_class = object()
            new_class.__name__ = name
            new_class.__mro__ = C3_linearization(*bases, object)
            new_class.__dict__ = dict
            return new_class
    ```
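
Both uses can be tried directly with the real `type`. This is only an illustrative sketch; the names `Dynamic`, `Static` and `describe` are made up for the example.

```python
def describe(self):
    return f"{type(self).__name__}(x={self.x})"

# Three-argument form: name, bases, namespace
Dynamic = type("Dynamic", (object,), {"x": 10, "describe": describe})

# Roughly equivalent class statement
class Static:
    x = 10
    describe = describe

d = Dynamic()
print(type(d) is Dynamic)                  # one-argument form returns the class
print(d.describe(), Static().describe())
print(Dynamic.__mro__ == (Dynamic, object))
```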

## Object <---> Type Overview
- An object's `__dict__` stores the values associated to this instance
- An object's `type` (or `class`) is the object stored in its `__class__` attribute:
    - Contains in its `__dict__` the data shared among multiple instances
    - Works as an instance _factory_
- Instance methods are just regular functions stored in the class's `__dict__`
    + Using the name `self` for the first parameter is just a convention
    + When accessing through the instance, the function is partially applied to the instance (_Descriptor Protocol_) 
        ```python
        foo_instance.method ==> functools.partial(type(foo_instance).__dict__['method'], foo_instance)
        ```
    + When accessing through the class, we get the original function back
        ```python
        foo_class.method ==> foo_class.__dict__['method']
        ```
- Since _classes_ are _objects_, we can also add new values to a class at any time (_monkey patching_, usually discouraged)

In [3]:
class Foo:
    def talk(self, text):
        print(f"Instance {id(self)} says: '{text}'")
    
foo = Foo()
foo.talk("Boooooo!")

# Monkey patching `Foo` class...
def shutUp(instance):
    print(f"Instance {id(instance)} is now silent")

Foo.shutUp = shutUp  # conceptually: Foo.__dict__["shutUp"] = shutUp (the class __dict__ itself is read-only)
foo.shutUp()

print(f"{type(Foo.shutUp)=}, {Foo.shutUp is shutUp=}")

Instance 140291838631200 says: 'Boooooo!'
Instance 140291838631200 is now silent
type(Foo.shutUp)=<class 'function'>, Foo.shutUp is shutUp=True
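
The binding rule from the overview above (instance access gives a partially applied function, class access gives the plain function) can be checked with a short sketch. The `Greeter` class is hypothetical, and `functools.partial` is only an approximation of what a bound method does.

```python
import functools

class Greeter:
    def greet(self, name):
        return f"Hello {name} from {type(self).__name__}"

g = Greeter()
raw = vars(Greeter)["greet"]            # plain function stored in the class __dict__
bound = g.greet                         # bound method created on attribute access
partial = functools.partial(raw, g)     # roughly what the binding does

print(bound("Ada") == partial("Ada") == raw(g, "Ada"))
print(bound.__func__ is raw, bound.__self__ is g)
print(Greeter.greet is raw)             # access through the class returns the function
```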


## Operators and _dunder_/_magic_ methods
- The Python interpreter translates special syntax into function calls:
    + An instance's `__class__` defines the _dict_ where magic methods are defined:
        + `foo[d]` => `foo.__class__.__getitem__(foo, d)`
        + `foo.d` => `foo.__class__.__getattribute__(foo, 'd')`
        + `a + b` => `a.__class__.__add__(a, b)` (or `b.__class__.__radd__(b, a)`)
        + `str(c)` => `c.__class__.__str__(c)`
    + Builtin objects can also call specific methods on their arguments to further customize behavior:
        + `[1,2,3][e]` => `list.__getitem__()` calls `e.__index__()` to get an `int` index
        + `float(f)` => `float.__new__()` calls `f.__float__()` to get the actual `float` value
    + `type.__call__()` works as a factory of new instances
        + `Foo(3)` => `Foo.__class__.__call__(Foo, 3)`
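
The translations listed above can be verified with a small sketch. `Vec` and `Second` are hypothetical classes written only for this example.

```python
class Vec:
    def __init__(self, *items):
        self.items = list(items)

    def __getitem__(self, index):
        return self.items[index]

    def __add__(self, other):
        return Vec(*self.items, *other.items)

    def __len__(self):
        return len(self.items)

class Second:
    def __index__(self):
        return 1

v = Vec(10, 20, 30)
print(v[0] == type(v).__getitem__(v, 0))
print((v + v).items == type(v).__add__(v, v).items)
print([1, 2, 3][Second()])        # list.__getitem__ calls Second.__index__

# Implicit dunder lookups go to the class and skip the instance __dict__
v.__len__ = lambda: 0
print(len(v), v.__len__())
```

The last line shows the difference: `len(v)` uses `Vec.__len__`, while the explicit `v.__len__()` finds the lambda stored on the instance.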

# How do `type` and `object` relate to each other?
 
- All objects inherit from `object` (all MROs always end up in `object`)
- The type of all types is `type`
- _Dunder_ methods are looked up on the type of the calling instance (one level up in the instance -> class -> metaclass chain), not on the instance itself

<img src="images/meta-diagram.svg" width="1400">
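
The circular relationship between `type` and `object` can be inspected from the interpreter. A quick sketch using only builtins:

```python
print(type(object) is type)        # the type of `object` is `type`
print(type(type) is type)          # `type` is its own type
print(isinstance(type, object))    # `type` is also an object
print(issubclass(type, object))    # and it inherits from `object`
print(type.__mro__ == (type, object))
print(object.__bases__ == ())      # `object` is the root: it has no bases
```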

# How do `type` and `object` relate to each other?
 
- All objects inherit from `object` (all MROs always end up in `object`)
- The type of all types is `type`
- Failed lookups are also tried in the next _base_ class following the `__mro__` attribute

<img src="images/mro-diagram.svg" width="1200">

In [4]:
class O: ...

class A(O): ...
class B(O): ...
class C(O): ...
class D(O): ...
class E(O): ...

class K1(C, A, B): ...
class K2(B, D, E): ...
class K3(A, D): ...

class Z(K1, K3, K2): ...
    
print(f"{Z.__bases__ = }\n{Z.__mro__ = }")

Z.__bases__ = (<class '__main__.K1'>, <class '__main__.K3'>, <class '__main__.K2'>)
Z.__mro__ = (<class '__main__.Z'>, <class '__main__.K1'>, <class '__main__.C'>, <class '__main__.K3'>, <class '__main__.A'>, <class '__main__.K2'>, <class '__main__.B'>, <class '__main__.D'>, <class '__main__.E'>, <class '__main__.O'>, <class 'object'>)
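
To make the ordering less magical, here is a minimal sketch of the C3 merge in pure Python. The helper names `c3_merge` and `c3_linearize` are hypothetical (CPython does this in C), and the class hierarchy is repeated so the snippet runs on its own.

```python
def c3_merge(sequences):
    """Repeatedly pick a head that does not appear in any other tail."""
    pending = [list(seq) for seq in sequences if seq]
    merged = []
    while pending:
        for seq in pending:
            head = seq[0]
            if not any(head in other[1:] for other in pending):
                break
        else:
            raise TypeError("inconsistent hierarchy: no valid MRO")
        merged.append(head)
        # Drop the chosen head everywhere and discard exhausted sequences
        pending = [[c for c in seq if c is not head] for seq in pending]
        pending = [seq for seq in pending if seq]
    return merged

def c3_linearize(cls):
    """The class itself, then the merge of its bases' linearizations and the bases list."""
    bases = list(cls.__bases__)
    return [cls] + c3_merge([c3_linearize(b) for b in bases] + [bases])

class O: ...
class A(O): ...
class B(O): ...
class C(O): ...
class D(O): ...
class E(O): ...
class K1(C, A, B): ...
class K2(B, D, E): ...
class K3(A, D): ...
class Z(K1, K3, K2): ...

print(c3_linearize(Z) == list(Z.__mro__))
print([cls.__name__ for cls in c3_linearize(Z)])
```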


## _types_ as factories in the Python Object Model

- A `type` is a Python object with some extra requirements:
    + Its `__call__()` method creates new objects whose `__class__` attribute points to itself
        + Default `__call__()` implementation calls `__new__()` and then calls `__init__()` (if appropriate)
    + Its `__mro__` attribute contains a tuple of references to other classes
        + _Base_ classes linearized using the C3 linearization algorithm
        + The first element in `__mro__` is always itself
        + The last element in `__mro__` is always `object`
    + Its `__class__` points to another _class_ object: its **metaclass**
        + All metaclasses inherit from `type`
    + (CPython implementation detail: a class `__dict__` is a non-writable `MappingProxyType` instead of an actual `dict`)

In [5]:
class FooMeta(type):
    def __call__(cls, *args, **kwargs):
        print(f"<{cls.__class__.__name__}>: {cls.__name__}.__call__({args=}, {kwargs=})")
        return type.__call__(cls, *args, **kwargs)

class Foo(metaclass=FooMeta):
    def __new__(cls, *args):
        print(f"{cls.__name__}.__new__({args=})")
        return object.__new__(cls)

    def __init__(self, a, b):
        print(f"{self.__class__.__name__}.__init__({a=}, {b=})")
        self.a = a
        self.b = b

print("Creating Foo instance...")
foo = Foo(1, 2)
print(f"\n{vars(foo)=}")

Creating Foo instance...
<FooMeta>: Foo.__call__(args=(1, 2), kwargs={})
Foo.__new__(args=(1, 2))
Foo.__init__(a=1, b=2)

vars(foo)={'a': 1, 'b': 2}
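
The "if appropriate" part of the default `__call__()` can be sketched in pure Python. The function `type_call` below is a hypothetical approximation, not the CPython implementation: `__init__()` only runs when `__new__()` returned an instance of the class.

```python
def type_call(cls, *args, **kwargs):
    """Hypothetical approximation of type.__call__(cls, *args, **kwargs)."""
    obj = cls.__new__(cls, *args, **kwargs)
    if isinstance(obj, cls):
        # __init__ is skipped when __new__ returned something else
        type(obj).__init__(obj, *args, **kwargs)
    return obj

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

class AlwaysInt:
    def __new__(cls):
        return 42            # not an AlwaysInt, so __init__ never runs

    def __init__(self):
        raise RuntimeError("never reached")

print(vars(type_call(Point, 1, 2)) == vars(Point(1, 2)))
print(type_call(AlwaysInt), AlwaysInt())
```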


# _Object Oriented Programming_ in Python

- Object Oriented Programming in Python is just the _default behavior_
    + Dynamic dispatch: _descriptor protocol_
    + Inheritance: _MRO_ lookup
- `object.__getattribute__()` implements the bulk of the object model
    - Looking for attributes not only in the instance _vars_ but also in the `__class__`
    - Looking for attributes not only in the direct `__class__`, but also in the other base classes (`__mro__`)
        
        + _MRO_: Method Resolution Order using [C3 linearization](https://en.wikipedia.org/wiki/C3_linearization) of bases

            
- Basically, the object model is implemented in the methods of `object` and `type`

## `object.__getattribute__()`: base implementation of the Python Data Model

- Intuition for `object.__getattribute__(instance, attr_name)`:
    1. Look for `attr_name` in `instance.__dict__`
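    2. If it's not there, look for `attr_name` in `instance.__class__.__dict__` (and then in the `__dict__` of every other class in its `__mro__`)
    3. If it's not there, call `instance.__class__.__dict__['__getattr__'](instance, attr_name)` if the class defines it; otherwise raise `AttributeError`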
- Note that `__getattribute__()` and `__getattr__()` are very different:
    - `__getattribute__()` is *always* called for every attribute access (very easy to create infinite recursion calls)
    - `__getattr__()` is only called if `attr_name` hasn't been found through a regular lookup
- The previous attribute lookup is a simplification: the [actual lookup](./images/object-attribute-lookup-v3.png) is more complicated and relies on the [_descriptor protocol_](https://docs.python.org/3/howto/descriptor.html)
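
The difference between the two hooks shows up when every call is recorded. A minimal sketch, where the `Traced` class and the `calls` list are hypothetical names for this example:

```python
calls = []

class Traced:
    present = 1

    def __getattribute__(self, name):
        # Runs for every access; delegate to object to avoid infinite recursion
        calls.append(("__getattribute__", name))
        return object.__getattribute__(self, name)

    def __getattr__(self, name):
        # Runs only after the regular lookup raised AttributeError
        calls.append(("__getattr__", name))
        return f"<no {name}>"

t = Traced()
print(t.present, t.missing)
print(calls)
```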


# Descriptor Protocol

+ `Descriptors` provide a hook to customize attribute lookup on a per-attribute basis
+ A descriptor is any object whose class defines at least one of:
    + `def __get__(self, obj, objtype=None)`
    + `def __set__(self, obj, value)`
    + `def __delete__(self, obj)`
    + More recently, a `def __set_name__(self, owner, name)` method has been added for cases where a descriptor needs to know the name of the class variable it was assigned to
+ Descriptors **only** work when used as class variables (e.g. _property_)
+ _Data descriptors_ define `__set__()` or `__delete__()` (usually together with `__get__()`)
+ _Non-data descriptors_ only define `__get__()`


In [1]:
class PropertyDescriptor:
    def __set_name__(self, owner, name):
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        value = getattr(obj, self.private_name)
        return value or 42

    def __set__(self, obj, value):
        setattr(obj, self.private_name, value)

class Owner:
    my_data = PropertyDescriptor()

    def __init__(self, data):
        self._my_data = data

Owner(None).my_data == 42


True
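
The practical difference between data and non-data descriptors is their priority relative to the instance `__dict__`. A small sketch with hypothetical classes `DataDesc`, `NonDataDesc` and `Host`:

```python
class DataDesc:
    def __get__(self, obj, objtype=None):
        return "data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")

class NonDataDesc:
    def __get__(self, obj, objtype=None):
        return "non-data descriptor"

class Host:
    data = DataDesc()
    non_data = NonDataDesc()

h = Host()
# Write straight into the instance __dict__, bypassing __set__
vars(h)["data"] = "instance value"
vars(h)["non_data"] = "instance value"

print(h.data)        # the data descriptor wins over the instance __dict__
print(h.non_data)    # the instance __dict__ wins over the non-data descriptor
```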

## `object.__getattribute__()`

- Let's look now at the actual lookup.
    - Diagram [actual lookup](./images/object-attribute-lookup-v3.png)


In [2]:
def type_getattribute(cls, name, default):
    "Emulate _PyType_Lookup() in Objects/typeobject.c"
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return default

def object_getattribute(obj, name):
    "Emulate PyObject_GenericGetAttr() in Objects/object.c"
    null = object()
    objtype = type(obj)
    cls_var = type_getattribute(objtype, name, null)
    descr_get = getattr(type(cls_var), '__get__', null)
    if descr_get is not null:
        if (hasattr(type(cls_var), '__set__')
            or hasattr(type(cls_var), '__delete__')):
            return descr_get(cls_var, obj, objtype)     # data descriptor:      a.x =>  x.__get__(a, type(a))
    if hasattr(obj, '__dict__') and name in vars(obj):
        return vars(obj)[name]                          # instance variable
    if descr_get is not null:
        return descr_get(cls_var, obj, objtype)         # non-data descriptor:  a.x =>  x.__get__(a, type(a))
    if cls_var is not null:
        return cls_var                                  # class variable
    if hasattr(objtype, '__getattr__'):
        return objtype.__getattr__(obj, name)           # call __getattr__ if it is defined in the class
    raise AttributeError(name)

# Functions implement the descriptor protocol

- Functions implement the descriptor protocol
```python    
class Function:
    ...
    def __get__(self, obj, objtype=None):
        return MethodType(self, obj) if obj is not None else self

class MethodType:
    def __init__(self, func, obj):
        self.func = func
        self.obj = obj

    def __call__(self, *args, **kwargs):
        return self.func(self.obj, *args, **kwargs)
```
- This is how the descriptor protocol binds functions to instances, turning them into methods


In [30]:
def f(a):
    return f"{a = } ({type(a) = })"

print(f"{f.__get__ = }")
print(f"{f(1) = }\n")

class A:
    g = f

a = A()

print(f"{a.g = }")
print(f"{f.__get__(a) = }\n")

print(f"{A.g = }")
print(f"{f.__get__(None, A) = }")



f.__get__ = <method-wrapper '__get__' of function object at 0x7fde640d5ea0>
f(1) = "a = 1 (type(a) = <class 'int'>)"

a.g = <bound method f of <__main__.A object at 0x7fde6454ce20>>
f.__get__(a) = <bound method f of <__main__.A object at 0x7fde6454ce20>>

A.g = <function f at 0x7fde640d5ea0>
f.__get__(None, A) = <function f at 0x7fde640d5ea0>
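
Since binding only happens for class attributes, a function stored on an instance is not bound automatically. The sketch below (with hypothetical names `Thing` and `describe`) uses the real `types.MethodType` to bind it by hand:

```python
import types

def describe(self):
    return f"I am a {type(self).__name__}"

class Thing: ...

thing = Thing()

# Stored in the instance __dict__: no descriptor call, so no binding
thing.raw = describe
try:
    thing.raw()
    raw_failed = False
except TypeError:
    raw_failed = True

# Binding by hand produces the same kind of object as describe.__get__(thing)
thing.method = types.MethodType(describe, thing)
print(raw_failed, thing.method())
print(thing.method.__self__ is thing, thing.method.__func__ is describe)
```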


# What is `super()`?

- `super(type, object_or_type=None)` is just a proxy object that looks up attributes along the `__mro__` of a class
    + The first argument marks the class _after which_ the lookup starts in the MRO (that class itself is skipped)
    + The second argument (non-optional in practice) sets the object (or class) that provides the `__mro__` to walk
- It can be used anywhere, but when it is called without arguments **inside a method of a class**, the interpreter fills in both arguments automatically
```python
    class Foo(Bar, A, B):
        def __init__(self, *args):
            super().__init__(*args)  # Transformed into:  super(Foo, self).__init__(*args)
            
```

In [33]:
class A:
    @property
    def prop(self):
        return "This is an A instance"

    @classmethod
    def class_method(cls):
        return f"Class is {cls.__name__}"


class B(A):
    @property
    def prop(self):
        return "This is an B instance"
    
class C(B):
    @property
    def prop(self):
        return "This is an C instance"
    
    def from_instance(self):
        return super().prop

c = C()    
print(C.__mro__)
print(f"{c.prop = }")
print(f"{c.from_instance() = }\n")
print(f"{super(C, c).prop = }")
print(f"{super(B, c).prop = }")

print(f"{c.class_method() = }")
print(f"{super(B, C).class_method() = }")

(<class '__main__.C'>, <class '__main__.B'>, <class '__main__.A'>, <class 'object'>)
c.prop = 'This is an C instance'
c.from_instance() = 'This is an B instance'

super(C, c).prop = 'This is an B instance'
super(B, c).prop = 'This is an A instance'
c.class_method() = 'Class is C'
super(B, C).class_method() = 'Class is C'
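
Because `super()` walks the MRO of the instance and not the static parent of the class where it is written, the same method can delegate to different classes. A sketch with a hypothetical diamond hierarchy:

```python
class Base:
    def chain(self):
        return ["Base"]

class Left(Base):
    def chain(self):
        return ["Left"] + super().chain()

class Right(Base):
    def chain(self):
        return ["Right"] + super().chain()

class Child(Left, Right):
    def chain(self):
        return ["Child"] + super().chain()

print(Left().chain())    # here super() in Left reaches Base
print(Child().chain())   # here super() in Left reaches Right
```

In `Child().chain()` the call made inside `Left` continues with `Right`, because `Right` follows `Left` in `Child.__mro__`.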


# Summary
- Python syntax is just syntactic sugar translated into function calls
- Each object points to its class, where all the definitions are
- The type of a class is a metaclass
- Everything inherits from `object`
- Every metaclass is a `type`
- Base classes are looked up based on the MRO
- Data descriptors have priority over instance attributes, and attributes over non-data descriptors
