# Advanced Python - Part 1 <img src="/content/drive/MyDrive/python4hpc-main/images/advanced/python-logo.png" style="align: right; width: 220px" />

Before we explore the more advanced toolkits and libraries, we want to give you an overview of the following topics:
- built-in Python collections and when to use them
- a deep dive into object-oriented programming behind the scenes
- the functional programming parts of Python and how to use them
- a few words about the garbage collector
- tools Python provides for simple caching strategies

## Why should I care?

Why should I care about basic python stuff? I'm using framework xyz ... why is it important?

<div class="alert alert-block alert-info"><b>
Knowing how to structure your code, which tools Python provides and what performance impact they have really matters, especially when you are writing complex code that has to perform well.<br/>
Only once you understand how the environment works and what it does behind the scenes can you take full control of it.
</b></div>

## Python version and implementation

Unless noted otherwise the performance details and the implementation notes refer to the latest __Python 3.10__ version as well as the official __CPython__ interpreter.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Containers and Utility classes

Python comes with several sophisticated built-in collection types that everybody should know about:

- `tuple` - immutable, fixed-size sequence of arbitrary items
- `list` - mutable, dynamically sized sequence of arbitrary items
- `set` - unordered collection of unique hashable values that supports set operations such as union and intersection
- `frozenset` - same as set but immutable and thus hashable
- `dict` - the default mapping type for key-value storage

There are also more advanced, lesser-known collection types and helpful utilities in the Python standard library.

Overview of the classes in the _collections_ module:

- `namedtuple` - a factory function to create tuple subclasses with named fields
- `deque` - a double-ended queue
- `ChainMap` - a proxy class to provide a common view to multiple mapping instances
- `Counter` - simple helper class for counting hashable objects
- `OrderedDict` - this dict implementation focuses on insertion order and is optimised for reordering
- `defaultdict` - dict that provides default values for keys
- `UserDict`, `UserList`, `UserString` - wrapper base classes for custom implementations of dict, list or str

Another important container type is the `dataclass` from the _dataclasses_ module.

Also have a look at the following pages in the python documentation and the python wiki:
* __[Sequence types](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range)__
* __[Collection package docs](https://docs.python.org/3/library/collections.html)__
* __[TimeComplexity of Python Collections](https://wiki.python.org/moin/TimeComplexity)__


### Everyday containers

#### [Tuple](https://docs.python.org/3/library/stdtypes.html#tuple)

A `tuple` is a simple immutable fixed-size sequence of arbitrary values.

It is often used implicitly, e.g. for returning multiple values from a function, and in general whenever the number of elements is fixed and ease of use is important.

Since tuples are immutable, a tuple is hashable as long as all of its elements are hashable, and its hash value never changes. Such tuples can be used in all cases where an immutable hashable datatype is required (e.g. as keys in dicts).

In [None]:
# parentheses are actually optional but often used for clarity
a_tuple = (1, 'string', 32, )
print('via parentheses:', a_tuple)

# this is also called 'packing'
a_tuple = 1, 'string', 32
print('packing:', a_tuple)

# attention: if parentheses are used like this a 'generator object' will be constructed instead
# see "Functional Programming" for more info
print('generator: ', (a for a in a_tuple))

# alternative creation via tuple constructor using an iterable/sequence
a_tuple = tuple([1, 'string', 32, ])
print('via constructor: ', a_tuple)

# deconstruction of a tuple into values aka 'unpacking'
a, b, c = a_tuple
print('unpacked:', a, b, c)

# indexed access
print('indexed access:', f'{a_tuple[0]}, {a_tuple[2]}')

# tuples are implicitly used to return multiple values from a function
def calc():
    return 1, 2, 3

print('tuple from return:', calc())

# tuples can be hashed (if they only contain hashable objects)
print('tuple hashed: ', hash(a_tuple))
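Two details of tuple syntax regularly trip people up: a one-element tuple needs a trailing comma, and hashability depends on the contents. The short sketch below shows both, plus extended unpacking with a starred target:

```python
# a single value in parentheses is just a parenthesised expression ...
not_a_tuple = (42)
# ... the trailing comma is what actually creates the tuple
one_tuple = (42,)
print(type(not_a_tuple).__name__, type(one_tuple).__name__)

# extended unpacking: the starred target collects the remaining items in a list
first, *rest = (1, 'string', 32)
print('first:', first, 'rest:', rest)

# a tuple is only hashable if all of its elements are hashable
try:
    hash((1, [2, 3]))
except TypeError as e:
    print('tuple containing a list cannot be hashed -', e)
```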

#### [List](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)

The default Python collection for storing an ordered sequence of items. Lists are mutable and can grow and shrink dynamically.

Contrary to what the name 'list' might suggest, CPython does not implement it as e.g. a linked list but as an array of references. In general Python accesses object instances indirectly by only holding references to the actual objects, which is what allows arbitrary types in almost all places. For lists this means that the largest runtime costs come from growing beyond the currently allocated capacity and from inserting or deleting elements at the start. In the first case the underlying array of references has to be resized (realloc + copy); CPython over-allocates to keep `append` cheap on average. In the second case all following references have to be moved to make or close the gap for the inserted/deleted element.

A list can easily be used as a stack by using `append` and `pop` methods but if you need e.g. a FIFO queue other types are preferable (see `deque`). The same goes for frequent lookups where other datatypes are more performant e.g. `set` or `dict`.
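The over-allocation behind cheap appends can be observed with `sys.getsizeof`: in CPython the reported size grows in occasional jumps rather than on every append. The exact byte values are an implementation detail that differs between versions, so treat the printed numbers as illustrative. The same snippet uses the list as a stack:

```python
import sys

# record the reported size of a list while pushing items onto it
stack = []
sizes = []
for i in range(32):
    stack.append(i)  # push onto the end, amortised O(1)
    sizes.append(sys.getsizeof(stack))

# the size only changes when the internal reference array is reallocated
distinct_sizes = sorted(set(sizes))
print('appends:', len(sizes), '- distinct sizes:', len(distinct_sizes))
print(distinct_sizes)

# pop() removes from the end, the cheap side of a list
top = stack.pop()
print('popped:', top, '- remaining:', len(stack))
```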

In [None]:
# use square brackets and comma separated values to create a list and assign it to a variable
b_list = ['Hello', 'World', '!']
print('via square brackets:', b_list)

# alternative creation via constructor using an iterable e.g. here from a tuple
b_list = list(('Hello', 'World', '!'))
print('via list constructor:', b_list)

# unpacking a list into variables works too
a, b, c = b_list
print('a, b, c: ', a, b, c)

# insert element at index
b_list.insert(2, 'Johnny')
print('after insert:', b_list)

# append to the end
b_list.append('!')
print('after append:', b_list)

# indexed access of elements
print('indexed access:', f'{b_list[1]} {b_list[4]}')

# remove item by using del & index
del b_list[2]
print('b_list after del:', b_list)

# remove item by value
b_list.remove('World')
print('b_list after remove:', b_list)

# remove last item
b_list.pop()
print('b_list after pop:', b_list)

# copy whole list
c_list = b_list.copy()
print('c_list (copy):', c_list)

# reverse list
c_list.reverse()
print('c_list (reversed):', c_list)

#### [Set and Frozenset](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset)

If items must be unique and/or set operations such as `union` or `intersection` are needed, `set` is the datatype of choice. Membership tests are also cheap (O(1) on average). Be aware that a `set` is *unordered*, so the insertion order is not preserved.

Internally a `set` can be thought of as a hash table that only stores keys. Items are placed according to their hash value, so an item must not change its hash after it has been added. Breaking this rule corrupts the internal state of the set, e.g. the item may no longer be found by membership tests.

Sets can also be combined with the operators `&` (intersection), `|` (union), `-` (difference) and `^` (symmetric difference) instead of calling the methods.

A `frozenset` is basically the same as a `set`, but it is immutable and hashable: once created, its contents cannot be changed anymore. These additional constraints allow such instances to be used e.g. as dictionary keys. Regarding performance, both types are equivalent.

In [None]:
# create two sets with chars from strings (=iterable)
a_set = set('Hello World!'.upper())
print(a_set)

b_set = set('How are you today?'.upper())
print(b_set)

# get characters in both sets
elements_in_both = a_set.intersection(b_set)
print('elements in both sets: ', elements_in_both)

# get differences (not symmetrical)
elements_in_a_but_not_in_b = a_set.difference(b_set)
print('elements in a but not in b: ', elements_in_a_but_not_in_b)
elements_in_b_but_not_in_a = b_set.difference(a_set)
print('elements in b but not in a: ', elements_in_b_but_not_in_a)

# get characters that are only in one set
elements_only_in_one = a_set.symmetric_difference(b_set)
print('elements only in one set: ', elements_only_in_one)

# all unique elements
all_elements = a_set.union(b_set)
print('all elements: ', all_elements)

# demonstrate that e.g. lists cannot be added to a set
try:
    set(([1, 2, 3], [4, 5, 6]))
except TypeError as e:
    print('Cannot add unhashable items - ', e)
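The operator forms are shorthand for the named methods with one difference: the operators require both operands to be sets (or frozensets), while the methods accept any iterable. The sketch also uses a `frozenset` as a dictionary key, which a plain `set` cannot be:

```python
a = set('hello')
b = set('world')

# operator forms of the set operations
print('&', a & b)   # intersection
print('|', a | b)   # union
print('-', a - b)   # difference
print('^', a ^ b)   # symmetric difference

# the methods accept arbitrary iterables, the operators do not
print(a.union('xyz'))
try:
    a | 'xyz'
except TypeError as e:
    print('operator needs a set -', e)

# frozensets are hashable and can therefore be used as dict keys
edge_weights = {frozenset(('A', 'B')): 5}
print(edge_weights[frozenset(('B', 'A'))])  # element order does not matter
```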

#### [Dict](https://docs.python.org/3/library/stdtypes.html#mapping-types-dict)

Dictionaries are a central datatype in Python. They store unique keys, each of which maps to a value.

Lookups by key are fast (O(1) on average), which makes membership tests and retrieving a value cheap.

In [None]:
# initialise from kwargs
a_dict = dict(a=1, b=2)

# initialise with {}
a_dict = {'a': 1, 'b': 2}
print(a_dict)

# update dict from another dict
a_dict.update({'b': 3, 'c': 4})
print(a_dict)

# insert a new key-value pair
a_dict['d'] = 5

# change an existing value
a_dict['a'] -= 1
print(a_dict)
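Because key lookups are cheap, the idiomatic ways to query a dict are the `in` operator and `dict.get`, which returns a default instead of raising `KeyError`. A minimal sketch, starting from the same contents the cell above ends with:

```python
a_dict = {'a': 0, 'b': 3, 'c': 4, 'd': 5}

# membership test on the keys (O(1) on average)
print('b' in a_dict, 'z' in a_dict)

# indexing a missing key raises KeyError ...
try:
    a_dict['z']
except KeyError as e:
    print('missing key:', e)

# ... while get() returns a default instead (None if none is given)
print(a_dict.get('z', -1))

# setdefault() only inserts if the key is missing and returns the stored value
print(a_dict.setdefault('e', 6), a_dict.setdefault('a', 100))

# iterate over key-value pairs in insertion order
for key, value in a_dict.items():
    print(key, value)
```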

As stated before, a key needs to be hashable and must always produce the same hash so that it can be used in a `dict` without problems.

A `dict` instance itself is mutable and therefore not hashable, which means that dict instances cannot directly be used as keys.

However, you can store a `dict` in an attribute of a class instance and use that instance as a key. This works because, unless `__eq__` is overridden, instances of user-defined classes have a default hash implementation based on the object's identity (in CPython derived from its memory address). Please note that this also means that two instances with the same values will have different hashes, since they are not the same object!

In [None]:
a_dict = dict(a=1, b=2)

# demonstrate that dicts are not hashable
try:
    hash(a_dict)
except TypeError as e:
    print('Cannot hash dict - ', e)

# demonstrate that dicts cannot have dicts as keys
dict_key = dict(z=100)
try:
    a_dict[dict_key] = 3
except TypeError as e:
    print('Cannot have dict as key - ', e)

# however: we can have classes containing dicts as keys
class MyClass: pass
obj = MyClass()
obj.my_attrib = dict(y=101)

a_dict[obj] = 4
print(a_dict)

# the objects have different hashes although they share the same attribute value
obj2 = MyClass()
obj2.my_attrib = obj.my_attrib
print('hash obj:', hash(obj))
print('hash obj2:', hash(obj2))
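If two keys should be treated as equal whenever their contents are equal, an identity-based hash is not what you want. One common workaround, sketched here under the assumption that all keys and values of the dict are themselves hashable, is to convert the dict into a `frozenset` of its items:

```python
config_a = {'x': 1, 'y': 2}
config_b = {'y': 2, 'x': 1}  # same content, different insertion order

# frozenset(dict.items()) is hashable if all keys and values are hashable
cache = {frozenset(config_a.items()): 'result for config'}

# a dict with equal content maps to the same key
print(cache[frozenset(config_b.items())])
```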


#### [Deque](https://docs.python.org/3/library/collections.html#deque-objects)

The name `deque` stands for *double-ended queue*, which already says a lot about its intended use case. If items need to be inserted and removed quickly at the beginning or the end of a data structure (FIFO / LIFO), this collection is the right choice: it provides the necessary methods, and appending and popping on either side are O(1) operations. Indexed access is O(1) at both ends but slows down to O(n) in the middle.

In addition, a `deque` can be created with a `maxlen` argument. When new items are added to one side of a full deque, the same number of items is discarded from the other side so that the length never exceeds `maxlen`.

In [None]:
from collections import deque

a_deque = deque()

# append on the right side
a_deque.append(1)

# append single element on the left side
a_deque.appendleft(10)
print('after append right and left:', a_deque)

# extend from iterable on the left side
# note: adds the iterable in reverse
a_deque.extendleft((4, 5, 6, ))
print('after extend left:', a_deque)

# pop an item from the left side
a_deque.popleft()

# pop an item from the right side
a_deque.pop()
print('after pop left and pop right:', a_deque)

# rotate elements right by 1 element
a_deque.rotate(1)
print('after rotate right by 1:', a_deque)

# create a deque with maxlength of 3
b_deque = deque(maxlen=3)
b_deque.extend((1, 2, 3, ))
print('deque with maxlen:', b_deque)

# appending on the right side shifts out an element from the beginning
b_deque.append(4)
print('append element:', b_deque)

# appending on the left side shifts out an element from the end
b_deque.appendleft(10)
print('append left:', b_deque)
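A typical application of `maxlen` is a sliding window, for example a moving average over the most recent values of a stream. The function name below is illustrative:

```python
from collections import deque

def moving_average(values, window_size):
    """Yield the mean of the last `window_size` values seen so far."""
    window = deque(maxlen=window_size)
    for value in values:
        window.append(value)  # the oldest value drops out automatically
        yield sum(window) / len(window)

print(list(moving_average([1, 2, 3, 4, 5, 6], window_size=3)))
```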

Find a simple performance comparison between *list* and *deque* below:

In [None]:
from collections import deque

def lifo_list():
    l = []
    for i in range(1000):
        l.append(i)
    while l:
        item = l.pop()

print('LIFO with list')
%timeit -n 100 -r 5 lifo_list()

In [None]:
from collections import deque

def lifo_deque():
    d = deque()
    for i in range(1000):
        d.append(i)
    while d:
        item = d.pop()

print('LIFO with deque')
%timeit -n 100 -r 5 lifo_deque()

When used as a _LIFO_ (last in, first out) structure, the performance is nearly the same.

In [None]:
from collections import deque

def fifo_list():
    l = []
    for i in range(1000):
        l.append(i)
    while l:
        item = l[0]
        del l[0]

print('FIFO with list')
%timeit -n 100 -r 5 fifo_list()

In [None]:
from collections import deque

def fifo_deque():
    d = deque()
    for i in range(1000):
        d.append(i)
    while d:
        item = d.popleft()

print('FIFO with deque')
%timeit -n 100 -r 5 fifo_deque()

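When used as a _FIFO_ (first in, first out) structure, the deque is clearly faster: `del l[0]` has to move all remaining references one position to the front (O(n)), whereas `popleft` is O(1).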

#### [OrderedDict](https://docs.python.org/3/library/collections.html#collections.OrderedDict)

The main use case for `OrderedDict` is when insertion order is important and frequent reordering operations are necessary. Because of that focus its implementation differs considerably from the default `dict`: in CPython it additionally maintains a doubly linked list of the keys.

The CPython `dict` implementation also preserves the insertion order of keys since __[Python 3.6](https://docs.python.org/3.6/whatsnew/3.6.html#new-dict-implementation)__. This implementation detail of `dict` was officially made part of the language specification in __[Python 3.7](https://docs.python.org/3.7/whatsnew/3.7.html#summary-release-highlights)__, which made `OrderedDict` less important than in earlier Python versions. Still, it can be a very useful class for specific problems.

In [None]:
from collections import OrderedDict

# create a new ordered dict from a dict argument
o_dict = OrderedDict({'c': 10, 'a': 2, 'b': 3})
print(o_dict.keys())

# move the key 'c' to the end of the ordered dict
o_dict.move_to_end('c')
print(o_dict.keys())

# remove an item from the back
o_dict.popitem()
print(o_dict.keys())

Although insertion order is guaranteed for `dict`, reordering elements is still not easy: moving an element to the back can be done by removing and re-inserting it, but inserting at a specific position gets extremely tricky without recreating the whole dictionary. `OrderedDict` offers dedicated methods for the common cases, e.g. `move_to_end(key)` moves a key to the back and `move_to_end(key, last=False)` moves it to the front.

Consider this example: given an input dictionary, we want to move some of its elements to the back of the internal order.

In [None]:
d = {i: i*2 for i in range(100000)}
keys = [i*2 for i in range(10000)]

def dict_move_keys_to_back_inplace():
    for k in keys:
        d[k] = d.pop(k)

print(list(d.items())[:10])
dict_move_keys_to_back_inplace()
print(list(d.items())[:10])

print('standard dict move elements to back')
%timeit -n 100 -r 5 dict_move_keys_to_back_inplace()

In [None]:
from collections import OrderedDict

d = OrderedDict({i: i*2 for i in range(100000)})
keys = [i*2 for i in range(10000)]

def ordered_dict_move_keys_to_back_inplace():
    for k in keys:
        d.move_to_end(k)

print(list(d.items())[:10])
ordered_dict_move_keys_to_back_inplace()
print(list(d.items())[:10])

print('ordered dict move elements to back')
%timeit -n 100 -r 5 ordered_dict_move_keys_to_back_inplace()

However the internal data structure of `OrderedDict` also has considerable drawbacks when space efficiency, iteration speed and update operations are of importance.

In such cases a standard `dict` usually performs better. Consider the following very simple insert example:

In [None]:
def dict_insert():
    d = dict()
    for i in range(10000):
        d[i] = i

print('standard dict insert')
%timeit -n 100 -r 5 dict_insert()

In [None]:
from collections import OrderedDict

def ordered_dict_insert():
    d = OrderedDict()
    for i in range(10000):
        d[i] = i

print('ordered dict insert')
%timeit -n 100 -r 5 ordered_dict_insert()
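The reordering methods make `OrderedDict` a natural fit for a least-recently-used (LRU) cache: `move_to_end` marks an entry as recently used and `popitem(last=False)` evicts the oldest one. The `LRUCache` class below is a hypothetical minimal sketch, not a production cache; for caching function results the standard library already provides `functools.lru_cache`. The last lines also show that equality between `OrderedDict` instances is order-sensitive, unlike for regular dicts:

```python
from collections import OrderedDict

class LRUCache:
    """Hypothetical minimal LRU cache built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')     # 'a' is now the most recently used key
cache.put('c', 3)  # capacity exceeded: evicts 'b'
print(list(cache._data))

# equality between OrderedDicts takes the order into account
print(OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))
print(dict(a=1, b=2) == dict(b=2, a=1))
```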

### Utility Containers

#### [Namedtuple](https://docs.python.org/3/library/collections.html#collections.namedtuple)

Named tuples were primarily created to improve readability. Behind the scenes `namedtuple` is a factory function that creates a new type with the fields given as arguments and `tuple` as its base class. The generated type makes sure that all necessary values are provided upon initialisation and that accessing a non-existing field raises an exception. It supports all operations a regular `tuple` does, plus access to the elements by name.

Regarding performance, a `namedtuple` is usually somewhat slower than a regular `tuple`, especially when creating instances, since additional code runs behind the scenes; indexed access works exactly like for a plain tuple.

In [None]:
from collections import namedtuple
from typing import NamedTuple

# create a named tuple by using the factory function `namedtuple`
TupleFG = namedtuple('TupleFG', ('f', 'g'))
print('Inheritance chain: ', TupleFG.mro())

x = TupleFG(1, 2)
print('Instance of TupleFG: ', x)

# alternative way of creating a named tuple type
# this way it is possible to easily add type hints for the IDE
class TupleXY(NamedTuple):
    x: int
    y: int

y = TupleXY(3, 4)
print('Instance of TupleXY: ', y)

# create an instance from an existing iterable via the _make class method
z = TupleXY._make((1, 2,))
print('Instance of TupleXY via factory function: ', z)
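Since named tuples are immutable, "modifying" one means creating a new instance. The helpers `_replace`, `_asdict` and `_fields` are part of the documented `namedtuple` API despite the leading underscore, which only avoids clashes with field names. The `Point` type is illustrative:

```python
from collections import namedtuple

Point = namedtuple('Point', ('x', 'y'))
p = Point(1, 2)

# access by name and by index refers to the same element
print(p.x == p[0])

# named tuples are immutable ...
try:
    p.x = 10
except AttributeError as e:
    print('cannot assign -', e)

# ... so _replace returns a new instance with the given fields changed
q = p._replace(x=10)
print(p, q)

# field names and conversion to a regular dict
print(Point._fields, q._asdict())
```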

#### [Dataclasses](https://docs.python.org/3/library/dataclasses.html)

Dataclasses were introduced in Python 3.7 and allow the user to quickly put together simple data container classes using type hints. It is important to mention that the types are neither enforced nor used for automatic conversions; this is still up to the user.

The methods that are generated out of the box can be customised by passing flags to the decorator, e.g. by specifying `slots=True` (available since Python 3.10) the dataclass uses slots for its fields instead of an internal `__dict__` (see "Object-oriented Programming -> Slots").

Regarding performance and memory requirements, simple `dict` operations may still outperform regular dataclasses, but a dataclass using slots _can_ be as fast as a hand-written class using slots. Compared to `namedtuple` the preference is very clear: use `dataclass` unless you have a specific reason to use a `namedtuple`.

In [None]:
from dataclasses import dataclass
from typing import Any

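# disable the generated __repr__, __eq__ and ordering methods and store the fields in slots (Python 3.10+)
@dataclass(repr=False, eq=False, order=False, slots=True)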
class DataclassDemo:
    x: int
    y: int
    value: Any

data = DataclassDemo(10, 12, dict())
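With the default flags a dataclass generates `__init__`, `__repr__` and `__eq__`, which is what usually makes it convenient. The sketch below, with illustrative class names, shows default values, a mutable default created per instance via `field(default_factory=...)`, that the type hints are not checked at runtime, and a `frozen=True` dataclass:

```python
from dataclasses import dataclass, field

@dataclass
class Measurement:
    name: str
    value: float = 0.0
    # mutable defaults must be created per instance via default_factory
    tags: list = field(default_factory=list)

m1 = Measurement('temperature', 21.5)
m2 = Measurement('temperature', 21.5)
print(m1)          # generated __repr__
print(m1 == m2)    # generated __eq__ compares the fields

# type hints are not enforced: a str is accepted for `value`
m3 = Measurement('pressure', 'not a float')
print(m3.value)

@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int

# frozen dataclasses reject assignments and are hashable
fp = FrozenPoint(1, 2)
try:
    fp.x = 3
except AttributeError as e:  # FrozenInstanceError is a subclass of AttributeError
    print('frozen -', e)
print(hash(fp) == hash(FrozenPoint(1, 2)))
```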

#### [Defaultdict](https://docs.python.org/3/library/collections.html#defaultdict-objects)

`defaultdict` is a subclass of `dict` and behaves just like a regular dictionary. The main difference is that you provide a factory function for default values: whenever a missing key is accessed via `[]`, a new entry is created with the value returned by the factory.

In [None]:
from collections import defaultdict

# provide a default value of 100 for all missing keys
a_dict = defaultdict(lambda: 100)
print('defaultdict:', a_dict.keys())

a_dict['a'] += 10
print('add entry:', a_dict.keys())
print('entry value:', a_dict['a'])

a_dict['b'] -= 40
print('add entry:', a_dict.keys())
print('entry value:', a_dict['b'])

This default factory function parameter enables us to simplify workflows such as this often seen construct:

In [None]:
input_data = [('foo', 'bar'), ('foo', 'baz'), ('bar', 'boo')]
values_per_key = {}
for key, value in input_data:
    if key not in values_per_key:
        value_list = []
        value_list.append(value)
        values_per_key[key] = value_list
    else:
        values_per_key[key].append(value)

print(values_per_key)

Using `defaultdict` it is possible to simply write:

In [None]:
from collections import defaultdict

input_data = [('foo', 'bar'), ('foo', 'baz'), ('bar', 'boo')]
values_per_key = defaultdict(list)
for name, value in input_data:
    values_per_key[name].append(value)
print(values_per_key)
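Only item access with `[]` triggers the default factory; `in` and `get` do not insert anything. Keeping this in mind avoids accidentally growing a `defaultdict` just by reading from it:

```python
from collections import defaultdict

counts = defaultdict(int)  # int() returns 0

# neither a membership test nor get() creates an entry
print('x' in counts, counts.get('x'))
print('keys after in/get:', list(counts))

# reading a missing key with [] creates it with the default value
print(counts['x'])
print('keys after []:', list(counts))
```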

#### [Counter](https://docs.python.org/3/library/collections.html#counter-objects)

The `Counter` class can be used to quickly count the occurrences of (hashable) items in an iterable. It is a subclass of `dict` and thus behaves like a dict where the keys are the items and the values are their counts. If a non-existing element is accessed, the `Counter` instance returns `0` instead of raising a `KeyError`.

In [None]:
from collections import Counter

# feed in a string (=iterable) and count the occurrences of each character
cnt1 = Counter('Hello World!')
print('cnt1: ', cnt1)

# print out the most common character
print(cnt1.most_common(1))

# the constructor can also take in a mapping that already has counts
cnt2 = Counter({'red': 3, 'blue': 1, 'green': 0})
print('cnt2: ', cnt2)
print(cnt2.most_common(1))
# total() sums up all counts (available since Python 3.10)
print(cnt2.total())

# it is also easily possible to combine the values of two counters
cnt3 = Counter(('green', 'blue', 'yellow', 'blue'))
print('cnt3: ', cnt3)

cnt4 = cnt2 + cnt3
print('cnt4: ', cnt4)
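Counters support further multiset-style operations. One subtle difference: the `-` operator drops all counts that end up zero or negative, while the `subtract` method keeps them. The item names are illustrative:

```python
from collections import Counter

stock = Counter(apples=3, pears=1)
sold = Counter(apples=1, pears=2)

# the operator keeps only positive counts
print(stock - sold)

# subtract() works in place and keeps zero and negative counts
remaining = stock.copy()
remaining.subtract(sold)
print(remaining)

# update() adds counts instead of replacing values like dict.update would
remaining.update(pears=5)
print(remaining)
```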

#### [ChainMap](https://docs.python.org/3/library/collections.html#chainmap-objects)

A `ChainMap` groups multiple dictionaries into a single, updatable view without copying or merging the original dicts. Lookups search the underlying mappings in order (from left to right), whereas writes, updates and deletions only affect the first mapping.

In [None]:
from collections import ChainMap

# create three different dictionaries for this example
runtime_arguments = {}
cmdline_arguments = {'b': False, 'c': 6.2830}
default_arguments = {'a': 0, 'b': True, 'c': 3.1415}
print('runtime_arguments:', runtime_arguments)
print('cmdline_arguments:', cmdline_arguments)
print('default_arguments:', default_arguments)

# put them in a ChainMap (order matters)
c_map = ChainMap(runtime_arguments, cmdline_arguments, default_arguments)
print('ChainMap: ', c_map)

# lookup keys and see how the value resolution works
print('param a (from default_arguments):', c_map['a'])
print('param b (from cmdline_arguments):', c_map['b'])
print('param c (from cmdline_arguments):', c_map['c'])

# adding an override via ChainMap for "c" sets the value in "runtime_arguments"
c_map['c'] = 10.0
print(c_map)

print('runtime_arguments:', runtime_arguments)
print('cmdline_arguments:', cmdline_arguments)
print('default_arguments:', default_arguments)

print('param c (now from runtime_arguments):', c_map['c'])
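`ChainMap.new_child()` returns a new `ChainMap` with an additional empty dict placed in front, and the `parents` property returns the chain without its first mapping. Together they make nested scopes easy to model, as in this sketch with illustrative variable names:

```python
from collections import ChainMap

global_scope = {'x': 1, 'y': 2}
scopes = ChainMap(global_scope)

# enter a nested scope: writes go into the new front mapping only
local_scope = scopes.new_child()
local_scope['x'] = 100
print(local_scope['x'], local_scope['y'])   # shadowed x, inherited y
print(global_scope)                          # unchanged

# .parents drops the front mapping, i.e. it corresponds to leaving the scope
print(local_scope.parents['x'])
```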

#### UserDict, UserList, UserString

These classes exist to simplify user-defined implementations of `dict`, `list` and `str`. They wrap a regular instance of the respective type and make it available in the `data` attribute, so custom implementations of existing methods or entirely new methods can simply operate on it.

In [None]:
from collections import UserList

class MyUserList(UserList):
    def clear_odd(self):
        """ Add a simple custom function to clear the add values (assumes that the items are int). """
        self.data = [v for v in self.data if v % 2 == 0]

l = MyUserList([1, 2, 3, 4, 5, 6, 7, 8])
print('Initial list:', l)
l.clear_odd()
print('After executing custom method: ', l)
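A practical reason to prefer `UserDict` over subclassing `dict` directly is that the built-in `dict` methods are implemented in C and generally do not call overridden methods. In the sketch below (class names are illustrative) an overridden `__setitem__` is bypassed by the `dict` constructor and `dict.update`, but honoured by `UserDict`:

```python
from collections import UserDict

class DoublingDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value * 2)

class DoublingUserDict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value * 2)

d = DoublingDict(a=1)   # the dict constructor bypasses __setitem__
d.update(b=1)           # so does dict.update
d['c'] = 1              # only direct item assignment uses the override
print(d)

u = DoublingUserDict(a=1)  # UserDict routes everything through __setitem__
u.update(b=1)
u['c'] = 1
print(u)
```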

## Object-oriented Programming - Backgrounds

Also see: __[Python Datamodel](https://docs.python.org/3/reference/datamodel.html)__

In Python everything is an object. Classes are objects as well: simply put, they are template objects that tell the Python interpreter how to create objects with certain attributes and with methods that operate on these attributes.

The simplest way to define a class in Python is `class C: pass`. This is equivalent to writing `C = type('C', (), dict())`. Both statements produce a new class object that can be considered a new type (the class object itself is an instance of `type`).

In [None]:
class Test: pass
print('class Test:', Test)

Test = type('Test', (), dict())
print('class Test:', Test)

print('type of Test:', type(Test))

### Creating new instances

When a user creates a new `Test` instance by executing `Test()`, the Python interpreter first calls the `__new__` method of the class. This method is implicitly a static method (it is special-cased, so it does not need to be declared as such) that receives the class as its first argument and is responsible for creating a new object instance of that type (see [static and class methods](#Static-&-Class-attributes-and-methods)). Usually this is done by delegating to the `__new__` method of the base class, ultimately `object.__new__`. The purpose of `__new__` is to provide users with a hook into the instance creation process that allows customising certain aspects of the created object. In most cases it is not necessary to implement this method directly - see [Metaclasses / Metaprogramming](#Metaclasses-/-Metaprogramming) for other examples.

After the object instance has been created, the interpreter calls the `__init__` method on the instance. This method takes care of initialising the internal state and variables of the instance (often loosely called the constructor).

Once this process is completed the construction of a new object instance of type `Test` has been finished and the object can be used.

In [None]:
class Test:
    def __new__(cls, *args, **kwargs):
        print('new instance: __new__')
        return super().__new__(cls)

    def __init__(self, *args, **kwargs):
        super().__init__()
        print('new instance: __init__')
        self.test_attribute = 10

test_instance = Test()
print('instance:', test_instance)
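A situation where `__init__` comes too late is subclassing an immutable built-in type such as `str`, `int` or `tuple`: the value is fixed when the object is created, so it has to be customised in `__new__`. The sketch (class names are illustrative) also shows that `__init__` is skipped when `__new__` returns something that is not an instance of the class:

```python
class UpperStr(str):
    def __new__(cls, value):
        # the str value must be set while the object is created,
        # afterwards the instance is immutable
        return super().__new__(cls, value.upper())

s = UpperStr('hello')
print(s, type(s).__name__, isinstance(s, str))

class NotReallyNew:
    def __new__(cls):
        # returning an object that is not an instance of cls ...
        return 42

    def __init__(self):
        # ... means that __init__ is never called
        print('never printed')

print(NotReallyNew())
```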

### Special attributes and methods

The previous example already showed some special attributes and methods. In fact, Python uses them extensively to provide extension points and to allow user-defined behaviour.

For example, in a regular class instance all attributes are stored in an internal dictionary. This field is called `__dict__` and can be accessed and manipulated like any other Python dict. The naming pattern `__<name>__` signals that an attribute or method has a special meaning.

A similar naming scheme is used e.g. for implementing a custom `str` conversion (`__str__`), a custom `hash` implementation (`__hash__`) or the 'official' string representation returned by `repr` (`__repr__`). If `__str__` is not implemented, `str` falls back to `__repr__` to provide a string representation of the instance.

A Python object can be inspected by passing it to the built-in function `dir`. Since everything is an object in Python, `dir` works equally well on functions, lambdas and all other Python objects. The same is true for special attributes and methods: every Python object has and uses them.

In [None]:
import os

class Test:
    def __init__(self, attrib):
        self.attrib = attrib

    def __str__(self):
        return f'Test instance with attrib={self.attrib}'

    def __repr__(self):
        return f'<Test attrib={self.attrib}>'

t = Test(10)
print('print instance:', t)

print('internal dict:', t.__dict__)

print(t.__str__)
print('custom str:', str(t))

print(t.__repr__)
print('custom repr:', repr(t))

print(t.__hash__)
print('default hash:', hash(t), os.linesep)


print('dir on t:', dir(t), os.linesep)

print('dir on Test.__init__:', dir(Test.__init__))

Other often used special methods are the comparison methods (`__lt__`, `__eq__`, `__gt__`, etc ...) and the attribute access methods (`__getattr__`, `__getattribute__`, `__setattr__`, ...).

Since there are many details to all of these methods, please refer to the Python documentation for a complete list and description.

An important consequence is that attributes, methods and most other characteristics of a class and its instances can be changed at runtime.

In [None]:
class Test:
    pass

t = Test()

# print class name
print('class name: ', t.__class__.__name__)

# dict is empty
print('empty class instance: ', t.__dict__)

# dynamically add a new attribute
t.new_attribute = 'foo'

# dict contains 'new_attribute' afterwards
print('class instance afterwards: ', t.__dict__)

# print default representation
print('default repr of class instance: ', repr(t))

# change __repr__ method at runtime for all instances
Test.__repr__ = lambda self: f'<Test new_attribute={self.new_attribute}>'

# print new representation
print('new repr of class instance: ', repr(t))

# dynamically replace constructor
def new_init(self, other_attribute):
    self.other_attribute = other_attribute

Test.__init__ = new_init

# create new instance using new constructor
u = Test('other')

# dict contains 'other_attribute'
print('with new constructor: ', u.__dict__)

try:
    # creating a new instance now requires the new attribute
    Test()
except TypeError as e:
    print('caught TypeError after setting new "__init__" method - ', e)

# it is even possible (but discouraged) to change the class of an instance at runtime
class BetterTest:
    def foo(self):
        return 'foo'

t.__class__ = BetterTest
print('after replacing type:', type(t))
print('calling new method:', t.foo())
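One subtlety when changing special methods at runtime: implicit invocations such as `repr(obj)`, `len(obj)` or `obj + other` look the special method up on the *type*, not on the instance. Assigning it to a single instance therefore has no effect on the built-in function, as this sketch with an illustrative class shows:

```python
class Demo:
    def __repr__(self):
        return '<Demo from class>'

d = Demo()

# an instance attribute named __repr__ is ignored by repr() ...
d.__repr__ = lambda: '<Demo from instance>'
before_patch = repr(d)
print(before_patch)

# ... but it is found by ordinary attribute lookup
print(d.__repr__())

# patching the class affects repr() for all instances
Demo.__repr__ = lambda self: '<Demo patched on class>'
print(repr(d))
```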

### Slots

If arbitrary attributes are not necessary, a class can be restricted to a fixed set of attributes by using so-called slots. Besides rejecting assignments to undeclared attributes, slots can also save memory and speed up attribute lookups.

Usually an attribute lookup takes the attribute name and goes through a number of possible hook points (e.g. the internal `__dict__`) that could provide a value for that name.

When using slots, Python creates descriptors on the class for the declared attributes. No per-instance `__dict__` is created, so the interpreter can skip some lookup steps and access the attributes directly via these descriptors (see [Datamodel - Slots](https://docs.python.org/3/reference/datamodel.html#slots) for more information).

Since there is no internal dict to store and retrieve attributes, no new attributes can be added at runtime: such an assignment fails with an `AttributeError` when slots are used.

Note about `tooling.getsize`: `sys.getsizeof` does not take into account the size of the objects an instance refers to; it only reports the space required for the object instance itself. Since the attributes of a regular instance are stored in its separate `__dict__`, this does not give you the complete picture. To show the actual size difference when using slots, we use a small helper that recursively sums up the size of everything stored in the object.
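The course helper `tooling.getsize` itself is not shown here. Purely as an illustration, a recursive size estimate could look roughly like the hypothetical `approx_total_size` below; it is not the course's implementation, it only follows dicts, common containers, `__dict__` and `__slots__`, and the resulting numbers remain CPython-specific approximations:

```python
import sys

def approx_total_size(obj, _seen=None):
    """Hypothetical helper: sum sys.getsizeof over an object and what it references."""
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:        # count shared objects only once
        return 0
    _seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(approx_total_size(k, _seen) + approx_total_size(v, _seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(approx_total_size(item, _seen) for item in obj)

    # attributes stored in a per-instance __dict__
    instance_dict = getattr(obj, '__dict__', None)
    if isinstance(instance_dict, dict):
        size += approx_total_size(instance_dict, _seen)

    # attributes stored in slots declared anywhere in the class hierarchy
    for cls in type(obj).__mro__:
        slots = cls.__dict__.get('__slots__', ())
        if isinstance(slots, str):
            slots = (slots,)
        for name in slots:
            if name not in ('__dict__', '__weakref__') and hasattr(obj, name):
                size += approx_total_size(getattr(obj, name), _seen)
    return size

class WithDict:
    def __init__(self):
        self.x, self.y, self.z = 1, 2, 3

class WithSlots:
    __slots__ = ('x', 'y', 'z')

    def __init__(self):
        self.x, self.y, self.z = 1, 2, 3

print(approx_total_size(WithDict()), approx_total_size(WithSlots()))
```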

In [None]:
from sys import getsizeof
from tooling.getsize import getsize

class ClassExample:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

class_example = ClassExample(1, 2, 3)
print('size of class layout with __dict__: ', getsizeof(class_example))
print('total size of class with __dict__: ', getsize(class_example))

class SlotExample:
    __slots__ = ('x', 'y', 'z')

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

slot_example = SlotExample(1, 2, 3)
print('size of class with __slots__: ', getsizeof(slot_example))
print('total size of class with __slots__: ', getsize(slot_example))
print('__dict__ of class: ', getattr(slot_example, '__dict__', None))
print('descriptor of SlotExample.x: ', SlotExample.x)

try:
    slot_example.new_attribute = 'bar'
except AttributeError:
    print('Caught AttributeError - assignment of new attributes is not possible when slots are used.')
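Slots only take full effect if every class in the hierarchy declares them. A subclass without its own `__slots__` gets a `__dict__` again, as this sketch with illustrative class names shows:

```python
class SlottedBase:
    __slots__ = ('x',)

class ChildWithoutSlots(SlottedBase):
    pass                    # no __slots__: instances get a __dict__ again

class ChildWithSlots(SlottedBase):
    __slots__ = ('y',)      # only declare the newly added attributes

a = ChildWithoutSlots()
a.x = 1                     # stored in the inherited slot
a.anything = 2              # allowed, stored in a.__dict__
print(a.__dict__)

b = ChildWithSlots()
b.x, b.y = 1, 2
try:
    b.anything = 3
except AttributeError as e:
    print('still restricted -', e)
```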

### Inheritance

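Python also supports polymorphism and (multiple) inheritance. The order in which the classes of an inheritance hierarchy are searched for attributes and methods is called the MRO (method resolution order). CPython computes it with the C3 linearisation algorithm, and it can be inspected at runtime via the `mro` method or the `__mro__` attribute.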
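The effect of the MRO can be seen in a compact diamond with illustrative class names: it determines which implementation is found first and in which order cooperative `super()` calls are chained. Because every `__init__` calls `super().__init__()` before assigning, the assignments run in reverse MRO order, so the first base class wins:

```python
class Base:
    def __init__(self):
        super().__init__()
        self.winner = 'Base'

    def describe(self):
        return 'Base'

class Left(Base):
    def __init__(self):
        super().__init__()
        self.winner = 'Left'

    def describe(self):
        return f'Left -> {super().describe()}'

class Right(Base):
    def __init__(self):
        super().__init__()
        self.winner = 'Right'

    def describe(self):
        return f'Right -> {super().describe()}'

class LeftRight(Left, Right):
    pass

class RightLeft(Right, Left):
    pass

# the MRO lists the classes in the order in which they are searched
print([c.__name__ for c in LeftRight.mro()])
print([c.__name__ for c in RightLeft.mro()])

# super() follows the MRO of the instance, so each describe() runs exactly once
print(LeftRight().describe())

# the first base class assigns last and therefore wins
print(LeftRight().winner, RightLeft().winner)
```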

Regarding method resolution order consider the following example of multiple inheritance (diamond problem) for demonstration purposes.

In [None]:
import abc      # see https://docs.python.org/3/library/abc.html

class Animal(abc.ABC):
    def __init__(self):
        super().__init__()
        self.speed = 1
        self.smell = 1

    def speak(self):
        # default implementation raises a not implemented error
        # note: raising an error prevents cooperative inheritance from working
        raise NotImplementedError

    def cuddle(self):
        # default implementation does nothing
        return ''

    @abc.abstractmethod
    def attack(self):
        # default implementation does nothing
        return ''

class Dog(Animal):
    def __init__(self):
        super().__init__()
        self.speed = 20
        self.smell = 100

    def speak(self):
        return 'bark'

    def cuddle(self):
        return f'wince {super().cuddle()}'

    def attack(self):
        return 'grrrrr'

class Cat(Animal):
    def __init__(self):
        super().__init__()
        self.speed = 10
        self.smell = 20
        self.purr_factor = 9000

    def speak(self):
        return 'meow'

    def cuddle(self):
        return f'purr {super().cuddle()}'

    def attack(self):
        return 'hiss'

class DogCat(Dog, Cat):
    pass

class CatDog(Cat, Dog):
    pass

We are using `@abc.abstractmethod` on the method `attack`. In Python this is a way to force all subclasses to implement a specific method; it only takes effect because `Animal` derives from `abc.ABC`, whose metaclass `abc.ABCMeta` performs the check. If a subclass does not implement the method, it is itself considered an abstract class and cannot be instantiated.

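Note how the method `speak`, which `Test2` does not implement either, does not cause any error: nothing is checked at instantiation time, and the `NotImplementedError` would only be raised once `test2.speak()` is actually called.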

In [None]:
try:
    Animal()
except TypeError as e:
    print('Animal Error:', e)

class Test1(Animal):
    pass

try:
    Test1()
except TypeError as e:
    print('Test1 Error:', e)

class Test2(Animal):
    def attack(self):
        return 'klonk'

test2 = Test2()
print('MRO of Test2: ', type(test2).mro())
print(test2.attack())
print('Internal dict of test2: ', test2.__dict__)

print('is "test2" an instance of Animal: ', isinstance(test2, Animal))
print('is "test2" an instance of Dog: ', isinstance(test2, Dog))
print('is "test2" an instance of Cat: ', isinstance(test2, Cat))
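Since the check lives in the metaclass `abc.ABCMeta` rather than in the decorator, a quick test with throwaway classes (names made up for this sketch) shows that on a plain class `@abc.abstractmethod` has no effect, while an ABC records the missing methods in its `__abstractmethods__` attribute:

```python
import abc


class PlainBase:
    # decorator used without ABCMeta: nothing is enforced
    @abc.abstractmethod
    def attack(self):
        ...


class AbstractBase(abc.ABC):
    @abc.abstractmethod
    def attack(self):
        ...


plain = PlainBase()  # instantiation works, no TypeError
print(type(plain).__name__, 'instantiated')

print(AbstractBase.__abstractmethods__)  # frozenset({'attack'})

try:
    AbstractBase()
except TypeError as e:
    print('AbstractBase Error:', e)


class Concrete(AbstractBase):
    def attack(self):
        return 'bonk'


print(Concrete.__abstractmethods__)  # frozenset()
print(Concrete().attack())           # bonk
```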

Next we're going to have a look at the classes `DogCat` and `CatDog`:

In [None]:
kotpies = DogCat()
print('MRO of DogCat: ', type(kotpies).mro())
print('Internal dict of kotpies: ', kotpies.__dict__)

# use 'isinstance' to check if an instance is of a given type
print('is "kotpies" an instance of Animal: ', isinstance(kotpies, Animal))
print('is "kotpies" an instance of Dog: ', isinstance(kotpies, Dog))
print('is "kotpies" an instance of Cat: ', isinstance(kotpies, Cat))

print('kotpies says:', kotpies.speak())
print('cuddle kotpies:', kotpies.cuddle())
print('kotpies attacks:', kotpies.attack())

In [None]:
pieskot = CatDog()
print('MRO of CatDog: ', type(pieskot).mro())
print('Internal dict of pieskot: ', pieskot.__dict__)

# use 'isinstance' to check if an instance is of a given type
print('is "pieskot" an instance of Animal: ', isinstance(pieskot, Animal))
print('is "pieskot" an instance of Dog: ', isinstance(pieskot, Dog))
print('is "pieskot" an instance of Cat: ', isinstance(pieskot, Cat))

print('pieskot says:', pieskot.speak())
print('cuddle pieskot:', pieskot.cuddle())
print('pieskot attacks:', pieskot.attack())

From the examples above we can see that the MRO goes from the subclass through its base classes and always ends with the implicit `object` base class that every Python class inherits from (here preceded by `abc.ABC`).

Note that the order of the parent classes in a class definition matters, for example `DogCat` (`Dog` comes before `Cat`) vs `CatDog` (`Cat` comes before `Dog`): `kotpies` barks and growls, while `pieskot` meows and hisses.
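The order is not simply depth-first: CPython computes the MRO with the C3 linearization algorithm, which keeps every class before its bases and preserves the left-to-right order of the bases in each class statement. Below is an illustrative re-implementation; the function `c3_mro` and the classes `A` to `E` are made up for this sketch, and its result should match the built-in `mro()`.

```python
def c3_mro(cls):
    """Illustrative C3 linearization of cls (CPython does this internally in C)."""
    bases = list(cls.__bases__)
    if not bases:
        # only `object` has no bases
        return [cls]
    # linearizations of all bases, plus the local order of the bases themselves
    sequences = [c3_mro(base) for base in bases] + [bases]
    result = [cls]
    while True:
        sequences = [seq for seq in sequences if seq]
        if not sequences:
            return result
        # pick the first head that does not appear in the tail of any sequence
        for seq in sequences:
            candidate = seq[0]
            if not any(candidate in other[1:] for other in sequences):
                break
        else:
            raise TypeError(f'no consistent MRO for {cls.__name__}')
        result.append(candidate)
        # drop the chosen class from the front of every sequence
        sequences = [seq[1:] if seq[0] is candidate else seq for seq in sequences]


class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass
class E(C, B): pass

print([k.__name__ for k in c3_mro(D)])             # ['D', 'B', 'C', 'A', 'object']
print(c3_mro(D) == D.mro(), c3_mro(E) == E.mro())  # True True
```

Applied to `DogCat` or `CatDog` from above, it should likewise reproduce the list returned by their `mro()` method.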

If you need cooperative inheritance to work for your constructors and methods, don't forget to call `super().<methodname>()` in the body of the specific method.
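A small sketch with hypothetical classes illustrates the point: as long as every `greet` calls `super().greet()`, each class in the MRO contributes exactly once. A single class that does not delegate cuts the chain and silently skips everything after it in the MRO.

```python
class Base:
    def greet(self):
        return ['Base']


class Left(Base):
    def greet(self):
        # cooperative: delegate to the next class in the MRO of the instance
        return ['Left'] + super().greet()


class Right(Base):
    def greet(self):
        return ['Right'] + super().greet()


class Both(Left, Right):
    def greet(self):
        return ['Both'] + super().greet()


print(Both().greet())  # ['Both', 'Left', 'Right', 'Base']


class RudeLeft(Base):
    def greet(self):
        # not cooperative: never calls super(), so the chain stops here
        return ['RudeLeft']


class BothRude(RudeLeft, Right):
    def greet(self):
        return ['BothRude'] + super().greet()


print(BothRude().greet())  # ['BothRude', 'RudeLeft'] (Right and Base are skipped)
```

Note that for a `Both` instance, `super()` inside `Left.greet` resolves to `Right`, even though `Left` does not inherit from `Right`: `super()` follows the MRO of the instance, not the static base class.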

### Static & Class attributes and methods

Classes can have so-called static (class) attributes, static methods and class methods.

A static method does not receive an instance (`self`) as its first argument; it behaves like a free-standing function that merely lives in the class namespace. A class method, in contrast, receives the class object itself (conventionally named `cls`) as its first argument.

Since class objects are also just objects, they also have an internal `__dict__` and can store values as well. Those values are accessible directly via the class object itself and also via an instance. But be aware that direct assignment of a class attribute via an instance will actually create an attribute on the instance rather than changing the class attribute itself.
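A related pitfall: rebinding a name through an instance creates an instance attribute, but *mutating* a mutable class attribute through an instance changes the single object shared by all instances. The class names below are made up for this sketch.

```python
class SharedTags:
    tags = []  # class attribute: one list shared by every instance


class OwnTags:
    def __init__(self):
        self.tags = []  # instance attribute: a fresh list per instance


s1, s2 = SharedTags(), SharedTags()
s1.tags.append('red')         # mutates the class-level list, no instance attribute is created
print(s2.tags)                # ['red']
print('tags' in s1.__dict__)  # False

o1, o2 = OwnTags(), OwnTags()
o1.tags.append('red')
print(o2.tags)                # []
```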

In [None]:
class Static:
    count = 10

    @staticmethod
    def static_method():
        return f'static count is: "{Static.count}"'

    @classmethod
    def class_method(cls):
        return f'class: "{cls.__name__}" - static count is "{cls.count}"'

a, b = Static(), Static()
print(a.count, b.count, Static.count)

Static.count = 5
print(a.count, b.count, Static.count)

# be aware! assigning to a static attribute via an instance creates a new instance attribute instead
print(a.__dict__)
a.count = 4
print(a.__dict__)

print(a.count, b.count, Static.count)

# call the static method, it has access to static attributes
print(Static.static_method())

# call the class method, which is similar to static method but gets the type as first argument
print(Static.class_method())

# you can also call static and class methods via an instance
print(a.static_method())
print(b.class_method())
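The difference between the two decorators becomes visible with subclasses: `Static.static_method` above hard-codes `Static.count`, while a class method reads from whatever class it is called on. A common use of this is an alternative constructor. The classes `Counter`/`SubCounter` and the method `from_string` are made up for this sketch.

```python
class Counter:
    count = 10

    @staticmethod
    def static_count():
        # hard-wired to Counter, ignores subclasses
        return Counter.count

    @classmethod
    def class_count(cls):
        # cls is the class the method was called on (or the instance's class)
        return cls.count

    @classmethod
    def from_string(cls, text):
        # alternative constructor: builds an instance of cls, so subclasses get their own type
        instance = cls()
        instance.count = int(text)  # instance attribute shadowing the class attribute
        return instance


class SubCounter(Counter):
    count = 99


print(SubCounter.static_count())  # 10
print(SubCounter.class_count())   # 99

sub = SubCounter.from_string('7')
print(type(sub).__name__, sub.count, SubCounter.count)  # SubCounter 7 99
```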

### Metaclasses / Metaprogramming

Since classes are themselves objects (instances of `type`), we can use that to our advantage. A class derived from `type` can customise how new classes are created, for example by adding attributes or methods to every class it creates. A class whose instances are classes in this way is called a `metaclass`.

This mechanism is often used, e.g., to implement ORM systems or to generate additional methods or code from the definitions in a class body.

Note that every class in Python has exactly one metaclass (by default `type`, or the metaclass inherited from its bases).
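Two quick checks make these statements concrete. First, a `class` statement is essentially a call to the metaclass: `type(name, bases, namespace)` builds the same kind of class object. Second, because a class has exactly one metaclass, combining bases whose metaclasses are unrelated fails with a `TypeError`. All class names here are made up for this sketch.

```python
# a class statement ...
class PointA:
    dims = 2

    def describe(self):
        return f'{type(self).__name__} with {self.dims} dims'


# ... and an equivalent direct call to the metaclass `type`
def describe(self):
    return f'{type(self).__name__} with {self.dims} dims'


PointB = type('PointB', (), {'dims': 2, 'describe': describe})

print(PointA().describe())         # PointA with 2 dims
print(PointB().describe())         # PointB with 2 dims
print(type(PointA), type(PointB))  # <class 'type'> <class 'type'>


class MetaA(type):
    pass


class MetaB(type):
    pass


class WithMetaA(metaclass=MetaA):
    pass


class WithMetaB(metaclass=MetaB):
    pass


conflict = None
try:
    # neither MetaA nor MetaB is a subclass of the other, so no single metaclass fits
    class Mixed(WithMetaA, WithMetaB):
        pass
except TypeError as e:
    conflict = e
    print('TypeError:', e)
```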

In [None]:
class MyMetaclass(type):
    def __new__(mcs, name, bases, namespace, **kwargs):
        # called whenever a class using this metaclass is defined;
        # `namespace` holds everything defined in the class body
        print('using "MyMetaclass"', name, bases, namespace)
        new_class = super().__new__(mcs, name, bases, namespace, **kwargs)

        # collect all MyField declarations from the class body
        fields = {key: value
                  for key, value in namespace.items()
                  if isinstance(value, MyField)}

        # replace each field declaration by its default value
        for key, value in fields.items():
            print(f'adding field: {key}; {value}')
            setattr(new_class, key, value.default)

        def check(self):
            # raise if a required field is still unset
            for key, value in fields.items():
                if not value.required:
                    continue
                if getattr(self, key) is None:
                    raise ValueError(f'attribute must not be None: {key}')

        # hook up the function as a regular method of the new class
        setattr(new_class, 'check', check)

        return new_class

# very basic field declaration (not a descriptor in the strict sense: no __get__/__set__)
class MyField:
    def __init__(self, default=None, required=None):
        self.default = default
        self.required = required

# just here to demonstrate where it shows up
class BaseClass:
    pass

# subclass actually using the metaclass
class SubClass(BaseClass, metaclass=MyMetaclass):
    attrib1 = MyField(default=10, required=False)
    attrib2 = MyField(required=True)

# instantiate a new instance
sc_a = SubClass()

# print the values
print(sc_a.attrib1)
print(sc_a.attrib2)

# run check
try:
    sc_a.check()
except ValueError as e:
    print('check failed:', e)

# set attrib2 and try again
sc_a.attrib2 = 23
sc_a.check()
print('check passed')
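A pattern frequently seen in ORM-style libraries is a metaclass that registers every model class as it is defined, so the framework can later look classes up by name. The sketch below is hypothetical (`RegistryMeta`, `Model`, `PluginBase` and the `registry`/`plugins` mappings are not from any library). Since Python 3.6 the hook `__init_subclass__` can often achieve the same without a custom metaclass, which also sidesteps metaclass conflicts; both variants are shown.

```python
class RegistryMeta(type):
    # shared mapping: class name -> class, filled while classes are defined
    registry = {}

    def __new__(mcs, name, bases, namespace, **kwargs):
        new_class = super().__new__(mcs, name, bases, namespace, **kwargs)
        if bases:  # skip the root class `Model`, which has no explicit bases
            mcs.registry[name] = new_class
        return new_class


class Model(metaclass=RegistryMeta):
    pass


class User(Model):
    pass


class Order(Model):
    pass


print(list(RegistryMeta.registry))  # ['User', 'Order']


# the same idea without a metaclass, using __init_subclass__
class PluginBase:
    plugins = {}

    def __init_subclass__(cls, **kwargs):
        # called once for every new subclass, not for PluginBase itself
        super().__init_subclass__(**kwargs)
        PluginBase.plugins[cls.__name__] = cls


class CsvPlugin(PluginBase):
    pass


print(list(PluginBase.plugins))  # ['CsvPlugin']
```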