# Day 1: Advanced Constructs



This course is all about more advanced Python concepts that you might not see in a beginner's course. For a good overview of such concepts, check out, for example, [this TDS post](https://towardsdatascience.com/10-topics-python-intermediate-programmer-should-know-3c865e8533d6).

## Iterators

Loops in Python work differently than in languages like C++. If you want to implement a C-style loop in Python, you will probably write something like this:

In [1]:
signal = [15,25,32,48,24]
time_points = [1,5,7,9,10]

i=0
while i < len(signal):
  print(i, time_points[i], signal[i])
  i += 1

0 1 15
1 5 25
2 7 32
3 9 48
4 10 24


But this is rather cumbersome: we need to track `i` manually, make sure we increment it, and so on. So let's make it a bit nicer by using a Pythonic `for` loop:

In [2]:
for s,t in zip(signal, time_points):
  print(t,s)

1 15
5 25
7 32
9 48
10 24


Quite a bit shorter! If we want to get the index back, we can use the built-in `enumerate()` function:

In [3]:
for i, (s,t) in enumerate(zip(signal, time_points)):
  print(i,t,s)

0 1 15
1 5 25
2 7 32
3 9 48
4 10 24


How does Python make this so intuitive? The concept underlying this for loop is the so-called **iterator**.

![](https://camo.githubusercontent.com/e35ad1f313f321499924b47e3677b227bcc8c4f9a709a585cab6e3f9bd422d2c/68747470733a2f2f66696c65732e7265616c707974686f6e2e636f6d2f6d656469612f742e6261363332323264363366352e706e67)

We see that the two important operations on an iterator are `iter()` and `next()`. These built-in functions call the methods `__iter__()` and `__next__()`, which are examples of so-called [Dunder or Magic Methods](https://dbader.org/blog/python-dunder-methods) (*dunder* because their names start and end with a double underscore). Dunder methods allow instances of a class to interact with the built-in functions and operators of the language. They are usually not called directly by the user but are invoked internally by Python. Let's have a look at how we can construct our own iterator:

In [21]:
class InfiniteIterator:
    def __init__(self, item):
        self.item = item

    def __iter__(self):
        return self  # the iterator is its own iterable

    def __next__(self):
        return self.item  # never raises StopIteration, so iteration never ends

In [22]:
looper = InfiniteIterator('Python')
next(looper)
next(looper)

'Python'

In [None]:
# Warning: this loop never terminates; interrupt the kernel to stop it
for i in InfiniteIterator('Python'):
  print(i)

Nice, we created our first iterator! However, we usually want an iterator to stop at some point, so we need to raise a `StopIteration` exception. This special exception signals that an iterator is exhausted. Let's see how Python's built-in iterators do it:

In [24]:
signal = [15,25,32,48,24]

signal_iter = iter(signal)

In [30]:
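for _ in range(5):
    next(signal_iter)  # consume all five elements of signal
next(signal_iter)      # a sixth call has nothing left to return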

StopIteration: 

Once all five elements have been returned, Python raises `StopIteration`. For our own iterator, we can implement this as follows:

In [31]:
class FiniteIterator:
    def __init__(self, item, max_count):
        self.item = item
        self.max_count = max_count
        self.count = 0  # how many items have been returned so far

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.max_count:
            raise StopIteration  # signal that the iterator is exhausted
        self.count += 1
        return self.item

In [32]:
looper = FiniteIterator('Python', 3)
for i in looper:
  print(i)

Python
Python
Python


With this, we have demystified the `for` loop! It is essentially a `while` loop: it first calls `iter()` on the object to obtain an iterator, then calls `next()` on that iterator until `StopIteration` is raised. We also learned about the general concepts of an iterator and an iterable:
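To make this concrete, here is a sketch of roughly what `for item in iterable: body(item)` does, written out by hand with `iter()`, `next()`, and `try`/`except`. The helper name `manual_for` is hypothetical, and CPython performs these steps in its interpreter loop rather than in Python code:

```python
def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does (hypothetical helper)."""
    iterator = iter(iterable)      # ask the iterable for an iterator (via __iter__)
    while True:
        try:
            item = next(iterator)  # calls the iterator's __next__
        except StopIteration:      # the iterator is exhausted: leave the loop
            break
        body(item)


collected = []
manual_for([15, 25, 32], collected.append)
print(collected)  # [15, 25, 32]
```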

- An **iterable** is an object that has an `__iter__` method which returns an **iterator** or, alternatively, which defines a `__getitem__` method that accepts sequential indexes starting from zero (and raises an `IndexError` when the indexes are no longer valid). So an iterable is an object that can be iterated over (e.g. a list, a tuple, a dictionary, a set, a string, a file, etc.).
- An **iterator** is an object with a `__next__` method that returns the next element of a sequence and raises a `StopIteration` exception when there are no elements left. Iterators also have an `__iter__` method that returns the iterator itself, so every iterator is also an iterable.
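The second route in the iterable definition above, the `__getitem__` fallback, is easy to miss. A minimal sketch with a hypothetical `Squares` class that defines no `__iter__` at all, yet still works with `list()` and `iter()`, because Python falls back to calling `__getitem__` with 0, 1, 2, ... until `IndexError` is raised:

```python
class Squares:
    """Hypothetical sequence-like class: no __iter__, only __getitem__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            raise IndexError(index)  # tells iteration that the sequence has ended
        return index ** 2


squares = Squares(4)
print(list(squares))       # [0, 1, 4, 9]
it = iter(squares)         # iter() builds an iterator on top of __getitem__
print(next(it), next(it))  # 0 1
```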

In our case, we fused the iterable and the iterator into one class, but this is not necessary: the iterable can instead return an instance of a separate iterator class from its `__iter__` method.
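A minimal sketch of that separation, using hypothetical `Repeat` and `RepeatIterator` classes modeled on `FiniteIterator` above. The iterable only stores the data and hands out a fresh iterator from `__iter__`, so it can be looped over more than once, whereas an iterator is used up after a single pass:

```python
class RepeatIterator:
    """Iterator: holds the loop state (how many items were produced so far)."""

    def __init__(self, item, max_count):
        self.item = item
        self.max_count = max_count
        self.count = 0

    def __iter__(self):
        return self  # iterators return themselves

    def __next__(self):
        if self.count >= self.max_count:
            raise StopIteration
        self.count += 1
        return self.item


class Repeat:
    """Iterable: holds the data and creates a new iterator on every __iter__ call."""

    def __init__(self, item, max_count):
        self.item = item
        self.max_count = max_count

    def __iter__(self):
        return RepeatIterator(self.item, self.max_count)


words = Repeat('Python', 2)
print(list(words))  # ['Python', 'Python']
print(list(words))  # ['Python', 'Python'] again: each pass gets a fresh iterator

it = RepeatIterator('Python', 2)
print(list(it))     # ['Python', 'Python']
print(list(it))     # []: the iterator itself is exhausted after one pass
```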

Most iterable objects in Python have an `__iter__` method, which is called when the object is iterated over in a `for` loop. The `__iter__` method returns an iterator object that defines a `__next__` method, which accesses the elements of the data structure one at a time. When there are no more elements, `__next__` raises a `StopIteration` exception, which tells the `for` loop to terminate. All iterators must implement both `__iter__` and `__next__`. Let's see how this works:

In [19]:
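x = 1
l = [1,2,3]

l.__iter__        # lists have an __iter__ method, so they are iterable
#x.__iter__       # AttributeError: integers are not iterable
#l.__next__       # AttributeError: a list is an iterable, not an iterator
i = l.__iter__()  # returns a new list_iterator object
j = i.__iter__()  # an iterator's __iter__ returns the iterator itself (j is i)
j.__next__()      # returns the first element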

1

## Generators

Now that we know how to implement our own iterator, we can also implement our own generator. A generator function is a function that returns an iterator (a *generator object*). It looks like a normal function, except that it contains `yield` statements that produce a series of values, which can be consumed in a `for` loop or retrieved one at a time with the `next()` function. Generators can be seen as **simplified iterators**: they provide syntactic sugar for creating iterators. Instead of writing a class with `__init__`, `__iter__`, and `__next__`, we only write a function with the special `yield` statement. Let's see how we can implement our own generator:

In [33]:
def infinite_generator(item):
  while True:
    yield item

That was simple! Instead of a class with `__init__`, `__iter__`, and `__next__` methods, we wrote a single function, and instead of returning the item from `__next__`, we `yield` it. Let's see how we can use this generator:

In [None]:
# Warning: this loop never terminates; interrupt the kernel to stop it
for i in infinite_generator('Python'):
  print(i)

In [34]:
def finite_generator(item, max_count):
  count = 0
  while count < max_count:
    yield item
    count += 1

Let's see what happens under the hood:

In [37]:
gen = finite_generator('Python', 3)
gen

<generator object finite_generator at 0x10d59a900>

In [41]:
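next(gen)  # 'Python'
next(gen)  # 'Python'
next(gen)  # 'Python'
next(gen)  # no items left: raises StopIteration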

StopIteration: 

Quite similar to our iterator! What does the `yield` statement do? It hands a value back to the caller and suspends the execution of the function. The next time `next()` is called on the generator, execution resumes right after the `yield`, with all local variables still holding the values they had before. This allows the code to produce a series of values over time, rather than computing them all at once and returning them in a list.
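A small sketch that makes the suspension visible: the hypothetical `traced` generator prints before and after each `yield`, so we can see that each `next()` call runs the body only up to the next `yield`, and that the local variable `count` survives between calls. The last call also uses the optional default argument of `next()`, which is returned instead of raising `StopIteration`:

```python
def traced():
    count = 0
    print("started")
    while count < 2:
        count += 1
        print(f"about to yield {count}")
        yield count
        # Execution pauses at the yield above and resumes here on the next next() call
        print(f"resumed after yielding {count}")
    print("finished")


traced_gen = traced()  # prints nothing: calling a generator function does not run its body
print(next(traced_gen))
# started
# about to yield 1
# 1
print(next(traced_gen))
# resumed after yielding 1
# about to yield 2
# 2
print(next(traced_gen, "exhausted"))  # with a default, next() returns it instead of raising
# resumed after yielding 2
# finished
# exhausted
```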

A generator stops when it reaches the end of the function or when it encounters a `return` statement. A generator can contain multiple `yield` statements; each call to `next()` resumes execution after the most recent `yield` and runs until the next one. Let's see how this works:

In [42]:
def threetimes_generator(item):
    yield item
    yield item
    yield "nearly empty!"

In [43]:
for i in threetimes_generator('Python'):
  print(i)

Python
Python
nearly empty!
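The `return` case mentioned above works slightly differently from `yield`: a `return` ends the generator immediately, and any value it returns is not produced as an item but attached to the `StopIteration` exception as its `value` attribute. A sketch with a hypothetical `take_until_negative` generator:

```python
def take_until_negative(numbers):
    """Yield numbers until the first negative one, then stop (hypothetical helper)."""
    for n in numbers:
        if n < 0:
            return "hit a negative number"  # ends the generator
        yield n


print(list(take_until_negative([3, 1, -2, 5])))  # [3, 1]: the return value is not yielded

gen = take_until_negative([-1])
try:
    next(gen)
except StopIteration as stop:
    returned = stop.value  # the return value travels on the exception
print(returned)  # hit a negative number
```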


To make it even easier to create generators, Python adds even more syntactic sugar via so-called generator expressions. These look very similar to list comprehensions (with parentheses instead of square brackets), but instead of creating a list, they create a generator. Let's see how this works:

In [48]:
import sys

# List comprehension of the numbers 1 to 1 million
numbers_list = [i for i in range(1,1000001)]
# Size of the list in megabytes (sys.getsizeof counts only the list object
# and its array of references, not the integers it points to)
print(sys.getsizeof(numbers_list)/1000000)

# Generator expression of the numbers 1 to 1 million
numbers_gen = (i for i in range(1,1000001))
# Size of the generator in megabytes (small and independent of the range length)
print(sys.getsizeof(numbers_gen)/1000000)

8.448728
0.000112


When would you use list comprehensions versus generator expressions? If you need to iterate over the result multiple times, you probably want a list comprehension. If you are only going to iterate over it once, a generator expression is more memory-efficient: it does not build the entire list of results up front, but returns a generator object that produces the values one at a time as you iterate. This is especially important if the number of items is large or potentially infinite, as might be the case when you are working with iterators that stream data from a file or network connection.
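The "iterate once" caveat is worth seeing in action: a generator expression is consumed as you iterate over it, so a second pass yields nothing, whereas a list can be traversed any number of times. A short sketch:

```python
squares_list = [n * n for n in range(5)]
squares_gen = (n * n for n in range(5))

print(sum(squares_list), sum(squares_list))  # 30 30: a list can be traversed again
print(sum(squares_gen), sum(squares_gen))    # 30 0: the second pass finds the generator exhausted
```

If you need several passes but still want to avoid materializing a list, one option is simply to recreate the generator for each pass.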

List comprehensions are also the better choice when you want to keep working with the result as a whole, for example to index into it, slice it, or take its `len()`, as in the following example:

In [49]:
print(numbers_list[100])
print(numbers_gen[100])

101


TypeError: 'generator' object is not subscriptable
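If you only need a single element from a generator, one option is to advance it lazily with `itertools.islice` instead of building a list. Note that this consumes the generator up to that point, so the skipped values are gone afterwards. A sketch using a fresh generator, so it does not depend on the state of `numbers_gen`:

```python
from itertools import islice

fresh_gen = (i for i in range(1, 1000001))

# islice(iterable, start, stop) lazily skips the first 100 values;
# next() then takes the value at index 100, i.e. the 101st value
print(next(islice(fresh_gen, 100, 101)))  # 101
print(next(fresh_gen))  # 102: the skipped values have been consumed
```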

Finally, let's see how we can chain together multiple generators/iterators.

In [52]:
#stream of values from iterator gets passed to another iterator
data = [-1,5,10,3,-4,6,-78,4,65,2]

def absolute(data):
    for i in data:
        yield abs(i)

def square(data):
    for i in data:
        yield i**2

def add_one(data):
    for i in data:
        yield i + 1

def is_big(data):
    for i in data:
        yield i > 10

result = is_big(add_one(square(absolute(data))))
list(result)

[False, True, True, False, True, True, True, True, True, False]
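Because every stage is a generator, nothing is computed until `list()` starts pulling values, and each value travels through the whole chain before the next one is produced, rather than stage by stage. A sketch with two hypothetical tracing stages, `source` and `double`, that print as they work:

```python
def source(values):
    for v in values:
        print(f"source produces {v}")
        yield v

def double(values):
    for v in values:
        print(f"double receives {v}")
        yield 2 * v


pipeline = double(source([1, 2]))  # nothing printed yet: no values have been requested
print(list(pipeline))
# source produces 1
# double receives 1
# source produces 2
# double receives 2
# [2, 4]
```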

## Closures and Decorators

We will now talk about closures and decorators. These are advanced topics, but they are very useful to know about. First, let's think about what we can do with functions in Python (and other languages). We can pass them as arguments to other functions, we can return them from functions, and we can assign them to variables. People often say that functions are *first-class objects* in Python. Let's see how this works:

In [55]:
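def greet(name):
    return "Hello, " + name + "!"

greet('Pete')
say_hello = greet       # bind a second name to the same function object
say_hello('Pete')
del greet               # removes the name greet, not the function object
# greet('Pete')         # would now raise NameError
say_hello('Pete')       # still works
say_hello.__name__      # 'greet': the name is stored on the function object itself
say_hello.__qualname__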

'greet'

So we can give a function additional names, and, like any other object, a function can also be passed as an argument to other functions and returned from functions. Functions that take other functions as arguments (or return functions) are called **higher-order functions**. They are often used to abstract away common patterns, so that we can reuse the same code with different functions. Let's see how this works:

In [58]:
def total(func, data):
    numbers = func(data)  # apply the transformation, e.g. absolute or square
    return sum(numbers)

data = [-1,5,10,3,-4,6,-78,4,65,2]

print(total(absolute, data))
print(total(square, data))

178
10516


We will see more of these higher-order functions when we talk tomorrow about functional programming and constructs like map and reduce.

Another kind of function is the **nested** or **inner function**: a function that is defined inside another function. Let's see how this works:

In [63]:
#example of inner function
def outer(value):
    def inner(i):
        return f"Hello from inner function, I got {i} from outer function"
    return inner(value)

outer(1)

'Hello from inner function, I got 1 from outer function'

Inner functions lead us to closures. A closure is a function that retains the bindings of the free variables that exist when it is defined, so that they can be used later when the function is invoked, even after the defining scope has finished executing.
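A minimal closure sketch with a hypothetical `make_multiplier` factory: the inner function `multiply` uses `factor`, a free variable from the enclosing call. After `make_multiplier` has returned, its scope is gone, but `multiply` still remembers `factor`; the captured value can be inspected through the function's `__closure__` cells. The second, hypothetical `make_counter` shows that a closure can also keep state, using `nonlocal` to rebind the captured variable:

```python
def make_multiplier(factor):
    def multiply(x):
        return x * factor  # factor is a free variable from the enclosing scope
    return multiply        # return the inner function itself, not a call result


double = make_multiplier(2)  # make_multiplier has already returned here...
triple = make_multiplier(3)
print(double(5), triple(5))                 # 10 15 (...yet factor is remembered)
print(double.__closure__[0].cell_contents)  # 2: the captured binding


def make_counter():
    count = 0
    def increment():
        nonlocal count  # rebind the captured variable instead of creating a local one
        count += 1
        return count
    return increment


counter = make_counter()
counter()
print(counter())  # 2: the state persists between calls
```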

### Decorators
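Decorators build directly on closures: a decorator is a function that takes a function and returns a new one, typically an inner `wrapper` closure that remembers the original. The `@decorator` line above a `def` is shorthand for passing the function to the decorator and rebinding the name to the result. A sketch with a hypothetical `shout` decorator:

```python
def shout(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()  # call the original, then post-process
    return wrapper


@shout
def greet(name):
    return f"Hello, {name}!"


# Equivalent to the @ syntax: decorate by hand and rebind the name
def greet_plain(name):
    return f"Hello, {name}!"

greet_plain = shout(greet_plain)

print(greet("Pete"))        # HELLO, PETE!
print(greet_plain("Pete"))  # HELLO, PETE!
```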

In [64]:
def log_decorator(func):
    def wrapper(*args, **kwargs):
        # Log before executing the function
        print(f"Calling {func.__name__} with args: {args}, kwargs: {kwargs}")

        # Execute the function
        result = func(*args, **kwargs)

        # Log after executing the function
        print(f"{func.__name__} returned: {result}")

        return result

    return wrapper

In [74]:
@log_decorator
def compute(a,b,operator):
    """Computes a simple arithmetic operation"""
    if operator == '+':
        return a+b
    elif operator == '-':
        return a-b
    elif operator == '*':
        return a*b
    elif operator == '/':
        return a/b

compute(1,2,'+')

Calling compute with args: (1, 2, '+'), kwargs: {}
compute returned: 3


3

In [68]:
?compute

Signature: compute(*args, **kwargs)
Docstring: <no docstring>
File:      /var/folders/6p/x1pd8qmd74j77pprvdl4wcxr0000gn/T/ipykernel_30074/2942572205.py
Type:      function


In [70]:
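compute.__name__  # 'wrapper': the name now refers to the inner function
compute.__doc__   # None: the docstring of the original compute is hidden

The decorator has replaced `compute` with `wrapper`, so the original name, docstring, and signature are no longer visible. The `functools.wraps` decorator from the standard library fixes this by copying this metadata from the decorated function onto the wrapper:
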
In [73]:
import functools

def log_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Log before executing the function
        print(f"Calling {func.__name__} with args: {args}, kwargs: {kwargs}")

        # Execute the function
        result = func(*args, **kwargs)

        # Log after executing the function
        print(f"{func.__name__} returned: {result}")

        return result

    return wrapper

In [75]:
@log_decorator
def compute(a,b,operator):
    """Computes a simple arithmetic operation"""
    if operator == '+':
        return a+b
    elif operator == '-':
        return a-b
    elif operator == '*':
        return a*b
    elif operator == '/':
        return a/b

compute(1,2,'+')
?compute

Calling compute with args: (1, 2, '+'), kwargs: {}
compute returned: 3

Signature: compute(a, b, operator)
Docstring: Computes a simple arithmetic operation
File:      /var/folders/6p/x1pd8qmd74j77pprvdl4wcxr0000gn/T/ipykernel_30074/3294929627.py
Type:      function


With this we have written a fully fledged logging decorator! If you are interested in pushing this further, [here](https://ankitbko.github.io/blog/2021/04/logging-in-python/) is an article that shows how to make this even more advanced.
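How does `functools.wraps` restore the signature? Besides copying attributes such as `__name__` and `__doc__`, it stores the original function as `wrapper.__wrapped__`, and `inspect.signature` follows that reference by default; this is likely what lets `?compute` show `(a, b, operator)` above. A short sketch with a hypothetical `passthrough` decorator and `add` function:

```python
import functools
import inspect


def passthrough(func):
    @functools.wraps(func)  # copy metadata and set wrapper.__wrapped__ = func
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@passthrough
def add(a, b):
    """Adds two numbers"""
    return a + b


print(add.__name__)            # add
print(add.__doc__)             # Adds two numbers
print(inspect.signature(add))  # (a, b)
print(add.__wrapped__(1, 2))   # 3: calls the undecorated original directly
```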

## Descriptors and Properties

Descriptors are a powerful feature of Python that allow us to customize attribute access. They are the mechanism behind [properties, methods, static methods, class methods, and super](https://realpython.com/python-descriptors/). One could say that they put the magic into object accesses.

Descriptors are objects that implement a protocol consisting of three methods: `__get__`, `__set__`, and `__delete__`. If an object's class defines any of these methods, that object is a descriptor; its special behavior applies when it is stored as a class attribute of another class.

If only `__get__` is defined, the object is a **non-data descriptor**. If `__set__` or `__delete__` is defined, the object is a **data descriptor**. If none of these methods is defined, the object is simply not a descriptor.
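The distinction matters for attribute lookup: for `obj.attr`, a data descriptor on the class takes precedence over an entry in the instance's `__dict__`, while a non-data descriptor is shadowed by it. A sketch with two hypothetical descriptors, `NonData` and `Data`:

```python
class NonData:
    """Non-data descriptor: only __get__."""

    def __get__(self, obj, objtype=None):
        return "from NonData.__get__"


class Data:
    """Data descriptor: __get__ and __set__."""

    def __get__(self, obj, objtype=None):
        return "from Data.__get__"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class Demo:
    nondata = NonData()
    data = Data()


d = Demo()
# Write to the instance dict directly, bypassing normal attribute assignment
d.__dict__["nondata"] = "from instance __dict__"
d.__dict__["data"] = "from instance __dict__"
print(d.nondata)  # from instance __dict__: the instance dict shadows a non-data descriptor
print(d.data)     # from Data.__get__: the data descriptor wins
```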

Let's say we want to create a class describing a protein structure. We want to set a PDB ID and not allow it to be changed.

In [46]:
class PDBIdentifier:

    def __get__(self, obj, objtype=None):
        print("Getting PDB ID")
        return '1abc'

    def __set__(self, obj, pdb_id):
        print("accessing the attribute to set the value")
        raise AttributeError("Cannot change the value")

    def __delete__(self, obj):
        raise AttributeError("Can't delete attribute")

# protein structure class with a data descriptor (PDBIdentifier defines __set__ and __delete__)
class ProteinStructure:
    pdb_id = PDBIdentifier()

x = ProteinStructure()
x.pdb_id

Getting PDB ID


'1abc'

This works, but it is not very useful yet: the PDB ID is hard-coded to `'1abc'` for every instance, so we cannot give a structure its own ID, let alone validate it. We can fix this with a descriptor that Python already provides, the property. Let's see how this works:

In [35]:
class ProteinStructure:
    def __init__(self, pdb_id):
        self.pdb_id = pdb_id
        self.path = f"/data/pdb/{pdb_id}.pdb"

    def getter(self):
        print("Getting PDB ID")
        return self._pdb_id

    def setter(self, pdb_id):
        if not isinstance(pdb_id, str):
            raise TypeError("PDB ID must be a string")
        if not pdb_id.isalnum():
            raise ValueError("PDB ID must be alphanumeric")
        if not len(pdb_id) == 4:
            raise ValueError("PDB ID must be 4 characters long")
        self._pdb_id = pdb_id

    def deleter(self):
        raise AttributeError("Can't delete attribute")
    
    pdb_id = property(getter, setter, deleter)

protein = ProteinStructure('1LOL')
protein.pdb_id

Getting PDB ID


'1LOL'

Nice, now we have a property that we can set and get, but not delete. `property()` is a built-in (technically a class) that creates and returns a property object. Its signature is:

`property(fget=None, fset=None, fdel=None, doc=None) -> object`

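The property object defines `__get__`, `__set__`, and `__delete__`, which call `fget`, `fset`, and `fdel`, respectively. So a property is simply a data descriptor implementing the protocol we talked about above.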
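To see that nothing magical is left, here is a simplified pure-Python sketch of how a property-like class could implement the descriptor protocol. `SimpleProperty` and `Protein` are hypothetical names; the real built-in is implemented in C in CPython and additionally supports a docstring and the `getter`/`setter`/`deleter` methods used with the decorator syntax, which this sketch omits:

```python
class SimpleProperty:
    """Simplified stand-in for the built-in property (sketch only)."""

    def __init__(self, fget=None, fset=None, fdel=None):
        self.fget, self.fset, self.fdel = fget, fset, fdel

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class, e.g. Protein.pdb_id
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)


class Protein:
    def _get_id(self):
        return self._pdb_id

    def _set_id(self, value):
        self._pdb_id = value.upper()  # normalise on assignment

    pdb_id = SimpleProperty(_get_id, _set_id)  # no fdel: deletion is refused


p = Protein()
p.pdb_id = "1lol"
print(p.pdb_id)  # 1LOL
```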

We can also use the property as a decorator. Let's see how this works:

In [47]:
class ProteinStructure:
    def __init__(self, pdb_id):
        self.pdb_id = pdb_id
        self.path = f"/data/pdb/{pdb_id}.pdb"

    @property
    def pdb_id(self):
        print("Getting PDB ID")
        return self._pdb_id

    @pdb_id.setter
    def pdb_id(self, pdb_id):
        if not isinstance(pdb_id, str):
            raise TypeError("PDB ID must be a string")
        if not pdb_id.isalnum():
            raise ValueError("PDB ID must be alphanumeric")
        if not len(pdb_id) == 4:
            raise ValueError("PDB ID must be 4 characters long")
        self._pdb_id = pdb_id

    @pdb_id.deleter
    def pdb_id(self):
        raise AttributeError("Can't delete attribute")

In [48]:
igg = ProteinStructure('1IGG')
igg.pdb_id

Getting PDB ID


'1IGG'
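The setter is what makes this worthwhile. Because `__init__` assigns `self.pdb_id`, even the constructor goes through the validation, and a failed assignment leaves the old value untouched. A condensed, self-contained sketch of the class above (the `print` in the getter and the `path` attribute are dropped for brevity):

```python
class ProteinStructure:
    def __init__(self, pdb_id):
        self.pdb_id = pdb_id  # goes through the setter below

    @property
    def pdb_id(self):
        return self._pdb_id

    @pdb_id.setter
    def pdb_id(self, pdb_id):
        if not isinstance(pdb_id, str):
            raise TypeError("PDB ID must be a string")
        if not pdb_id.isalnum():
            raise ValueError("PDB ID must be alphanumeric")
        if not len(pdb_id) == 4:
            raise ValueError("PDB ID must be 4 characters long")
        self._pdb_id = pdb_id

    @pdb_id.deleter
    def pdb_id(self):
        raise AttributeError("Can't delete attribute")


igg = ProteinStructure('1IGG')
for bad_value in (1234, '1IG', '1-GG'):
    try:
        igg.pdb_id = bad_value
    except (TypeError, ValueError) as err:
        print(f"{bad_value!r}: {err}")

try:
    ProteinStructure('TOOLONG')
except ValueError as err:
    print(f"constructor rejected it: {err}")

print(igg.pdb_id)
# 1234: PDB ID must be a string
# '1IG': PDB ID must be 4 characters long
# '1-GG': PDB ID must be alphanumeric
# constructor rejected it: PDB ID must be 4 characters long
# 1IGG
```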

Nice, the decorator syntax makes properties even easier to use! After all this, you might ask yourself what we need it for. Below are my answers to some questions about properties and descriptors that I am frequently asked.

- Why should we use properties? They allow us to customize attribute access. This is useful if we want to do some checks before setting an attribute, or if we want to make sure that an attribute is always in a certain format. 

- What is the difference between a property and a method? A property is accessed like an attribute, while a method is called with parentheses. 

- What is the difference between a property and a descriptor? From [Stack Overflow](https://stackoverflow.com/questions/12846116/python-descriptor-vs-property#:~:text=The%20Cliff's%20Notes%20version%3A%20descriptors,properties%20are%20implemented%20using%20descriptors.): descriptors are a low-level mechanism that lets you hook into an object's attributes being accessed. Properties are a high-level application of this; that is, properties are implemented using descriptors. Or, better yet, properties are descriptors that are already provided for you in the standard library.

    If you need a simple way to return a computed value from an attribute read, or to call a function on an attribute write, use the @property decorator. The descriptor API is more flexible, but less convenient, and arguably "overkill" and non-idiomatic in this situation. It's useful for more advanced use cases, like implementing bound methods, or static and class methods; when you need to know, for example, if the attribute was accessed through the type object, or an instance of the type.

- Why do we use properties and descriptors and not plain old getter and setter methods as in C++ or Java? Getter and setter methods are more prevalent in C++ and Java than in Python because of fundamental differences in how these languages handle data encapsulation, a core principle of object-oriented programming.

    1. Access Modifiers: In languages like C++ and Java, data encapsulation is achieved through the use of access modifiers (public, private, protected), which restrict direct access to class fields. Hence, to access or modify these private fields, getter (accessor) and setter (mutator) methods are used. This is in line with the principle of encapsulation in object-oriented programming, where you want to hide the internal representation of an object and only expose methods that are safe to use.

    2. Python Approach: Python, on the other hand, does not have strong support for access modifiers in the same sense. All attributes in Python are technically public, though by convention, a leading underscore is used to denote a field that should be treated as if it were private. Python prefers simplicity and readability, and the use of getters and setters for simple attribute access or assignment can be considered unpythonic.

    3. Properties in Python: Python provides a more elegant solution in the form of property decorators, which allow you to define methods that are accessed like simple attributes. This means you can start with a simple attribute and, if you later need special behavior when that attribute is accessed or modified, add it with the `@property` and `@<attribute>.setter` decorators without changing the attribute's interface.

    This approach also supports the principle of encapsulation, as these methods can hide complex operations and validations behind what appears to be simple attribute access and assignment, but it does so in a way that feels more natural in Python.
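The third point is the key practical argument, so here is a sketch of it with a hypothetical `Measurement` class. Version 1 uses a plain attribute; version 2 later adds validation through a property, and the calling code (`m.value = ...`, `m.value`) stays exactly the same:

```python
class MeasurementV1:
    def __init__(self, value):
        self.value = value  # plain public attribute, no getter/setter boilerplate


class MeasurementV2:
    def __init__(self, value):
        self.value = value  # now routed through the property setter

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        if new_value < 0:
            raise ValueError("value must be non-negative")
        self._value = new_value


# The same client code works with both versions
for cls in (MeasurementV1, MeasurementV2):
    m = cls(3)
    m.value = 5
    print(cls.__name__, m.value)
# MeasurementV1 5
# MeasurementV2 5
```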

If you want to learn/recap more about properties and descriptors, I recommend [this article](https://realpython.com/python-descriptors/) and [this video](https://www.youtube.com/watch?v=vBys0SwYvCQ).

Most of the time, though, you won't work with descriptors directly, but more often with properties. Properties are descriptors that are already provided for you in the standard library and are well described in [this article](https://realpython.com/python-property/).
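One situation where writing a descriptor yourself can pay off is when the same validation applies to several attributes: a property would have to be repeated for each one, while a descriptor class can be reused. A sketch with a hypothetical `PositiveNumber` descriptor and `Structure` class; it relies on `__set_name__` (available since Python 3.6), which tells the descriptor the attribute name it was assigned to:

```python
class PositiveNumber:
    """Reusable data descriptor that only accepts positive numbers."""

    def __set_name__(self, owner, name):
        self.private_name = "_" + name  # e.g. 'resolution' -> '_resolution'

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class itself
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if not isinstance(value, (int, float)) or value <= 0:
            raise ValueError(f"{self.private_name[1:]} must be a positive number")
        setattr(obj, self.private_name, value)


class Structure:
    resolution = PositiveNumber()  # the same validation logic...
    r_factor = PositiveNumber()    # ...reused for a second attribute

    def __init__(self, resolution, r_factor):
        self.resolution = resolution
        self.r_factor = r_factor


s = Structure(1.8, 0.21)
print(s.resolution, s.r_factor)  # 1.8 0.21
try:
    s.resolution = -2
except ValueError as err:
    print(err)  # resolution must be a positive number
```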