# MSDM5051 Tutorial 10 - More Utilities in Python

## Contents

1. Iterable & Iterator
2. Functional Programming

---
# 1. Iterable & Iterator

## 1.1. Iterator

Regardless of the programming language, an **iterator** is defined as an abstract object that provides a *next()* method and a *done()* method:

- Each time *next()* is called, it returns one item. The items are returned in a fixed sequence.
- When there are no more items left in the sequence, *done()* returns `True`.

The sequence of items itself is called an **iterable**. We can imagine an iterator being used like this:

```
while not iterator.done():
    item_in_iterable = iterator.next()
    
    # ... (do something with the item_in_iterable)
```

Basically, anything collection-like in Python is an iterable. For example, strings, tuples, and lists can all be iterated over because they contain items in a definite order. Dictionaries and sets can also be iterated over: a dictionary yields its keys in insertion order (guaranteed since Python 3.7), while the iteration order of a set is determined by hashing and should not be relied upon.

On the other hand, an iterator can be thought of as an iterable with an additional position parameter, which records which items have already been processed and which have not. To convert an iterable into an iterator, we can use the `iter()` function. 

In [1]:
print(iter("Python"))
print(iter([1,2,3]))
print(iter({"1":"a", "2":"b", "3":"c"}))

<str_ascii_iterator object at 0x00000238BA3F7520>
<list_iterator object at 0x00000238BA3F7520>
<dict_keyiterator object at 0x00000238B8EFEF20>
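
To make the "position" idea concrete, here is a minimal sketch (the variable names are illustrative only). Each call to `iter()` on a list gives a fresh iterator with its own position, while calling `iter()` on an iterator simply returns the same object.

```python
letters = ["a", "b", "c"]          # an iterable: it has no notion of "current position"

it1 = iter(letters)                # first iterator, starts at the beginning
it2 = iter(letters)                # second, independent iterator over the same list

first = next(it1)                  # it1 advances to "a"
second = next(it1)                 # it1 advances to "b"
other = next(it2)                  # it2 is unaffected and still starts at "a"

print(first, second, other)        # a b a

same_object = iter(it1) is it1     # iter() on an iterator returns the iterator itself
print(same_object)                 # True

remaining = list(it1)              # consuming it1 only yields what is left
print(remaining)                   # ['c']
```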


Python has a [protocol](https://www.pythonmorsels.com/iterator-protocol/#the-iterator-protocol) guiding how an iterator should behave:

- The built-in `next()` function is called on the iterator to fetch the next item. 
- There is no *done()*. Instead, the iterator raises a `StopIteration` exception (i.e. error) to signal that it has been exhausted.

which makes the usage look like:

In [2]:
def do_iteration_over(iterator):
    while True:
        try:                                                 # try-except statements are used to handle exception (i.e. error) in Python
            item_in_iterable = next(iterator)

            # ... (do something with the item_in_iterable)

        except StopIteration:                                # things inside except will run when a specific type of error is found.
                                                             # In this case, it is looking for the type of error called "StopIteration" 
            break
            

But this syntax is obviously over-complicated if all we want is to "do something with the `item_in_iterable`". Actually, there is a much simpler syntax that we have known since day 1 of learning Python:

In [None]:
for item in iterable:
    
    # ... (do something with the item)
    

That is, a [`for` loop in Python](https://docs.python.org/3/tutorial/classes.html#iterators) first converts the iterable into an iterator object, then keeps calling `next()` to fetch items until a `StopIteration` exception is raised. This differs from lower-level languages like C++, whose classic `for` loops simply retrieve items from memory by incrementing or decrementing an index.
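
As a minimal sketch of this protocol, the hypothetical `Countdown` class below (not part of the original tutorial) implements `__iter__()` and `__next__()`, the two special methods that `iter()` and `next()` call under the hood, so that its objects can be used directly in a `for` loop.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself when iter() is called on it
        return self

    def __next__(self):
        # Called by next(); the end is signalled with StopIteration
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


collected = []
for number in Countdown(3):        # the for loop calls iter() once, then next() repeatedly
    collected.append(number)

print(collected)                   # [3, 2, 1]
```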

But why bother creating a whole `iterator` class as a replacement for the index-based `for` loop of lower-level languages? Because we want more control over the iteration process. An index-based loop over a list requires ALL items to be in memory before any of them is examined and processed. Problems occur when the items are huge; for example, in an image processing task with over 10000 images of a few MB each, it may simply be impossible to load them all into memory at the same time.

This is where the iterator comes in. **When the items can be processed independently**, it is much more memory-efficient to load one item, process it, and unload it before loading the next. With a `next()` method, we decide when the next item is loaded. 


In [3]:
# removing the while loop, so item will only be processed when the iterator is told to (and you can control it with another program)

def do_iteration_over(iterator):
    try:                                                 
        item_in_iterable = next(iterator)

        # ... (do something with the item_in_iterable)
        print("processing item", item_in_iterable)

    except StopIteration:
        
        # ... (what to do when no more items are left)
        print("all items have been processed")
        

In [4]:
my_iterable = [3,6,9]

my_iterator = iter(my_iterable)

do_iteration_over(my_iterator)
do_iteration_over(my_iterator)
do_iteration_over(my_iterator)
do_iteration_over(my_iterator)

processing item 3
processing item 6
processing item 9
all items have been processed
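
If the `try`/`except` block feels heavy for this kind of one-step-at-a-time control, the built-in `next()` also accepts an optional second argument, which is returned instead of raising `StopIteration`. The sketch below is one possible variant of the function above; the names `process_next` and `_DONE` are made up for illustration.

```python
_DONE = object()                       # unique sentinel, so that None can still be a valid item


def process_next(iterator):
    """Process one item and report whether an item was available."""
    item = next(iterator, _DONE)       # returns _DONE instead of raising StopIteration
    if item is _DONE:
        print("all items have been processed")
        return False
    print("processing item", item)
    return True


my_iterator = iter([3, 6, 9])
results = [process_next(my_iterator) for _ in range(4)]
print(results)                         # [True, True, True, False]
```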


## 1.2. Generator

This brings us to a generalization: the "generator". The `iter()` function can only convert an iterable into an `iterator` object directly, but it is often handier to preprocess each item of the iterable to get the real items we need. For example, we can store file paths in a list and have the iterator read the corresponding file into memory only when that path is reached. 

Python calls this kind of iterator with preprocessing a `generator`. The syntax for creating a `generator` object is almost the same as writing a normal function, except that the keyword `return` is replaced with `yield`.


In [5]:
def my_generator_func(my_iterable):
    for item in my_iterable:
        
        value_after_process = item**2+1
        
        yield value_after_process              # use yield instead of return

In [6]:
print(my_generator_func)                       # generator function without inputting the iterable = still a normal function object

my_generator = my_generator_func([3,6,9])      # applying generator function on iterable = generator object 
print(my_generator)                            # = iterable + preprocessing + next() method

my_iterator = iter([3,6,9])                    # apply iter() on iterable = iterator object 
print(my_iterator)                             # = iterable + next() method, but no preprocessing


<function my_generator_func at 0x00000238BB85E0C0>
<generator object my_generator_func at 0x00000238BB8529B0>
<list_iterator object at 0x00000238BB863BB0>


In [7]:
do_iteration_over(my_generator)
do_iteration_over(my_generator)
do_iteration_over(my_generator)
do_iteration_over(my_generator)

processing item 10
processing item 37
processing item 82
all items have been processed
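
The file-path scenario mentioned at the start of this section could look roughly like the sketch below. It assumes small text files created in a temporary directory purely for demonstration; `read_files` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def read_files(paths):
    """Yield the content of each file, loading only one file at a time."""
    for path in paths:
        yield Path(path).read_text()   # the file is read only when next() asks for it


with tempfile.TemporaryDirectory() as folder:
    # Create three small demo files (stand-ins for large data files)
    paths = []
    for index in range(3):
        path = Path(folder) / f"data_{index}.txt"
        path.write_text(f"content of file {index}")
        paths.append(path)

    contents = []
    for text in read_files(paths):     # only the paths are held in the list
        contents.append(text)

print(contents)
```

Only the list of paths lives in memory for the whole loop; each file's content is created when the generator reaches it.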


## 1.3. Comprehension

Comprehension is a *syntax* native to Python, which is for converting an iterable into another iterable of related items within one line of code. You may have already seen them before, for example, converting a list into another list, set, or dictionary:

In [8]:
# The list [1,2,3] is converted into [1,4,9]

num_list = [i**2 for i in [1,2,3]]        # use [] for list
print(num_list)

num_set = {i**2 for i in [1,2,3]}          # use {} for set or dict
print(num_set)

num_dict = {i:i**2 for i in [1,2,3]}
print(num_dict)

[1, 4, 9]
{1, 4, 9}
{1: 1, 2: 4, 3: 9}


The comprehension syntax is often preferred over a normal `for` loop because it is more compact and usually faster: the interpreter can avoid some of the overhead involved in building the new list/set/dict step by step. We can compare the time required for the same operation using both syntaxes (exact timings depend on the machine and the Python version):

In [9]:
def comprehension():
    return sum([i * i for i in range(100000)])

def loop():
    res = [None]*100000
    for i in range(100000):
        res[i] = (i**2)
    return sum(res)

%timeit comprehension()
%timeit loop()

10.4 ms ± 272 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.8 ms ± 850 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


However, since everything has to fit into a single expression, the kinds of operations we can use a comprehension for are limited. They usually fall into two kinds: 

- *Map* - Each item in the original iterable is mapped to its corresponding item in the new iterable. 
- *Filter* - Each item is checked with a condition, and only those returning `True` are put into the new iterable.

And of course you can do both.

In [10]:
by_map = [i**2 for i in [1,2,3,4,5,6,7,8]]
print(by_map)

by_filter = [i for i in [1,2,3,4,5,6,7,8] if i > 3]
print(by_filter)

by_map_and_filter = [i**2 for i in [1,2,3,4,5,6,7,8] if i > 3]
print(by_map_and_filter)

[1, 4, 9, 16, 25, 36, 49, 64]
[4, 5, 6, 7, 8]
[16, 25, 36, 49, 64]


You can also use the comprehension syntax to create a `generator` object by simply replacing `[]` or `{}` with `()`. In this usage the comprehension syntax is called a "generator expression".

In [11]:
generate_by_map_and_filter = (i**2 for i in [1,2,3,4,5,6,7,8] if i > 3)
print(generate_by_map_and_filter)

<generator object <genexpr> at 0x00000238BB8535E0>
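
The practical difference from a list comprehension is that a generator expression produces its items lazily and can be consumed only once. A small sketch of both properties (the variable names are illustrative):

```python
import sys

squares_list = [i**2 for i in range(100000)]   # builds all 100000 items immediately
squares_gen = (i**2 for i in range(100000))    # builds nothing until it is consumed

# The generator object stays small regardless of how many items it will produce
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))   # True

total = sum(squares_gen)                       # items are produced and consumed one by one
print(total == sum(squares_list))              # True

leftover = list(squares_gen)                   # a generator can only be consumed once
print(leftover)                                # []
```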


---
# 2. Functional Programming

Remember that everything in Python is in fact an object? This applies to functions as well: we can treat each function as an object and pass it as an argument to another function. 

## 2.1. Built-in function-of-functions

We can find some simple examples of built-in functions in Python that take another function as input:

1. `map()` - Apply the function to each item in an iterable and map them to items in the new iterable.

2. `filter()` - Check each item in an iterable with a function, only those returning `True` are put into the new iterable.

You can think of them as an alternative syntax to using comprehension. But `map` and `filter` will return an `iterator` object instead of a specific iterable type. 


In [12]:
def mapping_function(x):
    return x**2

def filter_function(x):
    return x>3

In [13]:
by_map1 = map(mapping_function, [1,2,3,4,5,6,7,8])
print(by_map1, "-> convert to list ->", list(by_map1))

by_map2 = [mapping_function(i) for i in [1,2,3,4,5,6,7,8]]
print(by_map2)

#################################################
print("\n")

by_filter1 = filter(filter_function, [1,2,3,4,5,6,7,8])
print(by_filter1, "-> convert to list ->", list(by_filter1))

by_filter2 = [i for i in [1,2,3,4,5,6,7,8] if filter_function(i)]
print(by_filter2)

<map object at 0x00000238BB875330> -> convert to list -> [1, 4, 9, 16, 25, 36, 49, 64]
[1, 4, 9, 16, 25, 36, 49, 64]


<filter object at 0x00000238BB875570> -> convert to list -> [4, 5, 6, 7, 8]
[4, 5, 6, 7, 8]
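
Because `map()` and `filter()` both return iterators, they can be chained without building intermediate lists, and like any iterator the result can be consumed only once. A short sketch, redefining the two helper functions so that the block is self-contained:

```python
def mapping_function(x):
    return x**2


def filter_function(x):
    return x > 3


numbers = [1, 2, 3, 4, 5, 6, 7, 8]

# filter first, then map: same result as [i**2 for i in numbers if i > 3]
chained = map(mapping_function, filter(filter_function, numbers))

first_pass = list(chained)     # items are computed only now, when list() consumes the iterator
second_pass = list(chained)    # the iterator is already exhausted

print(first_pass)              # [16, 25, 36, 49, 64]
print(second_pass)             # []
```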


## 2.2. `lambda` function 

The lambda function is also known as the anonymous function in Python. The syntax generally follows:

```python
# create a function object f that can be used later
f = lambda parameter: expression_to_compute

# compute the function over the given input arguments immediately
(lambda parameter: expression_to_compute)(input_arguments)
```

For example,

In [14]:
f1 = lambda x,y: x**2+y         # f1 now refers to the lambda function

print(f1)         
print(f1(2,3))   # and can be used to compute something later

print((lambda x,y: x**2+y)(2,3))       # equivalently, if input are supplied, the results are immediately computed


<function <lambda> at 0x00000238BB85FB00>
7
7


In [15]:
# See what you get if you use an equivalent regular function

def f(x,y):
    return x**2+y

f2 = f            # f2 now refers to the regular function

print(f2)
print(f2(2,3))

<function f at 0x00000238BB85FD80>
7


We can see that the biggest difference between a lambda function and a regular function is that the lambda function has no name (`<lambda>` vs `f` in the object name). This means that a lambda function cannot be called again unless we bind it to a variable when it is created. 

So in what situations would we use a lambda function? Some people find them convenient when they need a short, one-off function to pass as an argument or to return as a result. For example,


In [16]:
def what_to_do(condition):
    if condition:
        return lambda x: x*2            # We are pretty sure that we won't call this function outside of what_to_do()
                                        # Also the function is too short. Don't want to write a whole def f(...) block of code just for this
    else:
        return lambda x: x/2

#########################################################
my_condition = False
action = what_to_do(my_condition)       # use my_condition to determine which function to be the next action
                                        # but without really executing it

print(action(15))            # then we can save this function for later use
print(action(24))


7.5
12.0
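
Another very common use is passing a lambda as the `key` argument of built-ins such as `sorted()` and `max()`, or as the function given to `map()`. A short sketch with made-up `(name, score)` records:

```python
students = [("Tom", 72), ("Eric", 95), ("Amy", 88)]   # hypothetical (name, score) records

by_score = sorted(students, key=lambda record: record[1])     # sort by the score
top_student = max(students, key=lambda record: record[1])     # record with the highest score
names = list(map(lambda record: record[0], students))         # extract the names only

print(by_score)      # [('Tom', 72), ('Amy', 88), ('Eric', 95)]
print(top_student)   # ('Eric', 95)
print(names)         # ['Tom', 'Eric', 'Amy']
```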


Just remember that you can always fall back on the `def` syntax to define a function if you are not comfortable with the `lambda` syntax.

## 2.3. Decorator

A decorator is a fancy name for a "function that takes the original function and returns a modified one"; people sometimes refer to it as a wrapper function, function modifier, etc. That is, we can use a decorator to add extra functionality to our original function. And because these additions live in a function separate from the original one, they can be reused in other code as well. For example,

In [17]:
# define the decorator function
def decorate_this_function(func):
    
    def f(*args, **kwargs):                                     # define how the original function should be modified
                                                                # *args and **kwargs represent the input of the original function
        
        #####################################
        print("before ", func, "starts")                        # for example, here modifies the original function by printing 2 lines 
        result = func(*args, **kwargs)                          # before and after the original function runs
        print("after", func, "finished")
        #####################################
        return result                                           # hand the original function's return value back to the caller

    return f                                                    # return the modified version

In [18]:
# These two expressions are equivalent

###########################################################
# expression 1 - add the decorator using @ before the original function 
@decorate_this_function
def my_func1(user):
    print(user, "is running the function")

    
###########################################################
# expression 2 - execute the decorator over the original function 
def my_func2(user):
    print(user, "is messing with the function")

my_func2 = decorate_this_function(my_func2)

###########################################################
my_func1("Tom")
print("\n")
my_func2("Eric")

before  <function my_func1 at 0x00000238BB894860> starts
Tom is running the function
after <function my_func1 at 0x00000238BB894860> finished


before  <function my_func2 at 0x00000238BB8949A0> starts
Eric is messing with the function
after <function my_func2 at 0x00000238BB8949A0> finished
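
A decorated function usually still needs to hand back whatever the original function returns, and it is common to keep the original name and docstring on the wrapper with `functools.wraps`. A minimal sketch, using a hypothetical `timed` decorator that is not part of the example above:

```python
import functools
import time


def timed(func):
    """Hypothetical decorator that reports how long each call takes."""

    @functools.wraps(func)                 # keep the original name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)     # run the original function
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f} s")
        return result                      # pass the original return value back to the caller

    return wrapper


@timed
def add(x, y):
    """Add two numbers."""
    return x + y


answer = add(2, 3)
print(answer)            # 5
print(add.__name__)      # add
```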


We have already seen a decorator in use in dynamic programming. Remember the `lru_cache` decorator from the `functools` module:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def my_recursive_func(input1, input2, input3):
    ...  # function body omitted
```

You can imagine that what it does is roughly like replacing the two `print()` lines in the above example with
- `print("before ", func, "starts")` $\rightarrow$ "create a dictionary as the cache" + "check whether the corresponding output is already present in the cache".
- `print("after", func, "finished")` $\rightarrow$ "save the value to the cache".
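
To illustrate the idea, here is a deliberately simplified sketch of such a caching decorator. It is not the actual implementation of `lru_cache`: the name `memoize` is hypothetical, and the sketch only handles positional, hashable arguments and never evicts anything.

```python
import functools


def memoize(func):
    """Simplified stand-in for lru_cache(maxsize=None); positional, hashable arguments only."""
    cache = {}                             # maps argument tuples to previously computed results

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:              # "before": check whether the result is already known
            cache[args] = func(*args)      # "after": save the newly computed result
        return cache[args]

    return wrapper


calls = []                                 # records every time the function body really runs


@memoize
def fib(n):
    calls.append(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30))        # 832040
print(len(calls))     # 31
```

Each value of `n` from 0 to 30 is computed exactly once; without the cache, the same recursion would run the function body over a million times.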

Decorators are used extensively in advanced Python projects, such as web servers and parallel programming tasks; you will see them everywhere. But the idea is always similar: they contain the code that defines pre-processing or post-processing tasks. Using decorators helps us separate the main task from these peripheral tasks, making our code easier to manage.
