# ***How to Write "for" in a More Pythonic Way***
---
### About:
A **for loop** repeats a block of code once for **each item of a sequence or other iterable**. It is one of the most commonly used and important structures in Python.\
However, in this tutorial we will look at more **pythonic** ways to write it, using **as few explicit "for" loops as possible** to achieve the same result.

## How a "for loop" works
Before we explain how to write elegant, pythonic code, let's first understand how a "for loop" works.

#### Here are some simple for loops:

In [None]:
"""    List    """
lst = [0, 2, 4, 6, 8]
for item in lst:
    print(item)
# Intuitive.

"""    Dictionary    """
dic = {"a": 0, "b": 2, "c": 4, "d": 6, "e": 8}
for item in dic:
    print(item)
# Why can dictionary use the "for...in..." structure?

"""    File    """
with open("demo_text.txt", "r", encoding="utf8") as file:
    for item in file:
        print(item)
# Even a file object works. Why?

#### To answer these questions, we need to understand two terms:
- [***iterable***](https://docs.python.org/3/glossary.html#term-iterable): An object capable of returning its members one at a time. (e.g. list, str, tuple, dict, and file objects)
- [***iterator***](https://docs.python.org/3/glossary.html#term-iterator): An object representing a stream of data.

In other words:
- *iterable*: Something that can be iterated over. (It must **implement** the ***\_\_iter__*** method.)
- *iterator*: Something that performs the iteration. (It must **implement** the ***\_\_next__*** method.)
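As a quick check of the two terms, the standard library's `collections.abc` module provides abstract classes that `isinstance` can test against. This is only a small sketch; the variable names are made up for illustration.

```python
from collections.abc import Iterable, Iterator

numbers = [0, 2, 4, 6, 8]
numbers_iterator = iter(numbers)

# A list can be iterated over, but it is not itself an iterator
print(isinstance(numbers, Iterable))           # True
print(isinstance(numbers, Iterator))           # False

# The object returned by iter() is an iterator, and it is iterable too
print(isinstance(numbers_iterator, Iterator))  # True
print(isinstance(numbers_iterator, Iterable))  # True
```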

#### Then we also need to understand two methods:
- [***\_\_iter__***](https://docs.python.org/3/library/stdtypes.html#iterator.__iter__): Returns an iterator object, on which \_\_next__ can then be called.
- [***\_\_next__***](https://docs.python.org/3/library/stdtypes.html#iterator.__next__): Returns the next item from the iterator, and raises StopIteration when no items are left.

We can now combine these ideas with the for...in... structure to understand what actually happens.

##### Let's do it with the for...in... structure first

In [None]:
# First we create an iterable. Here we take a list as an example
iterable = [0, 2, 4, 6, 8]

# Then the for loop automatically creates an iterator for the iterable.
# Finally it advances the iterator step by step and assigns each value to i
for i in iterable:
    print(i)

##### Next, let's do the same thing by hand with the built-in iter and next functions

In [None]:
# First we create an iterable. Here we take a list as an example
iterable = [0, 2, 4, 6, 8]

In [None]:
# Then we have to create an iterator for it.
"""
The iter() function calls the iterable's __iter__ method and returns an iterator object.
Then we store it in the iterator variable.
"""
iterator = iter(iterable)

In [None]:
# Then we can start iterating. (Click "Execute cell" repeatedly and see what happens)
"""
The next() function calls the iterator's __next__ method.
Once every item has been returned, it raises StopIteration.
"""
i = next(iterator)
print(i)

##### Yeah! We've manually done the same work that a for loop does automatically when it iterates over an iterable!
And that's what the for...in... structure is really doing.
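To make this concrete, here is a rough sketch of a for loop rewritten as a while loop. It is an approximation for illustration, not the interpreter's exact implementation, and the `collected` list is added only so the result can be inspected afterwards.

```python
iterable = [0, 2, 4, 6, 8]
collected = []

# Roughly what "for i in iterable:" does behind the scenes
iterator = iter(iterable)        # calls iterable.__iter__()
while True:
    try:
        i = next(iterator)       # calls iterator.__next__()
    except StopIteration:
        break                    # no items left, so the loop ends
    print(i)                     # the loop body
    collected.append(i)
```

The for loop catches StopIteration for us, which is why we never see that exception in ordinary loops.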

At this point, besides learning how Python's **for loop** works under the hood, we have also learned what an **iterable** and an **iterator** are, and the roles of the **\_\_iter__ and \_\_next__** methods.

#### Next, let's talk about an application (**advanced; beginners can skip this part**)

##### Implement a linked list that can be used in a for loop
For the linked list to work in a for loop, we must implement its \_\_iter__ method ourselves.

In [None]:
# Node iterator (needs to implement the __next__ method)
class NodeIter():
    def __init__(self, node) -> None:
        self.current_node = node

    def __next__(self):
        if self.current_node is None:
            # No more nodes: the end of the list has been reached
            raise StopIteration
        # Return the current node and advance to the next one
        node, self.current_node = self.current_node, self.current_node.next
        return node

    def __iter__(self):
        return self

# Define node iterable (needs to implement the __iter__ method)
class Node():
    def __init__(self, name) -> None:
        self.name = name
        self.next = None
    def __iter__(self) -> NodeIter:
        return NodeIter(self)

# Define relationships between nodes
node1 = Node("n1")
node2 = Node("n2")
node3 = Node("n3")
node4 = Node("n4")
node5 = Node("n5")
node1.next = node2
node2.next = node3
node3.next = node4
node4.next = node5

# main
for n in node1:
    print(n.name)

You may have noticed that **NodeIter also implements \_\_iter__**.

The first example would run without it, because the for loop only calls \_\_iter__ on the Node. However, **the Python documentation requires iterators to implement \_\_iter__ as well** (returning themselves), so that an iterator can be used wherever an iterable is expected.

A case is given below:

In [None]:
iterator = iter(node1)
# Start at node3
next(iterator)
next(iterator)

for n in iterator:  # The iterator must be iterable, otherwise it can't be put into a for loop
    print(n.name)
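
For contrast, here is a small sketch of what happens when an iterator leaves out \_\_iter__. The `CountdownWithoutIter` class is a hypothetical example written for this illustration; it is not part of the linked list above.

```python
class CountdownWithoutIter:
    """Hypothetical iterator that implements __next__ but not __iter__."""

    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


countdown = CountdownWithoutIter(3)
print(next(countdown))  # calling next() directly still works: 3

error_message = None
try:
    for value in countdown:  # the for loop needs __iter__, which is missing
        print(value)
except TypeError as error:
    error_message = str(error)
    print(f"TypeError: {error}")
```

The for loop fails before producing a single value, which is why implementing \_\_iter__ on the iterator is worth the two extra lines.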

## Writing "for" in a pythonic way
Now that we understand how the for loop works, we can look at more pythonic ways of writing "for".

The following code is presented as a series of **cases**. Each case is **explained in turn**, followed by the style that I **recommend**.

#### Preparation
Before we start, we need a small helper built on the timeit function to measure how long the code takes to run.

In [None]:
# Here are the test tools we need
from timeit import timeit
from dis import dis

# The time required to run the function stmt 100,000 times
def time_required(stmt):
    print(f"Time to run: {timeit(stmt, number=100000)} s")

#### Case 1: Build lists using iterator

In [None]:
def function1():
    lst = []
    for i in range(1000):
        lst.append(i)
    return lst

def function2():
    return [i for i in range(1000)]

def function3():
    return [*range(1000)]       # or list(range(1000))

time_required(function1)
time_required(function2)
time_required(function3)

##### In this case, both function2 and function3 are faster than function1.
Let's take a look at **function1** first:
1. It creates an empty list.
1. It then creates an iterator for the range, and the for loop takes each value from the iterator and assigns it to i.
1. Finally, it **looks up and calls the append method** to put i into the list.

This is a perfectly correct way to write it. However, since calling a method in Python has a cost (the method must be looked up and called on every iteration), it is not the most efficient way.

Let's take a look at **function2**:\
This style of coding is called **list comprehension**. Its full form is:\
*\[**expression** for item in **iterable** if **condition**\]*

It consists of **three parts**:
1. ***iterable***: Any iterable object. (On each iteration, the current value is assigned to item.)
1. ***condition***: An optional filter that only accepts the items for which it evaluates to True.
1. ***expression***: Determines the value that is stored in the list for each accepted item.

Since a comprehension has a fixed form, CPython does not need to look up the list's append method on every iteration. Instead, it compiles the comprehension to a dedicated bytecode instruction that appends directly, which saves a lot of time.
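One way to see this difference is to disassemble both styles with the `dis` module. The sketch below assumes CPython 3.7 or later; exact opcode names vary between versions, so it only checks for the `LIST_APPEND` instruction and for the `append` name. The helper `disassembly_of` and the two small functions are made up for this illustration.

```python
import io
from dis import dis


def with_append():
    lst = []
    for i in range(3):
        lst.append(i)
    return lst


def with_comprehension():
    return [i for i in range(3)]


def disassembly_of(func):
    """Return the text that dis() would normally print."""
    buffer = io.StringIO()
    dis(func, file=buffer)
    return buffer.getvalue()


loop_bytecode = disassembly_of(with_append)
comprehension_bytecode = disassembly_of(with_comprehension)

# The loop has to load the attribute named "append" before calling it
print("append" in loop_bytecode)
# The comprehension uses a dedicated LIST_APPEND instruction instead
print("LIST_APPEND" in comprehension_bytecode)
```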

If we change the \[ \] to \( \), we get a **generator expression**, which creates a **generator**. You can think of a generator as a special iterator that produces its values one at a time, only when they are requested. Its form is:\
\(**expression** for item in **iterable** if **condition**\)
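
A short sketch of how such a generator behaves: values are produced only when requested, and once consumed they are gone. The variable names are illustrative.

```python
squares = (n * n for n in range(5))

first = next(squares)       # 0, computed only now
second = next(squares)      # 1
remaining = list(squares)   # [4, 9, 16], the values not yet consumed
leftover = list(squares)    # [], the generator is exhausted

# A generator can be passed straight to a function that consumes an iterable
total = sum(n * n for n in range(5))
print(first, second, remaining, leftover, total)
```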


Finally, let's look at **function3**:\
It unpacks the iterable directly using the *iterable syntax. No Python-level loop body runs for each item, so it has the least overhead of the three.

##### ***So, should I always use the function3 way of writing?***
The answer is **no**. Because function3 unpacks the iterable directly, we have no chance to transform or filter the values as they are extracted. Therefore, unless you are simply turning an existing iterable (such as a range) into a list unchanged, **function2 is still the best choice**.

#### Case 2: Built-in function

In [None]:
# Take finding the smallest number in the list as an example
lst = [*range(1000,0,-1)]

def homemade_fn():
    min_val = lst[0]
    for i in lst:
        if min_val > i:
            min_val = i
    return min_val

def builtin_fn():
    return min(lst)

time_required(homemade_fn)
time_required(builtin_fn)

##### In this case, builtin_fn is faster than homemade_fn.
To find the minimum value in a list, the only way is to compare the values one by one. Therefore, the algorithm in both builtin_fn and homemade_fn is **already as simple as it can be**.

The main reason for the time difference between the homemade function and the built-in function is **the overhead of interpreting bytecode** when running Python code.

We can view the CPython bytecode of the two functions with the ***dis*** module.

In [None]:
print("homemade_fn bytecode:")
dis(homemade_fn)
print("builtin_fn bytecode:")
dis(builtin_fn)

The difference is clear at a glance. builtin_fn consists of only a few instructions, while homemade_fn has to run a whole block of instructions for every item, so builtin_fn is naturally faster.

##### ***So, should I use built-in functions instead of making my own functions as much as possible?***
**Of course!** Although built-in functions still have to fetch the objects one by one, their **loops, variable operations, and variable storage are all implemented at the C level**, which ***saves resources and time***.
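
The same idea applies to other built-ins, for example `sum` and `max`. The sketch below only shows that the handwritten loop and the built-in give the same result; the function names are made up, and no timings are claimed here.

```python
lst = [*range(1000, 0, -1)]


def homemade_sum():
    total = 0
    for i in lst:
        total += i
    return total


def builtin_sum():
    return sum(lst)


print(homemade_sum() == builtin_sum())  # True
print(max(lst))                         # 1000
```

Both functions can be passed to a timing helper in the same way as the min example to compare them on your own machine.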

#### Case 3:  Any & All

In [None]:
# Look for an item or condition in a list and return true if present
lst = [*range(0,3001, 5)]

def myfind():
    for i in lst:
        if i > 2048:
            return True
    return False

def any_find():
    return any(i > 2048 for i in lst)

time_required(myfind)
time_required(any_find)

In [None]:
# Look for an item or condition in a list and return false if present
lst = [*range(0,3001, 5)]

def myfind():
    for i in lst:
        if i > 2048:
            return False
    return True

def all_find():
    return all(i <= 2048 for i in lst)

time_required(myfind)
time_required(all_find)

##### In this case, both any_find and all_find are **slower** than myfind.
Could this be a Python bug?\
Of course not. The reason for the slowness is that the expression inside any or all (i.e. the **generator**) still **runs at the Python level**. On top of the comparison itself, any_find and all_find have to **create a generator and resume it for every item**, which myfind does not. Thus, myfind is faster than any_find and all_find.

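Also, when a condition has to be evaluated, pass a **generator expression** to "any" and "all" rather than a list comprehension. A list comprehension builds the entire list before "any" or "all" sees the first item, which adds overhead and throws away the early exit.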
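
A small sketch of the difference, counting how often the condition is evaluated. The `is_large` helper and the `calls` dictionary are hypothetical names used only for this illustration.

```python
lst = [*range(0, 3001, 5)]
calls = {"generator": 0, "list": 0}


def is_large(value, label):
    calls[label] += 1      # record every evaluation of the condition
    return value > 2048


# Generator expression: any() stops at the first True value
found_lazy = any(is_large(i, "generator") for i in lst)

# List comprehension: every item is evaluated before any() starts
found_eager = any([is_large(i, "list") for i in lst])

print(found_lazy, found_eager)  # True True
print(calls)                    # {'generator': 411, 'list': 601}
```

Both calls return the same answer, but the generator version stops at 2050, the first value above 2048, while the list version evaluates all 601 items.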

In fact, **without a generator**, any and all are faster than a for loop. Here is an example:

In [None]:
# Look for a falsy item in the list and return False if one is present
lst = [True] * 3000

def myfind():
    for i in lst:
        if not i:
            return False
    return True

def all_find():
    return all(lst)

time_required(myfind)
time_required(all_find)

##### ***So should I still use the built-in "any" or "all" functions?***
**It's up to you**, but I still **recommend using the "all" and "any"** functions. The code is more readable and more concise than a for loop.

#### Case 4: Build lists using filter

In [None]:
# Find and return all items in the list less than 300
lst = [*range(1000)]

def is_smaller_than_300(num: int|float) -> bool:
    return num < 300

def function1():
    result = []
    for i in lst:
        if is_smaller_than_300(i):
            result.append(i)
    return result

def function2():
    return [i for i in lst if is_smaller_than_300(i)]

def function3():
    return [*filter(is_smaller_than_300, lst)]

time_required(function1)
time_required(function2)
time_required(function3)

##### In this case, both function2 and function3 are faster than function1.
We can compare this case with case 1. The only difference is an **added condition** that checks whether the item is less than 300.

Thus, we skip the explanation of function1 and function2.

Let's look at **function3**:\
Here we use a **filter** object. It takes the values of lst one by one, tests each with is_smaller_than_300, and **returns an iterator** over the values that pass.

Since this **iterator is evaluated lazily**, creating the filter is actually **very fast**. That is, most of the time in function3 is spent on unpacking, which is when the items are actually tested. Let's take a look at the performance of the **filter without unpacking**:


In [None]:
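lst = [*range(1000)]
def filter_only():
    # Only creates the filter object; no item is tested until it is iterated
    return filter(is_smaller_than_300, lst)

time_required(filter_only)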

Now you can see how fast creating the filter is!\
In fact, most of the time we only need an **iterator** rather than a list. This is where **filter** shows its advantage.

In addition, if the **first argument** passed to filter is **None**, the filter **drops** every item that evaluates to **False**.
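
A short sketch of the None form, using a made-up `mixed` list. Note that filter does not modify the original list; it yields the surviving items.

```python
mixed = [0, 1, "", "text", None, [], [0], False, True]

# With None as the first argument, only truthy items survive
truthy_only = list(filter(None, mixed))
print(truthy_only)  # [1, 'text', [0], True]

# The equivalent list comprehension
equivalent = [item for item in mixed if item]
print(truthy_only == equivalent)  # True
```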

##### ***So, should I use filter instead of list comprehension as much as possible?***
If you need to return a **list**, then **it's still up to you**: it depends on which style you are used to and which is more readable. However, when you only need to **iterate** over the result afterwards, please consider using **filter**.

#### Case 5: zip

In [None]:
lst = [*range(1, 1000)]
lst2 = [*range(999, 0, -1)]

def myzip_fn():
    result = []
    for i in range(len(lst)):
        result.append((lst[i], lst2[i]))
    return result

def zip_fn():
    return [*zip(lst, lst2)]

time_required(myzip_fn)
time_required(zip_fn)


##### In some cases, we want to **zip two iterables together**, as shown above.
In this case, using the built-in **zip** is **several times faster** than **implementing zip yourself**.

It is worth mentioning that **zip** returns a lazy **iterator**, which means that, as in case 4, the **overhead of creating the zip object** itself is almost **non-existent**.
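
Because the zip object is consumed lazily, it can be used directly in a for loop without building a list first. The sketch below uses made-up data and also shows that zip stops at the shortest input.

```python
names = ["a", "b", "c"]
values = [0, 2, 4]

pairs = zip(names, values)
first_pair = next(pairs)   # ('a', 0), produced on demand

# The remaining pairs can be unpacked directly in a for loop
rest = []
for name, value in pairs:
    rest.append(f"{name}={value}")

# zip stops as soon as the shortest iterable runs out
truncated = list(zip([1, 2, 3], "xy"))

# A common use: building a dictionary from two lists
lookup = dict(zip(names, values))
print(first_pair, rest, truncated, lookup)
```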

##### ***So, should I use zip if possible in this case?***
**Of course!** Whether you judge by readability, running time, or overhead, zip is the **best solution**. So in this case you **should use zip**.
