# Chapter 6: Iteration by Looping

## TL;DR

The `while` and `for` statements are an alternative way (as compared to recursion) to run a code block repeatedly and, thus, control the flow of execution. They are (often) easier to comprehend. However, they add no new expressive power as compared to just using the `def` and `if` statements in a recursion.

## The `while` Statement

Whereas functions combined with `if` statements suffice to model any type of iteration, Python comes with a **compound while-statement** that consists of a header line with a boolean expression followed by an indented code block (= body). Before the first and after every execution of the code block, the boolean expression is evaluated and if equal to `True`, the code block runs (again). Eventually, some variable referenced in the boolean expression is changed in the code block such that the condition becomes `False`. If the condition is `False` before the first iteration, the entire code block is never executed. As the flow of control keeps looping back to the beginning of the code block, this concept is also called a **while-loop**.
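The mechanics described above can be sketched with a small helper. The function `count_iterations()` is a hypothetical name used only for this illustration; it counts how often the body of a `while` loop runs for a given start value.

```python
def count_iterations(start):
    """Count how often the body of a while-loop runs (hypothetical helper)."""
    iterations = 0
    n = start
    # The condition is evaluated before the first and after every run of the body.
    while n > 0:
        n -= 1  # the body changes a variable referenced in the condition
        iterations += 1
    return iterations


print(count_iterations(3))  # the body runs three times
print(count_iterations(0))  # the condition is False right away, the body never runs
```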

#### Simple Example (revisited)

Let's rewrite the previous simple `countdown` example.

In [1]:
def countdown(n):
    """Print a countdown until the party starts.

    Args:
        n (int): seconds until the party begins
    """
    while n > 0:
        print(n)
        n -= 1
    # = base case
    print("Happy new Year!")

In [2]:
countdown(3)

3
2
1
Happy new Year!


As the stack diagram in [PythonTutor](http://pythontutor.com/visualize.html#code=def%20countdown%28n%29%3A%0A%20%20%20%20while%20n%20%3E%200%3A%0A%20%20%20%20%20%20%20%20print%28n%29%0A%20%20%20%20%20%20%20%20n%20%3D%20n%20-%201%0A%20%20%20%20print%28%22Happy%20new%20Year!%22%29%0A%0Acountdown%283%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows, there is a subtle difference in the way a `while` statement is treated in memory: the loop runs within a single function call instead of creating one new frame per step. In short, `while` statements cannot run into a `RecursionError`. In common day-to-day applications this difference is, however, not important.
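To make this difference tangible, here is a minimal sketch that compares a recursive and a looping countdown at a depth beyond the interpreter's recursion limit. Both functions are hypothetical, non-printing variants written only for this comparison, and the chosen depth is an assumption that merely needs to exceed `sys.getrecursionlimit()`.

```python
import sys


def countdown_recursive(n):
    """Count down recursively without printing (hypothetical variant)."""
    if n <= 0:
        return "done"
    return countdown_recursive(n - 1)


def countdown_loop(n):
    """Count down with a while-loop without printing (hypothetical variant)."""
    while n > 0:
        n -= 1
    return "done"


# Choose a depth well beyond the interpreter's recursion limit.
depth = sys.getrecursionlimit() + 1000

try:
    countdown_recursive(depth)
    recursion_failed = False
except RecursionError:
    recursion_failed = True

print(recursion_failed)  # the recursive version exceeds the limit
print(countdown_loop(depth))  # the loop finishes without growing the call stack
```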

#### "Still involved" Example: [Euclid's Algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm) (revisited)

Finding the greatest common divisor of two numbers is still not so obvious when using a `while` loop.

In [3]:
def gcd(a, b):
    """Calculate the greatest common divisor of two numbers.

    Args:
        a (int): first number
        b (int): second number

    Returns:
        gcd (int)
    """
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
    return a

In [4]:
gcd(12, 4)

4

##### Efficiency of Algorithms (revisited)

We can also see that this implementation seems way *less* efficient than its recursive counterpart.

In [5]:
%%timeit -n 1 -r 1
gcd(112233445566778899, 987654321)

4.81 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
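The slowness stems from the repeated subtraction, not from the `while` statement itself. As a sketch, and assuming the remainder-based variant of Euclid's algorithm, a loop can replace many subtractions with a single `%` operation. The name `gcd_mod()` is hypothetical, and [math.gcd()](https://docs.python.org/3/library/math.html#math.gcd) from the standard library is used only to cross-check the result.

```python
import math


def gcd_mod(a, b):
    """Calculate the greatest common divisor with the remainder operator.

    This is a hypothetical variant for illustration; it assumes
    non-negative integers as arguments.
    """
    while b != 0:
        # One remainder operation replaces many repeated subtractions.
        temp = a % b
        a = b
        b = temp
    return a


print(gcd_mod(12, 4))
# Cross-check against the standard library for the big numbers from above.
print(gcd_mod(112233445566778899, 987654321) == math.gcd(112233445566778899, 987654321))
```

With this variant, the number of iterations grows roughly with the number of digits of the inputs instead of with their ratio.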


## Infinite Loops

As with recursion, we must ensure that the iteration process ends. For the above `countdown` example this is trivially true as we start with an arbitrary number that gets decremented by $1$ until it is not positive any more.

#### "Mystery" Example: [Collatz Conjecture](https://en.wikipedia.org/wiki/Collatz_conjecture)

Let's play the following game:
- think of any positive integer $n$
- if $n$ is even, the next $n$ is half the old $n$
- if $n$ is odd, then multiply the old $n$ by $3$ and add $1$
- repeat these steps until we reach $1$

**Do we always reach the final $1$?**

The function below implements this game. Does it always reach $1$? No one has proven it so far!

In [6]:
def collatz(n):
    """Print a Collatz sequence starting at n.

    Start with any positive integer n.
    Then each term is obtained from the previous term as follows:
        - if the previous term is even,
          the next term is half the previous term
        - if the previous term is odd,
          the next term is 3 times the previous term plus 1
    The conjecture is that no matter what is the value of n,
    the sequence will always reach 1.

    Args:
        n (int): a positive number to start the Collatz sequence at
    """
    while n != 1:
        print(n, end=" ")
        # n is even
        if n % 2 == 0:
            n = n // 2  # // used so that n remains an integer (vs. a float)
        # n is odd
        else:
            n = 3 * n + 1
    print(n)

In [7]:
collatz(100)

100 50 25 76 38 19 58 29 88 44 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1


In [8]:
collatz(1000)

1000 500 250 125 376 188 94 47 142 71 214 107 322 161 484 242 121 364 182 91 274 137 412 206 103 310 155 466 233 700 350 175 526 263 790 395 1186 593 1780 890 445 1336 668 334 167 502 251 754 377 1132 566 283 850 425 1276 638 319 958 479 1438 719 2158 1079 3238 1619 4858 2429 7288 3644 1822 911 2734 1367 4102 2051 6154 3077 9232 4616 2308 1154 577 1732 866 433 1300 650 325 976 488 244 122 61 184 92 46 23 70 35 106 53 160 80 40 20 10 5 16 8 4 2 1


In [9]:
collatz(10000)

10000 5000 2500 1250 625 1876 938 469 1408 704 352 176 88 44 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1


In [10]:
collatz(100000)

100000 50000 25000 12500 6250 3125 9376 4688 2344 1172 586 293 880 440 220 110 55 166 83 250 125 376 188 94 47 142 71 214 107 322 161 484 242 121 364 182 91 274 137 412 206 103 310 155 466 233 700 350 175 526 263 790 395 1186 593 1780 890 445 1336 668 334 167 502 251 754 377 1132 566 283 850 425 1276 638 319 958 479 1438 719 2158 1079 3238 1619 4858 2429 7288 3644 1822 911 2734 1367 4102 2051 6154 3077 9232 4616 2308 1154 577 1732 866 433 1300 650 325 976 488 244 122 61 184 92 46 23 70 35 106 53 160 80 40 20 10 5 16 8 4 2 1
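Since nobody knows whether the loop ends for every $n$, a defensive variant can cap the number of iterations. The sketch below is one possible way to do that; the function name `collatz_steps()` and the `max_steps` parameter are assumptions made for this illustration and not part of the example above.

```python
def collatz_steps(n, max_steps=10_000):
    """Count the steps until a Collatz sequence reaches 1 (hypothetical helper).

    Args:
        n (int): a positive number to start the Collatz sequence at
        max_steps (optional, int): upper bound on the number of iterations

    Returns:
        steps (int / NoneType): number of steps, or None if 1 was not
            reached within max_steps iterations
    """
    steps = 0
    while n != 1:
        # Guard against a potentially infinite loop.
        if steps >= max_steps:
            return None
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps


print(collatz_steps(100))  # 25, matching the 26 numbers printed by collatz(100)
print(collatz_steps(100, max_steps=10))  # None, as the cap is hit first
```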


## The `for` Statement

Python provides a shortcut for the following very common pattern where a temporary "index" variable $i$ needs to be kept track of. The `for` statement loops over a sequence of objects. The `for` statement is really only what is called **syntactic sugar**, i.e., something that adds no new functionality but conveniently replaces a "tedious" pattern.

In [11]:
i = 0
while i < 10:
    print(i, end=" ")
    i += 1

0 1 2 3 4 5 6 7 8 9 

For sequences of integers the built-in function [range()](https://docs.python.org/3/library/functions.html#func-range) is useful: It creates a "list"-like object (just like `nums` in previous notebooks). At the beginning of each loop iteration the variable `i` is assigned to the next object in the list.

In [12]:
for i in range(10):
    print(i, end=" ")

0 1 2 3 4 5 6 7 8 9 

### Containers & Iterables

As we have seen before, looping works naturally on "container"-like objects like lists. **Containers** are any objects that are composed of other objects and also "manage" how these objects are organized. Lists, for example, have the property that they remember the order of their elements. There are, however, many other container types in Python as we will see in coming chapters.

**Iterables** are a similar but different concept. Any object that we can "loop over", is by definition an iterable.

Therefore, neither a container nor an iterable is a concrete data type like `int`, `float`, or `str`. Quite the contrary is true: they are **abstract** concepts that may or may not be implemented by any concrete data type.
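Python's standard library exposes these two abstract concepts in the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module. The sketch below uses `isinstance()` to check which concepts a few objects implement. The variable names are chosen only for this illustration, and the built-in `enumerate()` function serves as an example of an object that can be looped over but does not support membership testing.

```python
from collections.abc import Container, Iterable

names = ["Achim", "Berthold"]
numbered = enumerate(names)  # can be looped over, but is not a container

# A list implements both abstract concepts.
list_is_container = isinstance(names, Container)
list_is_iterable = isinstance(names, Iterable)

# An enumerate object is only an iterable.
numbered_is_container = isinstance(numbered, Container)
numbered_is_iterable = isinstance(numbered, Iterable)

# An integer is neither.
int_is_container = isinstance(42, Container)
int_is_iterable = isinstance(42, Iterable)

print(list_is_container, list_is_iterable)  # True True
print(numbered_is_container, numbered_is_iterable)  # False True
print(int_is_container, int_is_iterable)  # False False
```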

Lists like `first_names` below are both: they are iterable containers (they are actually even more than that as we will see in the chapter on lists).

In [13]:
first_names = ["Achim", "Berthold", "Carl", "Diedrich", "Eckardt"]

The characteristic operator associated with a container is the `in` operator which checks if a given object evaluates equal to any of the objects in the container. In layman's terms, it checks if an object is "contained" in the container and this operation is also called **membership testing**.

In [14]:
"Achim" in first_names

True

In [15]:
"Alexander" in first_names

False

Similarly, the characteristic operation of an iterable is that it supports iteration via the `for ... in iterable:` syntax like the example below, where `name` is assigned the elements of the list `first_names` one by one in the same order as they occur in the list itself.

In [16]:
for name in first_names:
    print(name, end="   ")

Achim   Berthold   Carl   Diedrich   Eckardt   

If we need to have an index variable in the loop's body, we can use the built-in function [enumerate()](https://docs.python.org/3/library/functions.html#enumerate).

In [17]:
for i, name in enumerate(first_names):
    print(i, name, sep=" > ", end="   ")

0 > Achim   1 > Berthold   2 > Carl   3 > Diedrich   4 > Eckardt   

The built-in function [zip()](https://docs.python.org/3/library/functions.html#zip) allows us to combine the elements of two or more iterables in a pairwise fashion.

The example below illustrates this:

In [18]:
last_names = ["Müller", "Meyer", "Mayer", "Schmitt", "Schmidt"]

In [19]:
for first_name, last_name in zip(first_names, last_names):
    print(first_name, last_name)

Achim Müller
Berthold Meyer
Carl Mayer
Diedrich Schmitt
Eckardt Schmidt
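One detail worth knowing is that `zip()` stops as soon as the shortest iterable is exhausted. The following minimal sketch illustrates this with two lists of different lengths; the shortened lists and the variable `pairs` are made up for this example.

```python
some_first_names = ["Achim", "Berthold", "Carl"]
some_last_names = ["Müller", "Meyer"]  # one element shorter

pairs = []
for first_name, last_name in zip(some_first_names, some_last_names):
    pairs.append(first_name + " " + last_name)

# "Carl" has no partner and is silently dropped.
print(pairs)  # ['Achim Müller', 'Berthold Meyer']
```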


#### "Hard at first Glance" Example: [Fibonacci Numbers](https://en.wikipedia.org/wiki/Fibonacci_number) (revisited)

In contrast to its recursive relative, the `fibonacci()` function below is somewhat harder to read. However, one advantage of calculating the Fibonacci numbers with a `for` statement is that we could list the entire sequence in ascending order. Note that we do not even need the index variable in the `for` loop (that is what the underscore "\_" indicates).

Here are the first 13 Fibonacci Numbers again:

$0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144$

In [20]:
def fibonacci(i):
    """Calculate the ith Fibonacci number.

    Args:
        i (int): index of the Fibonacci number to calculate

    Returns:
        ith_fibonacci (int)
    """
    a = 0
    b = 1
    for _ in range(i):  # an underscore "_" indicates that we do not need the loop's index variable
        print(a, end=" ")  # line added for didactical purposes
        temp = a + b  # in the chapter on tuples we will see how to avoid using a temporary variable
        a = b
        b = temp
    print(a, end=" ")  # line added for didactical purposes
    return a

In [21]:
fibonacci(12)  # = 13th number

0 1 1 2 3 5 8 13 21 34 55 89 144 

144

##### Efficiency of Algorithms (continued)

Another more important advantage is that now we can calculate even big Fibonacci numbers *very efficiently*.

In [22]:
fibonacci(99)  # = 100th number

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269 2178309 3524578 5702887 9227465 14930352 24157817 39088169 63245986 102334155 165580141 267914296 433494437 701408733 1134903170 1836311903 2971215073 4807526976 7778742049 12586269025 20365011074 32951280099 53316291173 86267571272 139583862445 225851433717 365435296162 591286729879 956722026041 1548008755920 2504730781961 4052739537881 6557470319842 10610209857723 17167680177565 27777890035288 44945570212853 72723460248141 117669030460994 190392490709135 308061521170129 498454011879264 806515533049393 1304969544928657 2111485077978050 3416454622906707 5527939700884757 8944394323791464 14472334024676221 23416728348467685 37889062373143906 61305790721611591 99194853094755497 160500643816367088 259695496911122585 420196140727489673 679891637638612258 1100087778366101931 1779979416004714189 2880067194370816120 4660046610375530309 754011380474634642

218922995834555169026

#### Easy Example: [Factorial](https://en.wikipedia.org/wiki/Factorial) (revisited)

One advantage of calculating the factorial with a `for` statement is that we could track the intermediate result as it "grows" (note that the [range()](https://docs.python.org/3/library/functions.html#func-range) function takes optional *start* and *step* arguments).

In [23]:
def factorial(n):
    """Calculate the factorial of a number.

    Args:
        n (int): number to calculate the factorial of, must be non-negative

    Returns:
        factorial (int)

    Raises:
        TypeError: if n is not an integer type
        ValueError: if n is negative
    """
    if not isinstance(n, int):
        raise TypeError("Factorial is only defined for integers.")
    elif n < 0:
        raise ValueError("Factorial is not defined for negative integers.")
    result = 1  # because 0! = 1
    for i in range(1, n + 1):  # loop starts at 1 as 0! is already covered
        result = result * i
        print(result, end=" ")  # line added for didactical purposes
    return result

In [24]:
factorial(10)

1 2 6 24 120 720 5040 40320 362880 3628800 

3628800
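The optional *start* and *step* arguments of `range()` mentioned above can be sketched as follows. Converting the `range` objects with `list()` is done here only to make their contents visible; the variable names are chosen for this illustration.

```python
# Only a stop value: start defaults to 0 and step to 1.
up_to_five = list(range(5))

# Start and stop: this is the pattern used in factorial() above.
one_to_five = list(range(1, 6))

# Start, stop, and step: every second number.
evens = list(range(0, 10, 2))

# A negative step counts down; the stop value itself is never included.
countdown = list(range(3, 0, -1))

print(up_to_five)  # [0, 1, 2, 3, 4]
print(one_to_five)  # [1, 2, 3, 4, 5]
print(evens)  # [0, 2, 4, 6, 8]
print(countdown)  # [3, 2, 1]
```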

## More Flow Control

### The `break` Statement

Let's say we have a list of names and we think that some famous person by the name of "Marlene Dietrich" is on it somewhere. And maybe some other famous person by the name of "Helene Fischer" could be on it as well. We would like our little program to print whether or not a famous person is on the list.

In [25]:
people = ["Lisa Müller", "Helma Meyer", "Marlene Dietrich", "Anne Schneider", "Berta Becker"]

In order to check if one or both are in the list, we iterate over it and check for each element if the name matches.

A first naive implementation could look like this.

In [26]:
star = "Marlene Dietrich"

on_list = False

for person in people:
    print(person, end="   ")  # line added for didactical purposes
    if person == star:
        on_list = True

if on_list:
    print("Found:", star)
else:
    print("Did not find:", star)

Lisa Müller   Helma Meyer   Marlene Dietrich   Anne Schneider   Berta Becker   Found: Marlene Dietrich


This implementation is rather inefficient: even if the star is at the beginning of the list, we iterate until the very end, which could take very long for a big list. Also, we need to write `if`-`else` logic separate from the `for` loop to check for the final result.

Python provides a `break` statement that lets us stop the `for` loop at any point we want.

In [27]:
star = "Marlene Dietrich"

on_list = False

for person in people:
    print(person, end="   ")  # line added for didactical purposes
    if person == star:
        on_list = True
        break

if on_list:
    print("Found:", star)
else:
    print("Did not find:", star)

Lisa Müller   Helma Meyer   Marlene Dietrich   Found: Marlene Dietrich


This is a computational improvement. However, this code is a bit "weird" in the sense that the `on_list` variable has to be initialized before the `for` loop.

### The `for`-`else` Clause

To express the logic in a prettier way, we can add an `else` clause at the end of the `for` loop, which is only executed if the loop is *not* stopped by a `break` statement, that is, if it runs through to the last element of the iterable. This way, the "expressive" power of our code increases. Not many other programming languages support this `for`-`else` branching, which turns out to be very useful in practice.

In [28]:
star = "Marlene Dietrich"

for person in people:
    print(person, end="   ")  # line added for didactical purposes
    if person == star:
        on_list = True
        break
else:
    on_list = False

if on_list:
    print("Found:", star)
else:
    print("Did not find:", star)

Lisa Müller   Helma Meyer   Marlene Dietrich   Found: Marlene Dietrich


Now, we can incorporate the `if`-`else` logic into the `for` loop very easily and avoid the `on_list` variable altogether.

In [29]:
star = "Marlene Dietrich"

for person in people:
    if person == star:
        print("Found:", star)
        break
else:
    print("Did not find:", star)

Found: Marlene Dietrich


Of course, if we search for "Helene Fischer" we have to iterate to the very end. There is no way to optimize this **[linear search](https://en.wikipedia.org/wiki/Linear_search)**, at least as long as we model the list of names with a Python `list`. More advanced data types, however, exist that help mitigate that downside.

In [30]:
star = "Helene Fischer"

for person in people:
    if person == star:
        print("Found:", star)
        break
else:
    print("Did not find:", star)

Did not find: Helene Fischer
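As a hedged sketch of what such a data type could look like, the example below uses a `set`, which is an assumption as the text does not say which types it has in mind. It also relies on the `in` operator from the section on containers. On a `list`, `in` still searches linearly behind the scenes, whereas a `set` can typically answer a membership test without looking at every element. The helper `report()` is a hypothetical name.

```python
people = ["Lisa Müller", "Helma Meyer", "Marlene Dietrich", "Anne Schneider", "Berta Becker"]

# Assumption: a set serves as one example of a "more advanced" data type.
people_set = set(people)


def report(star, names):
    """Return the same messages as the loops above (hypothetical helper)."""
    if star in names:  # membership test, no explicit loop needed
        return "Found: " + star
    return "Did not find: " + star


print(report("Marlene Dietrich", people))  # works on the list ...
print(report("Helene Fischer", people_set))  # ... and on the set
```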


### The `continue` Statement

Often, we are given some iterable of numeric data (like a list of integers or a CSV file with many rows and columns) and then need to perform some operation on each element after possibly throwing away some of the data we do not want in our calculations (e.g., think of statistical outliers). `for` loops combined with `if` statements seem like a natural way to tackle such problems.

Take the following computational task as an example:

#### Example: One Simple Filter

Calculate the sum of all even and transformed numbers in the list of integers from $1$ through $12$:
- **"all"**: iterate from beginning to end of a list
- **"even"**: filter out the odd numbers
- **"transformed"**: change each even number according to, for example, $y := x^2 + 1$
- **"sum"**: add up the individual transformed numbers

In [31]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

In [32]:
total = 0

for x in numbers:
    # Only work with even numbers.
    if x % 2 == 0:
        y = (x ** 2) + 1
        print(x, y, sep=" > ", end="    ")  # line added for didactical purposes
        total += y

total

2 > 5    4 > 17    6 > 37    8 > 65    10 > 101    12 > 145    

370

The above code is still easily readable as it only involves two levels of indentation. In general, code gets harder to comprehend the more "horizontal space" it occupies. It is commonly considered good practice to grow a program "vertically" rather than "horizontally".

Consider the next example, whose implementation in code already starts to look "weird".

#### Example: "Nested" Filters

Calculate the sum of every third number (after a transformation) in a list of integers from $1$ through $12$ but only if the number is even:
- **"every third"**: iterate from beginning to end of a list and look at each number's list index
- **"even"**: filter out the odd numbers
- **"transformation"**: change each even number according to, for example, $y := x^2 + 1$
- **"sum"**: add up the individual transformed numbers

In [33]:
total = 0

for i, x in enumerate(numbers, start=1):
    # Only work with every third list element.
    if i % 3 == 0:
        # Only work with even numbers.
        if x % 2 == 0:
            y = (x ** 2) + 1
            print(x, y, sep=" > ", end="    ")  # line added for didactical purposes 
            total += y

total

6 > 37    12 > 145    

182

Of course, one could object and say that we could combine the two `if` statements with the `and` operator as shown in the chapter on conditionals. Then we trade off less horizontal space with a more "complex" `if` logic. An alternative to that is to use Python's `continue` statement which causes a loop to jump back to its top and right into the next iteration skipping the rest of the code block.

See the code fragment below that occupies more vertical space.

In [34]:
total = 0

for i, x in enumerate(numbers, start=1):
    # Only work with every third list element.
    if i % 3 != 0:
        continue
    # Only work with even numbers.
    elif x % 2 != 0:
        continue

    y = (x ** 2) + 1
    print(x, y, sep=" > ", end="    ")  # line added for didactical purposes 
    total += y

total

6 > 37    12 > 145    

182

This is yet another illustration of why programming is an art. The two preceding code fragments do exactly the same thing with identical time complexity. However, the latter is arguably a lot easier for a human to read, at least once the business logic grows beyond just two filters.
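For comparison, the `and`-based variant mentioned earlier can be sketched as follows. It is a minimal illustration under the same assumptions as the examples above (the integers from $1$ through $12$ and the transformation $y := x^2 + 1$) and leaves out the `print()` calls.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

total = 0

for i, x in enumerate(numbers, start=1):
    # Both filters are combined into one condition.
    if i % 3 == 0 and x % 2 == 0:
        total += (x ** 2) + 1

print(total)  # 182, the same result as with the nested filters
```

This keeps the indentation flat but moves the complexity into the condition, which may become hard to read once more filters are added.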

The idea behind the `continue` statement is conceptually similar to the early exit pattern we have seen in the context of functions.

The two examples can be modeled in an even better way as we will see in the chapter on lists and, in particular, in the section on the map, filter, and reduce paradigm.

Both, the `break` and the `continue` statements, and the optional `else` clause are not only supported within `for` loops but also whenever a `while` loop is used.
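The sketch below shows all three constructs inside a `while` loop. The function `first_even_multiple_of_seven()` is a hypothetical example made up for this purpose. Note that the index is incremented *before* the `continue` statement, as the loop would otherwise never advance and run forever.

```python
def first_even_multiple_of_seven(numbers):
    """Find the first even number divisible by 7 (hypothetical helper).

    Returns:
        found (int / NoneType): the number or None if there is no such number
    """
    i = 0
    while i < len(numbers):
        x = numbers[i]
        i += 1  # advance first, otherwise "continue" would cause an infinite loop
        if x % 2 != 0:
            continue  # skip odd numbers
        if x % 7 == 0:
            found = x
            break  # stop at the first match
    else:
        # Only reached if the loop ended without a "break".
        found = None
    return found


print(first_even_multiple_of_seven([3, 7, 10, 14, 28]))  # 14
print(first_even_multiple_of_seven([1, 2, 3]))  # None
```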

### Indefinite Loops

Often, we find ourselves in situations where we do not (and cannot) know ahead of time how often or until which point (i.e., until a certain condition is fulfilled) a code block is to be executed.

#### Example: Guessing a Coin Toss

Let's consider the following "game" where we randomly choose a variable to be either "Heads" or "Tails" and a user of our program has to guess it in advance.

Python provides the built-in function [input()](https://docs.python.org/3/library/functions.html#input) that prints a message to the user and reads in a response as a simple `str` object. We use that to process "dynamic" (= unpredictable) input to our program. Further, we use the [random()](https://docs.python.org/3/library/random.html#random.random) function in the [random](https://docs.python.org/3/library/random.html) module to model a coin toss.

A popular pattern to approach such unpredictable loops is to go with a `while True` loop which on its own would run forever. Then, once a certain event occurs, we `break` out of the loop.

Let's look at a first naive implementation.

In [35]:
import random

In [36]:
random.seed(42)

In [37]:
while True:
    guess = input("Guess if the coin comes up as heads or tails: ")

    if random.random() > 0.5:
        if guess == "heads":
            print("Yes, it was heads")
            break
        else:
            print("Ooops, it was heads")
    else:
        if guess == "tails":
            print("Yes, it was tails")
            break
        else:
            print("Ooops, it was tails")

Guess if the coin comes up as heads or tails: Heads
Ooops, it was heads
Guess if the coin comes up as heads or tails: heads
Ooops, it was tails
Guess if the coin comes up as heads or tails: tails
Yes, it was tails


This version leaves room for improvement in several respects. First, if a user enters something other than "heads" or "tails" (e.g., "Heads" or "Tails"), the program keeps running without telling the user about the mistake. Second, it intermingles the coin tossing with the comparison against the user's input. Such a mixing of unrelated business logic makes a program harder to maintain in the long run.

#### Example: Guessing a Coin Toss (revisited)

Let's refactor the code to be a lot more modular and comprehensible.

First, we divide the logic into two functions `get_guess()` and `toss_coin()` that are controlled from within a `while` loop.

`get_guess()` not only reads in the user's input but also implements a very simple input validation pattern: the [strip()](https://docs.python.org/3/library/stdtypes.html?highlight=__contains__#str.strip) and [lower()](https://docs.python.org/3/library/stdtypes.html?highlight=__contains__#str.lower) methods remove leading and trailing whitespace and ensure that the user can spell the input in whatever way they like (e.g., all upper or lower case). Also, we check if the user entered one of the two valid options and, if not, we return `None` to indicate that.

In [38]:
def get_guess():
    """Process the user's input.
    
    Returns:
        guess (str / NoneType): either "heads" or "tails"
            if the input can be parsed and None otherwise
    """
    guess = input("Guess if the coin comes up as heads or tails: ")
    # Handle frequent cases of "misspelled" user input.
    guess = guess.strip().lower()
    # Return None if the user's input is invalid.
    if guess in ["heads", "tails"]:
        return guess
    return None
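Because `input()` waits for a user to type something, the validation logic is easier to try out when it is separated from the reading of the input. The sketch below does that with a hypothetical helper `normalize_guess()` that takes the raw text as an argument; it is an illustration and not part of the program above.

```python
def normalize_guess(raw):
    """Validate a raw guess without reading user input (hypothetical helper).

    Args:
        raw (str): text as a user might have typed it

    Returns:
        guess (str / NoneType): either "heads" or "tails"
            if the text can be parsed and None otherwise
    """
    # Remove surrounding whitespace and ignore upper/lower case.
    guess = raw.strip().lower()
    if guess in ["heads", "tails"]:
        return guess
    return None


print(normalize_guess("  Heads "))  # heads
print(normalize_guess("TAILS"))  # tails
print(normalize_guess("invalid"))  # None
```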

`toss_coin()` is a very simple function that by default models a fair coin toss.

In [39]:
def toss_coin(p_heads=0.5):
    """Simulate the tossing of a (biased) coin.

    Args:
        p_heads (optional, float): probability that the coin comes up "heads",
            defaults to 0.5, which models a fair coin

    Returns:
        coin (str): "heads" or "tails"
    """
    # random.random() is uniform on [0, 1), so this holds with probability p_heads.
    if random.random() >= 1 - p_heads:
        return "heads"
    return "tails"

Second, we explicitly handle the case where `get_guess()` returns `None` (i.e., the user input is invalid) and show a warning to the user.

In [40]:
random.seed(42)

In [41]:
while True:
    guess = get_guess()
    result = toss_coin()
    # Tell the user about invalid input.
    if guess is None:
        print("Make sure to enter your guess correctly!")
    # With valid input, evaluate the user's guess.
    elif guess == result:
        print("Yes, it was", result)
        break
    else:
        print("Ooops, it was", result)

Guess if the coin comes up as heads or tails: invalid
Make sure to enter your guess correctly!
Guess if the coin comes up as heads or tails: heads
Ooops, it was tails
Guess if the coin comes up as heads or tails: tails
Yes, it was tails


Now, our little program's business logic (i.e., the `if`-`elif`-`else`-logic) is a lot easier to comprehend. Also, we can now easily make changes to the program as a whole. For example, we could make the `toss_coin()` function base the tossing on a probability distribution other than the uniform. In general, a modular architecture usually leads to improved software maintenance.
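One benefit of such a modular design is that each part can be checked on its own. As a rough sketch, the snippet below simulates many tosses of a biased coin and compares the share of "heads" with the requested probability. The names `toss_biased_coin()` and `share_of_heads()`, the number of tosses, and the use of a separate `random.Random` generator are all assumptions made for this illustration.

```python
import random


def toss_biased_coin(p_heads, rng):
    """Simulate one toss of a biased coin (hypothetical variant).

    Args:
        p_heads (float): probability that the coin comes up "heads"
        rng (random.Random): random number generator to draw from
    """
    # rng.random() is uniform on [0, 1), so it is below p_heads
    # with probability p_heads.
    if rng.random() < p_heads:
        return "heads"
    return "tails"


def share_of_heads(p_heads, n_tosses=10_000, seed=42):
    """Estimate the probability of "heads" by simulation."""
    rng = random.Random(seed)  # own generator, makes the result reproducible
    heads = 0
    for _ in range(n_tosses):
        if toss_biased_coin(p_heads, rng) == "heads":
            heads += 1
    return heads / n_tosses


# The estimates should be close to 0.5 and 0.7, respectively.
print(share_of_heads(0.5))
print(share_of_heads(0.7))
```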