**Recursion** is a method of solving problems that involves breaking a
problem down into smaller and smaller subproblems until you get to a
small enough problem that it can be solved trivially. In computer science, recursion
involves a function calling itself. While it may not seem like much on
the surface, recursion allows us to write elegant solutions to problems
that may otherwise be very difficult to program.


We will begin our investigation with a simple problem that you already
know how to solve without using recursion. Suppose that you want to
calculate the sum of a list of numbers such as: $$[1, 3, 5, 7, 9]$$. An
iterative function that computes the sum is shown below. The function
uses an accumulator variable (`total`) to compute a running total of all
the numbers in the list by starting with $$0$$ and adding each number in
the list.

```python
def iterative_sum(numbers):
    total = 0
    for n in numbers:
        total = total + n
    return total

iterative_sum([1, 3, 5, 7, 9])  # => 25
```

Pretend for a minute that you do not have `while` loops or `for` loops.
How would you compute the sum of a list of numbers? If you were a
mathematician you might start by recalling that addition is a function
that is defined for two parameters, a pair of numbers. To redefine the
problem from adding a list to adding pairs of numbers, we could rewrite
the list as a fully parenthesized expression. Such an expression looks
like this:

$$((((1 + 3) + 5) + 7) + 9)$$

We can also parenthesize the expression the other way around,

$$(1 + (3 + (5 + (7 + 9))))$$

Notice that the innermost set of parentheses, $$(7 + 9)$$, is a problem
that we can solve without a loop or any special constructs. In fact, we
can use the following sequence of simplifications to compute a final
sum.

$$
total = (1 + (3 + (5 + (7 + 9))))
$$

$$
total = (1 + (3 + (5 + 16)))
$$

$$
total = (1 + (3 + 21))
$$

$$
total = (1 + 24)
$$

$$
total = 25
$$

How can we take this idea and turn it into a Python program? First,
let’s restate the sum problem in terms of Python lists. We might say that
the sum of the list `numbers` is the first element of the list
(`numbers[0]`) plus the sum of the numbers in the rest of the list
(`numbers[1:]`). To state it in a functional form:

$$sum\_of(numbers) = first(numbers) + sum\_of(rest(numbers))$$

In this equation $$first(numbers)$$ returns the first element of the list
and $$rest(numbers)$$ returns a list of everything but the first element.
This is easily expressed in Python as:

```python
def sum_of(numbers):
    if len(numbers) == 0:
        return 0

    return numbers[0] + sum_of(numbers[1:])

sum_of([1, 3, 5, 7, 9])  # => 25
```

There are a few key ideas in this code sample to look at. First, on line
2 we are checking to see if the list is empty. This check is crucial and
is our escape clause from the function. The sum of a list of length 0 is
trivial; it is just zero. Second, on line 5 our function calls itself!
This is the reason that we call the `sum_of` algorithm recursive. A
recursive function is a function that calls itself.

The diagram below shows the series of **recursive calls** that are
needed to sum the list $$[1, 3, 5, 7, 9]$$. You should think of this
series of calls as a series of simplifications. Each time we make a
recursive call we are solving a smaller problem, until we reach the
point where the problem cannot get any smaller.

![Series of recursive calls adding a list of numbers](figures/sum-list-in.png)

When we reach the point where the problem is as simple as it can get, we
begin to piece together the solutions of each of the small problems
until the initial problem is solved. The diagram below shows the
additions that are performed as `sum_of` works its way backward through
the series of calls. When `sum_of` returns from the topmost problem, we
have the solution to the whole problem.

![Series of recursive returns from adding a list of
numbers](figures/sum-list-out.png)
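
The two diagrams can also be reproduced as text. The sketch below is a variant of `sum_of` that prints each call on the way down and each return value on the way back up. The name `traced_sum_of` and the `depth` parameter are illustrative additions, not part of the original function.

```python
def traced_sum_of(numbers, depth=0):
    """Sum a list recursively, printing each call and each return."""
    indent = '    ' * depth
    print('{}sum_of({})'.format(indent, numbers))
    if len(numbers) == 0:
        result = 0  # base case: the empty list sums to zero
    else:
        # Recursive case: first element plus the sum of the rest.
        result = numbers[0] + traced_sum_of(numbers[1:], depth + 1)
    print('{}returns {}'.format(indent, result))
    return result


traced_sum_of([1, 3, 5, 7, 9])  # => 25
```

The calls are printed with increasing indentation until the empty list is reached; the `returns` lines then appear in the opposite order, each one adding a single element to the result from the call below it.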

Like the robots of Asimov, all recursive algorithms must obey three
important laws:

1.  A recursive algorithm must have a **base case**.
2.  A recursive algorithm must change its state and move toward the
    base case.
3.  A recursive algorithm must call itself, recursively.

Let’s look at each one of these laws in more detail and see how it was
used in the `sum_of` algorithm. First, a base case is the condition
that allows the algorithm to stop recursing. A base case is typically a
problem that is small enough to solve directly. In the `sum_of`
algorithm the base case is an empty list, that is, a list of length 0.

To obey the second law, we must arrange for a change of state that moves
the algorithm toward the base case. A change of state means that some
data that the algorithm is using is modified. Usually the data that
represents our problem gets smaller in some way. In the `sum_of`
algorithm our primary data structure is a list, so we must focus our
state-changing efforts on the list. Since the base case is a list of
length 0, a natural progression toward the base case is to shorten the
list. This is exactly what happens in the `sum_of` algorithm when we
call `sum_of` with a shorter list (`numbers[1:]`).
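
To see why the second law matters, consider a deliberately faulty variant that never shortens its list. In the sketch below, `broken_sum_of` is a hypothetical name used only for this illustration. Because every call receives the same list, the base case is never reached, and Python raises `RecursionError` once the call stack reaches the interpreter's recursion limit.

```python
def broken_sum_of(numbers):
    """Deliberately broken: the list never shrinks."""
    if len(numbers) == 0:
        return 0
    # Bug: passes `numbers` instead of `numbers[1:]`, so no progress is made.
    return numbers[0] + broken_sum_of(numbers)


try:
    broken_sum_of([1, 3, 5])
    outcome = 'finished'
except RecursionError:
    outcome = 'recursion limit exceeded'

print(outcome)  # => recursion limit exceeded
```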

The final law is that the algorithm must call itself. This is the very
definition of recursion. Recursion is a confusing concept to many
beginning programmers. As a novice programmer, you have learned that
functions are good because you can take a large problem and break it up
into smaller problems. The smaller problems can be solved by writing a
function to solve each problem. When we talk about recursion it may seem
that we are talking ourselves in circles. We have a problem to solve
with a function, but that function solves the problem by calling itself!
But the logic is not circular at all; the logic of recursion is an
elegant expression of solving a problem by breaking it down into
smaller and easier problems.

In the remainder of this chapter we will look at more examples of
recursion. In each case we will focus on designing a solution to a
problem by using the three laws of recursion.


Suppose you want to convert an integer to a string in some base between
binary and hexadecimal. For example, convert the integer 10 to its
string representation in decimal as `'10'`, or to its string
representation in binary as `'1010'`. While there are many algorithms to
solve this problem, including the algorithm discussed in the stacks
chapter, the recursive formulation of the problem is very elegant.

Let’s look at a concrete example using base 10 and the number 769.
Suppose we have a sequence of characters corresponding to the first 10
digits, like `CHAR_FOR_INT = '0123456789'`. It is easy to convert a number
less than 10 to its string equivalent by looking it up in the sequence.
For example, if the number is 9, then the string is `CHAR_FOR_INT[9]` or
`'9'`. If we can arrange to break up the number 769 into three
single-digit numbers, 7, 6, and 9, then converting it to a string is
simple. A number less than 10 sounds like a good base case.

Knowing what our base case is suggests that the overall algorithm will
involve three components:

1.  Reduce the original number to a series of single-digit numbers.
2.  Convert each single-digit number to a string using a lookup.
3.  Concatenate the single-digit strings together to form the
    final result.

The next step is to figure out how to change state and make progress
toward the base case. Since we are working with an integer, let’s
consider what mathematical operations might reduce a number. The most
likely candidates are division and subtraction. While subtraction might
work, it is unclear what we should subtract from what. Integer division
with remainders gives us a clear direction. Let’s look at what happens
if we divide a number by the base we are trying to convert to.

Using integer division to divide 769 by 10, we get 76 with a remainder
of 9. This gives us two good results. First, the remainder is a number
less than our base that can be converted to a string immediately by
lookup. Second, we get a number that is smaller than our original and
moves us toward the base case of having a single number less than our
base. Now our job is to convert 76 to its string representation. Again
we will use integer division plus remainder to get results of 7 and 6
respectively. Finally, we have reduced the problem to converting 7,
which we can do easily since it satisfies the base case condition of
$$n < base$$, where $$base = 10$$. The series of operations we have just
performed is illustrated below. Notice that the
numbers we want to remember are in the remainder boxes along the right
side of the diagram.

![Converting an integer to a string in base 10](figures/to-string-base-10.png)
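
The same sequence of divisions can be reproduced with Python's built-in `divmod`, which returns the quotient and the remainder together. This is only a sketch of the arithmetic, written with a loop for brevity; the helper name `division_steps` is an assumption made for illustration.

```python
def division_steps(n, base):
    """Return the (quotient, remainder) pairs produced while reducing n."""
    steps = []
    while n >= base:
        n, remainder = divmod(n, base)
        steps.append((n, remainder))
    return steps


print(division_steps(769, 10))  # => [(76, 9), (7, 6)]
```

The remainders 9 and 6 are the digits to remember, and the final quotient, 7, is the value that satisfies the base case.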


Below is a Python implementation of this algorithm for any base between 2 and 16.

```python
CHAR_FOR_INT = '0123456789abcdef'


def to_string(n, base):
    if n < base:
        return CHAR_FOR_INT[n]

    return to_string(n // base, base) + CHAR_FOR_INT[n % base]

to_string(1453, 16)  # => '5ad'
```

Notice that we check for the base case where `n` is less than
the base we are converting to. When we detect the base case, we stop
recursing and simply return the string from the `CHAR_FOR_INT`
sequence. Otherwise, we satisfy the second and third laws in a single
line: integer division reduces the problem size, and the recursive call
passes the smaller number back to `to_string`.

Let’s trace the algorithm again; this time we will convert the number 10
to its base 2 string representation (`'1010'`).

The diagram below shows that we get the results we are
looking for, but it looks like the digits are in the wrong order.


![Converting the number 10 to its base 2 string
representation](figures/to-string-base-2.png)

The algorithm works correctly because we make the recursive call first
and only then append the string representation of the remainder. If we
swapped the two operands, placing the `CHAR_FOR_INT` lookup before the
`to_string` call, the resulting string would be backward! But by
delaying the concatenation operation until after the recursive call has
returned, we get the result in the proper order. This should remind you
of our discussion of stacks back in the previous chapter.
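
The effect of the ordering is easy to check. The sketch below places the recursive function next to a hypothetical variant, `to_string_swapped`, whose only difference is that the lookup comes before the recursive call in the concatenation.

```python
CHAR_FOR_INT = '0123456789abcdef'


def to_string(n, base):
    if n < base:
        return CHAR_FOR_INT[n]
    # Recursive call first: higher-order digits end up on the left.
    return to_string(n // base, base) + CHAR_FOR_INT[n % base]


def to_string_swapped(n, base):
    if n < base:
        return CHAR_FOR_INT[n]
    # Lookup first: the lowest-order digit ends up on the left.
    return CHAR_FOR_INT[n % base] + to_string_swapped(n // base, base)


print(to_string(10, 2))          # => 1010
print(to_string_swapped(10, 2))  # => 0101
```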


Suppose that instead of concatenating the result of the recursive call
to `to_string` with the string from `CHAR_FOR_INT`, we modified our
algorithm to push each character onto a stack, replacing the recursive
call with a loop. The code for this modified algorithm might look like:

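```python
CHAR_FOR_INT = '0123456789ABCDEF'


def to_string(n, base):
    if n == 0:
        return CHAR_FOR_INT[0]  # otherwise the loop below would yield ''

    stack = []
    while n > 0:
        # n % base is the next digit, lowest-order digit first
        stack.append(CHAR_FOR_INT[n % base])
        n = n // base

    result = ''
    while stack:
        # popping reverses the order, so the highest-order digit comes first
        result = result + stack.pop()
    return result

to_string(1453, 16)  # => '5AD'
```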

Each pass through the first loop pushes a character on the stack.
Returning to the previous example, we can see that after the fourth
pass the stack would look like the diagram below.
Notice that now we can simply pop the characters off the stack and
concatenate them into the final result, `'1010'`.

![Strings placed on the stack during
conversion](figures/recursion-stack.png)
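
As a rough text version of the diagram, the sketch below records a copy of the stack after each push while converting 10 to base 2. The names `stack_snapshots` and `DIGITS` are assumptions made for this example.

```python
DIGITS = '0123456789ABCDEF'


def stack_snapshots(n, base):
    """Return a copy of the stack after each push, lowest digit first."""
    stack = []
    snapshots = []
    while True:
        stack.append(DIGITS[n % base])
        snapshots.append(list(stack))
        n = n // base
        if n == 0:
            break
    return snapshots


for snapshot in stack_snapshots(10, 2):
    print(snapshot)
# ['0']
# ['0', '1']
# ['0', '1', '0']
# ['0', '1', '0', '1']
```

The end of each list is the top of the stack, so popping the final stack yields `'1'`, `'0'`, `'1'`, `'0'`, in that order.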

The previous example gives us some insight into how Python implements a
recursive function call. When a function is called in Python, a **stack
frame** is allocated to handle the local variables of the function. When
the function returns, the return value is left on top of the stack for
the calling function to access. The diagram below
illustrates the call stack after the return statement.

![Call stack generated from
to_string(10, 2)](figures/new-call-stack.png)

Notice that the call to `to_string(2 // 2, 2)` leaves a return value of `'1'`
on the stack. This return value is then used in place of the function
call (`to_string(1, 2)`) in the expression `'1' + CHAR_FOR_INT[2 % 2]`, which
will leave the string `'10'` on the top of the stack. In this way, the
Python call stack takes the place of the stack we used explicitly in our
algorithm above. In our list summing example, you can
think of the return value on the stack taking the place of an
accumulator variable.

The stack frames also provide a scope for the variables used by the
function. Even though we are calling the same function over and over,
each call creates a new scope for the variables that are local to the
function.
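
A small experiment makes this per-call scope visible. In the sketch below, the function `record_locals` and its `seen` list are hypothetical. Every call stores a value in a local variable, recurses, and only afterward reports what its own local variable still holds.

```python
def record_locals(n, seen):
    """Show that each call keeps its own copy of `local_value`."""
    local_value = n * 10  # lives in this call's stack frame only
    if n > 0:
        record_locals(n - 1, seen)  # deeper calls assign their own local_value
    # Back in this frame: local_value was not overwritten by the deeper calls.
    seen.append((n, local_value))


seen = []
record_locals(3, seen)
print(seen)  # => [(0, 0), (1, 10), (2, 20), (3, 30)]
```

Although the deeper calls all assign to a variable named `local_value`, each outer call still sees the value it assigned itself.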


**The Tower of Hanoi** puzzle was invented by the French mathematician
Édouard Lucas in 1883. He was inspired by a legend that tells of a Hindu
temple where the puzzle was presented to young priests. At the beginning
of time, the priests were given three poles and a stack of 64 gold
disks, each disk a little smaller than the one beneath it. Their
assignment was to transfer all 64 disks from one of the three poles to
another, with two important constraints. They could only move one disk
at a time, and they could never place a larger disk on top of a smaller
one. The priests worked very efficiently, day and night, moving one disk
every second. When they finished their work, the legend said, the temple
would crumble into dust and the world would vanish.

Although the legend is interesting, you need not worry about the world
ending any time soon. The number of moves required to correctly move a
tower of 64 disks is $$2^{64}-1 = 18,446,744,073,709,551,615$$. At a rate
of one move per second, that is $$584,942,417,355$$ years! Clearly there
is more to this puzzle than meets the eye.
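
These figures are easy to check, since Python integers have arbitrary precision. The sketch assumes a 365-day year, which is the assumption that reproduces the number of years quoted above.

```python
moves = 2 ** 64 - 1
seconds_per_year = 365 * 24 * 60 * 60  # assumes a 365-day year

print('{:,}'.format(moves))                      # => 18,446,744,073,709,551,615
print('{:,}'.format(moves // seconds_per_year))  # => 584,942,417,355
```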

The animation below demonstrates a solution to the puzzle with four
disks. Notice that, as the rules specify, the disks on each peg are
stacked so that smaller disks are always on top of the larger disks. If
you have not tried to solve this puzzle before, you should try it now.
You do not need fancy disks and poles; a pile of books or pieces of
paper will work.

![An animated solution of the Tower of Hanoi puzzle for four disks](figures/hanoi.gif)

How do we go about solving this problem recursively? How would you go
about solving this problem at all? What is our base case? Let’s think
about this problem from the bottom up. Suppose you have a tower of five
disks, originally on peg one. If you already knew how to move a tower of
four disks to peg two, you could then easily move the bottom disk to peg
three, and then move the tower of four from peg two to peg three. But
what if you do not know how to move a tower of height four? Suppose that
you knew how to move a tower of height three to peg three; then it would
be easy to move the fourth disk to peg two and move the three from peg
three on top of it. But what if you do not know how to move a tower of
three? How about moving a tower of two disks to peg two and then moving
the third disk to peg three, and then moving the tower of height two on
top of it? But what if you still do not know how to do this? Surely you
would agree that moving a single disk to peg three is easy enough,
trivial you might even say. This sounds like a base case in the making.

Here is a high-level outline of how to move a tower from the starting
pole, to the goal pole, using an intermediate pole:

1.  Move a tower of `height - 1` disks to an intermediate pole, using the
    final pole.
2.  Move the remaining disk to the final pole.
3.  Move the tower of `height - 1` disks from the intermediate pole to the
    final pole, using the original pole.

As long as we always obey the rule that the larger disks remain on the
bottom of the stack, we can use the three steps above recursively,
treating any larger disks as though they were not even there. The only
thing missing from the outline above is the identification of a base
case. The simplest Tower of Hanoi problem is a tower of one disk. In
this case, we need move only a single disk to its final destination. A
tower of one disk will be our base case. In addition, the steps outlined
above move us toward the base case by reducing the height of the tower
in steps 1 and 3. Below we present a possible Python solution to the Tower of Hanoi puzzle.


```python
def move_tower(height, from_pole, to_pole, with_pole):
    if height >= 1:
        move_tower(height - 1, from_pole, with_pole, to_pole)
        move_disk(from_pole, to_pole)
        move_tower(height - 1, with_pole, to_pole, from_pole)
```

Notice that the code above is almost identical
to the English description. The key to the simplicity of the algorithm
is that we make two different recursive calls. The first moves all but
the bottom disk on the initial tower to an intermediate pole. Before we
make the second recursive call, we simply move the bottom disk to its
final resting place. Finally, the second call moves the tower from the
intermediate pole to the top of the largest disk. In the code, the base
case is detected when the tower height is 0; in this case there is
nothing to do, so the `move_tower` function simply returns. A tower of
one disk is therefore handled by a single `move_disk` call between two
recursive calls that do nothing. The important thing to remember about
handling the base case this way is that simply returning from
`move_tower` is what finally allows the `move_disk` function to be
called.

If we implement this simple `move_disk` function, we can then illustrate the required moves to solve the problem:

```python
def move_disk(from_pole, to_pole):
    print('moving disk from {} to {}'.format(from_pole, to_pole))
```

Now, calling `move_tower` with the arguments `3, 'A', 'B', 'C'` will give us the output:

```
moving disk from A to B
moving disk from A to C
moving disk from B to C
moving disk from A to B
moving disk from C to A
moving disk from C to B
moving disk from A to B
```
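
The length of that output follows a pattern. Moving a tower of a given height takes two moves of a tower one disk shorter plus one move of a single disk, which works out to `2 ** height - 1` moves in total. The sketch below checks this by collecting the moves in a list rather than printing them. Note that `collect_moves` is a hypothetical variant of `move_tower`, not the function defined above.

```python
def collect_moves(height, from_pole, to_pole, with_pole, moves=None):
    """Return the list of (from, to) moves that solves the puzzle."""
    if moves is None:
        moves = []
    if height >= 1:
        collect_moves(height - 1, from_pole, with_pole, to_pole, moves)
        moves.append((from_pole, to_pole))
        collect_moves(height - 1, with_pole, to_pole, from_pole, moves)
    return moves


# The move count matches 2 ** height - 1 for several small towers.
for height in range(1, 6):
    assert len(collect_moves(height, 'A', 'B', 'C')) == 2 ** height - 1

print(len(collect_moves(3, 'A', 'B', 'C')))  # => 7
```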

Now that you have seen the code for both `move_tower` and `move_disk`, you
may be wondering why we do not have a data structure that explicitly
keeps track of which disks are on which poles. Here is a hint: if you were
going to explicitly keep track of the disks, you would probably use
three `Stack` objects, one for each pole. The answer is that Python
provides the stacks that we need implicitly through the call stack.
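
If you do want to see the disks themselves, the sketch below keeps one Python list per pole and treats it as a stack. This is an illustrative variant under assumed names (`solve_with_poles`, `poles`), with plain lists standing in for `Stack` objects. It also asserts that no larger disk ever lands on a smaller one.

```python
def solve_with_poles(height):
    """Solve the puzzle while tracking which disks sit on which pole."""
    # Disks are numbered by size; the end of each list is the top of the pole.
    poles = {'A': list(range(height, 0, -1)), 'B': [], 'C': []}

    def move_tower(n, from_pole, to_pole, with_pole):
        if n >= 1:
            move_tower(n - 1, from_pole, with_pole, to_pole)
            disk = poles[from_pole].pop()
            # Enforce the rule: never place a larger disk on a smaller one.
            assert not poles[to_pole] or poles[to_pole][-1] > disk
            poles[to_pole].append(disk)
            move_tower(n - 1, with_pole, to_pole, from_pole)

    move_tower(height, 'A', 'B', 'C')
    return poles


print(solve_with_poles(3))  # => {'A': [], 'B': [3, 2, 1], 'C': []}
```

The recursive structure is unchanged; the lists only record what the call stack already guarantees.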
