# Data Structures and Algorithms in Python - Ch.4: Recursion
### AJ Zerouali, 2023/09/14

## 0) Introduction

Linked lists were one of the last chapters in my first programming course in 2007. In 2010, I failed an interview at Ubisoft because I forgot everything I knew about this data structure. I hereby pledge to take my revenge against linked lists in these notes.

**References:**

- Chapter 4 of "Data Structures and Algorithms in Python", by Goodrich, Tamassia and Goldwasser (primary reference, abbreviated [GTG13]).
- Section 15 of "Python for Data Structures, Algorithms, and Interviews!" by Jose Portilla.

**Comments on Portilla's course:**
- Covers 3 topics: Recursion in general, recursion, and dynamic programming. Tail recursion alluded to in Lecture 105 (last in section 15).
- Memoization has to do with caching results, and doesn't seem to be covered in [GTG13].


## 1) Basic examples of recursion

Some basic examples to illustrate the concepts. See the homework problems in the exercises notebook for more.

In the programming context, a recursive function is one that calls itself. For this chain of calls to stop at some point, one needs a base case: an input for which the function returns a value directly, without calling itself, and which every chain of calls eventually reaches. As one might expect, this is the CS counterpart of induction in the mathematical sense.

A useful implementation trick is to remember that the *return* line of the recursive case always involves a call to the function itself (on a smaller input), and this is always a good starting point to decide how to set up the induction and the base case.
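As an additional illustration of the base case / recursive case split (not taken from the references; the name `recursive_sum` is hypothetical), here is a minimal sketch of a recursive sum of a list. The base case is the empty list, and the return line of the recursive case calls the function on a shorter list.

```python
def recursive_sum(values):
    '''Return the sum of a list of numbers, computed recursively.'''
    if len(values) == 0:       # base case: the empty sum is 0
        return 0
    # recursive case: first element plus the sum of the remaining ones
    return values[0] + recursive_sum(values[1:])


print(recursive_sum([3, 1, 4, 1, 5]))  # 14
```

Note that the slice `values[1:]` copies the list at every activation, so this sketch does $O(n^2)$ work overall; passing a start index instead of slicing avoids the copies.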

**Comment:** Recursion seems to be used to get rid of certain loops. Does that indeed improve the time complexity?

### 1.a - The factorial

This is the most elementary and illustrative example. In a first programming course, the factorial example is used to teach loops. In data structures and algorithms, it is used to illustrate the idea of a function calling itself recursively.

Since $n! = n\cdot (n-1)!$ for $n>0$, our factorial function could simply return:

        def factorial(n):
            ...
            return n*factorial(n-1)

It remains to stop this chain of calls with the base case $0!=1$. The recursive implementation of the factorial is therefore:

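```python
def factorial(n):
    '''Recursive factorial of a non-negative integer n.'''
    if n < 0:
        raise ValueError('factorial is undefined for negative integers')
    if n == 0:
        return 1                      # base case: 0! = 1
    else:
        return n*factorial(n-1)       # recursive case: n! = n*(n-1)!
```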

```python
factorial(6)  # returns 720
```

### 1.b - Binary search revisited

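We discussed binary search in section 2.b of part 1 (Algo_Analysis). Instead of relying on a while loop, we can make a recursive call each time we compare the target with the middle element, restricting the search to the half of the array that may still contain the target.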

```python
def binary_search(arr, target, low = 0, high = None):
    '''
        Recursive implementation of binary search.
        :param arr: sorted list of numbers
        :param target: target value to find in arr.
        :param low: lower-bound index for search. 0 by default.
        :param high: upper-bound index. None by default (meaning len(arr)-1).
        :return: index of target in arr if search is successful and False otherwise.
    '''
    # Test against None explicitly: `not high` is also True for high == 0, which
    # would reset the upper bound and can recurse until a RecursionError
    if high is None:
        high = len(arr)-1
    if high < low:
        return False    # empty interval; note False == 0, so check the result with `is False`
    else:
        mid = (high+low)//2
        if target == arr[mid]:
            return mid
        elif target > arr[mid]:
            return binary_search(arr, target, mid + 1, high)   # search the right half
        else:
            return binary_search(arr, target, low, mid - 1)    # search the left half
```
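A possible variant, sketched below under my own conventions (the names `binary_search_rec` and `search` are hypothetical, not from [GTG13]): the bounds are handled by a nested helper so the caller never passes them, and `None` is returned for an unsuccessful search, which avoids the `False == 0` ambiguity. The sketch is cross-checked against the standard library's `bisect.bisect_left`.

```python
from bisect import bisect_left


def binary_search_rec(arr, target):
    '''Return an index of target in the sorted list arr, or None if absent.'''

    def search(low, high):
        # Search the closed interval arr[low..high]
        if low > high:                     # base case: empty interval
            return None
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        if target > arr[mid]:
            return search(mid + 1, high)   # keep the right half
        return search(low, mid - 1)        # keep the left half

    return search(0, len(arr) - 1)


# Cross-check against bisect on a small sorted list of distinct values
data = [2, 3, 5, 7, 11, 13]
for t in range(15):
    i = bisect_left(data, t)
    expected = i if i < len(data) and data[i] == t else None
    assert binary_search_rec(data, t) == expected
print(binary_search_rec(data, 2))   # 0
```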

## 2) Memoization (intro to dynamic programming)

Portilla devoted two interview exercises to memoization/dynamic programming. 

Dynamic programming, in the context of DSA, is an algorithm design technique that breaks the general problem into similar (overlapping) subproblems, solves each subproblem once, and builds upon their solutions to compute the solution of the main problem. This is not exactly the same as the optimal control and reinforcement learning concept of dynamic programming (which is about policy evaluation and then policy improvement). This topic is addressed in section 13.3 of [GTG13], in the context of text processing.

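Memoization consists in storing the results of function calls in a cache (a lookup table), so that a repeated call with the same arguments returns the stored result instead of recomputing it. Portilla doesn't give a formal lecture about this topic, and redirects to the Wikipedia page: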

https://en.wikipedia.org/wiki/Memoization

As this is only an overview of memoization/dynamic programming, I will just provide basic examples of how they are implemented, without much depth (for now).

### 2.a - Example: The factorial function

We gave a recursive implementation of the factorial function in section 1.a. Here we use a cache to store previously computed results; the code is adapted from the Wikipedia page:

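```python
def factorial(n, cache = None):
    # cache[k] holds k! once computed, None otherwise; cache[0] = 0! = 1
    if cache is None or len(cache) <= n:
        cache = [1] + [None]*n
    if n == 0:
        return cache[n]
    elif cache[n] is not None:
        return cache[n]          # previously computed result: no recursion
    else:
        x = factorial(n-1, cache)*n
        cache[n] = x             # store n! before returning it
        return x
```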

```python
factorial(6)  # returns 720
factorial(0)  # returns 1
```
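The standard library also provides memoization as a decorator, `functools.lru_cache` (and `functools.cache` since Python 3.9). A minimal sketch (the name `factorial_memo` is hypothetical): unlike the hand-written version above, whose cache is rebuilt at every top-level call unless a list is passed in, the decorated function keeps its cache between calls.

```python
from functools import lru_cache


@lru_cache(maxsize=None)            # unbounded cache, keyed by the argument n
def factorial_memo(n):
    '''Memoized recursive factorial.'''
    if n == 0:
        return 1                    # base case: 0! = 1
    return n * factorial_memo(n - 1)


print(factorial_memo(6))            # 720
print(factorial_memo.cache_info())  # hit/miss statistics of the cache
```

A subsequent call such as `factorial_memo(7)` computes only one new value and finds $6!$ in the cache.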

### 2.b - Some remarks

Dynamic programming is a popular topic for tech interviews and deserves its own section. [GTG13] discusses it rather late, in section 13.3, after having discussed trees and divide-and-conquer algorithms. Portilla mentions this topic only in passing, and provides the coin change problem as an illustration. In Karimov's DSA course, dynamic programming is also covered rather late, in sections 47 and 48. For these reasons, I will not write more about this topic in this notebook.

## 3) Analyzing the running time of recursions

This is based on section 4.2 of [GTG13]. Before giving some concrete examples of how running times are analyzed in this context, we introduce some CS vocabulary ([GTG13] p.161):
- By an **activation** of the recursive function, we mean a particular *invocation* of this function.
- By **tracing a recursion** we mean chasing the diagram of calls/activations for a given input size *n*.
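To make "tracing" concrete, here is a minimal sketch that records every activation of the factorial function, indented by recursion depth (the helper `traced_factorial` and its `depth` and `log` parameters are hypothetical, added only for illustration).

```python
def traced_factorial(n, depth, log):
    '''Recursive factorial that records each activation in the list log.'''
    indent = '    ' * depth                 # deeper activations are indented more
    log.append(f'{indent}factorial({n})')   # activation starts
    if n == 0:
        result = 1                          # base case
    else:
        result = n * traced_factorial(n - 1, depth + 1, log)
    log.append(f'{indent}return {result}')  # activation ends
    return result


log = []
traced_factorial(3, 0, log)
print('\n'.join(log))
```

For `n = 3` this prints the four nested calls `factorial(3)` down to `factorial(0)`, followed by the returns 1, 1, 2 and 6 in reverse order of depth.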

**Comment:** Having to hunt for the definition of "trace" is precisely what I have always abhorred about CS and engineering: terms are used without an accurate definition, and communication practices are poor.

### 3.a - Example: The factorial function

It is easy to see in the example of section 1.a that:
1) The call *factorial(n)* will entail $(n+1)$ activations of the function.
2) For each individual activation of *factorial(k)*, the body executes a constant number of operations.
3) From the above, the $(n+1)$ activations of $O(1)$ complexity lead to an $O(n)$ running time for *factorial(n)*.
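Point 1) can be checked empirically with a minimal sketch that returns the number of activations along with the result (the helper `count_factorial_activations` is hypothetical).

```python
def count_factorial_activations(n):
    '''Return (n!, number of activations) for the recursive factorial.'''
    if n == 0:
        return 1, 1                     # one activation for the base case
    value, calls = count_factorial_activations(n - 1)
    return n * value, calls + 1         # the current activation adds one


for n in (0, 1, 5, 10):
    print(n, count_factorial_activations(n))   # n + 1 activations each time
```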

### 3.b - Example: Binary search

In this case (see section 1.b), we again have an $O(1)$ running time for each activation. It remains to determine the number of activations for a given $n$ in the worst case, for instance when the target is not in the array. Each activation at least halves the length of the search interval, so the $j$-th activation works on at most $n/2^{j-1}$ elements. The interval is non-empty only while $n/2^{j-1}\ge 1$, i.e. for $j\le \log_2(n)+1$, and one more activation may be needed to detect the empty interval. The number of activations is therefore at most $\lfloor\log_2(n)\rfloor+2$, which gives the $O(\log(n))$ complexity of recursive binary search.
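A minimal sketch (the helper `count_bs_activations` is hypothetical) that counts activations in an unsuccessful search, where the target is larger than every element, and compares the count with the bound $\lfloor\log_2 n\rfloor + 2$: at most $\lfloor\log_2 n\rfloor + 1$ activations on non-empty intervals, plus the final call on an empty one.

```python
import math


def count_bs_activations(arr, target, low, high):
    '''Recursive binary search returning (index or None, number of activations).'''
    if high < low:                      # empty interval: unsuccessful search
        return None, 1
    mid = (low + high) // 2
    if arr[mid] == target:
        return mid, 1
    if target > arr[mid]:
        index, calls = count_bs_activations(arr, target, mid + 1, high)
    else:
        index, calls = count_bs_activations(arr, target, low, mid - 1)
    return index, calls + 1             # the current activation adds one


# Unsuccessful search: the target n is larger than every element of range(n)
for n in (1, 10, 1000, 100000):
    arr = list(range(n))
    _, calls = count_bs_activations(arr, n, 0, n - 1)
    bound = math.floor(math.log2(n)) + 2
    print(n, calls, bound)
    assert calls <= bound
```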

**Comments:** [GTG13] analyze more examples in section 4.2, and mention *amortization* (sec.5.3) and *tree traversal* (Ch.8).

## 4) Issues with recursion

See section 4.3 of [GTG13]. The main takeaway of this section is that algorithms based on recursion can easily end up having exponential running times. The first case they discuss is the element uniqueness problem of section 3.3.3, which we skip here.

An important example of exponential execution time with recursion is a bad implementation of Fibonacci sequence computation (see Ex.3 in the exercises notebook). Suppose we use the following function to compute the $n$-th Fibonacci number $F_n$, defined by $F_0=0$, $F_1=1$ and $F_n=F_{n-1}+F_{n-2}$ for $n\ge 2$:

```python
def fib_rec(n):
    # Naive recursion: each activation with n > 1 makes TWO recursive calls
    if n==0:
        return 0
    elif n==1:
        return 1
    elif n>1:
        return fib_rec(n-1)+fib_rec(n-2)
```

We have two issues with this implementation. The first one, which is more obvious, is that we are repeating a substantial number of elementary computations when we call *fib_rec(n-1)+fib_rec(n-2)* at each step: for instance, *fib_rec(n-2)* is computed once directly and once more inside *fib_rec(n-1)*.
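Memoization (section 2) removes exactly this redundancy. A minimal sketch, assuming a dictionary as the cache (the name `fib_memo` is hypothetical): each $F_k$ is computed once and looked up afterwards, so only $O(n)$ additions are performed.

```python
def fib_memo(n, cache=None):
    '''Return F_n, caching every value computed along the way.'''
    if cache is None:
        cache = {0: 0, 1: 1}            # base cases F_0 = 0, F_1 = 1
    if n not in cache:
        # fib_memo(n - 1) fills the cache, so fib_memo(n - 2) is then a lookup
        cache[n] = fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
    return cache[n]


print(fib_memo(50))   # 12586269025
```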

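The second issue, which is far more concerning, is the rate of growth of the number of calls: it more than doubles each time $n$ increases by 2. One way to see that this growth is exponential: *fib_rec(n)* obtains $F_n$ only by adding up the values 0 and 1 returned by the base cases, so it needs at least $F_n$ activations. Since $F_n>2^{n/2}$ for all $n\ge 7$ (check $n=7,8$ directly, then use $F_{n+2}=F_{n+1}+F_n>(\sqrt{2}+1)\,2^{n/2}>2^{(n+2)/2}$), the running time of *fib_rec(n)* is $\Omega(2^{n/2})$. Its exact growth rate is in fact $\Theta(\varphi^n)$, with $\varphi=(1+\sqrt{5})/2\approx 1.618$.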
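A minimal sketch that counts the activations of the naive recursion (the helper `fib_rec_calls` is hypothetical). The count $c_n$ satisfies $c_0=c_1=1$ and $c_n=1+c_{n-1}+c_{n-2}$, from which one checks by induction that $c_n=2F_{n+1}-1$; for $n=50$ this is about $4\cdot 10^{10}$ calls.

```python
def fib_rec_calls(n):
    '''Return (F_n, number of activations) for the naive recursion fib_rec.'''
    if n <= 1:
        return n, 1                          # base cases F_0 = 0, F_1 = 1
    a, calls_a = fib_rec_calls(n - 1)
    b, calls_b = fib_rec_calls(n - 2)
    return a + b, calls_a + calls_b + 1      # +1 for the current activation


for n in (10, 20, 25):
    value, calls = fib_rec_calls(n)
    print(n, value, calls, calls > 2 ** (n / 2))
```

For instance, computing $F_{20}=6765$ this way takes 21891 activations.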

With a usual for loop, computing $F_{50}$ takes around $5\cdot10^{-5}$ s, while *fib_rec(50)* takes more than 10 min (in fact, at some point it freezes the Python kernel for Jupyter).
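The "usual for loop" mentioned above can be sketched as follows (a minimal version assuming the convention $F_0=0$, $F_1=1$; the name `fib_iter` is hypothetical). It performs $n$ additions, which explains the gap in running time with *fib_rec*.

```python
def fib_iter(n):
    '''Return F_n with a for loop, using n additions.'''
    previous, current = 0, 1          # (F_0, F_1)
    for _ in range(n):
        # shift the window: (F_k, F_{k+1}) -> (F_{k+1}, F_{k+2})
        previous, current = current, previous + current
    return previous                   # after n steps, previous == F_n


print(fib_iter(50))   # 12586269025
```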

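A much better recursive implementation, following [GTG13], returns the pair $(F_n, F_{n-1})$ instead of $F_n$ alone, so that each activation makes a single recursive call. Computing $F_n$ then takes $n$ activations for $n\ge 1$, i.e. $O(n)$ time: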

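```python
def fibonacci(n):
    '''Return the pair (F_n, F_{n-1}) for n >= 1 (for n = 0, the first entry is F_0 = 0).'''
    if n<=1:
        return (n, 0)            # n = 1: (F_1, F_0) = (1, 0); n = 0: first entry F_0 = 0
    else:
        (a,b)=fibonacci(n-1)     # a = F_{n-1}, b = F_{n-2}
        return (a+b, a)          # (F_n, F_{n-1})
```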
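A variant sketch with base case $n=0$, using the convention $F_{-1}=1$ (consistent with $F_1=F_0+F_{-1}$) so that the returned pair is exact for every $n\ge 0$; the names `fib_pair` and `nth_fibonacci` are hypothetical. The wrapper extracts $F_n$ from the pair.

```python
def fib_pair(n):
    '''Return (F_n, F_{n-1}), with the convention F_{-1} = 1.'''
    if n == 0:
        return (0, 1)                # (F_0, F_{-1})
    a, b = fib_pair(n - 1)           # a = F_{n-1}, b = F_{n-2}
    return (a + b, a)                # (F_n, F_{n-1}): one recursive call only


def nth_fibonacci(n):
    '''Return F_n using n + 1 activations of fib_pair.'''
    return fib_pair(n)[0]


print(nth_fibonacci(50))   # 12586269025
```

Since each activation makes a single recursive call, the recursion depth is $n+1$. In CPython, `nth_fibonacci(n)` therefore raises `RecursionError` once $n$ approaches the interpreter's recursion limit (1000 by default, see `sys.getrecursionlimit()`), where the for loop version remains preferable.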