# Lab 8
## Data Structures & Algorithms

## Today
* [Divide-and-conquer refresher](#divide-and-conquer-refresher)
* [Bubble Sort](#motivation-bubble-sort)
* [Improved Bubble Sort](#improved-bubble-sort)
* [Merge Sort](#merge-sort)
* [A Recurrence Relation](#a-recurrence-relation)
* [Master Theorem](#master-theorem)
* [Exercises](#exercises)

# Divide-and-conquer refresher

Divide-and-conquer algorithms are a class of algorithms that solve a problem by:
1. **Divide**: Breaking the problem into smaller, more manageable subproblems.
2. **Conquer**: Solving the subproblems recursively.
3. **Combine**: Merging the solutions of the subproblems to solve the original problem.
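
As a minimal illustration of the three steps (it is not one of the lecture examples), the sketch below finds the maximum of a non-empty list by divide-and-conquer. The function name `dc_max` is a hypothetical choice made for this example.

```python
def dc_max(arr):
    """Return the largest element of a non-empty list via divide-and-conquer."""
    if len(arr) == 1:
        # Base case: a single element is its own maximum
        return arr[0]
    # DIVIDE: split the list into two halves
    mid = len(arr) // 2
    # CONQUER: solve each half recursively
    left_max = dc_max(arr[:mid])
    right_max = dc_max(arr[mid:])
    # COMBINE: the overall maximum is the larger of the two results
    return left_max if left_max >= right_max else right_max


print(dc_max([3, 7, 1, 9, 4]))  # 9
```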

### Examples

You saw several examples of divide-and-conquer algorithms in the lecture. Here is a selection:

- **Merge Sort**: A sorting algorithm that 
    1. divides the array into two halves, 
    2. sorts each half recursively, 
    3. and then merges the sorted halves.
- **Quick Sort**: Another sorting algorithm that: 
    1. selects a pivot element, 
    2. partitions the array around the pivot, 
    3. and then sorts the subarrays recursively.
- **Binary Search**: A searching algorithm that:
    1. compares the target with the middle element of the sorted array,
    2. discards the half that cannot contain the target, 
    3. and repeats recursively on the remaining half.
- **Strassen's Algorithm**: A matrix multiplication algorithm that: 
    1. divides matrices into smaller submatrices, 
    2. multiplies the submatrices recursively (using seven products instead of the usual eight), 
    3. and combines their products efficiently.
- **Karatsuba Multiplication**: A multiplication algorithm for two $n$-digit numbers which reduces the process to three multiplications of $\frac{n}{2}$-digit numbers.
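
To make the binary search description above concrete, here is a minimal recursive sketch. The function name `binary_search` and its `lo`/`hi` parameters are illustrative choices, and the code assumes the input list is already sorted in ascending order.

```python
def binary_search(arr, target, lo=0, hi=None):
    """Return an index of target in the sorted list arr, or -1 if it is absent."""
    if hi is None:
        hi = len(arr) - 1
    if lo > hi:
        # Empty search range: the target is not in the list
        return -1
    mid = (lo + hi) // 2
    if arr[mid] == target:
        return mid
    if arr[mid] < target:
        # The target can only be in the right half
        return binary_search(arr, target, mid + 1, hi)
    # The target can only be in the left half
    return binary_search(arr, target, lo, mid - 1)


print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))  # -1
```

Each call halves the search range, which is where the $O(\log n)$ running time of binary search comes from.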


## Motivation: Bubble Sort

Before we dive into advanced sorting techniques like divide-and-conquer algorithms (e.g., Merge Sort or Quick Sort), it's useful to first understand a simpler, more intuitive sorting method: Bubble Sort.

**Idea of Bubble Sort**: 

*Bubble Sort is a straightforward (i.e. brute force) sorting algorithm that works by repeatedly swapping adjacent elements if they are in the wrong order. The idea is to "bubble up" the largest element to its correct position in each pass.*
1. **Repeated passes**: We iterate through the array $n$ times (where $n$ is the length of the array). 
2. **Pairwise Swapping**: 
    - Compare each adjacent pair of elements.
    - If the first element is greater than the second, swap them.

Effect of Swapping:
- After the first full pass, the largest element moves to the last position.
- After the second full pass, the second largest element is in its correct position, and so on.
- This continues until the array is fully sorted.

Here's an example implementation of bubble sort in Python:

In [None]:
def bubble_sort_brute_force(arr):
    """
    Bubble sort 
    
    Parameters
    ----------
    arr : a list of numbers

    Returns
    ----------
    The list sorted in ascending order
    """
    
    arr_temp = list(arr)
    n = len(arr_temp)
    
    for i in range(n):
        for j in range(n - 1):
            # Get the difference between two adjacent numbers
            diff = arr_temp[j] - arr_temp[j + 1]
            if diff > 0:
                # Swap the two numbers
                arr_temp[j], arr_temp[j + 1] = arr_temp[j + 1], arr_temp[j]               

    return arr_temp

In [None]:
arr_1 = []
arr_2 = [3]
arr_3 = [3, 2]
arr_4 = [3, 2, 1, 4]

print(bubble_sort_brute_force(arr_1))
print(bubble_sort_brute_force(arr_2))
print(bubble_sort_brute_force(arr_3))
print(bubble_sort_brute_force(arr_4))

[]
[3]
[2, 3]
[1, 2, 3, 4]


#### Complexity

The time complexity of bubble sort is $O(n^2)$. 
- In the worst-case scenario, where the input array is in reverse order, bubble sort will need to make $n$ passes through the array, with each pass requiring $O(n)$ comparisons and swaps. 
- So, despite its simplicity, bubble sort is not efficient for sorting large arrays due to its quadratic time complexity.
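
The worst case described above can be traced pass by pass. The sketch below is a variant of the brute-force implementation that prints the list after every pass and counts the comparisons; the name `bubble_sort_trace` is hypothetical and only used for this illustration.

```python
def bubble_sort_trace(arr):
    """Bubble sort that prints the list after every pass and counts comparisons."""
    arr_temp = list(arr)
    n = len(arr_temp)
    comparisons = 0
    for i in range(n):
        for j in range(n - 1):
            comparisons += 1
            if arr_temp[j] > arr_temp[j + 1]:
                # Swap the two adjacent numbers
                arr_temp[j], arr_temp[j + 1] = arr_temp[j + 1], arr_temp[j]
        print(f"after pass {i + 1}: {arr_temp}")
    return arr_temp, comparisons


result, comparisons = bubble_sort_trace([4, 3, 2, 1])
print(comparisons)  # 12, i.e. n * (n - 1) for n = 4
```

On the reversed input, the largest remaining element reaches its final position after each pass, and the last pass makes no swaps at all.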

# Improved Bubble Sort

We can save on some operations: 
- We know that in the first round, the largest number will be moved to the last place of the array. 
- So, in the second round, we do not have to consider the last element in the array as it is the largest element that we moved there in the first round. 
- By extension, in the third round, we can ignore the last two elements.
- ...and so on. 
- To formalise this: in round $i$, we can ignore the last $i-1$ elements:

In [None]:
def bubble_sort_improved(arr):
    """
    Bubble sort 
    
    Parameters
    ----------
    arr : a list of numbers

    Returns
    ----------
    The list sorted in ascending order
    """
    
    arr_temp = list(arr)
    n = len(arr_temp)
    
    for i in range(n):
        # after i passes the last i elements are already in place, so the inner loop skips them
        for j in range(n - i - 1):
            # Get the difference between two adjacent numbers
            diff = arr_temp[j] - arr_temp[j + 1]
            if diff > 0:
                # Swap the two numbers
                arr_temp[j], arr_temp[j + 1] = arr_temp[j + 1], arr_temp[j]               

    return arr_temp

#### Complexity

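NB: this still has running time $O(n^2)$, but it is faster by a constant factor: the inner loop now performs $(n-1) + (n-2) + \dots + 1 = \frac{n(n-1)}{2}$ comparisons in total instead of $n(n-1)$, i.e. about half as many.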
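
To see the difference between the two versions in numbers, the following sketch counts the adjacent comparisons each one makes. The helper `count_comparisons` is hypothetical; it only mirrors the loop bounds of the two implementations above, since the number of comparisons does not depend on the contents of the list.

```python
def count_comparisons(n, improved):
    """Count the adjacent comparisons bubble sort makes on a list of length n."""
    total = 0
    for i in range(n):
        # The improved version skips the last i elements, which are already in place
        inner = n - i - 1 if improved else n - 1
        total += inner
    return total


for n in [10, 100, 1000]:
    print(n, count_comparisons(n, improved=False), count_comparisons(n, improved=True))
```

Both counts grow quadratically with $n$; the improved version simply needs half as many comparisons.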

# Merge Sort

Merge sort is another sorting algorithm that follows the divide-and-conquer approach. It works by:
1. Dividing the array into two halves, 
2. Sorting each half recursively, 
3. Merging the sorted halves.

<div>
   <img src="images/mergesort_viz.png" width="500px" title="mergesort visualisation">
</div>

Here's an example implementation of merge sort in Python:

In [None]:
def merge_sort(arr):
    """
    Merge sort 
    
    Parameters
    ----------
    arr : a list of numbers

    Returns
    ----------
    The list sorted in ascending order
    """
    arr_temp = list(arr)
    n = len(arr_temp)    
    
    if n > 1: 
        # STEP 1: DIVIDE
        # Divide the list into two smaller ones
        # The middle of the list
        mid = n // 2 # using floor division (a.k.a integer division)
        # The left sublist
        arr_temp_left = arr_temp[:mid] 
        # The right sublist
        arr_temp_right = arr_temp[mid:]

        # STEP 2: RECURSIVE CALL (UNTIL N=1)
        # Recursively call merge_sort to sort the two smaller lists
        arr_temp_left = merge_sort(arr_temp_left)
        arr_temp_right = merge_sort(arr_temp_right)
        
        # STEP 3: MERGE  
        # Merge the two sorted smaller lists
        i = j = k = 0
        n_left, n_right = len(arr_temp_left), len(arr_temp_right)
          
        while i < n_left and j < n_right: 
            if arr_temp_left[i] < arr_temp_right[j]: 
                arr_temp[k] = arr_temp_left[i] 
                i += 1
            else: 
                arr_temp[k] = arr_temp_right[j] 
                j += 1
            k += 1
          
        # If there are elements in arr_temp_left that have not been visited 
        while i < n_left: 
            arr_temp[k] = arr_temp_left[i] 
            i += 1
            k += 1
 
        # If there are elements in arr_temp_right that have not been visited 
        while j < n_right: 
            arr_temp[k] = arr_temp_right[j] 
            j += 1
            k += 1
            
    return arr_temp

In [None]:
arr_1 = []
arr_2 = [3]
arr_3 = [3, 2]
arr_4 = [3, 2, 1, 4]

print(merge_sort(arr_1))
print(merge_sort(arr_2))
print(merge_sort(arr_3))
print(merge_sort(arr_4))

[]
[3]
[2, 3]
[1, 2, 3, 4]


### Time Complexity Analysis

Merge Sort has a time complexity of $O(n \log n)$ in all cases, making it a more efficient sorting algorithm than Bubble Sort. 

It achieves this time complexity by dividing the array into halves recursively and merging the sorted halves *efficiently*.

Let's look at this in more detail: remember that for divide-and-conquer algorithms, we use a **recurrence relation** to express the running time.

# A Recurrence Relation
First, let's say $T(n)$ is the worst-case running time of the algorithm for an input of size $n$.

For MergeSort, $T(n)$ can be decomposed into 3 separate parts at each step:
1. we subdivide the problem (in this case into two pieces of size $n/2$),
2. the algorithm then spends $T(n/2)$ on each subproblem,
3. plus some amount of time (in this case $O(n)$) on combining the solutions of the subproblems.

The running time of the MergeSort algorithm therefore satisfies the following recurrence relation:

<div>
   <img src="images/screenshot_mergesort.png" width="300px">
</div>
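
In text form, the recurrence has the following shape. This is a sketch that assumes the standard formulation from the Kleinberg & Tardos textbook, where the constant $c$ bounds the cost of dividing and merging per element; the notation in the image may differ slightly.

$$
T(n) \le 2\,T\left(\frac{n}{2}\right) + cn \quad \text{for } n > 2, \qquad T(2) \le c
$$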

There are different ways of solving such a recurrence relation (i.e. rewriting it so that $T$ appears only on the left-hand side of the inequality). The most intuitive way is to 'unroll' the recurrence and look for patterns in the first few levels:

<div>
   <img src="images/screenshot_rectree_mergesort.png" width="500px">
</div>

#### What's going on here? 

*At every level, the division step takes constant time, $O(1)$, so we can effectively ignore it and focus on the merge and recursion steps!*

1. **Top Level (Entire Array of Size $n$)**
   - The merge operation (combining sorted halves) takes $O(n)$ time *(we just assert this for now)*.
   - We also make two recursive calls to sort the left and right halves.
2. **Second Level (Two Subproblems of Size $n/2$)**
   - Each of the two halves is of size $n/2$.
   - Each half takes at most $O(n/2)$ time for merging.
   - Since there are two such halves, the total time at this level is still $O(n)$.
3. **Third Level (Four Subproblems of Size $n/4$)**
   - Each subproblem is now $n/4$ in size.
   - Each takes at most $O(n/4)$ time for merging.
   - Since there are four such subproblems, the total time at this level remains $O(n)$.
4. **Generalizing to Deeper Levels**
   - At each level of recursion, the total merging time remains $O(n)$ because the sizes of all subproblems at that level always add up to $n$.
   - The recursion continues until we reach base cases (single-element arrays).

#### So at level $j$: 
1. Number of Subproblems at Level $j$
   - At each level of recursion, the number of subproblems **doubles** compared to the previous level.
   - We start with **1 subproblem** (the entire array), so after $j$ levels there are $2^j$ subproblems.

2. Size of Each Subproblem at Level $j$
   - The array is divided in half at every level.
   - After $j$ levels, each subproblem is of size: $\frac{n}{2^j}$:
      - after **one split**, the size is $ n/2 $, 
      - after **two splits**, it's $ n/4 $, 
      - and so on.

3. Total Work Done at Level $j$
   - As before, the division operation is done in constant time, so it drops out. This leaves us just with merging and recursion.
   - Each of the $ 2^j $ subproblems requires at most $ O(n/2^j) $ time for merging.
   - Since there are $ 2^j $ such subproblems, the total work done at level $j$ is: $2^j \cdot \frac{n}{2^j} = n$

**The key point here is that the total work remains $ O(n) $ at *every* level!**

4. Summing Over All Levels
   - The recursion continues until we reach base cases where the subproblem size is **1**.
   - The number of levels required to reduce $ n $ down to **1** is the number of times we can halve $ n $:
   
   $$\log_2 n$$
   
   - Since we do **$ O(n) $ work at each level**, and there are $ O(\log n) $ levels, the total time complexity is:
   $$O(n \log n)$$
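
The level-by-level argument above can be checked numerically. The sketch below tabulates the number of subproblems, their size, and the total merge work for every level; the function name `merge_sort_level_work` is hypothetical, and the code assumes that $n$ is a power of two so that every split is exact.

```python
import math


def merge_sort_level_work(n):
    """List (level, number of subproblems, subproblem size, total work) per level.

    Assumes n is a power of two so that every split is exact.
    """
    levels = int(math.log2(n))
    rows = []
    for j in range(levels + 1):
        subproblems = 2 ** j
        size = n // subproblems
        rows.append((j, subproblems, size, subproblems * size))
    return rows


for row in merge_sort_level_work(16):
    print(row)
# Every level sums to 16; with log2(16) + 1 = 5 levels the total is 80
```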

# Master Theorem

Another way to solve recurrence relations is to use the **Master Theorem**.

The Master Theorem provides a systematic way to analyze the runtime of divide-and-conquer algorithms by expressing their recurrence relation in a general form:

$$
T(n) = aT\left(\frac{n}{b}\right) + O(n^c)
$$

where:
- $ a $ is the number of recursive subproblems created at each step.
- $ b $ is the factor by which the problem size shrinks at each level.
- $ c $ is the exponent of the cost of the merging or combining step, i.e. combining takes $ O(n^c) $ time.

This gives the following properties:
- $1 + \log_b n$ is the number of levels (the **depth**),
- $\frac{n}{b^i}$ is the **size** of subproblems at level $i$,
- $a^i$ is the **number** of subproblems at level $i$.

### General Form of Running Time
Using these properties we can work out a general form of running time for any divide-and-conquer problem.

Following a pattern similar to the **recursion tree analysis**, we sum the work done **at each level** of recursion. This work is determined by the relationship between:
  1. the depth of the recursion, 
  2. the scaling of the subproblems at each level, 
  3. and the number of subproblems produced at each level.

Formally:
- The recursion **depth** is $ \log_b n $ (since the problem size reduces by a factor of $ b $ at each level),
  - where $n$ is the length of the input array.
- At each level $i$, the work done depends on:
  1. **The number of subproblems** at that level. ($a^i$)
  2. **The size of each subproblem** ($\frac{n}{b^i}$) and the time required to process it ($O\left(\left(\frac{n}{b^i}\right)^c\right)$).

These factors determine the recurrence ratio $ r $, which helps classify the running time into one of three cases.

### Classifying Time Complexity with the Master Theorem
Formally, we can define the following:

$$
r = \frac{a}{b^c}
$$

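The work at level $i$ is $a^i \cdot \left(\frac{n}{b^i}\right)^c = n^c \left(\frac{a}{b^c}\right)^i = n^c r^i$ (up to a constant factor), so summing over all levels, the total running time follows:

$$
T(n) = O\left(n^c \sum_{i=0}^{\log_b n} r^i\right)
$$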

which leads to the following cases:

$$
T(n) =
\begin{cases} 
O(n^c) & \text{if } r < 1 \quad (c > \log_b a) \quad \text{(Root-dominated)} \\
O(n^c \log n) & \text{if } r = 1 \quad (c = \log_b a) \quad \text{(Balanced)} \\
O(n^{\log_b a}) & \text{if } r > 1 \quad (c < \log_b a) \quad \text{(Leaf-dominated)}
\end{cases}
$$
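
The three cases can be turned into a small helper. This is a sketch under the assumption that the recurrence has exactly the form $T(n) = aT(n/b) + O(n^c)$; the function name `master_theorem` and its output format are choices made for this example.

```python
import math


def master_theorem(a, b, c):
    """Classify T(n) = a*T(n/b) + O(n^c) using the ratio r = a / b**c."""
    r = a / b ** c
    if math.isclose(r, 1.0):
        return f"O(n^{c} log n) (balanced)"
    if r < 1:
        return f"O(n^{c}) (root-dominated)"
    # r > 1: the exponent is log_b(a)
    return f"O(n^{math.log(a, b):.3g}) (leaf-dominated)"


print(master_theorem(2, 2, 1))  # merge sort: O(n^1 log n) (balanced)
print(master_theorem(1, 2, 1))  # O(n^1) (root-dominated)
print(master_theorem(4, 2, 1))  # O(n^2) (leaf-dominated)
```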

### Root-Dominated, Balanced, and Leaf-Dominated Cases
The Master Theorem helps determine whether an algorithm's time complexity is:
- **Root-dominated**: Work is mainly done at the top level (combining step).
- **Balanced**: Work is evenly distributed across levels.
- **Leaf-dominated**: Work is mainly done at the leaves (smallest subproblems).

## Example Analysis
Consider an algorithm where:
1. The **combining step** takes linear time ($ c = 1 $).
2. The problem is divided into **subproblems** of size $ n/2 $ (i.e., $ b = 2 $).

Thus, we compute:

$$
r = \frac{a}{2}
$$

For different values of $ a $, we get different running times:

| **Number of Subproblems ($ a $)** | **Running Time** | **Classification** |
|--------------|----------------|------------------|
| $ a = 1 $ | $ O(n) $ | **Root-dominated** (Cost dominated by the top level) |
| $ a = 2 $ | $ O(n \log n) $ | **Balanced** (Merge Sort case) |
| $ a > 2 $ | $ O(n^{\log_2 a}) $ | **Leaf-dominated** (Cost dominated by the leaves) |

*NB: in our MergeSort case, $a=2$ and $b^c=2$, so $r=1$ and the algorithm is balanced: it produces two subproblems at each level, but each subproblem's size is halved, so every level does the same amount of work.*

### Intuition Behind These Cases
- **When $ a = 1 $:** 
  - There is only one subproblem at each step and its size halves every time, so the work per level shrinks geometrically ($n + \frac{n}{2} + \frac{n}{4} + \dots \le 2n$) and the total work remains **linear**.
  - The **top-level work (combining step)** dominates the complexity.
  <div>
   <img src="images/screenshot_rectree_a1.png" width="200px">
  </div>  
  
- **When $ a = 2 $ (Merge Sort case):** 
  - Each level does roughly the same amount of work.
  - The recursion **depth is $ \log n $**, and each level takes $ O(n) $, so the total time is $ O(n \log n) $.
  - Work is **evenly distributed across levels**.
  <div>
   <img src="images/mergesort_viz.png" width="500px" title="mergesort visualisation">
  </div>

- **When $ a > 2 $:**
  - The number of subproblems grows **faster than the problem size shrinks**, leading to an increasing workload at the lower levels.
  - The **leaves of the recursion tree** dominate the total complexity.
  <div>
   <img src="images/screenshot_rectree_a3.png" width="500px">
  </div>

(All visualisations of recursion trees are taken from the Kleinberg & Tardos textbook, which also has great further explanations in it!)
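
The three cases can also be seen by listing the work done at each level of the recursion tree. The sketch below uses a hypothetical helper `level_work` and assumes that $n$ is an exact power of $b$, with $b = 2$ and $c = 1$ as in the example analysis.

```python
def level_work(n, a, b=2, c=1):
    """Work a**i * (n / b**i)**c at each level i, for n an exact power of b."""
    work = []
    size = n
    i = 0
    while size >= 1:
        work.append(a ** i * size ** c)
        size //= b  # subproblems shrink by a factor of b per level
        i += 1
    return work


for a in (1, 2, 3):
    print(a, level_work(8, a))
# 1 [8, 4, 2, 1]      shrinking: the root dominates
# 2 [8, 8, 8, 8]      constant: balanced
# 3 [8, 12, 18, 27]   growing: the leaves dominate
```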


## Exercises

### Exercise 1

Extend the bubble sort algorithm from above to make it even more efficient. 

When we pass an already sorted array to the implementation above, it still goes through the array $n$ times, which is unnecessary. In this further improved version, we check after each pass whether the array is already sorted and terminate the algorithm as soon as it is. Implement this according to the following idea:

* remember that we compare each two adjacent elements in the array and do a swap if the first is larger than the second
* this means that if no two elements were swapped in a round, the array is already sorted!
* include a variable in your code that indicates if any swap was made in a round
* then terminate the algorithm once the flag variable has not recorded any swaps
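
To check your implementation, a small test harness can help. The sketch below is not part of the exercise: `check_sort` is a hypothetical helper that compares any sorting function against Python's built-in `sorted()` on random lists.

```python
import random


def check_sort(sort_fn, trials=200, seed=0):
    """Compare sort_fn against Python's built-in sorted() on random lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 20)
        data = [rng.randint(-50, 50) for _ in range(length)]
        if sort_fn(data) != sorted(data):
            return False
    return True


print(check_sort(sorted))  # True: the built-in trivially agrees with itself
```

Once `bubble_sort_optimal` is implemented, calling `check_sort(bubble_sort_optimal)` should return `True`.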

In [15]:
def bubble_sort_optimal(arr):
    """
    Bubble sort 
    
    Parameters
    ----------
    arr : a list of numbers

    Returns
    ----------
    The list sorted in ascending order
    """

    # Implement me

### Exercise 2

Modify the merge sort algorithm to sort the elements in descending order.

In [23]:
def merge_sort_desc(arr):
    """
    Merge sort in descending order
    
    Parameters
    ----------
    arr : list
        A list of numbers
    
    Returns
    -------
    list
        The list sorted in descending order
    """
    # Implement me

### Exercise 3

As discussed in the lecture, an **inversion** in an array is a pair of elements `array[i]` and `array[j]` with `i < j` and `array[i] > array[j]`. E.g. `array = [3,1,2]` has two inversions: `(3,1)` and `(3,2)`. In other words, an inversion is any pair of elements that violates the ascending order of the elements.

Implement an algorithm for counting inversions in a naive way, where you go through every single pair of elements and check if it is an inversion. If it is, increase a counter by 1.

In [2]:
def count_inversions_brute_force(arr):
    """
    Count inversions in an array using the brute-force approach.
    
    An inversion in an array is a pair of elements (arr[i], arr[j]) where i < j
    and arr[i] > arr[j].
    
    Parameters
    ----------
    arr : list
        A list of numbers.

    Returns
    -------
    int
        The number of inversions in the array.
    """
    # Implement me

### Exercise 4

What is the time and space complexity of this algorithm? 

### Exercise 5

Now, implement the counting inversions algorithm so that it runs in $O(n \log n)$ using divide-and-conquer. In the end, it should return the total number of inversions and the sorted input array. Remember what you learnt about this in the lectures and read the very helpful section in the Algorithm Design (Kleinberg & Tardos) textbook (in chapter 5). The following hints for the implementations (and the solutions that will be provided) come from [this](https://www.geeksforgeeks.org/python-program-for-count-inversions-in-an-array-set-1-using-merge-sort/) website. Do try this yourself before looking at the implementation!

* the idea of divide-and-conquer is always to recursively divide the array into subarrays
* imagine that we divide an array into two subarrays and manage to find the number of inversions for each
* to find the total number of inversions, we are then only missing the inversions that need to be counted across the two subarrays (i.e. in the 'combination' or 'merge' step of the divide-and-conquer algorithm)
* so the total number of inversions is the sum of the inversions in the left subarray, in the right subarray, and in merge().
* to get the number of inversions in merge(): let `i` index the left subarray and `j` the right subarray. At any step in merge(), if `a[i]` is greater than `a[j]`, then there are `(mid - i)` inversions, where `mid` is the number of elements in the left subarray. This is because the left and right subarrays are sorted, so all the remaining elements in the left subarray (`a[i+1]`, `a[i+2]`, ...) will also be greater than `a[j]`.

To deal with the last part of the algorithm (counting the inversions in 'merge'), first write a merge-and-count function according to the following pseudocode from the Algorithm Design textbook. Note that this is **very** similar to the merge step of the merge sort algorithm we looked at above (except you also have to keep track of the inversions as you merge the two arrays).

```
Merge-and-Count(A,B)
    Maintain a Current pointer into each list, initialized to point to the front elements (e.g. use i and j, that are both 0 to start with)
    Maintain a variable Count for the number of inversions, initialized to 0
    While both lists are nonempty:
        Let ai and bj be the elements pointed to by the Current pointers, ai = A[i] and bj = B[j]
        Append the smaller of these two to the output list
        If bj is the smaller element:
            Increment Count by the number of elements remaining in A
        Endif
        Advance the Current pointer in the list from which the smaller element was selected.
    EndWhile
    Once one list is empty, append the remainder of the other list to the output
    Return Count and the merged list
```

In [53]:
def merge_and_count(A, B):
    """
    Merge two sorted lists and count inversions
    
    Parameters
    ----------
    A : list
        A sorted list.
    B : list
        Another sorted list.

    Returns
    ----------
    tuple
        A tuple containing the number of inversions and the merged sorted list.
    """
    # Implement me

### Exercise 6

We now use the function written in the last exercise to write the algorithm for counting inversions, which we call `sort_and_count`.

Again, use the pseudocode from the textbook as a helper:

```
Sort-and-Count(L)
If the list has one element:
    there are no inversions
Else
    Divide the list into two halves:
        A contains the first ⌈n/2⌉ elements
        B contains the remaining ⌊n/2⌋ elements
    (rA, A) = Sort-and-Count(A)
    (rB, B) = Sort-and-Count(B)
    (r,L) = Merge-and-Count(A,B)
Endif
Return r = rA + rB + r, and the sorted list L
```

In [56]:
def sort_and_count(L):
    """
    Sort a list and count inversions using divide-and-conquer approach
    
    Parameters
    ----------
    L : list
        A list of elements.

    Returns
    ----------
    tuple
        A tuple containing the number of inversions and the sorted list.
    """
    
    # Implement me

### Exercise 7

Use the Master Theorem to give the time complexity of the algorithm in Exercise 6 and say whether it is root-dominated, leaf-dominated or balanced.