# Outline

- Sorting data: from naive to sophisticated.
- From merging sorted arrays to decomposing an unsorted array into single-element (and therefore sorted) arrays to be merged.
- How to measure performance: Big-O notation and asymptotic behavior.
- Binary search.
- Other topics to discuss:
  - Examples here use lists of integers, but they generalize to any objects that support comparison (discuss how to define `__lt__` for that purpose).
  - Maybe a good time to introduce generics in Python.
  - In-place sorting vs. returning a new sorted list.
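
To make the `__lt__` and generics points concrete, here is a minimal sketch, assuming Python 3.9+. The names `SupportsLessThan`, `Card`, and `smallest` are hypothetical, chosen for illustration. Any type that defines `__lt__` works with code that only ever uses `<`, and a `TypeVar` bound to a small `Protocol` expresses that requirement as a generic type.

```python
from dataclasses import dataclass
from typing import Protocol, TypeVar


class SupportsLessThan(Protocol):
    """Structural type: anything that implements the `<` operator."""

    def __lt__(self, other) -> bool: ...


# T stands for any type whose instances support `<`
T = TypeVar("T", bound=SupportsLessThan)


def smallest(items: list[T]) -> T:
    """Return the smallest element, using only the `<` operator."""
    best = items[0]
    for item in items[1:]:
        if item < best:
            best = item
    return best


@dataclass
class Card:
    """Hypothetical example class, ordered by rank only."""

    rank: int
    suit: str

    def __lt__(self, other: "Card") -> bool:
        return self.rank < other.rank


hand = [Card(10, "hearts"), Card(2, "spades"), Card(7, "clubs")]
print(smallest(hand))                  # Card(rank=2, suit='spades')
print(smallest([3, 1, 2]))             # 1
print([c.rank for c in sorted(hand)])  # [2, 7, 10]; sorted() only needs `<`
```

The same `smallest` function accepts both `Card` objects and plain integers, because both satisfy the `SupportsLessThan` protocol.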

# Summary of assignments

This assignment comprises \*\*\* problems.

- ...


# Naive sorting

Bubble sort on an array with $n$ elements performs $n$ passes over the data. In the first pass it makes $n-1$ comparisons of adjacent elements, in the second $n-2$, and so on, down to zero in the last pass. The total number of comparisons is

$$ (n-1) + (n-2) + \ldots + 1 + 0 = \frac{n(n-1)}{2}, $$

which grows quadratically, as $\mathcal O(n^2)$.
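
Counting comparisons directly is a quick sanity check on this arithmetic. The following is a minimal sketch using a hypothetical instrumented copy, `count_bubble_comparisons`, which mirrors the loop bounds of `naive_sort` below. Those bounds make $n(n-1)/2$ comparisons regardless of the input order, because this version has no early exit.

```python
def count_bubble_comparisons(arr: list) -> int:
    """Bubble-sort a copy of arr and return how many comparisons were made."""
    data = list(arr)  # work on a copy so the caller's list is untouched
    n = len(data)
    comparisons = 0
    for i in range(n):
        for j in range(0, n - i - 1):
            comparisons += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return comparisons


for n in [1, 2, 4, 8, 16]:
    measured = count_bubble_comparisons(list(range(n, 0, -1)))
    print(f"n={n}: measured {measured}, formula {n * (n - 1) // 2}")
```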


In [12]:
def naive_sort(arr: list) -> list:
    """Naive sort implementation using bubble sort algorithm."""
    n: int = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                # Use a plain swap instead of the Pythonic tuple swap for clarity.
                # Also a good opportunity to discuss swapping without a temporary
                # variable and without tuple unpacking (XOR- or addition/subtraction-based).
                temp = arr[j]
                arr[j] = arr[j + 1]
                arr[j + 1] = temp
    return arr
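
The comment in `naive_sort` mentions swapping without a temporary variable. Below is a sketch of the two classic tricks. The helper names `xor_swap` and `add_sub_swap` are hypothetical. Both tricks assume numeric elements (XOR is defined only for integers), and both assume the two indices differ. In bubble sort, `j` and `j + 1` always differ, but when `i == j` both tricks wipe the value out.

```python
def xor_swap(arr: list, i: int, j: int) -> None:
    """Swap arr[i] and arr[j] with XOR; integers only, and i must differ from j."""
    arr[i] = arr[i] ^ arr[j]
    arr[j] = arr[i] ^ arr[j]  # (a ^ b) ^ b == a
    arr[i] = arr[i] ^ arr[j]  # (a ^ b) ^ a == b


def add_sub_swap(arr: list, i: int, j: int) -> None:
    """Swap arr[i] and arr[j] with addition/subtraction; numbers only, i != j."""
    arr[i] = arr[i] + arr[j]
    arr[j] = arr[i] - arr[j]  # (a + b) - b == a
    arr[i] = arr[i] - arr[j]  # (a + b) - a == b


data = [6, 5]
xor_swap(data, 0, 1)
print(data)  # [5, 6]
add_sub_swap(data, 0, 1)
print(data)  # [6, 5]

same = [7]
xor_swap(same, 0, 0)
print(same)  # [0]: the value is destroyed when i == j
```

In Python the tuple swap `arr[j], arr[j + 1] = arr[j + 1], arr[j]` remains the readable choice. With floats, the addition/subtraction trick can also lose precision.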

# Merge of two sorted arrays


In [13]:
def merge(arr1: list, arr2: list) -> list:
    """Merge two sorted arrays into a single sorted array. Avoid Pythonic code
    with list.extend() or list slicing for clarity."""
    merged: list = [None] * (len(arr1) + len(arr2))
    i: int = 0
    j: int = 0
    k: int = 0
    while i < len(arr1) and j < len(arr2):
        if arr1[i] < arr2[j]:
            merged[k] = arr1[i]
            i += 1
        else:
            merged[k] = arr2[j]
            j += 1
        k += 1
    # Fill any remaining elements from either array
    while i < len(arr1):
        merged[k] = arr1[i]
        i += 1
        k += 1
    while j < len(arr2):
        merged[k] = arr2[j]
        j += 1
        k += 1
    return merged
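
For comparison, Python's standard library already provides this operation. `heapq.merge` lazily merges any number of sorted iterables into one sorted iterator. The short sketch below is only a cross-check on the hand-written `merge`. The point of writing `merge` by hand is to see the mechanics.

```python
import heapq

left_sorted = [0, 2, 4, 6]
right_sorted = [1, 3, 5, 7]

# heapq.merge returns a lazy iterator; list() materializes it
merged = list(heapq.merge(left_sorted, right_sorted))
print(merged)  # [0, 1, 2, 3, 4, 5, 6, 7]

# Unlike the two-argument merge above, it accepts any number of sorted inputs
print(list(heapq.merge([1, 4], [2, 5], [0, 3, 6])))  # [0, 1, 2, 3, 4, 5, 6]
```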

### Demonstrate `merge`


In [14]:
abcd = [0, 2, 4, 6]
efgh = [1, 3, 5, 7]
abcdefgh = merge(abcd, efgh)
print(abcdefgh)

[0, 1, 2, 3, 4, 5, 6, 7]


### Visualize recursion

First, step by step, using the following code:


In [15]:
a = [6]
b = [5]
c = [1]
d = [0]
e = [4]
f = [2]
g = [7]
h = [3]

ab = merge(a, b)
print(ab)
cd = merge(c, d)
print(cd)
ef = merge(e, f)
print(ef)
gh = merge(g, h)
print(gh)

abcd = merge(ab, cd)
print(abcd)
efgh = merge(ef, gh)
print(efgh)

abcdefgh = merge(abcd, efgh)
print(abcdefgh)

[5, 6]
[0, 1]
[2, 4]
[3, 7]
[0, 1, 5, 6]
[2, 3, 4, 7]
[0, 1, 2, 3, 4, 5, 6, 7]


Then go back to the code above and, with on-the-spot cut-and-paste, substitute `abcd` and `efgh` with `merge(ab, cd)` and `merge(ef, gh)`, and so on, until a single nested expression remains.


In [16]:
a = [6]
b = [5]
c = [1]
d = [0]
e = [4]
f = [2]
g = [7]
h = [3]

ab = merge(a, b)
cd = merge(c, d)
ef = merge(e, f)
gh = merge(g, h)

abcd = merge(ab, cd)
efgh = merge(ef, gh)

abcdefgh = merge(abcd, efgh)

print(abcdefgh)

print(merge(merge(merge(a, b), merge(c, d)), merge(merge(e, f), merge(g, h))))

[0, 1, 2, 3, 4, 5, 6, 7]
[0, 1, 2, 3, 4, 5, 6, 7]


### Recursion in practice

How can we write code to do the following when the input is an array like
`[6,5,1,0,4,2,7,3]`? We need to convert it into the single-element arrays `[6]`, `[5]`, `[1]`, `[0]`, `[4]`, `[2]`, `[7]`, `[3]`.

```python
merge(
    merge(                               # \
      merge([6], [5]),  # returns [5, 6] #  } returns [0,1,5,6] \
      merge([1], [0])), # returns [0, 1] # /                     \
                                         #                        } returns [0,1,2,3,4,5,6,7]
    merge(                               # \                     /
      merge([4], [2]),  # returns [2, 4] #  } returns [2,3,4,7] /
      merge([7], [3])   # returns [3, 7] # /
    )
  )
```

Write a function that takes a list of size $2^p$, splits it into two halves, and returns them.


In [17]:
def split_in_half(arr: list) -> list:
    """Split an array into two halves and return them as a list."""
    mid: int = len(arr) // 2
    left: list = arr[:mid]
    right: list = arr[mid:]
    return [left, right]


# Example
test = [6, 5, 1, 0, 4, 2, 7, 3]
left, right = split_in_half(test)
print(left)  # Output: [6, 5, 1, 0]
print(right)  # Output: [4, 2, 7, 3]

left_left, left_right = split_in_half(left)
print(left_left)  # Output: [6, 5]
print(left_right)  # Output: [1, 0]

right_left, right_right = split_in_half(right)
print(right_left)  # Output: [4, 2]
print(right_right)  # Output: [7, 3]

# Further splits
left_left_left, left_left_right = split_in_half(left_left)
print(left_left_left)  # Output: [6]
print(left_left_right)  # Output: [5]

left_right_left, left_right_right = split_in_half(left_right)
print(left_right_left)  # Output: [1]
print(left_right_right)  # Output: [0]

right_left_left, right_left_right = split_in_half(right_left)
print(right_left_left)  # Output: [4]
print(right_left_right)  # Output: [2]

right_right_left, right_right_right = split_in_half(right_right)
print(right_right_left)  # Output: [7]
print(right_right_right)  # Output: [3]

[6, 5, 1, 0]
[4, 2, 7, 3]
[6, 5]
[1, 0]
[4, 2]
[7, 3]
[6]
[5]
[1]
[0]
[4]
[2]
[7]
[3]
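
The $2^p$ assumption keeps the halves equal, but the same slicing works for any length. With `mid = len(arr) // 2`, an odd-length list puts the extra element in the right half. The sketch below repeats `split_in_half` so it runs on its own. A one-element list splits into `[]` and itself, so it never shrinks. This is why any recursive use of `split_in_half` has to stop at length 1.

```python
def split_in_half(arr: list) -> list:
    """Split an array into two halves; the right half gets any extra element."""
    mid = len(arr) // 2
    return [arr[:mid], arr[mid:]]


for sample in [[3, 1, 2], [9, 8, 7, 6, 5], [4]]:
    left, right = split_in_half(sample)
    print(sample, "->", left, right)
# [3, 1, 2] -> [3] [1, 2]
# [9, 8, 7, 6, 5] -> [9, 8] [7, 6, 5]
# [4] -> [] [4]
```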


### Keep splitting in half


In [18]:
def interesting_function(arr: list) -> None:
    print(f"\nAbout to process array {arr}")
    if len(arr) == 1:
        print("\tArray of size 1, nothing to do.")
    elif len(arr) > 1:
        print(f"\tArray size is {len(arr)} > 1, must split again.")
        left, right = split_in_half(arr)
        print(f"\t\tLeft half: {left}, Right half: {right}")
        interesting_function(left)
        interesting_function(right)


interesting_function(test)


About to process array [6, 5, 1, 0, 4, 2, 7, 3]
	Array size is 8 > 1, must split again.
		Left half: [6, 5, 1, 0], Right half: [4, 2, 7, 3]

About to process array [6, 5, 1, 0]
	Array size is 4 > 1, must split again.
		Left half: [6, 5], Right half: [1, 0]

About to process array [6, 5]
	Array size is 2 > 1, must split again.
		Left half: [6], Right half: [5]

About to process array [6]
	Array of size 1, nothing to do.

About to process array [5]
	Array of size 1, nothing to do.

About to process array [1, 0]
	Array size is 2 > 1, must split again.
		Left half: [1], Right half: [0]

About to process array [1]
	Array of size 1, nothing to do.

About to process array [0]
	Array of size 1, nothing to do.

About to process array [4, 2, 7, 3]
	Array size is 4 > 1, must split again.
		Left half: [4, 2], Right half: [7, 3]

About to process array [4, 2]
	Array size is 2 > 1, must split again.
		Left half: [4], Right half: [2]

About to process array [4]
	Array of size 1, nothing to do.

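About to process array [2]
	Array of size 1, nothing to do.

About to process array [7, 3]
	Array size is 2 > 1, must split again.
		Left half: [7], Right half: [3]

About to process array [7]
	Array of size 1, nothing to do.

About to process array [3]
	Array of size 1, nothing to do.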
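
Each level of the trace above halves the array. A list of $n = 2^p$ elements therefore reaches single elements after $p = \log_2 n$ levels of splitting. For other lengths the depth is $\lceil \log_2 n \rceil$. The sketch below uses a hypothetical helper, `split_depth`, which returns the recursion depth instead of printing.

```python
import math


def split_depth(arr: list) -> int:
    """Return how many levels of halving are needed to reach size-1 arrays."""
    if len(arr) <= 1:
        return 0
    mid = len(arr) // 2
    # The deeper of the two halves determines the depth of this call
    return 1 + max(split_depth(arr[:mid]), split_depth(arr[mid:]))


for p in range(6):
    n = 2**p
    print(f"n={n}: depth {split_depth(list(range(n)))}, log2(n) = {math.log2(n)}")
```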

### Simple mergesort

The function above takes an array and breaks it down into single-element arrays, which are sorted by definition. As we already demonstrated, we can feed such arrays into `merge` and combine them into a single, also sorted, array. Putting these two techniques together:


In [19]:
def interesting_merge_function(arr: list) -> list:
    """Recursively sort an array using merge sort algorithm."""
    result = arr
    if len(arr) > 1:
        left, right = split_in_half(arr)
        interesting_left = interesting_merge_function(left)
        interesting_right = interesting_merge_function(right)
        result = merge(interesting_left, interesting_right)
    return result


print(f"array before: {test}\n array after: {interesting_merge_function(test)}")

array before: [6, 5, 1, 0, 4, 2, 7, 3]
 array after: [0, 1, 2, 3, 4, 5, 6, 7]
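
To connect this to performance, we can count comparisons. The sketch below uses two hypothetical functions, `counting_merge` and `counting_merge_sort`. They re-implement the merge and the recursive sort compactly (using `extend` for the leftovers) so the cell runs on its own. A one-element list is used as the counter so that nested calls can update it. For $n = 2^p$, merge sort makes at most $n \log_2 n$ comparisons, because each of the $\log_2 n$ levels merges at most $n$ elements. Bubble sort, by contrast, makes $n(n-1)/2$ comparisons.

```python
def counting_merge(arr1: list, arr2: list, counter: list) -> list:
    """Merge two sorted lists, adding each comparison made to counter[0]."""
    merged = []
    i = j = 0
    while i < len(arr1) and j < len(arr2):
        counter[0] += 1
        if arr1[i] < arr2[j]:
            merged.append(arr1[i])
            i += 1
        else:
            merged.append(arr2[j])
            j += 1
    # Leftover elements are copied without further comparisons
    merged.extend(arr1[i:])
    merged.extend(arr2[j:])
    return merged


def counting_merge_sort(arr: list, counter: list) -> list:
    """Recursive merge sort that tallies comparisons in counter[0]."""
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = counting_merge_sort(arr[:mid], counter)
    right = counting_merge_sort(arr[mid:], counter)
    return counting_merge(left, right, counter)


for p in [3, 6, 10]:
    n = 2**p
    counter = [0]
    counting_merge_sort(list(range(n, 0, -1)), counter)
    print(f"n={n}: merge sort {counter[0]} comparisons, "
          f"n*log2(n) = {n * p}, bubble sort {n * (n - 1) // 2}")
```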


### Iterative mergesort

Explain the difference between the recursive (top-down) and iterative (bottom-up) approaches.


In [20]:
def iterative_merge_sort(arr: list) -> list:
    """Bottom-up (iterative) merge sort for lists of integers.

    Uses the existing `merge` function that merges two sorted lists.

    Contract:
    - Input: arr: list[int]
    - Output: new sorted list[int] (does not modify input)
    - Error modes: if input contains non-comparable elements, behavior is undefined

    Approach:
    - Start with run_size = 1 and iteratively merge adjacent runs of length run_size.
    - Double run_size each pass until run_size >= len(arr).

    Time: O(n log n), Space: O(n) due to merged buffers.
    """
    n = len(arr)
    # Fast path
    if n <= 1:
        return arr.copy()

    # Work on copies to avoid mutating the original list
    src = [x for x in arr]

    # Temporary list for merged output; we'll swap src/dest each pass
    dest = [None] * n

    run_size = 1
    while run_size < n:
        # Merge pairs of runs of length run_size
        for start in range(0, n, 2 * run_size):
            mid = min(start + run_size, n)
            end = min(start + 2 * run_size, n)
            left = src[start:mid]
            right = src[mid:end]
            merged = merge(left, right)
            # copy merged back into dest
            dest[start : start + len(merged)] = merged
        # swap buffers
        src, dest = dest, src
        run_size *= 2
    return src


# Demo and quick smoke tests
if __name__ == "__main__":
    tests = [
        [],
        [1],
        [2, 1],
        [6, 5, 1, 0, 4, 2, 7, 3],
        [5, 4, 3, 2, 1, 0],
        [3, 1, 4, 1, 5, 9, 2, 6, 5],
    ]
    for t in tests:
        print(f"in: {t} -> out: {iterative_merge_sort(t)}")

in: [] -> out: []
in: [1] -> out: [1]
in: [2, 1] -> out: [1, 2]
in: [6, 5, 1, 0, 4, 2, 7, 3] -> out: [0, 1, 2, 3, 4, 5, 6, 7]
in: [5, 4, 3, 2, 1, 0] -> out: [0, 1, 2, 3, 4, 5]
in: [3, 1, 4, 1, 5, 9, 2, 6, 5] -> out: [1, 1, 2, 3, 4, 5, 5, 6, 9]
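
The recursive version works top-down: it splits first and merges as the calls return. The iterative version skips the splitting and merges adjacent runs of width 1, 2, 4, ... directly, so it needs no call stack. The sketch below uses a hypothetical helper, `bottom_up_runs`. It reuses the index arithmetic of `iterative_merge_sort` to list the `(start, mid, end)` ranges merged in each pass.

```python
def bottom_up_runs(n: int) -> list:
    """Return, per pass, the (start, mid, end) ranges merged by bottom-up merge sort."""
    passes = []
    run_size = 1
    while run_size < n:
        ranges = []
        for start in range(0, n, 2 * run_size):
            mid = min(start + run_size, n)
            end = min(start + 2 * run_size, n)
            ranges.append((start, mid, end))
        passes.append(ranges)
        run_size *= 2
    return passes


for number, ranges in enumerate(bottom_up_runs(6), start=1):
    print(f"pass {number}: {ranges}")
```

For $n = 6$ this gives three passes, matching $\lceil \log_2 6 \rceil = 3$. In the range `(4, 6, 6)` the right run is empty, so that merge simply copies the left run.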


## Binary search

Start with a simple linear search.


In [21]:
def linear_search(arr: list, target: int) -> bool:
    found = False
    i = 0
    while not found and i < len(arr):
        found = arr[i] == target
        i += 1
    return found

- Best case: `arr[0] == target`, found after a single comparison.
- Worst case: `arr[len(arr)-1] == target` or `target` $\not\in$ `arr`, requiring all $n$ comparisons.
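
Counting how many elements are inspected makes the two cases concrete. The sketch below uses a hypothetical variant, `linear_search_steps`, which returns the number of comparisons along with the result. The worst case grows linearly with the length of the array.

```python
def linear_search_steps(arr: list, target: int) -> tuple:
    """Linear search that also reports how many elements were compared."""
    steps = 0
    for value in arr:
        steps += 1
        if value == target:
            return True, steps
    return False, steps


data = list(range(10))
print(linear_search_steps(data, 0))   # (True, 1)   best case
print(linear_search_steps(data, 9))   # (True, 10)  worst case: last element
print(linear_search_steps(data, 42))  # (False, 10) worst case: absent
```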

Next, assume the array is already sorted.


In [22]:
def binary_search(arr: list, target: int) -> bool:
    low = 0
    high = len(arr) - 1
    found = False
    while not found and low <= high:
        mid = (low + high) // 2
        found = arr[mid] == target
        if arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return found

# Demo and quick smoke tests
if __name__ == "__main__":   
    arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    targets = [0, 5, 9, -1, 10]
    for target in targets:
        print(f"linear_search(arr, {target}) = {linear_search(arr, target)}")
        print(f"binary_search(arr, {target}) = {binary_search(arr, target)}")

linear_search(arr, 0) = True
binary_search(arr, 0) = True
linear_search(arr, 5) = True
binary_search(arr, 5) = True
linear_search(arr, 9) = True
binary_search(arr, 9) = True
linear_search(arr, -1) = False
binary_search(arr, -1) = False
linear_search(arr, 10) = False
binary_search(arr, 10) = False
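
The standard library's `bisect` module implements the same halving idea. Below is a sketch of membership testing with `bisect.bisect_left`, which returns the leftmost position where `target` could be inserted while keeping the list sorted. The helper name `bisect_contains` is hypothetical, and the input must already be sorted.

```python
from bisect import bisect_left


def bisect_contains(arr: list, target: int) -> bool:
    """Return True if target is in the sorted list arr, using bisect_left."""
    i = bisect_left(arr, target)
    # i == len(arr) means target is larger than every element
    return i < len(arr) and arr[i] == target


sorted_arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for target in [0, 5, 9, -1, 10]:
    print(f"bisect_contains(sorted_arr, {target}) = {bisect_contains(sorted_arr, target)}")
```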


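Discuss why binary search is $\mathcal O(\log_2 n)$ (each iteration halves the range `[low, high]`), but point out the extra effort required to sort the array first ($\mathcal O(n\log_2 n)$), which pays off only when the same array is searched many times.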
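
Because each iteration of `binary_search` halves the remaining range, a sorted array of $n$ elements needs at most $\lfloor \log_2 n \rfloor + 1$ probes. The sketch below uses a hypothetical probe-counting variant, `binary_search_probes`, to check the worst case: an absent target larger than every element.

```python
import math


def binary_search_probes(arr: list, target: int) -> int:
    """Binary search that returns how many elements it inspected."""
    low, high = 0, len(arr) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if arr[mid] == target:
            break
        if arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return probes


for n in [10, 1_000, 1_000_000]:
    data = list(range(n))
    worst = binary_search_probes(data, n)  # absent, larger than every element
    print(f"n={n}: {worst} probes, floor(log2 n) + 1 = {math.floor(math.log2(n)) + 1}")
```

Going from a thousand to a million elements adds only about ten probes, whereas linear search would need a thousand times more comparisons.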