# Divide and Conquer

Divide and conquer is a technique that breaks a problem into subproblems that are themselves smaller instances of the same type of problem. A divide and conquer algorithm recursively solves these subproblems and then combines their answers appropriately.

Because divide and conquer algorithms are recursive, we can prove their correctness with induction.

(Slides 79 - 88, 285 - 288)

In [16]:
import numpy as np

In [17]:
!pip install ipynb
from ipynb.fs.full.nb3Sorting import *  # provides heapsort and merge, used by bucketSort below



## Searching
- Input: a list of numbers, sorted in increasing order $A = [a_0, a_1, ..., a_{n-1}]$ and a number x
- Goal: return the position of x in A or return a statement that x is not in the list

**BinarySearch(A, x, min, max)**
initially min = 0, max = n-1
1. if max < min: return "$x \notin A$"
2. mididx = floor((max + min)/2)
3. if x < A[mididx] then return BinarySearch(A, x, min, mididx-1)
4. else if x > A[mididx] then return BinarySearch(A, x, mididx + 1, max)
5. else return mididx
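Each call does a constant amount of work before recursing on at most half of the current range. Writing $n$ = max - min + 1 for the range size and $c$ for that constant (a standard bound added here, not taken from the slides), the running time satisfies:

$$
T(0) = c, \qquad T(n) \le T\left(\left\lfloor \tfrac{n}{2} \right\rfloor\right) + c
\quad\Longrightarrow\quad
T(n) \le c\left(\lfloor \log_2 n \rfloor + 2\right) = O(\log n) \quad (n \ge 1)
$$

The left sub-range A[min:mididx-1] has $\lfloor (n-1)/2 \rfloor$ elements and the right sub-range A[mididx+1:max] has $\lfloor n/2 \rfloor$, so $\lfloor n/2 \rfloor$ bounds both.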

**Proof of Correctness**:
- Part 1: show that if $x \in A$, then binary search will return an index i such that A[i] = x. We induct on $k = max - min$, where A[min:max] denotes the inclusive range A[min], ..., A[max].
    - Base case: $max - min = 0$, so the range holds a single element and max = min = mididx. We know $x\in A[min:max]$, so x = A[min] = A[max] = A[mididx]. Step 5 tells us to return mididx, so it returns i = mididx such that A[i] = x $\checkmark$
    - Strong inductive hypothesis: suppose that whenever $x \in A[min:max]$ and $0 \le max - min \le k$, binary search returns an index i such that A[i] = x.
    - Inductive step: now suppose $max - min = k + 1$ and $x \in A[min:max]$.
        - case 1: x < A[mididx]. Because A is sorted, x must be in A[min:mididx - 1], and we return binary search on A[min:mididx - 1] in step 3. Since $mididx \le max$, $mididx - 1 - min \lt max - min = k + 1$, so $mididx - 1 - min \le k$. Thus, by the strong inductive hypothesis, this returns an i s.t. A[i] = x. $\checkmark$
        - case 2: x > A[mididx]. Because A is sorted, x must be in A[mididx + 1:max], and we return binary search on A[mididx + 1:max] in step 4. Since $mididx \ge min$, $max - 1 - mididx \lt max - min = k + 1$, so $max - 1 - mididx \le k$. Thus, by the strong inductive hypothesis, this returns an i s.t. A[i] = x. $\checkmark$
        - case 3: x = A[mididx]. We return mididx in step 5, so it returns an i s.t. A[i] = x $\checkmark$
    - Thus if $x \in A[min:max]$, binary search returns an index i such that A[i] = x for every $k = max - min \ge 0$.
- Part 2: show that if $x \notin A$, then binary search will return a statement that $x \notin A$. 
    - Suppose $x \notin A$
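    - Each recursive call strictly shrinks max - min, so binary search terminates: it either returns an index or reports that $x \notin A$. Suppose it returns an index j. The only step that returns an index is step 5, which requires A[j] = x. Then $x \in A$, and we have reached a contradiction.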
    - Thus $x\notin A \implies$ binary search will return a statement that $x \notin A$.
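Both parts of the proof can be spot-checked empirically. The sketch below is a hypothetical test harness, not part of the original notebook: `binary_search` restates the pseudocode but returns `None` instead of a message, and `check_binary_search` compares it against Python's `in` operator on random sorted lists (duplicates allowed). Passing checks are evidence, not proof; the induction above is what guarantees correctness.

```python
import random

def binary_search(A, x, lo, hi):
    # recursive binary search on the inclusive range A[lo..hi]; None means "x not in A"
    if hi < lo:
        return None
    mid = (lo + hi) // 2
    if x < A[mid]:
        return binary_search(A, x, lo, mid - 1)
    elif x > A[mid]:
        return binary_search(A, x, mid + 1, hi)
    return mid

def check_binary_search(trials=500, seed=0):
    # hypothetical harness: random sorted lists, every x in a slightly wider value range
    rng = random.Random(seed)
    for _ in range(trials):
        A = sorted(rng.choices(range(50), k=rng.randint(0, 20)))
        for x in range(-1, 51):
            i = binary_search(A, x, 0, len(A) - 1)
            if x in A:
                # Part 1: an index is returned and it points at x
                assert i is not None and A[i] == x
            else:
                # Part 2: no index is returned
                assert i is None
    return True
```

Calling `check_binary_search()` returns `True` when every assertion holds; any counterexample raises an `AssertionError`.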

In [7]:
def BinarySearchInner(A, x, minimum, maximum):
    # empty range: x cannot be in A
    if minimum > maximum:
        return str(x) + " not in list"
    mid = (minimum + maximum)//2
    if x < A[mid]:
        # x can only be in the left half
        return BinarySearchInner(A, x, minimum, mid-1)
    elif x > A[mid]:
        # x can only be in the right half
        return BinarySearchInner(A, x, mid+1, maximum)
    else:
        # x == A[mid]
        return mid

def BinarySearch(A, x):
    # search the whole list: min = 0, max = n - 1
    return BinarySearchInner(A, x, 0, len(A) - 1)

In [8]:
BinarySearch([1,2,3],2)

1
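Because each recursive call is the last thing the function does, the recursion can be replaced by a loop that just moves the bounds. The sketch below shows that, plus a variant built on the standard library's `bisect.bisect_left`, which finds the leftmost position where x could be inserted while keeping A sorted. The function names are hypothetical, and returning `None` for "not found" (instead of a message string) is a design choice made here so callers can test the result with `is None`.

```python
from bisect import bisect_left

def binary_search_iterative(A, x):
    # same halving as BinarySearch, but with a loop instead of recursion
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if x < A[mid]:
            hi = mid - 1      # keep the left half
        elif x > A[mid]:
            lo = mid + 1      # keep the right half
        else:
            return mid
    return None               # range became empty: x is not in A

def binary_search_bisect(A, x):
    # bisect_left returns the leftmost index where x could be inserted
    i = bisect_left(A, x)
    if i < len(A) and A[i] == x:
        return i
    return None
```

Both return 1 for `([1, 2, 3], 2)`, matching the cell above. With duplicates they can differ: `bisect_left` always reports the leftmost match, while the halving search returns whichever matching index it lands on first.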

## Merge sort
see [sorting](nb3Sorting.ipynb) notebook

## Sums
Slide 286

In [18]:
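def mySum(L):
    # base cases: lists too short to be worth splitting
    if len(L) == 0:
        return 0
    elif len(L) == 1:
        return L[0]
    elif len(L) == 2:
        return L[0] + L[1]
    else:
        # divide the list in half, sum each half recursively, combine with one addition
        mid = len(L)//2
        return mySum(L[:mid]) + mySum(L[mid:])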

In [19]:
mySum([1,2,3])

6
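`mySum` copies each half with a slice, so besides its $n - 1$ additions it copies $O(n)$ elements at every level of the recursion, $O(n \log n)$ in total. A minimal sketch that passes index bounds instead of slices (the names `my_sum` and `my_sum_range` are hypothetical) does only constant work per call, giving $T(n) = 2T(n/2) + O(1) = O(n)$:

```python
def my_sum_range(L, lo, hi):
    # sum of L[lo:hi] by divide and conquer, passing indices instead of slices
    if hi - lo == 0:
        return 0
    if hi - lo == 1:
        return L[lo]
    mid = (lo + hi) // 2
    # conquer each half, then combine with a single addition
    return my_sum_range(L, lo, mid) + my_sum_range(L, mid, hi)

def my_sum(L):
    return my_sum_range(L, 0, len(L))
```

`my_sum([1, 2, 3])` gives 6, the same as `mySum` above.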

## BucketSort
Slide 287.
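Parallel MergeSort, with $n$ elements split into $m$ buckets:
- Divide the data into $m$ contiguous regions, one region per bucket, and sort within each bucket with any algorithm: $O(\frac{n}{m}\log\frac{n}{m})$ per bucket.
- Merge the sorted buckets together into a fully sorted sequence: $O(n)$.
- Overall $O(n + \frac{n}{m}\log\frac{n}{m})$, assuming the $m$ buckets are sorted in parallel.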
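The $O(n)$ merge bound relies on merges running in parallel too. Merging neighbouring pairs in rounds, the strategy the `bucketSort` code uses, takes $\lceil \log_2 m \rceil$ rounds; each round touches at most $n$ elements, and in round $r$ each merged run has about $2^r n / m$ elements. Since $2^{\lceil \log_2 m \rceil} < 2m$:

$$
\text{merge work} = \sum_{r=1}^{\lceil \log_2 m \rceil} O(n) = O(n \log m),
\qquad
\text{merge time with parallel merges} = \sum_{r=1}^{\lceil \log_2 m \rceil} O\!\left(\frac{2^r n}{m}\right) = O(n)
$$

Run sequentially, the bucket sorts add $m \cdot O(\frac{n}{m}\log\frac{n}{m}) = O(n \log \frac{n}{m})$ work, so the total is $O(n \log \frac{n}{m} + n \log m) = O(n \log n)$, the same as ordinary merge sort.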

This implementation isn't actually parallelized. For that, we'd want the line `sorted_chunks = [heapsort(chunk) for chunk in chunks]` to split the `heapsort` processes across multiple processors.

In [20]:
def chunkIt(L, nbuckets):
    n = len(L)
    bucket_size = n // nbuckets
    out = []

    for i in range(nbuckets):
        if i == nbuckets - 1:
            # last, go to end even if bigger than bucket size
            out.append(L[
                i*bucket_size:n
            ])
        else:
            out.append(L[
                i*bucket_size:
                min(i*bucket_size+bucket_size, n)
            ])

    return out

def bucketSort(L, nbuckets):
    # L = list to sort
    # nbuckets = # of buckets to use
    chunks = chunkIt(L, nbuckets)

    # sort each bucket independently
    sorted_chunks = [heapsort(chunk) for chunk in chunks]

    # merge neighbouring pairs of sorted chunks until one remains
    while len(sorted_chunks) > 1:
        merged_chunks = []

        for i in range(len(sorted_chunks)//2):
            merged_chunks.append(merge(sorted_chunks[2*i], sorted_chunks[2*i + 1]))

        if len(sorted_chunks) % 2 != 0:
            # if odd number of chunks, the last one didn't get merged this round
            merged_chunks.append(sorted_chunks[-1])

        sorted_chunks = merged_chunks

    return sorted_chunks[0]

In [21]:
bucketSort([1, 4, 3, 2, 12], 3)

[1, 2, 3, 4, 12]
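A minimal sketch of the parallel version, assuming `concurrent.futures` as the worker pool. It is self-contained, so it swaps in the built-in `sorted` for `heapsort` and `heapq.merge` for the notebook's `merge`; `split_into_chunks` and `parallel_bucket_sort` are hypothetical names. It uses `ThreadPoolExecutor` so it runs anywhere, but because of CPython's global interpreter lock, threads will not speed up CPU-bound sorting; real speedups would need `ProcessPoolExecutor` (same `map` interface) with module-level, picklable worker functions in place of the lambda.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge

def split_into_chunks(L, nbuckets):
    # contiguous chunks like chunkIt: equal sizes, last chunk takes the remainder
    size = len(L) // nbuckets
    bounds = [i * size for i in range(nbuckets)] + [len(L)]
    return [L[bounds[i]:bounds[i + 1]] for i in range(nbuckets)]

def parallel_bucket_sort(L, nbuckets, max_workers=None):
    chunks = split_into_chunks(L, nbuckets)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # conquer: sort the buckets concurrently (sorted() stands in for heapsort)
        sorted_chunks = list(pool.map(sorted, chunks))
        # combine: each round merges neighbouring pairs, one merge per task
        while len(sorted_chunks) > 1:
            pairs = [(sorted_chunks[i], sorted_chunks[i + 1])
                     for i in range(0, len(sorted_chunks) - 1, 2)]
            merged = list(pool.map(lambda p: list(merge(*p)), pairs))
            if len(sorted_chunks) % 2 != 0:
                merged.append(sorted_chunks[-1])  # odd chunk waits for the next round
            sorted_chunks = merged
    return sorted_chunks[0]
```

`parallel_bucket_sort([1, 4, 3, 2, 12], 3)` returns `[1, 2, 3, 4, 12]`, the same as the sequential cell above.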

## Numerical integration
Slide 289

TODO CHECK THIS

In [108]:
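def numerical_integration(function, xrange, n_rect):
    # integrate `function` from xrange[0] to xrange[-1] using n_rect rectangles
    a, b = xrange[0], xrange[-1]

    # split: n_rect + 1 evenly spaced edges give n_rect equal-width subintervals
    edges = np.linspace(a, b, n_rect + 1)

    # area under each section (midpoint rule: height = function at the subinterval's center)
    areas = []
    for left, right in zip(edges[:-1], edges[1:]):
        width = right - left
        center = 0.5*(left + right)
        height = function(center)
        area = height*width
        areas.append(area)

    # combine
    return sum(areas)

In [109]:
def f(x):
    return x**2

In [110]:
numerical_integration(f, range(0, 12), 12)  # ≈ 442.90; the exact integral of x**2 on [0, 11] is 1331/3 ≈ 443.67

In [112]:
numerical_integration(f, range(0, 12), 3)  # ≈ 431.34; fewer rectangles give a coarser estimate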
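Summing rectangles can also be written as an explicitly recursive divide and conquer: halve the interval, integrate each half, and add the two results. The sketch below assumes the midpoint rule (each rectangle's height is the function's value at the center of its subinterval); `integrate_dc` and its `depth` parameter are hypothetical names, and `depth = d` produces $2^d$ equal-width rectangles.

```python
def integrate_dc(function, a, b, depth):
    # base case: a single rectangle whose height is the function at the midpoint
    if depth == 0:
        return function(0.5 * (a + b)) * (b - a)
    # divide [a, b] at its midpoint
    mid = 0.5 * (a + b)
    # conquer each half recursively, then combine by adding the two areas
    return integrate_dc(function, a, mid, depth - 1) + integrate_dc(function, mid, b, depth - 1)
```

For $f(x) = x^2$ the midpoint rule underestimates by $(b - a)h^2/12$ with rectangle width $h$, so each extra level of depth cuts the error by a factor of 4.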