# CSPB 3104 Assignment 3

***
# Instructions

This assignment is to be completed as a Python 3 notebook.

The questions provided below will ask you to either write code or 
write answers in the form of markdown.

A Markdown syntax guide is available here: [click here](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)

Using markdown you can typeset formulae using LaTeX.
This way you can write nice readable answers with formulae like this:

The algorithm runs in time $\Theta\left(n^{2.1\log_2(\log_2( n \log^*(n)))}\right)$, 
where $\log^*(n)$ is the inverse _Ackermann_ function.

__Double click anywhere on this box to find out how your instructor typeset it. Press Shift+Enter to go back.__

***

## Question 1

Answer the following questions about heaps.

__1(a)__ Write down an algorithm to find the third smallest element in a min-heap with more than $3$ elements. You may write pseudocode or an English description of the algorithm's steps. What is the running time complexity on a heap of size $n$? *Assume all elements in the heap are distinct.*






A valid min heap always stores its minimum element at the root.  
This algorithm is a variation of heap sort that stops early.  
To find the third smallest element, first remove the smallest element by calling delete on array[0],  
which swaps array[0] with array[N-1] (where N = len(array)), reduces the heap size to N-1,  
and then bubbles the new array[0] down to restore the min-heap property. Now the second smallest element is at the root.  
Repeat this sequence of steps once more so that the third smallest element is at the root, and return this value.  

Each delete (swap plus bubble down) takes $O(\log_2 N)$ time,  
because at most $\lfloor \log_2 N \rfloor$ swaps are needed to bubble a value down to its proper place.  
Delete is run 2 times, and returning the root (the third smallest value overall) is $\Theta(1)$,  
so $T(N) = 2 \cdot O(\log_2 N) + \Theta(1)$, which, when constants are dropped, gives  
$O(\log N)$ as the overall complexity, with $\Theta(\log N)$ in the worst case.  

Note that the swap and returning the root are each $\Theta(1)$, so they do not affect the overall complexity.  
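
The steps above can be sketched in Python with the standard `heapq` module, whose `heappop` removes the root of a min-heap and restores the heap property. This is a minimal sketch rather than a definitive answer: the function name `third_smallest` is an assumption made for illustration, and the function removes two elements from the heap it is given.

```python
import heapq

def third_smallest(heap):
    """Return the third smallest value of a min-heap stored as a list.

    Assumes len(heap) > 3 and distinct values; removes two elements from heap.
    """
    heapq.heappop(heap)   # delete the smallest; the second smallest is now the root
    heapq.heappop(heap)   # delete the second smallest; the third smallest is now the root
    return heap[0]

example = [1, 2, 3, 9, 7, 4, 5]   # a valid min-heap (0-indexed list)
print(third_smallest(example))    # prints 3
```

Each `heappop` does one bubble down, so the sketch performs two $O(\log N)$ operations followed by a constant-time read of the root.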



__1(b)__ We wish to find the largest element in a min-heap represented by array $A[1], \ldots, A[n]$. Show using a series of examples for $n=7$ that any element starting from $A[\lceil{\frac{n}{2}}\rceil], \ldots, A[n]$ can be the largest element. Your answer should be in the form of 4 min heaps.

The maximum element in a min heap can be at any position in the range $A[\lceil{\frac{n}{2}}\rceil], \ldots, A[n]$.  
For $n = 7$ these positions are the leaves, and with distinct values the largest element must be a leaf.  
Here are 4 min heap examples that show this for a heap of size 7.  
In each example, a different one of the leaf positions (A[4], A[5], A[6], or A[7]) holds the largest element.  

Note that the only requirement for a min heap is that every node has a value <= the values of both its children.  

Case 1 (Largest value in A[4]): A = [1, 2, 3, 9, 7, 4, 5]  
Case 2 (Largest value in A[5]): A = [1, 2, 3, 7, 9, 4, 5]  
Case 3 (Largest value in A[6]): A = [1, 2, 3, 4, 5, 7, 6]  
Case 4 (Largest value in A[7]): A = [1, 2, 3, 4, 5, 6, 7]  
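
The four cases above can be checked mechanically. The following is a small sketch, assuming 0-indexed Python lists (so the question's $A[i]$ is `a[i - 1]`); the helper name `is_min_heap` and the `cases` dictionary are introduced here only for illustration.

```python
def is_min_heap(a):
    """Check the min-heap property on a 0-indexed list: every parent <= its children."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))

# key = 1-indexed position that should hold the largest value
cases = {
    4: [1, 2, 3, 9, 7, 4, 5],
    5: [1, 2, 3, 7, 9, 4, 5],
    6: [1, 2, 3, 4, 5, 7, 6],
    7: [1, 2, 3, 4, 5, 6, 7],
}

for position, heap in cases.items():
    assert is_min_heap(heap)
    assert heap.index(max(heap)) + 1 == position
print("all four examples are valid min heaps")
```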


***
## Question 2

Suppose you have an array __A__ of *n* distinct elements.

The following pseudocode finds the k biggest values of __A__:

```
Biggest(A, k): // returns an array of the k biggest values of A
        mergesort(A)  
        return A[n-k, n]
 ```
 
__2(a)__ What is the complexity of the above algorithm and why?



Merge sort has an overall complexity of $\Theta(n \log n)$ in the worst case.  
It recursively divides the array into halves until only one element remains, which produces $\log_2 n$ levels of recursion.  
Merging does at most $n$ comparisons per level, so each level costs $\Theta(n)$ and the sort costs $\Theta(n \log n)$ overall.

The final step of the algorithm returns a slice holding the $k$ biggest values, which takes $\Theta(k)$ steps. 
  
In the worst case $k = n$ (all $n$ elements are returned), so this step is $O(n)$.

So $T(n) = \Theta(n \log n) + O(n)$, and the leading term is $\Theta(n \log n)$.  
**So the overall complexity is $T(n) = \Theta(n \log n)$ in the worst case.**  
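
For reference, the pseudocode can be written as a short runnable sketch. Python's built-in `sorted` stands in for merge sort here (it is a different algorithm, but it also runs in $O(n \log n)$ time in the worst case); the lowercase name `biggest` is an assumption made for illustration.

```python
def biggest(a, k):
    """Return the k biggest values of a in ascending order (assumes 0 <= k <= len(a))."""
    ordered = sorted(a)            # O(n log n) in the worst case
    return ordered[len(a) - k:]    # Theta(k) slice copy

print(biggest([0, 5, 1, 3, 4], 3))  # prints [3, 4, 5]
```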

Here is another derivation of merge sort's complexity:  
The recurrence for merge sort is $T(n) = 2T(n/2) + n$. Each recursive call divides the problem into 2 subproblems of size $n/2$,  
and merging is done in linear time. Applying Case 2 of the Master Theorem,  
the worst-case complexity of merge sort is $\Theta(n \log n)$.  

Using the recursion tree, where level $i$ has $2^{i}$ subproblems of size $n/2^{i}$, merge sort's complexity is also given by:  
$\sum_{i=0}^{\log_2 n} 2^{i} \cdot \frac{n}{2^{i}} = n(\log_2 n + 1) = \Theta(n \log n)$  
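
The recurrence can also be evaluated numerically as a sanity check. This sketch assumes a base case of $T(1) = 0$ and restricts $n$ to powers of two, in which case the recurrence solves to exactly $n \log_2 n$ (charging one unit per single-element subproblem instead would add $n$). The function name `merge_sort_cost` is hypothetical.

```python
def merge_sort_cost(n):
    """Evaluate T(n) = 2*T(n/2) + n with T(1) = 0, for n a power of two."""
    if n == 1:
        return 0
    return 2 * merge_sort_cost(n // 2) + n

for p in range(1, 11):
    n = 2 ** p
    assert merge_sort_cost(n) == n * p   # n * log2(n)
print(merge_sort_cost(1024))             # prints 10240
```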

__2(b)__ Now suppose that the order of the array was important.  Design and implement an algorithm that returns an array of the k largest elements of __A__ in their original order, and it should run in $\Theta(nk)$ time.

For example, BiggestInOrder([0,5,1,3,4], 3) should return [5,3,4].

In [1]:
#Strategy: build a max heap from a copy of the array (Theta(N)),
#then remove the maximum k times
#by swapping copy[0] with the last heap element and bubbling down (k * O(log N)).
#The (k+1)-th largest value is now at the root.
#Store this value, and find its index in the original array. (Theta(N))
#Partition the original array around that index, so that the k largest values are at the front of the list in their original order, (Theta(N))
#and finally return a slice of these. (Theta(k))

#The overall complexity is then O(N + k log N), which is no worse than the Theta(Nk) the question asks for.


#Partition A[low..high] around the value at index P so that values > A[P] come first
def partition(A, low, high, P):
    
    #R1 = A[low..i] holds values > pivot, in their original relative order;
    #i+1 is always the next index to fill in R1.
    #R2 = A[i+1..j-1] holds values <= pivot.
    #R3 = A[j..high] is still unprocessed.
    
    #i starts at low-1 because R1 is initially empty
    i=low-1
    p=A[P] #pivot value
    #j is the index of the next unprocessed value; note that A[high] itself is not scanned
    for j in range (low, high):
        if A[j]>p:
            #extend R1 by one slot and swap A[j] into it
            i=i+1
            A[i], A[j] = A[j], A[i]
        #else, j is incremented by the for loop, extending R2
    #finally, move A[high] to the boundary slot i+1: if A[high] > pivot it becomes the last value of R1
    #(it is also last in the original order); otherwise it simply joins R2
    A[i+1], A[high]= A[high], A[i+1]
    #return the boundary index
    return i+1

        
#Bubble down: restore the max-heap property for the subtree rooted at i. O(log N)
#N is the heap size, i is the starting index
def heapify(A, N, i):
    left=2*i+1
    right=2*i+2
    #index of the largest value seen so far; start with the parent
    maxx=i
    #if the left child exists and is larger than the parent, it becomes the max
    if(left<N):
        if(A[i]<A[left]):
            maxx=left
        
    #if the right child exists and is larger than the current max, it becomes the max
    if(right<N):
        if(A[right]>A[maxx]):
            maxx=right
    #swap parent and child if the parent is not the largest
    if (maxx == left) or (maxx == right):
        #tuple assignment swaps the two values in one line
        A[i],A[maxx] = A[maxx],A[i]  
        #recurse on the child's index to keep bubbling down; the recursion stops at a leaf
        #or as soon as the parent is already the largest
        heapify(A, N, maxx)

              
def BiggestInOrder(A, k):
    #if the whole array is requested, return it
    if k==len(A):
        return A
    #work on a separate copy so the heap operations do not disturb A
    copy=A.copy()
    N=len(copy)
    last=(N//2)-1
    #build a max heap bottom-up, starting at the last node with children: Theta(N) in total
    for i in range(last, -1, -1): 
        heapify(copy, N, i)
    N=N-1
    #remove the maximum k times so the (k+1)-th largest value ends up at the root: k * O(log N)
    for i in range (0, k):
        #swap the largest element with the last heap element
        copy[0], copy[N]=copy[N], copy[0]
        #restore the heap on the first N elements by bubbling down
        heapify(copy, N, 0)
        #shrink the heap by one each time
        N=N-1
    #val is now the (k+1)-th biggest value overall
    val=copy[0]
    #find the index of val in the original list A and store it in i: Theta(N)
    for i in range(0, len(A)):
        if A[i]==val:
            break
    #partition A in place around the (k+1)-th biggest value so that the k largest values
    #are at the front of the list in their original order: Theta(N)
    partition(A, 0, len(A)-1, i)
    #return the first k values: Theta(k)
    return A[:k]
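
For comparison, a more direct route to the $\Theta(nk)$ bound named in the question is to run $k$ linear scans, each selecting the largest value not yet chosen, and then emit the chosen values in index order. This is a sketch of that alternative, not the heap-based method used in the cell above; the function name `biggest_in_order_nk` and the `chosen` flag list are assumptions made for illustration.

```python
def biggest_in_order_nk(a, k):
    """Return the k largest values of a in their original order.

    Assumes distinct values and 0 <= k <= len(a). Runs k scans of length n.
    """
    chosen = [False] * len(a)
    for _ in range(k):
        best = -1                      # index of the largest unchosen value so far
        for i, x in enumerate(a):
            if not chosen[i] and (best == -1 or x > a[best]):
                best = i
        chosen[best] = True
    # one final pass keeps the original ordering
    return [x for i, x in enumerate(a) if chosen[i]]

print(biggest_in_order_nk([0, 5, 1, 3, 4], 3))  # prints [5, 3, 4]
```

Unlike the heap-based cell, this version leaves the input list unmodified, at the cost of $\Theta(nk)$ comparisons.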

__2(c)__ If we don't care about the original ordering, then we can use a heap to design an algorithm that runs faster than the one in part (b).  Design and implement an algorithm that returns an array of the k largest elements of __A__ using a heap.

In [2]:
#Given an array A, heap size N, and a subtree rooted at i, bubble A[i] down to restore the max-heap property. O(log N)
def heapify(A, N, i):
    left=2*i+1
    right=2*i+2
    #index of the largest value seen so far; start with the parent
    maxx=i
    #if the left child exists and is larger than the parent, it becomes the max
    if(left<N):
        if(A[i]<A[left]):
            maxx=left
        
    #if the right child exists and is larger than the current max, it becomes the max
    if(right<N):
        if(A[right]>A[maxx]):
            maxx=right
    #swap parent and child if the parent is not the largest
    if (maxx == left) or (maxx == right):
        #tuple assignment swaps the two values in one line
        A[i],A[maxx] = A[maxx],A[i]  
        #recurse on the child's index to keep bubbling down; the recursion stops at a leaf
        #or as soon as the parent is already the largest
        heapify(A, N, maxx)

#builds a max heap in place and returns the largest k numbers in descending order
def heapHelp(A, k):
    N=len(A)
    L=[]
    #the last node with children is at index N//2-1; it is the first node that needs to be bubbled down
    last=(N//2)-1
    #the stop argument of -1 makes the loop run down to and including index 0
    for i in range(last, -1, -1): 
        heapify(A, N, i)
    #N now marks the index of the last heap element, for the convenience of the loop below
    N=N-1
    #extract the largest k elements in descending order
    for i in range (0, k):
        #store the largest element
        L.append(A[0])
        #swap the largest element with the last heap element
        A[0], A[N]=A[N], A[0]
        #restore the heap on the first N elements by bubbling down
        heapify(A, N, 0)
        #shrink the heap by one each time
        N=N-1
    return L

#wrapper around heapHelp; note that A is modified in place
def BiggestOutOfOrder(A, k):
    #if the whole array is requested, return it
    if k==len(A):
        return A
    return heapHelp(A,k)
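
The same idea can be sketched with the standard `heapq` module. Because `heapq` only provides a min-heap, this sketch negates the values to simulate a max heap, which is an assumption that the elements are numbers. The function name `biggest_out_of_order` is introduced for illustration and is not part of the assignment.

```python
import heapq

def biggest_out_of_order(a, k):
    """Return the k largest values of a, largest first (assumes numeric values)."""
    negated = [-x for x in a]      # min-heap of negated values acts as a max heap
    heapq.heapify(negated)         # Theta(n) build
    count = min(k, len(negated))
    return [-heapq.heappop(negated) for _ in range(count)]  # k pops, O(log n) each

print(biggest_out_of_order([0, 5, 1, 3, 4], 3))  # prints [5, 4, 3]
```

The standard library also offers `heapq.nlargest(k, a)`, which returns the same values in descending order and can serve as a cross-check.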

__2(d)__  What is the complexity of your algorithm for part (c)?

heapHelp() first builds a max heap by calling heapify on every node that has children, which takes $\Theta(N)$ time in total.  
The bottom loop in heapHelp then runs $k$ times. Each iteration appends a value (in $\Theta(1)$ time),  
swaps the root to the end of the heap (in $\Theta(1)$ time), and calls heapify on the root, which bubbles down at most the height of the heap, in $O(\log N)$ time.  
Overall, $T(N) = \Theta(N) + k \cdot (O(\log N) + 2\Theta(1))$, which simplifies (by dropping constants and non-leading terms) to  
**$O(N + k \log N)$ for the worst-case overall complexity.** 

The implementation in part (b) has the same asymptotic cost, but it is slower in practice because it also copies the array, searches for the $(k+1)$-th largest value, and partitions the array to recover the original order. See the comments in 2(b) for more detail.  
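
The cost of the extraction loop can be illustrated by counting swaps. The sketch below uses an iterative bubble down and checks that $k$ extractions never need more than $k \lfloor \log_2 N \rfloor$ swaps; the names `sift_down` and `count_extraction_swaps` are hypothetical, and the input size and $k$ are arbitrary choices for the demonstration.

```python
import math

def sift_down(a, n, i):
    """Bubble a[i] down within the max heap a[0..n-1]; return the number of swaps."""
    swaps = 0
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return swaps
        a[i], a[largest] = a[largest], a[i]
        i = largest
        swaps += 1

def count_extraction_swaps(values, k):
    """Build a max heap from values, extract the max k times, count bubble-down swaps."""
    a = list(values)
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # bottom-up build, Theta(n)
        sift_down(a, n, i)
    total = 0
    for _ in range(k):                    # assumes k <= len(values)
        n -= 1
        a[0], a[n] = a[n], a[0]           # move the max out of the heap
        total += sift_down(a, n, 0)
    return total

n, k = 1000, 50
bound = k * math.floor(math.log2(n))
print(count_extraction_swaps(range(n), k), "swaps, bound", bound)
```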

---
## Testing your solutions -- Do not edit code beyond this point

In [3]:
from random import sample, randint
def testBiggestInOrder(n_tests, test_size):
    n_passed = 0
    n_failed = 0
    for i in range(0, n_tests):
        a = sample( range(-10 * n_tests,  10 * n_tests ), test_size)
        k = randint(1, len(a))
        kbiggest = BiggestInOrder(a.copy(), k)
        if len(kbiggest) != k:
            if n_failed < 10:
                print(' Code returns the wrong sized array!')
            n_failed += 1
            continue
        if sorted(kbiggest) != sorted(a)[-k:]:
            if n_failed < 10:
                print(' Code did not return the ', k, ' biggest elements!')
                print(' Code returned ', sorted(kbiggest), ' but we wanted ', sorted(a)[-k:], ' of ', a)
            n_failed +=1
            continue
        currIndex = 0
        inOrder = True
        for j in range(0, len(kbiggest)):
            for l in range(currIndex, len(a)):
                if kbiggest[j] == a[l]:
                    currIndex = l
                    break
                if l == len(a) - 1:
                    inOrder = False
        if inOrder == False:
            if n_failed < 10:
                print(' Code failed for input: ', a, 'returned : ', kbiggest, 'last correct index: ', currIndex)
        else:
            n_passed = n_passed + 1

    return n_passed

n_tests = 10000
n_passed = testBiggestInOrder(10000, 10)
print(' num tests  = ', n_tests)
print(' num passed = ', n_passed)

 num tests  =  10000
 num passed =  10000


In [4]:
from random import sample, randint
def testBiggestOutOfOrder(n_tests, test_size):
    n_passed = 0
    n_failed = 0
    for i in range(0, n_tests):
        a = sample( range(-10 * n_tests,  10 * n_tests ), test_size)
        k = randint(1, len(a))
        kbiggest = BiggestOutOfOrder(a.copy(), k)
        if len(kbiggest) != k:
            if n_failed < 10:
                print(' Code returns the wrong sized array!')
            n_failed += 1
            continue
        if sorted(kbiggest) != sorted(a)[-k:]:
            if n_failed < 10:
                print(' Code did not return the ', k, ' biggest elements!')
                print(' Code returned ', sorted(kbiggest), ' but we wanted ', sorted(a)[-k:], 'where a is', a)
            n_failed += 1
            continue
        n_passed = n_passed + 1
    return n_passed

n_tests = 10000
n_passed = testBiggestOutOfOrder(10000, 10)
print(' num tests  = ', n_tests)
print(' num passed = ', n_passed)

 num tests  =  10000
 num passed =  10000
