# CSPB 3104 Assignment 3:

***
# Instructions

This assignment is to be completed as a python3 notebook.

The questions provided below will ask you to either write code or
write answers in the form of markdown.

A Markdown syntax guide is available here: [click here](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)

Using markdown you can typeset formulae using LaTeX.
This way you can write nice readable answers with formulae like this:

The algorithm runs in time $\Theta\left(n^{2.1\log_2(\log_2( n \log^*(n)))}\right)$, 
where $\log^*(n)$ is the iterated logarithm.

__Double click anywhere on this box to find out how your instructor typeset it. Press Shift+Enter to go back.__

***

## Question 1

Answer the following questions about heaps.

__1(a)__  Write down an algorithm to find the third smallest element in a minheap with more than $3$ elements. You may write pseudocode or an English description of the algorithm's steps. What is the running time complexity on a heap of size $n$? *Assume all elements in the heap are distinct.*






The three smallest elements of a minheap all lie in its top three levels. The smallest element is the root, and the second smallest is one of the root's children. The third smallest is then either the root's other child or one of the children of the second smallest element. Here is the pseudocode for locating the third smallest element:
```
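findThirdSmallest(A):            // A is a 0-indexed minheap with more than 3 elements
    if A[1] > A[2]:              // A[2] is the second smallest; its children are A[5] and A[6]
        if A[6] exists:
            return min(A[1], A[5], A[6])
        elif A[5] exists:
            return min(A[1], A[5])
        else:
            return A[1]
    else:                        // A[1] is the second smallest; its children are A[3] and A[4]
        if A[4] exists:
            return min(A[2], A[3], A[4])
        else:
            return min(A[2], A[3])   // A[3] always exists because the heap has more than 3 elements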
```

The third smallest element is always located within the first three levels of the heap (at deepest, it is a child of one of the root's children), so only those values need to be compared. The pseudocode checks whether each index is within bounds and performs the comparisons accordingly.

Its running time is $\Theta(1)$: the algorithm performs a constant number of comparisons and array accesses, independent of $n$, with no loops or recursion.
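
The pseudocode above can be sketched in Python as follows. This is a minimal sketch, assuming the heap is stored as a 0-indexed list with the root at index 0; the function name `third_smallest` is hypothetical and chosen only for illustration.

```python
def third_smallest(heap):
    """Return the third smallest value of a 0-indexed min-heap with distinct elements."""
    if len(heap) < 4:
        raise ValueError("expected a min-heap with more than 3 elements")
    # The second smallest is the smaller child of the root (indices 1 and 2).
    if heap[1] < heap[2]:
        second, other = 1, 2
    else:
        second, other = 2, 1
    # Candidates for third place: the root's other child and
    # whichever children of the second smallest actually exist.
    candidates = [heap[other]]
    for child in (2 * second + 1, 2 * second + 2):
        if child < len(heap):
            candidates.append(heap[child])
    return min(candidates)


print(third_smallest([2, 3, 5, 6, 7, 8, 9]))  # prints 5
```

At most three values are compared regardless of the heap size, which matches the constant running time claimed above.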

    

__1(b)__ We wish to find the largest element in a min-heap represented by array $A[1], \ldots, A[n]$. Show using a series of examples for $n=7$ that any element starting from $A[\lceil{\frac{n}{2}}\rceil], \ldots, A[n]$ can be the largest element. Your answer should be in the form of 4 min heaps.

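As stated before, a min heap's smallest element is located at the root, or A[1] (the first element in an array). The largest element, in contrast, must be a leaf: every internal node has at least one child, and that child is at least as large as the node itself. The leaves of a heap of size $n$ all lie in the range $A[\lceil\frac{n}{2}\rceil], \ldots, A[n]$.

To illustrate this, 4 min heaps are used as examples.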

1) The first example involves a heap with 7 elements. To simplify, all of the heaps will be shown in the form of an array (heapified array).

A = [2,3,5,6,7,8,9]

The condition is true for the first heap, as the largest element (9) is located at the end of the heap, at A[7]. This satisfies the condition that the largest element of the heap lies in the range $A[\lceil\frac{7}{2}\rceil], \ldots, A[7]$, that is, $A[4], \ldots, A[7]$.

2) The second example involves a heap with 8 elements.

A = [2,5,4,7,8,12,7,9]

The condition is true for the second heap as well. The largest element (12) is located at the 6th position of the heap, which satisfies the condition that the largest value in this heap lies in the range $A[\lceil\frac{8}{2}\rceil], \ldots, A[8]$, that is, $A[4], \ldots, A[8]$.

3) This third heap has 6 elements.

A = [8, 10, 17, 20, 24, 23]

The condition is also true in this case, as the largest element (24) lies in the 5th position of the array. The condition is satisfied once again, as position 5 lies in the range $A[\lceil\frac{6}{2}\rceil], \ldots, A[6]$, that is, $A[3], \ldots, A[6]$.

4) The final heap has 13 elements.

A = [11, 14, 16, 22, 33, 21, 35, 43, 49, 36, 76, 74, 66]

The condition also holds in this case. 76, the largest element, is located at the 11th position of the array, which lies in the range $A[\lceil\frac{13}{2}\rceil], \ldots, A[13]$, that is, $A[7], \ldots, A[13]$.

These examples illustrate that the condition holds for any heapified array: its largest element lies in the range $A[\lceil\frac{n}{2}\rceil], \ldots, A[n]$.
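
Since the question fixes $n=7$, the same claim can also be checked on four heaps of that size, one for each position from $A[4]$ to $A[7]$. The sketch below is illustrative only: the four arrays and the helper `is_min_heap` are assumptions introduced here, not part of the answer above, and Python lists are 0-indexed, so position $p$ in the text is index `p - 1` in the code.

```python
def is_min_heap(a):
    """Check the min-heap property on a 0-indexed list: every parent <= its child."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))


# Four illustrative heaps with n = 7. The largest value (7) sits at
# 1-indexed positions 4, 5, 6 and 7 respectively.
heaps = [
    [1, 2, 3, 7, 4, 5, 6],
    [1, 2, 3, 4, 7, 5, 6],
    [1, 2, 3, 4, 5, 7, 6],
    [1, 2, 3, 4, 5, 6, 7],
]

for h in heaps:
    assert is_min_heap(h)
    position = h.index(max(h)) + 1  # convert back to a 1-indexed position
    print(h, "-> largest at A[%d]" % position)
```

Each array passes the heap check, and together they place the maximum at every position in $A[4], \ldots, A[7]$.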


***
## Question 2

Suppose you have an array __A__ of *n* distinct elements.

The following pseudocode finds the k biggest values of __A__:

```
Biggest(A, k): // returns an array of the k biggest values of A
        mergesort(A)  
        return A[n-k, n]
 ```
 
__2(a)__ What is the complexity of the above algorithm and why?



The complexity of this algorithm is $\Theta(n \log n)$. Mergesort takes $\Theta(n \log n)$ time. The only other operation, returning the last $k$ elements of the sorted array, takes at most $\Theta(k)$ time (constant if only a reference to the subarray is returned), and since $k \le n$ it is dominated by the sort.
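
For reference, the pseudocode can be rendered in Python roughly as follows. This is a sketch under stated assumptions: Python's built-in `sorted` stands in for mergesort (both are $\Theta(n \log n)$ comparison sorts in the worst case), and the name `biggest` is hypothetical.

```python
def biggest(A, k):
    """Return the k largest values of A in ascending order."""
    ordered = sorted(A)            # Theta(n log n) comparison sort, stand-in for mergesort
    return ordered[len(A) - k:]    # copying a slice of length k costs Theta(k)


print(biggest([0, 5, 1, 3, 4], 3))  # prints [3, 4, 5]
```

The result comes back in sorted order, so the original ordering of the elements is lost.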

__2(b)__ Now suppose that the order of the array was important.  Design and implement an algorithm that returns an array of the k largest elements of __A__ in their original order, and it should run in $\Theta(nk)$ time.

For example, BiggestInOrder([0,5,1,3,4], 3) should return [5,3,4].

In [47]:
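def BiggestInOrder(A, k):
    n = len(A)
    k = min(k, n)
    # chosen[i] is True once A[i] has been selected as one of the k largest.
    chosen = [False] * n
    # k selection passes of Theta(n) each give Theta(nk) overall.
    for _ in range(k):
        max_idx = -1
        for i in range(n):
            if not chosen[i] and (max_idx == -1 or A[i] > A[max_idx]):
                max_idx = i
        chosen[max_idx] = True
    # Collect the selected elements in their original order (one more Theta(n) pass).
    return [A[i] for i in range(n) if chosen[i]]

# print(BiggestInOrder([0,5,1,3,4], 3))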

__2(c)__ If we don't care about the original ordering, then we can use a heap to design an algorithm that runs faster than the one in part (b).  Design and implement an algorithm that returns an array of the k largest elements of __A__ using a heap.

In [67]:
def BiggestOutOfOrder(A, k):
    heapify_max(A)

    final = []
    for _ in range(k):
        final.append(A[0])
        delete(A,0)
    return final

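def delete(A, j):
    # Remove A[j] from the max heap by moving the last element into slot j.
    last = A.pop()
    if j < len(A):
        A[j] = last
        # The moved element may violate the heap property in either direction.
        bubble_up(A, j)
        bubble_down(A, j)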

def heapify_max(A):
    n = len(A)
    for i in range(n // 2 - 1, -1, -1):
        bubble_down(A, i)

def bubble_down(A, j):
    n = len(A)
    while True:
        left = 2 * j + 1
        right = 2 * j + 2

        if left >= n:
            break

        large = left
        if right < n and A[right] > A[left]:
            large = right

        if A[j] >= A[large]:
            break

        swap(A, j, large)
        j = large

def bubble_up(A, i):
    while i > 0:
        parent = (i - 1) // 2

        if A[i] <= A[parent]:
            return

        swap(A, i, parent)
        i = parent
        
def swap(A, i, j):
    # Exchange the elements at indices i and j in place.
    A[i], A[j] = A[j], A[i]
    
    
# A = [0,5,1,3,4]
# B = BiggestOutOfOrder(A, 3)
# print(B)

__2(d)__  What is the complexity of your algorithm for part (c)?

The complexity of this algorithm is $O(n + k \log n)$. The implementation uses a max heap, where the code to generate a min heap was adapted to a max heap. Building the heap bottom-up with `heapify_max` takes $\Theta(n)$ time. As the root is the maximum element, the code then deletes the root $k$ times, which yields a list of the $k$ biggest elements. Each delete takes $O(\log n)$ time, so the $k$ deletions cost $O(k \log n)$ in total, and the whole algorithm runs in $O(n + k \log n)$ time.
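
The same two-phase structure (one heap construction followed by $k$ root deletions) can be sketched with Python's standard `heapq` module. This is only an illustration under stated assumptions, not the implementation from part (c): `heapq` provides a min heap, so the values are negated to simulate a max heap, which assumes the elements are numeric. The name `biggest_out_of_order_heapq` is hypothetical.

```python
import heapq


def biggest_out_of_order_heapq(A, k):
    """Return the k largest values of A, largest first, without modifying A."""
    heap = [-x for x in A]   # negate so heapq's min-heap behaves like a max-heap
    heapq.heapify(heap)      # bottom-up heap construction, linear in len(A)
    # Each heappop removes the root in O(log n); k pops cost O(k log n).
    return [-heapq.heappop(heap) for _ in range(min(k, len(heap)))]


print(biggest_out_of_order_heapq([0, 5, 1, 3, 4], 3))  # prints [5, 4, 3]
```

The construction step contributes the linear term and the pops contribute the $k \log n$ term of the total cost.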

---
## Testing your solutions -- Do not edit code beyond this point

In [68]:
from random import sample, randint
def testBiggestInOrder(n_tests, test_size):
    n_passed = 0
    n_failed = 0
    for i in range(0, n_tests):
        a = sample( range(-10 * n_tests,  10 * n_tests ), test_size)
        k = randint(1, len(a))
        kbiggest = BiggestInOrder(a.copy(), k)
        if len(kbiggest) != k:
            if n_failed < 10:
                print(' Code returns the wrong sized array!')
            n_failed += 1
            continue
        if sorted(kbiggest) != sorted(a)[-k:]:
            if n_failed < 10:
                print(' Code did not return the ', k, ' biggest elements!')
                print(' Code returned ', sorted(kbiggest), ' but we wanted ', sorted(a)[-k:], ' of ', a)
            n_failed +=1
            continue
        currIndex = 0
        inOrder = True
        for j in range(0, len(kbiggest)):
            for l in range(currIndex, len(a)):
                if kbiggest[j] == a[l]:
                    currIndex = l
                    break
                if l == len(a) - 1:
                    inOrder = False
        if inOrder == False:
            if n_failed < 10:
                print(' Code failed for input: ', a, 'returned : ', kbiggest, 'last correct index: ', currIndex)
        else:
            n_passed = n_passed + 1

    return n_passed

n_tests = 10000
n_passed = testBiggestInOrder(10000, 10)
print(' num tests  = ', n_tests)
print(' num passed = ', n_passed)

 num tests  =  10000
 num passed =  10000


In [69]:
from random import sample, randint
def testBiggestOutOfOrder(n_tests, test_size):
    n_passed = 0
    n_failed = 0
    for i in range(0, n_tests):
        a = sample( range(-10 * n_tests,  10 * n_tests ), test_size)
        k = randint(1, len(a))
        kbiggest = BiggestOutOfOrder(a.copy(), k)
        if len(kbiggest) != k:
            if n_failed < 10:
                print(' Code returns the wrong sized array!')
            n_failed += 1
            continue
        if sorted(kbiggest) != sorted(a)[-k:]:
            if n_failed < 10:
                print(' Code did not return the ', k, ' biggest elements!')
                print(' Code returned ', sorted(kbiggest), ' but we wanted ', sorted(a)[-k:], 'where a is', a)
            n_failed += 1
            continue
        n_passed = n_passed + 1
    return n_passed

n_tests = 10000
n_passed = testBiggestOutOfOrder(10000, 10)
print(' num tests  = ', n_tests)
print(' num passed = ', n_passed)

 num tests  =  10000
 num passed =  10000
