Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = "Steven Tey"
COLLABORATORS = ""

---

# CS110 Pre-class Work 7.1

## Part A. Median Heap (watch this [video explanation](https://www.youtube.com/watch?v=756_8C9YBZQ&list=PLF_a-qBXTGFektoI6JUOTRL36JlvD04BR&index=5&t=0s) or read this [description](https://stackoverflow.com/a/15319593/7946759))

Throughout this pre-class work, please use the following definition of median: the median of a list of numbers is the one in the middle of the list when the list is ordered. When there is no single middle element (i.e., in a list of even length), the median is the average of the two middle elements. For example, 5 is the median of [-1,2,4,5,8,10,12], and (5+7)/2=6 is the median of [1,2,3,5,7,8,10,11].
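As a quick check of this definition, the sketch below computes the median by sorting and reproduces the two examples above. The helper name `median_by_sorting` is hypothetical and is not part of the assignment.

```python
def median_by_sorting(values):
    """Return the median of a non-empty list by sorting a copy of it."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd length: a single middle element exists
        return ordered[mid]
    # Even length: average the two middle elements
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_sorting([-1, 2, 4, 5, 8, 10, 12]))    # 5
print(median_by_sorting([1, 2, 3, 5, 7, 8, 10, 11]))  # 6.0
```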

Using the idea from Lesson 3.2, we can use a pair of heaps to create a data structure which allows fast access to the median. Use the `heapq` module in Python to create both a max-heap and a min-heap. Note that by default, the `heapq` module only creates min-heaps, but if we multiply elements by -1 when we store them (and again when we read them back), then we can also create max-heaps.
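The negation trick can be sketched as follows. This is only a minimal illustration; the variable names (`values`, `min_heap`, `max_heap`) are made up for the example.

```python
import heapq

values = [4, 10, 2, 8, 5]

# Min-heap: heapq keeps the smallest element at index 0
min_heap = []
for v in values:
    heapq.heappush(min_heap, v)

# Max-heap: store negated values so the largest original value
# becomes the smallest stored value and sits at index 0
max_heap = []
for v in values:
    heapq.heappush(max_heap, -v)

smallest = min_heap[0]     # peek at the minimum
largest = -max_heap[0]     # negate again to recover the maximum
print(smallest, largest)   # 2 10

# Popping works the same way: negate the popped value
popped_largest = -heapq.heappop(max_heap)  # removes 10
next_largest = -max_heap[0]                # 8 is now the maximum
```

Every value read from `max_heap` has to be negated again before it is used, which is easy to forget when the two heaps are combined.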


## Question 1.
Write a function `add_to_median_heap(minh, maxh, elem)`. It must accept a min heap, a max heap, and an element to add.


In [125]:
import heapq

def add_to_median_heap(minh, maxh, elem):
    # maxh is a max-heap of the smaller half (largest value at maxh[0]);
    # minh is a min-heap of the larger half (smallest value at minh[0]).
    # Note: heapq._heapify_max is a private CPython helper.

    # Base case for maxh
    if len(maxh) == 0:
        maxh.append(elem)

    # When maxh and minh have the same length
    elif len(maxh) == len(minh):
        if elem <= minh[0]:
            maxh.append(elem)
            heapq._heapify_max(maxh)
        else:
            heapq.heappush(minh, elem)

    # When maxh has one more element than minh
    elif len(maxh) - len(minh) == 1:
        if elem >= maxh[0]:
            # elem belongs to the larger half
            heapq.heappush(minh, elem)
        else:
            # elem belongs to the smaller half: move the root of maxh
            # to minh and put elem in its place
            heapq.heappush(minh, maxh[0])
            maxh[0] = elem
            heapq._heapify_max(maxh)

    # When minh has one more element than maxh
    elif len(minh) - len(maxh) == 1:
        if elem <= minh[0]:
            maxh.append(elem)
            heapq._heapify_max(maxh)
        else:
            # heappushpop inserts elem into minh and returns the old minimum
            maxh.append(heapq.heappushpop(minh, elem))
            heapq._heapify_max(maxh)

    return (minh, maxh)

## Question 2.
Write a function `median(minh, maxh)`. It must return the median element.


In [126]:
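def median(minh, maxh):
    # The longer heap holds the middle element at its root
    if len(maxh) > len(minh):
        return maxh[0]
    elif len(minh) > len(maxh):
        return minh[0]
    else:
        # Equal lengths: average the two middle elements
        return (maxh[0] + minh[0]) / 2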

In [68]:
# Please ignore this cell. This cell is for us to implement the tests 
# to see if your code works properly. 

## Question 3.

Uncomment and run the testing code given below to test your functions. It should print out numbers ranging from 1 to 50, in that order.

In [129]:
minh = []
maxh = []
for a in range(1,100,2):
    add_to_median_heap(minh, maxh, a)
    print("%.0f" % median(minh, maxh))  

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50


## Question 4.
What’s the worst case complexity to build a median heap using `add_to_median_heap`?

Building a median heap from $N$ elements with `add_to_median_heap` takes $O(N \log N)$ in the worst case, assuming each element is placed with a proper heap insertion. For each of the $N$ elements, we compare it with the roots of the heaps, which takes constant time, and then insert it into one of the heaps, which takes $O(\log N)$ because a heap with $N$ elements has about $\log N$ layers. Having two heaps (`maxh` and `minh`) only changes the constant factor. Multiplying the cost per element ($\log N$) by the number of elements ($N$) gives $O(N \log N)$. Note that this bound relies on sift-based insertion such as `heapq.heappush`; rebuilding a heap with `heapify` after every append, as the code above does, costs $O(N)$ per call and would make the worst case $O(N^2)$.

## Question 5.
What’s the worst case complexity of `median`?

$O(1)$: we only compare the lengths of `minh` and `maxh` and read their roots to return the appropriate median. This takes constant time and does not grow with the number of elements.

## Question 6.

How does this way of finding the median compare with the vanilla way of sorting the list and picking the middle element? Use arguments based on the efficiency or clarity of the respective algorithms.

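With a median heap, adding a new element takes $O(\log N)$ when heap insertion is used, and reading the median takes $O(1)$.

The vanilla way takes $O(N \log N)$ for each sort, even with the fastest comparison-based sorting algorithm. For a single, fixed list the two approaches are comparable, since building the median heap from scratch also takes $O(N \log N)$. The median heap is considerably faster when the list keeps growing, because we don't need to sort the whole list again whenever a new element is added.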
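The comparison can be sketched on a stream of numbers. The class below is a hypothetical two-heap running median written for this illustration (it stores negated values in `low` and uses `heapq.heappush`, so it is not the same code as `add_to_median_heap` above); the vanilla way is represented by `statistics.median`, which sorts the data on every call.

```python
import heapq
import random
import statistics

class RunningMedian:
    """Two-heap running median; `low` stores negated values (max-heap)."""

    def __init__(self):
        self.low = []    # smaller half, negated so -low[0] is its maximum
        self.high = []   # larger half, plain min-heap

    def add(self, x):
        # Push onto the correct half: O(log n)
        if not self.low or x <= -self.low[0]:
            heapq.heappush(self.low, -x)
        else:
            heapq.heappush(self.high, x)
        # Rebalance so that len(low) is len(high) or len(high) + 1
        if len(self.low) > len(self.high) + 1:
            heapq.heappush(self.high, -heapq.heappop(self.low))
        elif len(self.high) > len(self.low):
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self):
        # O(1): only the two roots are inspected
        if len(self.low) > len(self.high):
            return -self.low[0]
        return (-self.low[0] + self.high[0]) / 2


random.seed(0)
stream = [random.randint(-50, 50) for _ in range(200)]
rm = RunningMedian()
seen = []
agree = True
for x in stream:
    rm.add(x)                    # O(log n) per new element
    seen.append(x)
    # statistics.median re-sorts `seen` each time: O(n log n) per query
    agree = agree and (rm.median() == statistics.median(seen))
print(agree)  # True
```

Both approaches return the same medians; the difference is only in how much work each new element costs.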

## [Optional] Question 7.

Is it possible to extend this idea to any percentile? If it is, then write code to do so. If it’s not possible, prove why it is not possible.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Part B. Quickselect

Quicksort can be modified to find the $k$-th smallest element in an unordered list. This is known as quickselect. It does this by choosing a pivot and partitioning the list around it (as in quicksort). Once the list has been partitioned, we know how many elements lie to the left and to the right of the pivot. This allows us to recursively call quickselect on the correct sublist only.

## Question 1.

Write a function `qselect(lst, k)`, which takes both a list and an index $k$. The function must then return the $k$-th smallest item in the list.


In [91]:
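import random  # needed for the random pivot choice below

def qselect(lst, k):
    # Returns the k-th smallest item of lst (k is zero-based).
    # Note: lst is partially reordered in place.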

    def partition(A, p, r, idx):

        # base case
        if r == p:
            return A[p]

        # Choosing a pivot randomly
        pivot_index = random.randint(p, r)

        # Swap the first element of the list with the pivot
        A[p], A[pivot_index] = A[pivot_index], A[p]

        # Partition function from Session 5.2  
        i = p
        for j in range(p+1, r+1):
            if A[j] < A[p]:
                i += 1
                A[i], A[j] = A[j], A[i]

        A[i], A[p] = A[p], A[i]

        # Partition recursively for one half of the partition
        if idx == i:
            return A[i]
        elif idx < i:
            return partition(A, p, i-1, idx)
        else:
            return partition(A, i+1, r, idx)

    if lst is None or len(lst) < 1:
        return None

    if k < 0 or k > len(lst) - 1:
        raise IndexError()

    return partition(lst, 0, len(lst) - 1, k)

In [92]:
import random
random.seed(123) # introducing a seed for reproducibility purposes
lst1 = list(range(100))
random.shuffle(lst1)
lst2 = []
for a in range(100):
    lst2.append(qselect(lst1, a))
assert(lst2 == list(range(100)))

## Question 2.
Uncomment and run the testing code given below to test your function. It should print out integers from 0 to 99.


In [93]:
import random
random.seed(123) # introducing a seed for reproducibility purposes
lst = list(range(100))
random.shuffle(lst)
for a in range(100):
    print(qselect(lst, a))

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99


## Question 3.

Write down the recurrence relation for your code.

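$T(n) = T(\frac{n}{2}) + cn$

This gives us a time complexity of $O(n)$.

This is the recurrence when the pivot lands near the middle of the list: we partition the list into two parts and recurse into only one of them. The term $cn$ is the cost of the partition loop, which compares every element of the current sublist with the pivot; choosing the random pivot itself takes constant time. In general, the recursive call is made on a sublist of size $m \le n-1$ that depends on where the pivot lands, so $T(n) = T(m) + cn$.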

## Question 4.

Solve the recurrence relation for quickselect in the best case.


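$T(n) = T(\frac{n}{2}) + cn$

When every pivot splits the current sublist in half, each level does half the work of the previous one, which gives a time complexity of $O(n)$. Even in the luckiest scenario, when the first pivot happens to be the $k$-th smallest element, one full partition pass over the list is still needed, so the best case is still linear in $n$.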
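As a sketch of how the balanced-split recurrence unrolls, assume that a partition pass over a sublist of size $m$ costs $cm$, that $n$ is a power of two, and that $T(1)$ is a constant (these are simplifying assumptions made for the illustration):

$$
\begin{aligned}
T(n) &= cn + T\left(\tfrac{n}{2}\right) \\
     &= cn + \tfrac{cn}{2} + T\left(\tfrac{n}{4}\right) \\
     &= cn \sum_{i=0}^{\log_2 n - 1} \frac{1}{2^i} + T(1) \\
     &< 2cn + T(1) = O(n)
\end{aligned}
$$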

## Question 5.

Solve the recurrence relation for quickselect in the worst case.

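$T(n) = T(n-1) + cn$

This gives us a time complexity of $O(n^2)$, since the partition costs add up to $cn + c(n-1) + \dots + c = O(n^2)$.

This happens when every pivot is at an extreme end of the current sublist (either its largest or its smallest element), so each partition removes only one element from consideration.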
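The gap between a lucky and an unlucky run can be made concrete by counting comparisons. The sketch below is a hypothetical, iterative variant of quickselect that always uses the first element of the current sublist as the pivot (an assumption made so that the result is deterministic; it is not the random pivot used in `qselect` above, where such a run is possible but unlikely).

```python
def count_comparisons(lst, k):
    """Quickselect with the first element as pivot.

    Returns (k-th smallest value, number of element comparisons).
    """
    A = list(lst)            # work on a copy
    p, r = 0, len(A) - 1
    comparisons = 0
    while p < r:
        # Partition A[p..r] around the pivot A[p]
        i = p
        for j in range(p + 1, r + 1):
            comparisons += 1
            if A[j] < A[p]:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i], A[p] = A[p], A[i]
        if k == i:
            return A[i], comparisons
        elif k < i:
            r = i - 1
        else:
            p = i + 1
    return A[p], comparisons

n = 100
sorted_input = list(range(n))
# Lucky: the first pivot is already the element we are looking for
_, lucky = count_comparisons(sorted_input, 0)
# Unlucky: every pivot is the minimum of its sublist
_, unlucky = count_comparisons(sorted_input, n - 1)
print(lucky, unlucky)  # 99 4950
```

For $n = 100$ the lucky run needs $n - 1$ comparisons, while the unlucky run needs $\frac{n(n-1)}{2}$, which matches the linear and quadratic bounds.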