**CS560 - Algorithms and Their Analysis**
<br>
Date: **12 February 2021**
<br>

Title: **Lecture 4**
<br>
Speaker: **Dr. Shota Tsiskaridze**
<br>
Bibliography:
<br> 
 **Chapter 6** of Bhargava, Aditya Y., *Grokking Algorithms*, Manning, 2016  [2].
 


<h1 align="center">Heapsort</h1>

<h3 align="center">Sorting Algorithms</h3>

- In previous lectures we introduced several algorithms that sort $n$ real numbers.


- **Insertion-Sort**:
  - Asymptotic **running time** is $\Theta(n^2)$ in the **worst case**.
  - It is a fast **in-place** sorting algorithm for small input sizes.

- **Merge-Sort**:
  - Asymptotic **running time** is $\Theta(n \lg n)$ in the **worst case**.
  - It **does not** operate **in-place**.


- Now we introduce another algorithm, called **heapsort**, that sorts arbitrary real numbers **in-place** in $O(n \lg n)$ time.
  
  Thus, **heapsort** combines the better attributes of the two sorting algorithms.


- **Note**: **In-place** means that **only a constant number** of array elements are **stored** outside the input array at any time.

<h3 align="center">Heaps</h3>

- **Heapsort** also introduces another algorithm design technique: using a **data structure**, in this case one called a **heap** (Georgian: გროვა), to manage information.


- The **heap** data structure is useful not only for **heapsort**; it also makes an efficient **priority queue**.


- The (**binary**) **heap** data structure is an **array object** that we can view as a **nearly complete binary tree**:

<center><img src="images/L4_Heap.png" width="800" alt="Example" /></center>

- Each **node** of the binary tree **corresponds** to an **element of the array** $A$.


- The **binary tree** is completely **filled** on **all levels** except possibly the lowest, which is filled from the left up to a point.


- An **array** $A$ that represents a heap is an object with **two attributes**:
  - $A.length$, which gives the **number of elements** in the array $A$.
  - $A.heapsize$, which in these notes (with 0-based indexing) is the **index of the last heap element**, so the heap holds $A.heapsize + 1$ elements.


- Thus, only the elements in $A[0..A.heapsize]$, where $-1 \leq A.heapsize \leq A.length - 1$, are **valid elements** of the heap.


- We define the **height** of the **heap** to be the **height of its root**.

- The **height** of a **node** in a heap is defined to be the **number of edges** on the longest simple **downward path** **from the node to a leaf**.


- The root of the tree is $A[0]$, and given the index $i$ of a node, we can easily compute the indices of its **parent**, **left child**, and **right child**:

In [87]:
def parent(i):
    return (i-1)//2

def left(i):
    return 2*i + 1

def right(i):
    return 2*i + 2
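
As a quick sanity check of the index formulas, the sketch below restates the three helpers and looks up the relatives of a node in a 0-based array. The helper name `family` is hypothetical (it is not part of the lecture code), and the array is the max-heap that appears later in these notes.

```python
def parent(i):
    return (i - 1) // 2

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def family(A, i):
    """Return the (parent, left child, right child) values of node i, or None where absent."""
    n = len(A)
    p = A[parent(i)] if i > 0 else None
    l = A[left(i)] if left(i) < n else None
    r = A[right(i)] if right(i) < n else None
    return p, l, r

A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
print(family(A, 0))  # (None, 14, 10): the root has no parent
print(family(A, 4))  # (14, 1, None): node 4 has only a left child
```

Note that `parent(left(i))` and `parent(right(i))` both return `i`, which is what makes the array view of the tree consistent.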

- **Questions**:
  - What are the **minimum** and **maximum** **numbers of elements** in a heap of **height** $h$?
  - If the heap has $n$ **elements**, what is its **height**?

<h3 align="center">Heap Property</h3>

- There are **two kinds** of binary heaps: **max-heaps** and **min-heaps**.


- In both kinds, the values in the nodes satisfy a **heap property**, the specifics of which depend on the kind of heap.


- In a **max-heap**, the **max-heap property** is that for every node $i$ other than the root:

  $$A[parent(i)] \geq A[i].$$
  
  Thus, the **largest element** in a max-heap is stored at the **root**.
  
  For the **heapsort** algorithm, we use **max-heaps**.


- In a **min-heap**, the **min-heap property** is that for every node $i$ other than the root:

  $$A[parent(i)] \leq A[i].$$
  
  Thus, the **smallest element** in a min-heap is at the **root**.
  
  **Min-heaps** commonly implement **priority queues**, although the priority queue implemented later in this lecture is a max-priority queue.

- **Questions**:
  - Assuming that **all elements** are **distinct**, where in a **max-heap** might the **smallest element reside**?
  - Is an **ascending sorted array** a **min-heap**?
  - Is the array with values $A = [23, 17, 14, 6, 13, 10, 1, 5, 7, 12]$ a **max-heap**?
  - If the heap has $n$ elements, what are the **leaf node indices**?
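
The two heap properties translate directly into a check over all non-root indices. The following is a minimal sketch, assuming 0-based arrays as in the rest of these notes; the function names `is_max_heap` and `is_min_heap` are illustrative and not part of the lecture code. It can be used to test your answers to the questions above.

```python
def is_max_heap(A):
    """True if A[parent(i)] >= A[i] for every non-root index i (0-based)."""
    return all(A[(i - 1) // 2] >= A[i] for i in range(1, len(A)))

def is_min_heap(A):
    """True if A[parent(i)] <= A[i] for every non-root index i (0-based)."""
    return all(A[(i - 1) // 2] <= A[i] for i in range(1, len(A)))

print(is_max_heap([16, 14, 10, 8, 7, 9, 3, 2, 4, 1]))  # True
print(is_max_heap([16, 4, 10, 14, 7, 9, 3, 2, 8, 1]))  # False: 14 sits below 4
```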

<h3 align="center">Basic Procedures</h3>

- Let's present some **basic procedures** that are used in the **heapsort** algorithm and in a **priority-queue** data structure:

  - The **Max-Heapify** procedure, which runs in $O(\lg n)$ time, is the **key** to maintaining the **max-heap property**.
  - The **Build-Max-Heap** procedure, which runs in **linear time**, produces a **max-heap** from an **unordered input array**.
  - The **Heapsort** procedure, which runs in $O(n \lg n)$ time, **sorts an array in place**.
  - The **Max-Heap-Insert**, **Heap-Extract-Max**, **Heap-Increase-Key**, and **Heap-Maximum** procedures, which run in $O(\lg n)$ time, **allow the heap** data structure to **implement** a **priority queue**.

<h3 align="center">Max-Heapify Procedure</h3>


- In order to **maintain** the **max-heap property**, we call the procedure **Max-Heapify**:
  - It assumes that the **binary trees rooted** at $left(i)$ and $right(i)$ are **max-heaps**, but the **element** $A[i]$ might be smaller than its children, thus **violating** the **max-heap property**.
  - It lets the value at $A[i]$ **float down** in the **max-heap** so that the **subtree rooted** at index $i$ **obeys** the **max-heap property**.
  


In [145]:
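def maxHeapify(A, i):
    # heapsize is a global holding the index of the last heap element
    global heapsize
    l = left(i)
    r = right(i)
    # pick the largest of A[i] and those of its children that lie inside the heap
    largest = i
    if l <= heapsize and A[l] > A[largest]:
        largest = l
    if r <= heapsize and A[r] > A[largest]:
        largest = r
    # if a child is larger, swap it up and continue in that child's subtree
    if largest != i:
        exchange(A, i, largest)
        maxHeapify(A, largest)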

In [146]:
def exchange(A, i, j):
    # swap two array elements in place
    A[i], A[j] = A[j], A[i]

- The action of the $\texttt{maxHeapify(A, 1)}$ procedure, where $A = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]$, is shown in the figure below:

<center><img src="images/L4_MaxHeapify.png" width="500" alt="Example" /></center>
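
The same float-down step can also be written as a loop that takes the heap boundary as a parameter instead of reading a global. This is a sketch under that assumption, not the lecture's implementation; the name `max_heapify_iter` and the parameter `last` (index of the final heap element) are illustrative.

```python
def max_heapify_iter(A, i, last):
    """Sift A[i] down within A[0..last], where last is the index of the final heap element."""
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        largest = i
        if l <= last and A[l] > A[largest]:
            largest = l
        if r <= last and A[r] > A[largest]:
            largest = r
        if largest == i:
            return  # the subtree rooted at i now obeys the max-heap property
        A[i], A[largest] = A[largest], A[i]
        i = largest

A = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
max_heapify_iter(A, 1, len(A) - 1)
print(A)  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```

The value 4 first swaps with 14 and then with 8, which are the two steps the figure illustrates.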


- The **running time** of **Max-Heapify** on a **subtree of size** $n$ rooted at a given **node** $i$ is the sum of:
  - the $\Theta(1)$ **time** to fix up the relationships among the elements $A[i]$, $A[left(i)]$, and $A[right(i)]$, and
  - the **time to run Max-Heapify** on a subtree rooted at one of the children of node $i$ (assuming that the recursive call occurs).


- The **children’s subtrees** each have **size** at most $2n/3$, where the **worst case** occurs when the **bottom level** of the tree is **exactly half full**.
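
The $2n/3$ bound can be checked numerically. The sketch below counts the nodes in the root's left subtree for heaps whose bottom level is exactly half full, that is, $n = 3 \cdot 2^k - 1$ elements; the helper `subtree_size` is a hypothetical name introduced only for this illustration.

```python
def subtree_size(n, i):
    """Number of nodes in the subtree rooted at index i of an n-element heap (0-based)."""
    if i >= n:
        return 0
    return 1 + subtree_size(n, 2 * i + 1) + subtree_size(n, 2 * i + 2)

for k in range(1, 6):
    n = 3 * 2**k - 1                 # bottom level exactly half full
    left_size = subtree_size(n, 1)   # left subtree of the root
    print(n, left_size, left_size / n)
```

The printed ratio approaches $2/3$ from below as $k$ grows, in line with the worst case described above.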


- Thus, we can describe the **running time** of **Max-Heapify** by the recurrence:

  $$T(n) \leq T(2n/3) + \Theta(1).$$
  
- **Question**: Which **case** of the **Master Theorem** applies?


- **Final Answer**: $T(n) = O(\lg n)$.
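
To spell out how this answer follows, assume the usual statement of the Master Theorem for recurrences of the form $T(n) = aT(n/b) + f(n)$. For the Max-Heapify recurrence:

$$a = 1, \qquad b = \tfrac{3}{2}, \qquad f(n) = \Theta(1), \qquad n^{\log_b a} = n^{\log_{3/2} 1} = n^0 = 1.$$

Since $f(n) = \Theta(n^{\log_b a})$, the second case applies and the equality version of the recurrence solves to

$$T(n) = \Theta(n^{\log_b a} \lg n) = \Theta(\lg n).$$

Because the recurrence here is an inequality, this gives the upper bound $T(n) = O(\lg n)$. Equivalently, Max-Heapify on a node of height $h$ takes $O(h)$ time.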

<h3 align="center">Build-Max-Heap Procedure</h3>

- We can use the **Max-Heapify** procedure in a **bottom-up** manner to **convert** an array $A[0..n]$, where $n = len(A)-1$, into a **max-heap**.


- Recall that the **elements** in the subarray $A \left [ \lfloor (n+1)/2 \rfloor .. n \right ]$ are all **leaves of the tree**, and so each is a $1$-element heap to begin with.
  
  
- The **Build-Max-Heap** procedure **goes through** the **remaining nodes** of the tree and runs **Max-Heapify** on **each one**.

<center><img src="images/L4_BuildMaxHeapify.png" width="500" alt="Example" /></center>


In [147]:
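def buildMaxHeap(A):
    # set the global heapsize read by maxHeapify (a plain assignment would create a local)
    global heapsize
    heapsize = len(A)-1
    # nodes len(A)//2 .. len(A)-1 are leaves, so start from the last internal node
    for i in range(len(A)//2-1, -1, -1):
        maxHeapify(A, i)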

In [148]:
A = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
print(A)
heapsize = len(A)-1
buildMaxHeap(A)
print(A)

[16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
[16, 14, 10, 8, 7, 9, 3, 2, 4, 1]


- **Question:** What is the **loop invariant** of the **Build-Max-Heap** procedure?


- At the start of **each iteration** of the **for** loop in $\texttt{buildMaxHeap}$, each **node** $i+1, i+2, \cdots, n$ is the **root** of a **max-heap**.
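
One way to build confidence in this invariant is to assert it while the heap is being built. The sketch below is self-contained and passes the heap boundary explicitly instead of using a global; the names `max_heapify`, `is_heap_rooted_at`, and `build_max_heap_checked` are illustrative and are not the lecture's functions.

```python
def max_heapify(A, i, last):
    """Float A[i] down within A[0..last]."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = i
    if l <= last and A[l] > A[largest]:
        largest = l
    if r <= last and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest, last)

def is_heap_rooted_at(A, j, last):
    """True if the subtree rooted at j within A[0..last] satisfies the max-heap property."""
    for c in (2 * j + 1, 2 * j + 2):
        if c <= last and (A[c] > A[j] or not is_heap_rooted_at(A, c, last)):
            return False
    return True

def build_max_heap_checked(A):
    last = len(A) - 1
    for i in range(len(A) // 2 - 1, -1, -1):
        # Invariant: every node i+1, ..., last is the root of a max-heap.
        assert all(is_heap_rooted_at(A, j, last) for j in range(i + 1, last + 1))
        max_heapify(A, i, last)
    # Termination: the loop has processed node 0, so the whole array is a max-heap.
    assert is_heap_rooted_at(A, 0, last)

A = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_max_heap_checked(A)
print(A)  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```

The checks are quadratic in the worst case, so this version is meant for experimentation only.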


- We can **compute** a simple **upper bound** on the **running time** of **Build-Max-Heap** procedure.


- Each call to **Max-Heapify** costs $O(\lg n)$ time, and  **Build-Max-Heap** makes $O(n)$ such calls. 
 
 
- Thus, the **running time** is $O(n \lg n)$. This **upper bound**, though correct, is **not asymptotically tight**!
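
A tighter bound follows from observing that most nodes have small height. The derivation below is a sketch that writes $N$ for the number of elements (so $N = n + 1$ in the notation above) and relies on three standard facts: an $N$-element heap has height $\lfloor \lg N \rfloor$, it has at most $\lceil N / 2^{h+1} \rceil$ nodes of height $h$, and Max-Heapify on a node of height $h$ costs $O(h)$.

$$\sum_{h=0}^{\lfloor \lg N \rfloor} \left\lceil \frac{N}{2^{h+1}} \right\rceil O(h) = O\left( N \sum_{h=0}^{\lfloor \lg N \rfloor} \frac{h}{2^{h}} \right), \qquad \sum_{h=0}^{\infty} \frac{h}{2^{h}} = 2.$$

Hence the total cost is $O(N \cdot 2) = O(N)$, which is the linear running time claimed for **Build-Max-Heap** earlier.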

<h3 align="center">The Heapsort Algorithm</h3>

- The heapsort algorithm starts by using the **Build-Max-Heap** procedure to build a max-heap on the input array $A[0..n]$, where $n = len(A)-1$.


- Since the **maximum element** of the array is stored at the **root** $A[0]$, we can **put it** into its **correct final position** by exchanging it with $A[n]$.


- If we now discard **node** $n$ from the **heap**, which we can do by simply **decrementing** $heapsize$, we observe that the **children of the root** remain **max-heaps**.


- But the **new root element** might **violate** the **max-heap property**.


- We need to call **Max-Heapify** to **restore** the **max-heap property**, which leaves a **max-heap** in $A[0..n-1]$.


- The **heapsort** algorithm then **repeats this process** for the **max-heap** in $A[0..n-1]$ down to a **heap** of **size** $2$.


In [150]:
def heapsort(A):
    # heapsize must be the global that maxHeapify reads, not a local copy
    global heapsize
    heapsize = len(A)-1
    buildMaxHeap(A)
    print(A)
    for i in range(len(A)-1, 0, -1):
        exchange(A, 0, i)        # move the current maximum to its final position
        heapsize = heapsize-1    # shrink the heap so that A[i..] is left untouched
        maxHeapify(A, 0)
        print(A)

In [151]:
A = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
print(A)
heapsort(A)

[4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
[16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
[14, 8, 10, 4, 7, 9, 3, 2, 1, 16]
[10, 8, 9, 4, 7, 1, 3, 2, 14, 16]
[9, 8, 3, 4, 7, 1, 2, 10, 14, 16]
[8, 7, 3, 4, 2, 1, 9, 10, 14, 16]
[7, 4, 3, 1, 2, 8, 9, 10, 14, 16]
[4, 2, 3, 1, 7, 8, 9, 10, 14, 16]
[3, 2, 1, 4, 7, 8, 9, 10, 14, 16]
[2, 1, 3, 4, 7, 8, 9, 10, 14, 16]
[1, 2, 3, 4, 7, 8, 9, 10, 14, 16]


- **Heapsort** is an **excellent algorithm**, but a good implementation of **quicksort** usually **beats it in practice**.


- Nevertheless, the **heap data structure** itself **has many uses**.
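
The steps of this section can also be collected into a version that does not depend on a global `heapsize`. The following is a minimal sketch under that design choice rather than the lecture's implementation; the names `sift_down` and `heapsort_inplace` are illustrative.

```python
def sift_down(A, i, last):
    """Restore the max-heap property for the subtree rooted at i within A[0..last]."""
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        largest = i
        if l <= last and A[l] > A[largest]:
            largest = l
        if r <= last and A[r] > A[largest]:
            largest = r
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest

def heapsort_inplace(A):
    """Sort A in ascending order using only a constant amount of extra storage."""
    n = len(A)
    # Build a max-heap bottom-up, starting from the last internal node.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(A, i, n - 1)
    # Repeatedly move the maximum to the end and shrink the heap by one.
    for end in range(n - 1, 0, -1):
        A[0], A[end] = A[end], A[0]
        sift_down(A, 0, end - 1)

A = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
heapsort_inplace(A)
print(A)  # [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```

Passing the heap boundary as an argument keeps each call independent of earlier runs, which avoids the kind of stale-state errors that a shared global can introduce.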

<h3 align="center">Priority Queues</h3>

- One of the **most popular** applications of a **heap** is as an efficient **priority queue**.


- As with **heaps**, **priority queues** come in **two forms**: **max-priority queues** and **min-priority queues**.


- We will **focus** only on how to implement **max-priority queues**.


- A **priority queue** is a data structure for **maintaining** a **set** $S$ **of elements**, each with an associated value called a **key**.

<center><img src="images/L4_Priority_Queue.jpg" width="800" alt="Example" /></center>



- A **max-priority queue** supports the following **operations**:
  - $\texttt{insert(S, x)}$: **inserts the element** $x$ into the set $S$, which is equivalent to the operation $S = S \cup \{x\}$.
  - $\texttt{maximum(S)}$: **returns the element** of $S$ with the **largest key**.
  - $\texttt{extractMax(S)}$: **removes and returns** the **element** of $S$ with the **largest key**.
  - $\texttt{increaseKey(S, x, k)}$: **increases the value** of element $x$’s **key** to the **new value** $k$, which is assumed to be at least as large as $x$’s current key.
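
For comparison, Python's standard library module `heapq` provides a **min-heap** on top of a plain list. A common way to obtain max-priority behaviour from it, shown here as a sketch that assumes numeric keys, is to store negated keys. Note that `heapq` does not offer a counterpart of $\texttt{increaseKey}$.

```python
import heapq

pq = []                        # heapq treats a plain list as a min-heap
for key in [4, 16, 9, 1]:
    heapq.heappush(pq, -key)   # insert: store the negated key

print(-pq[0])                  # maximum: 16
print(-heapq.heappop(pq))      # extractMax: 16
print(-pq[0])                  # new maximum: 9
```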

In [203]:
import numpy as np

def insert(A, key):
    global heapsize
    heapsize = heapsize + 1
    A[heapsize] = - np.inf
    increaseKey(A, heapsize, key)

In [204]:
def maximum(A):
    return A[0]

In [205]:
def extractMax(A):
    global heapsize
    # heapsize is the index of the last heap element, so -1 means the heap is empty
    if heapsize < 0:
        raise IndexError("heap underflow")
    largest = A[0]
    A[0] = A[heapsize]
    heapsize = heapsize - 1
    maxHeapify(A, 0)    # the root is at index 0
    return largest

In [206]:
def increaseKey(A, i, key):
    if key < A[i]:
        raise ValueError("new key is smaller than current key")
    A[i] = key
    # let the new key float up while it is larger than its parent
    while i > 0 and A[parent(i)] < A[i]:
        exchange(A, i, parent(i))
        i = parent(i)

- Let's see these functions in use:

In [207]:
A = [1, 2, 4, 8, 16, 32, 64]
print(A)

[1, 2, 4, 8, 16, 32, 64]


In [208]:
heapsize = len(A)-1
buildMaxHeap(A)
print(A)

[64, 16, 32, 8, 2, 1, 4]


In [209]:
increaseKey(A, 4, 128)
print(A)

[128, 64, 32, 8, 16, 1, 4]


In [210]:
print(maximum(A))

128


In [211]:
heapsize = 4    # treat only A[0..4] as the heap for this demo
print(heapsize)
print(extractMax(A))
print(heapsize)
print(A)

4
128
3
[64, 16, 32, 8, 16, 1, 4]


In [212]:
heapsize = 0    # treat only A[0] as the heap for this demo
insert(A, 5)
print(A)

[64, 5, 32, 8, 16, 1, 4]


In [218]:
B = [0, 0, 0, 0, 0, 0, 0]
heapsize = 0
for i in range(len(B)-1):
    insert(B, i)
print(B)

[5, 2, 4, 0, 1, 0, 3]
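
The four operations can also be packaged in a small class so that no global `heapsize` is needed; the length of the underlying list plays that role. This is a sketch of one possible design, not the lecture's implementation: the class name `MaxPriorityQueue` and its method names are illustrative, and keys are assumed to be mutually comparable.

```python
class MaxPriorityQueue:
    """Max-priority queue stored in a Python list used as a 0-based binary max-heap."""

    def __init__(self):
        self._a = []

    def maximum(self):
        if not self._a:
            raise IndexError("heap underflow")
        return self._a[0]

    def insert(self, key):
        self._a.append(key)              # new leaf at the end of the heap
        self._sift_up(len(self._a) - 1)

    def extract_max(self):
        if not self._a:
            raise IndexError("heap underflow")
        top = self._a[0]
        last = self._a.pop()             # remove the last leaf
        if self._a:
            self._a[0] = last            # move it to the root and float it down
            self._sift_down(0)
        return top

    def increase_key(self, i, key):
        if key < self._a[i]:
            raise ValueError("new key is smaller than current key")
        self._a[i] = key
        self._sift_up(i)

    def _sift_up(self, i):
        while i > 0 and self._a[(i - 1) // 2] < self._a[i]:
            p = (i - 1) // 2
            self._a[i], self._a[p] = self._a[p], self._a[i]
            i = p

    def _sift_down(self, i):
        n = len(self._a)
        while True:
            l, r, largest = 2 * i + 1, 2 * i + 2, i
            if l < n and self._a[l] > self._a[largest]:
                largest = l
            if r < n and self._a[r] > self._a[largest]:
                largest = r
            if largest == i:
                return
            self._a[i], self._a[largest] = self._a[largest], self._a[i]
            i = largest

pq = MaxPriorityQueue()
for key in [1, 2, 4, 8, 16, 32, 64]:
    pq.insert(key)
print([pq.extract_max() for _ in range(7)])  # [64, 32, 16, 8, 4, 2, 1]
```

Because the list grows and shrinks with the heap, this version cannot overflow a fixed-size array, unlike the array-plus-`heapsize` representation used above.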


<h1 align="center">End of Lecture</h1>