# COMP3506 - Data Structures & Algorithms


## Week 1

### Lecture 1 (The RAM Model)
__Memory (abstraction):__ an infinite sequence of cells of size w (bits), where each cell has an address (starting at 1).

__CPU:__ contains a fixed number of registers (32 in this course), each of size w (bits).
-  Can complete four atomic operations:    
    1.  Register (Re-)Initialisation:
        -  Set a register to a fixed value or to the contents of another register
    2.  Arithmetic:
        -  Take values from two registers and perform an arithmetic operation, storing the result in a register.
    3.  Comparison/Branching:
        -  Compare the values of two registers.
    4.  Memory Access:
        -  Take a memory address A, stored in a register, and either:
            -  Read the contents of the memory cell with address A into another register (overwriting the register's existing contents).
            -  Write the contents of another register into the memory cell with address A (overwriting the existing contents).
            
__Algorithm:__ a sequence of atomic operations.

__Cost:__ the length of an algorithm's sequence (i.e. the number of atomic operations).

__Word:__ a sequence of w bits (where w is the word length).
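
As a rough illustration of cost, the sketch below sums the cells at addresses 1 to n and tallies one unit per atomic operation. The name `sum_cells`, the Python list standing in for memory, and the exact accounting (which steps count as a single operation) are assumptions made for illustration; they are not taken from the lecture.

```python
def sum_cells(memory, n):
    """Sum memory[1..n] and count atomic operations under one possible accounting."""
    cost = 0
    total = 0   # register initialisation
    addr = 1    # register initialisation
    one = 1     # register initialisation
    cost += 3
    while True:
        cost += 1               # comparison/branching: is addr <= n?
        if addr > n:
            break
        value = memory[addr]    # memory access (read cell addr into a register)
        total = total + value   # arithmetic
        addr = addr + one       # arithmetic
        cost += 3
    return total, cost

# Index 0 is unused so that addresses start at 1, as in the memory abstraction.
memory = [None, 4, 8, 15, 16, 23]
print(sum_cells(memory, 5))
```

Under this accounting the cost is 4n + 4 operations, so it grows linearly in n.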

### Lecture 2 (Binary Search)
__Worst-Case Running Time (of an algorithm):__ the largest running time of the algorithm over all (possibly infinitely many) distinct inputs of problem size n.

##### Binary Search (Pseudocode)
```
let r1 = n,  r2 = v
left <- 1, right <- n
while left <= right
    mid <- floor((left + right) / 2)
    e <- contents of the memory cell at address mid
    if e == v
        return "yes"
    else if e > v
        right <- mid - 1
    else
        left <- mid + 1
return "no"
```

##### Binary Search (Python Implementation)

In [45]:
S = [3, 4, 12, 37, 64, 119, 131, 133, 187, 224, 233, 267, 307, 344, 353, 377, 382, 395, 449, 465, 471, 477, 490, 499, 509, 536, 559, 574, 593, 609, 642, 686, 692, 705, 713, 718, 730, 784, 803, 849, 882, 898, 919, 949, 954, 956, 969, 976, 990, 996]

In [46]:
n = 50
v = 609
not_v = 610

def binary(n, v, S):
    left = 0
    right = n - 1
    while left <= right:
        mid = (left + right)//2
        # print ("l: " + str(left) + " r: " + str(right) + " m: " + str(mid))
        if S[mid] == v:
            return True
        elif S[mid] > v:
            right = mid - 1
        else:
            left = mid + 1
    return False

print(binary(n, v, S))
print(binary(n, not_v, S))

True
False
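
To connect the implementation to the worst-case definition, the sketch below counts loop iterations ("probes") and takes the maximum over many search values for a fixed sorted sequence. The helper names `binary_search_probes` and `worst_case_probes` and the choice of test sequence are hypothetical; the check is empirical, not a proof.

```python
import math

def binary_search_probes(S, v):
    """Return (found, probes), where probes is the number of loop iterations."""
    left, right = 0, len(S) - 1
    probes = 0
    while left <= right:
        probes += 1
        mid = (left + right) // 2
        if S[mid] == v:
            return True, probes
        elif S[mid] > v:
            right = mid - 1
        else:
            left = mid + 1
    return False, probes

def worst_case_probes(n):
    """Largest probe count over every hit and every gap of an n-element sequence."""
    S = list(range(0, 2 * n, 2))   # n even numbers: 0, 2, ..., 2n - 2
    queries = range(-1, 2 * n)     # odd values fall in the gaps, even values are hits
    return max(binary_search_probes(S, v)[1] for v in queries)

for n in (1, 10, 50, 1000):
    print(n, worst_case_probes(n), math.floor(math.log2(n)) + 1)
```

For these sizes the two printed counts agree, which is consistent with a worst case that grows like log2(n).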


### Lecture 3 (Asymptotic Notation)
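As a rule of thumb, focus on the largest (fastest-growing) term of the running time.
Why?

-  Atomic operations take different amounts of real time, so exact constants carry little meaning.
-  It makes for a much simpler calculation.
-  We focus on performance as n -> infinity, where constant factors and lower-order terms become irrelevant.
-  The overall objective is to minimise the growth of the running time.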

##### Big O Notation
Take $f(n)$ and $g(n)$ as two functions on $n$.
$f(n)$ grows asymptotically no faster than $g(n)$ if there is a constant $c_1 > 0$ such that $f(n) \leq c_1 \cdot g(n)$ holds $\forall n \geq c_2$, where $c_2$ is a constant.

This is denoted as $f(n) = O(g(n))$.

e.g.

<div style="text-align:center;">
$ 10n = O(5n) $ <br/>
choose constants $c_1 = 2, c_2 = 1$ <br/>
$10n \leq 2 \cdot 5n$ <br/>
$10n \leq 10n \,\forall\, n \geq c_2 = 1.$
</div>
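
The constants in a Big O argument can be sanity-checked numerically. The sketch below tests the inequality over a finite range only, so it can refute a bad choice of constants but cannot prove a good one. The helper name `is_witness` and the cut-off `n_max` are assumptions for illustration.

```python
def is_witness(f, g, c1, c2, n_max=10_000):
    """Check that f(n) <= c1 * g(n) for every integer n with c2 <= n <= n_max."""
    return all(f(n) <= c1 * g(n) for n in range(c2, n_max + 1))

f = lambda n: 10 * n
g = lambda n: 5 * n

print(is_witness(f, g, c1=2, c2=1))  # the constants chosen in the example
print(is_witness(f, g, c1=1, c2=1))  # c1 = 1 is too small: 10n > 5n for n >= 1
```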

##### Big $\Omega$ Notation
Take $f(n)$ and $g(n)$ as two functions on $n$.
$f(n)$ grows asymptotically no slower than $g(n)$ if there is a constant $c_1 > 0$ such that $f(n) \geq c_1 \cdot g(n)$ holds $\forall n \geq c_2$, where $c_2$ is a constant. Loosely, think of $g(n)$ as a lower bound on $f(n)$ (only up to a constant factor and for large enough $n$): $f(n)$ grows at least as fast as $g(n)$.

This is denoted as $f(n) = \Omega(g(n))$.

e.g. 

<div style="text-align:center;">
$ \log_2 n = \Omega(1) $ <br/>
show that $\log_2 n \geq c_1 \cdot 1 \,\forall\, n \geq c_2$ <br/>
let $c_1 = 1, c_2 = 3$ <br/>
$\log_2 n \geq \log_2 3 > 1 = c_1 \cdot 1 \,\forall\, n \geq 3$
</div>

##### Big $\Theta$ Notation
If $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$, then:

$ f(n) = \Theta(g(n)) $

and $f(n)$ is said to grow __asymptotically as fast as__ $g(n)$.

e.g. 

<div style="text-align:center;">
First, show Big O: <br/>
$n^2 + 2n + 1 = O(n^2) $<br/>
show that $n^2 + 2n + 1 \leq c_1 \cdot n^2\, \forall\, n \geq c_2$ <br/>
let $c_1 = 5$ and $c_2 = 1$ <br/>
$ n^2 + 2n + 1 \leq n^2 + 2n^2 + n^2 = 4n^2 \leq 5 \cdot n^2 \,\forall\, n \geq 1$ <br/>
<br/>
Then, show Big $\Omega$: <br/>
$n^2 + 2n + 1 = \Omega(n^2)$ <br/>
show that $n^2 + 2n + 1 \geq c_1 \cdot n^2 \, \forall \, n\geq c_2$ <br/>
let $c_1 = 1, c_2 = 1$ <br/>
$n^2 + 2n + 1 \geq n^2 \,\forall\, n \geq 1$ <br/>
Both bounds hold, so $n^2 + 2n + 1 = \Theta(n^2)$.
</div>

## Week 2
### Exercises


In [47]:
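# Problem 1 - Compute the value of floor(log2(n)) in no more than 100*log2(n) time.
r = 17.5678293

def log_floor(r):
    # Assumes r >= 1. Finds the largest i with 2**i <= r.
    i = 1
    while 2**i <= r:  # <= (not <) so exact powers of two work, e.g. log_floor(16) == 4
        i += 1
    return i-1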

print(log_floor(r))

4


###### Problem 2
Binary search starts by setting pointers to the leftmost and rightmost elements (3 and 94, respectively). It then finds the middle element (52). If the middle element equals the search value (35), the search returns. Here it does not, and the middle element is greater than the search value, so the right pointer moves to one position left of the middle (45), effectively discarding half of the search input. The middle is calculated again (25). It is too low, so the left pointer moves to middle + 1 (26), discarding the left half. The middle is calculated again (32). It is less than the search value, so the left pointer moves to middle + 1 (40). The middle is now 40, which is higher than the search value, so the right pointer moves to middle - 1. The right pointer is now to the left of the left pointer and the search value has not been found, hence the search value is not in the search set.

In [48]:
# Problem 3 - Predecessor

s = [3, 14, 25, 32, 40, 45, 52, 55, 59, 65, 68, 69, 81, 86, 94]

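def predec(S, v):
    # Predecessor of v: the largest element of S that is <= v (None if there is none).
    left = 0
    right = len(S) - 1
    while right >= left:
        mid = (right + left) // 2
        # print(str(left)+" "+str(mid)+" "+str(right))
        if S[mid] == v:
            return S[mid]
        elif S[mid] > v:
            right = mid - 1
        else:
            left = mid + 1
    # No exact match: everything up to index right is < v and
    # everything from index left onwards is > v.
    if right >= 0:
        return S[right]
    return None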

print(predec(s, 1000))
print(predec(s, 25))

94
25


In [49]:
# Problem 4 - Prefix Counting

s = [3, 14, 25, 32, 40, 45, 52, 55, 59, 65, 68, 69, 81, 86, 94]

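def prefix(S, v):
    # Returns the number of elements of S that are strictly smaller than v.
    # (Assumed reading of "prefix count"; return mid + 1 on a match to count elements <= v.)
    left = 0
    right = len(S) - 1
    while right >= left:
        mid = (right + left) // 2
        # print(str(left)+" "+str(mid)+" "+str(right))
        if S[mid] == v:
            return mid
        elif S[mid] > v:
            right = mid - 1
        else:
            left = mid + 1
    # No exact match: left is the number of elements smaller than v.
    return left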

print(prefix(s, 25))
print(prefix(s, 33))

2
4


In [50]:
# Problem 5 - 3-Sum Problem

s = [3, 14, 25, 32, 40, 45, 52, 55, 59, 65, 68, 69, 81, 86, 94]

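from itertools import combinations

def three_sum(S, v):
    # Brute force over every triple of distinct positions: O(n**3) time.
    # combinations is enough because the sum does not depend on order.
    for trip in combinations(S, 3):
        if sum(trip) == v:
            return "yes"
    return "no"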

print(three_sum(s, 42))

yes


In [51]:
# Problem 6

s = [3, 14, 25, 32, 40, 45, 52, 55, 59, 65, 68, 69, 81, 86, 94]

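def better_three_sum(S, v):
    # For each x = S[i], look for a pair in the rest of S that sums to v - x.
    # Relies on better_two_ints from Tutorial 1 (defined further down); O(n**2) time overall.
    for i in range(len(S)):
        y = v - S[i]
        if better_two_ints(S[0:i]+S[i+1:], y):
            return "yes"
    return "no"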

print(better_three_sum(s, 42))
print(better_three_sum(s, 43))
print(better_three_sum(s, 104))

yes
no
yes


### Tutorial 1
-  Compiler translates programming language into assembly.
-  Assembly will determine what atomic operations are performed.
-  Assembly differs based on CPU architecture.
-  The number of (programming language) statements is only a rough guide to cost. We don't usually rely on it, as the actual operations performed depend on the compiler and hardware being used (this process is generally abstracted away).

##### Sum of Two Integers Problem
There is a sequence of n positive integers in strictly increasing order in memory at the cells numbered from 1 up to n. The value n has been placed in Register 1, and a positive integer v has been placed in Register 2.
Determine whether there exist two integers x and y (not necessarily distinct) in the sorted sequence such that x + y = v.

In [52]:
#  Naïve Attempt (n**2)
s = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

from itertools import permutations

def two_ints(S, v):
    # Try every pair (x, y), including x == y, since the two integers need not be distinct.
    for x in S:
        for y in S:
            if x + y == v:
                return "yes"
    return "no"

print(two_ints(s, 30))

yes


In [53]:
#  Binary Search Attempt (n * log(n)): one binary search per element
def binary_two_ints(S, v):
    for x in S:
        y = v - x
        if binary(len(S), y, S):
            return True
    return False
print(binary_two_ints(s, 30))
print(binary_two_ints(s, 1))

True
False


In [54]:
#  An Even Better Solution (n)
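def better_two_ints(S, v):
    # Two pointers moving inwards from both ends of the sorted sequence.
    left = 0
    right = len(S) - 1
    while left <= right:  # <= lets x and y be the same element, and handles an empty S
        if S[left] + S[right] == v:
            return True
        elif S[left] + S[right] > v:
            right -= 1
        else:
            left += 1
    return False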

print(better_two_ints(s, 30))
print(better_two_ints(s, 1))

True
False


### Lecture 4 (Recursion)

Recursion is based on two steps:

1.  Base Case:
    -  Solve the case where the problem size is n = 1 or n = 0 (usually trivial).
2.  Inductive Case:
    -  Solve the problem with problem size n > 1 by reducing it to the same problem with a smaller n.
    
#### Binary Search Example
Inductive Step:

-  v = search term
-  e = middle element of array
-  If v = e, return True.
-  Else:
    -  If v < e, solve the problem in the part of S before e.
    -  If v > e, solve the problem in the part of S after e.
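
The inductive step above can be sketched as a recursive function. This is a minimal sketch, assuming a 0-indexed Python list and an empty range as the base case; the name `binary_search_recursive` and the `left`/`right` parameters are illustrative choices, not from the lecture.

```python
def binary_search_recursive(S, v, left=0, right=None):
    """Return True if v occurs in the sorted list S between indices left and right."""
    if right is None:
        right = len(S) - 1
    if left > right:
        # Base case: the range is empty (problem size 0), so v cannot be present.
        return False
    mid = (left + right) // 2
    e = S[mid]  # middle element of the current range
    if v == e:
        return True
    if v < e:
        # Solve the problem in the part of S before e.
        return binary_search_recursive(S, v, left, mid - 1)
    # Otherwise v > e: solve the problem in the part of S after e.
    return binary_search_recursive(S, v, mid + 1, right)

s = [3, 14, 25, 32, 40, 45, 52, 55, 59, 65, 68, 69, 81, 86, 94]
print(binary_search_recursive(s, 32))
print(binary_search_recursive(s, 35))
```

Each call halves the range, so the recursion depth stays around log2(n).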

#### The Sorting Problem
A set S of n integers is given in an array of length n. The value of n is inside the CPU (i.e. in a register). Design an algorithm to store S in an array where the elements have been arranged in ascending order.

##### Selection Sort
Base Case (n = 1):

1.  nothing to sort.

Inductive Case (n > 1):

1.  Scan all the elements in the array to identify the largest one ( $e_{max}$ ).

2.  Swap the positions of $e_{max}$ and the last element of the array.

3.  Sort the first n - 1 elements.
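
The base and inductive cases above can be sketched directly in Python. This is one possible rendering, assuming an in-place sort on a 0-indexed list; the function name and the optional `n` parameter are illustrative. Python's default recursion limit makes this form unsuitable for arrays with many hundreds of elements.

```python
def selection_sort(A, n=None):
    """Sort the first n elements of A in place, in ascending order, and return A."""
    if n is None:
        n = len(A)
    if n <= 1:
        # Base case: nothing to sort.
        return A
    # Scan the first n elements to find the index of the largest one.
    max_idx = 0
    for i in range(1, n):
        if A[i] > A[max_idx]:
            max_idx = i
    # Swap the largest element with the last element of the current range.
    A[max_idx], A[n - 1] = A[n - 1], A[max_idx]
    # Sort the first n - 1 elements.
    return selection_sort(A, n - 1)

print(selection_sort([52, 3, 94, 14, 40, 25]))
```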

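Running Time:

1.  Base Case: $ f(1) = O(1) $, i.e. $f(1) \leq c_1$
2.  Inductive Case: $ f(n) \leq O(n) + f(n - 1) $, i.e. $f(n) \leq c_2 \cdot n + f(n - 1)$

Expanding the inductive case down to the base case:

$$f(n) \leq c_2 \cdot (n + (n - 1) + \dots + 2) + c_1 \leq c_2 \cdot (\frac{n\cdot(n + 1)}{2}) + c_1$$

$$= O(n^2)$$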

### Lecture 5 (Merge Sort)

Merge sort uses recursion to sort an array in $O(n \log n)$ time.

-  Base Case:
    -  If n = 1, there is nothing to sort.
-  Inductive Case:
    -  Recursively sort the first half of the array, giving $A_1$.
    -  Recursively sort the second half of the array, giving $A_2$.
    -  Merge the two halves into the final sorted sequence A, starting with i = j = 1:
        -  Repeat the following until i > n/2 or j > n/2:
            -  If $A_1[i]$ < $A_2[j]$, append $A_1[i]$ to A; i++;
            -  Else: append $A_2[j]$ to A; j++;
        -  Append whatever remains of $A_1$ or $A_2$ to A.
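
A minimal Python sketch of the scheme above, assuming 0-indexed lists and returning a new sorted list rather than sorting in place. The names `merge` and `merge_sort` are illustrative. Once one half is exhausted, the remainder of the other half is copied across unchanged.

```python
def merge(A1, A2):
    """Merge two sorted lists into one sorted list."""
    A = []
    i = j = 0
    while i < len(A1) and j < len(A2):
        if A1[i] < A2[j]:
            A.append(A1[i])
            i += 1
        else:
            A.append(A2[j])
            j += 1
    # At most one of these slices is non-empty: the leftover tail of one half.
    A.extend(A1[i:])
    A.extend(A2[j:])
    return A

def merge_sort(A):
    """Return a sorted copy of A."""
    if len(A) <= 1:
        # Base case: nothing to sort.
        return list(A)
    mid = len(A) // 2
    return merge(merge_sort(A[:mid]), merge_sort(A[mid:]))

print(merge_sort([52, 3, 94, 14, 40, 25, 3]))
```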

Running Time:

-  Base Case:

$$ f(1) \leq c_1 = O(1) $$

-  Inductive Case:

$$ f(n) \leq 2f(n/2) + c_2 n $$
$$ f(n) \leq 2^i f(n/2^i) + i \cdot c_2 n $$
$$ (\text{let } i = h = \log_2 n) $$
$$ f(n) \leq 2^h f(1) + h \cdot c_2 n $$
$$ f(n) \leq n \cdot c_1 + c_2 n \cdot \log_2 n = O(n \log n) $$

This is asymptotically the best possible running time for any comparison-based approach to the sorting problem (i.e. every comparison-based algorithm must incur $\Omega(n \log n)$ time in the worst case).
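
The merge sort recurrence can also be checked numerically. The sketch below assumes $c_1 = c_2 = 1$ and restricts n to powers of two, in which case the recurrence $f(n) = 2f(n/2) + n$ with $f(1) = 1$ solves to exactly $n \log_2 n + n$. The function name `f` mirrors the notation above; the constants are assumptions.

```python
def f(n):
    """f(1) = 1 and f(n) = 2 * f(n / 2) + n, for n a power of two."""
    if n == 1:
        return 1
    return 2 * f(n // 2) + n

for h in range(0, 11):
    n = 2**h
    # Closed form with c_1 = c_2 = 1: n * log2(n) + n, where log2(n) = h.
    print(n, f(n), n * h + n)
```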