# ECS529U Algorithms and Data Structures
# Lab sheet 3

This third lab gets you to work with big-Θ classes and practically check the efficiency of
sorting algorithms by testing them on randomly generated arrays.

**Marks (max 5):** Question 1: 1.5 | Question 2: 1 | Questions 3-7: 0.5 each

## Question 1 (does not require coding)

For each of the following expressions, find if they are Θ(1), Θ(logn), Θ(n), Θ(nlogn), Θ(<sup>2</sup>), Θ(n<sup>2022</sup>) or Θ(2<sup>n</sup>):
1. 500 + 0.5n + 45logn
2. 5000
3. 42 + nlogn + 5logn + 50n
4. 5nlogn + 2<sup>n</sup> + 300n<sup>2020</sup>n<sup>2</sup>

Find the complexity, in terms of a simple big-Θ class, of the following expression:

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;5(logn)<sup>13</sup> + 300n<sup>3</sup> + 30nlogn + 100

Finally, consider the following function that counts the duplicate elements inside an array.

    def countDups(A):
        B = A[:]
        selectionSort(B)
        dups = 0
        for i in range (1,len(B)):
            if B[i] == B[i-1]: dups += 1
        return dups
        
Explain, in terms of big-Θ, what is the worst-case time complexity of `countDups` as a function of the size of the array `A`. Note here that the line `B=A[:]` is the same `B=A[0:len(A)]`, that is, it creates a copy of `A` and stores it in `B`.

### P1

1. O(n)
2. O(1)
3. O(nlog(n))
4. O(2^n)
5. O(n^3)

### P2. 
- Copying the array to B is O(n),
- selectionSort is O(n^2), as we look through all the elements remaining each time giving us
n + (n-1) + (n-2) + ... + 1 = n(n+1)/2 time
- and the for loop is also O(n),
- O(n^2) dominates the expression so it's the worst case time complexity

## Question 2

Write a version of insertion sort that works constructively, i.e. returns a new sorted array and leaves the original array unchanged.

For example, if we call this version `insertionSortC`, and run the following code

    A = [30, 25, 67, 99, 8, 16, 28, 63, 12, 20]
    B = insertionSortC(A1)
    print("Original array is: ",A)
    print("Sorted is: ",B)
we get this printout:

    Original array is:  [30, 25, 67, 99, 8, 16, 28, 63, 12, 20]
    Sorted is:  [8, 12, 16, 20, 25, 28, 30, 63, 67, 99]
    
Test your code in at least 5 arrays of your choosing, including the empty array.

In [1]:
def insertionSortC(A):
    if len(A) <= 1:
        return A
    sorted_array = [A[0]]

    for i in range(1, len(A)):
        for j in range(len(sorted_array)):
            if A[i] < sorted_array[j]:
                sorted_array.insert(j, A[i])
                break
            if j == len(sorted_array) - 1:
                sorted_array.append(A[i])
    return sorted_array


test_arrays = [[1, 2, 3, 4, 5], [623,4,324,0,-12], [], [1], [0, 0, 0]]
for array in test_arrays:
    print(insertionSortC(array))

[1, 2, 3, 4, 5]
[-12, 0, 4, 324, 623]
[]
[1]
[0, 0, 0]


## Question 3

For this question you may use Python's built-in function for producing random numbers. If you import Python's built-in module `random` by calling:

    import random

then `random.randint(low,high)` will return a random integer in the range `low` to `high` inclusive (i.e. an integer with an equal chance of it being any of the numbers in that range). Use this to write a Python function:

    def randomIntArray(s,n)

which returns an array of length `s` that in each position has a random integer in the range `0` to `n`.

For example, running `randomIntArray(5,10)` we may get back the array `[6, 2, 3, 9, 1]`, or `[6, 10, 6, 1, 1]`, etc.

In [2]:
import random

def randomIntArray(s, n):
    """
    Args:
    s - size of the array
    n - range of the random numbers from 0 to n

    Returns:
    array of size s with random numbers from 0 to n
    """
    return [random.randint(0, n) for i in range(s)]

## Question 4

Python's built-in function `time()` in the module `time` returns the current time in the form of the number of seconds since 0.00am on 1st January 1970. So, code of the form:

    t = time.time()
    <operation>
    t = time.time()-t
    
will set `t` to the time it takes to perform `<operation>`. As `time.time()` returns a floating point number rather than an integer, this could be a fraction of a second.

Use this to write a Python function:

    def appendTime(A,v)
which takes an array `A` a value `v` as input and creates a new array with the same elements as `A` but with `v` appended at its end (for example, you can use the code of the function `append` that we saw in Lecture 1). The function then returns the new array and the time it took to create it. For example:

    A = [30, 25, 67, 99, 8, 16, 28, 63, 12, 20]
    (B,t) = appendTime(A,42)
    print("Time taken to append is: ",t)
    print("The new array is: ",B)

we can get the printout:

    Time taken to append is:  8.58306884765625e-06
    The new array is:  [30, 25, 67, 99, 8, 16, 28, 63, 12, 20, 42]
Test your code in at least 5 arrays, with lengths 0, 10, 100, 1000 and 10000 respectively.

In [3]:
import time

def appendTime(A,v):
    """
    Args:
    A - array
    v - value

    Returns:
    (B, t) - a tuple of the appended array and the time taken
    """
    t1 = time.time()
    B = A.append(v)
    t2 = time.time() - t1
    return (B, t2)

tests = (randomIntArray(i,100) for i in [0,10,100,1000,10000])
for A in tests:
    (B,t) = appendTime(A,42)
    print("Time taken to append is: ",t)

Time taken to append is:  2.384185791015625e-07
Time taken to append is:  2.384185791015625e-07
Time taken to append is:  2.384185791015625e-07
Time taken to append is:  0.0
Time taken to append is:  2.384185791015625e-07


## Question 5

Write a Python method:

    def sortTimeUsing(sortf,A)
    
which returns the time taken to sort the array `A`, but does the sorting using the sorting function
passed as argument `sortf`. This uses the concept of passing a function as an
argument that was introduced in Question 6 of Lab 1.

Use this to compare the time taken to sort using selection sort
with the time taken to sort using insertion sort, the code of which you can find on the
lecture slides, on some example arrays.

In [4]:
import time

def sortTimeUsing(sortf, A):
    start_time = time.time()
    sortf(A)
    end_time = time.time()
    return end_time - start_time

def SelectionSort(A):
    for i in range(len(A)):
        min_index = i
        for j in range(i + 1, len(A)):
            if A[j] < A[min_index]:
                min_index = j
            A[i], A[min_index] = A[min_index], A[i]


def InsertionSort(A):
    for i in range(1, len(A)):
        if A[i] < A[i-1]:
            j = i
            while j > 0 and A[j] < A[j-1]:
                A[j], A[j-1] = A[j-1], A[j]
                j -= 1
    return A

array1 = [64, 25, 12, 22, 11]
array2 = [64, 25, 12, 22, 11]

time_selection_sort = sortTimeUsing(SelectionSort, array1)
time_insertion_sort = sortTimeUsing(InsertionSort, array2)

print(f"Selection Sort Time: {time_selection_sort}")
print(f"Insertion Sort Time: {time_insertion_sort}")

Selection Sort Time: 4.76837158203125e-06
Insertion Sort Time: 4.291534423828125e-06


## Question 6

Use the method `randomIntArray` from Question 3 to provide arrays to be sorted by
`sortTimeUsing`. This will enable you to test how long it takes to sort an array much longer than
one you could type in yourself. Then, fill in the following table (but see Note).

| array length |  10  | 100 | 1000 | 10<sup>4</sup> | 10<sup>5</sup> | 10<sup>6</sup> |
|:------------|------|-----|------|-------|--------|----------------|
| selection sort time (sec)| |     |      |       |        |                |
| insertion sort time (sec)| |     |      |       |        |                |

For each array length, produce a random array of that length, sort it via `sortTimeUsing` using selection sort and insertion sort (make sure you sort the same array twice!), and fill in the table the corresponding times.

Note: sorting arrays of length greater than 10<sup>5</sup> may make your computer run out of
memory and hang. For that reason, you can skip filling in the last column in the table. If you do want to fill it in, make sure you save everything before and be ready to hard-restart your computer!

It would also make sense to stop a test if it runs over a few minutes and fill in “timeout” in
the respective column.

In [7]:
array_lengths = [10, 100, 1000, 10000, 10000]
for length in array_lengths:
    random_array = randomIntArray(length, 1000)

    time_selection_sort = sortTimeUsing(SelectionSort, random_array.copy())
    time_insertion_sort = sortTimeUsing(InsertionSort, random_array.copy())

    print(f"Array length {length}")
    print(f"Selection Sort Time : {time_selection_sort}")
    print(f"Insertion Sort Time: {time_insertion_sort}")
    

Array length 10
Selection Sort Time : 4.76837158203125e-06
Insertion Sort Time: 3.0994415283203125e-06
Array length 100
Selection Sort Time : 0.00017976760864257812
Insertion Sort Time: 0.00011610984802246094
Array length 1000
Selection Sort Time : 0.02216172218322754
Insertion Sort Time: 0.01528024673461914
Array length 10000
Selection Sort Time : 2.005911350250244
Insertion Sort Time: 1.6648569107055664
Array length 10000
Selection Sort Time : 1.8753111362457275
Insertion Sort Time: 1.5740365982055664


## Question 7

Write a version of insertion sort where the `insert` function uses binary search. More precisely, to insert a value `v` in an array `A` where the part `A[:i]` is sorted, the `insert` function will:
- use binary search to find the position in `A[:i+1]` where `v` needs to be inserted
- move elements from that position one place to the right and insert `v`

What is the complexity of this version of insertion sort: O(n<sup>2</sup>) or O(nlogn)?

In [9]:
def binary_search(arr, val, start, end):
    while start < end:
        mid = (start + end) // 2
        if arr[mid] < val:
            start = mid + 1
        else:
            end = mid
    return start

def insertion_sort(arr):
    for i in range(1, len(arr)):
        val = arr[i]
        pos = binary_search(arr, val, 0, i)

        arr.pop(i)
        arr.insert(pos, val)
    return arr

arr = [4, 2, 7, 1, 3]
sorted_arr = insertion_sort(arr)
print(sorted_arr)

# The time complexity is O(nlog(n))

[1, 2, 3, 4, 7]
