### MY470 Computer Programming
# Algorithms and Order of Growth
### Week 9 Lecture: \*\*\* Example Answers \*\*\*

## Space Complexity: Exercise 1

In [27]:
# What is the space complexity of the function pair_list_sum()?

def pair_list_sum(lst):
    """Sum all elements of a list using pair_sum.
    Assume elements of the list are numeric type
    and the list has an even number of elements.
    """
    res = 0
    for i in range(len(lst)):
        if i%2 == 0:
            res += pair_sum(lst[i], lst[i+1])
    return res 
    
def pair_sum(a, b):
    """Take numbers a and b and return their sum a + b."""
    return a + b

pair_list_sum(list(range(100)))

4950

The space complexity of this algorithm is $O(1)$, not counting the input list itself. There are $O(n)$ calls to the function `pair_sum`, where $n$ is the length of `lst`, but they do not exist simultaneously on the call stack: each call returns before the next one starts. So the function needs only $O(1)$ additional space.
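One way to check this claim empirically is to count how many `pair_sum` calls are active at the same time. The sketch below is only an illustration and not part of the original exercise: the `track_active` decorator and its `calls`, `active`, and `max_active` counters are hypothetical helpers introduced here.

```python
import functools

def track_active(func):
    """Wrap func and record how many calls to it are active at once."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        wrapper.active += 1
        wrapper.max_active = max(wrapper.max_active, wrapper.active)
        try:
            return func(*args, **kwargs)
        finally:
            # The call is about to leave the stack
            wrapper.active -= 1
    wrapper.calls = 0
    wrapper.active = 0
    wrapper.max_active = 0
    return wrapper

@track_active
def pair_sum(a, b):
    """Take numbers a and b and return their sum a + b."""
    return a + b

def pair_list_sum(lst):
    """Same logic as the exercise; assumes an even-length list."""
    res = 0
    for i in range(len(lst)):
        if i % 2 == 0:
            res += pair_sum(lst[i], lst[i + 1])
    return res

total = pair_list_sum(list(range(100)))
print(total, pair_sum.calls, pair_sum.max_active)
```

The number of calls grows with the length of the list, while the number of simultaneously active calls should stay at one.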

## Space Complexity: Exercise 2

In [28]:
# What is the space complexity of the fib() function?

def fib(n):
    """Assume n is a non-negative integer.
    Find the n-th Fibonacci number using recursion.
    """
    if n <= 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)

print(fib(10))

55


The space complexity of this algorithm is $O(n)$. Although there are $O(2^n)$ recursive calls in total, at most $O(n)$ of them exist on the call stack at any given time, because the recursion is never more than $n$ levels deep.
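To see the difference between the total number of calls and the depth of the call stack, one can instrument the recursion. The following is a minimal sketch, assuming we are free to add two extra parameters: `fib_depth`, `depth`, and `stats` are hypothetical names introduced here and are not part of the original `fib()`.

```python
def fib_depth(n, depth=1, stats=None):
    """Return (n-th Fibonacci number, stats).

    stats counts the total number of calls and the deepest
    level of recursion reached.
    """
    if stats is None:
        stats = {"calls": 0, "max_depth": 0}
    stats["calls"] += 1
    stats["max_depth"] = max(stats["max_depth"], depth)
    if n <= 1:
        return n, stats
    a, _ = fib_depth(n - 1, depth + 1, stats)
    b, _ = fib_depth(n - 2, depth + 1, stats)
    return a + b, stats

for n in (5, 10, 15, 20):
    value, stats = fib_depth(n)
    print(n, value, stats["calls"], stats["max_depth"])
```

The call count grows exponentially with `n`, while the maximum depth grows only linearly, which is what determines the stack space.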

## Analyzing and Benchmarking Your Code: Exercise 1

In [19]:
# What is the time complexity and actual runtime 
# of the two get_centroid functions?

import csv

def get_data():
    """Read the file Wholesale customers data.csv 
    and return part of the data as a list of lists.
    """
    with open('Wholesale customers data.csv') as f:
        reader = csv.reader(f)
        data = [[int(i) for i in row[2:]] for row in reader if row[0] != 'Channel']
    return data

def get_centroid(pointLists):
    """Estimate the centroid for a collection of n-dimensional points.
    Assume pointLists is a collection of lists of numerical values.
    Return a list of numerical values (the coordinates of the centroid).
    """
    num = len(pointLists)
    centroids = []
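    dimension = len(pointLists[0])

    # Initialize one running total per dimension
    for a in range(dimension):
        centroids.append(0.0)

    # Add up the coordinates of all points, dimension by dimension
    for i in range(num):
        point = pointLists[i]
        for d in range(dimension):
            centroids[d] = centroids[d] + point[d]

    # Divide the totals by the number of points to get the means
    for a in range(dimension):
        centroids[a] = centroids[a]/num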

    return centroids

def get_centroid_2(points):
    """Estimate the centroid for a collection of n-dimensional points.
    Assume points is a collection of lists of numerical values.
    Return a list of numerical values (the coordinates of the centroid).
    """
    centroid = []
    num_points = len(points)
    num_dims = len(points[0])
    for dim in range(num_dims):
        coord = [i[dim] for i in points]
        centroid.append(sum(coord)/num_points)
    return centroid

data = get_data()
print(get_centroid(data))
print(get_centroid_2(data))

[12000.297727272728, 5796.265909090909, 7951.277272727273, 3071.931818181818, 2881.4931818181817, 1524.8704545454545]
[12000.297727272728, 5796.265909090909, 7951.277272727273, 3071.931818181818, 2881.4931818181817, 1524.8704545454545]


In [22]:
from timeit import timeit

# Note that we can import both functions and variables
timeit('get_centroid(data)', 
       setup='from __main__ import get_centroid, data', 
       number=1000)

0.29620604699994146

In [23]:
timeit('get_centroid_2(data)', 
       setup='from __main__ import get_centroid_2, data', 
       number=1000)

0.11942053200027658

* The algorithms have the same order of growth: $O(pd)$, where $p$ is the number of points and $d$ is the number of dimensions.
* However, they have different performance.
  * The first algorithm unnecessarily loops over $d$ two additional times, once to initialize the totals and once to divide them. Further, in its second loop, it iterates over the indices of the points instead of the points themselves, and the indexing adds overhead.
  * The second algorithm is faster even though it makes an extra pass over the $p$ values with `sum()` for each of the $d$ dimensions, because a list comprehension and a built-in function run faster than an explicit Python loop. The second algorithm is also more concise and easier to follow.
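The comparison can also be repeated without the CSV file. The sketch below is an illustration under stated assumptions: `make_points` is a hypothetical helper that generates random integer data, the point counts and the number of repetitions are arbitrary choices, and the two functions are copies of the ones above. Actual timings will vary by machine.

```python
import random
from timeit import timeit

def get_centroid(point_lists):
    """Index-based version with three explicit loops."""
    num = len(point_lists)
    centroids = []
    dimension = len(point_lists[0])
    for a in range(dimension):
        centroids.append(0.0)
    for i in range(num):
        point = point_lists[i]
        for d in range(dimension):
            centroids[d] = centroids[d] + point[d]
    for a in range(dimension):
        centroids[a] = centroids[a] / num
    return centroids

def get_centroid_2(points):
    """Version using a list comprehension and the built-in sum()."""
    centroid = []
    num_points = len(points)
    num_dims = len(points[0])
    for dim in range(num_dims):
        coord = [i[dim] for i in points]
        centroid.append(sum(coord) / num_points)
    return centroid

def make_points(p, d, seed=0):
    """Hypothetical helper: p random points with d integer coordinates."""
    rng = random.Random(seed)
    return [[rng.randint(0, 10000) for _ in range(d)] for _ in range(p)]

# Double the number of points and time both functions
for p in (500, 1000, 2000, 4000):
    points = make_points(p, 6)
    t1 = timeit(lambda: get_centroid(points), number=50)
    t2 = timeit(lambda: get_centroid_2(points), number=50)
    print(f"p={p:5d}  get_centroid: {t1:.4f}s  get_centroid_2: {t2:.4f}s")
```

If both functions are $O(pd)$, doubling `p` should roughly double each runtime, while the ratio between the two functions should stay roughly constant.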

## Analyzing and Benchmarking Your Code: Exercise 2

In [None]:
# The algorithms take two inputs: list coauthors of length c 
# and dictionary author_dic of length d.
# What is the time complexity of the two algorithms?

# Algorithm 1
for k, v in author_dic.items():
    vlst = [sub_list[1] for sub_list in coauthors if sub_list[0] == k and sub_list[1] != k]
    vlst = sorted(vlst)
    author_dic[k] = vlst
    
# Algorithm 2
for i, j in coauthors:
    if j != i:
        author_dic[i].append(j)

The algorithms have different orders of growth:

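* Algorithm 2: $O(c)$
    * Note that indexing a dictionary and appending to a list are both $O(1)$ on average
* Algorithm 1: $O(dc \log c)$, or $O(dc)$ if sorting is omitted
    * Sorting a list of length $n$ is $O(n \log n)$
    * Indexing a dictionary or a list is $O(1)$
    * We have at most $d (c + c \log c)$ operations, since each `vlst` has at most $c$ elements, but we keep the dominant term only
    * This bound is loose: each pair in `coauthors` ends up in at most one `vlst`, so all the sorting together costs $O(c \log c)$ and a tighter bound is $O(dc + c \log c)$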
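The two algorithms can be run side by side on a small input. The sketch below uses hypothetical data: the names in `authors` and the pairs in `coauthors` are made up, and it assumes that `author_dic` maps every author to an initially empty list and that `coauthors` is a list of pairs. Algorithm 2 does not sort, so the results are compared after sorting.

```python
# Hypothetical input: (author, coauthor) pairs, including a self-pair
coauthors = [("ann", "bob"), ("bob", "ann"), ("ann", "cat"),
             ("cat", "cat"), ("cat", "ann")]
authors = ["ann", "bob", "cat", "dan"]

# Algorithm 1: scan all c pairs once for each of the d keys
author_dic_1 = {name: [] for name in authors}
for k, v in author_dic_1.items():
    vlst = [sub_list[1] for sub_list in coauthors
            if sub_list[0] == k and sub_list[1] != k]
    vlst = sorted(vlst)
    author_dic_1[k] = vlst

# Algorithm 2: scan the c pairs once and look up each key directly
author_dic_2 = {name: [] for name in authors}
for i, j in coauthors:
    if j != i:
        author_dic_2[i].append(j)

print(author_dic_1)
print({k: sorted(v) for k, v in author_dic_2.items()})
```

Algorithm 1 evaluates its filter condition once per pair for every key, while Algorithm 2 touches each pair only once, which is where the factor of $d$ comes from.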