### MY470 Computer Programming
# Order of Growth
### Week 9 Lab: \*\*\* Example Answers \*\*\*

**Exercise 1**: The following functions show the average number of operations required to perform some algorithm on a list of length $n$. Give the Big-O notation for the time complexity of each algorithm:

a) $4n^2 + 2n + 2$

b) $n + \log n$

c) $n \log n$

d) 3

In [1]:
# Solution: Keep the dominant term and drop constant factors and lower-order terms.

#a) O(n^2)
#b) O(n)
#c) O(n log n)
#d) O(1)
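
A quick numerical check can make the rule concrete. The sketch below divides each function from Exercise 1 by its claimed Big-O term; if the ratio settles toward a constant as n grows, the claimed term dominates. The helper name `ratio_table` and the chosen sizes are made up for illustration.

```python
import math

# Each entry pairs a cost function from Exercise 1 with its claimed Big-O term.
cases = {
    'a': (lambda n: 4 * n**2 + 2 * n + 2, lambda n: n**2),
    'b': (lambda n: n + math.log(n), lambda n: n),
    'c': (lambda n: n * math.log(n), lambda n: n * math.log(n)),
    'd': (lambda n: 3, lambda n: 1),
}

def ratio_table(sizes=(10, 1000, 100000)):
    """Return {label: [f(n) / g(n) for each n]} (hypothetical helper)."""
    return {label: [f(n) / g(n) for n in sizes]
            for label, (f, g) in cases.items()}

for label, ratios in ratio_table().items():
    print(label, [round(r, 4) for r in ratios])
```

For a) the ratio approaches 4 and for b) it approaches 1, which is why the lower-order terms and the constant factor can be dropped.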


In [None]:
# Exercise 2: Give the order of growth for the function 
# and explain your reasoning in a couple of sentences.

def sum_product(ls):
    summ, product = 0, 1
    for i in range(len(ls)):
        summ += ls[i]
    for j in range(len(ls)):
        product *= ls[j]
    return summ, product    


# Solution: O(n), where n is len(ls). Each of the two loops takes n
# constant-time steps, for 2n steps in total. The fact that there are
# two loops over the list is irrelevant as we ignore constant factors.
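
One way to see the linear growth is to count loop iterations directly. The sketch below is an instrumented copy of the function; the name `sum_product_steps` and the `steps` counter are additions for illustration only.

```python
def sum_product_steps(ls):
    """Same logic as sum_product, but also counts loop iterations."""
    summ, product, steps = 0, 1, 0
    for x in ls:
        summ += x
        steps += 1
    for x in ls:
        product *= x
        steps += 1
    return summ, product, steps

for n in (10, 100, 1000):
    _, _, steps = sum_product_steps([1] * n)
    print(n, steps)  # steps is 2n: multiplying n by 10 multiplies the work by 10
```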


In [None]:
# Exercise 3: Give the order of growth for the function 
# and explain your reasoning in a couple of sentences.

def combine(la, lb):
    for i in la:
        for j in lb:
            if i < j:
                print(i, '-', j)

                
# Solution: O(ab), where a is len(la) and b is len(lb). The comparison
# runs once for every pair (i, j), that is, a * b times. The function 
# has two different inputs and its runtime depends on the length of both.
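
The a * b count can be checked with a small counting sketch. `combine_steps` is a hypothetical helper that mirrors the loop structure of `combine` without printing anything.

```python
def combine_steps(la, lb):
    """Count how many times the inner comparison in combine would run."""
    steps = 0
    for i in la:
        for j in lb:
            steps += 1  # one `i < j` comparison per pair
    return steps

print(combine_steps(range(3), range(5)))   # 15
print(combine_steps(range(30), range(5)))  # 150
```

Growing either input by a factor of 10 grows the work by the same factor, so neither length can be dropped from the bound.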


In [3]:
# Exercise 4: Give the order of growth for the function 
# and explain your reasoning in a couple of sentences.

def sum_digits(n):
    """Take positive integer n and sum its digits."""
    summ = 0
    while n > 0:
        summ += n % 10
        n = n // 10  # integer division; int(n / 10) loses precision for very large n
    return summ


# Solution: The loop runs once per digit, so the runtime is proportional
# to the number of digits d in n. A number with d digits is smaller
# than 10^d, so d is about log10(n). The base of the logarithm only
# changes a constant factor. Hence, the runtime is O(log n).
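
To see the logarithmic growth, the sketch below counts loop iterations and prints them next to floor(log10(n)) + 1, which is the number of decimal digits of a positive integer. `digit_steps` is a hypothetical helper that keeps only the loop of `sum_digits`.

```python
import math

def digit_steps(n):
    """Count the loop iterations sum_digits needs for positive integer n."""
    steps = 0
    while n > 0:
        n //= 10
        steps += 1
    return steps

for n in (9, 10, 12345, 10**8):
    print(n, digit_steps(n), math.floor(math.log10(n)) + 1)
```

Multiplying n by 10 adds only one iteration.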


In [3]:
# Exercise 5: Give the order of growth for the code 
# and explain your reasoning in a couple of sentences.

# x is assumed to be a non-negative integer defined beforehand.
a = 0
for i in range(x):
    for j in reversed(range(i, x)):
        a = a + i + j

# Solution: The runtime is O(x^2). For each i, the inner loop runs
# x - i times, so the code takes a total of
# x + (x-1) + (x-2) + ... + 1 steps. The sum of the first x
# positive integers is x * (x + 1) / 2, so the runtime is O(x^2).
# If you don't know the formula, it is enough to notice that
# there are two nested loops whose lengths both grow linearly with x.
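
The closed form can be checked by counting. The sketch below wraps the loops in a hypothetical function `nested_steps` and compares the count with x * (x + 1) / 2.

```python
def nested_steps(x):
    """Count how many times the innermost statement in Exercise 5 runs."""
    steps = 0
    for i in range(x):
        for j in reversed(range(i, x)):
            steps += 1
    return steps

for x in (4, 10, 100):
    print(x, nested_steps(x), x * (x + 1) // 2)
```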


In [3]:
# Exercise 6: Give the order of growth for the function 
# and explain your reasoning in a couple of sentences.

def factorial(n):
    """Takes non-negative integer n and returns the factorial n!,
    where n! = n * (n-1) * (n-2) ... * 2 * 1
    """
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

# Solution: This is straight recursion from n to n-1, then to n-2, 
# and so on down to the base case 0, for n + 1 calls in total. 
# Treating each multiplication as one step, it will take O(n) time.
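
The number of calls can be counted by returning it alongside the result. `factorial_calls` is a hypothetical variant written only to illustrate the count.

```python
def factorial_calls(n):
    """Return (n!, number of calls made), mirroring factorial above."""
    if n == 0:
        return 1, 1
    value, calls = factorial_calls(n - 1)
    return n * value, calls + 1

for n in (0, 5, 20):
    print(n, factorial_calls(n))
```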


In [None]:
# Exercise 7: This is code submitted by a student for Problem 2 
# in Problem Set 1. Given an edge list of coauthors in data, 
# the task was to create a sorted list of all unique authors. 
# What is the order of growth of this code? What is wrong here? 
# How would you rewrite the code to make it more efficient?

lst = [] 
for i,j in coauthors:
    lst.append(int(i)) 
    unique_authors = list(set(lst))
    unique_authors.sort()
 
# Solution: The complexity of the code is O(n^2 log n), where
# n is the length of coauthors. On each of the n iterations the code
# rebuilds the set and re-sorts the list, which costs up to
# n * (n + n log n) steps, since building the set is O(n) and sorting
# is O(n log n) (assuming the worst-case scenario where each edge
# introduces a unique author). We keep only the dominant term, which
# gives O(n^2 log n). If we un-indent the last statement so that the
# list is sorted once after the loop, we reduce the complexity to 
# O(n^2). If we also move the set conversion out of the loop, the 
# complexity is further reduced to O(n log n), dictated by the 
# sorting. We can further reduce the actual runtime of the code 
# (though not its order of growth) by replacing the loop with 
# a list comprehension.

lst = [int(i) for i, j in coauthors] 
unique_authors = list(set(lst))
unique_authors.sort()
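
The rewrite above keeps only the first element of each edge, as the student's code did. If some authors appear only in the second position of an edge (an assumption about the data, which is not shown here), both endpoints need collecting. A minimal sketch, with a made-up `sample_edges` list and a hypothetical helper name:

```python
# Hypothetical edge list; the real `coauthors` data is not shown here.
sample_edges = [('3', '1'), ('1', '2'), ('3', '7'), ('2', '7')]

def unique_sorted_authors(edges):
    """Collect both endpoints of every edge, then deduplicate and sort once."""
    authors = {int(a) for edge in edges for a in edge}  # single pass, O(n) on average
    return sorted(authors)                              # one sort, O(n log n)

print(unique_sorted_authors(sample_edges))  # [1, 2, 3, 7]
```

The order of growth is still O(n log n), because the set is built once and the sort runs once.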

In [9]:
# Exercise 8: Compare the execution time for loops 
# between R and Python using Exercise 4.


# Solution: 
import time
from timeit import timeit

res = 0
for i in range(100):
    # perf_counter is a higher-resolution clock than time.time
    start = time.perf_counter()
    sum_digits(100000000)
    end = time.perf_counter()
    res += end - start
res = res / 100
print('%0.5f microseconds' % (res * 1000000))

# Alternative solution:
res2 = timeit('sum_digits(100000000)', 
              setup='from __main__ import sum_digits', 
              number=100) / 100
print('%0.5f microseconds' % (res2 * 1000000))

6.68526 microseconds
5.67171 microseconds
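
Single timing runs are noisy, as the large `max` column in benchmark output tends to show. A common way to reduce the noise is to repeat the measurement in batches and report the fastest batch. The sketch below uses `timeit.repeat` for this; the batch sizes (5 batches of 1000 calls) are arbitrary choices, and `sum_digits` is redefined here with integer arithmetic so the block runs on its own.

```python
from timeit import repeat

def sum_digits(n):
    """Take positive integer n and sum its digits (same logic as Exercise 4)."""
    summ = 0
    while n > 0:
        summ += n % 10
        n //= 10
    return summ

# Five batches of 1000 calls each; each entry is the total time of one batch.
batches = repeat(lambda: sum_digits(100000000), repeat=5, number=1000)
best = min(batches) / 1000  # seconds per call in the fastest batch
print('%0.5f microseconds' % (best * 1000000))
```

The lambda adds a small call overhead to every measurement, so the absolute numbers are slightly inflated compared with the string-based `timeit` call.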


```r
### R code ###

require(microbenchmark)

sum_digits <- function(n) {
  summ <- 0
  while (n > 0) {
    summ <- summ + as.integer(n %% 10)
    n <- as.integer(n / 10)
  }
  return(summ)
}

microbenchmark(
  sum_digits(100000000),
  times = 100
)

# Unit: microseconds
#              expr   min    lq     mean median    uq      max neval
# sum_digits(1e+08) 6.711 7.896 87.21844  8.291 8.685 7842.249   100
```

In [None]:
# Exercise 9: Create a function to multiply each element of a 
# vector `v` by a scalar `m` in R with and without a for-loop
# and compare their execution time.

```r
# Solution:

### R code ###

require(microbenchmark)

multiply <- function (v, m) {
  for (i in seq_along(v)) {
    v[i] <- v[i] * m 
  }
  return(v)
}

multiply2 <- function(v, m) {
  return(v * m)
}

microbenchmark(
  'for-loop' = multiply(1:10000, 2),
  'vectorized' = multiply2(1:10000, 2),
  times = 100
)
        
# Unit: microseconds
#       expr     min      lq       mean   median       uq     max neval
#   for-loop 767.014 804.122 1008.78336 828.2015 890.5730 6059.92   100
# vectorized  17.764  21.318   52.12061  42.2400  45.2005 1213.09   100
```