# Calculating Complexity

## Lesson Overview

Here are some helpful rules for calculating the complexity of a program.

- Variable assignments are $O(1)$ for time and vary for space.

- Arithmetic operations such as addition and subtraction are $O(1)$ for both time and space.

- Creating an empty array or map is $O(1)$ for both time and space.

- A `for` or `while` loop is $O(n)$ for time and $O(1)$ for space, where $n$ is the number of iterations.

- When an operation occurs within a loop, the total complexity that the operation contributes to the program is the product of the single-line complexity of the operation and the number of iterations of the loop. For example, an $O(1)$ operation within a loop of $n$ iterations contributes a complexity of $O(n)$.

- When a loop occurs within another loop, the total number of iterations is the product of the number of iterations of each loop. For example, if a loop of $m$ iterations is contained within a loop of $n$ iterations, the combination of both loops has a total of $nm$ iterations.

In general, the complexity of a function or program is found by calculating the complexity that each line contributes and adding the results together. Since $O(1)$ is the lowest complexity, a fixed number of $O(1)$ operations can be ignored whenever a higher-order term is present. (The sum of any constant number of $O(1)$ operations is still $O(1)$.)
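As a rough illustration of the loop rules above, the following sketch counts how many times the body of a nested loop runs. The function name `count_nested_iterations` is hypothetical and introduced only for this example.

```python
def count_nested_iterations(n, m):
  """Counts how often the innermost body of two nested loops executes."""
  count = 0
  for _ in range(n):    # outer loop: n iterations
    for _ in range(m):  # inner loop: m iterations per outer iteration
      count += 1        # O(1) operation, executed n * m times in total
  return count

print(count_nested_iterations(3, 4))  # 12, which is 3 * 4
```

The single $O(1)$ increment contributes $O(nm)$ overall, because it is repeated once per combined iteration.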

### Calculating the complexity of a function

Consider this function that creates `n` arrays of length `n`.

In [None]:
def fun_with_complexity(n):
  """Creates an array with n arrays of length n."""
  outer = []
  counter = 0
  for _ in range(n):
    inner = []
    while len(inner) < n:
      inner.append(counter)
      counter += 1
    outer.append(inner)
  return outer

In [None]:
fun_with_complexity(5)

Before looking at the commented code below, try to calculate the time and space complexity of each single line of this function.

In [None]:
def fun_with_complexity(n):
  """Creates an array with n arrays of length n."""
  # The following are the complexities of each line alone, if executed once.
  outer = [] # time: O(1), space: O(1)
  counter = 0 # time: O(1), space: O(1)
  for _ in range(n): # time: n iterations, space: 0
    inner = [] # time: O(1), space: O(1)
    while len(inner) < n: # time: n iterations, space: 0
      inner.append(counter) # time: O(1), space: O(1)
      counter += 1 # time: O(1), space: 0
    outer.append(inner) # time: O(1), space: O(n)
  return outer # time: O(1), space: depends if outer is assigned to a variable

Since the outer `for` loop has $n$ iterations and the inner `while` loop has $n$ iterations, the operations within the `while` loop are executed $n^2$ times, and the operations within the `for` loop that are not within the `while` loop are executed $n$ times. This allows us to calculate the total time and space complexity that each line contributes.

In [None]:
def fun_with_complexity(n):
  """Creates an array with n arrays of length n."""
  # The following are the complexities that the line contributes to the
  # function, when executed the correct number of times.
  outer = [] # executed 1 time, time: O(1), space: O(1)
  counter = 0 # executed 1 time, time: O(1), space: O(1)
  for _ in range(n): # executes n times
    inner = [] # time: n*O(1) = O(n), space: n*O(1) = O(n)
    while len(inner) < n: # executes n^2 times
      inner.append(counter) # time: n^2*O(1)=O(n^2), space: n^2*O(1) = O(n^2)
      counter += 1 # time: n^2*O(1) = O(n^2), space: n^2*0 = 0
    outer.append(inner) # time: n*O(1) = O(n), space: n*O(n) = O(n^2)
  return outer # executes 1 time, time: O(1), space: depends if outer is assigned to a variable

Therefore, remembering that $O(1)$ operations can be ignored and that a function grows as fast as its fastest growing part:

\begin{align*}
time\_complexity &= O(n) + O(n^2) + O(n^2) + O(n) \\
&= O(n^2) \\
space\_complexity &= O(n) + O(n^2) + O(n^2) \\
&= O(n^2) \\
\end{align*}
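One way to sanity-check this result is to count the operations directly. The sketch below is an instrumented copy of `fun_with_complexity`; the name `count_appends` and the two counters are assumptions added for illustration and are not part of the original function.

```python
def count_appends(n):
  """Counts the append operations performed when building n arrays of length n."""
  inner_appends = 0
  outer_appends = 0
  outer = []
  counter = 0
  for _ in range(n):
    inner = []
    while len(inner) < n:
      inner.append(counter)
      counter += 1
      inner_appends += 1  # runs n^2 times in total
    outer.append(inner)
    outer_appends += 1    # runs n times in total
  return inner_appends, outer_appends

for n in [1, 10, 100]:
  print(n, count_appends(n))
```

Multiplying `n` by 10 multiplies the number of inner appends by 100, which is the quadratic growth predicted above.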

### Best and worst case complexity

In general, the time complexity of an algorithm can vary depending on the input. In some cases, this variation can be drastic.

> When an algorithm's inputs are such that the algorithm performs as quickly as it can possibly perform, its time complexity for those inputs is the **best case time complexity**. Similarly, when an algorithm's inputs are such that the algorithm performs as slowly as it can possibly perform, its time complexity for those inputs is the **worst case time complexity**.

Similarly for space complexity, the best case space complexity is the smallest possible space complexity at which the algorithm can perform, and the worst case space complexity is the largest possible space complexity at which the algorithm can perform.

### Best and worst case complexity of a function

For example, consider a function that searches an array for whether it contains the value zero.

In [None]:
def contains_zero(arr):
  """Returns true if an array of integers contains 0."""
  for i in arr:
    if i == 0:
      return True
  
  return False

In the **best case**, `arr[0] == 0`, so the `for` loop exits after only one iteration. In this case, `contains_zero` is $O(1)$ for time.

In the **worst case**, 0 is not in `arr` (or it is the last element of `arr`), in which case the `for` loop has to check all of the $n$ elements of `arr`. In this case, `contains_zero` is $O(n)$ for time.
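The difference between the two cases can be made visible by counting how many elements are checked. This is a sketch only; `contains_zero_counted` is a hypothetical variant of `contains_zero` that returns the count alongside the result.

```python
def contains_zero_counted(arr):
  """Like contains_zero, but also returns the number of elements checked."""
  checked = 0
  for i in arr:
    checked += 1
    if i == 0:
      return True, checked
  return False, checked

print(contains_zero_counted([0, 5, 7, 9]))  # best case: (True, 1)
print(contains_zero_counted([5, 7, 9, 3]))  # worst case: (False, 4)
```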

## Question 1

Which of the following statements about best and worst case time complexity are true?

**a)** An algorithm's best case time complexity must be quicker than its worst case time complexity.

**b)** An algorithm's best case time complexity is the quickest possible time complexity over all inputs.

**c)** An algorithm's worst case space complexity occurs when the algorithm's inputs are such that the algorithm requires more space than it would for any other set of inputs.

**d)** An algorithm's best case time complexity occurs when its worst case space complexity occurs.

### Solution

The correct solutions are **b)** and **c)**. 

**a)** If an algorithm has the same time complexity in all cases, then its best and worst case time complexities are the same.

**d)** While it is often the case that space complexity increases when time complexity decreases, not all algorithms have this property.

## Question 2

Consider the following function.

```python
def prod(arr):
  """Returns the product of the non-zero elements of arr."""
  # Block 1: variable assignment
  output = 1

  # Block 2: for loop
  for i in arr:

    # Block 3: if statement
    if i == 0:
      continue
    
    output *= i
  
  # Block 4: return statement
  return output
```

Which blocks of code contribute to the time complexity? That is, which blocks of code have a big-O time complexity greater than $O(1)$?

**a)** Block 1: variable assignment

**b)** Block 2: `for` loop

**c)** Block 3: `if` statement

**d)** Block 4: `return` statement


### Solution

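The correct answer is **b)**. The `for` loop executes once per element of `arr`, so it is $O(n)$, where $n$ is the length of `arr`. Each of the other blocks is $O(1)$ every time it executes.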

## Question 3

What is the big-O time and space complexity of the following function that finds the mean of an array of integers?

In [None]:
def mean(arr):
  """Finds the mean of a list of integers."""
  sum = 0
  len = 0
  for i in arr:
    sum += i
    len += 1
  
  # Coerce sum to float here so that the division will be float, not int.
  return float(sum) / len

In [None]:
#freetext

### Solution

Suppose $n$ is the length of the input list `arr`. Let's look at the big-O complexity of the function line by line.

In [None]:
def mean(arr):
  """Finds the mean of a list of integers."""
  sum = 0 # time: O(1), space: O(1)
  len = 0 # time: O(1), space: O(1)
  for i in arr: # executes n times, space: O(1)
    sum += i # time: O(1), space: 0
    len += 1 # time: O(1), space: 0
  
  # Coerce sum to float here so that the division will be float, not int.
  return float(sum) / len # time: O(1), space: 0

Therefore, the time complexity is

$$O(1) + O(1) + n(O(1) + O(1)) + O(1) = O(n),$$

and the space complexity is

$$O(1) + O(1) + O(1) = O(1).$$

## Question 4

Assume an array has been over-allocated so that the size of the allocated array is twice the current number of elements. What is the big-O time complexity of inserting an element at the end of this array? What if the current array has not been over-allocated (i.e. the current array allocation is equal to the number of elements in the array)?

In [None]:
#freetext

### Solution

Suppose the array has $n$ elements before the new element is inserted at the end. 

If the array has been over-allocated, then there is space at the $(n+1)$th position in memory for the new element to be appended. In this case, the operation is $O(1)$.

If, however, the array has not been over-allocated, then there is not necessarily space at the $(n+1)$th position in memory. (Remember, arrays utilize [contiguous blocks](https://stackoverflow.com/questions/4059363/what-is-a-contiguous-memory-block) of memory.) In this case, we must first allocate a new array of size at least $n+1$, then copy over the $n$ elements of the original array. Finally, we can insert the new element at the $(n+1)$th position. Thus, the time complexity is $O(n)$.
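The two cases can be sketched with a toy array class that reports how many existing elements each append has to copy. Everything here is an assumption made for illustration: the class name `SketchArray`, the doubling growth policy, and the use of a Python list to stand in for a contiguous block of memory.

```python
class SketchArray:
  """A toy fixed-capacity array that counts how many elements each append copies."""

  def __init__(self, capacity):
    self.capacity = capacity
    self.size = 0
    self.slots = [None] * capacity  # stands in for a contiguous block of memory

  def append(self, value):
    """Appends value and returns the number of existing elements copied."""
    copied = 0
    if self.size == self.capacity:
      # No spare slot: allocate a block twice as large and copy everything over.
      new_capacity = max(1, 2 * self.capacity)
      new_slots = [None] * new_capacity
      for k in range(self.size):
        new_slots[k] = self.slots[k]
        copied += 1
      self.slots = new_slots
      self.capacity = new_capacity
    self.slots[self.size] = value
    self.size += 1
    return copied

arr = SketchArray(capacity=4)
copies = [arr.append(x) for x in range(6)]
print(copies)  # [0, 0, 0, 0, 4, 0]
```

Appends into spare capacity copy nothing, while the append that finds the array full copies all existing elements first.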

## Question 5

What is the big-O time complexity of inserting an element into a linked list? 

In [None]:
#freetext

### Solution

Linked lists are designed so that inserting and deleting elements is efficient. Assuming we already hold a reference to the node after which the new element goes, inserting an element consists of creating a new node that points to the following element and changing the previous element to point to the new node. Both operations are $O(1)$ and do not depend on the number of nodes in the linked list, so the best, worst, and average case time complexities are all $O(1)$. (If the insertion position must first be found by walking the list from the head, that search is $O(n)$ on its own.)
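A minimal sketch of this insertion for a singly linked list follows. The names `Node`, `insert_after`, and `to_list` are hypothetical and chosen only for this example; the insertion assumes a reference to the preceding node is already available.

```python
class Node:
  """A node of a singly linked list."""

  def __init__(self, value, next_node=None):
    self.value = value
    self.next_node = next_node

def insert_after(prev_node, value):
  """Inserts a new node holding value directly after prev_node in O(1) time."""
  new_node = Node(value, prev_node.next_node)  # new node points to the following element
  prev_node.next_node = new_node               # previous element points to the new node
  return new_node

def to_list(head):
  """Collects the values of the linked list into a Python list (this walk is O(n))."""
  values = []
  while head is not None:
    values.append(head.value)
    head = head.next_node
  return values

head = Node(1, Node(2, Node(4)))
insert_after(head.next_node, 3)
print(to_list(head))  # [1, 2, 3, 4]
```

The insertion touches two references no matter how long the list is.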

## Question 6

The following function finds the minimum value in an array.

In [None]:
def minimum(arr):
  """Finds the minimum of a list of integers."""
  min_value = float("Inf")

  for i in arr:
    if i < min_value:
      min_value = i
  
  return min_value

In a previous lesson, we showed that this function has time complexity $O(n)$, where $n$ is the length of the array.

The function below finds the minimum of *two* arrays of length $n$. What is the big-O time complexity of `two_minimums`, in terms of $n$?

You can assume that it is always true that both arrays have the same length, $n$.

In [None]:
def two_minimums(arr1, arr2):
  """Finds the respective minimums of two n-arrays of integers."""
  return minimum(arr1), minimum(arr2)

In [None]:
#freetext

### Solution

The `two_minimums` function calls `minimum` twice, once for `arr1` and once for `arr2`. The time complexity of each call is $O(n)$. Therefore the time complexity of `two_minimums` is

$$O(n) + O(n) = O(n),$$

using the property that the sum of functions grows as fast as its fastest growing part.

This is a good example of how two functions with the same time complexity do not necessarily have the same actual runtime. The `two_minimums` function calls `minimum` twice, so it takes roughly twice as long as the `minimum` function; however, both are $O(n)$.

## Question 7

As per the previous question, calculating the minimums of two respective arrays of length *n* is linear. What is the time complexity of calculating the minimums of 1000 arrays of length *n*?

In [None]:
#freetext

### Solution

Using the same logic as in the previous question, the time complexity of finding the minimum of each array is $O(n)$. When these are added together, the time complexity is

$$O(1000n) = O(n).$$

This demonstrates how for any *constant* number $N$ of $n$-arrays, the total time complexity of calculating the $N$ respective minimums of the $N$ arrays is always $O(n)$. In an algorithm, doing the same thing repeatedly for a *known* and *constant* number of times is as time-complex as doing it only once.

## Question 8

The following function finds the *m* respective minimums of the input array `arr` (of length *m*), where each element of `arr` is itself an array of length *n*. What is the big-O time complexity?

You can assume that *all* sub-arrays of the input have the same length *n*.

In [None]:
def minimum(arr):
  """Finds the minimum of a list of integers."""
  min_value = float("Inf")

  for i in arr:
    if i < min_value:
      min_value = i
  
  return min_value

In [None]:
def m_minimums(arr):
  """Calculates the m minimums of the n-arrays within arr."""
  mins = []
  for each in arr:
    mins.append(minimum(each))
  
  return mins

In [None]:
#freetext

### Hint

In the previous two questions, the number of arrays was a known constant (2, then 1000). Now, the number of arrays $m$ is unknown. How many times does the `for` loop execute?

### Solution

Since each array within `arr` has length $n$, the computation `minimum(each)` is $O(n)$. There are $m$ arrays in `arr`, so the `for` loop executes $m$ times. Therefore, as `m_minimums` has $m$ iterations of an $O(n)$ loop, the total time complexity is $O(mn)$.

Note how this is different from the function in the previous question that calculates the minimums of 1000 arrays, which has a time complexity of $O(n)$. When the number of arrays is *known* and *constant*, the time complexity is $O(n)$. Once the number of arrays is itself a variable $m$ that can be parameterized in the time complexity, the complexity becomes $O(mn)$.
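To see the $O(mn)$ growth concretely, the comparisons can be counted. This is a sketch under stated assumptions: `minimum_counted` and `m_minimums_counted` are hypothetical instrumented versions of `minimum` and `m_minimums`, and `grid` is made-up sample data.

```python
def minimum_counted(arr):
  """Returns the minimum of arr and the number of comparisons made."""
  min_value = float("inf")
  comparisons = 0
  for i in arr:
    comparisons += 1
    if i < min_value:
      min_value = i
  return min_value, comparisons

def m_minimums_counted(arr):
  """Returns the minimum of each sub-array and the total number of comparisons."""
  mins = []
  total = 0
  for each in arr:  # m iterations
    value, comparisons = minimum_counted(each)  # n comparisons each
    mins.append(value)
    total += comparisons
  return mins, total

grid = [[3, 1, 2], [9, 8, 7], [4, 6, 5], [0, 2, 1]]  # m = 4, n = 3
print(m_minimums_counted(grid))  # ([1, 7, 4, 0], 12)
```

The total of 12 comparisons equals $m \cdot n = 4 \cdot 3$.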

## Question 9

Two of your coworkers, Arron and Maple, can't agree on the optimal solution to a problem the team is trying to solve. They both work on a team that stores the data for a toy factory. One of their team's primary responsibilities is to keep track of which items have been flagged as defective or unsafe, and which are acceptable to sell. Each item is stored via a unique integer identifier.

Arron thinks that the best way to record this data is in two arrays, one for "safe" items, one for "unsafe" items. Each time a new item is evaluated, its integer ID is either added to the `safe_items` array, or to the `unsafe_items` array.

Maple thinks that the best way to record this data is in one map, from the integer ID to a boolean, `True` for safe items, and `False` for unsafe items. Each time a new item is evaluated, its integer ID is added as a key to the map `item_safety`, and its safety as the value.

One of the most important use cases for this system is to be able to find an item from its integer ID and see if it is safe or unsafe, or not yet in the database.

Below is Arron's function to check if an item is safe or unsafe. Assume that there exist two arrays, `safe_items` and `unsafe_items`, containing the integer IDs of safe and unsafe items respectively. It relies on the `find` function from the Guided Exercises above.

```python
def find(arr, val):
  """Returns True if val is in the list arr."""
  for i in arr:
    if i == val:
      return True
  return False

def is_item_safe(item_id):
  """Returns True if the item is safe, False if unsafe, error if not found."""
  if find(safe_items, item_id):
    return True
  elif find(unsafe_items, item_id):
    return False
  else:
    raise ValueError('Item ID %d not found.' % item_id)
```

Below is Maple's function to check if an item is safe or unsafe. Assume that there exists a map `item_safety` that maps integer ID keys to boolean safety values.

```python
def is_item_safe(item_id):
  """Returns True if the item is safe, False if unsafe, error if not found."""
  # The map will throw an error if item_id does not exist as a key.
  return item_safety[item_id]
```

Arron and Maple are arguing over whose implementation is more efficient. Arron believes using the two arrays is more efficient, while Maple believes using a map is more efficient. Can you settle the disagreement? Consider the time and space complexities of each approach.

In [None]:
#freetext

### Solution

Neither approach is "more efficient" across the board. There are multiple factors to consider.

- time complexity
- space complexity
- code simplicity

**Time complexity**

- Arron's approach calls the `find` function at most twice and does nothing else. The `find` function is $O(n)$ in the average case, so Arron's function is also $O(n)$ in the average case.

- Maple's approach just requires looking up a single key in a map, which takes constant time on average for a hash-based map. Therefore, Maple's function is $O(1)$.

**Space complexity**

- Arron's approach requires storing two separate arrays with a total of $n$ items. Since $n$ items need to be stored, the space complexity of using two arrays is $O(n)$.

- Maple's approach requires storing $n$ integers and $n$ booleans, so $2n$ allocations. This is still $O(n)$, but is approximately twice as much space as Arron's approach.

**Code simplicity**

- Arron's function requires the helper method `find` as well as an `if`/`elif`/`else` statement.

- Maple's function is a simple one-liner.

Overall, Maple's function requires about twice as much storage as Arron's, but it runs faster (and, importantly, its runtime doesn't grow as the number of items grows) and is simpler to understand. Which approach you would choose depends on whether time, space, or simplicity is most important, but Maple's would usually be the better choice.
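The time difference between the two approaches can be sketched by counting the comparisons that the array search performs. The sample data below (1000 items, even IDs safe, odd IDs unsafe) and the helper `find_counted` are assumptions made up for this illustration.

```python
def find_counted(arr, val):
  """Like find, but also returns the number of elements compared."""
  compared = 0
  for i in arr:
    compared += 1
    if i == val:
      return True, compared
  return False, compared

# Hypothetical sample data: even IDs are safe, odd IDs are unsafe.
safe_items = list(range(0, 1000, 2))
unsafe_items = list(range(1, 1000, 2))
item_safety = {item_id: item_id % 2 == 0 for item_id in range(1000)}

# Arron's approach: looking up unsafe item 999 scans both arrays completely.
_, in_safe = find_counted(safe_items, 999)
found, in_unsafe = find_counted(unsafe_items, 999)
print(found, in_safe + in_unsafe)  # True 1000

# Maple's approach: a single map lookup, however many items are stored.
print(item_safety[999])  # False
```

In this worst case Arron's lookup compares against every stored ID, while Maple's lookup does not scan at all.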

## Question 10

[Advanced] What is the big-O time and space complexity of the following function?

In [None]:
def powers_of_two_below(n):
  """Returns the integer powers of two below the input n."""
  output = []

  j = 2**0
  while j < n:
    output.append(j)
    j *= 2

  return output

In [None]:
print(powers_of_two_below(20))

In [None]:
#freetext

### Solution

The time and space complexity for this function are the number of computational operations and memory allocations required, respectively. These can be separated into two categories: outside and within the `while` loop. All single operations and allocations outside and within the `while` loop are $O(1)$. Therefore, the big-O time and space complexities for this function are the number of iterations of the `while` loop.

First, let's print the number of iterations for different values of `n`. We do this by adding a few lines to the function.

In [None]:
def powers_of_two_below(n):
  """Returns the integer powers of two below n and prints the iteration count."""
  output = []

  j = 2**0
  n_iters = 0
  while j < n:
    output.append(j)
    j *= 2
    n_iters += 1

  print('Input: %d\nIterations: %d' % (n, n_iters))
  return output

In [None]:
for n in range(1, 20):
  print(powers_of_two_below(n))

The number of iterations increases by 1 whenever $n = 2^k+1$ for various values of $k$. More specifically, `n_iters` increases for `n = 2`, `n = 3`, `n = 5`, `n = 9`, `n = 17`. This can also be seen from the code, since the body of the `while` loop needs to execute one more time each time `n` exceeds the next power of 2.

Let $I$ be the number of iterations, and $n$ be the input. Since $n$ has to roughly double for $I$ to increase by 1, we have

\begin{align*}
n &\approx 2^I \\
I &\approx \log_2(n). \\
\end{align*}

Note that the equality is only approximate: $I$ is always an integer, so true equality can hold only when $n$ is an integer power of 2. More precisely, for $n \geq 1$ the loop body runs for $j = 2^0, 2^1, \ldots, 2^{I-1}$, all of which are below $n$, and the loop stops once $j = 2^I \geq n$. Therefore

\begin{align*}
2^{I-1} < n &\leq 2^I \\
I - 1 < \log_2(n) &\leq I \\
I &= \textrm{ceil}(\log_2(n)),
\end{align*}

where "ceil" indicates the lowest integer greater than or equal to its argument. But as usual in big-O analysis, this rounding doesn't matter. The number of iterations is $O(\log_2(n))$, therefore the time and space complexities are both $O(\log_2(n))$.
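The closed form can be checked against the loop for a few inputs. This is a sketch only: `count_iterations` and `ceil_log2` are hypothetical helper names, and `ceil_log2` relies on the fact that, for an integer $n \geq 1$, `(n - 1).bit_length()` equals $\textrm{ceil}(\log_2(n))$, which avoids floating-point rounding.

```python
def count_iterations(n):
  """Counts the iterations of the while loop in powers_of_two_below(n)."""
  j = 1
  n_iters = 0
  while j < n:
    j *= 2
    n_iters += 1
  return n_iters

def ceil_log2(n):
  """Returns ceil(log2(n)) for an integer n >= 1 using integer arithmetic."""
  return (n - 1).bit_length()

for n in [1, 2, 3, 4, 5, 8, 9, 16, 17, 1000]:
  # The loop count should match the closed form for every tested input.
  assert count_iterations(n) == ceil_log2(n)
  print(n, count_iterations(n))
```

Going from `n = 17` to `n = 1000` only raises the iteration count from 5 to 10, which is the slow growth expected of a logarithm.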

