# Linear Search

## Lesson Overview

Linear search is an algorithm that searches an array for a given value. If the array contains the value, the algorithm returns the lowest index at which the value is found. If not, it typically returns -1 or raises an error, depending on the implementation.

For example, searching for `2` in the array `[5, 2, 1, 2]` returns 1 (since the first instance of `2` is at index 1), whereas searching for `2` in the array `[1, 3]` returns -1 (since `2` is not in the array).
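Python's built-in `list.index` uses the same "lowest matching index" convention, but raises `ValueError` instead of returning -1 when the value is absent. The sketch below reproduces both examples above. The wrapper `index_or_minus_one` is a hypothetical helper for illustration, not part of the lesson's code.

```python
def index_or_minus_one(arr, v):
  """Hypothetical wrapper: like list.index, but returns -1 if v is absent."""
  try:
    return arr.index(v)   # lowest index whose element equals v
  except ValueError:      # raised by list.index when v is not in arr
    return -1

print(index_or_minus_one([5, 2, 1, 2], 2))  # 1
print(index_or_minus_one([1, 3], 2))        # -1
```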

Linear search is one of the simplest searching algorithms to implement, and, as the name suggests, is $O(n)$ (linear time) in the average case.

### Algorithm

Suppose an array `arr` is being searched for a value `v`. The following steps describe a version of linear search that returns -1 if `v` is not contained in `arr`.

0. *Initialize:* Set $i = 0$.
1. *Repeat:* While $i$ is a valid index of `arr`, inspect the element at index $i$:
   - If the element is equal to `v`, return $i$ and exit.
   - Otherwise, increment $i$ by 1.
2. *Exit:* Once $i$ is no longer a valid index of `arr`, you have iterated through the entire array without finding `v`, so return -1.

## Question 1

Write an iterative algorithm that implements linear search.

In [None]:
def linear_search(arr, v):
  """Searches a list of integers arr for a value v."""
  # TODO(you): Implement

### Hint

Remember to return the *index* of the found element, and -1 if the element is not found.

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
print(linear_search([1, 2, 3, 5, 6, 8, 9], 2))
# Should print: 1

print(linear_search([1, 3, 5, 6, 8, 9], 7))
# Should print: -1

### Solution

In [None]:
def linear_search(arr, v):
  """Searches a list of integers arr for a value v."""
  for i in range(len(arr)):
    if arr[i] == v:
      return i

  return -1
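The `for` loop above is one way to express the steps from the Algorithm section. As a minimal sketch, the same steps can also be written with an explicit `while` loop so that each step maps onto one line; the name `linear_search_while` is hypothetical and used only for illustration.

```python
def linear_search_while(arr, v):
  """Hypothetical variant that mirrors the Algorithm steps with a while loop."""
  i = 0                   # Step 0: initialize
  while i < len(arr):     # Step 1: repeat while i is a valid index
    if arr[i] == v:
      return i            # found v: return its index
    i += 1                # otherwise move on to the next index
  return -1               # Step 2: ran off the end without finding v

print(linear_search_while([1, 2, 3, 5, 6, 8, 9], 2))  # 1
print(linear_search_while([1, 3, 5, 6, 8, 9], 7))     # -1
```

Both versions behave identically; `for i, x in enumerate(arr)` is another common Python idiom for the same loop.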

## Question 2

What is the best case time complexity of `linear_search`? In what case does this occur?

In [None]:
#freetext

### Solution

In the best case, `linear_search` finds `v` at index 0, during the very first iteration of the `for` loop. The algorithm then requires only 1 iteration regardless of the length of `arr`, so it has a time complexity of $O(1)$.
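One way to see this concretely is to count comparisons. The sketch below is a hypothetical instrumented copy of `linear_search` (the name `linear_search_counting` and its `(index, comparisons)` return value are assumptions for illustration) showing that finding `v` at index 0 costs one comparison even in a long list.

```python
def linear_search_counting(arr, v):
  """Hypothetical instrumented linear search.

  Returns a pair (index, comparisons), where comparisons is the number of
  elements of arr that were compared with v.
  """
  comparisons = 0
  for i in range(len(arr)):
    comparisons += 1
    if arr[i] == v:
      return i, comparisons
  return -1, comparisons

# Best case: v is at index 0, so exactly one comparison is made,
# no matter how long the list is.
print(linear_search_counting([7, 1, 2, 3], 7))             # (0, 1)
print(linear_search_counting([7] + list(range(1000)), 7))  # (0, 1)
```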

## Question 3

What is the worst case time complexity of `linear_search`? In what case does this occur?

In [None]:
#freetext

### Solution

In the worst case, `linear_search` must check every element of `arr`. This happens when `v` is not in `arr` or appears only at its last index. The algorithm then requires all $n$ iterations of the `for` loop (where $n$ is the length of `arr`), so it has a time complexity of $O(n)$.
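The same hypothetical instrumented search (redefined here so the cell runs on its own) shows that both worst-case scenarios cost exactly $n$ comparisons for several list lengths.

```python
def linear_search_counting(arr, v):
  """Hypothetical instrumented linear search returning (index, comparisons)."""
  comparisons = 0
  for i in range(len(arr)):
    comparisons += 1
    if arr[i] == v:
      return i, comparisons
  return -1, comparisons

# Worst case: v is absent (-5 never appears), or present only at the last index.
# Either way, the loop performs exactly n comparisons.
for n in [1, 10, 100]:
  arr = list(range(n))  # 0, 1, ..., n - 1
  print(n, linear_search_counting(arr, -5), linear_search_counting(arr, n - 1))
# 1 (-1, 1) (0, 1)
# 10 (-1, 10) (9, 10)
# 100 (-1, 100) (99, 100)
```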

## Question 4

Your local public library has a scanning system that keeps track of all books at the library. Each title has a unique book number (for example, all copies of *The Bell Jar* by Sylvia Plath share the number 1842659).

The library has asked you to write a function that counts how many copies of a given book number are currently loaned out. The book numbers of the loaned-out books are stored in an array. Use the principles of `linear_search` to write a `count_occurrences` function that counts the number of times an integer appears in an array.

In [None]:
def count_occurrences(arr, v):
  """Returns the number of times the value v appears in arr."""
  # TODO(you): Implement

### Hint

Modify your `linear_search` function, but instead of returning the index found, increment a counter.

In [None]:
def linear_search(arr, v):
  """Searches a list of integers arr for a value v."""
  for i in range(len(arr)):
    if arr[i] == v:
      return i

  return -1

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
print(count_occurrences([1, 2, 1, 3, 1, 4, 1, 5], 1))
# Should print: 4

print(count_occurrences([1, 2, 1, 3, 1, 4, 1, 5], 6))
# Should print: 0

### Solution

In [None]:
def count_occurrences(arr, v):
  """Returns the number of times the value v appears in arr."""
  count = 0

  for elem in arr:
    if elem == v:
      count += 1

  return count
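Python lists also provide a built-in `count` method, which makes a convenient cross-check for the solution. In this sketch, the loaned-out book numbers other than 1842659 are made up for illustration.

```python
def count_occurrences(arr, v):
  """Returns the number of times the value v appears in arr."""
  count = 0
  for elem in arr:  # always visits every element; there is no early exit
    if elem == v:
      count += 1
  return count

loaned_out = [1842659, 1000001, 1842659, 2000002]  # made-up book numbers
print(count_occurrences(loaned_out, 1842659), loaned_out.count(1842659))  # 2 2
```

Unlike `linear_search`, `count_occurrences` cannot stop at the first match, so it always performs $n$ comparisons: its best, average, and worst cases are all $O(n)$.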

## Question 5

What optimizations can be made to the `linear_search` function if the input array is known to be pre-sorted (from lowest to highest)? Write a new function `linear_search_sorted` that accepts a pre-sorted array as an input.

In [None]:
def linear_search_sorted(arr, v):
  """Searches a sorted list of integers arr for a value v."""
  # TODO(you): Below is the linear_search code. Optimize it for a sorted input.
  for i in range(len(arr)):
    if arr[i] == v:
      return i

  return -1

### Hint

If the input array is sorted, then as soon as the `for` loop hits a value greater than the search value `v` (if not already found), the algorithm should exit.

### Solution

In [None]:
def linear_search_sorted(arr, v):
  """Searches a sorted list of integers arr for a value v."""
  for i in range(len(arr)):
    if arr[i] == v:
      return i
    # New check: arr is sorted, so once arr[i] > v, every later element is
    # also greater than v and v cannot appear further on.
    if arr[i] > v:
      return -1

  return -1
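To see what the early exit buys, the sketch below is a hypothetical instrumented copy of `linear_search_sorted` (the `(index, comparisons)` return value is an assumption for illustration). It stops as soon as it passes the place where `v` would have been, but a value larger than every element still forces a full scan.

```python
def linear_search_sorted_counting(arr, v):
  """Hypothetical instrumented linear_search_sorted returning (index, comparisons)."""
  comparisons = 0
  for i in range(len(arr)):
    comparisons += 1
    if arr[i] == v:
      return i, comparisons
    if arr[i] > v:
      # Every later element is >= arr[i] > v, so v cannot appear.
      return -1, comparisons
  return -1, comparisons

arr = [1, 3, 5, 6, 8, 9]
print(linear_search_sorted_counting(arr, 4))   # (-1, 3): stops at 5
print(linear_search_sorted_counting(arr, 10))  # (-1, 6): still scans everything
```

So the optimization helps when `v` is absent and small, but the worst-case time complexity remains $O(n)$.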

## Question 6

Linear search can be adapted to work on a linked list of integers.

Below is an implementation of a linked list from a previous lesson. Add a `search` method that uses linear search to search the linked list for a given value. If the value is in the linked list, return the index. If the value is not in the linked list, return -1.

In [None]:
class LinkedListElement:

  def __init__(self, value):
    self.value = value
    self.next = None

In [None]:
class LinkedList:

  def __init__(self):
    self.first = None

  def search(self, v):
    """Returns the index of v in the linked list, or -1 if v is absent."""
    # TODO(you): Implement

### Hint

Use the following code scaffolding.

```python
def search(self, v):
  elem = self.first
  while elem is not None:
    ...
    elem = elem.next
  return -1
```

In order to return the index, you may need to introduce a `counter` variable.

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
lle1 = LinkedListElement(1)
lle1.next = LinkedListElement(2)

lle2 = LinkedListElement(3)
lle2.next = LinkedListElement(5)
lle2.next.next = LinkedListElement(6)
lle2.next.next.next = LinkedListElement(8)
lle2.next.next.next.next = LinkedListElement(9)

lle1.next.next = lle2  # ll is now 1 -> 2 -> 3 -> 5 -> 6 -> 8 -> 9

ll = LinkedList()
ll.first = lle1
print(ll.search(2))
# Should print: 1

lle1.next = lle2  # skip 2: ll is now 1 -> 3 -> 5 -> 6 -> 8 -> 9
print(ll.search(7))
# Should print: -1

### Solution

In [None]:
class LinkedList:

  def __init__(self):
    self.first = None

  def search(self, v):
    elem = self.first
    counter = 0
    while elem is not None:
      if elem.value == v:
        return counter
      counter += 1
      elem = elem.next
    return -1
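Wiring up `LinkedListElement` objects by hand, as in the unit tests, gets tedious. The sketch below repeats the two classes so the cell runs on its own and adds a hypothetical helper, `build_linked_list`, that is not part of the earlier lesson's code; it then exercises the `search` solution.

```python
class LinkedListElement:

  def __init__(self, value):
    self.value = value
    self.next = None


class LinkedList:

  def __init__(self):
    self.first = None

  def search(self, v):
    elem = self.first
    counter = 0  # index of elem within the list
    while elem is not None:
      if elem.value == v:
        return counter
      counter += 1
      elem = elem.next
    return -1


def build_linked_list(values):
  """Hypothetical helper: builds a LinkedList holding values in order."""
  ll = LinkedList()
  # Insert each value at the front while walking the input backwards,
  # so the final order matches the input order.
  for value in reversed(values):
    elem = LinkedListElement(value)
    elem.next = ll.first
    ll.first = elem
  return ll

ll = build_linked_list([1, 2, 3, 5, 6, 8, 9])
print(ll.search(2))  # 1
print(ll.search(9))  # 6
print(ll.search(7))  # -1
```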

## Question 7

Linear search can also be implemented using recursion. Below is an implementation of a recursive linear search. However, it produces incorrect results. What is the bug in this code? Can you fix it?

In [None]:
def linear_search_recursive(arr, v, index = 0):
  """Searches a list of integers arr for a value v using recursion."""
  if len(arr) == 0:
    return -1

  if arr[0] == v:
    return index

  return linear_search_recursive(arr[1:], v, index)


print(linear_search_recursive(
    [1, 2, 3, 5, 6, 8, 9], 2)) # returns 0, should return 1
print(linear_search_recursive(
    [1, 2, 3, 5, 6, 8, 9], 5)) # returns 0, should return 3

### Hint

What is the purpose of the `index` parameter? Why is it necessary? What is the value of `index` at each recursion?

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
print(linear_search_recursive([1, 2, 3, 5, 6, 8, 9], 2))
# Should print: 1

print(linear_search_recursive([1, 2, 3, 5, 6, 8, 9], 5))
# Should print: 3

print(linear_search_recursive([1, 3, 5, 6, 8, 9], 7))
# Should print: -1

### Solution

This is a subtle bug. Identifying and fixing it relies on understanding the purpose of the `index` parameter. Recursive functions often rely on an index or counter parameter that is updated at each recursive step.

In the case of `linear_search_recursive`, when `v` is found in `arr`, `index` is returned. Since `index` is initialized to 0 and never changed, this implementation always returns 0 if `v` is found and -1 if `v` is not found.

To fix this, `index` must be incremented at each recursive call, so that it always equals the position, in the original list, of the current `arr[0]`.

In [None]:
def linear_search_recursive(arr, v, index = 0):
  """Searches a list of integers arr for a value v using recursion."""
  if len(arr) == 0:
    return -1

  if arr[0] == v:
    return index

  # Change index to index + 1.
  return linear_search_recursive(arr[1:], v, index + 1)
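One caveat about the fixed version: `arr[1:]` builds a new list of length $n - 1$ on every call, so in the worst case the slicing alone adds up to $O(n^2)$ work. Each element also adds a stack frame, and CPython's default recursion limit (reported by `sys.getrecursionlimit()`) is typically 1000, so long lists raise `RecursionError`. A minimal sketch that avoids the copies by recursing on the index instead of on a slice (the name `linear_search_recursive_no_slice` is hypothetical):

```python
def linear_search_recursive_no_slice(arr, v, index=0):
  """Hypothetical variant: recurses on an index instead of slicing arr."""
  if index >= len(arr):
    return -1          # ran past the end: v is not in arr
  if arr[index] == v:
    return index       # index is already the position in the original list
  return linear_search_recursive_no_slice(arr, v, index + 1)

print(linear_search_recursive_no_slice([1, 2, 3, 5, 6, 8, 9], 2))  # 1
print(linear_search_recursive_no_slice([1, 2, 3, 5, 6, 8, 9], 5))  # 3
print(linear_search_recursive_no_slice([1, 3, 5, 6, 8, 9], 7))     # -1
```

This removes the quadratic copying, but the recursion depth is still $n$, so the iterative `linear_search` remains the more practical choice in Python.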

## Question 8

[Advanced] Why is the average case time complexity of linear search linear?

In [None]:
#freetext

### Hint

Consider two cases separately:

- If `v` is in `arr`
- If `v` is not in `arr`

Calculate the average case time complexity under each case, and show that both are $O(n)$. If the average case time complexity is $O(n)$ in each case, then any weighted average of the two cases is also $O(n)$.

### Solution

Let's first assume that `v` is in `arr`. In the average case, we assume `v` is equally likely to be at any index, so the probability of it being at a particular index is $\frac{1}{n}$.

If `v` is at index 0, `linear_search` takes 1 iteration. If `v` is at index 1, `linear_search` takes 2 iterations. In general, if `v` is at index $i$, `linear_search` takes $i+1$ iterations. Since the probability of `v` being at any specific index is $\frac{1}{n}$, the expected number of iterations is

$$ \frac{1}{n} (1 + 2 + \cdots + n). $$

Using the formula that $\sum\limits_{i=1}^n i = \frac{n(n+1)}{2}$, this is

\begin{align*}
\frac{1}{n} \cdot \frac{n(n+1)}{2} &= \frac{n+1}{2} \\
&= \frac{n}{2} + \frac{1}{2} \\
&= O(n).
\end{align*}

Therefore, if `v` is in `arr`, the average case time complexity is $O(n)$.

Now, assume that `v` is not in `arr`. This is the worst case of linear search: as shown in Question 3, the algorithm takes all $n$ iterations, so its time complexity is $O(n)$. Since the average case time complexity is $O(n)$ in both cases (whether or not `v` is in `arr`), the average case time complexity of linear search is $O(n)$.
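The formula $\frac{n+1}{2}$ for the case where `v` is present can be checked exactly by running a linear search for every possible position of `v` and averaging. This sketch assumes the values in `arr` are distinct, so each value has exactly one position; it uses exact `Fraction` arithmetic to avoid floating-point rounding, and the helper names are hypothetical.

```python
from fractions import Fraction

def comparisons_to_find(arr, v):
  """Number of elements linear search inspects: i + 1 if v is at index i, else n."""
  for i in range(len(arr)):
    if arr[i] == v:
      return i + 1
  return len(arr)

def average_comparisons_when_present(n):
  """Exact average over the n equally likely positions of v (distinct values)."""
  arr = list(range(n))
  total = sum(comparisons_to_find(arr, v) for v in arr)
  return Fraction(total, n)

for n in [1, 4, 10, 101]:
  print(n, average_comparisons_when_present(n), Fraction(n + 1, 2))
# 1 1 1
# 4 5/2 5/2
# 10 11/2 11/2
# 101 51 51
```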