In this section we will study searching. Searching is the algorithmic process of finding a particular item in a collection of items. A search typically answers either `True` or `False` as to whether the item is present. On occasion it may be modified to return where the item is found. For our purposes here, we will simply concern ourselves with the question of membership.

In Python, there is a very easy way to ask whether an item is in a list of items. We use the `in` operator.

```python
>>> 15 in [3, 5, 2, 4, 1]
False
>>> 3 in [3, 5, 2, 4, 1]
True
```

Given how easy it is to _conduct_ a search in Python, you may wonder why we study search as an algorithms problem at all. The answer is that the underlying process that makes search possible is important to understand, because it arises elsewhere: in data structures designed for fast search, and particularly in databases.

It turns out that there are many different ways to search for an item in a collection. We focus here on two of them: sequential search and binary search.


When data items are stored in a collection such as a list, we say that
they have a linear or sequential relationship. Each data item is stored
in a position relative to the others. In Python lists, these relative
positions are the index values of the individual items. Since these
index values are ordered, it is possible for us to visit them in
sequence. This process gives rise to our first searching technique, the
**sequential search**.

The diagram below shows how this search works. Starting at
the first item in the list, we simply move from item to item, following
the underlying sequential ordering until we either find what we are
looking for or run out of items. If we run out of items, we have
discovered that the item we were searching for was not present.

![Sequential search of a list of integers](figures/sequential-search.png)

The Python implementation of this algorithm is shown below. The function takes the list and
the item we are looking for, and returns a boolean value indicating whether
it is present. Remember that in practice we would use Python's `in` operator for this purpose, so you can think of the function below as what we would write if `in` were not provided for us.

```python
def sequential_search(alist, item):
    position = 0

    while position < len(alist):
        if alist[position] == item:
            return True
        position = position + 1

    return False


testlist = [1, 2, 32, 8, 17, 19, 42, 13, 0]

sequential_search(testlist, 3)  # => False
sequential_search(testlist, 13)  # => True
```
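
The introduction noted that a search can be modified to report *where* the item is found. Below is a minimal sketch of that variant. The name `sequential_search_index` and the choice to return `None` for a missing item are our own assumptions. Python's built-in `list.index` method behaves differently and raises `ValueError` when the item is absent.

```python
def sequential_search_index(alist, item):
    """Return the first position of item in alist, or None if it is absent.

    Hypothetical variant of sequential_search; returning None for a
    missing item is an assumption made for this sketch.
    """
    for position, value in enumerate(alist):
        if value == item:
            return position  # stop at the first match
    return None  # ran out of items without finding a match


testlist = [1, 2, 32, 8, 17, 19, 42, 13, 0]
print(sequential_search_index(testlist, 13))  # 7
print(sequential_search_index(testlist, 3))   # None
```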

Analysis of Sequential Search
-----------------------------

To analyze searching algorithms, we need to decide on a basic unit of
computation. Recall that this is typically the common step that must be
repeated in order to solve the problem. For searching, it makes sense to
count the number of comparisons performed. Each comparison may or may
not discover the item we are looking for. In addition, we make another
assumption here. The list of items is not ordered in any way. The items
have been placed randomly into the list. In other words, the probability
that the item we are looking for is in any particular position is
exactly the same for each position of the list.

If the item is not in the list, the only way to know it is to compare it
against every item present. If there are $$n$$ items, then the sequential
search requires $$n$$ comparisons to discover that the item is not there.
In the case where the item is in the list, the analysis is not so
straightforward. There are actually three different scenarios that can
occur. In the best case we will find the item in the first place we
look, at the beginning of the list. We will need only one comparison. In
the worst case, we will not discover the item until the very last
comparison, the $$n$$th comparison.

What about the average case? On average, we will find the item about
halfway into the list; that is, we will compare against $$\frac{n}{2}$$
items. Recall, however, that as *n* gets large, the coefficients, no
matter what they are, become insignificant in our approximation, so the
complexity of the sequential search is $$O(n)$$. The cases are summarized below:

Case  |  Best Case |  Worst Case | Average Case
--- | --- | --- | ---
item is present | $$1$$ |  $$n$$ |  $$\frac{n}{2}$$
item is not present | $$n$$  | $$n$$  | $$n$$
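
The entries in this table can be checked by counting comparisons directly. The sketch below uses a hypothetical helper, `count_comparisons`. It assumes a list of 100 distinct values in which every position is equally likely to hold the target. Averaged over all positions, a successful search needs exactly $$\frac{n+1}{2}$$ comparisons, which is the "about $$\frac{n}{2}$$" in the table.

```python
def count_comparisons(alist, item):
    """Number of == comparisons a sequential search makes (hypothetical helper)."""
    comparisons = 0
    for value in alist:
        comparisons += 1
        if value == item:
            break  # found it; a real search would return True here
    return comparisons


n = 100
data = list(range(n))  # assumed data: n distinct items

best = count_comparisons(data, data[0])    # target in the first position
worst = count_comparisons(data, data[-1])  # target in the last position
absent = count_comparisons(data, -1)       # target not in the list
average = sum(count_comparisons(data, x) for x in data) / n

print(best, worst, absent, average)  # 1 100 100 50.5
```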


We assumed earlier that the items in our collection had been randomly
placed so that there is no relative order between the items. What would
happen to the sequential search if the items were ordered in some way?
Would we be able to gain any efficiency in our search technique?

Assume that the list of items was constructed so that the items were in
ascending order, from low to high. If the item we are looking for is
present in the list, the chance of it being in any one of the *n*
positions is still the same as before. We will still have the same
number of comparisons to find the item. However, if the item is not
present, there is a slight advantage. The diagram below
shows this process as the algorithm looks for the item 50. Notice that
items are still compared in sequence until 54. At this point, however,
we know something extra. Not only is 54 not the item we are looking for,
but no other elements beyond 54 can work either since the list is
sorted.

![Sequential search of an ordered list of integers](figures/sequential-search-2.png)

In this case, the algorithm does not have to continue looking
through all of the items to report that the item was not found. It can
stop immediately. The code below shows this
variation of the sequential search function.


```python
def ordered_sequential_search(alist, item):
    position = 0

    while position < len(alist):
        if alist[position] == item:
            return True

        if alist[position] > item:
            return False

        position = position + 1

    return False


testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
ordered_sequential_search(testlist, 3)  # => False
ordered_sequential_search(testlist, 13)  # => True
```

The table below summarizes these results. Note that
in the best case we might discover that the item is not in the list by
looking at only one item. On average, we will know after looking through
only $$\frac {n}{2}$$ items. However, this technique is still $$O(n)$$. In
summary, a sequential search is improved by ordering the list only in
the case where we do not find the item.

Case  |  Best Case |  Worst Case | Average Case
--- | --- | --- | ---
item is present | $$1$$ |  $$n$$ |  $$\frac{n}{2}$$
item is not present | $$1$$  | $$n$$  | $$\frac{n}{2}$$
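
The "item is not present" row can be checked the same way. This sketch counts how many items the ordered sequential search examines before it gives up. The helper name `items_examined` and the even-number test data are our own choices. We also assume that a missing target is equally likely to fall into any gap between items.

```python
def items_examined(alist, item):
    """Items examined by an ordered sequential search before it stops (hypothetical helper)."""
    examined = 0
    for value in alist:
        examined += 1
        if value == item:
            return examined  # found the item
        if value > item:
            return examined  # passed where item would be; stop early
    return examined  # item is larger than everything in the list


n = 100
data = list(range(0, 2 * n, 2))      # assumed sorted data: 0, 2, 4, ..., 198
missing = list(range(-1, 2 * n, 2))  # odd targets: one per gap, plus both ends

counts = [items_examined(data, x) for x in missing]
print(min(counts), max(counts))      # 1 100
print(sum(counts) / len(counts))     # roughly n / 2
```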


It is possible to take greater advantage of the ordered list if we are
clever with our comparisons. In the sequential search, when we compare
against the first item, there are at most $$n-1$$ more items to look
through if the first item is not what we are looking for. Instead of
searching the list in sequence, a **binary search** will start by
examining the middle item. If that item is the one we are searching for,
we are done. If it is not the correct item, we can use the ordered
nature of the list to eliminate half of the remaining items. If the item
we are searching for is greater than the middle item, we know that the
entire lower half of the list as well as the middle item can be
eliminated from further consideration. The item, if it is in the list,
must be in the upper half.

We can then repeat the process with the upper half. Start at the middle
item and compare it against what we are looking for. Again, we either
find it or split the list in half, thereby eliminating another large
part of our possible search space. The diagram below shows
how this algorithm can quickly find the value 54.
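
The following sketch makes the halving visible by recording every middle item that is examined. It follows the same slicing approach as the recursive implementation in this section, but it is written as a loop. The name `binary_search_trace` and the test list are our own. The list is assumed to be sorted in ascending order.

```python
def binary_search_trace(alist, item):
    """Binary search on a sorted list that also returns the middle items it checked."""
    examined = []
    while alist:  # stop when no candidates remain
        midpoint = len(alist) // 2
        examined.append(alist[midpoint])
        if alist[midpoint] == item:
            return True, examined
        if item < alist[midpoint]:
            alist = alist[:midpoint]        # keep only the lower half
        else:
            alist = alist[midpoint + 1:]    # keep only the upper half
    return False, examined


testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
print(binary_search_trace(testlist, 32))  # (True, [13, 32])
print(binary_search_trace(testlist, 3))   # (False, [13, 2, 8])
```

Finding 32 takes two comparisons. Ruling out 3 takes three, because each comparison discards about half of what remains.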

![Binary search of an ordered list of integers](figures/binary-search.png)

This algorithm is
a great example of a divide and conquer strategy. Divide and conquer
means that we divide the problem into smaller pieces, solve the smaller
pieces in some way, and then reassemble the whole problem to get the
result. When we perform a binary search of a list, we first check the
middle item. If the item we are searching for is less than the middle
item, we can simply perform a binary search of the left half of the
original list. Likewise, if the item is greater, we can perform a binary
search of the right half. Either way, this is a recursive call to the
binary search function passing a smaller list.

An implementation of recursive binary search in Python may look like this:

```python
def binary_search(alist, item):
    if not alist:  # list is empty -- our base case
        return False

    midpoint = len(alist) // 2
    if alist[midpoint] == item:  # found it!
        return True

    if item < alist[midpoint]:  # item is in the first half, if at all
        return binary_search(alist[:midpoint], item)

    # otherwise item is in the second half, if at all
    return binary_search(alist[midpoint + 1:], item)


testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
binary_search(testlist, 3)  # => False
binary_search(testlist, 13)  # => True
```

Analysis of Binary Search
-------------------------

To analyze the binary search algorithm, we need to recall that each
comparison eliminates around half of the remaining items from
consideration. What is the maximum number of comparisons this algorithm
will require to check the entire list? If we start with *n* items, approximately
$$\frac{n}{2}$$ items will be left after the first comparison. After the
second comparison, there will be approximately $$\frac{n}{4}$$. Then
$$\frac{n}{8}$$, $$\frac{n}{16}$$, and so on. How many times can we split
the list? This table helps us to see the
answer:

Comparisons | Approximate Number of Items Left
--- | ---
1 |  $$\frac{n}{2}$$
2 |  $$\frac{n}{4}$$
3 |  $$\frac{n}{8}$$
... |
i  | $$\frac {n}{2^i}$$

When we split the list enough times, we end up with a list that has just
one item. Either that is the item we are looking for or it is not.
Either way, we are done. The number of comparisons necessary to get to
this point is *i* where $$\frac {n}{2^i} =1$$. Solving for *i* gives us
$$i=\log_2 n$$. The maximum number of comparisons is logarithmic with
respect to the number of items in the list. Therefore, the binary search
is $$O(\log n)$$.
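
The halving argument can also be checked with integer arithmetic. The sketch below repeatedly halves $$n$$ with floor division, as the slicing does, and counts the halvings until one item remains. The helper name `halvings_until_one` is hypothetical. The count it returns equals $$\lfloor \log_2 n \rfloor$$. One more comparison then settles the single remaining item.

```python
import math


def halvings_until_one(n):
    """Count how many times n items can be halved (floor division) before one remains."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


for n in [8, 1000, 1024, 1_000_000]:
    # compare the halving count with log2(n)
    print(n, halvings_until_one(n), round(math.log2(n), 2))
```

Even for a million items the count is 19, so the search needs about 20 comparisons.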

One additional analysis issue needs to be addressed. In the
solution shown above, the recursive call,

`binary_search(alist[:midpoint], item)`

uses the slice operator to create the left half of the list that is then
passed to the next invocation (similarly for the right half as well).
The analysis that we did above assumed that the slice operator takes
constant time. However, slicing a Python list is actually $$O(k)$$, where
$$k$$ is the length of the slice, because those $$k$$ items are copied into
a new list. This means that the binary search using slicing will not
perform in strict logarithmic time. Luckily this can be remedied by
passing the list along with the starting and ending indices. We
leave this implementation as an exercise.
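
One rough way to see the cost of slicing is to count how many elements the slices copy. The sketch below adds up the length of every slice created during a single search. The name `binary_search_copies` is hypothetical, and the function is written as a loop rather than recursively. Take a 1,024-item list and a target smaller than every item. The slices then have lengths 512, 256, ..., 1, 0, so the total is 1,023 elements copied. The copying therefore does work proportional to $$n$$, even though only about $$\log_2 n$$ comparisons are made.

```python
def binary_search_copies(alist, item):
    """Slicing binary search that also reports how many elements the slices copied."""
    copied = 0
    while alist:
        midpoint = len(alist) // 2
        if alist[midpoint] == item:
            return True, copied
        if item < alist[midpoint]:
            alist = alist[:midpoint]
        else:
            alist = alist[midpoint + 1:]
        copied += len(alist)  # each slice builds a new list of this length
    return False, copied


print(binary_search_copies(list(range(1024)), -1))  # (False, 1023)
```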

Even though a binary search is generally better than a sequential
search, it is important to note that for small values of *n*, the
additional cost of sorting is probably not worth it. In fact, we should
always consider whether it is cost effective to take on the extra work
of sorting to gain searching benefits. If we can sort once and then
search many times, the cost of the sort is not so significant. However,
for large lists, sorting even once can be so expensive that simply
performing a sequential search from the start may be the best choice.
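
When the same collection will be searched many times, the sort-once, search-many pattern can be sketched with Python's standard `bisect` module. Its `bisect_left` function performs a binary search on an already sorted list. The wrapper name `make_searcher` is our own. The point of the sketch is that the $$O(n \log n)$$ sort is paid once, and each later membership test costs only $$O(\log n)$$ comparisons.

```python
import bisect


def make_searcher(items):
    """Sort once, then answer membership queries with binary search."""
    sorted_items = sorted(items)  # one-time O(n log n) cost

    def contains(item):
        # bisect_left returns the leftmost position where item could be inserted
        i = bisect.bisect_left(sorted_items, item)
        return i < len(sorted_items) and sorted_items[i] == item

    return contains


contains = make_searcher([1, 2, 32, 8, 17, 19, 42, 13, 0])
print(contains(13), contains(3))  # True False
```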
