# Chapter 1

This chapter introduces Big O notation, the binary search algorithm, and designing algorithms with recursion.

In [1]:
# Setup
import math
import numpy as np

## Binary search

The basic concept of binary search is that, given a sorted list, you eliminate half of the search space at each step, or iteration, of the algorithm. We can calculate the maximum number of steps in a binary search using the code below.

In [2]:
# Calculate max number of steps 
x = 100 # total number of items
i = 1 # counter
while x > 0:
    print(f'Step: {i}, number: {x}')
    i += 1
    x = x // 2 # use integer division

Step: 1, number: 100
Step: 2, number: 50
Step: 3, number: 25
Step: 4, number: 12
Step: 5, number: 6
Step: 6, number: 3
Step: 7, number: 1


We recreated the total number of steps in the first example from the book. If we had 100 total numbers, and our friend wanted us to guess which number they were thinking of, we'd find the number in at most 7 steps. For the next example, we're asked to look for a word in a dictionary that contains 240,000 words. Let's calculate the total number of steps that it could take to find the word if we used binary search.

In [6]:
# Dictionary has 240,000 words
words = 240_000
i = 1 
while words > 0:
    print(f'Step: {i}, number: {words}')
    i += 1
    words = words // 2

Step: 1, number: 240000
Step: 2, number: 120000
Step: 3, number: 60000
Step: 4, number: 30000
Step: 5, number: 15000
Step: 6, number: 7500
Step: 7, number: 3750
Step: 8, number: 1875
Step: 9, number: 937
Step: 10, number: 468
Step: 11, number: 234
Step: 12, number: 117
Step: 13, number: 58
Step: 14, number: 29
Step: 15, number: 14
Step: 16, number: 7
Step: 17, number: 3
Step: 18, number: 1


At most, this would take 18 steps using binary search, compared with up to 240,000 steps if we used simple search and tried each word in turn. Because we halve the search space with each step, binary search takes roughly log2(n) steps in the worst case, rounded up to a whole number. Let's verify using the two examples above.

In [11]:
# 100 numbers: round log2(n) up to a whole number of steps
math.ceil(math.log2(100))

7

In [12]:
# 240,000 words: round log2(n) up to a whole number of steps
math.ceil(math.log2(240_000))

18
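To compare the halving loop with the logarithm for any list size, both can be wrapped in small helpers. This is a minimal sketch; the names `halving_steps` and `log_steps` are made up here and do not come from the book.

```python
import math

def halving_steps(n):
    """Count the steps the halving loop above prints for n items."""
    steps = 0
    while n > 0:
        steps += 1
        n //= 2  # integer division, as in the loops above
    return steps

def log_steps(n):
    """Round log2(n) up to a whole number of steps."""
    return math.ceil(math.log2(n))

for n in (100, 240_000):
    print(f'{n} items: loop = {halving_steps(n)}, log2 = {log_steps(n)}')
```

The two counts agree for both examples. For exact powers of two, such as 128, the loop reports one more step than log2(n), because it also counts the final step where a single item remains.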

In the next code chunk, we'll build a binary search function that takes an array and an item to search for. I've made a few modifications to the function. First, I've added a counter, `step`, to keep track of the number of steps taken to find the item. Second, I've added several `print` statements throughout the function to trace the course of the binary search.

In [3]:
def binary_search(array, item):
    """
    Implement binary search to look for an item in an array.
    If the item is in the array, return the item's position.
    If the item is not in the array, return None.
    """
    # Initialize starting low and high indexes and counter
    low = 0 
    high = len(array) - 1
    step = 1
    
    while low <= high:
        
        # Sanity checks
        print(f'Step: {step}')
        print(f'Item: {item}')
        print(f'Starting low: {low}')
        print(f'Starting high: {high}')
        
        middle = (low + high) // 2 # middle index
        print(f'Middle: {middle}')
        guess = array[middle] # grab the item at the middle index
        print(f'Guess: {guess}')
        
        # Either return the item's position or adjust search space
        if guess == item:
            return middle
        
        if guess > item:
            high = middle - 1
            print(f'Item < guess, new high: {high}')
        else: 
            low = middle + 1
            print(f'Item > guess, new low: {low}')
        print('\n')
        
        # Increment step counter
        step += 1
    
    # If item is not in the list, return None
    return None

In [37]:
my_list = list(range(101))
binary_search(my_list, 63)

Step: 1
Item: 63
Starting low: 0
Starting high: 100
Middle: 50
Guess: 50
Item > guess, new low: 51


Step: 2
Item: 63
Starting low: 51
Starting high: 100
Middle: 75
Guess: 75
Item < guess, new high: 74


Step: 3
Item: 63
Starting low: 51
Starting high: 74
Middle: 62
Guess: 62
Item > guess, new low: 63


Step: 4
Item: 63
Starting low: 63
Starting high: 74
Middle: 68
Guess: 68
Item < guess, new high: 67


Step: 5
Item: 63
Starting low: 63
Starting high: 67
Middle: 65
Guess: 65
Item < guess, new high: 64


Step: 6
Item: 63
Starting low: 63
Starting high: 64
Middle: 63
Guess: 63


63
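The traced function is useful for following a single search, but the output gets long. A quieter variant might return the step count alongside the position instead of printing. The sketch below assumes the same sorted input; `binary_search_steps` is a hypothetical name, not something from the book.

```python
def binary_search_steps(array, item):
    """Return (position, steps); position is None if item is absent."""
    low, high = 0, len(array) - 1
    steps = 0
    while low <= high:
        steps += 1
        middle = (low + high) // 2
        guess = array[middle]
        if guess == item:
            return middle, steps
        if guess > item:
            high = middle - 1  # item is in the lower half
        else:
            low = middle + 1   # item is in the upper half
    return None, steps

numbers = list(range(101))
print(binary_search_steps(numbers, 63))   # same search as the trace above
print(binary_search_steps(numbers, 101))  # an item that is not in the list
print(max(binary_search_steps(numbers, n)[1] for n in numbers))  # worst case
```

Searching for 63 takes 6 steps, matching the trace. The worst case over every item in this 101-item list is 7 steps, and a search for a missing item also stops after 7.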

## Exercises

1.1) What's the maximum number of steps it would take to search through a sorted list of 128 names?

We know that binary search takes at most about log2(n) steps to find an item. In this case, it will take at most log2(128) steps. Let's compute this value below.

In [39]:
# Compute maximum number of steps, rounding log2(n) up
math.ceil(math.log2(128))

7

1.2) How many steps would it take if we doubled the size of the list?


In [42]:
# Compute maximum number of steps with 128 * 2 = 256 items
math.ceil(math.log2(256))

8
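Doubling the list adds only one step, because log2(2n) = log2(n) + 1. A quick check of that pattern, with list sizes chosen here purely for illustration:

```python
import math

# Keep doubling a 128-item list and recompute the step count each time
sizes = [128 * 2 ** k for k in range(4)]  # 128, 256, 512, 1024
steps = [math.ceil(math.log2(n)) for n in sizes]

for n, s in zip(sizes, steps):
    print(f'{n} items: {s} steps')
```

Each doubling raises the step count by exactly one, which is why binary search scales so well to large lists.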

Other notes on running time: the original simple search runs in linear time, or O(n), meaning the running time scales linearly with the size of the problem. In contrast, binary search runs in logarithmic (base 2) time, or O(log2 n). Let's verify the book's example that it takes at most 32 guesses to find a number in a list of 4 billion numbers.

In [43]:
math.ceil(math.log2(4_000_000_000))

32

As another exercise, let's say we wanted to search through a sorted directory of the names of all humans alive on Earth today. By the latest statistics (found [here](https://www.google.com/search?q=earth+population&oq=earth+population&aqs=chrome.0.0i433i512l2j0i512l8.1834j0j9&sourceid=chrome&ie=UTF-8)), there are approximately 7.7 billion people. Let's say we wanted to look up some person's information, but we need to look up their name first. Let's find how many steps, at most, it would take to find the person's name and get their information.

In [44]:
math.ceil(math.log2(7_700_000_000))

33

Using binary search, it would take **at most** only 33 tries to find the person's name and get their information.

## Big O notation

Big O notation describes how algorithm running time increases with the size of the problem.  It can also be thought of as the running time of an algorithm in the worst case scenario.

We can estimate the time it would take simple search and binary search to run on a list of 1 billion elements, assuming that each step takes 1 ms.

In [4]:
# Simple search
num_elements = 1_000_000_000
print(f'Time with simple search:\n{num_elements * 1} ms\n')

# Binary search
binary_search_time = math.ceil(math.log2(num_elements))  # round up to whole steps
print(f'Time with binary search:\n{binary_search_time} ms')

Time with simple search:
1000000000 ms

Time with binary search:
30 ms
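Raw millisecond counts are hard to judge at a glance, so it can help to convert them into readable durations. The sketch below uses `datetime.timedelta` and keeps the 1 ms per step assumption from above. The helper name `search_times` and the two smaller list sizes are illustrative choices.

```python
import math
from datetime import timedelta

def search_times(n, ms_per_step=1):
    """Return (simple, binary) worst-case durations for a list of n items."""
    simple = timedelta(milliseconds=n * ms_per_step)
    binary = timedelta(milliseconds=math.ceil(math.log2(n)) * ms_per_step)
    return simple, binary

for n in (100, 10_000, 1_000_000_000):
    simple, binary = search_times(n)
    print(f'{n:>13,} items: simple search {simple}, binary search {binary}')
```

At 1 ms per step, simple search over a billion items would run for roughly eleven and a half days, while binary search would finish in about 30 ms.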


The code below computes the time it would take to draw 1,024 boxes with an algorithm that, like simple search, runs in O(n) time, and with one that, like binary search, runs in O(log n) time. Each operation takes approximately 0.1 seconds to complete.

In [13]:
# Simple search
num_operations = 1024
print(f'Time for {num_operations} operations with simple search:\n{num_operations * 0.1} seconds.\n')

# Binary search
binary_search_time = math.ceil(math.log2(num_operations)) * 0.1
print(f'Time for {num_operations} operations with binary search:\n{binary_search_time} s')

Time for 1024 operations with simple search:
102.4 seconds.

Time for 1024 operations with binary search:
1.0 s


## Exercises



For each of the exercises below, give the run time for each in terms of Big O.

1.3) You have a name, and you want to find the person's phone number in the phone book.

Assuming we're searching through a sorted phone book or a similar sorted data structure, the best we could hope to achieve is O(log n) with binary search. However, if the data are stored in a Python dictionary keyed by name, a lookup takes O(1) time on average, according to the official Python documentation. In other words, the run time is effectively constant, regardless of the number of items in the dictionary. There are some details regarding the hash function to consider, but they are outside the scope of this chapter.
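The two approaches can be sketched side by side. The phone book entries below are invented for illustration, and `lookup_sorted` is a hypothetical helper that uses the standard library's `bisect` module for the binary search.

```python
import bisect

# Hypothetical phone book entries, already sorted by name
entries = [
    ('Ada', '555-0101'),
    ('Grace', '555-0102'),
    ('Linus', '555-0103'),
    ('Margaret', '555-0104'),
]
names = [name for name, _ in entries]

def lookup_sorted(name):
    """O(log n): binary search over the sorted list of names."""
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return entries[i][1]
    return None

# O(1) on average: the same data stored as a dictionary keyed by name
phone_book = dict(entries)

print(lookup_sorted('Grace'))
print(phone_book.get('Grace'))
```

Both lookups return the same number. The difference is in how the work grows as the phone book gets larger.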

1.4) You have a phone number, and you want to find the person's name in the phone book.

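In this situation, if we were free to build our own data structure, we could use the phone number as the key in a dictionary and the name as the value, and the time complexity would still be O(1). However, the answer in the textbook is given as O(n): a phone book is sorted by name, not by number, so finding a number means scanning the entries one by one.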
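Both readings of this exercise can be sketched in a few lines. The data and the helper name `find_name_by_number` are made up for illustration, and the inverted dictionary assumes every phone number is unique.

```python
# Hypothetical phone book keyed by name
phone_book = {
    'Ada': '555-0101',
    'Grace': '555-0102',
    'Linus': '555-0103',
}

def find_name_by_number(book, number):
    """O(n): scan every entry until the number matches."""
    for name, num in book.items():
        if num == number:
            return name
    return None

# Building the inverted dictionary is a one-time O(n) cost;
# each lookup afterwards is O(1) on average.
numbers_to_names = {num: name for name, num in phone_book.items()}

print(find_name_by_number(phone_book, '555-0103'))
print(numbers_to_names.get('555-0103'))
```

The scan corresponds to the textbook's O(n) answer. The inverted dictionary only pays off if we are allowed to reorganize the data before searching.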

1.5) You want to read the numbers of every person in the phone book.

Here the time complexity is O(n), because we're not searching for a single entry; we're performing an operation on every person in the book.

## Traveling salesperson

The traveling salesperson problem demonstrates an algorithm with a time complexity of O(n!). In the textbook, the author uses an example where the salesperson must travel to 5 cities, and we need to evaluate every possible route in order to find the one with the shortest total distance traveled. The number of possible routes is the number of permutations of the cities, 5!:

In [15]:
print(f'With {5} cities, there are {math.factorial(5)} permutations to consider.')

With 5 cities, there are 120 permutations to consider.
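To make the brute-force approach concrete, here is a minimal sketch that enumerates every ordering of 5 cities with `itertools.permutations` and keeps the shortest. The city names and coordinates are invented for illustration, a route is treated as a one-way path with no return to the start, and `math.dist` requires Python 3.8 or later.

```python
import itertools
import math

# Hypothetical cities with made-up (x, y) coordinates
cities = {'A': (0, 0), 'B': (3, 0), 'C': (3, 4), 'D': (0, 4), 'E': (1, 2)}

def route_length(route):
    """Total straight-line distance along consecutive cities in the route."""
    return sum(math.dist(cities[a], cities[b]) for a, b in zip(route, route[1:]))

# Brute force: build every ordering of the cities, then keep the shortest
routes = list(itertools.permutations(cities))
best = min(routes, key=route_length)

print(f'Routes checked: {len(routes)}')
print(f'Shortest route: {" -> ".join(best)} ({route_length(best):.2f} units)')
```

Every one of the 120 orderings has to be measured before the shortest is known, which is what makes this an O(n!) approach.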


We can see how the size of this problem scales with just a few additional cities added to the route.


In [16]:
print(f'With {10} cities, there are {math.factorial(10)} permutations to consider.')

With 10 cities, there are 3628800 permutations to consider.


That's over 3.6 million permutations to consider, and with even more cities, checking every route quickly becomes impractical. In later chapters, we'll cover greedy algorithms, which aim for a good approximate solution to problems like this.