# Algorithms and Data Structures 

Intro (write later)
- include space and time tradeoffs
- big O notation

## Big O Complexity

Let's start by looking at a simple function and determining its big O notation. Let's say we have a function that decodes messages that looks like this:

In [2]:
# note: this is pseudocode
def decode(message):
    output = ''
    for letter in message:
        get new_letter from letter location in cipher
        add new_letter to output
    return output

Creating and returning the output each happen only once, no matter how long the input message is, so we would write O(2). We also have a for loop that does two things, which adds 2n computations, where n is the length of the input message. Now we have O(2n+2). This tells us that if the input message is 10 letters long, our program will require 22 computations to decode it. If the input message is 1 million letters long, it will take 2 million and 2 computations.

Of course, it is really not that simple! Consider that we haven't given any credit to the computations occurring when the for loop itself runs. We have also assumed that each line in the for loop is not calling other computations in the background. For instance, the cipher holds the mappings from input to output for each letter in the alphabet. Each time we run the lookup line, the program may have to go through the whole cipher, looking for the mapping for the current letter. If we take these other processes into account we have O((2+1+26)n+2) = O(29n+2), where 1 is the computation for the for loop and 26 is the number of mappings to check in the cipher data structure. Now if the message length is 10, we need 292 computations, and if the message length is 1 million, we need about 29 million computations.

In practice, instead of making these detailed calculations for all our code, we present big O notation as a "worst case scenario" and keep only the fastest-growing term, dropping the constants. For this decode function, we would simplify from O(29n+2) to O(n) to say that computation time scales linearly with input length. In this case, O(n) really means "Some number of computations must be performed for EACH letter in the message." Here are some more examples:

In [None]:
"""input people: a list of "people", where each person is represented by a dictionary
a single person has properties like "name", "age", et cetera
n = the number of elements in "people"
m = the number of properties per "person" (i.e. the number of keys in a person dictionary)"""

def example1(people):
    for person in people:
        print(person['name'])

def example2(people):
    # Only the first person is touched, regardless of how long the list is
    print(people[0]['name'])
    print(people[0]['age'])

def example3(people):
    for person in people:
        # Iterate over the m keys of this person's dictionary
        for person_property in person:
            print(person_property, ": ", person[person_property])

def example4(people):
    oldest_person = "No manatees here!"
    for person1 in people:
        # Compare person1 against every person, which gives n * n comparisons
        is_oldest = True
        for person2 in people:
            if person1['age'] < person2['age']:
                is_oldest = False
        if is_oldest:
            oldest_person = person1['name']
    print(oldest_person)

Example 1 has complexity of O(n), example 2 has complexity of O(1), example 3 has complexity of O(nm), and example 4 has complexity of O(n^2). Can you see how these complexities are derived? For more information on the Big O characteristics of data structures and algorithms, check out this [cheatsheet](http://bigocheatsheet.com/).
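
To make the operation counting for the decode function concrete, here is a minimal runnable sketch. It assumes the cipher is stored as a list of (letter, new_letter) pairs that is scanned from the start on every lookup. The names `CIPHER` and `decode_counting` and the shift-by-3 mapping are illustrative choices, not part of the original pseudocode.

```python
import string

# Assumption: the cipher is a list of 26 (letter, new_letter) pairs (a shift of 3)
LETTERS = string.ascii_lowercase
CIPHER = [(letter, LETTERS[(i + 3) % 26]) for i, letter in enumerate(LETTERS)]

def decode_counting(message):
    """Decode message and return (output, number of cipher entries examined)."""
    output = []
    checks = 0
    for letter in message:
        # Scan the cipher from the start until the mapping for this letter is found
        for plain, new_letter in CIPHER:
            checks += 1
            if plain == letter:
                output.append(new_letter)
                break
    return ''.join(output), checks

# 'z' is the last entry, so every lookup scans all 26 mappings (the worst case)
for n in (10, 20, 40):
    _, checks = decode_counting('z' * n)
    print(n, checks)  # prints 10 260, then 20 520, then 40 1040
```

Doubling the message length doubles the number of checks, which is the linear scaling that O(n) describes. Storing the cipher in a Python dictionary would make each lookup O(1) on average, but the overall decode would still be O(n).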

## Collections

Collections are not a specific data structure. Instead, they represent the concept of grouping things together. There are several data structures based on this concept that we will discuss. The important properties of collections are:

- have no inherent order
- items do not have to be the same type

### Lists
- order is the order in which items are added
- no fixed length
- objects in list do not have to be the same type
- items can be added or removed at any position

### Arrays
An array is similar to a list but with the following differences:
- the objects in the array are usually all the same type
- each object has an index that specifies its location in the array. This can make insertions and deletions take longer because every element after the changed position must be moved to a new index.

For those familiar with Python, you might notice that both lists and arrays describe Python lists. The reason is that in the background, Python actually implements its list data type as an array! Inserting an object at an arbitrary position in a Python list takes O(n) time (appending to the end is O(1) on average), accessing an item by its index takes O(1) time, and searching for a value takes O(n) time. More details on the time complexity of Python lists are available [here](https://wiki.python.org/moin/TimeComplexity).
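
To see why inserting into an array-backed list costs O(n), the following sketch performs the insertion by hand and counts how many elements have to move. The helper name `insert_counting_shifts` is hypothetical and only for illustration; in real code you would simply call `list.insert`.

```python
def insert_counting_shifts(items, index, value):
    """Insert value at index by shifting later elements right; return the shift count."""
    items.append(None)  # grow the list by one slot at the end
    shifts = 0
    # Move every element at or after index one slot to the right, starting from the end
    for i in range(len(items) - 1, index, -1):
        items[i] = items[i - 1]
        shifts += 1
    items[index] = value
    return shifts

letters = ['a', 'b', 'c', 'd']
print(insert_counting_shifts(letters, 0, 'z'))             # 4: every element moved
print(letters)                                             # ['z', 'a', 'b', 'c', 'd']
print(insert_counting_shifts(letters, len(letters), 'e'))  # 0: nothing to move at the end
```

Inserting at the front moves all n elements, while inserting at the end moves none, which matches the O(n) worst case for insertion.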

### Linked Lists
- adjacent elements of a linked list have knowledge of each other: each element knows which element comes next.
- Makes it easier to add and remove elements: O(1) once you are at the right position, instead of O(n) for arrays.
- linked lists store the value at a location and a pointer to the next item in the list. This is the key to fast insertions and deletions. All that needs to be done is change where a pointer is pointing. There is no need to shift all the values to a different index.
- when inserting, make sure to assign the pointer of the added item before reassigning the pointer of the old item so as to not lose any values.
- a doubly-linked list has pointers going both forwards and backwards, allowing for forward and backward list traversal.

In [5]:
class Element(object):
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList(object):
    def __init__(self, head=None):
        self.head = head

    def append(self, new_element):
        current = self.head
        if self.head:
            while current.next:
                current = current.next
            current.next = new_element
        else:
            self.head = new_element

    def get_position(self, position):
        counter = 1
        current = self.head
        if position < 1:
            return None
        while current and counter <= position:
            if counter == position:
                return current
            current = current.next
            counter += 1
        return None

    def insert(self, new_element, position):
        counter = 1
        current = self.head
        if position > 1:
            while current and counter < position:
                if counter == position - 1:
                    new_element.next = current.next
                    current.next = new_element
                current = current.next
                counter += 1
        elif position == 1:
            new_element.next = self.head
            self.head = new_element

    def delete(self, value):
        current = self.head
        previous = None
        # Walk the list until the value is found or the list ends
        while current and current.value != value:
            previous = current
            current = current.next
        # current is None when the list is empty or the value is absent
        if current:
            if previous:
                previous.next = current.next
            else:
                self.head = current.next

# Test cases
# Set up some Elements
e1 = Element(1)
e2 = Element(2)
e3 = Element(3)
e4 = Element(4)

# Start setting up a LinkedList
ll = LinkedList(e1)
ll.append(e2)
ll.append(e3)

# Test get_position
# Should print 3
print(ll.head.next.next.value)
# Should also print 3
print(ll.get_position(3).value)

# Test insert
ll.insert(e4,3)
# Should print 4 now
print(ll.get_position(3).value)

# Test delete
ll.delete(1)
# Should print 2 now
print(ll.get_position(1).value)
# Should print 4 now
print(ll.get_position(2).value)
# Should print 3 now
print(ll.get_position(3).value)

3
3
4
2
4
3
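
The doubly-linked list mentioned in the notes above is not implemented in this cell. A minimal sketch might look like the following. The class names `DoublyLinkedElement` and `DoublyLinkedList` are hypothetical, and the `tail` pointer is an assumption that the `LinkedList` class above does not make.

```python
class DoublyLinkedElement(object):
    def __init__(self, value):
        self.value = value
        self.next = None  # pointer to the following element
        self.prev = None  # pointer to the preceding element

class DoublyLinkedList(object):
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, new_element):
        """Add new_element at the tail in O(1) by using the tail pointer."""
        if self.tail is None:
            self.head = new_element
            self.tail = new_element
        else:
            new_element.prev = self.tail
            self.tail.next = new_element
            self.tail = new_element

    def forward(self):
        """Collect values from head to tail by following next pointers."""
        values, current = [], self.head
        while current:
            values.append(current.value)
            current = current.next
        return values

    def backward(self):
        """Collect values from tail to head by following prev pointers."""
        values, current = [], self.tail
        while current:
            values.append(current.value)
            current = current.prev
        return values

dll = DoublyLinkedList()
for value in (1, 2, 3):
    dll.append(DoublyLinkedElement(value))
print(dll.forward())   # [1, 2, 3]
print(dll.backward())  # [3, 2, 1]
```

Keeping a `tail` pointer lets `append` skip the walk to the end of the list that the singly linked `append` above needs.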


### Stacks
Stacks can be built on any of the data structures already discussed; however, stacks are built around the principle of last in, first out (LIFO). This means stacks make it really easy to add or remove an element (O(1)) because the element is just added to or removed from the top of the stack. Getting to items on the bottom of the stack is harder because you have to go through the whole stack from top to bottom.

In [6]:
"""Add a couple methods to our LinkedList class,
and use that to implement a Stack.
You have 4 functions below to fill in:
insert_first, delete_first, push, and pop.
Think about this while you're implementing:
why is it easier to add an "insert_first"
function than just use "append"?"""

class Element(object):
    def __init__(self, value):
        self.value = value
        self.next = None
        
class LinkedList(object):
    def __init__(self, head=None):
        self.head = head
        
    def append(self, new_element):
        current = self.head
        if self.head:
            while current.next:
                current = current.next
            current.next = new_element
        else:
            self.head = new_element

    def insert_first(self, new_element):
        "Insert new element as the head of the LinkedList"
        new_element.next = self.head
        self.head = new_element

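    def delete_first(self):
        "Delete the first (head) element in the LinkedList and return it"
        deleted = self.head
        # Guard against an empty list, where there is no head to remove
        if self.head:
            self.head = self.head.next
        return deleted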

class Stack(object):
    def __init__(self,top=None):
        self.ll = LinkedList(top)

    def push(self, new_element):
        "Push (add) a new element onto the top of the stack"
        self.ll.insert_first(new_element)

    def pop(self):
        "Pop (remove) the first element off the top of the stack and return it"
        if self.ll.head:
            first = self.ll.head
            self.ll.delete_first()
            return first
    
# Test cases
# Set up some Elements
e1 = Element(1)
e2 = Element(2)
e3 = Element(3)
e4 = Element(4)

# Start setting up a Stack
stack = Stack(e1)

# Test stack functionality
stack.push(e2)
stack.push(e3)
print(stack.pop().value)
print(stack.pop().value)
print(stack.pop().value)
print(stack.pop())
stack.push(e4)
print(stack.pop().value)

3
2
1
None
4
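
As a point of comparison with the linked-list version, a built-in Python list can also act as a stack, with the end of the list playing the role of the top. This is a small sketch of that alternative, not the approach the exercise asks for; the variable name `list_stack` is just for illustration.

```python
list_stack = []
list_stack.append(1)    # push
list_stack.append(2)
list_stack.append(3)
top = list_stack.pop()  # pop removes the most recently pushed item
print(top)              # 3
print(list_stack[-1])   # peek at the new top: 2
```

Both `append` and `pop` work at the end of the list, so neither has to shift other elements.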


### Queues
- FIFO, first in, first out
- first element is the head (like in a linked list)
- last element is the tail
- a deque is a double-ended queue that can be enqueued or dequeued from either end
- a priority queue assigns a priority value to each element in the queue to know which element to remove next.

In [7]:
"""Make a Queue class using a list!
Hint: You can use any Python list method
you'd like! Try to write each one in as 
few lines as possible.
Make sure you pass the test cases too!"""

class Queue:
    def __init__(self, head=None):
        # Start empty when no head is given instead of storing None
        self.storage = [] if head is None else [head]

    def enqueue(self, new_element):
        self.storage.append(new_element)

    def peek(self):
        return self.storage[0]

    def dequeue(self):
        return self.storage.pop(0)
    
# Setup
q = Queue(1)
q.enqueue(2)
q.enqueue(3)

# Test peek
# Should be 1
print(q.peek())

# Test dequeue
# Should be 1
print(q.dequeue())

# Test enqueue
q.enqueue(4)
# Should be 2
print(q.dequeue())
# Should be 3
print(q.dequeue())
# Should be 4
print(q.dequeue())
q.enqueue(5)
# Should be 5
print(q.peek())

1
1
2
3
4
5
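
The deque and priority queue from the notes above are not covered by the list-based `Queue`. A minimal sketch using the standard library's `collections.deque` and `heapq` modules could look like this. The task names and priority numbers are made up for illustration, and a lower number is assumed to mean a higher priority.

```python
from collections import deque
import heapq

# Deque: elements can be added or removed at either end
d = deque([1, 2, 3])
d.append(4)          # enqueue at the tail
d.appendleft(0)      # enqueue at the head
first = d.popleft()  # dequeue from the head
last = d.pop()       # dequeue from the tail
print(first, last, list(d))  # 0 4 [1, 2, 3]

# Priority queue: store (priority, item) pairs; the smallest priority comes out first
tasks = []
heapq.heappush(tasks, (2, 'write tests'))
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (3, 'refactor'))
next_task = heapq.heappop(tasks)
print(next_task)  # (1, 'fix bug')
```

`deque.popleft()` avoids the O(n) shift that `list.pop(0)` performs in the `Queue` class above.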


## Summary of List-like Data Structures
1. List elements are kept in the order in which they were added
2. Array elements have a value and an index
3. Linked-list elements have a value and a pointer to the next value
4. Stack elements have a value and a pointer to the next value and are optimized for LIFO
5. Queue elements have a value and a pointer to the next value and are optimized for FIFO. A deque can be accessed at either the head or the tail, and a priority queue also gives each element a priority.

# Algorithms

## Binary Search
- start with sorted array of values
- is value bigger or smaller than middle value?
- look in half of array where value would be
- repeat in the new half of the array
- O(log(n)) (base 2 is assumed in computer science)

In [33]:
"""You're going to write a binary search function.
You should use an iterative approach - meaning
using loops.
Your function should take two inputs:
a Python list to search through, and the value
you're searching for.
Assume the list only has distinct elements,
meaning there are no repeated values, and 
elements are in a strictly increasing order.
Return the index of value, or -1 if the value
doesn't exist in the list."""

def binary_search(input_array, value):
    """Return the index of value in the sorted input_array, or -1 if absent."""
    low = 0
    high = len(input_array) - 1
    while low <= high:
        # Compare against the middle of the remaining range
        middle = (low+high)//2
        if input_array[middle] == value:
            return middle
        elif input_array[middle] > value:
            # value can only be in the left half
            high = middle - 1
        else:
            # value can only be in the right half
            low = middle + 1
    return -1

test_list = [1,3,9,11,15,19,29]
test_list2 = [1,5,7,9,14,18,25]
test_val1 = 25
test_val2 = 15
test_val3 = 9
test_val4 = 17
test_val5 = 1
test_val6 = 7
print(binary_search(test_list, test_val1)==-1)
print(binary_search(test_list, test_val2)==4)
print(binary_search(test_list, test_val3)==2) 
print(binary_search(test_list, test_val4)==-1)
print(binary_search(test_list, test_val5)==0) 
print(binary_search(test_list, test_val6)==-1)
print(binary_search(test_list2, test_val1)==6)
print(binary_search(test_list2, test_val2)==-1)
print(binary_search(test_list2, test_val3)==3)
print(binary_search(test_list2, test_val4)==-1)
print(binary_search(test_list2, test_val5)==0)
print(binary_search(test_list2, test_val6)==2)

True
True
True
True
True
True
True
True
True
True
True
True
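
To illustrate the O(log(n)) behavior, the sketch below repeats the same loop but also counts how many times the range is halved. The function name `binary_search_steps` is hypothetical; the bound it is compared against, floor(log2(n)) + 1, is the largest number of halvings a list of length n can need.

```python
import math

def binary_search_steps(input_array, value):
    """Return (index, steps); index is -1 if value is absent."""
    low = 0
    high = len(input_array) - 1
    steps = 0
    while low <= high:
        steps += 1  # one comparison against the middle element
        middle = (low + high) // 2
        if input_array[middle] == value:
            return middle, steps
        elif input_array[middle] > value:
            high = middle - 1
        else:
            low = middle + 1
    return -1, steps

# Search for a value that is absent, so the loop runs until the range is empty
for n in (1000, 1000000):
    index, steps = binary_search_steps(list(range(n)), -1)
    bound = math.floor(math.log2(n)) + 1
    print(n, steps, bound)  # steps never exceeds bound
```

Growing the list from a thousand to a million elements only roughly doubles the number of steps, because each step discards half of what remains.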


## Sorting

- in-place sorting has low space complexity because the values are rearranged inside the original array

### Bubble Sort
- naive approach
- move through the array and compare adjacent values. When value on the left is bigger than value on the right, switch their positions. The largest value "bubbles up" to the end (right side) of the array.
- Keep going until you don't make any more switches
- Worst case/average case: O(n^2)
- best case: O(n) (if the array is already sorted)
- space complexity: O(1), no new data structures are created
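
The steps above can be sketched as follows. This is one possible implementation, assuming the input is a Python list that may be modified in place; the function name `bubble_sort` is chosen here for illustration.

```python
def bubble_sort(values):
    """Sort values in place by repeatedly swapping adjacent out-of-order pairs."""
    n = len(values)
    while True:
        swapped = False
        for i in range(n - 1):
            if values[i] > values[i + 1]:
                # The left value is bigger, so switch the pair
                values[i], values[i + 1] = values[i + 1], values[i]
                swapped = True
        if not swapped:
            break  # a full pass with no switches means the list is sorted
        n -= 1     # the largest value of this pass is now in its final place
    return values

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
print(bubble_sort([1, 2, 3]))        # [1, 2, 3] after a single pass
```

An already sorted input finishes after one pass with no switches, which is the O(n) best case, and only the loop variables are extra storage, which is the O(1) space complexity.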

### Merge Sort
