# Algorithm Analysis & Big-O

- Data structures and algorithms is an area of study within theoretical computer science.

- An algorithm is a procedure or formula for solving a problem.
- Big-O notation: a notation that compares algorithms based on how fast they run and how much memory (space) they use to run, independent of hardware. It examines how **quickly an algorithm's runtime grows** (not exact runtimes) relative to the input, as the input gets arbitrarily large, also known as the algorithm's complexity.
- An algorithm's complexity can concern either time (above) or space (the amount of memory it uses in computing a result).
- Time complexity can be determined by counting the number of assignments/operations an algorithm performs relative to the input size.
- Big-O uses n to represent the input size, so that the growth of an algorithm's runtime can be analysed across different input sizes.
- As n becomes arbitrarily large, only the terms that grow fastest relative to n matter, so constant factors (and slower-growing terms) are not considered in Big-O analysis.
- Big-O analysis is asymptotic (it describes limiting behaviour), i.e. for a polynomial f, f(n + 1) may be considered asymptotically equal to f(n) as n becomes arbitrarily large, or as the limit is approached.
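
The informal description above corresponds to the standard formal definition of Big-O, given here as a supplement. The constants c and n_0 explain why constant factors and small inputs can be ignored:

$$f(n) = O(g(n)) \iff \exists\, c > 0,\ n_0 \geq 1 \text{ such that } 0 \leq f(n) \leq c \cdot g(n) \text{ for all } n \geq n_0$$

For example, $3n + 10 = O(n)$: choosing $c = 4$ and $n_0 = 10$ gives $3n + 10 \leq 4n$ for every $n \geq 10$.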

_See function algo_analysis below for Big O breakdown demo_

- Common Big-O functions that can be used to evaluate an algorithm are:

  1. Constant time O(1): the computation time does not grow when the input size grows.
  2. Logarithmic time O(log n): the computation time grows in proportion to the logarithm of the input size, so doubling the input adds only a constant amount of extra work (e.g. repeatedly halving the input, as in binary search).
  3. Linear time O(n): the computation time scales linearly with the input size.
  4. Log-linear time O(n log n): combines the logarithmic and linear complexities, e.g. doing O(log n) work for each of the n inputs.
  5. Polynomial time O(n^c): the computation time grows as the input size raised to a constant power c, where c > 1.
     - These include quadratic O(n^2) and cubic O(n^3) time complexities.

  6. Exponential time O(2^n): whereas polynomial time raises n to a constant power c, exponential time raises a constant base to the power n, i.e. the base is multiplied by itself n times.
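
The sketch below makes these growth rates concrete by counting the basic steps taken by toy functions from each class. The functions are illustrative only, not standard library code. A "step" here is simply one loop iteration or one base-case call.

```python
# Count "basic steps" for an input of size n in each growth class.
# Illustrative toy functions only; they are not benchmarks.

def constant_steps(n):
    # One step regardless of n: O(1)
    return 1

def logarithmic_steps(n):
    # Halve n until it reaches 1: about log2(n) steps, O(log n)
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps

def linear_steps(n):
    # One step per element: O(n)
    return sum(1 for _ in range(n))

def log_linear_steps(n):
    # A logarithmic loop repeated for each of the n elements: O(n log n)
    return sum(logarithmic_steps(n) for _ in range(n))

def quadratic_steps(n):
    # A full inner loop for every outer iteration: O(n^2)
    return sum(1 for _ in range(n) for _ in range(n))

def exponential_steps(n):
    # Each call branches into two calls of size n - 1; counts the 2^n base cases: O(2^n)
    if n == 0:
        return 1
    return exponential_steps(n - 1) + exponential_steps(n - 1)

for n in (8, 16):
    print(n, constant_steps(n), logarithmic_steps(n), linear_steps(n),
          log_linear_steps(n), quadratic_steps(n), exponential_steps(n))
```

Doubling n from 8 to 16 has a different effect on each count:

- The constant count is unchanged.
- The logarithmic count gains one step.
- The linear count doubles.
- The log-linear count more than doubles.
- The quadratic count quadruples.
- The exponential count is squared.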
  
_Examples of implementation: https://stackoverflow.com/questions/1592649/examples-of-algorithms-which-has-o1-on-log-n-and-olog-n-complexities_

_Further explanation of Big O: https://stackoverflow.com/questions/487258/what-is-a-plain-english-explanation-of-big-o-notation/487278#487278_

- Best case vs Worst case scenarios: Usually the worst case is considered in Big O analysis but consideration for the
  best case is also important.

- Space complexity: how much memory an algorithm takes in order to execute. Consider the trade-offs between time and space complexity when comparing algorithms.


#### Procedure for algorithm analysis using Big O:

1. Identify the input.
2. Identify the statements that interact with the input.
3. For each of these statements, determine how many assignments/operations are performed each time the function runs.
4. Express these counts in Big O notation in terms of the input size.
5. Add up the terms from all the statements in step 4.
6. Simplify the expression from step 5 as the input size tends to infinity.
7. The overall complexity comprises only the terms with the biggest impact on the expression as the input size tends to infinity.


In [1]:
# Time complexity examples

# 1.Constant: in this case O(3(1)) and by dropping the insignificant constant, the complexity of the function remains to be O(1).

def constant (lst):
    print (lst[0])
    print (lst[1])
    print (lst[2])
    
# n = [1, 2, 3]

# 2.Linear: in this case O(2(n) + 2(n)), but by dropping the insignificant constant as n gets very large, the complexity of this function is O(n), i.e. O(4(n)) -> O(n)

def linear (lst):
    for index_counter in range(0, (len(lst))):
        print(lst[index_counter]) # 2*n
    
    for lst_value in lst:
        print(lst_value) # 2*n
        
# n = [1, 2]

# 3.Quadratic: Falling under polynomial time complexity, the below function has O(n^2) runtime 
# i.e O((3n)*(3n))

def quadratic (lst):
    for lst_value_1 in lst:
        print (lst_value_1) # 3*n
        
        for lst_value_2 in lst:
            print (lst_value_2, end=" ") # for every item processed by the outer loop, this loop performs 3*n operations.
            
# n = [1, 2, 3]

def multi_dim_list1 (lst):
    # For loop
#     print ('Major diagonal elements:')
#     for i in range(0, len(lst)):
#         for j in range(0, len(lst[i])):
#             if i == j:
#                 print (lst[i][j])
            
    # List comprehension
    print ('Major diagonal elements:', [lst[i][i] for i in range(0, len(lst))])
    

def multi_dim_list2 (lst):
    # For loop
#     print('Minor diagonal elements:')
#     for i in range(0, len(lst)):
#         for j in range(0, len(lst[i])):
#             if j == (len(lst)-i-1):
#                 print (lst[i][j])
    
    # List comprehension
    print ('Minor diagonal elements:', [lst[i][len(lst)-i-1] for i in range(0, len(lst))])
            

# Run code:
lst = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# multi_dim_list1 (lst)
# multi_dim_list2 (lst)


# ALGORITHM ANALYSIS:

def algo_analysis (lst):
    print(lst[0])                 # O(1): Array reading using an index is a constant time operation
    
    midpoint = len(lst) // 2      # O(1): Assignment operations are constant time (// keeps the slice index an int)

    for val in lst[:midpoint]: 
        print(val)                # O(n): O(1) operations * 1/2 the number of times of n -> O(1/2(n)), which is linear, not logarithmic
# https://www.youtube.com/watch?v=kjDR1NBB9MU : describes logarithmic time complexity clearly.

    for x in range(10):
        print("Hello world")     # O(1): always 10 operations regardless of the input size -> O(10)
        

# Time complexity of function algo_analysis: O(1) + O(1) + O(1/2*n) + O(10)
# The simplified time complexity as n tends to infinity can be found by plugging in different large numbers for n and analysing each term's contribution to the overall expression.

# If n = 1000, then: 1 + 1 + 0.5(1000) + 10 = 512; the constant terms (1 + 1 + 10 = 12) contribute little compared to 0.5(1000) = 500.

# If n = 1,000,000, then: 12 + 0.5(1,000,000) = 12 + 500,000; the constant terms are now negligible, leaving (0.5n).

# If n = 100,000,000, then: 0.5(100,000,000) = 50,000,000, while for n = 1000, 0.5(1000) = 500; the factor 0.5 only scales the result and does not change how it grows with n, so it can be dropped as well, leaving n as the term with the biggest impact on the overall expression.

# Simplified time complexity of the function is thus: O(n)


# WORST CASE, AVERAGE CASE AND BEST CASE TIME COMPLEXITY SCENARIOS:

def worst_case_best_case (lst, match_item):
    for val in lst:
        if val == match_item:
            print("True")
            return True   # stop searching as soon as a match is found
    print("False")
    return False

# lst = [1 ,2 ,3, 4, 5]

# Running the function with inputs (lst, 1) yields a best case scenario since a match is found the first time a search is done, thus O(1).

# Running the function with inputs (lst, 11) yields a worst case scenario since no match is found and
# every item in the list needs to be checked, thus O(n).

# Running the function with inputs (lst, 3) yields an average case scenario: a match is found midway through the list, so the entire list does not need searching. This is O(1/2*n), which simplifies to O(n).


# TIME VS SPACE COMPLEXITIES DEMO:

def create_list(n):
    new_list = []
    
    for num in range(n):
        new_list.append('new')

# n = 3

# The function creates a new list in memory with n elements every time it runs, so it consumes memory linearly with the input size: O(n).

def print_stuff():
    for num in range(10):
        print("Hello world")
        
# The above prints the string "Hello world" ten times but does not take up 10 slots of memory. It reuses the same single string for each print operation, so its space complexity is O(1).

### Amortized Time Analysis

- Amortized analysis is an algorithm analysis technique. It is useful when performing a series of operations whose combined costs balance out. With this approach we can 'pay' for an expensive operation by preceding it with a number of cheap operations and accumulating their savings to spend on the upcoming expensive operation.
- The motivation for this approach is that taking the worst case for every such operation can be too pessimistic. A more realistic approach is preferred, one that looks at the actual costs of the operations and their cost bounds, e.g. time or number of operations.
- So rather than analysing a single operation (such as one sort on an array) in isolation, look at a whole series of operations (e.g. sorts, look-ups and deletes on a database) and aim to make the sequence as a whole efficient.

_The amortized cost per operation of a sequence of n operations is their total cost / n_

- For example, if we have 100 operations at cost 1, followed by one operation at cost 100, the amortized cost per operation is 200/101 < 2.

- The reason for considering amortized cost is that we will be interested in data structures that occasionally can incur a large cost as they perform some kind of rebalancing or improvement of their internal state, but where such operations cannot occur too frequently. In this case, amortized analysis can give a much tighter bound on the true cost of using the data structure than a standard worst-case-per-operation bound. 

- A potential function is a function of the state of a system that should generally be non-negative and start at zero. It is used to smooth out the analysis of an algorithm or process.
- The potential function is like a bank account: savings from the cheap operations are deposited there and withdrawn to pay for the expensive operations when needed. For this model to work, the balance should never become negative.

- Example: implementing a stack as an array:

Say we want to use an array to implement a stack. We have an array A, with a variable top that points to the top of the stack (so A[top] is the next free cell).

• To implement push(x), we just need to perform: A[top] = x; top++;
• To implement x=pop(), we just need to perform: top--; x = A[top]; 

`Cost: generally O(1)`
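
Here is a minimal Python sketch of the two operations above, assuming a fixed capacity chosen up front. The class name `FixedArrayStack` is hypothetical. A pre-allocated Python list stands in for the raw array, and a push onto a full array simply raises an error here.

```python
class FixedArrayStack:
    """Stack stored in a fixed-size array A; top is the index of the next free cell."""

    def __init__(self, capacity):
        self.A = [None] * capacity  # pre-allocated cells, emulating a raw array
        self.top = 0

    def push(self, x):
        if self.top == len(self.A):
            raise OverflowError("array is full")
        self.A[self.top] = x        # A[top] = x
        self.top += 1               # top++

    def pop(self):
        if self.top == 0:
            raise IndexError("pop from empty stack")
        self.top -= 1               # top--
        x = self.A[self.top]        # x = A[top]
        self.A[self.top] = None     # drop the reference so the object can be freed
        return x

s = FixedArrayStack(2)
s.push("a")
s.push("b")
print(s.pop(), s.pop())  # b a
```

Both operations touch a single cell and update `top`, so each costs O(1) until the array fills up.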

However, what if the array is full and we need to push a new element on?

In that case we can allocate a new, larger array, copy the old one over, and then go on from there. This is an expensive operation, so a push that requires it costs a lot: `O(n) + O(1)`.

But maybe we can “amortize” the cost over the previous cheap operations that got us to this point. 

So, on average over the sequence of operations, we’re not paying too much.

To be specific, let us define the following cost model.

Cost model: 
Let’s say that inserting into the array costs 1, taking an element out of the array
costs 1, and the cost of resizing the array is the number of elements moved. 

(Say that all other operations, like incrementing or decrementing “top”, are free.)

Question 1: What if when we resize we just increase the size by 1? Is that a good idea?
Answer 1: Not really. If our n operations consist of n pushes then we will incur a total cost
1 + 2 + 3 + 4 + ... + n = n(n + 1)/2. That’s an amortized cost of (n + 1)/2 per operation.

Question 2: What if we instead decide to double the size of the array when we resize?
Answer 2: This is much better. Now, in any sequence of n operations, the total cost for resizing is 1 + 2 + 4 + 8 + ... + 2^i for some 2^i < n (if all operations are pushes then 2^i will be the largest power of 2 less than n). 

This sum is at most 2n − 1. 

Adding in the additional cost of n for inserting/removing, we get a total cost < 3n, and so our amortized cost per operation is < 3.

Another way to look at this is:

What is the sum of 1 + 2 + 4 + 8 + 16 +... +X? If you read this sum left to right, it starts with 1 and doubles until it gets to X. 

If you read right to left, it starts with X and halves until it gets to 1.

What then is the sum of X + X/2 + X/4 + X/8 + ... + 1? This is roughly 2X. 

Therefore, X insertions take O(2X) time, so the amortized time for each insertion is O(1).
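
To check the two answers numerically, the sketch below simulates n pushes under the cost model above, starting from capacity 1. An insert costs 1 and a resize costs the number of elements moved. `total_push_cost` and its `grow` parameter are hypothetical names used only for this illustration.

```python
def total_push_cost(n, grow):
    """Total cost of n pushes: each insert costs 1, and a resize costs
    the number of elements moved. `grow` maps old capacity -> new capacity."""
    capacity, size, cost = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            cost += size              # copy every existing element into the new array
            capacity = grow(capacity)
        cost += 1                     # insert the new element
        size += 1
    return cost

n = 1000
plus_one = total_push_cost(n, lambda c: c + 1)   # Question 1: grow by 1
doubling = total_push_cost(n, lambda c: 2 * c)   # Question 2: double
print(plus_one, plus_one / n)   # 500500 500.5
print(doubling, doubling / n)   # 2023 2.023
```

Growing by 1 gives a total of n(n + 1)/2, i.e. (n + 1)/2 per push. Doubling copies only 1 + 2 + ... + 512 = 1023 elements in total, so the cost per push stays below 3.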

# Data Structures

- Data structures are data organization, management, and storage formats that enable efficient access and modification of the data. They are also a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.
- Big O for Python data structures: structures such as lists and dictionaries have their own built-in methods, which are designed to operate efficiently on them. Many common operations on them have constant time complexity, e.g. indexing and assigning to indexed positions.
- Time complexities for various list & dictionary operations are covered in Lecture 43.
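
As a rough illustration of one row of such a table, the sketch below times membership tests (`in`) on a list and on a dict holding the same keys. The sizes and repetition counts are arbitrary choices, and the timings depend on the machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_dict = dict.fromkeys(as_list)  # same keys as the list, every value None

# Indexing a list is O(1), but `in` on a list scans it item by item: O(n).
# `in` on a dict hashes the key and jumps to its slot: O(1) on average.
print(as_list[n // 2], (n // 2) in as_dict)

list_time = timeit.timeit(lambda: (n - 1) in as_list, number=100)
dict_time = timeit.timeit(lambda: (n - 1) in as_dict, number=100)
print(f"list membership: {list_time:.4f}s, dict membership: {dict_time:.6f}s")
```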

In [2]:
# PYTHON LIST CREATION OPTIONS, EFFICIENCY ANALYSIS:
# Concatenating single-item lists onto the main list (SLOWEST): each `+` builds a brand-new list, so the loop is O(n^2)
def method_1():
    new_list = []

    for list_value in range(10000):
        new_list = new_list + [list_value]

# Using Python's built-in list append method (FASTER): amortized O(1) per append
def method_2():
    new_list = []

    for list_value in range(10000):
        new_list.append(list_value)

# Using a list comprehension to generate the list (FAST)
def method_3():
    new_list = [list_value for list_value in range(10000)]

# Using Python's built-in range() (FASTEST here): in Python 3 this only creates a lazy range object,
# not a list, so it is not a like-for-like comparison; list(range(10000)) would build the actual list
def method_4():
    new_list = range(10000)

# Test code: the IPython %timeit magic reports how long a given function takes to run.
%timeit method_1()
%timeit method_2()
%timeit method_3()
%timeit method_4()

123 ms ± 9.58 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
707 µs ± 11.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
348 µs ± 22.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
274 ns ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


### Array Sequences

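- Array sequences in Python: lists, tuples and strings. They all support indexing, but only lists support index-based assignment (tuples and strings are immutable). Lists and tuples are referential arrays, whereas strings are stored as compact arrays of characters.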

A) Low level arrays & computer architecture:

  - Data is stored in memory in units of bytes (8 bits each).
  - Computers use unique, consecutive memory addresses to refer to each byte, e.g. byte #2144 or byte #2147.
  - Computer memory is designed so that any byte can be accessed efficiently, i.e. in constant time given its address, hence the name RAM (random access memory).
  - Programming languages keep an association between an identifier (e.g. a variable name) and the memory address of the value stored in memory.
  - A group of related variables can be stored in a contiguous (neighbouring) portion of memory, which is denoted an array.
  - A text string is stored as an ordered sequence of Unicode characters.
  - In this model, Python stores each character using 16 bits (2 bytes), so "SAMPLE" would take 12 consecutive bytes of memory, each with a unique address. (CPython 3.3+ actually uses 1, 2 or 4 bytes per character, depending on the string's contents, per PEP 393.)
  - Each location within an array is a cell. Every cell has the same size (2 bytes in the string example), and a cell's location is described by its index.
  - The memory address of any cell can then be computed in constant time using the formula:
  
$$\text{Start cell address} = \text{Array start address} + (\text{Cell size} \times \text{Index})$$
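
The formula can be checked with a small helper. `cell_address` is a hypothetical name, and the start address 2146 and 2-byte cell size are made-up values matching the "SAMPLE" example above, not real memory addresses.

```python
def cell_address(array_start, cell_size, index):
    """Start address of cell `index`, per the formula above (all values in bytes)."""
    return array_start + cell_size * index

# Hypothetical layout: "SAMPLE" stored from byte 2146 with 2-byte cells
start = 2146
for i, ch in enumerate("SAMPLE"):
    print(ch, cell_address(start, 2, i))  # S 2146, A 2148, ..., E 2156
```

The computation is one multiplication and one addition whatever the index is, which is why array indexing is O(1).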
  
B) Referential Arrays (High level abstraction): 

- To use an array to represent a sequence of strings (of differing lengths) while still accessing any of them efficiently, i.e. in constant time, each cell of the array must have the same number of bytes. Strings of different lengths cannot provide this directly.
- The solution is not to store the strings (objects) themselves in the array, but to store an array of references to the objects, which live elsewhere in memory. Every reference is a memory address of the same fixed size (typically 8 bytes on a 64-bit system).
- Python stores lists and tuples in this manner, as object references, so all array operations involve creating or changing references to objects. A list may contain multiple references to the same object, and an object may be referenced by multiple lists.

- Examples:
 - List slicing produces a new list whose cells reference the same objects as the original list.
 - Re-assignment: copying a list's items into another list makes the new list point to the same objects. If one cell of the new list is then assigned a different value, only that cell's reference changes (it now points to the new object), and the original list is unaffected.
 - Shallow copies, e.g. backup = list(primes), duplicate the original list of references. Reassigning a cell in one list does not affect the other. However, if the objects are mutable, mutating a shared object through one list is visible through the other as well.
 - Deep copies: if the objects are mutable, you can use the copy.deepcopy function to create a new list that references new copies of the objects. The two lists are then independent.
 - counter = [0]*8 creates a list of 8 cells but only one integer object: all 8 cells reference that one object. counter[2] += 1 creates a new object holding the result and changes the reference in cell [2] to point to it.
 - primes.extend(extras) adds references to the objects referenced by the extras list to the end of the primes reference list.
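
The examples above can be reproduced directly in Python. `primes` and `extras` mirror the names used in the bullets, while the values and the nested `grid` list are illustrative.

```python
import copy

primes = [2, 3, 5, 7, 11, 13, 17, 19]
temp = primes[3:6]                 # new list [7, 11, 13]; its cells reference the same int objects
print(temp[0] is primes[3])        # True

temp[2] = 15                       # rebinds one cell of temp only
print(primes[5], temp[2])          # 13 15

grid = [[0, 0], [0, 0]]
shallow = list(grid)               # new outer list, same inner list objects
deep = copy.deepcopy(grid)         # new outer list and new inner lists
grid[0][0] = 99                    # mutate a shared inner list
print(shallow[0][0], deep[0][0])   # 99 0

counter = [0] * 8                  # eight cells referencing one int object
print(all(c is counter[0] for c in counter))  # True
counter[2] += 1                    # cell 2 now references a new int object (1)
print(counter)                     # [0, 0, 1, 0, 0, 0, 0, 0]

extras = [23, 29]
primes.extend(extras)              # appends references to the objects in extras
print(primes[-1] is extras[-1])    # True
```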

C) Dynamic arrays:

- In Python you don't need to specify the array size when creating one. Python reserves some memory, usually a bit more than is currently needed. As the list grows, it grabs more memory in stages, again a bit more than it needs at that point in time. See demo below.
- Theoretical implementation of a dynamic array:
  1. Starting with the original array (A), create a second array (B) twice the size of the original.
  2. Store in array B references to the objects currently in array A.
  3. Reassign the name A to array B, and let the memory previously used by the old array A be garbage collected.

  We now have a new array A with all the same object references as before, plus additional free capacity.

### Arrays vs List in Python

- Similarities:

    - Items are enclosed in square brackets.
    - Are ordered, i.e. the items appear in a specific order, which lets us use an index to access any item.
    - Are mutable, which means you can add or remove items after creation.
    - Elements do not need to be unique. Duplicates are possible, as each element has its own distinct place and can be accessed separately through its index.

- Differences:

    - Elements of a list can be of different types, while in an array they must all be of the same type
    - Arrays need to be imported from the array module
    - Arrays store data more compactly than lists, so they are more efficient for storing large amounts of data
    - Arrays are more efficient for numerical operations

_NB: NumPy arrays are also homogeneous, but a NumPy array with `dtype=object` can hold references to arbitrary Python objects, i.e. elements of different types._
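
The sketch below shows the same-type restriction using the standard-library `array` module. The typecode `'i'` (signed int) is just one possible choice.

```python
from array import array

nums = array('i', [1, 2, 3])     # 'i' = signed int typecode; every element must be an int
nums.append(4)
print(nums)                      # array('i', [1, 2, 3, 4])

try:
    nums.append('five')          # a value of another type is rejected
except TypeError as err:
    print("TypeError:", err)

mixed = [1, 'two', 3.0]          # a list happily holds different types
print(nums.itemsize)             # bytes per element (platform-dependent, commonly 4 for 'i')
```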


In [1]:
# HOW DYNAMIC ARRAYS WORK
def dynamic_arrays():
    import sys

    n = 50
    array = []

    for i in range(n):
        a = len(array)
        b = sys.getsizeof(array)
        print ("Length of array is {} and size of the array is {} bytes".format(a, b))
        array.append(n)

# Test code:
dynamic_arrays()

Length of array is 0 and size of the array is 64 bytes
Length of array is 1 and size of the array is 96 bytes
Length of array is 2 and size of the array is 96 bytes
Length of array is 3 and size of the array is 96 bytes
Length of array is 4 and size of the array is 96 bytes
Length of array is 5 and size of the array is 128 bytes
Length of array is 6 and size of the array is 128 bytes
Length of array is 7 and size of the array is 128 bytes
Length of array is 8 and size of the array is 128 bytes
Length of array is 9 and size of the array is 192 bytes
Length of array is 10 and size of the array is 192 bytes
Length of array is 11 and size of the array is 192 bytes
Length of array is 12 and size of the array is 192 bytes
Length of array is 13 and size of the array is 192 bytes
Length of array is 14 and size of the array is 192 bytes
Length of array is 15 and size of the array is 192 bytes
Length of array is 16 and size of the array is 192 bytes
Length of array is 17 and size of the array is

#### Demo for a dynamic array class implementation

In [None]:
import ctypes

class DynamicArrays(object):
    ''' - This class creates and manipulates new dynamic arrays.
        - Actions of the class include:
            1. Creating a dynamic array, which resizes efficiently to accommodate items as they are added.
            2. Getting the length of a created array object.
            3. Appending items to created arrays.  '''
    
    def __init__(self):
        ''' - Class constructor that initializes a new object for the class, when called.
            ** To create a new class object: new_DynamicArray_object = DynamicArrays() '''
        
        self.array_items_count = 0  # No items in a new array object
        self.array_capacity_count = 1  # Each new array object can accommodate 1 item by default
        self.array_A = self._makeRawArray(self.array_capacity_count)  # Initialize a default array with capacity of 1
        
    def __getitem__(self, index):
        ''' - Retrieves an item from the array, given its index.
            ** To get an item from an array 'test_array': test_array[index] '''
        
        # Only indices 0 .. items_count - 1 hold items; anything else is out of range
        if not 0 <= index < self.array_items_count:
            raise IndexError('Array index is out of range')
        
        return self.array_A[index]
        
    def length(self):
        ''' - Gets the length of a created array object, returning it as an integer representing the
              number of items in the array.
            ** To use this method on an array 'test_array': test_array.length() '''
        
        return self.array_items_count
    
    def append(self, item_to_add):
        ''' -  Adds a new item at the end of an array.
            ** To use this method on an array 'test_array': test_array.append(item_to_append) '''
        
        # If the array cannot accomodate any more items, resize it to 2x its current capacity        
        if self.array_items_count == self.array_capacity_count:
            self._resize(2*self.array_capacity_count)
        
        # Then append the new item
        self.array_A[self.array_items_count] = item_to_add
        
        # Then update the current array object's item count
        self.array_items_count +=1
        
    def _resize(self, new_array_capacity_count):
        ''' - This private method (by convention, not meant to be called from outside the class) changes the
             capacity of the array when new items are added, giving this class its dynamic capability '''
        
        # Create a new array with the requested capacity (twice the current one when called from append)
        array_B = self._makeRawArray(new_array_capacity_count)
        
        
        # Add all items in array_A to array_B {creating pointers to the array_A elements in array_B}
        for item_index in range(self.array_items_count):
            array_B[item_index] = self.array_A[item_index]
        
        # Re-assign reference of array_B back to array_A
        self.array_A = array_B
        
        # Then update the current capacity of the new array object
        self.array_capacity_count = new_array_capacity_count
        
    def _makeRawArray(self, required_array_capacity_count):
        ''' - This private method creates the required array in memory and returns it.
            - It uses the python ctypes library to achieve this '''
        
        return (required_array_capacity_count * ctypes.py_object)()

#### Test code for the DynamicArray class

In [None]:
# Create new array
my_dynamic_array = DynamicArrays()

In [None]:
# Get the length of the new array
my_dynamic_array.length()

In [None]:
# Add four items to the array, printing each item as it is added
for i in range(4):
    my_dynamic_array.append(i)
    print(my_dynamic_array[i])

In [None]:
# Fetch last item in the array
my_dynamic_array[my_dynamic_array.length()-1]

In [None]:
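# Proof that arrays from this class are dynamic:
# sys.getsizeof(test_dynamic_array) would only measure the Python wrapper object, which never changes size,
# so inspect the capacity and the byte size of the underlying ctypes array instead
import ctypes
test_dynamic_array = DynamicArrays()

n = 50
for i in range(n):
    a = test_dynamic_array.length()
    b = ctypes.sizeof(test_dynamic_array.array_A)
    print ("Test_dynamic_array length is {}, capacity is {} and its underlying array uses {} bytes".format(a, test_dynamic_array.array_capacity_count, b))
    test_dynamic_array.append(n)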

### Array Sequences Interview Questions

In [None]:
# TDD for array interview questions

import unittest

class TestArrayInterviewMethods(unittest.TestCase):
    
    def setUp(self):
        self.test = 'clint eastwood'
        
    
# Anagram solution tests    
    def test_anagram(self):
        
        self.assertEqual(anagram(self.test, 'old west action'), True)
        self.assertEqual(anagram('clint eastwood', 'CLINT EASTWOOD'), True)
        self.assertEqual(anagram('oldwestaction', 'CLINT EASTWOOD'), True)
        self.assertEqual(anagram('clint eastwood', 'my neighbour'), False)
        self.assertEqual(anagram('old west action', 'my neighbour'), False)
        self.assertEqual(anagram('old west action', 1), False)
    
    def test_anagram_sol1(self):
        
        self.assertEqual(anagram_sol1('clint eastwood', 'old west action'), True)
        self.assertEqual(anagram_sol1('clint eastwood', 'CLINT EASTWOOD'), True)
        self.assertEqual(anagram_sol1('oldwestaction', 'CLINT EASTWOOD'), True)
        self.assertEqual(anagram_sol1('clint eastwood', 'my neighbour'), False)
        self.assertEqual(anagram_sol1('old west action', 'my neighbour'), False)
        self.assertEqual(anagram_sol1('old west action', 1), False)
        
    def test_anagram_sol2(self):
        
        self.assertEqual(anagram_sol2('clint eastwood', 'old west action'), True)
        self.assertEqual(anagram_sol2('clint eastwood', 'CLINT EASTWOOD'), True)
        self.assertEqual(anagram_sol2('oldwestaction', 'CLINT EASTWOOD'), True)
        self.assertEqual(anagram_sol2('clint eastwood', 'my neighbour'), False)
        self.assertEqual(anagram_sol2('old west action', 'my neighbour'), False)
        
        
# Pair sum solution tests
    def test_pair_sum(self):
        
        self.assertEqual(pair_sum([1, 2, 4, 5], 'my neighbour'), 'Pass in inputs in the form: ([int], int)') 
        self.assertEqual(pair_sum('my neighbour', 6), 'Pass in inputs in the form: ([int], int)')
        self.assertEqual(pair_sum([1, 3, 2, 2], 4), 2)
#         self.assertEqual(pair_sum([1, 2, 3, 1], 3), 1)   > My solution has a bug
        self.assertEqual(pair_sum([1, 9, 2, 8, 3, 7, 4, 6, 5, 5, 13, 14, 11, 13, -1], 10), 6)
        
    def test_pair_sum2(self):
        
        self.assertEqual(pair_sum2([1, 2, 4, 5], 'my neighbour'), 'Pass in inputs in the form: ([int], int)') 
        self.assertEqual(pair_sum2('my neighbour', 6), 'Pass in inputs in the form: ([int], int)')
        self.assertEqual(pair_sum2([1, 3, 2, 2], 4), 2)
        self.assertEqual(pair_sum2([1, 2, 3, 1], 3), 1)
        self.assertEqual(pair_sum2([1, 9, 2, 8, 3, 7, 4, 6, 5, 5, 13, 14, 11, 13, -1], 10), 6)
        
        
# Find missing elements solution tests
    def test_finder(self):
        
        self.assertEqual(finder([5,5,7,7],[5,7,7]), 5)
        self.assertEqual(finder([1,2,3,4,5,6,7],[3,7,2,1,4,6]), 5)
        self.assertEqual(finder([9,8,7,6,5,4,3,2,1],[9,8,7,5,4,3,2,1]), 6)
        self.assertEqual(finder([5,5,7,7],[5,5,7]), 7)
        self.assertEqual(finder(['a','b','c','d'],['b','d','a']), 'c')
        
        
# Find the largest continuous sum solution tests
    def test_largest_cont_sum(self):
        
        self.assertEqual(largest_cont_sum([1,2,-1,3,4,-1]),9)
        self.assertEqual(largest_cont_sum([1,2,-1,3,4,10,10,-10,-1]),29)
#         self.assertEqual(largest_cont_sum([-1,1]),1)  # Mine does not ignore -ve sums
        self.assertEqual(largest_cont_sum([1,-1,2]),2)
        
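    # NB: unittest only runs methods whose names start with 'test', so this method is currently skipped
    def large_cont_sum2(self):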

        self.assertEqual(largest_cont_sum([1,2,-1,3,4,-1]),9)
        self.assertEqual(largest_cont_sum([1,2,-1,3,4,10,10,-10,-1]),29)
        self.assertEqual(largest_cont_sum([-1,1]),1)
        self.assertEqual(largest_cont_sum([1,-1,2]),2)
    
    
# String reversal solution tests
    def test_reverse_words(self):
        
        self.assertEqual(reverse_words('    space before'),'before space')
        self.assertEqual(reverse_words('space after     '),'after space')
        self.assertEqual(reverse_words('   Hello John    how are you   '),'you are how John Hello')
        self.assertEqual(reverse_words('1'),'1')
        
    def test_reversed_words2(self):
        
        self.assertEqual(reversed_words2('    space before'),'before space')
        self.assertEqual(reversed_words2('space after     '),'after space')
        self.assertEqual(reversed_words2('   Hello John    how are you   '),'you are how John Hello')
        self.assertEqual(reversed_words2('1'),'1')
        
# String compression (Run length data compression)
    def test_string_compress(self):
        
        self.assertEqual(string_compress(''),'')
        self.assertEqual(string_compress(' '),'')
        self.assertEqual(string_compress(442333),'Input not a string')
        self.assertEqual(string_compress('AABBCC'),'A2B2C2')
        self.assertEqual(string_compress('AAABCCC2DDD'),'A3B1C321D3')
        self.assertEqual(string_compress('AAcccEbbdDdD'),'A2c3E1b2d1D1d1D1')
        self.assertEqual(string_compress('22#??UUa///3*'),'22#1?2U2a1/331*1')
        
# Unique characters
    def test_uni_char(self):
        
        self.assertEqual(uni_char('goo'), False)
        self.assertEqual(uni_char('abcd'), True)
        self.assertEqual(uni_char(''), True)
        self.assertEqual(uni_char(' '), True)
        self.assertEqual(uni_char('  '), False)
        self.assertEqual(uni_char('1233'), False)
        self.assertEqual(uni_char('456'), True)
        
    def test_uni_char1(self):
        self.assertEqual(uni_char1('goo'), False)
        self.assertEqual(uni_char1('abcd'), True)
        self.assertEqual(uni_char1(''), True)
        self.assertEqual(uni_char1(' '), True)
        self.assertEqual(uni_char1('  '), False)
        self.assertEqual(uni_char1('1233'), False)
        self.assertEqual(uni_char1('456'), True)
        

        
# Run tests       
if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False, verbosity=2)

In [None]:
# 1. ANAGRAM CHECK

''' Problem: Given two strings, check to see if they are anagrams. An anagram is when the two strings
    can be written using the exact same letters (so you can just rearrange the letters to get a different phrase or
    word). '''

# My Solution:
def anagram(input1, input2):
    ''' Compares input1 and input2 and returns True if they are anagrams, and False if they are not or if either
        input is not a str '''
    
    ''' 1. Clean the strings (remove spaces and make them all lower case) 
        2. Check condition1, if the strings are of same length and if true,
        3. Sort the string in alphabetical order, then
        4. Check condition 2, if the strings have matching characters '''
    
    check_results = False
    
    if (isinstance(input1, str) and isinstance(input2, str)):
         
        clean_string1 = input1.lower().replace(' ', '')
        clean_string2 = input2.lower().replace(' ', '')
        
        if (len(clean_string1) == len(clean_string2)):
            
            sorted_clean_string1 = ''.join(sorted(clean_string1))
            sorted_clean_string2 = ''.join(sorted(clean_string2))
            
            for index in range(0, len(sorted_clean_string1)):
                if (sorted_clean_string1[index] != sorted_clean_string2[index]):
                    check_results = False
                    break
                else:
                    check_results = True
    
    return check_results

# Tutorial Solution 1:
def anagram_sol1(s1, s2):
    ''' (Preferred, because it does not rely on Python-specific features): LOGIC > If two strings have the
        same letter frequencies (occurrences), they are anagrams '''
    
    # Check if inputs are str type
    if (not(isinstance(s1, str)) or not(isinstance(s2, str))):
        return False
    
    else:
        # Remove spaces make lower case and compare lengths
        s1 = s1.replace(' ', '').lower()
        s2 = s2.replace(' ', '').lower()
        
        if len(s1) != len(s2):
            return False
        
        else:
            # Check if every letter in s1 occurs the same number of times in s2
            count = {}

            for letter in s1:
                if letter in count:
                    count[letter] += 1  # Increase the count of a letter already seen by 1
                else:
                    count[letter] = 1  # Create a new key for a letter not yet seen and assign 1 as its value

            for letter in s2:
                if letter in count:
                    count[letter] -= 1  # Decrease the count of a letter already seen by 1
                else:
                    count[letter] = 1  # A letter that is not in s1: any non-zero count means not an anagram

            for key in count:
                if count[key] != 0:
                    return False

            return True
    
    
# Tutorial Solution 2:
def anagram_sol2(s1, s2):
    ''' LOGIC > If two strings are equal to each other once sorted, they are anagrams '''
    
    s1 = sorted(s1.replace(' ', '').lower())
    s2 = sorted(s2.replace(' ', '').lower())
    
    return s1 == s2    
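The letter-frequency idea behind `anagram_sol1` can be sketched more compactly with `collections.Counter`, which builds the same letter → count dictionary for us. This is a minimal sketch: the name `anagram_counter` is hypothetical, and it assumes the same cleaning rules as the solutions above (ignore spaces and letter case, reject non-str inputs).

```python
from collections import Counter


def anagram_counter(s1, s2):
    # Hypothetical helper: same cleaning rules as anagram_sol1
    if not isinstance(s1, str) or not isinstance(s2, str):
        return False

    s1 = s1.replace(' ', '').lower()
    s2 = s2.replace(' ', '').lower()

    # Counter maps each letter to its number of occurrences; two strings are
    # anagrams exactly when these frequency tables are equal
    return Counter(s1) == Counter(s2)


print(anagram_counter('dog', 'god'))                        # True
print(anagram_counter('clint eastwood', 'old west action')) # True
print(anagram_counter('aa', 'bb'))                          # False
```

Like `anagram_sol1`, this runs in O(n) time, versus O(n log n) for the sorting-based solutions.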

In [3]:
# 2. PAIR_SUM CHECK

''' Problem: Given an integer array, count all the unique pairs that sum up to a specific value k
    e.g. pair_sum([1,3,2,2],4) returns 2, for the pairs (1,3) and (2,2) '''

# My Solution: O(n^2) brute force
def pair_sum(int_list, k):
    ''' Solution steps:
            1. For each item in the list, add it to every item after it and check if the sum is equal to k. If
                it is, store the pair as a (smaller, larger) tuple in a set, so repeated pairs are kept once.
            2. Return the number of unique pairs in the set. '''
    
    # Edge case checks
    if not(isinstance(int_list, list)) or not(isinstance(k, int)) or len(int_list) < 2:
        return 'Pass in inputs in the form: ([int], int)'
    
    else:
        unique_pairs = set()
    
        # Step 1: j starts after i, so each pair of positions is checked once and an item is never paired
        # with itself
        for i in range(len(int_list)):
            for j in range(i + 1, len(int_list)):
                if int_list[i] + int_list[j] == k:
                    unique_pairs.add((min(int_list[i], int_list[j]), max(int_list[i], int_list[j])))
        
        # Step 2:
        return len(unique_pairs)
    
    
# Tutorial Solution: O(n), by using sets so the array is scanned in a single pass instead of with nested loops.
# This is a common strategy for tackling problems like this one.
def pair_sum2(int_list, k):
    
    # Edge case checks
    if not(isinstance(int_list, list)) or not(isinstance(k, int)) or len(int_list) < 2:
        return 'Pass in inputs in the form: ([int], int)'
    
    else:
        
        # Set to track array items already worked on
        seen = set()
        # Set to hold unique pairs that add up to k
        output = set()
        
        for num in int_list:
            # The value that, added to the current item, sums to k
            target = k - num
            
            ''' Logic analogy: I am num and my spouse-to-be is target. For every num, I ask: is my spouse-to-be
                in the engagement zone (seen)? If NOT, I go to the engagement zone and wait for her. If she is,
                I pick her up from the engagement zone and we both head to the marriage zone (output).
            '''
            
            if target not in seen:
                seen.add(num)
            
            else:
                output.add( (min(num, target), max(num, target)) )
        
        
        # Python printing trick: print each pair on its own line
#         print('\n'.join(map(str, list(output))))
        
        ''' map() returns an iterator (a map object) that yields the result of applying the given function
            to each item of a given iterable (list, tuple etc.).

        Syntax : map(fun, iter)
        Parameters :
        fun : the function that map applies to each element of the given iterable.
        iter : the iterable to be mapped.  '''
        
        return len(output)

In [6]:
# 3. FIND MISSING ELEMENT

''' Consider an array of non-negative integers. A second array is formed by shuffling the elements of the
    first array and deleting a random element. Given these two arrays, find which element is missing in the
    second array.
    
    Input: finder([1,2,3,4,5,6,7],[3,7,2,1,4,6])
    Output: 5 is the missing number
'''

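# My solution: O(n^2), because each membership test and removal scans the list
def finder(orig_list, shuf_list):
    # Edge cases (e.g. both lists having at least two elements) are not checked here
    remaining = list(orig_list)  # Work on a copy so the caller's list is not mutated
    
    # Remove every element of the shuffled list from the copy of the original
    for element in shuf_list:
        if element in remaining:
            remaining.remove(element)
            
    # Whatever is left over is the missing element
    if len(remaining) > 0:
        return remaining[0]
    
    return 0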

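# Using set difference: only correct when the original array has no duplicate values. If the missing value also
# appears elsewhere in the array, both sets are equal, the difference is empty and [0] raises an IndexError.
# (Since arr2 is arr1 minus one element, set(arr1) - set(arr2) gives the same result as the symmetric difference.)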
def list_difference(arr1, arr2):

    missing_elem = set(arr1).symmetric_difference(set(arr2))
    return list(missing_elem)[0]

list_difference([1,2,3,4,5,6,7],[3,7,2,1,4,6])

5
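A linear-time alternative that also copes with duplicate values is to XOR every number in both arrays: since `x ^ x == 0`, `x ^ 0 == x`, and XOR does not depend on order, every value that appears in both arrays cancels out and only the missing one is left. A minimal sketch, assuming (as the problem states) non-negative integers; the name `finder_xor` is hypothetical.

```python
def finder_xor(orig_list, shuf_list):
    # Hypothetical helper: returns the element present in orig_list but missing from shuf_list
    result = 0
    # Each value present in both lists is XORed twice and cancels to 0,
    # so only the missing value survives
    for num in orig_list + shuf_list:
        result ^= num
    return result


print(finder_xor([1, 2, 3, 4, 5, 6, 7], [3, 7, 2, 1, 4, 6]))  # 5
print(finder_xor([5, 5, 7, 7], [5, 7, 7]))                    # 5 (duplicates are fine)
```

This is O(n) time and O(1) extra space, and unlike the set approach it does not fail when the missing value is duplicated.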

In [None]:
# 4. LARGEST CONTINUOUS SUM

''' Given an array of integers (positive and negative) find the largest continuous sum.
    Input: large_cont_sum([1,2,-1,3,4,10,10,-10,-1])
    Output: 29
'''

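# My Solution: {DOES NOT IGNORE -VE SUMS} It returns the largest running (prefix) sum starting at index 0, so it
# misses sums that do not start at the first element, e.g. [-5, 10] gives 5 instead of 10.
def largest_cont_sum(arr):
    
    # Do edge case checks
    
    # Problem check
    running_sum = 0  # Not named 'sum', which would shadow the built-in sum()
    sum_list = []
    
    for i in range(len(arr)):
        running_sum += arr[i]
        sum_list.append(running_sum)
            
    return max(sum_list)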


# Lecture solution (Kadane's algorithm) ignores -ve sums by restarting the running sum:
def large_cont_sum2(arr):
    
    '''
        If the array is all positive, then the result is simply the sum of all numbers. The negative numbers
        in the array will cause us to need to begin checking sequences.

        We start summing up the numbers and store them in a current sum variable.
        
        After adding each element, we check whether the current sum is larger than the maximum sum encountered
        so far. If it is, we update the maximum sum.
        
        As long as the current sum is positive, we keep adding the numbers.
        
        When the current sum becomes negative, we start with a new current sum, because a negative current sum
        will only decrease the sum of a future sequence.
        
        Note that we don't reset the current sum to 0, because the array can contain all negative integers.
        Then the result would be the largest negative number.
    
    '''
    
    # Check to see if array is length 0
    if len(arr)==0: 
        return 0
    
    # Start the max and current sum at the first element
    max_sum=current_sum=arr[0] 
    
    # For every element in array
    for num in arr[1:]: 
        
        # Set current sum as the higher of (current sum + num) and num, i.e. extend the run or restart at num
        current_sum=max(current_sum+num, num)
        
        # Set max as the higher between the currentSum and the current max
        max_sum=max(current_sum, max_sum) 
        
    return max_sum 
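One way to gain confidence in the lecture solution is to compare it against a brute-force O(n^2) version that tries every contiguous slice, on many random arrays. A minimal sketch: `kadane` restates the same loop as `large_cont_sum2` (for non-empty arrays) and `brute_force` is a hypothetical checker.

```python
import random


def kadane(arr):
    # Same idea as large_cont_sum2: extend the current run or restart at num
    max_sum = current_sum = arr[0]
    for num in arr[1:]:
        current_sum = max(current_sum + num, num)
        max_sum = max(max_sum, current_sum)
    return max_sum


def brute_force(arr):
    # Try every contiguous slice arr[i..j], keeping a running total for each start i
    best = arr[0]
    for i in range(len(arr)):
        total = 0
        for j in range(i, len(arr)):
            total += arr[j]
            best = max(best, total)
    return best


random.seed(0)
for _ in range(1000):
    arr = [random.randint(-10, 10) for _ in range(random.randint(1, 12))]
    assert kadane(arr) == brute_force(arr)

print(kadane([1, 2, -1, 3, 4, 10, 10, -10, -1]))  # 29
print(kadane([-5, 10]))                           # 10
print(kadane([-3, -1, -2]))                       # -1 (all negative)
```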

In [None]:
# 5. SENTENCE REVERSAL

''' Given a string of words, reverse all the words. 
        Given: 'This is the best'
        Return: 'best the is This'
        
    As part of this exercise you should remove all leading and trailing whitespace. So that inputs such as:
        Given: '  space here'  and 'space here      '
        both become: 'here space'
'''

# My solution (Pythonic way):
def reverse_words(sentence):
    # Edge case check: sentence is not empty and is of type str
    
    # Problem solution
    word_list = sentence.split()
    reversed_word_list = word_list[::-1]
    reversed_sentence = " ".join(reversed_word_list)
    
    return reversed_sentence

# Language independent way:
def reversed_words2(string):
    # Edge case check: sentence is not empty and is of type str
    
    # Problem solution
    word = ''
    word_list = []
    tracker = 0
    
    # Fetch the words
    while tracker < (len(string)):
        count = tracker
        while string[count] != ' ':
            word += string[count]
            if count == (len(string)-1):
                break
            count +=1
        if word:
            word_list.append(word)
        word = ''
        tracker = count
        tracker+=1
        
    # Reverse words 
    reversed_word_list = []
    tracker1 = (len(word_list)-1)
    while tracker1 >=0:
        reversed_word_list.append(word_list[tracker1])
        tracker1 -=1
        
    # Return reversed joined word list to string           
    return ' '.join(reversed_word_list)

In [7]:
# 6. STRING COMPRESSION
''' Given a string AABBcc, compress it such that the result is A2B2c2, being mindful of letter case.
    THIS ALGORITHM IS CALLED RUN-LENGTH ENCODING (RLE), and is a simple form of lossless data
    compression.
'''

# My solution: O(n)
def string_compress(string):
    ''' Returns the run-length encoding of a string: each run of identical consecutive characters becomes the
        character followed by the length of the run. White space is skipped.
    '''
    
    if not(isinstance(string, str)):
        # Test for non-strings
        return 'Input not a string'

    elif len(string) == 0:
        # Test for empty string
        return ''
    
    # Add a sentinel space at the end so the last run is always flushed
    my_str = string + ' '
    
    count = 1  # Character count tracker
    compressed_lst = []
    
    # Loop through the original characters, comparing each one with the character after it
    for i in range(len(string)):
        if my_str[i] == ' ':
            # Skip white space (the run before it has already been flushed)
            continue
        elif my_str[i] == my_str[i+1]:
            count += 1
        else:
            compressed_lst.append(my_str[i] + str(count))
            count = 1
        
    return ''.join(compressed_lst)

string_compress('wwwaadexxxxxybb')

'w3a2d1e1x5y1b2'
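Python's `itertools.groupby` already groups runs of equal consecutive items, so the same RLE can be sketched in one expression. Because RLE is lossless, a decoder can rebuild the input exactly; the sketch below assumes the input contains no spaces or digits (otherwise the encoded form becomes ambiguous). Both function names are hypothetical.

```python
import re
from itertools import groupby


def rle_groupby(string):
    # groupby yields (char, iterator over that run) for each run of equal consecutive chars
    return ''.join(char + str(len(list(run))) for char, run in groupby(string) if char != ' ')


def rle_decode(encoded):
    # Each token is one non-digit character followed by its run length (one or more digits)
    return ''.join(char * int(count) for char, count in re.findall(r'(\D)(\d+)', encoded))


print(rle_groupby('wwwaadexxxxxybb'))               # w3a2d1e1x5y1b2
print(rle_groupby('AABBcc'))                        # A2B2c2
print(rle_decode(rle_groupby('wwwaadexxxxxybb')))   # wwwaadexxxxxybb
```

The `\d+` in the decoder matters: a run of 12 characters encodes as `a12`, which a single-digit parser would misread.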

In [20]:
# 7. UNIQUE CHARACTERS
''' Given a string, determine if it is comprised of all unique characters. For example, the string 'abcde' has
    all unique characters and should return True. The string 'aabcde' contains duplicate characters and should
    return False.
'''

    
# My solution improved: O(n), since building the set visits every character once
def uni_char(string):
    return len(set(string)) == len(string)

# MANUAL WAY, RECOMMENDED FOR INTERVIEWS: O(n)
def uni_char1(string):
    set1 = set()
    
    for char in string:
        if char in set1:
            return False
        else:
            set1.add(char)
    
    return True

### Stacks
- Are ORDERED item collections where the addition and removal of items always happens on the same end ('TOP') i.e follows a LIFO principle.
- Like arrays, queues and deques, stacks are linear structures: elements are arranged sequentially. In a stack, only the top element can be reached directly.
- The opposite end is called the 'BASE'
- They provide ordering based on the length of time in the collection, with items closer to the base having been in the stack longer.
- To add items to the top, you perform a 'push' and to remove you perform a 'pop'
- Stacks are important as they can be used to reverse the order of items, by simply pushing all the items onto a stack and then popping them off.
- Navigating between webpages is an example of a stack in use: the 1st page viewed goes onto the base of the stack and subsequent pages are pushed on top of it; navigating back using the back button pops the page URLs in reverse order.
- Some useful stack methods:

 1. Stack() - creates an empty stack
 2. push() - adds an item to the top of the stack
 3. pop() - removes an item from the top of a stack
 4. peek() - returns the top item without popping it
 5. isEmpty() - returns a boolean indicating whether the stack is empty
 6. size() - returns the number of elements in the stack
 
 
### Implementing a stack class

In [None]:
class Stack(object):
    ''' Implements a stack using a list. By doing this we restrict array operations to stack operations only '''
    
    def __init__(self):
        self.items = []
        
    def push(self, item):
        self.items.append(item)
        
    def pop(self):
        return self.items.pop()
    
    def peek(self):
        return self.items[-1]
    
    def isEmpty(self):
        return len(self.items) == 0
    
    def size(self):
        return len(self.items)
    
    def get_stack(self):
        return self.items
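The reversal property mentioned above can be sketched with a plain Python list standing in for the `Stack` class (`append` is push, `pop` is pop). The function name `reverse_with_stack` is hypothetical.

```python
def reverse_with_stack(items):
    stack = []
    for item in items:
        stack.append(item)  # push: each new item goes on top

    reversed_items = []
    while stack:
        # pop always returns the most recently pushed item (LIFO)
        reversed_items.append(stack.pop())
    return reversed_items


print(reverse_with_stack(['a', 'b', 'c']))  # ['c', 'b', 'a']
```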

### Queues

- Ordered item collections where you 'enqueue' to add an item at the 'rear' and 'dequeue' to remove an item from the 'front'.
- Items are removed sequentially starting with the one at the front to the one at the rear.
- Queues follow the FIFO principle.
- Some useful queue methods:

    1. Queue() - creates a new queue.
    2. enqueue(item) - adds a new item to the rear of the queue.
    3. dequeue() - removes the item at the front of the queue.
    4. isEmpty() - check whether the queue is empty
    5. size() - return the number of items in the queue.
    
    
### Implementing a Queue

In [None]:
class Queue:
    
    def __init__(self):
        self.items = []
        
    def isEmpty(self):
        return self.items == []
    
    def enqueue(self, item):
        self.items.insert(0, item)  # Index 0 is the rear (note: list.insert(0, ...) is O(n))
        
    def dequeue(self):
        return self.items.pop()  # The last index is the front
        
    def size(self):
        return len(self.items)
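Because `list.insert(0, item)` shifts every element, the list-based `enqueue` above is O(n). A minimal sketch of the same interface on top of `collections.deque`, which supports O(1) appends and pops at both ends; the class name `DequeBackedQueue` is hypothetical.

```python
from collections import deque


class DequeBackedQueue:
    # Hypothetical name; same methods as the Queue class above

    def __init__(self):
        self.items = deque()

    def isEmpty(self):
        return len(self.items) == 0

    def enqueue(self, item):
        self.items.append(item)       # rear is the right end: O(1)

    def dequeue(self):
        return self.items.popleft()   # front is the left end: O(1)

    def size(self):
        return len(self.items)


q = DequeBackedQueue()
q.enqueue('first')
q.enqueue('second')
print(q.dequeue())  # first (FIFO)
print(q.size())     # 1
```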

### Deques

- An ordered double-sided item collection, having a 'front' and a 'rear'.
- Items can be added or removed from both ends. Deques don't enforce LIFO or FIFO principles, so it's up to the
  implementer to use them consistently.
- More on deques: 
    https://docs.python.org/3.3/library/collections.html#collections.deque
    https://www.geeksforgeeks.org/deque-in-python/


### Implementing a deque class using a list

    1. Deque() - creates a new deque
    2. addFront() - adds an item to the front
    3. addRear() - adds an item to the rear
    4. removeFront() - removes an item from the front
    5. removeRear() - removes an item from the rear
    6. isEmpty() - returns whether the deque is empty or not
    7. size() - returns the size of the deque.
    

In [None]:
class Deque:
    
    def __init__(self):
        self.items = []
        
    def isEmpty(self):
        return self.items == []
    
    def addFront(self, item):
        self.items.append(item)
        
    def addRear(self, item):
        self.items.insert(0, item)  # Index 0 will be the rear
        
    def removeFront(self):
        return self.items.pop()
    
    def removeRear(self):
        return self.items.pop(0)
    
    def size(self):
        return len(self.items)
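The standard library's `collections.deque` provides the same operations. Mirroring the class above (right end of the list is the 'front', index 0 is the 'rear'), `append`/`pop` act on the front and `appendleft`/`popleft` on the rear. Unlike `list.insert(0, ...)` and `list.pop(0)`, which are O(n), `deque` does these in O(1) at both ends. A minimal sketch of the mapping:

```python
from collections import deque

# Right end = 'front', left end = 'rear', matching the Deque class above
d = deque()
d.append('a')        # addFront('a')
d.appendleft('b')    # addRear('b')
d.append('c')        # addFront('c')
print(list(d))       # ['b', 'a', 'c']

front = d.pop()      # removeFront()
rear = d.popleft()   # removeRear()
print(front, rear)   # c b
print(len(d))        # size() -> 1
```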

### Stack, Queue and Deque Interview Problems

In [None]:
# TDD for S,Q&D interview problems

import unittest

class TestSQDInterviewMethods(unittest.TestCase):        
    
# Balanced parentheses tests    
    def test_balance_check(self):
        
        self.assertEqual(balance_check('[](){([[[]]])}('), False)
        self.assertEqual(balance_check('[[[]])]'), False)
        self.assertEqual(balance_check('[{{{(())}}}]((()))'), True)
        
        
        
# Run tests       
if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False, verbosity=2)

In [None]:
# 1. Balanced parentheses problem (Balanced Parentheses algorithm and a common interview question)

''' Given a string of opening and closing parentheses, check whether it’s balanced. We have 3 types of
    parentheses: round brackets: (), square brackets: [], and curly brackets: {}. Assume that the string doesn’t
    contain any other character than these, no spaces words or numbers. As a reminder, balanced parentheses
    require every opening parenthesis to be closed in the reverse order opened. 
    
    For example ‘([])’ is balanced but ‘([)]’ is not. You can assume the input string has no spaces.
    
    
    (NB) For a sequence of brackets to be balanced:
        1. It must not contain unmatched parentheses
        2. The brackets enclosed within itself must also be matched
'''

# Implement a Stack class:
class Stack:
    ''' Adds and removes items from the TOP of the stack (i.e the nth index) '''
    
    def __init__(self):
        self.items = []
        
    def add_elem(self, item):
        self.items.append(item)
        
    def remove_element(self):
        return self.items.pop()
    
    def is_empty(self):
        return len(self.items) == 0
    
    
def balance_check(s):
    ''' Uses the Stack class (a plain Python list can also be used as a stack) '''
    
    # If the total number of parentheses is odd, they cannot all be matched
    if len(s)%2 != 0:
        return False
    
    # Reference string of opening parentheses
    opening_parens = '{[('
    
    # Reference tuple of matching (opening, closing) parenthesis pairs
    matching_parens = (('{','}'), ('[',']'), ('(',')'))
    
    # Create an empty stack
    my_stack = Stack()
    
    # Iterate through the string of parentheses to check whether it is balanced
    for paren in s:        
        if paren in opening_parens:
            # If the current paren is an opening one, add it to the stack
            my_stack.add_elem(paren)
        
        else:
            # If the current paren is a closing one, check whether there is an opening paren to compare it
            # with. If not, the current paren has no match, so by the rules of a balanced parentheses string
            # the entire string is not balanced.
            
            if my_stack.is_empty():
                return False
            
            # Otherwise, pop the last item in the stack (an opening paren) and compare it with the current
            # (closing) paren. If they are not a matching pair, the entire string is not balanced.
            if (my_stack.remove_element(), paren) not in matching_parens:
                return False
        
    # The string is balanced only if every opening paren was matched, i.e. the stack is now empty
    # (e.g. '((' has an even length but is not balanced)
    return my_stack.is_empty()
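As the docstring notes, a plain list can serve as the stack. A minimal sketch of that variant, using a dict that maps each closing bracket to its opening partner; it assumes, as the problem states, that the string contains only bracket characters. The name `balance_check_list` is hypothetical.

```python
def balance_check_list(s):
    # Hypothetical variant: a plain list acts as the stack
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []

    for char in s:
        if char in '([{':
            stack.append(char)          # push opening brackets
        elif not stack or stack.pop() != pairs[char]:
            return False                # nothing to close, or the wrong kind of bracket

    # Any opening brackets left over were never closed
    return not stack


print(balance_check_list('([])'))   # True
print(balance_check_list('([)]'))   # False
print(balance_check_list('(('))     # False
```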

# 2. Implement a queue using two stacks. CLASSIC INTERVIEW PROBLEM!

''' Implement a Queue class using two stacks! Note, this is a "classic" interview problem. Use a Python list
    data structure as each Stack. '''

# My solution:
class Queue2Stacks(object):
    
    def __init__(self):
        
        # Two Stacks
        self.instack = []
        self.outstack = []
     
    def enqueue(self,element):
        
        # Enqueue by pushing onto the "IN" stack
        self.instack.append(element)
    
    def dequeue(self):
        if not self.outstack:
            while self.instack:
                # Move every element onto the outstack, reversing their order so the oldest item is on top
                self.outstack.append(self.instack.pop())
                
        if len(self.outstack) != 0:
            return self.outstack.pop()
        return 'No more items in the queue'
    
    
# Test the class
q2s = Queue2Stacks()

q2s.enqueue('Ken')
q2s.enqueue('Shee')
print(q2s.dequeue())
q2s.enqueue('Chris')
print(q2s.dequeue())
q2s.enqueue('Mum')
print(q2s.dequeue())
print(q2s.dequeue())
print(q2s.dequeue())

### Singly Linked Lists

- A collection of nodes that collectively form a linear sequence.
- Each node has:
    
    1. A reference to an object that is an element of the sequence
    2. A reference to the next node in the sequence
    
    
- The list instance maintains a member named 'head' that references the first node of the list (also called the head).
- The last node of the list (the tail) has None as its next reference, which is how it can be identified. Some implementations of a singly linked list also maintain a member called 'tail' that references the tail directly, so it can be reached without traversing the whole list from the head.
- Traversing the list means moving through the nodes by following their next references, aka link hopping or pointer hopping.
- Linked lists do not have a predetermined fixed size; rather, they use space proportional to their current number of elements.
- To insert a node at the head:

    1. Create the new node
    2. Set its element to the new element
    3. Set its next reference to point to the current head
    4. Set the list's head to point to the new node

- To insert a node at the tail of the list:

    1. Create a new node
    2. Set its element to the new element
    3. Set its next reference to None
    4. Set the current tail's next reference to point to the new node
    5. Set the list's tail reference to the new node
    
- To delete a node at the head:

    1. Set the head reference of the list to the current head's next node
    2. The old head node is no longer referenced by the list and can be reclaimed.
    
    
- Removing a node from the tail of a singly linked list efficiently is hard, because we cannot directly reach the node before the tail (node n-1): nodes only point forward, so finding it means traversing from the head. This operation can be done efficiently with a doubly linked list.


### Implementing a singly linked list

In [None]:
# Usually create a node class and use it to implement the linked list

class Node:
    
    def __init__(self, name, value):
        self.name = name
        self.value = value  # Stores the value stored in a node
        self.next_node = None  # Will be the next node in the list

# Create nodes and assign their elements
a = Node('a', 1)  # Node a with value 1
b = Node('b', 2)  # Node b with value 2
c = Node('c', 3)  # Node c with value 3

# Set the node pointers: each next_node references the next Node object itself (not just its name)
a.next_node = b
b.next_node = c
c.next_node = None

# Check the linked list by hopping from node to node, starting at the head (a)
current = a
while current is not None:
    next_name = current.next_node.name if current.next_node else None
    print('Node', current.name, ': Value :', current.value, ' Next node: ', next_name)
    current = current.next_node
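The head/tail insertion and head deletion steps listed earlier can be sketched as a small class that keeps both a `head` and a `tail` reference. A minimal sketch under those assumptions; the class and method names (`ListNode`, `SinglyLinkedList`, `insert_head`, `insert_tail`, `delete_head`, `to_list`) are hypothetical.

```python
class ListNode:
    def __init__(self, element):
        self.element = element
        self.next = None  # a new node points to nothing until linked


class SinglyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0

    def insert_head(self, element):
        node = ListNode(element)   # 1-2. create the node and set its element
        node.next = self.head      # 3. point it at the current head
        self.head = node           # 4. the list's head is now the new node
        if self.tail is None:      # the first node is both head and tail
            self.tail = node
        self.size += 1

    def insert_tail(self, element):
        node = ListNode(element)   # its next reference is already None
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node  # old tail now points to the new node
        self.tail = node           # the list's tail is now the new node
        self.size += 1

    def delete_head(self):
        if self.head is None:
            raise IndexError('delete from empty list')
        old_head = self.head
        self.head = old_head.next  # the head moves to the next node
        if self.head is None:      # the list became empty
            self.tail = None
        self.size -= 1
        return old_head.element

    def to_list(self):
        values, current = [], self.head
        while current:             # pointer hopping from the head
            values.append(current.element)
            current = current.next
        return values


sll = SinglyLinkedList()
sll.insert_tail(2)
sll.insert_tail(3)
sll.insert_head(1)
print(sll.to_list())      # [1, 2, 3]
print(sll.delete_head())  # 1
print(sll.to_list())      # [2, 3]
```

All three operations are O(1) because they only touch the head or tail; removing the tail is left out, since (as noted above) it requires an O(n) traversal in a singly linked list.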

### Doubly Linked Lists

- Linked lists that keep references to both the node before and the node after, allowing a greater variety of O(1) operations.
- In addition to 'next' referring to the next node, each node has 'prev' referring to the node before it.
- Implementations commonly use 'dummy' nodes (aka 'sentinels'/ 'guards') at the beginning (aka header node) and at the end (aka trailer node).
- With sentinels, nodes are always added between existing nodes e.g. adding at the beginning is done between the header and the node following the header.
- Creating a new node between nodes D & E for example:

    1. Create the new node and set its value
    2. Set its pointers to nodes D (prev) & E (next)
    3. Set D's next pointer to the new node and E's prev pointer to the new node.
    
- To delete nodes, simply link them out i.e given nodes C, D and E, to delete node D:

    1. Set the next pointer for node C to node E
    2. Set the prev pointer for node E to node C
    3. Node D then ceases being part of the linked list and will be reclaimed by the system.
    

### Implementing a doubly linked list

In [None]:
# Note this can also be a circularly linked list where the trailer points to the header, forming a continuous loop
class Node:
    
    def __init__(self, name, value):
        
        self.name = name
        self.value = value
        self.next_node = None
        self.prev_node = None
        

# Use the Node class
# Set up the nodes
a = Node('A', 1)
b = Node('B', 2)
c = Node('C', 3)

# Set the node pointers to the Node objects themselves (not just their names)
a.next_node = b
b.prev_node = a
b.next_node = c
c.prev_node = b

# Check the nodes, printing the names of the neighbouring nodes (None where there is no neighbour)
for node in (a, b, c):
    prev_name = node.prev_node.name if node.prev_node else None
    next_name = node.next_node.name if node.next_node else None
    print('Node', node.name, ': Value :', node.value, 'Prev Node:', prev_name, 'Next node: ', next_name)
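The header/trailer sentinels and the insert-between and link-out steps described above can be sketched as follows. This is a minimal sketch under those assumptions; the names `DNode`, `DoublyLinkedList`, `insert_between`, `delete_node`, `add_first`, `add_last` and `to_list` are hypothetical.

```python
class DNode:
    def __init__(self, element=None, prev_node=None, next_node=None):
        self.element = element
        self.prev = prev_node
        self.next = next_node


class DoublyLinkedList:
    def __init__(self):
        # Sentinel nodes hold no element; real nodes always sit between them
        self.header = DNode()
        self.trailer = DNode()
        self.header.next = self.trailer
        self.trailer.prev = self.header

    def insert_between(self, element, predecessor, successor):
        # Steps 1-2: create the node with its prev (D) and next (E) pointers set
        node = DNode(element, predecessor, successor)
        # Step 3: D's next and E's prev now point at the new node
        predecessor.next = node
        successor.prev = node
        return node

    def delete_node(self, node):
        # Link the node out: C.next -> E and E.prev -> C
        node.prev.next = node.next
        node.next.prev = node.prev
        node.prev = node.next = None  # nothing in the list refers to it any more
        return node.element

    def add_first(self, element):
        return self.insert_between(element, self.header, self.header.next)

    def add_last(self, element):
        return self.insert_between(element, self.trailer.prev, self.trailer)

    def to_list(self):
        values, current = [], self.header.next
        while current is not self.trailer:
            values.append(current.element)
            current = current.next
        return values


dll = DoublyLinkedList()
c_node = dll.add_last('C')
d_node = dll.add_last('D')
e_node = dll.add_last('E')
print(dll.to_list())  # ['C', 'D', 'E']

dll.delete_node(d_node)
print(dll.to_list())  # ['C', 'E']

dll.insert_between('X', c_node, e_node)
dll.add_first('start')
print(dll.to_list())  # ['start', 'C', 'X', 'E']
```

The sentinels mean `insert_between` and `delete_node` never need special cases for an empty list or for the first and last real nodes.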

### Linked List Interview problems

In [None]:
# TDD for Linked List interview problems

import unittest

class Node(object):
    
    def __init__(self,value):
        
        self.value = value
        self.nextnode = None
        

class TestLinkedListMethods(unittest.TestCase):
    
# Singly list cycle check
    def test_cycle_check(self):
        
        # CREATE CYCLE LIST
        a = Node(1)
        b = Node(2)
        c = Node(3)
        a.nextnode = b
        b.nextnode = c
        c.nextnode = a # Cycle Here!

        # CREATE NON CYCLE LIST
        x = Node(1)
        y = Node(2)
        z = Node(3)
        x.nextnode = y
        y.nextnode = z
        
        # TEST THE TWO LINKED LISTS
        self.assertEqual(cycle_check(a), True)
        self.assertEqual(cycle_check(x), False)

# Singly list nth_to_last node
    def test_nth_to_last_node(self):
        
        # Set up linked list
        a = Node(1)
        b = Node(2)
        c = Node(3)
        d = Node(4)
        e = Node(5)

        a.nextnode = b
        b.nextnode = c
        c.nextnode = d
        d.nextnode = e
        
        # Run test
        self.assertEqual(nth_to_last_node(5, a), 1)
        
        
        
# Run tests       
if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False, verbosity=2)

In [None]:
# 1. Singly linked list interview problem

''' Given a singly linked list, write a function which takes in the first node in a singly linked list and
    returns a boolean indicating if the linked list contains a "cycle".

    A cycle is when a node's next pointer actually points back to a previous node in the list. This is also
    sometimes known as a circularly linked list. '''

# -------------------------------------------------------------------------------------------------------------

''' Lecture solution: Set up two runners, runner_1 and runner_2, that run along the linked list. On each step
    runner_1 moves one node forward while runner_2 moves two nodes forward, making runner_2 faster than runner_1.
    
    If there is a cycle, runner_2 will eventually catch up with runner_1 (lapping it) and they will both end up
    pointing at the same node. If there is no cycle, runner_2 simply reaches the end of the list. '''

def cycle_check(node):
    
    # Set both runners to the first node in the list
    runner_1 = node
    runner_2 = node
    
    # Loop as long as the faster runner has not reached the end of the linked list. If runner_2 reaches the
    # end of the list, then there is no cycle and the loop terminates
    while runner_2 is not None and runner_2.nextnode is not None:
        
        # Move runner_1 one node and runner_2 two nodes along the list
        runner_1 = runner_1.nextnode
        runner_2 = runner_2.nextnode.nextnode
        
        # Check if the two runners have met up (i.e. point at the same node object)
        if runner_1 is runner_2:
            return True  # There is a cycle, 'return' exits the loop and the function
    
    # If there is no cycle, the loop condition turns false when runner_2 (or runner_2.nextnode) is None at the
    # end of the linked list, thus exiting the loop
    return False
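For comparison, a simpler approach remembers every node visited in a set and reports a cycle as soon as a node comes up twice. It is also O(n) time, but needs O(n) extra space, whereas the two-runner approach above needs only O(1). A minimal sketch; `cycle_check_set` is a hypothetical name, and it relies on Node objects being hashable by identity (Python's default when a class does not define `__eq__`).

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.nextnode = None


def cycle_check_set(node):
    visited = set()  # node objects seen so far
    while node is not None:
        if node in visited:
            return True   # reached a node we've already visited: cycle
        visited.add(node)
        node = node.nextnode
    return False          # hit the end (None): no cycle


# Cyclic list 1 -> 2 -> 3 -> back to 1
a, b, c = Node(1), Node(2), Node(3)
a.nextnode, b.nextnode, c.nextnode = b, c, a

# Acyclic list 1 -> 2 -> 3
x, y, z = Node(1), Node(2), Node(3)
x.nextnode, y.nextnode = y, z

print(cycle_check_set(a))  # True
print(cycle_check_set(x))  # False
```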


# 2. Linked list reversal interview problem

''' Write a function to reverse a Linked List in place. The function will take in the head of the list as input
    and return the new head of the list. '''

class Node(object):
    ''' Test class '''
    
    def __init__(self,name,value):
        
        self.name = name
        self.value = value
        self.nextnode = None

        
# Lecture solution:
def reverse_linked_list(first_node):
    ''' Since we want to do this in place, we want the function to operate in O(1) space, meaning we don't
        want to create a new list, so we will simply use the current nodes! Time-wise, we can perform the
        reversal in O(n) time.

        We can reverse the list by changing the next pointer of each node. Each node's next pointer should point
        to the previous node.

        In one pass from head to tail of our input list, we will point each node's next pointer to the previous
        element.

        Make sure to copy current.nextnode into next_node before setting current.nextnode to previous. Let's
        see this solution coded out: '''
    
    current_node = first_node
    prev_node = None
    next_node = None
    
    # While there is a node; becomes false when None is hit at the end of the L.list
    while current_node:        
        next_node = current_node.nextnode  # Save the next node to move to
        current_node.nextnode = prev_node  # Change current node's pointer to the previous node (the reversal)
        prev_node = current_node  # The current node becomes the previous node for the next step
        current_node = next_node  # Move to the next node in the L.list
    
    # When we get to the end of the L.list (where the current node becomes None), return the previous node,
    # because it is the last non-None node: the original tail, which is now the new head
    return prev_node


## TEST THE REVERSE FUNCTION
# Create the linked list
# a = Node('A', 1)
# b = Node('B', 2)
# c = Node('C', 3)
# d = Node('D', 4)

# # Set up order a,b,c,d with values 1,2,3,4
# a.nextnode = b
# b.nextnode = c
# c.nextnode = d

# # Print the list node values
# print(a.nextnode.name)
# print(b.nextnode.name)
# print(c.nextnode.name)

# Reverse the list
# reverse_linked_list(a)

# Print the list node values after reversal
# print(d.nextnode.name)
# print(c.nextnode.name)
# print(b.nextnode.name)
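A runnable version of the test above, sketched with a hypothetical `names_from` helper that walks the list from a given head so the order can be checked before and after the reversal. `reverse_linked_list` is restated here (same loop as the lecture solution) so the block is self-contained.

```python
class Node(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.nextnode = None


def reverse_linked_list(first_node):
    # Same pointer-reversal loop as the lecture solution above
    current_node, prev_node = first_node, None
    while current_node:
        next_node = current_node.nextnode
        current_node.nextnode = prev_node
        prev_node = current_node
        current_node = next_node
    return prev_node


def names_from(head):
    # Hypothetical helper: walk the list from head and collect node names
    names = []
    while head:
        names.append(head.name)
        head = head.nextnode
    return names


a, b, c, d = Node('A', 1), Node('B', 2), Node('C', 3), Node('D', 4)
a.nextnode, b.nextnode, c.nextnode = b, c, d

print(names_from(a))             # ['A', 'B', 'C', 'D']
new_head = reverse_linked_list(a)
print(names_from(new_head))      # ['D', 'C', 'B', 'A']
print(a.nextnode)                # None: the old head is now the tail
```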


# 3. Linked list nth-to-last node

'''  Write a function that takes a head node and an integer value n and then returns the nth to last node in the
    linked list. '''

# My solution:
def nth_to_last_node(req_node, list_head):
    
    def reverse(head):
        # Reverse the list in place and return the new head
        prev_node = None
        current_node = head
        while current_node:
            next_node = current_node.nextnode
            current_node.nextnode = prev_node
            prev_node = current_node
            current_node = next_node
        return prev_node
    
    # Reverse the list so the last node becomes the first
    new_head = reverse(list_head)
    
    # Hop forward (req_node - 1) times from the new first node
    current_node = new_head
    for _ in range(req_node - 1):
        current_node = current_node.nextnode
    
    value = current_node.value
    
    # Reverse again so the caller's list is left in its original order
    reverse(new_head)
    
    return value

#-----------------------------------------------------------

# Lecture solution:
def nth_to_last_node2(n, head):
    
    left_pointer  = head
    right_pointer = head

    # Move the right pointer n-1 nodes ahead of the head, so the block from left to right pointer spans n nodes
    for i in range(n-1):
        
        # Check for edge case of not having enough nodes!
        if not right_pointer.nextnode:
            raise LookupError('Error: n is larger than the linked list.')

        # Otherwise, we can set the block
        right_pointer = right_pointer.nextnode

    # Move the block down the linked list
    while right_pointer.nextnode:
        left_pointer  = left_pointer.nextnode
        right_pointer = right_pointer.nextnode

    # Now return left pointer, its at the nth to last element!
    return left_pointer.value

# Test lecture solution
a = Node('A', 1)
b = Node('B', 2)
c = Node('C', 3)
d = Node('D', 4)

a.nextnode = b
b.nextnode = c
c.nextnode = d

# nth_to_last_node2(4, a)

# Recursion

- Is a method of solving a problem where the solution depends on solutions to smaller instances of the same problem (as opposed to iteration).
- Iteration is the technique of executing a block of statements within a computer program repeatedly, e.g. with a for or while loop.
- Recursion and iteration can be employed to the same effect. A recursive solution does not need to know in advance how many times the action will repeat, as it simply stops when the base case is reached; iteration can do the same with a while loop whose condition plays the role of the base case, whereas a for loop over a fixed range needs that count beforehand.
- Recursion uses the divide and conquer strategy, breaking a problem down into smaller identical versions of itself until it can't be broken down further, then solving and combining the solutions of all the mini-problems.
- Recursion uses a base case, which is the point at which the problem can't be further sub-divided into an identical version of itself, and is what stops the recursion process.
- The recursive case is the mini-problem being solved at each stage of the recursion process and is the bit that gets repeated.
- There are two main instances of recursion. 

  1. The first is when recursion is used as a technique in which a function makes one or more calls to itself.
  2. The second is when a data structure uses smaller instances of the exact same type of data structure when it      represents itself. Both of these instances are use cases of recursion.
  
  
- The first instance is the most common, in which case the function calls itself feeding the next recursive case as an input, until the base case is reached then the fucntion no longer calls itself and returns the combined results from all the recursive cases.

- The factorial problem is a classic recursion problem where:

    n! = n.(n-1)! = n.(n-1).(n-2).(n-3) ... 2.1 : where n.(n-1)! is the recursive case
    
   When n = 0 is reached, this is the base case and the recursion stops, because 0! = 1 by definition.
   
   - Thus for 4!: break down the problem until it can't be broken down any further, i.e. the base case is reached
   
           4! = 4.(3)!  (can't be solved until we know what 3! is)
           3! = 3.(2)!  (can't be solved until we know what 2! is) 
           2! = 2.(1)!  (can't be solved until we know what 1! is)
           1! = 1.(0)!  (can't be solved until we know what 0! is)
           0! = 1       (Base case reached, we can evaluate this)
       
   - Move back up combining the results of the mini-problems
     
           1! = 1.1 = 1
           2! = 2.solution from previous recursive case (1) = 2
           3! = 3.solution from previous recursive case (2) = 6
           4! = 4.solution from previous recursive case (6) = 24
           
   - Return the solution of the overall problem
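The second instance of recursion listed above, a data structure made of smaller instances of the same structure, can be sketched with a nested Python list, where each element is either a number or another nested list. The function nested_sum and the sample data are illustrative assumptions, not part of the original notes.

```python
def nested_sum(data):
    """Recursively sum every number in a (possibly nested) list of numbers."""
    total = 0
    for item in data:
        if isinstance(item, list):
            # Recursive case: a sub-list is a smaller instance of the same structure
            total += nested_sum(item)
        else:
            # Base case: a plain number contributes its own value
            total += item
    return total

print(nested_sum([1, [2, 3], [4, [5, [6]]]]))  # 21
```

The recursion mirrors the shape of the data: the function calls itself exactly where the structure contains a smaller copy of itself.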
   
   
 ### Factorial problem coded

In [None]:
def factorial(n):
    if n == 0:  # Base case has been reached
        return 1  # The solution to the smallest, directly solvable mini-problem (0! = 1)
    
    else:  # Continue breaking down the problem
        return n*factorial(n-1) # Multiply n by the solution of the smaller mini-problem (n-1)!; the results
                                # are combined on the way back up the chain of calls until the entire
                                # solution is returned.

# factorial(4)

In [None]:
# J. Portilla: Recursion homework

# Problem 1:

''' Write a recursive function which takes an integer and computes the cumulative sum of 0 to that integer
    For example, if n=4 , return 4+3+2+1+0, which is 10. '''

def recursion_sum(n):
    if n == 0:
        return 0
    else:
        return n + recursion_sum(n - 1)

# recursion_sum(10)

# Problem 2:

''' Given an integer, create a function which returns the sum of all the individual digits in that integer.
    For example: if n = 4321, return 4+3+2+1 '''

def sum_func(n):
    if n == 0:
        return 0
    else:
        # Modulo returns the remainder of the division: 4321 % 10 = 1 (the last digit).
        # Floor division returns the integer part of the quotient: 4321 // 10 = 432 (the remaining digits).
        return (n%10) + sum_func(n//10)

# sum_func(4321)

# Problem 3:

''' Create a function called word_split() which takes in a string phrase and a list of words, list_of_words.
    The function will then determine if it is possible to split the string so that every piece is a word
    from the list of words. You can assume the phrase will only contain words found in the dictionary
    if it is completely splittable. 
    
    e.g.
    word_split('themanran',['the','ran','man'])
    >> ['the', 'man', 'ran'] '''

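def word_split(phrase, list_of_words, output=None):
    # NB: this is a greedy approach: it commits to the first word in list_of_words that prefixes the
    # phrase and never backtracks, so it can miss valid splits, and it returns a partial list when the
    # phrase cannot be fully split.
    
    if output is None:  # Avoid a mutable default argument: create a fresh list on the first call
        output = []
        
    for word in list_of_words:
        if phrase.startswith(word):
            output.append(word)
            # Recurse on the rest of the phrase after the matched word
            return word_split(phrase[len(word):], list_of_words, output)

    # Base case: no word matches the start of the (possibly empty) remaining phrase
    return output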
    
# word_split('themanran',['the','ran','man'])
# word_split('themanran',['clown','ran','man'])
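Because the greedy word_split above never backtracks, word_split('catsand', ['cat', 'cats', 'and']) commits to 'cat', finds no word starting 'sand', and returns the partial list ['cat']. A minimal backtracking sketch (the name word_split_backtrack and the convention of returning None for "no split possible" are assumptions, not the lecture solution) tries every matching prefix and undoes a choice that leads to a dead end:

```python
def word_split_backtrack(phrase, list_of_words):
    """Return words from list_of_words that concatenate to phrase, or None if impossible."""
    # Base case: an empty phrase is trivially split into zero words
    if phrase == '':
        return []

    for word in list_of_words:
        # Skip empty words, which would otherwise recurse forever on the same phrase
        if word and phrase.startswith(word):
            # Recursive case: try to split the remainder after this word
            rest = word_split_backtrack(phrase[len(word):], list_of_words)
            if rest is not None:
                return [word] + rest
    # No word led to a complete split: signal failure so the caller tries its next option
    return None

print(word_split_backtrack('catsand', ['cat', 'cats', 'and']))     # ['cats', 'and']
print(word_split_backtrack('themanran', ['clown', 'ran', 'man']))  # None
```

In the worst case backtracking is exponential in the phrase length; memoizing on the remaining phrase (see Memoization below) is the usual remedy.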

## Memoization

- In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again. 
- Memoization has also been used in other contexts (and for purposes other than speed gains), such as in simple mutually recursive descent parsing. 
- Although related to caching, memoization refers to a specific case of this optimization, distinguishing it from forms of caching such as buffering or page replacement. 
- In the context of some logic programming languages, memoization is also known as tabling.

### Implementation
- A memoized function "remembers" the results corresponding to some set of specific inputs. Subsequent calls with remembered inputs return the remembered result rather than recalculating it, thus eliminating the primary cost of a call with given parameters from all but the first call made to the function with those parameters.
- A function can only be memoized if it is referentially transparent; that is, only if calling the function has exactly the same effect as replacing that function call with its return value.
- Memoization is a way to lower a function's time cost in exchange for space cost; that is, memoized functions become optimized for speed in exchange for a higher use of computer memory space.
- For every integer n such that n≥0, the final result of a factorial function is invariant; if invoked as x = factorial(3), the result is such that x will always be assigned the value 6. 
- The non-memoized implementation above, given the nature of the recursive algorithm involved, would require n + 1 invocations of factorial to arrive at a result, and each of these invocations, in turn, has an associated cost in the time it takes the function to return the value computed:

    1. The cost to set up the function call stack frame.
    2. The cost to compare n to 0.
    3. The cost to subtract 1 from n.
    4. The cost to set up the recursive call stack frame. (As above.)
    5. The cost to multiply the result of the recursive call to factorial by n.
    6. The cost to store the return result so that it may be used by the calling context.
    
    
- In a memoized function, if factorial is first invoked with 5, and then invoked later with any value less than or equal to five, those return values will also have been memoized, since factorial will have been called recursively with the values 5, 4, 3, 2, 1, and 0, and the return values for each of those will have been stored. 
- If it is then called with a number greater than 5, such as 7, only 2 recursive calls will be made (7 and 6), and the value for 5! will have been stored from the previous call. In this way, memoization allows a function to become more time-efficient the more often it is called, thus resulting in eventual overall speed up.


##### Memoized factorial Demo

In [None]:
# Create cache for known results
factorial_memo = {}

def factorial_mem(k):
    
    if k < 2: 
        return 1
    
    if k not in factorial_memo:  # Compute and cache k! only if it hasn't been computed before
        factorial_memo[k] = k * factorial_mem(k-1)
        
    return factorial_memo[k]

# factorial_mem(4)

In [None]:
# We can also encapsulate the memoization process into a class
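def fact(k):
    if k < 2:
        return 1
    return k * fact(k - 1)

class Memoize:
    def __init__(self, f):
        self.f = f        # The function being memoized
        self.memo = {}    # Cache: maps a tuple of arguments to the computed result
        
    def __call__(self, *args):
        if args not in self.memo:  # Only compute results for arguments not seen before
            self.memo[args] = self.f(*args)
        return self.memo[args]

# Instantiate the class passing in the factorial function
m_obj = Memoize(fact)

# Rebind the name fact to the memoized object, so the recursive call fact(k - 1) inside the function
# body also goes through the cache (otherwise only the outermost call would be memoized)
fact = m_obj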

# Call the instance of the class via the __call__ method to compute the factorial of a positive integer,
# using the factorial function passed in during instantiation, then print out the return value from the
# object call

# print(m_obj(5))
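Python's standard library already provides this pattern: functools.lru_cache (and functools.cache from Python 3.9) wraps a function in a dictionary-backed cache. A minimal sketch reproducing the "first 5, then 7" scenario from the memoization notes above; fact_cached is an illustrative name.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # maxsize=None: unbounded cache, nothing is ever evicted
def fact_cached(k):
    if k < 2:
        return 1
    # The recursive call goes through the decorated name, so it is cached too
    return k * fact_cached(k - 1)

fact_cached(5)   # computes and caches 5, 4, 3, 2 and 1 (5 misses)
fact_cached(7)   # computes 7 and 6, then reuses the cached value for 5 (2 misses, 1 hit)
print(fact_cached.cache_info())  # CacheInfo(hits=1, misses=7, maxsize=None, currsize=7)
```

As with the Memoize class, the arguments must be hashable, and the function should be referentially transparent for the cached results to be valid.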

In [None]:
# Time the factorial algorithm variations
%timeit factorial(4)
%timeit factorial_mem(4)
%timeit m_obj(4)

In [None]:
# TDD for recursion interview problems

import unittest

class TestRecursionInterviewMethods(unittest.TestCase):
    
    # String reversal
    def test_reverse(self):
        
        self.assertEqual(reverse('hello'), 'olleh')
        self.assertEqual(reverse('hello world'), 'dlrow olleh')
        self.assertEqual(reverse('12345'), '54321')
        
    def test_permute(self):
        self.assertEqual(sorted(permute('abc')), sorted(['abc', 'acb', 'bac', 'bca', 'cab', 'cba']))
        self.assertEqual(sorted(permute('dog')), sorted(['dog', 'dgo', 'odg', 'ogd', 'gdo', 'god']))
        
    def test_fib_ite(self):
        self.assertEqual(fib_ite(1), 1)
        self.assertEqual(fib_ite(9), 34)
        self.assertEqual(fib_ite(10), 55)
        
    def test_fib_rec(self):
        self.assertEqual(fib_rec(1), 1)
        self.assertEqual(fib_rec(9), 34)
        self.assertEqual(fib_rec(10), 55)
        
    def test_fib_dyn(self):
        self.assertEqual(fib_dyn(1), 1)
        self.assertEqual(fib_dyn(9), 34)
        self.assertEqual(fib_dyn(10), 55)  
        
        
        
# Run tests       
if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False, verbosity=2)

In [None]:
# RECURSION INTERVIEW PROBLEMS

# 1. String reversal
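def reverse(s):
    # Base case: an empty or single-character string is its own reverse
    if len(s) <= 1:
        return s
    
    # Recursive case: the last character followed by the reverse of everything before it
    else:
        return s[-1] + reverse(s[:-1])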
    
# 2. String permutation
# (NB) Python's itertools module (itertools.permutations) makes getting string permutations & similar problems easier

def permute(s):
    f_list = []
    
    # Base case: a single character has only one permutation, itself
    if len(s) == 1:
        f_list = [s]

    # Recursive case: for every letter in s, get a list of the permutations of the other characters, then
    # prepend the letter to each of the elements in that permutations list and add the results to f_list.
    else:
        for index, letter in enumerate(s):   # For every letter in s
            
            for permutation in permute(s[:index] + s[index + 1:]):   # For every permutation of the other letters
                f_list += [letter + permutation]
                
    return f_list

# 3. Fibonacci Sequence solution:
def fib_rec(n):  # O(2^n)
    ''' Solve the fibonacci sequence for the nth number recursively.
        For the fibonacci sequence, f(n) = f(n-1) + f(n-2). Start from the desired nth value and work back
        until n == 0 or n == 1.
    '''
    # Base case
    if n == 0 or n == 1:
        return n
    
    # Recursive case
    else:
        return fib_rec(n-1) + fib_rec(n-2)
    
def fib_ite(n):
    ''' Solve the fibonacci sequence for the nth number iteratively.
        Iterate from 0 to n, computing the next fibonacci number until the n iterative steps are complete, then
        return the nth number.
    '''
    
    a,b = 0,1 # Multiple assignment aka tuple unpacking
    
    for i in range(n):
        a,b = b, a+b
    
    return a

# Set up the cache for storing the sequence numbers already computed
n = 10  # Static n: the cache has n + 1 slots, so fib_dyn only supports inputs up to this n
cache = [None]*(n+1)

def fib_dyn(n):
    ''' Solve the fibonacci sequence for the nth using dynamic programming (memoization).
            Uses memoization to store the numbers already computed instead of recalculating them for every
            recursive call
    '''
    
    # Base case
    if n ==0 or n == 1:
        return n
    
    # Check cache
    if cache[n] is not None:
        return cache[n]
    
    # Keep setting cache
    cache[n] = fib_dyn(n-1) + fib_dyn(n-2)
    
    return cache[n]

# 4. Coin Change problem solution:
def coin_rec(target_amt, coin_arr, cached_result):
    ''' Aka the change-making problem, a variant of the knapsack problem. The problem can be solved
        recursively, but a plain recursive approach is inefficient and may even result in errors for certain
        inputs. Its inefficiency arises from the numerous recursive calls that may need to be made to solve the
        problem, but more so because some of the recursive calls compute results that were already computed
        in prior recursive calls. The SOLUTIONS notebook has the plain recursive solution.
        
        A better way to solve this problem is via dynamic programming, where previously computed results are
        stored for re-use further down the recursive tree, so they are simply recalled rather than recomputed
        when traversing back up the recursive tree to get the final result. Here the results are stored in a
        list indexed by amount (cached_result).
    '''
    
    # Set default output to the target
    min_coins = target_amt
    
    # Set the base case: if a coin value equalling the target amount exists, then we need only 1 coin
    if target_amt in coin_arr:
        cached_result[target_amt] = 1
        return 1
    
    # Return the cached result if this amount has already been computed (cache entries start at 0)
    elif cached_result[target_amt] > 0:
        return cached_result[target_amt]
    
    else:
        # For every coin value that is <= target_amt
        for i in [c for c in coin_arr if c <= target_amt]:
            
            num_coins = 1 + coin_rec(target_amt-i, coin_arr, cached_result)
            
            # Reset the minimum if we have a new minimum
            if num_coins < min_coins:
                min_coins = num_coins
                
        # Cache the minimum for this amount (even when it equals the default, e.g. all 1-coins)
        cached_result[target_amt] = min_coins
        return min_coins
    
# Runner code:
target_amt = 26
coin_arr = [1,5,10,25]
cached_result = [0]*(target_amt+1) # Set the cache with zeros equal to the target amount + 1
coin_rec(target_amt, coin_arr, cached_result)
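The NB comment on permute points to itertools. A minimal sketch using itertools.permutations (the wrapper name permute_itertools is hypothetical) produces the same strings as the recursive permute; like permute, it yields duplicate strings when s contains repeated characters.

```python
from itertools import permutations

def permute_itertools(s):
    """All orderings of the characters of s, built with itertools.permutations."""
    # permutations(s) yields tuples of characters, e.g. ('a', 'c', 'b'); join them back into strings
    return [''.join(p) for p in permutations(s)]

print(permute_itertools('abc'))  # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
```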

In [None]:
# Check which of the 3 fibonacci sequence algorithms is fastest
%timeit fib_rec (10)
%timeit fib_ite (10)
%timeit fib_dyn (10)

In [None]:
# See the recursive tree for a recursive solution to the coin change problem with inputs (26, [1,5,10,25])
from IPython.display import Image
Image(url='http://interactivepython.org/runestone/static/pythonds/_images/callTree.png')
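The call tree above shows the same amounts being solved over and over. As a contrast to the top-down, memoized coin_rec, here is a minimal bottom-up (tabulation) sketch of the same dynamic programming idea; the function name min_coins_bottom_up and returning -1 for an amount that can't be made are assumptions of this sketch, not part of the original solution.

```python
def min_coins_bottom_up(target_amt, coin_arr):
    """Fewest coins from coin_arr summing to target_amt, or -1 if it can't be made."""
    INF = float('inf')
    # min_coins[a] = fewest coins needed to make amount a; amount 0 needs 0 coins
    min_coins = [0] + [INF] * target_amt

    # Fill the table from the smallest amount upwards, so every sub-problem
    # (amount - coin) is already solved by the time it is needed
    for amount in range(1, target_amt + 1):
        for coin in coin_arr:
            if coin <= amount and min_coins[amount - coin] + 1 < min_coins[amount]:
                min_coins[amount] = min_coins[amount - coin] + 1

    return min_coins[target_amt] if min_coins[target_amt] != INF else -1

print(min_coins_bottom_up(26, [1, 5, 10, 25]))      # 2  (25 + 1)
print(min_coins_bottom_up(63, [1, 5, 10, 21, 25]))  # 3  (21 + 21 + 21)
```

This runs in O(target_amt * len(coin_arr)) time with no recursion, so deep targets can't hit Python's recursion limit.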

### Tail Recursion

- A recursive function is tail recursive when the recursive call is the last thing executed by the function.
- Tail-recursive functions are considered better than non-tail-recursive functions because tail recursion can be optimized by the compiler. (NB: CPython does not perform this optimization, so in Python a tail-recursive function still uses one stack frame per call and is still subject to the recursion limit.)
- The idea used by compilers to optimize tail-recursive functions is simple: since the recursive call is the last statement, there is nothing left to do in the current function, so saving the current function's stack frame is of no use.
- Consider the following function to calculate the factorial of n. It is a non-tail-recursive function, although it looks tail recursive at first glance.
- If we take a closer look, we can see that the value returned by fact(n-1) is used in fact(n), so the call to fact(n-1) is not the last thing done by fact(n): the multiplication by n happens after it returns.

    def fact(n):
        ''' A NON-tail-recursive function. The value returned by fact(n-1) is used in fact(n),
            so the call to fact(n-1) is not the last thing done by fact(n) '''

        if n == 0:
            return 1

        return n * fact(n-1)
        
- The above function can be written as a tail-recursive function. The idea is to use one more argument and accumulate the factorial value in that second argument. When n reaches 0, return the accumulated value.
- Note that factTR is tail recursive, as the recursive call is its last task; the wrapper fact simply makes a tail call to factTR with an initial accumulator of 1.

    def factTR(n, a): 
        ''' A tail recursive function to calculate factorial '''
        
        if (n == 0): 
            return a 
  
        return factTR(n - 1, n * a) 
  

    def fact(n):
        ''' A wrapper over factTR '''
        
        return factTR(n, 1)
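Because CPython does not eliminate tail calls, the usual way to get the benefit in Python is to rewrite the tail call as a loop by hand. A minimal sketch: factTR mirrors the function above (the default accumulator a=1 is an added convenience), and fact_loop is a hypothetical loop version that applies the same argument update on each iteration.

```python
import sys

def factTR(n, a=1):
    """Tail-recursive factorial: the recursive call is the last thing the function does."""
    if n == 0:
        return a
    return factTR(n - 1, n * a)

def fact_loop(n):
    """The same computation with the tail call rewritten as a loop (what tail-call elimination does)."""
    a = 1
    while n != 0:              # each iteration replaces one tail call
        n, a = n - 1, n * a    # same argument update as factTR(n - 1, n * a)
    return a

print(factTR(5), fact_loop(5))  # 120 120

# CPython still creates one stack frame per tail call, so deep inputs hit the recursion limit
deep_n = sys.getrecursionlimit() + 100
try:
    factTR(deep_n)
except RecursionError:
    print('factTR hit the recursion limit for n =', deep_n)

big = fact_loop(deep_n)  # fine: no recursion involved, so no recursion limit applies
```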

## Trees

- A tree, as a data structure, has a root, branches and leaves, with the root at the top and the leaves at the bottom. It consists of a set of nodes and edges, with the edges connecting pairs of nodes.
- Real-life examples of trees:

    - Animal classifications (Biology) with the more abstract classifications at the top (root) and less abstract classifications at the bottom.
    - A computer file system is structured as a tree
    - An HTML webpage has its elements structured as a tree (the DOM), with the root at the html tag level
    - A book and its chapters, sections and sub-sections.
    
- Tree terminology:
    - Node: a fundamental part of a tree; its name is called the 'key', and any additional data it holds is called the 'payload'. While not being critical
      to many tree algorithms, the payload is often critical to the applications that use the tree structure.
    - Edge: connects two nodes to show that there is a relationship between them (drawn as an arrow). Each node, except the root, has exactly one incoming edge from another node, but each node can have several outgoing edges.
    - Path: an ordered list of nodes connected by edges, e.g. Mammal -> Carnivora -> Felidae -> Felis
    - Children: the set of nodes that have incoming edges from the same node (their parent).
    - Parent: a node connecting other nodes (children) with outgoing edges.
    - Sub-tree: a set of nodes and edges comprised of a parent and all the descendants of that parent.
    - Leaf Node: a node that does not have any children.
    - Level: level of a node 'n' is the number of edges on the path from the root node to 'n' node.
    - Height of a tree: the max level of any node in the tree.
    
- Properties of trees:

    - All nodes except the root have exactly one parent; in hierarchies such as biological classification, a child also inherits the characteristics of its parent.
    - Child nodes of a node are independent of each other.
    - Each node is unique, and so is the path traversing from the root to each node.
    - You can move an entire sub-tree to another section without affecting the structure of the other corresponding sub-trees (same level sub-trees).
    - If each node in the tree has a maximum of two children, then the tree is a binary tree.
     
- Recursive definition of a tree: a tree is either empty or consists of a root and zero or more sub-trees, each of which is itself a tree. The root of each sub-tree is connected to the root of the parent tree by an edge.
- A binary tree can be represented as a list of lists, with the first element of each list being the root node, the second element being the left sub-tree and the third element being the right sub-tree, i.e.:
    
       my_tree = ['a',   # root node at index 0
                     ['b',   # root node of the left subtree of node a
                         ['d', [], []],   # d has no children: its left/right subtrees are empty
                         ['e', [], []]    # e has no children: its left/right subtrees are empty
                     ],
                     ['c',   # root node of the right subtree of node a
                         ['f', [], []],   # f has no children: its left/right subtrees are empty
                         []               # c has no right child
                     ]
                 ]
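Following the recursive definition above, the height of a list-of-lists binary tree can itself be computed recursively. A minimal sketch, assuming every node is a three-element list [value, left, right] with [] for a missing child (the function name height is illustrative). It returns the max level of any node, matching the "Height of a tree" definition, and treats an empty tree as -1 by convention.

```python
my_tree = ['a',
              ['b', ['d', [], []], ['e', [], []]],
              ['c', ['f', [], []], []]]

def height(tree):
    """Max level of any node (edges from the root); an empty tree [] counts as -1."""
    if not tree:  # Base case: empty sub-tree
        return -1
    _, left, right = tree  # [root value, left sub-tree, right sub-tree]
    # Recursive case: one edge down to the taller of the two sub-trees
    return 1 + max(height(left), height(right))

print(height(my_tree))  # 2  (e.g. the path a -> b -> d)
```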
   
   
### Tree Implementation using a list of lists

In [None]:
def binary_tree(root):
    ''' Takes in a root node value and constructs & returns the initial tree with empty subtree '''
    return [root, [], []]

def insert_left(root, new_branch_root):
    ''' Inserts a left branch to a passed in root node of a tree /subtree '''
    
    # Remove the current left child of the passed-in root, then check whether it is an existing subtree
    left_node = root.pop(1)
    
    if len(left_node) > 1:
        # If the current left node of root is not empty, insert a new subtree in its position and allocate the
        # current left node as the left node of the newly created subtree to the left of the passed in root node.
        root.insert(1, [new_branch_root, left_node, []])
    
    else:
        # If the current left node is empty, then insert a new subtree in its place with empty children
        root.insert(1, [new_branch_root, [], []])
        
    return root

def insert_right(root, new_branch_root):
    ''' Inserts a right branch to a passed in root node of a tree /subtree '''
    
    right_node = root.pop(2)
    
    if len(right_node) > 1:
        root.insert(2, [new_branch_root, [], right_node])
    
    else:
        root.insert(2, [new_branch_root, [], []])
        
    return root

def get_root(root):
    ''' Returns the value of the root node passed in '''
    return root[0]

def set_root(root, new_val):
    ''' Re-assigns the value of the root node passed in '''
    root[0] = new_val
    
def get_left_child(root):
    ''' Returns the left subtree of the root node passed in '''
    return root[1]

def get_right_child(root):
    ''' Returns the right subtree of the root node passed in '''
    return root[2]

In [None]:
root = binary_tree(1)

In [None]:
insert_left(root, 'first-left')

In [None]:
insert_left(root, 'second-left')

In [None]:
insert_right(root, 'first-right')

In [None]:
insert_left(root, 'third-left')

In [None]:
insert_right(root, 'second-right')

In [None]:
print(get_left_child(root))

In [None]:
print(get_right_child(root))

In [None]:
set_root(root, 'origin')

In [None]:
get_root(root)

In [None]:
print(get_right_child(root))

In [None]:
print(root)

### Tree implementation using OOP

In [None]:
class BinaryTree:
    
    def __init__(self, root_node_obj):
        ''' Creates a new root tree node with no children '''
        
        self.root = root_node_obj
        self.left_child = None
        self.right_child = None
        
    def insert_left_tree(self, new_node):
        ''' Inserts new_node as the left child. If a left child already exists, it is pushed down one level
        to become the left child of the new node; the right child is left untouched. '''
        
        if self.left_child is None:
            self.left_child = BinaryTree(new_node)
            
        else: 
            new_tree = BinaryTree(new_node)  # Create new tree
            new_tree.left_child = self.left_child  # Assign the current left child as left child of the new tree
            self.left_child = new_tree  # Assign the new tree to the left child of the current root node
            
    def insert_right_tree(self, new_node):
        ''' Inserts new_node as the right child. If a right child already exists, it is pushed down one level
        to become the right child of the new node; the left child is left untouched. '''
        
        if self.right_child is None:
            self.right_child = BinaryTree(new_node)
            
        else: 
            new_tree = BinaryTree(new_node)
            new_tree.right_child = self.right_child
            self.right_child = new_tree
            
    def get_left_child(self):
        return self.left_child
    
    def get_right_child(self):
        return self.right_child
        
    def get_root(self):
        return self.root

In [None]:
root = BinaryTree('a')
root.get_root()

In [None]:
root.get_left_child()

In [None]:
root.get_right_child()

In [None]:
root.insert_left_tree('b')
root.insert_right_tree('c')

In [None]:
root.get_left_child().get_root()

In [None]:
root.get_right_child().get_root()

### Tree Traversal
- Tree traversal can be done using 3 different patterns, the difference between them being the order in which each node is visited:

    - In Pre-order traversal: visit the root node first, then recursively do a pre-order traversal of the left
      sub-tree, followed by a pre-order traversal of the right sub-tree.
    
    - In Inorder traversal: do a recursive inorder traversal of the left sub-tree, visit the root node then do
      a recursive inorder traversal of the right sub-tree.
      
    - In Postorder traversal: recursively do a postorder traversal of the left sub-tree, followed by a recursive
      postorder traversal of the right sub-tree, then finally visit the root node.

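(NB) Pre-order and post-order traversal extend directly to trees whose nodes have more than two children (visit the root, then each child's sub-tree in turn, or the reverse). In-order traversal is mainly defined for binary trees, since it needs a single position for the root between the left and right sub-trees.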

- Code implementations of pre-order traversal:

    - Using an external function outside a Tree class (recommended):
    
          def preorder(tree):
              if tree:  # (BASE CASE) If the current tree object is None, do nothing and return
                  print(tree.get_root())            # Visit the root node
                  preorder(tree.get_left_child())   # Recursive call on the left sub-tree
                  preorder(tree.get_right_child())  # Recursive call on the right sub-tree
                
    - Using a method of a Tree class; in this case the method must check that each child exists
      before making the recursive call on it.
      
          def preorder(self):
              print(self.root)
              if self.left_child:
                  self.left_child.preorder()
                  
              if self.right_child:
                  self.right_child.preorder()
                  
- Code implementation of inorder traversal:

          def inorder(tree):
              if tree:
                  inorder(tree.get_left_child())
                  print(tree.get_root())  # Visit the root node between visits to the left & right sub-trees
                  inorder(tree.get_right_child())
                 
- Code implementation of postorder traversal:

          def postorder(tree):
              if tree:
                  postorder(tree.get_left_child())
                  postorder(tree.get_right_child())
                  print(tree.get_root())  # Visit the root node after visits to the left & right sub-trees
                 

### Implementing Tree traversal using the BinaryTree class above

In [None]:
# Create a tree with 'Book' as the root node and two empty child nodes
root = BinaryTree('Book')

In [None]:
# Insert the top level (chapters) as the left and right children of the root
root.insert_left_tree('Chapter 1')
root.insert_right_tree('Chapter 2')

In [None]:
# Insert the sections of Chapter 1 as its left and right children
chapter_1 = root.get_left_child()
chapter_1.insert_left_tree('Section 1.1')
chapter_1.insert_right_tree('Section 1.2')

In [None]:
# Insert the sections of Chapter 2 as its left and right children
chapter_2 = root.get_right_child()
chapter_2.insert_left_tree('Section 2.1')
chapter_2.insert_right_tree('Section 2.2')

In [None]:
# Traverse the complete tree
def preorder(tree):
    if tree:
        print(tree.get_root())
        preorder(tree.get_left_child())
        preorder(tree.get_right_child())
        
# Traverse tree
preorder(root)
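For a side-by-side comparison of the three orders, here is a small self-contained sketch. It uses a hypothetical minimal TreeNode class (not the BinaryTree class above) and collects the visited keys in a list instead of printing them; the function names preorder_list, inorder_list and postorder_list are illustrative. On an expression parse tree, the three orders give prefix, infix and postfix notation respectively.

```python
class TreeNode:
    """Minimal binary tree node (a hypothetical stand-in for the BinaryTree class above)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def preorder_list(tree, out):
    if tree:
        out.append(tree.key)             # root first
        preorder_list(tree.left, out)
        preorder_list(tree.right, out)
    return out

def inorder_list(tree, out):
    if tree:
        inorder_list(tree.left, out)
        out.append(tree.key)             # root between the two sub-trees
        inorder_list(tree.right, out)
    return out

def postorder_list(tree, out):
    if tree:
        postorder_list(tree.left, out)
        postorder_list(tree.right, out)
        out.append(tree.key)             # root last
    return out

# The expression (3 + 4) * 5 as a parse tree
expr = TreeNode('*', TreeNode('+', TreeNode(3), TreeNode(4)), TreeNode(5))
print(preorder_list(expr, []))   # ['*', '+', 3, 4, 5]   prefix notation
print(inorder_list(expr, []))    # [3, '+', 4, '*', 5]   infix (without parentheses)
print(postorder_list(expr, []))  # [3, 4, '+', 5, '*']   postfix notation
```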

## Priority Queues using Binary Heaps

- A priority queue is a variant of a queue, with the distinction that enqueuing an item positions it anywhere in the priority queue based on its priority key, rather than always at the back.
- The highest-priority items are positioned at the front of the queue, based on the priority criteria.
- Priority queues are commonly implemented using a binary heap data structure, which allows both enqueuing and dequeuing in O(log n) time.
- A binary heap is defined as a binary tree with the additional properties:

    - Shape property: a binary heap is a complete binary tree; that is, all levels of the tree, except possibly the last (deepest) one, are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left to right.
    - Heap property: the key stored in each node is either greater than or equal to (≥) or less than or equal to (≤) the keys in the node's children, according to some total order on the keys.
    - Heaps where the parent key is greater than or equal to (≥) the child keys are called max-heaps; those where it is less than or equal to (≤) are called min-heaps.
    
- Efficient (logarithmic time) algorithms are known for the two operations needed to implement a priority queue on a binary heap: inserting an element, and removing the smallest or largest element from a min-heap or max-heap, respectively.
- Binary heaps are also commonly employed in the heapsort sorting algorithm, which is an in-place algorithm because binary heaps can be implemented as an implicit data structure (using little info to describe its structure), storing keys in an array and using their relative positions within that array to represent child-parent relationships.
    
- Priority queues implementation using binary heaps:

    - A min heap: where the smallest key has the highest priority i.e positioned at the front.
    - A max heap: where the largest key has the highest priority i.e positioned at the front.
    
- To keep binary heap operations efficient (O(log n)), we need to keep the tree balanced, i.e.:

    - Have roughly the same number of nodes in the left and right sub-trees of the root (a complete binary tree)
    - Every level except the last should be completely filled.
    
- A complete binary heap can be represented implicitly using an array, without nodes and pointers, using the relation below (with 1-based indexing, i.e. index 0 of the array is left unused, as in the MinBinHeap implementation below):
    
    - If a parent node is at index i,
    - Then its left child will be at index 2i,
    - Its right child will be at index 2i + 1, and the parent of the node at index i is at index i // 2.
    
- See wikipedia link for Heap operations: https://en.wikipedia.org/wiki/Binary_heap
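Python's standard library heapq module implements a binary min-heap on a plain list. Unlike the 1-based layout above, it uses 0-based indexing: the children of heap[i] are heap[2*i + 1] and heap[2*i + 2]. A minimal priority queue sketch; the (priority, task) tuples are illustrative data, and a max-heap is commonly emulated by pushing negated keys. Note that a heapq-based sort builds a new output list, so unlike the heapsort described above it is not in-place.

```python
import heapq

pq = []
for priority, task in [(3, 'write report'), (1, 'fix outage'), (2, 'review PR')]:
    heapq.heappush(pq, (priority, task))  # O(log n); tuples compare by priority first

# The smallest key (the highest priority in a min-heap) is always at index 0
print(pq[0])  # (1, 'fix outage')

# Dequeue in priority order; each pop is O(log n)
while pq:
    print(heapq.heappop(pq))
# (1, 'fix outage')
# (2, 'review PR')
# (3, 'write report')

# Heap sort sketch: heapify is O(n), then n pops return the items in ascending order
data = [5, 1, 4, 2, 3]
heapq.heapify(data)
print([heapq.heappop(data) for _ in range(len(data))])  # [1, 2, 3, 4, 5]
```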

In [None]:
from IPython.display import Image
Image("MinHeapAndMaxHeap.png")

In [None]:
# Binary Heap implemented as an array (Max)
from IPython.display import Image
Image("BinHeap.png")

In [1]:
class MinBinHeap:
    
    def __init__(self):
        ''' Initializes a new, empty min BH. The list starts with a placeholder 0 at index 0 that is not part
        of the BH values: parent/child indices are calculated with integer arithmetic (2i, 2i + 1, i // 2),
        which only works if the root is at index 1 '''
        
        self.heap_list = [0]
        self.current_size = 0
    
    def insert(self, new_item):
        ''' Inserts a new item at the lowest level of the BH, which corresponds to the last slot in the
        implementation array, then up-heaps it (percolates it up) if necessary '''
        
        self.heap_list.append(new_item)
        self.current_size = self.current_size + 1
        self.perc_up(self.current_size)  # percolate the new item up the BH if necessary to fix BH heap property
    
    def perc_up(self, index):
        ''' Percolates the item at position index (a newly inserted item) up the BH, if necessary, to fix the
        BH heap property '''
        
        while index // 2 > 0:
            # Each step moves from a node to its parent (index // 2), and a heap with n items has height
            # log n, so at most about log n iterations are needed: O(log n). Only the item and its parent
            # need to be compared, because every other parent/child pair already satisfied the heap
            # property before the new item was added.
            
            if self.heap_list[index] < self.heap_list[index // 2]:
                # If the child is less than its parent, swap their positions in the BH
                
                tmp = self.heap_list[index // 2]
                self.heap_list[index // 2] = self.heap_list[index]
                self.heap_list[index] = tmp
            
            index = index // 2  # Move up to the parent position and start the above process again
    
    def del_min(self):
        ''' Deletes and returns the root item (the least value in a min BH), corresponding to the item at
        index 1 of the implementation array. The operations are:
        
            1. Save the root item so it can be returned
            2. Place the last item in the BH at the root position and remove the last slot
            3. Percolate this new root item down the BH to its rightful position based on the heap property '''
        
        min_val = self.heap_list[1]  # Save the min/root item to pop
        self.heap_list[1] = self.heap_list[self.current_size]  # Replace the root item with copy of the last item
        self.current_size = self.current_size - 1  # Reduce the size of the BH by 1
        self.heap_list.pop()  # Remove the last item from the BH
        self.perc_down(1)  # Percolate the new root down the BH to its rightful position
        
        return min_val   
    
    def perc_down(self, i):
        ''' Percolates the item at index i down the BH, if necessary, to restore the heap property
        (used after the root is replaced in del_min, and by build_heap) '''
        
        while (i * 2) <= self.current_size:  # Stop once the node at i has no children
            min_child_index = self.min_child(i)  # Index of the smaller of i's children
            
            if self.heap_list[i] > self.heap_list[min_child_index]:  # Parent is larger than its smaller child
                # Swap, so the smaller child becomes the parent of this sub-tree
                self.heap_list[i], self.heap_list[min_child_index] = (
                    self.heap_list[min_child_index], self.heap_list[i])
            
            i = min_child_index  # Move down to the swapped position and repeat the process
            
    def min_child(self, parent_index):
        ''' Returns the index of a parent's min child '''
        
        if (parent_index * 2 + 1) > self.current_size:
            
            return parent_index * 2  # if the right child does not exist, return the left child
        
        else:  # If a right child exists, then a left child also exists
            
            if self.heap_list[parent_index*2] < self.heap_list[parent_index*2+1]:
                return parent_index * 2
            else:
                return parent_index * 2 + 1           
            
    def build_heap(self, keys_list):
        ''' Build a binary heap from a list of keys/ values '''
        
        max_parent_index = len(keys_list) // 2
        
        self.current_size = len(keys_list)
        self.heap_list = [0] + keys_list[:]  # populate a heap list array with a copy of keys_list values
        
        while max_parent_index > 0:
            self.perc_down(max_parent_index)   # Re-organize the heap list array to meet heap property
            max_parent_index = max_parent_index - 1
            
    def __str__(self):
        return str(self.heap_list)

In [2]:
mbh = MinBinHeap()
mbh.build_heap([10,5])

In [3]:
print(mbh)

[0, 5, 10]


In [4]:
mbh.del_min()

5

In [5]:
print(mbh)

[0, 10]


In [6]:
mbh.insert(5)

In [7]:
print(mbh)

[0, 5, 10]


In [8]:
mbh.insert(2)

In [9]:
print(mbh)

[0, 2, 10, 5]
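For comparison, Python's standard-library `heapq` module provides the same min-heap operations on a plain 0-indexed list, so a node at index `i` has children at `2*i + 1` and `2*i + 2` (parent at `(i - 1) // 2`) instead of the padded `2*i` / `2*i + 1` layout used by `MinBinHeap`. This is only an illustrative sketch; the sample keys and the `is_min_heap` helper are hypothetical.

```python
import heapq

def is_min_heap(lst):
    ''' Hypothetical helper: checks the heap property on a 0-indexed list '''
    return all(lst[(i - 1) // 2] <= lst[i] for i in range(1, len(lst)))

keys = [9, 5, 6, 2, 3]
heap = keys[:]                  # copy, so the original list is left untouched
heapq.heapify(heap)             # bottom-up O(n) build, the same idea as build_heap
heapq.heappush(heap, 1)         # append + percolate up, like insert/perc_up
smallest = heapq.heappop(heap)  # remove the root + percolate down, like del_min

# Repeatedly removing the minimum yields the keys in ascending order (heap sort)
ordered = [heapq.heappop(heap) for _ in range(len(heap))]
print(smallest, ordered)  # 1 [2, 3, 5, 6, 9]
```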


## Binary Search Trees

- BSTs, aka ordered/sorted binary trees, are another way of implementing the map ADT (abstract data type).
- BSTs follow the 'BST Property': keys less than a node's key are found in its left subtree and keys greater than it are found in its right subtree, i.e.:
    
    - The left subtree of a node contains only nodes with keys less than the node's key.
    - The right subtree of a node contains only nodes with keys greater than the node's key (the implementation below also places keys equal to the node's key on the right).
    - Both the left and right subtrees of the node must also satisfy the above two properties.

- A phonebook can be implemented as a BST, where a contact's number can be searched using their name.

In [None]:
# Building out a BST from given keys

from IPython.display import Image
Image(url='https://images.slideplayer.com/27/9257239/slides/slide_47.jpg')

In [None]:
# BST with more levels

from IPython.display import Image
Image(url='https://cs.lmu.edu/~ray/images/bstexample.png')

In [10]:
# Implementation of a BST

class TreeNode:
    ''' Represents a node of the BST '''
    
    def __init__(self,key,val,left=None,right=None,parent=None):
        ''' Initializes a new BST node with the following properties '''
        
        self.key = key
        self.payload = val
        self.leftChild = left
        self.rightChild = right
        self.parent = parent

    def hasLeftChild(self):
        return self.leftChild

    def hasRightChild(self):
        return self.rightChild

    def isLeftChild(self):
        ''' Checks if the current node has a parent and, if it has one, whether that parent's left child is
        the current node (self) '''
        
        return self.parent and self.parent.leftChild == self

    def isRightChild(self):
        ''' Checks if the current node has a parent and, if it has one, whether that parent's right child is
        the current node (self) '''
        
        return self.parent and self.parent.rightChild == self

    def isRoot(self):
        return not self.parent

    def isLeaf(self):
        return not (self.rightChild or self.leftChild)

    def hasAnyChildren(self):
        return self.rightChild or self.leftChild

    def hasBothChildren(self):
        return self.rightChild and self.leftChild
    
    def findMin(self):
        ''' Returns the node with the smallest key in this subtree (its leftmost node); a helper for findSuccessor '''
        
        current = self
        while current.hasLeftChild():
            current = current.leftChild
            
        return current

    def replaceNodeData(self,key,value,lc,rc):
        self.key = key
        self.payload = value
        self.leftChild = lc
        self.rightChild = rc
        if self.hasLeftChild():
            self.leftChild.parent = self
        if self.hasRightChild():
            self.rightChild.parent = self
            

class BinarySearchTree:
    ''' Manages the BST '''

    def __init__(self):
        ''' Initializes a new empty BST '''
        
        self.root = None
        self.size = 0

    def __len__(self):
        ''' Special Python method that allows the built-in len() function to return the size of the BST '''
        
        return self.size

    def _put(self,key,val,currentNode):
        ''' Insert the new BST node in the correct position (keys equal to currentNode's key go right) '''
        
        if key < currentNode.key:
            if currentNode.hasLeftChild():
                self._put(key,val,currentNode.leftChild)
            else:
                currentNode.leftChild = TreeNode(key,val,parent=currentNode)
        else:
            if currentNode.hasRightChild():
                self._put(key,val,currentNode.rightChild)
            else:
                currentNode.rightChild = TreeNode(key,val,parent=currentNode)
    
    def put(self,key,val):
        ''' Checks if the BST is empty when attempting to insert a new BST node '''
        
        if self.root:
            self._put(key,val,self.root)
        else:
            self.root = TreeNode(key,val)
        self.size = self.size + 1

    def __setitem__(self,k,v):
        ''' Setter magic method to insert a new BST node. It hides the put() call behind dictionary-style
        assignment, so nodes can be inserted in this format:
        
            class_object = BinarySearchTree()
            class_object[2] = 'r'
            
        instead of:
            
            class_object.put(2, 'r')
            
        '''
        
        self.put(k,v)

    def _get(self,key,currentNode):
        ''' Fetches the required BST node '''
        
        if not currentNode:
            return None
        elif currentNode.key == key:
            return currentNode
        elif key < currentNode.key:
            return self._get(key,currentNode.leftChild)
        else:
            return self._get(key,currentNode.rightChild)
        
    def get(self,key):
        ''' Checks if the BST is empty when attempting to fetch a BST node'''
        
        if self.root:
            res = self._get(key,self.root)
            if res:
                
                return res.payload
            else:
                return None
        else:
            return None

    def __getitem__(self,key):
        ''' Getter magic method that fetches a key's payload (value). It hides the get() call behind
        dictionary-style indexing, so values can be fetched in this format:
        
            class_object = BinarySearchTree()
            fetched_value = class_object[2]
            
        instead of:
            
            class_object.get(2)
            
        '''
        return self.get(key)
    
    def __contains__(self,key):
        ''' Special Python method that allows checking whether a certain key exists in the BST without
        retrieving its payload. It is invoked by the 'in' keyword:
        
             key in class_object
             
        '''
        
        if self._get(key,self.root):
            return True
        else:
            return False
    
    def spliceOut(self, succ):
        ''' Detaches the successor node from its current position, linking its only child (if any)
        directly to its parent '''
        
        if succ.isLeaf():
            if succ.isLeftChild():
                succ.parent.leftChild = None
            else:
                succ.parent.rightChild = None
                
        elif succ.hasAnyChildren():
            if succ.hasLeftChild():
                if succ.isLeftChild():
                    succ.parent.leftChild = succ.leftChild
                else:
                    succ.parent.rightChild = succ.leftChild
                succ.leftChild.parent = succ.parent
            else:
                if succ.isLeftChild():
                    succ.parent.leftChild = succ.rightChild
                else:
                    succ.parent.rightChild = succ.rightChild
                succ.rightChild.parent = succ.parent
        
                
    def findSuccessor(self, currentNode):
        ''' Finds the successor node (the next larger key) to replace a node being deleted '''
        
        succ = None
        
        if currentNode.hasRightChild():
            succ = currentNode.rightChild.findMin()
        else:
            if currentNode.parent:
                if currentNode.isLeftChild():
                    succ = currentNode.parent
                    
                else:
                    # Temporarily detach currentNode so the parent's successor search skips it
                    currentNode.parent.rightChild = None
                    succ = self.findSuccessor(currentNode.parent)
                    currentNode.parent.rightChild = currentNode
        return succ
                
    def remove(self,currentNode):
        ''' Deletes a node '''
        
        if currentNode.isLeaf(): #leaf
            if currentNode == currentNode.parent.leftChild:
                currentNode.parent.leftChild = None
            else:
                currentNode.parent.rightChild = None
                
        elif currentNode.hasBothChildren(): #interior
            succ = self.findSuccessor(currentNode)
            print(succ.payload)  # Debug output: shows which successor replaces the deleted node
            self.spliceOut(succ)
            currentNode.key = succ.key
            currentNode.payload = succ.payload

        else: # this node has one child
            if currentNode.hasLeftChild():
                if currentNode.isLeftChild():
                    currentNode.leftChild.parent = currentNode.parent
                    currentNode.parent.leftChild = currentNode.leftChild
                    
                elif currentNode.isRightChild():
                    currentNode.leftChild.parent = currentNode.parent
                    currentNode.parent.rightChild = currentNode.leftChild
                    
                else:
                
                    currentNode.replaceNodeData(currentNode.leftChild.key,
                                    currentNode.leftChild.payload,
                                    currentNode.leftChild.leftChild,
                                    currentNode.leftChild.rightChild)
            else:
                
                if currentNode.isLeftChild():
                    currentNode.rightChild.parent = currentNode.parent
                    currentNode.parent.leftChild = currentNode.rightChild
                elif currentNode.isRightChild():
                    currentNode.rightChild.parent = currentNode.parent
                    currentNode.parent.rightChild = currentNode.rightChild
                else:
                    currentNode.replaceNodeData(currentNode.rightChild.key,
                                    currentNode.rightChild.payload,
                                    currentNode.rightChild.leftChild,
                                    currentNode.rightChild.rightChild)
    
    def delete(self, key):
        ''' Deletes the node with the given key, raising KeyError if the key is not in the BST '''
        
        if self.size > 1:
            node_to_remove = self._get(key, self.root)
            
            if node_to_remove:
                self.remove(node_to_remove)
                self.size = self.size - 1
            else:
                raise KeyError('Error, key not in the tree')
                
        elif self.size == 1 and self.root.key == key:
            self.root = None
            self.size = self.size - 1
            
        else:
            raise KeyError('Error, key not in the tree')
            
    def __delitem__(self, key):
        ''' Special method that allows a node to be deleted with:
        
            del class_object[2]
            
            instead of:
            
            class_object.delete(2)
        '''
        
        self.delete(key)

In [11]:
# Create a BST
test_bst = BinarySearchTree()

In [12]:
# Ensure the BST is empty by testing the size of the BST
len(test_bst)

0

In [13]:
# Add nodes
test_bst[3] = 'red'
test_bst[4] = 'blue'
test_bst[6] = 'yellow'
test_bst[2] = 'green'
test_bst[1] = 'orange'
test_bst[2.5] = 'black'
test_bst[3.5] = 'pink'

In [14]:
# Test the BST is not empty
len(test_bst)

7

In [15]:
# Test which node is the root
root_node = test_bst.root
root_node.payload

'red'

In [16]:
# View the root's children
left_child = root_node.leftChild
right_child = root_node.rightChild

print(left_child.payload)
print(right_child.payload)

green
blue


In [17]:
# Test if the root's children are leaves; if not, show their children

if left_child.hasAnyChildren():
    if left_child.hasBothChildren():
        print("Left child's, left child is {}".format(left_child.leftChild.payload))
        print("Left child's, right child is {}".format(left_child.rightChild.payload))
    elif left_child.hasLeftChild():
        print("Left child's, left child is {}".format(left_child.leftChild.payload))
    else:
        print("Left child's, right child is {}".format(left_child.rightChild.payload))
else:
    print('Left child is a leaf')
        
if right_child.hasAnyChildren():
    if right_child.hasBothChildren():
        print("Right child's, left child is {}".format(right_child.leftChild.payload))
        print("Right child's, right child is {}".format(right_child.rightChild.payload))
    elif right_child.hasLeftChild():
        print("Right child's, left child is {}".format(right_child.leftChild.payload))
    else:
        print("Right child's, right child is {}".format(right_child.rightChild.payload))
else:
    print('Right child is a leaf')

Left child's, left child is orange
Left child's, right child is black
Right child's, left child is pink
Right child's, right child is yellow


In [18]:
# Test a key exists before fetching the node's value
if 4 in test_bst:
    print(test_bst[4])
else:
    print('Node not in BST')

if 10 in test_bst:
    print(test_bst[10])
else:
    print('Node not in BST')

blue
Node not in BST


In [19]:
# Delete a node
del test_bst[4]

yellow


In [20]:
# Check that the size of the tree has reduced
len(test_bst)

6

In [21]:
# View updated tree:

# Check root
print('Root is {}'.format(test_bst.root.payload))

# Check root's children
print("Root's left child is {}".format(root_node.leftChild.payload))
print("Root's right child is {}".format(root_node.rightChild.payload))

# Check root's grandchildren
if left_child.hasAnyChildren():
    if left_child.hasBothChildren():
        print("Left child's, left child is {}".format(left_child.leftChild.payload))
        print("Left child's, right child is {}".format(left_child.rightChild.payload))
    elif left_child.hasLeftChild():
        print("Left child's, left child is {}".format(left_child.leftChild.payload))
    else:
        print("Left child's, right child is {}".format(left_child.rightChild.payload))
else:
    print('Left child is a leaf')
        
if right_child.hasAnyChildren():
    if right_child.hasBothChildren():
        print("Right child's, left child is {}".format(right_child.leftChild.payload))
        print("Right child's, right child is {}".format(right_child.rightChild.payload))
    elif right_child.hasLeftChild():
        print("Right child's, left child is {}".format(right_child.leftChild.payload))
    else:
        print("Right child's, right child is {}".format(right_child.rightChild.payload))
else:
    print('Right child is a leaf')

Root is red
Root's left child is green
Root's right child is yellow
Left child's, left child is orange
Left child's, right child is black
Right child's, left child is pink


In [22]:
float("-inf")

-inf

In [None]:
# TREE INTERVIEW PROBLEMS

class Node:
    ''' Tree class for the interview problems '''
    
    def __init__(self, key, value=None, parent=None):
        self.key = key
        self.value = value
        self.leftChild = None
        self.rightChild = None
        self.parent = parent
        
    def insert(self, key, currentNode):
        ''' Insert a new BST node in the correct position '''
        
        if key < currentNode.key:
            if currentNode.leftChild:
                self.insert(key, currentNode.leftChild)
            else:
                currentNode.leftChild = Node(key, parent=currentNode)
        else:
            if currentNode.rightChild:
                self.insert(key, currentNode.rightChild)
            else:
                currentNode.rightChild = Node(key, parent=currentNode)

# ----------------------------------------------------------------------    
# 1. VALIDATE BST

def tree_max(node):
    ''' Returns the largest key in the subtree rooted at node (-inf for an empty subtree) '''
    
    if not node:
        return float("-inf")
    maxleft = tree_max(node.leftChild)
    maxright = tree_max(node.rightChild)
    return max(node.key, maxleft, maxright)

def tree_min(node):
    ''' Returns the smallest key in the subtree rooted at node (inf for an empty subtree) '''
    
    if not node:
        return float("inf")
    minleft  = tree_min(node.leftChild)
    minright = tree_min(node.rightChild)
    return min(node.key, minleft, minright)

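def validate_bst(tree):
    ''' Method 1: Validates a BST using pre-order tree traversal. Each node's key must be >= every key in
    its left subtree and <= every key in its right subtree, and both subtrees must be valid BSTs.
    (NB) tree_max/tree_min rescan each subtree, so this is O(n^2) in the worst case (a list-like tree) '''
    
    if not tree:
        return True  # An empty (sub)tree is a valid BST
    
    # Visit the root
    print(tree.key)
    
    # Check the node against its left and right subtrees
    if tree.leftChild and tree_max(tree.leftChild) > tree.key:
        return False
    if tree.rightChild and tree_min(tree.rightChild) < tree.key:
        return False
    
    # Recurse, propagating any failure found deeper in the tree
    return validate_bst(tree.leftChild) and validate_bst(tree.rightChild)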


traversed_list = []

def validate_bst_trick(tree):
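    ''' Method 2: Using a key list sort trick.
    
        Rationale > If a BST is traversed in-order and the node keys are stored in a list, they will be
        in ascending order, so the list must equal its sorted copy.
        (NB) The keys are collected in the global traversed_list, which must be reset before each call.
    '''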
    
    if tree:
        validate_bst_trick(tree.leftChild)
        traversed_list.append(tree.key)
        validate_bst_trick(tree.rightChild)
    
    return traversed_list == sorted(traversed_list)

# -----------------------------------------------------------------------
# 2. PRINT BST LEVELS
import collections

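def levelOrderPrint(tree):
    ''' Prints the BST keys level by level (breadth-first), one level per line, using a queue plus counts
    of the nodes left on the current level and queued for the next level '''
    if not tree:
        return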
    nodes=collections.deque([tree])
    currentCount, nextCount = 1, 0
    while len(nodes)!=0:
        currentNode=nodes.popleft()
        currentCount-=1
        print(currentNode.key, ' ', end ='')
        if currentNode.leftChild:
            nodes.append(currentNode.leftChild)
            nextCount+=1
        if currentNode.rightChild:
            nodes.append(currentNode.rightChild)
            nextCount+=1
        if currentCount==0:
            #finished printing current level
            print('\n')
            currentCount, nextCount = nextCount, currentCount
            
# -----------------------------------------------------------------------
# 3. TRIM BST
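def required_nodes(node, min_key, max_key):
    ''' Saves, in pre-order, the keys of the nodes within the required range '''
    
    if not node:
        return
    if min_key <= node.key <= max_key:
        traversed_list.append(node.key)
    # Even if this node is out of range, its subtrees may hold in-range keys: the left subtree (smaller
    # keys) is only worth searching if node.key > min_key, and the right subtree (keys >= node.key)
    # only if node.key <= max_key
    if node.key > min_key:
        required_nodes(node.leftChild, min_key, max_key)
    if node.key <= max_key:
        required_nodes(node.rightChild, min_key, max_key)
        
def trim_bst(node, min_key, max_key):
    ''' Generates a new tree with node keys between the min and max parameters provided '''
    
    required_nodes(node, min_key, max_key)
    if not traversed_list:
        print('No keys in the range {} to {}'.format(min_key, max_key))
        return
    new_tree_root = Node(traversed_list[0])
    
    # Re-inserting the keys in pre-order keeps the original relative shape of the tree
    for node_key in traversed_list[1:]:
        new_tree_root.insert(node_key, new_tree_root)
        
    levelOrderPrint(new_tree_root)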
        
# TRIM BST OPTIMAL SOLUTION: O(n) time, plus O(h) extra space for the recursion stack (h = tree height, n in the worst case)
def trim_bst_opt(node, min_val, max_val):
    ''' Uses post-order tree traversal, to build the tree meeting the constraints bottom-up '''
    
    if not node:
        return
    
    # Visit the left subtree, then the right subtree
    node.leftChild = trim_bst_opt(node.leftChild, min_val, max_val)
    node.rightChild = trim_bst_opt(node.rightChild, min_val, max_val)
    
    # Visit the subtree root
    # If the current node meets the key constraints, leave it intact & return it
    if min_val <= node.key <= max_val:
        return node
    
    # If the node's key is less than required, drop the node + all its left side descendants as they will also
    # not meet the constraint (Achieved by only returning the right subtree of the node)
    if node.key < min_val:
        return node.rightChild
    
    # If the node's key is greater than required, drop the node + all its right side descendants as they will also
    # not meet the constraint (Achieved by only returning the left subtree of the node)
    if node.key > max_val:
        return node.leftChild
    

# ---------------------- TESTING CODE ------------------------------------

root = Node(10, "Ten")
root.leftChild = Node(5, "Five")
root.rightChild = Node(20, "Twenty")
root.leftChild.leftChild = Node(3, "Three")
root.leftChild.rightChild = Node(7, "Seven")
root.rightChild.leftChild = Node(15, 'Fifteen')
root.rightChild.rightChild = Node(25, 'Twentyfive')

print('\n--------- BST validation: Test the function for True ------------')
print('Classical BST test yields: {}\n'.format(validate_bst(root)))
traversed_list = []
print('Trick BST test yields: {}'.format(validate_bst_trick(root)))
print(traversed_list)

print('\n--------- Print BST Levels ------------')
levelOrderPrint(root)

print('\n--------- Trim BST ------------')
traversed_list = []
trim_bst(root, 4, 21)

print('\n--------- Trim BST (Optimal Solution) ------------')
root = trim_bst_opt(root, 4, 21)  # The root itself may be trimmed, so keep the returned root
levelOrderPrint(root)
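A common alternative to `validate_bst` passes the allowed key range down the tree, so each node is checked exactly once (O(n) time) instead of rescanning subtrees with `tree_max`/`tree_min`. The sketch below assumes the convention of the `insert` methods above (keys equal to a node's key go right, so a node's range is `[low, high)`); `TNode` and `is_bst` are hypothetical names, and `TNode` only mirrors the `key`/`leftChild`/`rightChild` attributes of `Node`.

```python
import math

class TNode:
    ''' Hypothetical node mirroring Node's key/leftChild/rightChild attributes '''
    def __init__(self, key, left=None, right=None):
        self.key, self.leftChild, self.rightChild = key, left, right

def is_bst(node, low=-math.inf, high=math.inf):
    ''' True if every key in this subtree lies in [low, high) and the BST property holds below it '''
    if node is None:
        return True
    if not (low <= node.key < high):
        return False
    # Left keys must stay below node.key; right keys may be >= node.key (equal keys go right)
    return (is_bst(node.leftChild, low, node.key) and
            is_bst(node.rightChild, node.key, high))

valid = TNode(10, TNode(5, TNode(3), TNode(7)), TNode(20, TNode(15), TNode(25)))
broken = TNode(10, TNode(5, TNode(3), TNode(12)), TNode(20))  # 12 sits in 10's left subtree
print(is_bst(valid), is_bst(broken))  # True False
```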

# Search Algorithms
- Focuses on how to efficiently look for items within data structures.

## Sequential search (Brute force search)
- Involves checking each element of a data structure (e.g. an array) in turn; it is especially useful when the elements are not ordered.
- Unordered list search analysis (number of comparisons):

| Case | Best Case | Worst Case | Average Case |
| :---: | :---: | :---: | :---: |
| Item present | 1 (Item is the 1st element) | n (Item is the last element) | n/2 (Item is in the middle) |
| Item NOT present | n | n | n |

- Ordered list search analysis (number of comparisons):

| Case | Best Case | Worst Case | Average Case |
| :---: | :---: | :---: | :---: |
| Item present | 1 | n | n/2 |
| Item NOT present | 1 (the 1st element is greater than the search item) | n (the search item is greater than every element) | n/2 |
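The table entries can be checked by counting comparisons directly. The sketch below is an illustration on a hypothetical sorted list of n = 10 items; averaging over every possible target gives (n + 1) / 2, which is the n/2 in the tables for large n. `comparisons_unordered` and `comparisons_ordered` are hypothetical helper names.

```python
def comparisons_unordered(items, target):
    ''' Number of element comparisons a sequential search makes before it stops '''
    count = 0
    for x in items:
        count += 1
        if x == target:
            break
    return count

def comparisons_ordered(items, target):
    ''' Same, but on a sorted list: stop as soon as an element is >= the target '''
    count = 0
    for x in items:
        count += 1
        if x >= target:
            break
    return count

data = list(range(1, 11))  # n = 10, already sorted
n = len(data)
best = comparisons_unordered(data, 1)                              # item is the 1st element
worst = comparisons_unordered(data, 10)                            # item is the last element
average = sum(comparisons_unordered(data, t) for t in data) / n    # (n + 1) / 2
print(best, worst, average)  # 1 10 5.5
print(comparisons_unordered(data, 99), comparisons_ordered(data, 0.5))  # 10 1 (absent item)
```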


In [None]:
# Implementing sequential Search
# For an unordered list, every item may have to be checked (e.g. when the search item is absent)
def seq_search(array, search_item):
    index = 0
    found = False
    
    while index < len(array) and not found:
        if array[index] == search_item:
            found = True
            
        else:
            index +=1
        
    return found


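# For an ordered list, we can stop at the first element that is greater than the search item and
# conclude that the search item is not in the list. The list is sorted here only so the demo accepts
# unsorted input; sorting costs O(n log n), more than the O(n) search itself
def ord_seq_search(array, search_item):
    array = sorted(array)  # sorted() returns a new list, leaving the caller's list unchanged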
    
    index = 0
    found = False
    stop = False
    
    while index < len(array) and not found and not stop:
        if array[index] > search_item:
            stop = True
            
        else:
            if array[index] == search_item:
                found = True
            
            else:
                index +=1
    
    return found

In [None]:
seq_search([2,4,6,3,1,8,10,2.5], 4)

In [None]:
ord_seq_search([2,4,6,3,1,8,10,2.5], 2.1)

### Binary Search
- Binary search only works on an ordered (sorted) list.
- Unlike sequential search, which starts by looking at the first item in the list, binary search starts by looking at the
    middle item. If the middle item is greater than the search item, the top half of the list is discarded, because the
    search item can only be in the lower half of the list, and vice versa.
- The above process is repeated on the remaining half of the list until the search item is found or no items are left.
- Each split eliminates about half of the remaining items.
- The approximate number of items left to compare after each comparison is as follows:

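| Comparisons | Approximate number of items left |
| :---: | :---: |
| 1 | $$n/2$$ |
| 2 | $$n/4$$ |
| 3 | $$n/8$$ |
| i | $$n/2^i$$ |

- The search ends when a single item is left, i.e. when $n/2^i = 1$, which gives $i = \log_2 n$, so binary search makes $O(\log n)$ comparisons.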


### Implementing Binary Search

In [None]:
# Iterative Solution
def binary_search_it(arr, item):
    arr = sorted(arr)  # Binary search needs a sorted list; sorted() copies it, leaving the caller's list unchanged
    print(arr)
    
    first = 0
    last = len(arr)-1
    found = False
    
    while first <= last and not found:
        mid = (first + last)//2
        
        if arr[mid] == item:
            found = True
            
        else:
            if item < arr[mid]:
                last = mid-1
            else:
                first = mid+1
        
    return found


# Recursive Solution
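def binary_search_re(arr, item):
    arr = sorted(arr)  # Binary search needs a sorted list (sorting and slicing copy the list on every call)
    print(arr)  # Shows the (sub)list being searched at each call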
    
    # Base case is if we have run out of elements to look at
    if len(arr) == 0:
        return False
    else:
        mid = len(arr)//2
        
        if arr[mid] == item:
            return True
        else:
            if arr[mid] < item:
                return binary_search_re(arr[mid+1:], item)
            else:
                return binary_search_re(arr[:mid], item)

In [None]:
binary_search_it([2,4,6,3,1,8,10,2.5], 4)

In [None]:
binary_search_re([2,4,6,3,1,8,10,2.5], 7)
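Because `binary_search_re` slices the list (and re-sorts it) on every call, each call copies part of the list. A sketch of a recursive variant that passes index bounds instead, assuming the input is already sorted, is below, together with a check built on the standard-library `bisect` module. Both function names are hypothetical.

```python
import bisect

def binary_search_rec(arr, item, first=0, last=None):
    ''' Recursive binary search on a sorted list, passing index bounds instead of slices '''
    if last is None:
        last = len(arr) - 1
    if first > last:  # Base case: no elements left to check
        return False
    mid = (first + last) // 2
    if arr[mid] == item:
        return True
    if item < arr[mid]:
        return binary_search_rec(arr, item, first, mid - 1)
    return binary_search_rec(arr, item, mid + 1, last)

def binary_search_bisect(arr, item):
    ''' Same check using bisect_left, which returns the leftmost insertion point for item '''
    i = bisect.bisect_left(arr, item)
    return i < len(arr) and arr[i] == item

data = sorted([2, 4, 6, 3, 1, 8, 10, 2.5])  # [1, 2, 2.5, 3, 4, 6, 8, 10]
print(binary_search_rec(data, 4), binary_search_rec(data, 7))  # True False
```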

### Hashing
- Hashing is the process of mapping values to slots in a data structure, e.g. an array. The slots are identified by the indices at which the values are stored in the array.
- Hashing gives a highly efficient average search time complexity of O(1), because the location/index of any value in the array is computed and then used to access it directly, without having to iterate over the array (collisions, covered below, can make individual lookups slower).
- Hashing produces a hash table: a collection of items stored in SLOTS named by integers starting from 0, with every slot initialized to None; the size of the hash table is m = len(hash_table).
- The mapping between the items and their slots in a hash table is called a hash function.
- A hash function takes any item in the collection and returns an integer in the range of slot names, i.e. 0 to m-1.
- There are various ways a hash function can achieve this:
- There are various ways a hash function achieves this:

**Hashing - Remainder Method**
- Computes an item's slot name as (item % m).
- The load factor, denoted by lambda, is the fraction of the hash table's slots that are filled:

$$\lambda = {{numberOfItems}\over tableSize}$$
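A minimal sketch of the remainder method and the load factor, using a hypothetical set of six integer items and a table of size 11:

```python
def remainder_hash(item, table_size):
    ''' Remainder method: the slot name is item % table_size '''
    return item % table_size

table_size = 11
items = [54, 26, 93, 17, 77, 31]  # hypothetical sample items
hash_table = [None] * table_size
for item in items:
    hash_table[remainder_hash(item, table_size)] = item  # these items happen not to collide

load_factor = len(items) / table_size
print(hash_table)             # [77, None, None, None, 26, 93, 17, None, None, 31, 54]
print(round(load_factor, 2))  # 0.55
```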

#### Hash Functions
- A **collision** occurs when the hash function attempts to assign two different items to the same slot, e.g. 44 % 11 = 0 and 77 % 11 = 0.
- A hash function that maps each item into a unique slot is referred to as a **perfect hash function**.
- The goal is to create a hash function that *1. minimizes the number of collisions*, *2. is easy to compute*, and *3. evenly distributes the items in the hash table*.

**Hashing - Folding Method**
- Divides the item to be stored in the hash table into groups of 2 digits (the last group may be smaller than the rest), adds the groups up, then determines the position of the item in the hash table using the remainder method:

$$itemIndex = (sumOfGroups) \bmod tableSize$$

e.g. for the phone number 436-555-4601, the sum of the groups of 2 is 43+65+55+46+01 = 210; if the hash table size is 11, the item's location = 210 % 11 = 1
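The folding method can be sketched as below; `folding_hash` is a hypothetical name, and stripping non-digit separators such as `-` is an assumption about how phone-number-like items are given.

```python
def folding_hash(number, table_size, group=2):
    ''' Folding method: split the digits into groups of `group`, sum the groups, then take the remainder '''
    digits = ''.join(ch for ch in number if ch.isdigit())  # drop separators such as '-'
    total = sum(int(digits[i:i + group]) for i in range(0, len(digits), group))
    return total % table_size

print(folding_hash('436-555-4601', 11))  # 1 (43 + 65 + 55 + 46 + 01 = 210, and 210 % 11 = 1)
```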

**Hashing - Mid Square Method**
- Square the item, extract the middle portion (here, the middle two digits) of the square, then use it to determine the item's index with the remainder method,
e.g. if item = 44, then $$44^2$$ = 1936, and the item's index is 93 % 11 = 5
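A sketch of the mid-square method. How to pick the "middle" when the digits do not split evenly is not specified above, so this hypothetical `mid_square_hash` simply takes the left-leaning middle as an assumption.

```python
def mid_square_hash(item, table_size, digits=2):
    ''' Mid-square method: square the item, keep the middle `digits` digits, then take the remainder '''
    square = str(item * item)
    start = max((len(square) - digits) // 2, 0)  # left-leaning middle when the split is uneven (assumption)
    middle = int(square[start:start + digits])
    return middle % table_size

print(mid_square_hash(44, 11))  # 5 (44 ** 2 = 1936, middle digits 93, 93 % 11 = 5)
```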

**Hashing Non integer items**
- Thinking of a string as a sequence of ordinal values (https://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/), we can use Python's ord() function to return each character's Unicode equivalent, sum these values (as in the folding method) and apply the remainder method to compute the string's hash table index, e.g.:

  - c: ord('c') = 99
  - a: ord('a') = 97
  - t: ord('t') = 116
  - sum = 312, so hash table index ('cat') = 312 % tableSize (e.g. 312 % 11 = 4)
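A sketch of the ordinal-sum hash for strings. One weakness is that anagrams such as 'cat' and 'act' always collide, since they have the same characters; a common remedy is to weight each ordinal by the character's position, shown here as a second, hypothetical `weighted_ord_hash` variant.

```python
def ord_sum_hash(text, table_size):
    ''' Sum of the characters' ordinal values, then the remainder method (as in the 'cat' example) '''
    return sum(ord(ch) for ch in text) % table_size

def weighted_ord_hash(text, table_size):
    ''' Variant: weight each ordinal by its 1-based position, so anagrams usually land in different slots '''
    return sum(pos * ord(ch) for pos, ch in enumerate(text, start=1)) % table_size

print(ord_sum_hash('cat', 11), ord_sum_hash('act', 11))            # 4 4 (anagrams collide)
print(weighted_ord_hash('cat', 11), weighted_ord_hash('act', 11))  # 3 5
```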

#### Collision Resolution / Rehashing Methods
**1. Open Addressing**
- This method searches the hash table sequentially for the next available slot in which to place the item that caused the collision, a process called **linear probing**.
- Linear probing tends to cluster items: when many items hash to the same area of the table, they fill consecutive slots around it.
- One way to decluster the items and distribute them evenly in the hash table (goal 3 above) is to skip slots rather than probing in intervals of 1.
- One such rehashing method that declusters items is called **quadratic probing**, where the rehash function adds successive odd increments 1, 3, 5, 7, 9, ... to the original hash value "h", so the slots tried are h+1, h+4, h+9, h+16, ...

**2. Chaining**
- Another method of addressing collisions is called **chaining**, where each slot in the hash table holds a reference to a collection/chain of items, allowing many items to exist in the same slot.

(NB) The more items are chained in a slot, the longer it takes to search that slot, since its chain has to be scanned.
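The sketch below contrasts the slots visited by linear and quadratic probing, and shows a minimal chaining table where each slot holds a list of (key, value) pairs. All names are hypothetical; the chaining table uses Python's built-in `hash()` so that any hashable key works (small ints hash to themselves, while string hashes vary between runs by default).

```python
def linear_probe_sequence(h, table_size, count):
    ''' Slots tried by linear probing from hash value h: h, h+1, h+2, ... (mod table size) '''
    return [(h + i) % table_size for i in range(count)]

def quadratic_probe_sequence(h, table_size, count):
    ''' Slots tried by quadratic probing from hash value h: h, h+1, h+4, h+9, ... (mod table size) '''
    return [(h + i * i) % table_size for i in range(count)]

class ChainedHashTable:
    ''' Separate chaining: each slot holds a list (chain) of (key, value) pairs '''
    def __init__(self, size=11):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # Existing key: replace its value
                return
        bucket.append((key, value))      # New key: add it to the chain

    def get(self, key, default=None):
        for k, v in self._bucket(key):   # Scan the chain for this slot
            if k == key:
                return v
        return default

print(quadratic_probe_sequence(0, 11, 5))  # [0, 1, 4, 9, 5]
table = ChainedHashTable(11)
table.put(44, 'forty-four')
table.put(77, 'seventy-seven')  # 44 % 11 == 77 % 11 == 0, so both share slot 0
print(table.get(77))            # seventy-seven
```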


### Implementing a Hash Table

In [None]:
# Rudimentary hash table: int keys, remainder-method hashing, linear probing for collisions
class Hashtable(object):
    def __init__(self, size):
        
        self.size = size
        self.slots = [None] * self.size  # Keys
        self.data = [None] * self.size   # Values, stored at the same index as their key
        
    def put(self, key, data):
        ''' Add a new key-value pair to the map. 
            If the key is already in the map then replace the old value with the new value.
            (NB) Keys will be ints only for simplicity
        '''
        
        hash_value = self.hash_function(key, len(self.slots))

        if self.slots[hash_value] is None:
            self.slots[hash_value] = key
            self.data[hash_value] = data

        elif self.slots[hash_value] == key:
            self.data[hash_value] = data  # Replace the old value

        else:
            # Collision: probe for an empty slot or the slot already holding this key,
            # stopping if we wrap all the way around to the original slot
            next_slot = self.rehash(hash_value, len(self.slots))

            while (self.slots[next_slot] is not None and self.slots[next_slot] != key
                   and next_slot != hash_value):
                next_slot = self.rehash(next_slot, len(self.slots))

            if next_slot == hash_value:
                raise RuntimeError('Hash table is full')

            if self.slots[next_slot] is None:
                self.slots[next_slot] = key
            self.data[next_slot] = data

    def hash_function(self, key, size):
        ''' The actual hash function. **Practice other hashing methods and no int keys** '''
        
        return key % size
    
    def rehash(self, oldhash, size):
        ''' Addresses collisions in the hash table '''
        
        return (oldhash+1) % size
    
    def get(self, key):
        ''' Retrieves the item at the provided key '''
        
        start_slot = self.hash_function(key, len(self.slots))
        data = None
        stop = False
        found = False
        position = start_slot

        while self.slots[start_slot] != None and not found and not stop:
            if self.slots[position] == key:
                found = True
                data = self.data[position]

            else:
                position = self.rehash(position, len(self.slots))

                if position == start_slot:
                    stop = True

        return data

    def __getitem__(self, key):
        ''' Getter magic method for the hash table class, allowing the use of the dict[key] notation '''
        
        if key < len(self.slots):
            return self.get(key)
        else:
            return 'Key not in the hashtable'
    
    def __setitem__(self, key, data):
        
        ''' Setter magic method for the hash table class, allowing the use of the dict[key] notation '''
        
        if key < len(self.slots):
            self.put(key,data)
        
        else:
            print('Hash table limited to {} slots'.format(len(self.slots)))

In [None]:
h = Hashtable(3)

In [None]:
h[0] = 'zero'

In [None]:
h[1] = 'one'

In [None]:
h[2] = 'two'

In [None]:
h[3] = 'three'

In [None]:
print(h[0])

In [None]:
print(h[1])

In [None]:
print(h[2])

In [None]:
print(h[3])

## Sorting Algorithms
- When deciding which sorting algorithm to use, the following criteria can be used for comparison:

    1. Time Complexity (Big-O notation). Best-case, worst-case and average run-times can have different time complexities. For example, the best case for Bubble Sort (with an early exit when a pass makes no swaps) is only O(n). That makes it faster than Selection Sort when the original list is mostly in order (not many elements out of place).
    2. Memory Complexity. How much extra memory is required to sort a list as n grows?
    3. Stability. Does the sort preserve the relative ordering of elements that have equivalent sort values? For example, suppose you sort a list of catalog items by price and some items have equal prices. If the catalog was originally sorted alphabetically by item name, will the chosen sort algorithm preserve the alphabetical ordering within each group of equal-priced items?
    4. Best/Worst/Average number of comparisons required. This matters when compare operations are expensive, e.g. comparing the efficiencies of alternative designs where efficiency is calculated via a simulation or some other complex calculation.
    5. Best/Worst/Average number of swap operations required. This matters when swap operations are expensive, e.g. sorting shipping containers that must be physically moved on the deck of a ship.
    6. Code size. Bubble sort is known for its small code footprint.
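
Criterion 3 (stability) can be checked with the catalog example itself. This sketch uses made-up item names and prices. Python's built-in `sorted()` is documented as stable. The hypothetical `selection_sort_by_key` helper shows how a long-distance swap in selection sort can break stability.

```python
# Catalog items as (name, price) pairs, already in alphabetical order by name
catalog = [("apple", 3), ("banana", 1), ("cherry", 3), ("date", 1)]


def selection_sort_by_key(items, key):
    """Selection sort on a copy of items, comparing key(item) values."""
    items = list(items)
    for i in range(len(items)):
        min_index = i
        for j in range(i + 1, len(items)):
            if key(items[j]) < key(items[min_index]):
                min_index = j
        # This long-distance swap can jump an item past another with an equal key
        items[i], items[min_index] = items[min_index], items[i]
    return items


def by_price(item):
    return item[1]


# sorted() is stable: equal prices keep their alphabetical order
print(sorted(catalog, key=by_price))
# [('banana', 1), ('date', 1), ('apple', 3), ('cherry', 3)]

# Selection sort is not stable: 'cherry' ends up before 'apple'
print(selection_sort_by_key(catalog, by_price))
# [('banana', 1), ('date', 1), ('cherry', 3), ('apple', 3)]
```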
    

### Bubble Sort
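- Bubble sort runs in O(n^2) time on average and in the worst case. Over multiple passes it compares adjacent pairs of items and swaps them when they are out of order. The larger item of each pair keeps "bubbling up" until, after each pass, the largest unsorted item is in its right place.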

In [None]:
# My method:
def bubble_sort_mine(arr):
    swap = True
    
    while swap:    
        swap = False
        
        # Visualizer code
        print(arr)
        
        for i in range(len(arr) - 1):
            
            # Visualizer code
            print(arr[i], arr[i+1])
            
            if arr[i] > arr[i + 1]:
                temp = arr[i] 
                arr[i] = arr[i + 1]
                arr[i + 1] = temp 
                swap = True
                
    return arr

# Instructor's method:
def bubble_sort_jose(arr):
    for n in range(len(arr)-1,0,-1):
        
        # Visualizer code
        print(arr)
        
        for k in range(n):
            
            # Visualizer code
            print(arr[k], arr[k+1])
            
            if arr[k]>arr[k+1]:
                temp = arr[k]
                arr[k] = arr[k+1]
                arr[k+1] = temp
                
    return arr

In [None]:
arr = [4,3,1,2]
# bubble_sort_mine(arr)
bubble_sort_jose(arr)

#### Comparison between the two methods above
- Given the array [4,3,1,2] to sort;

- My method: (i: 0 -> 2)

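|While loop pass|arr_before|i|Swap?|arr_after|
|:---:|:---:|:---:|:---:|:---:|
|1|[4,3,1,2]|0|True|[3,4,1,2]|
|1|[3,4,1,2]|1|True|[3,1,4,2]|
|1|[3,1,4,2]|2|True|[3,1,2,4]|
|2|[3,1,2,4]|0|True|[1,3,2,4]|
|2|[1,3,2,4]|1|True|[1,2,3,4]|
|2|[1,2,3,4]|2|False|[1,2,3,4]|
|3|[1,2,3,4]|0|False|[1,2,3,4]|
|3|[1,2,3,4]|1|False|[1,2,3,4]|
|3|[1,2,3,4]|2|False|[1,2,3,4]|

Pass 2 still made swaps, so `swap` is True at its end and a third pass runs. The while loop exits after pass 3 because that pass made no swaps.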

- Instructor's method: (n: 3 -> 1) & (k: 0 -> n-1)

|n|arr_before|k|arr_after|
|:---:|:---:|:---:|:---:|
|3|[4,3,1,2]|0|[3,4,1,2]|
|3|[3,4,1,2]|1|[3,1,4,2]|
|3|[3,1,4,2]|2|[3,1,2,4]|
|2|[3,1,2,4]|0|[1,3,2,4]|
|2|[1,3,2,4]|1|[1,2,3,4]|
|1|[1,2,3,4]|0|[1,2,3,4]|

The outer loop ends after the n = 1 pass.
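
Each version above has one useful idea. The first stops as soon as a pass makes no swaps. The second shrinks the scanned range each pass, because the largest remaining item is already in place. The sketch below combines both ideas; the name `bubble_sort_optimized` is hypothetical. It also uses Python's tuple assignment instead of a `temp` variable. With the early exit, an already sorted list takes a single pass, which is the O(n) best case.

```python
def bubble_sort_optimized(arr):
    """Bubble sort that shrinks the range each pass and stops early
    when a pass makes no swaps."""
    for end in range(len(arr) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]  # tuple swap, no temp needed
                swapped = True
        if not swapped:  # no swaps in this pass means the list is sorted
            break
    return arr


print(bubble_sort_optimized([4, 3, 1, 2]))  # [1, 2, 3, 4]
```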

### Selection Sort
- Selection sort runs in O(n^2) time. It makes several passes over a list of items. In each pass it finds either the minimum or the maximum remaining value and places it in its rightful position.
- This means it makes at most one swap per pass over the list. That improves on bubble sort, which may make many swaps per pass depending on the list of items to sort.

In [None]:
# Method 1: Tracks the minimum
def selection_sort_min(arr):
    swap = False
    curr_min_index = 0
    last_sorted_item_index = 0

    while last_sorted_item_index <= (len(arr) - 1):   
        
        # Visualizer code
        print(arr)
        
        for i in range((last_sorted_item_index), len(arr)):
            
            # Visualizer code
            print(arr[last_sorted_item_index], arr[curr_min_index])
            
            if arr[i] < arr[curr_min_index]:
                curr_min_index = i
                swap = True
        
        if swap:
            temp = arr[curr_min_index]
            arr[curr_min_index] = arr[last_sorted_item_index]
            arr[last_sorted_item_index] = temp 
            swap = False

        last_sorted_item_index += 1
        curr_min_index = last_sorted_item_index
            
    return arr

# Method 2: Tracks the maximum
def selection_sort_max(arr):
    for max_slot in range(len(arr) - 1, 0, -1):
        
        # Visualizer code
        print(arr)
        
        curr_max_slot = 0
        for curr_slot in range(0, max_slot + 1):
            
            # Visualizer code
            print(arr[max_slot], arr[curr_max_slot], arr[curr_slot])
            
            if arr[curr_slot] > arr[curr_max_slot]:
                curr_max_slot = curr_slot
                
        temp = arr[max_slot]
        arr[max_slot] = arr[curr_max_slot]
        arr[curr_max_slot] = temp
        
    return arr

In [None]:
arr = [4,3,1,2]
selection_sort_max(arr)
# selection_sort_min(arr)
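
The "fewer swaps" claim can be checked by counting. This sketch uses hypothetical counting helpers that work on copies of the input. For [4,3,1,2], bubble sort makes 5 swaps (the five "True" rows in the trace table above), while selection sort makes 3. Selection sort can never exceed n - 1 swaps.

```python
def count_bubble_swaps(arr):
    """Count the swaps bubble sort makes (shrinking-range version)."""
    arr = list(arr)  # work on a copy so the caller's list is untouched
    swaps = 0
    for n in range(len(arr) - 1, 0, -1):
        for k in range(n):
            if arr[k] > arr[k + 1]:
                arr[k], arr[k + 1] = arr[k + 1], arr[k]
                swaps += 1
    return swaps


def count_selection_swaps(arr):
    """Count the swaps selection sort (minimum-tracking) makes."""
    arr = list(arr)
    swaps = 0
    for i in range(len(arr) - 1):
        # Index of the smallest item in the unsorted part arr[i:]
        min_index = min(range(i, len(arr)), key=arr.__getitem__)
        if min_index != i:  # skip swapping an item with itself
            arr[i], arr[min_index] = arr[min_index], arr[i]
            swaps += 1
    return swaps


print(count_bubble_swaps([4, 3, 1, 2]), count_selection_swaps([4, 3, 1, 2]))  # 5 3
```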

### Insertion Sort
- Insertion sort runs in O(n^2) time. It makes several passes over a list of items while maintaining two sections: a sorted section at the front and an unsorted section after it.
- It starts by treating the first item as sorted. It then takes each subsequent item and finds where in the sorted section it belongs. Every sorted item larger than it shifts one position to the right to make room.
- This algorithm also improves on bubble sort. Instead of swapping items it shifts them, which takes one assignment per move rather than the three a swap needs.

In [None]:
def insertion_sort(arr):
    for i in range(1, len(arr)):
        
        # Visualizer code
        print(arr)
        
        current_val = arr[i]
        position = i
        
        while position > 0 and arr[position - 1] > current_val:
            
            # Visualizer code
            print(current_val, arr[position - 1])
            
            arr[position] = arr[position - 1]
            position -= 1
            
            # Visualizer code
            print(arr)
            
        arr[position] = current_val
        
    return arr

In [None]:
arr = [4,3,1,2]
insertion_sort(arr)
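
Insertion sort's cost depends heavily on how ordered the input already is. The sketch below instruments insertion sort to count comparisons; the name `insertion_sort_comparisons` is hypothetical. A sorted list of n items needs only n - 1 comparisons, an O(n) best case. A reversed list needs n(n - 1)/2 comparisons, the O(n^2) worst case.

```python
def insertion_sort_comparisons(arr):
    """Insertion sort on a copy of arr; returns (sorted list, comparisons made)."""
    arr = list(arr)
    comparisons = 0
    for i in range(1, len(arr)):
        current_val = arr[i]
        position = i
        while position > 0:
            comparisons += 1
            if arr[position - 1] <= current_val:
                break  # found the insertion point
            arr[position] = arr[position - 1]  # shift the larger item right
            position -= 1
        arr[position] = current_val
    return arr, comparisons


print(insertion_sort_comparisons([1, 2, 3, 4, 5]))  # ([1, 2, 3, 4, 5], 4)
print(insertion_sort_comparisons([5, 4, 3, 2, 1]))  # ([1, 2, 3, 4, 5], 10)
```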

### Shell Sort
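- Shell sort improves on insertion sort by splitting the list into several sublists of items that are a fixed interval (gap) apart. It sorts each sublist with insertion sort, then repeats with smaller gaps. The final gap of 1 is a plain insertion sort on an almost-sorted list.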

In [None]:
def shell_sort(arr):
    sublist_count = len(arr)//2
    
    while sublist_count > 0:        
        for item in range(sublist_count):
            
            # Visualizer code
            print(arr, arr[:sublist_count], '---')
            
            shell_insertion_sort(arr, item, sublist_count)
            
        sublist_count //= 2
    
    return arr

def shell_insertion_sort(arr, start_item, gap):
    for i in range((start_item + gap), len(arr), gap):
        curr_value = arr[i]
        curr_pos = i
        
        while curr_pos >= gap and arr[curr_pos - gap] > curr_value:
            
            # Visualizer code
            print('Thru while', curr_value, gap, arr[curr_pos - gap])
        
            arr[curr_pos] = arr[curr_pos - gap]
            curr_pos = curr_pos - gap
        
        # Visualizer code
        print('Past while')
            
        arr[curr_pos] = curr_value

In [None]:
arr = [4,3,1,2]
shell_sort(arr)
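
The `shell_sort` above halves the gap each round (n/2, n/4, ..., 1), but the gap sequence is a tunable choice. This sketch swaps in Knuth's sequence 1, 4, 13, 40, ... (h = 3h + 1); the name `shell_sort_knuth` is hypothetical. Unlike `shell_insertion_sort`, it handles all the sublists for one gap in a single interleaved loop over i.

```python
def shell_sort_knuth(arr):
    """Shell sort using Knuth's gap sequence (h = 3h + 1) instead of halving."""
    n = len(arr)
    gap = 1
    while gap < n // 3:
        gap = 3 * gap + 1  # largest Knuth gap below roughly n/3
    while gap >= 1:
        # Gapped insertion sort: afterwards every slice arr[k::gap] is sorted
        for i in range(gap, n):
            current_val = arr[i]
            pos = i
            while pos >= gap and arr[pos - gap] > current_val:
                arr[pos] = arr[pos - gap]  # shift the larger item one gap right
                pos -= gap
            arr[pos] = current_val
        gap //= 3  # e.g. 40 -> 13 -> 4 -> 1 -> 0 (stop)
    return arr


print(shell_sort_knuth([4, 3, 1, 2]))  # [1, 2, 3, 4]
```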

### Merge Sort
- Merge sort is a recursive algorithm that continually splits a list in half. 
- If the list is empty or has one item, it is sorted by definition (the base case).
- If the list has more than one item, we split the list and recursively invoke a merge sort on both halves. Once the two halves are sorted, the fundamental operation, called a merge, is performed. 
- Merging is the process of taking two smaller sorted lists and combining them together into a single, sorted, new list.

In [None]:
def merge_sort(arr):   
    if len(arr)>1:
        mid = len(arr)//2
        lefthalf = arr[:mid]
        righthalf = arr[mid:]

        merge_sort(lefthalf)
        merge_sort(righthalf)

        i=0
        j=0
        k=0
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] <= righthalf[j]:  # <= takes equal items from the left half first (keeps the sort stable)
                arr[k]=lefthalf[i]
                i=i+1
            else:
                arr[k]=righthalf[j]
                j=j+1
            k=k+1

        while i < len(lefthalf):
            arr[k]=lefthalf[i]
            i=i+1
            k=k+1

        while j < len(righthalf):
            arr[k]=righthalf[j]
            j=j+1
            k=k+1

    return arr

arr = [4,3,1,2]
merge_sort(arr)
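
The `merge_sort` above copies slices into `lefthalf` and `righthalf` and then overwrites `arr` in place. This sketch pulls the merge step out into its own function and returns a new list instead. The names `merge` and `merge_sort_new` are hypothetical. The extra lists it builds are where merge sort's O(n) space cost comes from.

```python
def merge(left, right):
    """Merge two already sorted lists into one new sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= takes from the left first on ties (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these slices is non-empty, and it is already sorted
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort_new(arr):
    """Return a new sorted list; the input list is left unchanged."""
    if len(arr) <= 1:
        return list(arr)  # base case: empty or single-item lists are sorted
    mid = len(arr) // 2
    return merge(merge_sort_new(arr[:mid]), merge_sort_new(arr[mid:]))


print(merge([1, 4], [2, 3]))           # [1, 2, 3, 4]
print(merge_sort_new([4, 3, 1, 2]))    # [1, 2, 3, 4]
```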

### Quick Sort
- A quick sort first selects a value, which is called the pivot value.
- Although there are many different ways to choose the pivot value, we will simply use the first item in the list. 
- The role of the pivot value is to assist with splitting the list. 
- The actual position where the pivot value belongs in the final sorted list, commonly called the split point, will be used to divide the list for subsequent calls to the quick sort.

In [None]:
def quick_sort(arr,first,last):
    
    if first<last:
        
        splitpoint = partition(arr,first,last)

        quick_sort(arr,first,splitpoint-1)
        quick_sort(arr,splitpoint+1,last)

    return arr

def partition(arr,first,last):
    
    pivotvalue = arr[first]

    leftmark = first+1
    rightmark = last

    done = False
    while not done:

        while leftmark <= rightmark and arr[leftmark] <= pivotvalue:
            leftmark = leftmark + 1

        while arr[rightmark] >= pivotvalue and rightmark >= leftmark:
            rightmark = rightmark -1

        if rightmark < leftmark:
            done = True
        else:
            temp = arr[leftmark]
            arr[leftmark] = arr[rightmark]
            arr[rightmark] = temp

    temp = arr[first]
    arr[first] = arr[rightmark]
    arr[rightmark] = temp


    return rightmark

arr = [4,3,1,2]
quick_sort(arr,0,len(arr)-1)
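
Using the first item as the pivot makes already sorted input the worst case. Every split is maximally uneven, which gives O(n^2) time and a recursion depth of about n. That depth can exceed CPython's default recursion limit of 1000. One common remedy is to pick the pivot at random and swap it to the front, then partition exactly as above.

This is a self-contained sketch of that idea; `quick_sort_random` and `partition_first` are hypothetical names. It gives expected O(n log n) time on distinct keys, but it does not help when most items are equal.

```python
import random


def partition_first(arr, first, last):
    """Partition arr[first:last + 1] around arr[first]; return the pivot's final index."""
    pivot = arr[first]
    left, right = first + 1, last
    while True:
        while left <= right and arr[left] <= pivot:
            left += 1
        while left <= right and arr[right] >= pivot:
            right -= 1
        if right < left:
            break
        arr[left], arr[right] = arr[right], arr[left]
    arr[first], arr[right] = arr[right], arr[first]  # place the pivot at the split point
    return right


def quick_sort_random(arr, first=0, last=None):
    """Quick sort with a randomly chosen pivot."""
    if last is None:
        last = len(arr) - 1
    if first < last:
        pivot_index = random.randint(first, last)  # both bounds inclusive
        arr[first], arr[pivot_index] = arr[pivot_index], arr[first]
        split = partition_first(arr, first, last)
        quick_sort_random(arr, first, split - 1)
        quick_sort_random(arr, split + 1, last)
    return arr


print(quick_sort_random([4, 3, 1, 2]))  # [1, 2, 3, 4]
```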

### Heap Sort

Recap notes on binary trees and heaps: https://www.programiz.com/dsa/heap-sort

- Heap sort procedure for a max heap:

    - Build a max heap from the list. By the max-heap property, the largest item is then stored at the root node.
    - Swap the root element with the last item of the heap. This places the largest item at the end of the array (nth position).
    - Reduce the size of the heap by 1 and heapify the root element again, so that the largest remaining element is at the root.
    - Repeat the process until all the items in the list are sorted.

- Just like merge sort, heap sort has O(n log n) time complexity in all cases (best, average and worst).
- It sorts in place, however, with O(1) extra space compared to merge sort's O(n). That makes it more memory-efficient than merge sort, though unlike merge sort it is not stable.

In [65]:
def heapify(arr, n, i):
    # Find largest among root and children
    largest = i
    l = 2 * i + 1
    r = 2 * i + 2 
 
    if l < n and arr[i] < arr[l]:
        largest = l
 
    if r < n and arr[largest] < arr[r]:
        largest = r
 
    # If root is not largest, swap with largest and continue heapifying
    if largest != i:
        arr[i],arr[largest] = arr[largest],arr[i]
        heapify(arr, n, largest)
 
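def heapSort(arr):
    n = len(arr)
 
    # Build max heap, starting from the last non-leaf node
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
 
    # Move the max (root) to the end, then restore the heap on the remaining items
    for i in range(n - 1, 0, -1):
        # swap
        arr[i], arr[0] = arr[0], arr[i]
        # The heap now holds arr[:i]; sift the new root down
        heapify(arr, i, 0)

    return arr

arr = [4,3,1,2]
heapSort(arr)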
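
Python's standard library ships a binary heap in the `heapq` module, which gives a much shorter heap sort. This sketch uses only `heapq.heapify` and `heapq.heappop`; the name `heap_sort_heapq` is hypothetical. `heapq` is a min-heap, so items come out smallest first. Because it builds a new list, it gives up the O(1) extra-space advantage of the in-place `heapSort` above.

```python
import heapq


def heap_sort_heapq(items):
    """Return a new sorted list using a binary min-heap."""
    heap = list(items)   # copy so the input is not modified
    heapq.heapify(heap)  # build the min-heap in O(n)
    # Each heappop removes the current smallest item in O(log n)
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heap_sort_heapq([4, 3, 1, 2]))  # [1, 2, 3, 4]
```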

## Graph Algorithms
- A graph is a structure that shows the relationship between objects.
- They can be used to represent many real-world objects e.g road network, air-line flights, the internet etc
- Once a problem is adequately represented as a graph, standard graph algorithms can be used to solve it.

- Graph vocabulary:
    - Vertex / Node: represents an object in the graph. It can have a key (name) as well as information about the object it represents, called the payload.
    - Edge: connects (relates) two vertices. Edges may be one-way or two-way. A graph whose edges are all one-way is a DIRECTED GRAPH (DIGRAPH); one with two-way edges is an UNDIRECTED GRAPH.
    - Edge weight: shows that there is a cost to go from one vertex to another. E.g. in a graph of a road network, edge weights can represent the distance from one city (vertex) to another.

- Formal definition:
    - A graph G can be represented as G = (V, E), where V and E are finite sets.
    - V and E for the graph G are the sets of the vertices and the edges of the graph respectively.
    - Each edge in the set E is a tuple (v, w) of two vertices, represented as:
    $$(v, w) \in E, \quad v, w \in V$$
    (NB) A third component representing the weight of the edge can also be included.
    - A subgraph s is a set of edges e and a set of vertices v, where every edge in e connects vertices in v, such that:
    $$e \subseteq E \quad \text{and} \quad v \subseteq V$$
    
- The graph below can be represented as follows:

    V = {A, B, C, D, E, F, G}  
    
    E = {(A,B,1), (B,D,2), (B,C,3), (B,E,1), (C,E,4), (E,F,3), (D,E,2), (G,D,1)}
    
 - A path is a sequence of vertices connected by edges.
 - The unweighted path length is the number of the edges in the path.
 - The weighted path length is the sum of the weights of the edges in the path.

- For the graph below, the path from A to D is the sequence of vertices (A, B, D), and its edges are {(A,B,1), (B,D,2)}.

- A cycle is a path that starts and ends at the same vertex.
- An acyclic graph is one that has no cycles. A directed acyclic graph is called a DAG, and DAGs can be used to solve many important problems.
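
The two path-length definitions can be computed from the edge set E given above. This sketch stores E as a Python set of (start, end, weight) tuples; `path_lengths` is a hypothetical helper. For the path (A, B, D), the unweighted length is 2 edges and the weighted length is 1 + 2 = 3.

```python
# Weighted, directed edges (start, end, weight) of the example graph
E = {("A", "B", 1), ("B", "D", 2), ("B", "C", 3), ("B", "E", 1),
     ("C", "E", 4), ("E", "F", 3), ("D", "E", 2), ("G", "D", 1)}

# Map each (start, end) pair to its weight for quick lookup
weights = {(start, end): weight for start, end, weight in E}


def path_lengths(path):
    """Return (unweighted length, weighted length) for a path given as a list of vertices.
    Raises KeyError if two consecutive vertices are not joined by an edge."""
    edges = list(zip(path, path[1:]))  # consecutive vertex pairs are the path's edges
    return len(edges), sum(weights[edge] for edge in edges)


print(path_lengths(["A", "B", "D"]))  # (2, 3)
```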

(NB) If the graph below also had an edge (F,D,2), then the path {(D,E,2), (E,F,3), (F,D,2)} would be a cycle.
- An adjacency matrix is an n×n matrix with the vertices listed as both the rows and the columns. The cell at the intersection of a row and a column holds either the number of edges connecting the two vertices or the weight of the edge connecting them.
- It is an efficient way to represent a dense graph, one with many edges between its vertices. It suits a 'sparse' graph, one with few edges, poorly: the matrix then has many empty cells, which makes an adjacency matrix memory-inefficient for sparse data.
- A matrix is full when every vertex is connected to every other vertex.
- For sparse data, an adjacency list is a better (more memory-efficient) way to represent the graph.
- In the adjacency list, the graph object maintains a master list of all the vertices in the graph, then each vertex maintains a list of all the vertices it is connected to.
- A dictionary is used to implement each vertex's list. The keys are the connected vertices and the values are the weights of the edges connecting to them.
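
The sketch below builds both representations from the example V and E. It uses `None` in the matrix to mean "no edge", which is an assumption chosen so that an edge of weight 0 would still be distinguishable. Only 8 of the 49 matrix cells hold an edge, while the adjacency list stores just those 8 entries.

```python
# Vertices and weighted, directed edges (start, end, weight) of the example graph
V = ["A", "B", "C", "D", "E", "F", "G"]
E = {("A", "B", 1), ("B", "D", 2), ("B", "C", 3), ("B", "E", 1),
     ("C", "E", 4), ("E", "F", 3), ("D", "E", 2), ("G", "D", 1)}

# Adjacency matrix: n x n grid, cell [row][col] holds the weight of edge row -> col
index = {vertex: i for i, vertex in enumerate(V)}
matrix = [[None] * len(V) for _ in V]  # None marks "no edge"
for start, end, weight in E:
    matrix[index[start]][index[end]] = weight

# Adjacency list: each vertex maps only its existing out-edges to their weights
adjacency = {vertex: {} for vertex in V}
for start, end, weight in E:
    adjacency[start][end] = weight

used = sum(cell is not None for row in matrix for cell in row)
print(used, "of", len(V) ** 2, "matrix cells hold an edge")  # 8 of 49 matrix cells hold an edge
print(sorted(adjacency["B"].items()))  # [('C', 3), ('D', 2), ('E', 1)]
```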

In [None]:
from IPython.display import Image
Image("graph.png")

In [None]:
from IPython.display import Image
Image("SparseAdjacencyMatrix.png")

In [None]:
from IPython.display import Image
Image("AdjacencyListImplementation.png")

### Implementing a graph using adjacency list
- Using dictionaries, it is easy to implement the adjacency list in Python. 
- In our implementation of the Graph abstract data type we will create two classes: 

    - Vertex class, which will represent each vertex in the graph. Each Vertex uses a dictionary to keep track of the vertices to which it is connected, and the weight of each edge.
    - Graph class, which holds the master list of vertices.

In [None]:
# The Vertex class
class Vertex:

    def __init__(self, key):
        ''' Initializes the id, which will typically be a string, and the connected_to dictionary '''
        self.id = key
        self.connected_to = {}  # takes the form {vertex_connected_to: weight_of_edge_connecting_them, ...}

    def add_connection(self, vertex_connected_to, edge_weight=0):  # Connections between vertices are edges
        ''' Adds a connection from this vertex to another '''
        self.connected_to[vertex_connected_to] = edge_weight

    def get_connections(self):
        ''' Returns all of the vertices in the adjacency list, as represented by the connected_to instance variable '''
        return self.connected_to.keys()

    def get_weight(self, edge):
        ''' Returns the weight of the edge from this vertex to the vertex passed as a parameter '''
        return self.connected_to[edge]

    def __str__(self):
        ''' Returns the vertex class object properties when accessed e.g print(obj) '''
        return str(self.id) + ' connectedTo: ' + str([x.id for x in self.connected_to])


# The Graph class
class Graph:

    def __init__(self):
        ''' Initializes a new, empty graph '''
        self.vertex_list = {}  # takes the form {vertex_key:vertex_object, .....}
        self.vertex_num = 0

    def add_vertex(self, key):
        ''' Creates a new vertex object and adds it to the master vertices list '''
        self.vertex_num += 1
        new_vertex = Vertex(key)
        self.vertex_list[key] = new_vertex
        return new_vertex

    def add_edge(self, start_vertex_key, end_vertex_key, cost=0):
        ''' Adds a new, weighted, directed edge to the graph that connects two vertices '''
        # Create any endpoint vertex that does not exist yet
        if start_vertex_key not in self.vertex_list:
            self.add_vertex(start_vertex_key)

        if end_vertex_key not in self.vertex_list:
            self.add_vertex(end_vertex_key)

        # Vertex start_vertex is connected to vertex end_vertex by an edge that has a weight of cost
        self.vertex_list[start_vertex_key].add_connection(self.vertex_list[end_vertex_key], cost)

    def get_vertex(self, vertex_key):
        ''' Returns a target vertex '''
        if vertex_key in self.vertex_list:
            return self.vertex_list[vertex_key]
        else:
            return None

    def get_vertices(self):
        ''' Returns all vertices '''
        return self.vertex_list.keys()

    def __contains__(self, key):
        ''' Allows using the "in" keyword to search a vertex keys within the graph '''
        return key in self.vertex_list

    def __iter__(self):
        ''' Allows iterating over the graph by returning a vertices iterable e.g ["for i in graph"] '''
        return iter(self.vertex_list.values())


# Test code
g = Graph()
for i in range(6):
    g.add_vertex(i)

g.vertex_list

g.add_edge(0,1,2)

for vertex in g:
    print(vertex)
    print(vertex.get_connections())
    print('\n')

### Implementing a Graph class

In [None]:
'''
The graph will be directed and the edges can hold weights.

We will have three classes, a State class, a Node class, and finally the Graph class.

We're going to be taking advantage of two built-in tools here, OrderedDict and Enum
'''

from enum import Enum
from collections import OrderedDict

class State(Enum):
    unvisited = 1 # White color for an unexplored node/vertex
    visited = 2 # Black color for a fully explored node/vertex
    visiting = 3 # Gray color for a partially explored node/vertex

'''
Now for the Node class we will take advantage of the OrderedDict object in case we want to keep track of the order in which keys are added to the dictionary. NOT REQUIRED IN PYTHON 3.7+, where plain dicts preserve insertion order.
'''

class Node:

    def __init__(self, num):
        self.num = num
        self.visited_state = State.unvisited
        self.adjacent = OrderedDict() # a node in the form {adjacent_node_object: connecting_edge_weight}

    def __str__(self):
        ''' returns the nodes id/index '''
        return str(self.num)


class Graph:

    def __init__(self):
        ''' Initializes a new empty graph '''
        self.nodes = OrderedDict()  # All the graph's nodes in the form {node_index/num: the node object}

    def add_node(self, node_num):
        ''' Create a new node object, adds it to the graph and returns it '''
        node = Node(node_num)   # Create a new node object
        self.nodes[node_num] = node     # Add new node to graph

        return node

    def add_edge(self, start_node_num, end_node_num, edge_weight=0):
        ''' 
        Checks if the nodes for the new edge exist, creates them if not, then adds their connecting edge
        '''

        if start_node_num not in self.nodes:
            self.add_node(start_node_num)

        if end_node_num not in self.nodes:
            self.add_node(end_node_num)

        # Add end_node as an adjacent node to start_node, connected by a new edge (weight = edge_weight)
        self.nodes[start_node_num].adjacent[self.nodes[end_node_num]] = edge_weight


In [None]:
g = Graph()
g.add_node(1)

In [None]:
g.add_edge(1, 2, 5)

In [None]:
g.nodes

In [None]:
print(g.nodes[1].adjacent)

### Implementing Depth first Search

 - The algorithm we will be discussing is Depth-First Search. As the name hints, it explores possible vertices (from a supplied root) down each branch before backtracking.
 - This property allows the algorithm to be implemented succinctly in both iterative and recursive forms. 
- Below is a listing of the actions performed upon each visit to a node.

    - Mark the current vertex as being visited.
    - Explore each adjacent vertex that is not included in the visited set.

In [None]:
# SIMPLE GRAPH
graph = {
    'A': set(['B','C']),
    'B': set(['A','D','E']),
    'C': set(['A','F']),
    'D': set(),  # no outgoing edges: D is reachable from B but leads nowhere
    'E': set(['B','F']),
    'F': set(['C','E'])
}

# REACHABLE NODES - ALL NODES REACHABLE FROM THE START NODE
'''
The implementation below uses the stack data structure to build up and return the set of vertices that are reachable from a particular vertex. Python overloads the subtraction operator to remove items from a set, which lets us add only the unvisited adjacent vertices.
'''

def dfs_itr(graph, start_node):
    visited, stack = set(), [start_node]    # nodes explored so far & the stack, seeded with the start node

    while stack:  # More nodes to explore are available
        vertex = stack.pop()    # Vertex to explore now

        if vertex not in visited:   # Node has not been explored/discovered
            visited.add(vertex)     # Add it to the discovered nodes

            ''' 
            Python overloads the '-' operator for sets, i.e. if graph['A'] = {'B', 'C'} and 'B' was already
            explored and is in 'visited', then graph['A'] - visited = {'C'}, which is now set for exploring
            '''
            stack.extend(graph[vertex] - visited)  # Add its unexplored adjacent nodes for exploring next

    return visited

'''
The second implementation provides the same functionality as the first, but uses the more succinct recursive form. A common Python gotcha is that default parameter values are created only once, when the function is defined. We therefore cannot use visited=set() as the default and instead create a new visited set on each top-level call. Python also passes arguments as references to objects, so every recursive call shares the same mutable visited set and it never has to be reassigned.
'''

def dfs_rec(graph, start_node, visited=None):
    if visited is None:  # fresh set for each top-level call (avoids the mutable default gotcha)
        visited = set()

    visited.add(start_node)

    for next_node in graph[start_node] - visited:
        dfs_rec(graph, next_node, visited)

    return visited


# PATHS - ALL POSSIBLE PATHS BETWEEN ANY TWO NODES
'''
We are able to tweak both of the previous implementations to return all possible paths between a start and goal vertex. The implementation below uses the stack data-structure again to iteratively solve the problem, yielding each possible path when we locate the goal. Using a generator allows the user to only compute the desired amount of alternative paths.
'''

def dfs_paths(graph, start_node, goal_node):
    stack = [(start_node, [start_node])]

    while stack:
        (vertex, path) = stack.pop()

        for next_node in graph[vertex] - set(path):
            if next_node == goal_node:
                
                ''' Returning the first path found, then continuing to find other paths '''
                yield path + [next_node]  # hands the path to the caller; execution resumes here on the next request
            else:
                stack.append((next_node, path + [next_node]))

In [None]:
dfs_itr(graph, 'B')

In [None]:
dfs_rec(graph, 'D')

In [None]:
list(dfs_paths(graph, 'A', 'F'))
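
The `State` enum from the Graph class section (unvisited / visiting / visited, i.e. white / gray / black) is defined but never used by the set-based DFS functions. This sketch shows one way those colors could drive a recursive DFS over node objects. Meeting an edge to a node that is still gray (on the current path) reveals a cycle, so a graph where this never happens is a DAG.

To run on its own, the block redefines trimmed-down `State`, `Node` and `Graph` classes. `dfs_three_color` is a hypothetical name.

```python
from enum import Enum


class State(Enum):
    unvisited = 1  # white: not discovered yet
    visited = 2    # black: fully explored
    visiting = 3   # gray: discovered, neighbours still being explored


class Node:
    def __init__(self, num):
        self.num = num
        self.visited_state = State.unvisited
        self.adjacent = {}  # {adjacent_node_object: edge_weight}


class Graph:
    def __init__(self):
        self.nodes = {}  # {node_num: node_object}

    def add_edge(self, start_num, end_num, edge_weight=0):
        # Create either endpoint if it does not exist yet
        for num in (start_num, end_num):
            if num not in self.nodes:
                self.nodes[num] = Node(num)
        self.nodes[start_num].adjacent[self.nodes[end_num]] = edge_weight


def dfs_three_color(graph, start_num):
    """Return (discovery order, cycle_found) for a DFS from start_num."""
    order = []
    cycle_found = False

    def visit(node):
        nonlocal cycle_found
        node.visited_state = State.visiting  # gray: on the current DFS path
        order.append(node.num)
        for neighbour in node.adjacent:
            if neighbour.visited_state is State.unvisited:
                visit(neighbour)
            elif neighbour.visited_state is State.visiting:
                cycle_found = True  # back edge to a node still on the path
        node.visited_state = State.visited  # black: every neighbour explored

    visit(graph.nodes[start_num])
    return order, cycle_found


g = Graph()
g.add_edge(1, 2)
g.add_edge(2, 3)
g.add_edge(1, 3)
print(dfs_three_color(g, 1))  # ([1, 2, 3], False)
```

The states stay on the node objects after the call, so they would need resetting to `State.unvisited` before searching the same graph again.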

### Implementing Breadth First Search

- An alternative algorithm called Breadth-First Search returns the same results as DFS, with the added guarantee that the first path it finds is a shortest path (fewest edges).
- This algorithm is trickier to implement recursively because it relies on the queue data structure, so I will only be documenting the iterative approach.
- The actions performed for each explored vertex are the same as in the depth-first implementation. Replacing the stack with a queue, however, explores all vertices at the current depth (the breadth) before moving one level deeper. This behavior guarantees that the first path located is one of the shortest paths present, with the number of edges as the cost factor.

In [None]:
# REACHABLE NODES - ALL NODES REACHABLE FROM THE START NODE
'''
Similar to the iterative DFS implementation, the only alteration required is to remove the next item from the beginning of the list structure instead of taking the stack's last item.
'''

def bfs(graph, start_node):
    visited, queue = set(), [start_node]

    while queue:
        vertex = queue.pop(0)

        if vertex not in visited:
            visited.add(vertex)
            queue.extend(graph[vertex] - visited)

    return visited

# PATHS - ALL POSSIBLE PATHS BETWEEN ANY TWO NODES
'''
This implementation can again be altered slightly to return all possible paths between two vertices, the first of which is one of the shortest such paths.
'''

def bfs_paths(graph, start_node, goal_node):
    queue = [(start_node, [start_node])]

    while queue:
        (vertex, path) = queue.pop(0)

        for next_node in graph[vertex] - set(path):
            if next_node == goal_node:
                yield path + [next_node]
            else:
                queue.append((next_node, path + [next_node]))


# SHORTEST PATH BETWEEN ANY TWO NODES
'''
Knowing that the shortest path will be returned first from the BFS path generator method we can create a useful method which simply returns the shortest path found or ‘None’ if no path exists. As we are using a generator this in theory should provide similar performance results as just breaking out and returning the first matching path in the BFS implementation.
'''

def bfs_shortest_path(graph, start_node, goal_node):
    try:
        return next(bfs_paths(graph, start_node, goal_node))
    except StopIteration:
        return None

In [None]:
bfs(graph, 'D')

In [None]:
list(bfs_paths(graph, 'B', 'F'))

In [None]:
bfs_shortest_path(graph, 'F', 'B')
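
`list.pop(0)` in `bfs` and `bfs_paths` is O(n), because every remaining item shifts left. `collections.deque.popleft()` removes from the front in O(1). This sketch uses a deque and marks nodes as visited when they are enqueued, so no node is queued twice. That makes it suited to returning just one shortest path, not to listing all paths as `bfs_paths` does. `bfs_shortest_path_deque` is a hypothetical name, and the block repeats the sample graph so it runs on its own.

```python
from collections import deque

# Same dict-of-sets shape as the sample graph above
graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F'},
    'D': set(),
    'E': {'B', 'F'},
    'F': {'C', 'E'},
}


def bfs_shortest_path_deque(graph, start_node, goal_node):
    """Return a shortest path (fewest edges) from start_node to goal_node, or None."""
    if start_node == goal_node:
        return [start_node]  # zero-edge path
    visited = {start_node}
    queue = deque([(start_node, [start_node])])
    while queue:
        vertex, path = queue.popleft()  # O(1), whereas list.pop(0) is O(n)
        for next_node in graph[vertex] - visited:
            if next_node == goal_node:
                return path + [next_node]
            visited.add(next_node)  # mark on enqueue so no node is queued twice
            queue.append((next_node, path + [next_node]))
    return None


print(bfs_shortest_path_deque(graph, 'F', 'B'))  # ['F', 'E', 'B']
```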