# Algorithms, Binary Search & Linked Lists

## Tasks Today:
 
1) <b>In-Place Algorithms</b> <br>
 &nbsp;&nbsp;&nbsp;&nbsp; a) Syntax <br>
 &nbsp;&nbsp;&nbsp;&nbsp; b) Out of Place Algorithm <br>
 &nbsp;&nbsp;&nbsp;&nbsp; c) In-Class Exercise #1 <br>
2) <b>Two Pointers</b> <br>
3) <b>Merge Sort</b> <br>
 &nbsp;&nbsp;&nbsp;&nbsp; a) Video on Algorithms <br>
 &nbsp;&nbsp;&nbsp;&nbsp; b) How it Works <br>
4) <b>Exercises</b> <br>
 &nbsp;&nbsp;&nbsp;&nbsp; a) Exercise #1 - Reverse a List in Place Using an In-Place Algorithm <br>
 &nbsp;&nbsp;&nbsp;&nbsp; b) Exercise #2 - Find Distinct Words <br>
 &nbsp;&nbsp;&nbsp;&nbsp; c) Exercise #3 - Write a program to implement a Linear Search Algorithm. <br>

## In-Place Algorithms

###### Main distinction: is the original data structure being modified?

###### An in-place algorithm modifies the original data structure (often directly)

#### Syntax

In [1]:
# switching the places of values within an ordered data structure
# is a swapping algorithm
# at its simplest, that means swapping the index location of two values
    # which can be done using multiple variable assignment
    # a = <a_value>
    # b = <b_value>
    # we can swap the values of a and b with multiple variable assignment
    # a, b = b, a
    # the result will be
    # a = <b_value>
    # b = <a_value>
    
# using that concept of multiple variable assignment
# we can create a simple in-place swapping algorithm

def swap(alist, x, y):
    """
    accept a list and two index numbers
    swap the order of the values at those indexes
    """
    alist[x], alist[y] = alist[y], alist[x]
    
mylist = ['Ruben Dias', 'Kyle Walker', 'John Stones']
print(f'before swap: {mylist}')
swap(mylist, 0, 2)
print(f'after swap: {mylist}')

# notice there is no variable redefinition - we're still looking at the same original 'mylist'
    # and that original list has changed value
# another thing to notice is that this function doesn't return anything
    # no return value is typical for an in-place algorithm
    # we don't need a return value - the function is acting directly on the original data structure
    # we already have access to the original data structure
    # therefore we have no need for a return value

before swap: ['Ruben Dias', 'Kyle Walker', 'John Stones']
after swap: ['John Stones', 'Kyle Walker', 'Ruben Dias']
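
One way to confirm that a function works in place is to check that every name bound to the list sees the change. The sketch below repeats the `swap` function from the cell above; the names `players`, `alias`, and `result` are introduced here purely for illustration.

```python
def swap(alist, x, y):
    """Swap the values at indexes x and y of alist, in place."""
    alist[x], alist[y] = alist[y], alist[x]

players = ['Ruben Dias', 'Kyle Walker', 'John Stones']
alias = players               # a second name for the SAME list object, not a copy
result = swap(players, 0, 2)  # capture the return value just to inspect it

print(alias is players)  # True - both names still refer to one list
print(alias)             # ['John Stones', 'Kyle Walker', 'Ruben Dias']
print(result)            # None - the function returns nothing
```

Because no new list is created, the change is visible through `alias` even though only `players` was passed to the function.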


## Out of Place Algorithms

###### An out of place algorithm is characterized by the creation of a new data structure/collection/value
###### It maintains data integrity, i.e. it does not modify the original values

In [10]:
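# an example of an out of place algorithm: reversing a list with slicing
# (slicing in general produces a new list rather than changing the original)

# note: the output below shows mylist as it was left by the .sort() cell later in this section,
# which was run before this cell (compare the In [9] and In [10] execution counts)
print(mylist)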
mylist[::-1] # reverse the list using slicing
print(mylist)
# nothing changed above!
    # reversing the list using slicing creates a modified copy of the original
    # it doesn't modify the original
    # therefore if we want to work with the modified copy going forward
    # we must create a new variable or redefine the original variable
print(f'original before reverse: {mylist}')
reversecopy = mylist[::-1]
print(f'original after reverse: {mylist}')
print(f'new reversed version: {reversecopy}')

# an out of place algo either creates a modified copy or an entirely new data structure
# out of place algorithms are often easily identifiable by the necessity of variable redefinition/assignment
    # and the necessity of a return statement in a function

['Courtois', 'Karim Benzema', 'Marcelo', 'Rodrygo']
['Courtois', 'Karim Benzema', 'Marcelo', 'Rodrygo']
original before reverse: ['Courtois', 'Karim Benzema', 'Marcelo', 'Rodrygo']
original after reverse: ['Courtois', 'Karim Benzema', 'Marcelo', 'Rodrygo']
new reversed version: ['Rodrygo', 'Marcelo', 'Karim Benzema', 'Courtois']
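
The same idea can be written as a function. The sketch below is a minimal out of place reversal that builds and returns a new list; `reversed_copy`, `original`, and `flipped` are hypothetical names chosen for this example.

```python
def reversed_copy(a_list):
    """Return a NEW list holding the items of a_list in reverse order."""
    result = []                               # new data structure: O(n) extra space
    for i in range(len(a_list) - 1, -1, -1):  # walk the indexes from last to first
        result.append(a_list[i])
    return result                             # the caller must capture this value

original = ['Courtois', 'Karim Benzema', 'Marcelo', 'Rodrygo']
flipped = reversed_copy(original)

print(original)             # unchanged
print(flipped)              # ['Rodrygo', 'Marcelo', 'Karim Benzema', 'Courtois']
print(flipped is original)  # False - two separate list objects
```

Unlike the in-place `swap` function, this one is useless without its return value: if the caller does not assign the result to a variable, the reversed list is lost.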


###### Classic Example of in-place vs. out of place of the same process: sorted() vs. .sort()

In [7]:
# sorted() is an out of place implementation of the TimSort algorithm
mylist = ['Karim Benzema', 'Rodrygo', 'Courtois', 'Marcelo']
print(f'original before sort: {mylist}')
sortedcopy = sorted(mylist)
print(f'original after sort: {mylist}')
print(f'new sorted version aka return value: {sortedcopy}')

original before sort: ['Karim Benzema', 'Rodrygo', 'Courtois', 'Marcelo']
original after sort: ['Karim Benzema', 'Rodrygo', 'Courtois', 'Marcelo']
new sorted version aka return value: ['Courtois', 'Karim Benzema', 'Marcelo', 'Rodrygo']


In [9]:
# .sort() is an in-place implementation of the same TimSort algorithm
mylist = ['Karim Benzema', 'Rodrygo', 'Courtois', 'Marcelo']
print(f'original before sort: {mylist}')
sortedcopy = mylist.sort()
print(f'original after sort: {mylist}')
print(f'return value: {sortedcopy}')

original before sort: ['Karim Benzema', 'Rodrygo', 'Courtois', 'Marcelo']
original after sort: ['Courtois', 'Karim Benzema', 'Marcelo', 'Rodrygo']
return value: None


#### In-Class Exercise #1 <br>
<p>Write a function that takes in one argument (a_list), and reverses that list in-place.</p>

In [7]:
l_1 = [10, 4, 3, 8, 4, 2, 6]



def reverseInPlace(a_list):
    for i in range(len(a_list)//2):
        a_list[i], a_list[-(i+1)] = a_list[-(i+1)], a_list[i]
        
reverseInPlace(l_1)
print(l_1)

[6, 2, 4, 8, 3, 4, 10]


## What is a pointer?

In [9]:
# just some variable with an integer value (usually) that is set up to keep track of index numbers
    # as you loop or perform an algorithm/process

# the primary advantage of using a pointer is additional control over the process of looping
# pointers are most commonly used with while loops

l_1 = [10, 4, 3, 8, 4, 2, 6]

# let's look at a pointer-based approach to the above reversal algorithm
def reverseInPlace(a_list):
    i = 1 # a pointer - starting at 1 instead of 0 skips the first and last values
    while i < len(a_list)//2:
        a_list[i], a_list[-(i+1)] = a_list[-(i+1)], a_list[i]
        i += 1
        
# same swapping logic as the above function, just a different approach
    # because we have control over the way the loop steps, we can pick slightly different behavior for our loop
    # here we swap the middle values only (the end values are ignored)
    # starting the pointer at i = 0 instead would reverse the entire list, like the for loop version

print(l_1)
reverseInPlace(l_1)
print(l_1)

[10, 4, 3, 8, 4, 2, 6]
[10, 2, 4, 8, 3, 4, 6]


## Two Pointers

#### Syntax

In [11]:
# same concept as a single pointer
# but, we can independently move the two pointers
# one pointer/one index location you are examining in the data structure can be entirely unrelated to the other location

l_1 = [10, 4, 3, 8, 4, 2, 6]

def reverseInPlace(a_list):
    left = 0 # a pointer that starts at the start of the list
    right = len(a_list)-1 # pointer that starts at the end of the list
    while left < right:
        a_list[left], a_list[right] = a_list[right], a_list[left]
        left += 1
        right -= 1
        
print(l_1)
reverseInPlace(l_1)
print(l_1)

[10, 4, 3, 8, 4, 2, 6]
[6, 2, 4, 8, 3, 4, 10]
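
In the reversal above the two pointers always move together. The sketch below shows a case where they move independently: looking for two values in a sorted list that add up to a target. The function name `pair_with_sum` and the sample data are assumptions made for this example.

```python
def pair_with_sum(sorted_list, target):
    """Return indexes (left, right) of two values adding up to target, or None.

    Assumes sorted_list is already sorted in ascending order.
    """
    left = 0
    right = len(sorted_list) - 1
    while left < right:
        current = sorted_list[left] + sorted_list[right]
        if current == target:
            return left, right
        elif current < target:
            left += 1    # need a bigger sum: only the left pointer moves
        else:
            right -= 1   # need a smaller sum: only the right pointer moves
    return None

nums = [2, 3, 4, 4, 6, 8, 10]
print(pair_with_sum(nums, 11))   # (1, 5) because 3 + 8 == 11
print(pair_with_sum(nums, 100))  # None - no pair adds up to 100
```

Each step moves exactly one pointer, so the loop runs at most once per element: O(n) time and O(1) extra space.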


#### Videos on Algorithms <br>
<p>Watch the videos about algorithms.</p>

https://www.youtube.com/watch?v=Q9HjeFD62Uk

https://www.youtube.com/watch?v=kPRA0W1kECg

https://www.youtube.com/watch?v=ZZuD6iUe3Pc

# Sorting Algorithms

#### Bubble Sort

Worst Case: O(n^2) Time - O(1) Space

In [18]:
# Time Complexity
# O(n^2) Θ(n^2) Ω(n)
# Big-O notation: worst-case: quadratic
# Theta notation: avg-case: quadratic
# Omega notation: best-case (already sorted): linear

def bubbleSort(arr):
    # flag variable - True while the list may still be unsorted, False once a full pass makes no swaps
    unSorted = True
    while unSorted:
        # assume that the list is sorted already
        unSorted = False
        # perform our actual inner loop with comparisons and check/confirm whether or not this is sorted
        for i in range(len(arr)-1):
            if arr[i] > arr[i+1]: # check if values are out of order
                # swap them if so
                arr[i], arr[i+1] = arr[i+1], arr[i]
                unSorted = True # if a swap occurred, there are potentially more swaps to occur
    
mylist = [5, 3, 12, 37, 7, 72, 20, 1, 42]
bubbleSort(mylist)
print(mylist)

[1, 3, 5, 7, 12, 20, 37, 42, 72]
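
To see the gap between the best case and the worst case, the sketch below adds a comparison counter to the same flag-based bubble sort. The function name `bubble_sort_counting` and the two sample lists are assumptions made for this example.

```python
def bubble_sort_counting(arr):
    """Bubble sort arr in place and return how many comparisons were made."""
    comparisons = 0
    unsorted = True
    while unsorted:
        unsorted = False                 # assume this pass will find no swaps
        for i in range(len(arr) - 1):
            comparisons += 1
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                unsorted = True          # a swap happened, so another pass is needed
    return comparisons

already_sorted = [1, 2, 3, 4, 5, 6]
reverse_sorted = [6, 5, 4, 3, 2, 1]
print(bubble_sort_counting(already_sorted))  # 5  - one pass of n - 1 comparisons
print(bubble_sort_counting(reverse_sorted))  # 30 - n passes of n - 1 comparisons
```

With n = 6, the sorted input needs a single confirming pass (linear), while the reversed input needs n passes (quadratic).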


#### Insertion Sort

Worst Case: O(n^2) Time - O(1) Space

In [20]:
# Time Complexity
# O(n^2) Θ(n^2) Ω(n)
# Big-O notation: worst-case: quadratic
# Theta notation: avg-case: quadratic
# Omega notation: best-case (already sorted): linear

def insertionSort(array):
    # outer for loop - all values except the first value in the list
    for i in range(1, len(array)):
        j=i # start pointer j at the index we're pulling the value out of
        # j is a pointer starting at index i where we 'extracted' a value
        # j will move toward the start of the list as we swap values
        while j > 0 and array[j] < array[j-1]:
            array[j], array[j-1] = array[j-1], array[j]
            j -= 1 # move the pointer to do the next comparison
            
mylist = [5, 3, 12, 37, 7, 72, 20, 1, 42]
insertionSort(mylist)
print(mylist)

[1, 3, 5, 7, 12, 20, 37, 42, 72]
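
The cost of insertion sort depends on how far each value has to travel toward the front of the list. The sketch below counts swaps using the same loop structure; `insertion_sort_counting` is a hypothetical name used only here.

```python
def insertion_sort_counting(array):
    """Insertion sort array in place and return how many swaps were made."""
    swaps = 0
    for i in range(1, len(array)):
        j = i
        # walk the value at index i toward the front until it is in order
        while j > 0 and array[j] < array[j - 1]:
            array[j], array[j - 1] = array[j - 1], array[j]
            swaps += 1
            j -= 1
    return swaps

print(insertion_sort_counting([1, 2, 3, 4, 5, 6]))  # 0  - nothing has to move
print(insertion_sort_counting([6, 5, 4, 3, 2, 1]))  # 15 - n(n-1)/2 swaps for n = 6
```

A reversed list forces every value to travel all the way to the front, which gives n(n-1)/2 swaps, a quadratic amount of work.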


## Merge Sort

#### How it Works

In [22]:
# There are multiple ways to code a MergeSort algorithm
# We'll be looking at a classic recursive implementation

# Step 1: Split every item into its own partition recursively
# Step 2: From left to right, merge partitions together
# Step 3: While merging partitions, place values into the correct position within the partitions
# Step 4: Continue steps 2-3 until all partitions have been merged back into one whole

# Advantage - merge sort is more efficient than bubble sort and insertion sort in the worst case scenario!
# O(nlogn) Θ(nlogn) Ω(nlogn)
# All cases linear logarithmic
# Merge sort does use some additional memory - O(n) linear space complexity

mylist = [5, 3, 12, 37, 7, 72, 20, 1, 42]

def mergeSort(arr):
    print('Splitting...', arr)
    
    # step 1: splitting the array
    if len(arr) > 1:
        lefthalf = arr[:len(arr)//2]
        righthalf = arr[len(arr)//2:]
        
        # recursively call mergeSort to perform all of our merges/creation of partitions
        mergeSort(lefthalf)
        mergeSort(righthalf)
        
        # steps 2&3 - comparisons and merging partitions
        # set up pointers:
        i = 0 # pointer for the left half
        j = 0 # pointer for the right half
        k = 0 # pointer for the main array
        # if we have values left in both partitions
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] < righthalf[j]:
                arr[k] = lefthalf[i]
                i = i+1
                k = k+1
            else:
                arr[k] = righthalf[j]
                j = j+1
                k = k+1
        # only have values left in the lefthalf
        while i < len(lefthalf):
            arr[k] = lefthalf[i]
            i = i+1
            k = k+1
        # only have values left in the righthalf
        while j < len(righthalf):
            arr[k] = righthalf[j]
            j = j+1
            k = k+1
    print('Merging: ', arr)
    
print(mylist)
mergeSort(mylist)
print(mylist)

[5, 3, 12, 37, 7, 72, 20, 1, 42]
Splitting... [5, 3, 12, 37, 7, 72, 20, 1, 42]
Splitting... [5, 3, 12, 37]
Splitting... [5, 3]
Splitting... [5]
Merging:  [5]
Splitting... [3]
Merging:  [3]
Merging:  [3, 5]
Splitting... [12, 37]
Splitting... [12]
Merging:  [12]
Splitting... [37]
Merging:  [37]
Merging:  [12, 37]
Merging:  [3, 5, 12, 37]
Splitting... [7, 72, 20, 1, 42]
Splitting... [7, 72]
Splitting... [7]
Merging:  [7]
Splitting... [72]
Merging:  [72]
Merging:  [7, 72]
Splitting... [20, 1, 42]
Splitting... [20]
Merging:  [20]
Splitting... [1, 42]
Splitting... [1]
Merging:  [1]
Splitting... [42]
Merging:  [42]
Merging:  [1, 42]
Merging:  [1, 20, 42]
Merging:  [1, 7, 20, 42, 72]
Merging:  [1, 3, 5, 7, 12, 20, 37, 42, 72]
[1, 3, 5, 7, 12, 20, 37, 42, 72]
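
The merging step (steps 2 and 3) can also be looked at in isolation. The sketch below is a standalone, out of place helper that merges two already sorted lists; the name `merge` and its return-a-new-list design are assumptions for illustration, not part of the function above.

```python
def merge(left, right):
    """Merge two sorted lists into one new sorted list."""
    merged = []
    i = 0  # pointer for the left list
    j = 0  # pointer for the right list
    # while both lists still have values, take the smaller front value
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # at most one of these slices is non-empty: copy over the leftovers
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

# the two sorted halves from the final merge in the trace above
print(merge([3, 5, 12, 37], [1, 7, 20, 42, 72]))
# [1, 3, 5, 7, 12, 20, 37, 42, 72]
```

Each value is appended exactly once, so merging is linear in the combined length. The `merged` list is also where the O(n) extra space comes from.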


# Binary Search

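The Binary Search algorithm works on a sorted array: it finds the value in the middle of the array and compares it to the target. If the target is larger, the search continues in the right half; if it is smaller, in the left half. Each comparison halves the remaining search range.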

* The worst case run time for this algorithm is `O(log(n))`
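
Because each comparison halves the search range, the number of loop iterations for a list of n items is at most floor(log2(n)) + 1. The sketch below computes that bound; `max_binary_search_steps` is a hypothetical helper, and it uses `int.bit_length()` to avoid floating point rounding.

```python
def max_binary_search_steps(n):
    """Upper bound on loop iterations for a binary search over n items."""
    # for n > 0, n.bit_length() equals floor(log2(n)) + 1; for n == 0 it is 0
    return n.bit_length()

for size in (1, 2, 3, 1_000, 1_800_000):
    print(size, max_binary_search_steps(size))
# 1 1
# 2 2
# 3 2
# 1000 10
# 1800000 21
```

Growing the list from 1,000 items to 1,800,000 items (the size used in this section's example) only raises the bound from 10 steps to 21.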

In [55]:
# We're going to look at a binary search in a sorted array of numbers

from random import randint
# make a sorted list of 1,800,000 random numbers between 0 and 1000
nums = sorted([randint(0,1000) for i in range(1800000)])
#print(nums)

def binarySearch(arr, target):
    left = 0
    right = len(arr)-1
    steps = 0 # counts total number of steps (not part of the functionality of the algorithm)
    while left <= right: # while we still have values to check
        steps += 1
        mid_index = (left+right)//2
        if target == arr[mid_index]:
            return f'The index of {target} is {mid_index}. Steps taken: {steps}'
        elif target > arr[mid_index]:
            # target greater than middle value
            left = mid_index + 1
        else:
            right = mid_index - 1
    return f'The value {target} is not present in the list. Steps taken: {steps}.'

# 20000 is outside the 0-1000 range, so the search runs until there are no values left to check
binarySearch(nums, 20000)

'The value 20000 is not present in the list. Steps taken: 21.'
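
Python's standard library also ships a binary search in the `bisect` module. The sketch below wraps `bisect_left` in a small lookup function; `index_of` and the sample list are hypothetical, and unlike `binarySearch` above it returns an index or `None` rather than a message.

```python
from bisect import bisect_left

def index_of(sorted_list, target):
    """Return an index of target in sorted_list, or None if it is absent.

    Assumes sorted_list is sorted in ascending order.
    """
    pos = bisect_left(sorted_list, target)  # leftmost position where target could go
    if pos < len(sorted_list) and sorted_list[pos] == target:
        return pos
    return None

values = [1, 3, 5, 7, 12, 20, 37, 42, 72]
print(index_of(values, 20))     # 5
print(index_of(values, 20000))  # None
```

When the list contains duplicates, `bisect_left` lands on the leftmost match, whereas the hand-written loop returns whichever match the midpoint hits first.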

# Exercises

### Exercise #1 <br>
<p>Reverse the list below in-place using an in-place algorithm.<br>For extra credit: Reverse the strings at the same time.</p>

In [None]:
words = ['this' , 'is', 'a', 'sentence', '.']
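
One possible approach, sketched with the two-pointer pattern from earlier in this section (this is only one of several valid answers, and `reverse_words_in_place` is a hypothetical name). The list is modified in place; the individual strings are replaced by reversed copies because Python strings are immutable.

```python
def reverse_words_in_place(a_list):
    """Reverse a_list in place and reverse each string in it as well."""
    left = 0
    right = len(a_list) - 1
    while left < right:
        # swap the two positions, reversing each string on the way
        a_list[left], a_list[right] = a_list[right][::-1], a_list[left][::-1]
        left += 1
        right -= 1
    if left == right:
        # odd length: the middle item stays put but still gets reversed
        a_list[left] = a_list[left][::-1]

words = ['this', 'is', 'a', 'sentence', '.']
reverse_words_in_place(words)
print(words)  # ['.', 'ecnetnes', 'a', 'si', 'siht']
```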


### Exercise #2 <br>
<p>Create a function that counts how many times each distinct word appears in the string below, then outputs a dictionary with the words as the keys and the number of times each word appears as the values.<br>Should output:<br>{'a': 5,<br>
 'abstract': 1,<br>
 'an': 3,<br>
 'array': 2, ... etc...</p>

In [25]:
a_text = 'In computing, a hash table hash map is a data structure which implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots from which the desired value can be found'

# treat uppercase and lowercase as the same word
# punctuation stays attached to words here ('computing,' 'type,' 'values.') - it doesn't change any counts in this example

def counts(astr):
    countdict = {}
    words = sorted(astr.lower().split()) # feed the dictionary words in sorted order bc dictionaries are insertion-ordered
    for w in words:
        # if the word is already in my countdict's keys, increase the value by 1
        if w in countdict:
            countdict[w] += 1
        # otherwise the word is not already in the countdict, add a key value pair for that word
        else:
            # where the value is 1 because we are seeing this word for the first time
            countdict[w] = 1
    return countdict

counts(a_text)

{'a': 5,
 'abstract': 1,
 'an': 3,
 'array': 2,
 'associative': 1,
 'be': 1,
 'buckets': 1,
 'can': 2,
 'compute': 1,
 'computing,': 1,
 'data': 2,
 'desired': 1,
 'found': 1,
 'from': 1,
 'function': 1,
 'hash': 4,
 'implements': 1,
 'in': 1,
 'index': 1,
 'into': 1,
 'is': 1,
 'keys': 1,
 'map': 2,
 'of': 1,
 'or': 1,
 'slots': 1,
 'structure': 2,
 'table': 2,
 'that': 1,
 'the': 1,
 'to': 2,
 'type,': 1,
 'uses': 1,
 'value': 1,
 'values.': 1,
 'which': 2}

In [29]:
# What about a version that ignores punctuation and uses built-ins
from collections import Counter
import string

# Counter() is responsible for creating the Counter object (dictionary-like structure providing our answer)
# .translate(str.maketrans('', '', string.punctuation)) is responsible for the removal of punctuation

mycounter = Counter(a_text.lower().translate(str.maketrans('', '', string.punctuation)).split())
mycounter.most_common(3)

[('a', 5), ('hash', 4), ('an', 3)]

### Exercise #3

Write a program to implement a Linear Search Algorithm. Also in a comment, write the Time Complexity of the following algorithm.

#### Hint: Linear Searching will require searching a list for a given number. 

In [11]:
# write a function that searches a list for a given value
# in linear time aka O(n)
# ... aka just a for loop through a list to find a value

# O(n) - linear
def searchVal(alist, value):
    for i in range(len(alist)): # n
        if value == alist[i]: # 1
            return i
    return None

searchVal([5, 120, 325, 352, 12, 4, 52, 16], 12)

4

In [10]:
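# we just wrote (roughly) the list.index() method
# one difference: list.index() raises a ValueError when the value is missing, searchVal returns None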
[5, 120, 325, 352, 12, 4, 52, 16].index(12)

4
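
The two behave differently when the value is missing. The sketch below restates the linear search under the hypothetical name `search_val` and compares it with `list.index()` on a value that is not in the list.

```python
def search_val(alist, value):
    """Linear search: return the first index of value in alist, or None."""
    for i in range(len(alist)):
        if value == alist[i]:
            return i
    return None

numbers = [5, 120, 325, 352, 12, 4, 52, 16]
print(search_val(numbers, 999))  # None - the function signals "not found" quietly

try:
    numbers.index(999)
except ValueError as err:
    # list.index() signals "not found" by raising an exception instead
    print(f'list.index raised ValueError: {err}')
```

Returning `None` keeps the calling code simple, but the caller has to remember to check for it; raising an exception makes a missing value impossible to ignore.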

## Wednesday Whiteboard - O(n) time O(1) space answer

###### Given a sequence of numbers, find the largest pair sum in the sequence.

###### For example

###### [10, 14, 2, 23, 19] -->  42 (= 23 + 19)
###### [99, 2, 2, 23, 19]  --> 122 (= 99 + 23)

###### Input sequence contains minimum two elements and every element is an integer.

In [4]:
# the goal is to perform as few expensive operations as possible
# initial answer had sorting - sorting is a linear logarithmic process O(nlogn)
    # so, we sought to eliminate that sorting from our refactored answer
    # using max twice and a remove, we were able to reduce our time complexity to O(n) linear
        # however, we performed three linear operations (still linear overall but we can improve)
# a good strategy if we need to find certain values in a list
# is to create placeholder variables that will hold those values
# that way we can then loop and fill in those placeholder variables
# usually just by performing simple mathematical comparisons (which are very efficient)

# I want to replace max() .remove() and max()
# with a single for loop
# the secondary benefit to this besides the moderate efficiency gain (remember- still overall linear)
# we will maintain data integrity (we won't alter the original list like we did with .remove())

# O(n) linear Time complexity
# O(1) constant space complexity

def bigPairSum(arr):
    if arr[0] > arr[1]: # 1
        largest = arr[0] # 1
        large2 = arr[1] # 1
    else:
        largest = arr[1] # 1
        large2 = arr[0] # 1
    for i in range(2, len(arr)): # n
        if arr[i] > largest: # 1
            large2 = largest # 1
            largest = arr[i] # 1
        elif arr[i] > large2: # 1
            large2 = arr[i] # 1 
    return largest+large2 # 1

bigPairSum([99, 2, 2, 23, 19])
    

122
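
For comparison, the standard library's `heapq.nlargest` can express the same idea in one line. This is an alternative sketch, not the whiteboard answer; `big_pair_sum_heapq` is a hypothetical name. It also leaves the original list untouched, and with a fixed count of 2 it should still scale linearly with the length of the input.

```python
import heapq

def big_pair_sum_heapq(arr):
    """Return the sum of the two largest values in arr (needs at least two items)."""
    return sum(heapq.nlargest(2, arr))  # nlargest returns a new list of the top 2

print(big_pair_sum_heapq([10, 14, 2, 23, 19]))  # 42  (= 23 + 19)
print(big_pair_sum_heapq([99, 2, 2, 23, 19]))   # 122 (= 99 + 23)
```

The hand-written loop avoids the import and makes the bookkeeping explicit, which is usually what a whiteboard question is looking for.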