### MY470 Computer Programming
# Searching and Sorting Algorithms
### Week 10 Lecture

## Problem Set 5 takes place in class tomorrow!

* Come to your class. Don't be late. 
* **7 questions, 25 min long**
  * Give **Big-O for time complexity** and explain reasoning in 1-2 sentences.
  * Last 2 questions also ask you to write a simple function (~3 lines of code each)
* If you miss class and have not informed us **in advance** with a valid reason, we will mark your submission as **no attempt** and assign 0 to it.

## Overview

We will practice thinking about algorithm design and complexity analysis:

* Search algorithms
    * Linear search
    * Binary search
* Sorting algorithms
    * Bubble sort
    * Selection sort
    * Insertion sort
    * Merge sort
* Hashing

- Revisit binary search (we introduced it earlier as bisection search)
- Sorting algorithms, in order of increasing efficiency (the first three are all O(n**2); only merge sort is O(n log n))
- Hashing
    - You might have encountered it as an error
    - When trying to put certain values into a dict or set, Python says something like "unhashable type: 'list'"; now we can understand where this comes from
    - How to implement data structures where you can look up items in O(1)
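
The error mentioned above can be reproduced in a few lines. This is a minimal sketch; the variable names `error_message` and `lookup` are illustrative only, and the exact wording of the message may differ between Python versions.

```python
# Lists are mutable, so Python refuses to hash them
try:
    hash([1, 2])
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)

# A tuple of hashable items is itself hashable, so it can be a dict key
lookup = {(1, 2): "value"}
print(lookup[(1, 2)])
```

Converting a list to a tuple is a common workaround when a sequence is needed as a dict key or set element.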

## Searching

![Searching](figs/searching.jpg "Searching")

* The goal is to find a specific item in a collection of items
* The answer could be `True` or `False`, or alternatively, the precise location of the item
* In Python, search with `in`

- Use `in` to check whether an element is in a list
- You could also get the precise index of where the element is in the list

In [12]:
4 in [1, 2, 3, 4, 5]

True

## Linear/Sequential Search

* Visit each item in the collection in order until you discover the item or until you run out of items

In [20]:
def linear_search(ls, e):
    """Assume ls is a list.
    Return True if e is in ls, False otherwise.
    """
    for i in range(len(ls)):
        if ls[i]==e:
            return True
    return False


- To find out whether an element is in an unsorted list, there is no other way but to visit the elements one by one until we find it
    - This means iterating over each item in the list (or other data type)
- Using indexing might not be the best option; we could iterate over the items directly
    - The function is easy to modify so that it tells us where the element is instead of just `True` or `False`
- **Time complexity: O(n), where n is the length of the list ls**
    - The longer the list, the more time we need to go through it
- Space complexity: O(1); apart from the loop index, no new variables are created, regardless of the length of the list
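
As a sketch of the modification described in the notes, the search can return a position instead of a Boolean. The name `linear_search_index` and the choice of `-1` as the "not found" value are assumptions made for this illustration.

```python
def linear_search_index(ls, e):
    """Assume ls is a list.
    Return the index of the first occurrence of e in ls, or -1 if e is absent.
    """
    # enumerate() yields (index, item) pairs, so no manual indexing is needed
    for i, item in enumerate(ls):
        if item == e:
            return i
    return -1


print(linear_search_index([1, 2, 3, 4, 5], 4))  # 3
print(linear_search_index([1, 2, 3, 4, 5], 9))  # -1
```

The time complexity is still O(n), since in the worst case every item is visited once.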



### Exercise 1: What is the time and space complexity of linear search?

## Binary Search

*Example of divide and conquer strategy – break the problem into smaller pieces, solve the smaller pieces, and then reassemble to get the result*


* Assume search space is sorted
* Start from the middle
    * If the item is the one we are searching for, we are done
    * If the item is larger than the one we are searching for, eliminate the upper half and repeat the search in the lower half
    * If the item is smaller than the one we are searching for, eliminate the lower half and repeat the search in the upper half


- **We can do better than sequential search under one condition: the data is sorted**
- We introduced binary search earlier as bisection search
- Start from the middle and check that element. If it is not the one we are looking for, we can tell whether our element is smaller or larger: if it is smaller, ignore everything above the middle; if it is larger, ignore everything below
    - **= divide and conquer strategy**: break a complex problem into smaller problems of the same type, solve each, and reassemble to get the solution
    - -> **naturally implemented as a recursive function**

## Binary Search

In [9]:
# binary_search() is called a "wrapper function" –
# it hides the implementation details
def binary_search(ls, e):
    """Assume ls is a list with its elements in ascending order.
    Return True if e is in ls, False otherwise.
    """
    
    def b_search(ls, e, low, high):
        # Decrement high - low
        if high==low: # only one item left
            return ls[low]==e
        mid = (low + high)//2
        
        # Check if the item is at the midpoint
        if ls[mid]==e:
            return True
        # If the item is smaller than the midpoint, search in the lower half 
        elif ls[mid] > e:
            if low==mid: # no items left
                return False
            else:
                return b_search(ls, e, low, mid - 1)
        # If the item is larger than the midpoint, search in the higher half 
        else:
            return b_search(ls, e, mid + 1, high)
        
    if len(ls)==0:
        return False
    else:
        return b_search(ls, e, 0, len(ls) - 1)
    

ChatGPT:
This function performs a binary search, which is an efficient algorithm for finding an element in a **sorted list** by repeatedly dividing the search interval in half.

- In each recursive call (or iteration in a non-recursive binary search), the search space is halved.
    - After the first step, the search space is n/2
    - After the second step, the search space is n/4 and so on
- The search space keeps halving until the number of elements left is 1 (or no elements are left to search).
- The number of steps (recursive calls) required to reduce the search space from n elements to 1 element is about log2(n)
- Thus, the time complexity of binary search is O(log n), where n is the length of the list

- Space complexity: O(log n) due to the recursive calls. Each recursive call adds a new frame to the call stack, and since the recursion depth is proportional to the logarithm of the number of elements, the space complexity is also logarithmic.
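
The non-recursive variant mentioned above can be sketched as follows. This is not the lecture's implementation; `binary_search_iterative` is a hypothetical name. Because it keeps only two indices and no call stack, its space complexity is O(1), while the time complexity stays O(log n).

```python
def binary_search_iterative(ls, e):
    """Assume ls is a list with its elements in ascending order.
    Return True if e is in ls, False otherwise.
    """
    low, high = 0, len(ls) - 1
    # The interval [low, high] shrinks by about half on every iteration
    while low <= high:
        mid = (low + high) // 2
        if ls[mid] == e:
            return True
        elif ls[mid] > e:
            high = mid - 1  # discard the upper half
        else:
            low = mid + 1   # discard the lower half
    return False


print(binary_search_iterative([1, 3, 5, 7, 9], 7))  # True
print(binary_search_iterative([1, 3, 5, 7, 9], 4))  # False
```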

- The `low==mid` check and the `mid - 1` / `mid + 1` bounds are extra steps that make sure we do not skip any of the indices
- Integer division (`//`) -> we can then use mid as an index
- We are looking for a particular value contained in a list, so we have to be careful with the indices to ensure we do not miss anything
- Outer function = wrapper function
    - Hides the implementation details
    - The recursive function needs an interval (`low`, `high`), but the user should not have to care how the search is implemented; hiding this is the only purpose the wrapper serves
    - Why specify arguments for the helper function if they do not depend on user input? Because they change with every recursive call

- **Time complexity of binary search: O(log n), or O(e log n) depending on what e is, where n is the length of the list and e is the length of the element (e.g. a string)**
    - At every step, we halve the range in which to look. If we double the data, we need one more step.
    - 2 inputs: ls and e
        - The cost depends on e only if e is a sequence such as a string
        - Comparing strings, or comparing e with another list, means comparing element by element, which is itself a loop
    - If you want to be more inclusive, you have to account for the fact that e could be a string
- Space complexity: this is a recursive function
    - For a recursive function, count how many calls are open on the call stack at the deepest point of the recursion; by the time we reach the bottom we have log n open calls
    - Thus the space complexity here is also O(log n). BE CAREFUL WITH SPACE COMPLEXITY FOR RECURSIVE FUNCTIONS

   
    

- O(log n): at every step we halve the range we are looking in, so if we double the data we only need one extra step, where n is the length of the list
    - The search does not depend on e UNLESS E IS A STRING or another sequence; comparing a string to another string, or a list to another list, means comparing every element, so the comparison has a loop in it too
    - Without assuming that e is of a particular type, we technically have to express the complexity as a function of e as well: O(e log n), where n is the length of the list and e is the length of the element when e is some kind of sequence
    - With no specification of what e is: we make log n attempts, and each comparison takes up to e steps, thus O(e log n)
    - NO AMBIGUITIES IN THE QUIZ
    - The steps change in size: with large data, each halving removes a much larger amount of data


### Exercise 2: What is the time and space complexity of binary search?

## When to Sort and Use Binary Search

* Best if searching needs to be done many times
* For small $n$, the additional cost of sorting is likely not worth it
* For large $n$, sorting may be too expensive and ultimately, sequential search may be preferable


- Do we do linear search (O(n)), or sort (O(n log n)) and then do binary search (O(log n), unless e has internal structure)?
    - Depends on the application and the data (how large the data is, how often we have to re-sort because the data changes, how many searches we run)
    - Linear search is good enough most of the time
    - If we search many times and the data does not change, it makes sense to sort once and then do quick O(log n) searches
    - If the data keeps changing, you have to keep on sorting
    - If there are only a couple of searches, linear search is good enough
- If there are lots of searches, hashing is a better way to do this
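
The "sort once, search many times" pattern can be sketched with the standard library's `bisect` module, which provides binary search on sorted lists. The helper `contains_sorted` and the sample data are made up for this illustration.

```python
from bisect import bisect_left


def contains_sorted(sorted_ls, e):
    """Assume sorted_ls is in ascending order.
    Return True if e is in sorted_ls, using binary search.
    """
    # bisect_left returns the leftmost position where e could be inserted
    i = bisect_left(sorted_ls, e)
    return i < len(sorted_ls) and sorted_ls[i] == e


data = [31, 4, 15, 9, 26]
sorted_data = sorted(data)   # pay the O(n log n) sorting cost once
queries = [9, 10, 31]
results = [contains_sorted(sorted_data, q) for q in queries]  # O(log n) each
print(results)  # [True, False, True]
```

If the data changed between queries, the sort would have to be repeated (or the list kept sorted on insertion), which is where the trade-off discussed above comes from.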

## Sorting

![Sorting](figs/sorting.jpg "Sorting")

* The goal is to place items from a collection in some kind of order
* Sorting requires two operations:
    * Compare values
    * Exchange values if they are not in the correct order
* **The efficiency of a sorting algorithm depends on the total number of comparisons and exchanges**


- **`list.sort()`: sorts in place, without creating a new list**
- **`sorted()`: can be called not just on lists but on any iterable, including tuples and strings**
    - It always returns a new sorted list, whatever the type of the input
- The sorting algorithms below are covered in order of runtime efficiency

In [1]:
ls = [3, 5, 1, 2, 4]
ls_new = sorted(ls)
print(ls, ls_new)

ls.sort()
print(ls)

[3, 5, 1, 2, 4] [1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]
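
A short illustration of how `sorted()` behaves on inputs other than lists: it accepts any iterable and returns a list, so a string has to be reassembled with `join()` if a string is wanted. The variable names are illustrative.

```python
word_sorted = sorted("banana")    # a list of characters, not a string
tuple_sorted = sorted((3, 1, 2))  # a list, not a tuple

print(word_sorted)           # ['a', 'a', 'a', 'b', 'n', 'n']
print(tuple_sorted)          # [1, 2, 3]
print("".join(word_sorted))  # aaabnn
```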


## Sorting Algorithms

* Bubble sort
* Selection sort
* Insertion sort
* Merge sort


## Bubble Sort

![Bubble sort](figs/bubble_sort.jpg "Bubble sort")

1. Iterate over a list and compare each item with the one that follows it; swap the two if they are not in the correct order, so that the largest item "bubbles" to the end of the list
2. Repeat on the remaining unsorted sublist until no swaps are done

[Visualization](https://visualgo.net/bn/sorting)

## Bubble Sort

- Start from the beginning and push the largest element to the back: the largest "bubble" moves through the data until it reaches the end, then the second largest, and so on
- visualgo.net/en/sorting
- Code
    - Starts from the beginning
    - Compare the current element with the one that follows; if the current one is larger, swap them
    - With one line we reassign two values
    - Reduce the range we iterate over by one on every pass: the last element is already sorted, so we look at everything except it
- Time complexity:
    - Two nested loops, both of which go over some part of the data
    - Quadratic, O(n**2), where n is the length of the list ls
    - The inner loop gets shorter: we keep looking at a shorter segment of the data because the larger elements have already been pushed to the back
        - In total the inner loop body runs (n-1) + (n-2) + ... + 1 = n(n-1)/2 times; the factor 1/2 is a constant, which we ignore
    - A lot of non-final swaps: most of the shifting is temporary and has to be redone again and again, so there are too many unnecessary operations
        - Very inefficient in terms of runtime
- Bubble sort is very inefficient, but still workable

ChatGPT:
- The outer loop (for passnum in range(len(ls) - 1, 0, -1)) iterates from the last element (len(ls) - 1) down to the second element (1), effectively iterating n−1 times (where n is the length of the list).
- The inner loop (for i in range(passnum)) iterates over the unsorted portion of the list. On the first pass, it compares all n−1 elements. On the second pass, it compares n−2 elements, and so on, until it compares just 1 element on the last pass.
- Inside the inner loop, if an element at position i is greater than the element at position i+1, the two elements are swapped. This operation is constant-time O(1).

In [20]:
def bubble_sort(ls):
    """Assume ls is a list of elements that can be compared using >.
    Sort ls in ascending order.
    """
    
    # Start from the whole list, reducing towards the front
    for passnum in range(len(ls) - 1, 0, -1):
        # Consider each of the sublists
        for i in range(passnum):
            if ls[i] > ls[i + 1]:
                # Swap, pushing the larger number to the back
                ls[i], ls[i + 1] = ls[i + 1], ls[i]
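
The function above always runs every pass. The "repeat until no swaps are done" rule from the slide can be sketched as an early-exit variant; `bubble_sort_early_exit` and the `swapped` flag are hypothetical additions, not part of the lecture code. The worst case remains O(n**2), but an already sorted list is handled in a single O(n) pass.

```python
def bubble_sort_early_exit(ls):
    """Assume ls is a list of elements that can be compared using >.
    Sort ls in ascending order, stopping as soon as a pass makes no swaps.
    """
    for passnum in range(len(ls) - 1, 0, -1):
        swapped = False
        for i in range(passnum):
            if ls[i] > ls[i + 1]:
                ls[i], ls[i + 1] = ls[i + 1], ls[i]
                swapped = True
        # A pass without swaps means the list is already in order
        if not swapped:
            break


numbers = [3, 5, 1, 2, 4]
bubble_sort_early_exit(numbers)
print(numbers)  # [1, 2, 3, 4, 5]
```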


### Exercise 3: What is the time and space complexity of bubble sort?

[Hint](https://www.youtube.com/watch?v=koMpGeZpu4Q)

## Selection Sort

1. Iterate over a list and look for the largest/smallest item in the remaining sublist
2. Swap the item in the current position with the identified item

- Iterate over the list and look for the largest/smallest element; at first we only identify it, and then we swap it into place once
- Fewer swaps than with bubble sort: first identify what to swap and then do a single swap
- We still go through the list many times, but with fewer swaps
- Code
    - Inner loop:
        - Look for an element that is larger than the one currently recorded at the max position
        - If we encounter anything larger, we update which element we identify as the largest
    - Move the largest element to the end; on the next pass we do not have to look at it again
- Time complexity: grows on the order of n squared
    - No difference from bubble sort: still nested loops; the inner one gets shorter and shorter, but the total is still n(n-1)/2 comparisons
- More efficient in practice because some of the operations are outside of the inner loop, so we only do them n times rather than n squared times
    - We do not do that many non-final swaps



In [1]:
def selection_sort(ls):
    """Assume ls is a list of elements that can be compared using >.
    Sort ls in ascending order.
    """
    
    # Consider each position, starting from the back
    for pos in range(len(ls) - 1, 0, -1):
        max_pos = 0
        # Find the largest item in the sublist until this position
        for i in range(1, pos + 1):
            if ls[i] > ls[max_pos]:
                max_pos = i
        
        # Swap the item at this position with the largest item
        ls[pos], ls[max_pos] = ls[max_pos], ls[pos]


ChatGPT
- Outer Loop:
    - The outer loop (for pos in range(len(ls) - 1, 0, -1)) iterates through the list from the last position to the second position. 
    - At each iteration, it selects the largest element in the unsorted part of the list and places it in its correct position.
- Inner Loop:
    - The inner loop (for i in range(1, pos + 1)) searches through the unsorted part of the list, starting from the beginning up to the current position of pos.
    - It compares each element with the current largest element (tracked by max_pos).
    - The index of the largest element found so far is stored in max_pos.
- Swapping:
    - After the inner loop completes, the element at the max_pos (the largest found element in the unsorted part) is swapped with the element at the current pos.
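
To make the "fewer swaps" point concrete, the two algorithms can be instrumented to count swaps. This is a sketch for illustration: both counting functions are hypothetical, they work on a copy of the input, and they mirror the loops of `bubble_sort()` and `selection_sort()` from the lecture.

```python
def count_swaps_bubble(ls):
    """Return the number of swaps bubble sort performs on a copy of ls."""
    ls = ls[:]
    swaps = 0
    for passnum in range(len(ls) - 1, 0, -1):
        for i in range(passnum):
            if ls[i] > ls[i + 1]:
                ls[i], ls[i + 1] = ls[i + 1], ls[i]
                swaps += 1
    return swaps


def count_swaps_selection(ls):
    """Return the number of swaps selection sort performs on a copy of ls."""
    ls = ls[:]
    swaps = 0
    for pos in range(len(ls) - 1, 0, -1):
        max_pos = 0
        for i in range(1, pos + 1):
            if ls[i] > ls[max_pos]:
                max_pos = i
        # Exactly one swap per pass, outside the inner loop
        ls[pos], ls[max_pos] = ls[max_pos], ls[pos]
        swaps += 1
    return swaps


reversed_list = list(range(10, 0, -1))
print(count_swaps_bubble(reversed_list))     # 45
print(count_swaps_selection(reversed_list))  # 9
```

Both algorithms make the same number of comparisons, n(n-1)/2, so both are O(n**2); the difference is only in how many assignments are done.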

### Exercise 4: What is the time complexity of selection sort?

## Insertion Sort

1. Iterate over a list starting from the beginning
2. Insert each new item into the previous sublist in order, shifting the positions of larger items by 1

In essence, the algorithm maintains a sorted sublist in the lower positions of the list as it progresses one item ahead

- For every element we encounter, go back and put it into its rightful slot in the section we have already covered. We always maintain a sorted sublist at the beginning of the list, and anything new we encounter is inserted into it
- Shift everything in the sorted sublist until we find the place for the new addition
- The lecturer finds this the most intuitive of the three; the others are rather unintuitive
- Code:
    - We could equally start from the end rather than the beginning
    - Here we start from the beginning
    - Keep track of the current value and its position
    - Grab the current value and start looking backwards: while the previous element is larger than the current value, shift that element one position further. Once we encounter an element that is smaller than the current value, we have the position where the current value should be assigned
- Again two nested loops, so **still n squared**, where n is the length of the list: in the worst case (a reverse-sorted list) the i-th item is shifted past all i items before it, giving 1 + 2 + ... + (n-1) = n(n-1)/2 shifts
- **Performance: often better than selection sort; there are no swaps (no mutual assignment), only shifts, and the data we work with is often partially sorted**
- **Merge sort is the one you will effectively be using, not any of the ones above**
- **Except for binary search, you will rarely have to write your own searching or sorting algorithms**
    - Sometimes there are optimisations for certain data, but in general we cover these to learn how to evaluate solutions

ChatGPT

- Outer Loop: The outer loop (for i in range(1, len(ls))) iterates through the list starting from the second element (index 1). For each element, it finds the correct position in the sorted part of the list (which is the portion before the current element).
- Inner Loop (while loop): The inner loop (while pos > 0 and ls[pos - 1] > currentvalue) moves elements of the sorted part of the list to the right until it finds the correct position for the currentvalue (the element from the outer loop).
- Inserting: Once the correct position is found, currentvalue is inserted into that position by setting ls[pos] = currentvalue.

In [25]:
def insertion_sort(ls):
    """Assume ls is a list of elements that can be compared using >.
    Sort ls in ascending order.
    """
    
    for i in range(1, len(ls)):
        currentvalue = ls[i]
        pos = i

        while pos > 0 and ls[pos - 1] > currentvalue:
            ls[pos] = ls[pos - 1]
            pos -= 1

        ls[pos] = currentvalue
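
The claim that insertion sort does well on partially sorted data can be checked by counting shifts. This is an illustrative sketch: `count_shifts_insertion` is a hypothetical instrumented copy of the function above that works on a copy of its input.

```python
def count_shifts_insertion(ls):
    """Return the number of shifts insertion sort performs on a copy of ls."""
    ls = ls[:]
    shifts = 0
    for i in range(1, len(ls)):
        currentvalue = ls[i]
        pos = i
        while pos > 0 and ls[pos - 1] > currentvalue:
            ls[pos] = ls[pos - 1]
            pos -= 1
            shifts += 1
        ls[pos] = currentvalue
    return shifts


print(count_shifts_insertion([1, 2, 3, 5, 4, 6, 7, 8]))  # 1
print(count_shifts_insertion([8, 7, 6, 5, 4, 3, 2, 1]))  # 28
```

On a nearly sorted list the inner `while` loop rarely runs, so the work is close to O(n); on a reverse-sorted list of length 8 it runs 8 * 7 / 2 = 28 times, the quadratic worst case.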


### Exercise 5: What is the time complexity of insertion sort?

## Merge Sort

*Example of divide and conquer strategy – break the problem into smaller pieces, solve the smaller pieces, and then reassemble to get the result*

1. If the list has 0 or 1 elements, it is sorted
2. If the list has more than 1 element, split the list in two and use merge sort on each
3. Merge the results*

\* Merge by inspecting the first elements of the two lists and moving the smaller one to the end of the result list

- You do not have to write it yourself; it is difficult
- **`.sort()` and `sorted()` use Timsort, a hybrid algorithm derived from merge sort and insertion sort, thus they are O(n log n)**
- Recursive algorithm, an example of the divide and conquer strategy
    - The problem of sorting a larger list is reduced by breaking the list down and then recombining the pieces
- Recursive function
    - If there is more than one element, split the list in the middle and call the function recursively on each sublist
    - Because the sublists are already sorted when we merge them, we only need to compare their first remaining elements
- Recursion tree: start from the leaves, recombine the lists and order them. Because the two sublists are already sorted, recombining is much easier: just keep picking the smaller front element of the two sublists

- **Time complexity: O(n log n)**
    - The recursion in merge_sort() is log n levels deep
        - We split the list in half at every level (`//2`), which gives log n levels; if we double the data, we add one more level
    - merge() is on the order of n -> O(n)
        - It has sequential loops, but they are not nested
        - Together they touch each element once
    - Every element is touched again at each level, but these are repeated linear passes rather than nested loops; they repeat log n times, thus n log n

- **You do not have to write this: you cannot do better than sort(), so you will not have to write any of this yourself**
- **But you might get a question about this in a data science job interview, so it is good to have a basic understanding. Go to the visualisation and study it a bit to get some intuition for the recursion**
## Merge Sort

In [26]:
def merge_sort(ls):
    """Assume ls is a list. 
    Return a new sorted list with same elements as ls.
    """
    
    if len(ls) <= 1:
        return ls[:]
    else:
        middle = len(ls)//2
        left = merge_sort(ls[:middle])
        right = merge_sort(ls[middle:])
        return merge(left, right) # Recombine the two sorted halves into one sorted list
    
    
def merge(left, right):
    """Assume left and right are sorted lists.
    Return a new sorted list containing the same elements as (left + right).
    """
    
    result = []
    i, j = 0, 0 # Current positions in the left and right lists; keep going until one list is exhausted
    # Inspect the first items of the two lists and append the smaller one to results
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # Append any remaining items
    while i < len(left):
        result.append(left[i])
        i += 1
    while j < len(right):
        result.append(right[j])
        j += 1
    return result


### The time and space complexity of merge sort
* The time complexity of merge sort is $O(n \log n)$. Halving the list down to single items takes $O(\log n)$ levels of splits, and the merges at each level together are $O(n)$: each item in the list is processed once and placed on a sorted list, so it takes $n$ operations to rebuild $n$ items
* The space complexity is $O(n)$, as the algorithm copies the list
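
The $O(n \log n)$ bound can also be read off a recurrence. The following is a sketch under simplifying assumptions that are not in the slides: $n$ is a power of 2, and $c$ is a constant that bounds the cost of handling one item (this includes the slicing in `merge_sort()`, which is also linear).

$$T(n) = 2\,T(n/2) + c\,n, \qquad T(1) = c$$

Unrolling the recurrence $k$ times gives

$$T(n) = 2^k\,T(n/2^k) + k\,c\,n$$

and the recursion bottoms out when $n/2^k = 1$, i.e. $k = \log_2 n$:

$$T(n) = n\,T(1) + c\,n \log_2 n = c\,n + c\,n \log_2 n = O(n \log n)$$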

## Hashing

![Hashing](figs/hashing.jpg "Hashing")

* A **hash table** is a collection of items that are stored in a way that makes them easy to find later
* The goal is to design a hash table that allows us to search on the order of $O(1)$

## Hash Table 

![Empty hash table](figs/hash_table_empty.png "Empty hash table")

* This hash table has length 10 and is currently empty
* Slots are named with integers starting at 0

- We need a hash function that determines how the elements are positioned within the hash table
- Regardless of how large your collection is, lookup is still O(1)
    - Evaluating the hash function takes constant time
    - Ideally, one mathematical operation finds the slot where the element should be. If it is not there, then it is not in the table (unless there were collisions: then we have to look in other slots if we do linear probing, or do a linear search through the slot if chaining)
    - Looking up a key in a dict is a constant-time lookup, and the same holds for sets, because the key is hashed
        - Python computes where the key should be in memory and returns the value
- You want to come up with a hash table that
    - (1) minimises the number of collisions, because collisions increase the lookup time.
        - With the chaining approach, we have to use linear search within a slot (going from constant towards linear time). We also do not want a complex hash function that is computationally intensive.
    - (2) distributes the data as evenly as possible. You could minimise the number of collisions by having an infinitely large hash table, but that would block a lot of memory. We want a uniformly distributed hash table, where all data is mapped uniquely without too many empty slots.
    - We want to get as much variability as possible without repeating values
- Collisions are bound to happen given how simple our hash function is, but we also do not want a very computationally demanding hash function
    - The modulo operator is almost always used in hash functions because it ensures that each value is allocated a slot that exists in the hash table
- Common: modulo
    - Any answer we get is within the range of possible slots
    - The remainder is always smaller than the divisor
- Is 22 in my data? Use the hash function and check whether it is in slot 2
- Lookup time is O(1)
    - Regardless of how large your table is, we always perform the same division, plus a few more operations depending on the hash function
    - This is how sets and dicts are implemented in Python:
        - Checking whether a key is in a dict takes constant time: the key is hashed and, instead of searching sequentially, Python goes to the place where the key should be in memory and checks whether it is there
* The **hash function** defines how you map an item to its rightful slot
    * For example, consider **the remainder method**, `i % h`, where `h` is the size of the hash table
        * We need to store `20`, `22`, `34`, `45`, `117`
        * `20 % 10 = 0`
        * `22 % 10 = 2`
        * `34 % 10 = 4`
        * `45 % 10 = 5`
        * `117 % 10 = 7`
    
![Hash table with items](figs/hash_table_filled.png "Hash table with items")
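
The filled table can be reproduced with a plain Python list of 10 slots. This is a minimal sketch that ignores collisions entirely; `table_contains` is a hypothetical helper name.

```python
table_size = 10
table = [None] * table_size  # an empty hash table with slots 0..9

# Place each item in the slot given by the remainder method
for item in [20, 22, 34, 45, 117]:
    table[item % table_size] = item

print(table)
# [20, None, 22, None, 34, 45, None, 117, None, None]


def table_contains(table, item):
    """Look in the single slot where item would have to be: O(1)."""
    return table[item % len(table)] == item


print(table_contains(table, 22))  # True
print(table_contains(table, 12))  # False: slot 2 holds 22, not 12
```

The lookup cost does not depend on how many items are stored, which is what makes the search O(1) as long as no two items share a slot.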

## Hash Table Collisions

A **collision** occurs when more than one item maps to the same slot

![Hash table with collision](figs/hash_table_collision.png "Hash table with collision")

- We want a hash function that minimises the number of collisions: collisions increase lookup time, because we have to use linear search within that slot to check whether the item is there
- We want to spread the data out as much as possible. We can avoid collisions by having a large table, but that blocks a lot of memory; we want the items uniformly distributed and uniquely mapped, without too many empty slots


The goal is to create a hash function that **minimizes the number of collisions, is easy to compute, and evenly distributes the items in the hash table**

## Hash Functions

* The **remainder method** 
    * Guarantees that the result is within the range of slot names
    * Because of this, the modulo arithmetic is typically present in some form in all hash functions

* The **folding method** 
    * Divide the item into equal-size pieces and add them to get the hash value; then use `%`
    * E.g. 04/12/2017: 04 + 12 + 20 + 17 = 53, and 53 % 10 = 3 (if table length is 10)

* The **mid-square method** 
    * Square the item and then extract some portion of the resulting digits to get the hash value; then use `%`
    * E.g. 77: 77^2 = 5929, the middle digits are 92, and 92 % 10 = 2 (if table length is 10)

- Remainder method: almost always present in any hash function out there; guarantees that every slot is within the size of the hash table
- If the data is in date format, we cannot divide it directly; we have to transform it into a number in some way. For dates, not all values are allowed
- The folding method is one way to do this
- **Get as much variability as possible without repeating values: that is the goal**
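
The folding and mid-square methods can be sketched as follows. The function names, the `piece_size` and `n_digits` parameters, and the choice to pass the date as a string of digits are all assumptions made for this illustration; other ways of cutting out the pieces or the middle digits are equally valid.

```python
def folding_hash(digits, table_size, piece_size=2):
    """Assume digits is a string of digits, e.g. '04122017'.
    Split it into pieces of piece_size digits, add them, and take the remainder.
    """
    pieces = [int(digits[i:i + piece_size])
              for i in range(0, len(digits), piece_size)]
    return sum(pieces) % table_size


def mid_square_hash(item, table_size, n_digits=2):
    """Square item, extract n_digits from the middle, and take the remainder."""
    squared = str(item ** 2)
    start = max(0, (len(squared) - n_digits) // 2)
    return int(squared[start:start + n_digits]) % table_size


print(folding_hash("04122017", 10))  # 3, since 4 + 12 + 20 + 17 = 53
print(mid_square_hash(77, 10))       # 2, since 77**2 = 5929 and the middle is 92
```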

## Hash Functions for Strings 

* Map each character to an ordinal value and sum them to get the hash value; then use `%`
* E.g. 'cat': ord('c') + ord('a') + ord('t') = 99 + 97 + 116 = 312, and 312 % 10 = 2 (if table length is 10)

- MY472: every character on a computer has a unique Unicode representation, i.e. a unique code point
    - Python's `ord()` function gives you the numeric code point of any Unicode character
    - Take the code point of each character and sum them
- There are many anagrams in English, and this method sends all anagrams to the same slot, thus way too many collisions
    - Words are limited by what we can pronounce: some syllables always occur together, so there is a lot of similarity rather than uniformly distributed data
    - We want more variability in our data, thus we weight each character's Unicode value by its position

In [1]:
help(ord)
print(ord('c'), ord('a'), ord('t'))

# Takes the code point of each character in the string, sums them, and returns the remainder after dividing by the table size
def hash(string, table_size):
    summ = sum([ord(i) for i in string])
    return summ % table_size

hash('cat', 10)

Help on built-in function ord in module builtins:

ord(c, /)
    Return the Unicode code point for a one-character string.

99 97 116


2

* The problem is that anagrams will always map to the same slot
* One way to fix this is to use the position of the character as a weight

## Exercise 6: Hash Functions for Strings

In [1]:
# Rewrite the hash function below to multiply the ordinal value 
# for each character by the position of the character

def hash(string, table_size):
    # summ = sum([ord(i) for i in string])
    # enumerate() gives you both the index and the element in a sequence
    summ = sum([(i + 1) * ord(char) for i, char in enumerate(string)])
    return summ % table_size

print(hash('cloud', 10), hash('could', 10))

6 4


## Resolving Collisions
 
* Rehashing
* Chaining


- Sometimes there will be collisions, and these are ways to resolve them
- There are restrictions on how large a hash table can be

## Rehashing

* If a collision occurs, place item into the next available empty slot (starting from the beginning, if necessary)
* When searching, continue **probing** until item is found or until you encounter an empty slot
* `rehash(pos) = (pos + skip) % table_size`

- Rehashing: instead of chaining, put the item into the next available empty slot
    - This increases the lookup time a bit, but at least the data is spread out
    - It helps spread the elements across the table (BUT: the book notes that if you always take the next available slot, clusters of occupied slots will form)
    - Lookup time increases slightly due to probing
        - We have an estimated location; if this location is already occupied, put the item `skip` slots away if that slot is empty
        - When searching, we have to account for this skip
        - Is 12 in my table? The hash function says it should be in position 2, but it is not there, so check the next position
            - If you do not find it there, go on to the next position until you encounter the element or an EMPTY SLOT
                - If you encounter an empty slot, the element is not in the table: either you find the element or you find an empty slot
- Variants: plus 3 probing or quadratic probing
- The idea is to spread the data as widely as possible, but in a systematic way, so that you can still find the elements easily
- Chaining: put the element in the same slot, but then you have to use linear search within the slot

**Linear probing** (`skip = 1`)

![Linear probing for collisions](figs/collision_linear_probing.png "Linear probing for collisions")

Other variants include **plus 3 probing** (`skip = 3`) and **quadratic probing** (`skip = 1, 4, 9, 16, ...`)
        
![Plus 3 probing for collisions](figs/collision_plus3_probing.png "Plus 3 probing for collisions")    
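
Rehashing with a fixed skip can be sketched as a pair of functions. `probe_insert` and `probe_search` are hypothetical names, the table is a plain list with `None` for empty slots, and deletion is deliberately left out because it needs extra bookkeeping.

```python
def probe_insert(table, item, skip=1):
    """Place item in its slot or, on collision, in the next free slot
    reached by repeatedly applying rehash(pos) = (pos + skip) % table_size.
    """
    size = len(table)
    pos = item % size
    for _ in range(size):
        if table[pos] is None or table[pos] == item:
            table[pos] = item
            return pos
        pos = (pos + skip) % size  # rehash
    raise ValueError("no free slot found")


def probe_search(table, item, skip=1):
    """Probe until the item is found or an empty slot is encountered."""
    size = len(table)
    pos = item % size
    for _ in range(size):
        if table[pos] is None:
            return False  # an empty slot means the item was never inserted
        if table[pos] == item:
            return True
        pos = (pos + skip) % size
    return False


table = [None] * 10
for item in [20, 22, 34, 45, 117, 12]:
    probe_insert(table, item)

print(table)
# [20, None, 22, 12, 34, 45, None, 117, None, None]
print(probe_search(table, 12))  # True: found one slot after its home slot 2
print(probe_search(table, 32))  # False: probing stops at the empty slot 6
```

Calling the same functions with `skip=3` gives plus 3 probing. Note how the search for 32 has to walk through slots 2 to 5 before giving up, which is the clustering effect of linear probing.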


## Chaining

* If a collision occurs, still place item into the proper slot
* When searching, use the hash function to generate the slot and then use a searching technique to find the item in the collection at that slot

![Hash table with collision](figs/hash_table_collision.png "Hash table with collision")

- Chaining
    - Keep adding elements at the same slot
    - Then use linear search: go through each element in that slot; if it is not there, then it is not in the table
    - Both approaches carry a bit of an overhead
- The idea of hashing is constant lookup time, but in practice it is a bit more complicated
- Lookup time depends on the load factor lambda
    - Size of the data divided by the size of the hash table
    - If there is twice as much data, lambda doubles and there are more collisions
    - Trade-off between memory and lookup time
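
Chaining and the load factor can be sketched with a list of lists, where each slot holds its own small collection. The function names are hypothetical, and a real implementation would typically grow the table once the load factor passes some threshold.

```python
def chain_insert(table, item):
    """Append item to the collection in its slot (no duplicates)."""
    slot = table[item % len(table)]
    if item not in slot:
        slot.append(item)


def chain_search(table, item):
    """Hash to the slot, then search linearly within that slot only."""
    return item in table[item % len(table)]


def load_factor(table):
    """Number of stored items divided by the number of slots."""
    return sum(len(slot) for slot in table) / len(table)


table = [[] for _ in range(10)]
for item in [20, 22, 34, 45, 117, 12]:
    chain_insert(table, item)

print(table[2])                 # [22, 12]
print(chain_search(table, 12))  # True
print(chain_search(table, 32))  # False
print(load_factor(table))       # 0.6
```

The linear search inside a slot is short as long as the load factor stays small, which is the memory versus lookup time trade-off noted above.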

* In theory, hashing provides $O(1)$ searching
* In practice, due to collisions, the runtime depends on the **load factor**, or $\lambda = \frac{n}{h}$, where $n$ is the number of items and $h$ is the size of the hash table


## Searching and Sorting Algorithms

* The best comparison-based sorting algorithms are $O(n \log n)$
* To search an ordered list, use binary search, which is $O(\log n)$ 
* To search an unordered list, the best we can do is $O(n)$
* In practice, sorting and binary search is not always faster than linear search
* Use hash tables for O(1) searches

- Main goal of today: practise order of growth analysis with searching and sorting
- Best sorting is n log n
    - There are ways to optimise it for different types of data, but in terms of order of growth it does not get better than this
- For an ordered list, binary search is far more efficient than linear search
- For most applications, you do not have to worry about this, unless you are doing lots of searches
- When the list is not ordered, O(n) is the best we can do
- You can always sort and then do binary search, but in practice it might not be worth it
    - Depends on how large the data is and how often you need to search
    - Linear search might be good enough
- If there are lots of searches, you should opt for a hash table
    - This is the part where you might have to implement something yourself
    - There is no universal way to do this
    - dict and set already use hashing, so you can rely on what Python does, but outside of Python you may have to implement your own hash table
- Seminar: functional programming in Python, an extension of week 8 (R), plus options for optimising runtime
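
The difference between an O(n) search in a list and an O(1) lookup in a hash-based `set` can be observed with the standard library's `timeit` module. This is a rough sketch: the sizes and variable names are arbitrary, and the actual timings depend on the machine, so no expected output is shown.

```python
import timeit

n = 100_000
items_list = list(range(n))
items_set = set(items_list)
missing = -1  # worst case for the list: every element is checked

# Time 100 membership tests against each data structure
list_time = timeit.timeit(lambda: missing in items_list, number=100)
set_time = timeit.timeit(lambda: missing in items_set, number=100)

print(f"list: {list_time:.4f}s, set: {set_time:.6f}s")
```

Doubling `n` should roughly double the list timing while leaving the set timing about the same.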

-------

* **Lab**: **Problem Set 5**, functional programming in Python
* **Next week**: Basic tree and graph algorithms, course summary, guidance for final project