# Chapter 2: selection sort

This chapter focuses on arrays, linked lists, and selection sort.

```python
# Setup
import numpy as np
import math
```

## How memory works

Computer memory can be thought of as a long row of slots. Each slot holds a piece of data and has its own memory address, which is how the computer finds that data again later.

## Arrays and linked lists

The key difference between storing data in an array versus a linked list is that an array stores its elements in one contiguous stretch of memory. This is why growing an array inside a loop can be slow: whenever the array outgrows its current space, we may need to find a new area of memory big enough to hold all of the data in one contiguous stretch, and then copy everything over. One way to mitigate this is to preallocate a fixed amount of space before performing additional computation. This approach has two downsides. First, if you don't use all of the memory you've requested, it's wasted, because it isn't available for anything else. Second, if you don't know exactly how much memory you'll need, you can't add more than what you've asked for without moving the whole array.
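To make the cost of growing concrete, here is a minimal sketch using NumPy. `np.append` returns a brand-new array on every call, copying all existing elements, while preallocating with `np.empty` reserves every slot once and then fills them in place. The function names are hypothetical, for illustration only.

```python
import numpy as np

def grow_by_appending(n):
    """Build [0, 1, ..., n-1] by appending; np.append copies the whole array on each call."""
    arr = np.array([], dtype=np.int64)
    for i in range(n):
        arr = np.append(arr, i)  # allocates a new array of len(arr) + 1 and copies into it
    return arr

def fill_preallocated(n):
    """Build the same values by reserving all n slots up front and filling them in place."""
    arr = np.empty(n, dtype=np.int64)  # one allocation; contents are garbage until filled
    for i in range(n):
        arr[i] = i
    return arr

print(np.array_equal(grow_by_appending(5), fill_preallocated(5)))  # True
```

Because each `np.append` copies the current contents, building n elements that way costs O(n^2) in total, versus O(n) for the preallocated version. Python's built-in `list` softens this problem by over-allocating when it grows, which makes `list.append` amortized O(1).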

In contrast, a linked list can store its items anywhere in memory, because each item stores the address of the next item. This makes adding data to a linked list easy: we just need to find any open memory address for the new item and point the previous item at it. If we want to read all of the items in order, linked lists work well. However, if we want to jump to the element at a particular index, the problem becomes much harder. Each item's address is stored only in the item before it, so we have to walk through every preceding item to find the one we're interested in.
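A minimal sketch of a singly linked list in Python follows (the `Node` class and helper names are hypothetical, for illustration). Each node holds a value and a reference to the next node, so reaching index `i` means following `i` links from the head.

```python
class Node:
    """One item in a singly linked list: a value plus a reference to the next node."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def build_linked_list(values):
    """Link the values together front to back and return the head node (None if empty)."""
    head = None
    for value in reversed(values):
        head = Node(value, head)  # the new node points at the previous head
    return head

def get_at(head, index):
    """Return the value at `index` by walking from the head: O(index) steps."""
    node = head
    for _ in range(index):
        if node is None:
            raise IndexError("linked list index out of range")
        node = node.next  # the only way to learn the next address is through this node
    if node is None:
        raise IndexError("linked list index out of range")
    return node.value

head = build_linked_list(["a", "b", "c", "d"])
print(get_at(head, 2))  # c
```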

## Exercise


**2.1) Suppose we're building an app to keep track of finances. We're constantly adding new expenses but rarely reading them; at the end of the month, we just want to review our expenses and sum up how much we spent.**

I would choose to use a linked list. The only reading we do happens at the end of the month, when we read all of the data at once and in order, which a linked list handles well. While reading an arbitrary element of a linked list is O(n) (since we need to walk through all of the preceding elements to find its address), inserting a new element is O(1).
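Here is a minimal sketch of that expense list, assuming we keep a reference to the last node (a "tail") so that adding an expense at the end is O(1); without a tail reference, appending would mean walking the whole list first. `ExpenseList` and its methods are hypothetical names, and the amounts are made up.

```python
class Node:
    """One expense in the list: an amount plus a reference to the next node."""
    def __init__(self, amount):
        self.amount = amount
        self.next = None

class ExpenseList:
    """Singly linked list that keeps a tail reference so appends are O(1)."""
    def __init__(self):
        self.head = None
        self.tail = None

    def add(self, amount):
        node = Node(amount)
        if self.tail is None:      # empty list: the new node is both head and tail
            self.head = node
        else:
            self.tail.next = node  # link after the current last node
        self.tail = node

    def total(self):
        """Walk the whole list once, front to back: O(n)."""
        running = 0
        node = self.head
        while node is not None:
            running += node.amount
            node = node.next
        return running

expenses = ExpenseList()
for amount in [12, 30, 8]:
    expenses.add(amount)
print(expenses.total())  # 50
```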

If we want to insert or delete an item in the middle of a sequence, linked lists are better because we only need to change what the previous element points to. In contrast, the same operations on an array involve shifting all of the later elements over by one slot. In the case of insertions, it may also involve copying the entire array to a new contiguous location if there isn't enough space already.
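The sketch below contrasts the two operations, using a Python list as a stand-in for an array (`insert_after`, `array_insert`, and `to_list` are hypothetical helpers). Note that the linked-list splice assumes we already hold a reference to the node we're inserting after; finding that node by position would still be O(n).

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def insert_after(node, value):
    """Linked list: splice a new node in after `node` by rewiring one reference. O(1)."""
    node.next = Node(value, node.next)

def array_insert(arr, index, value):
    """Array-style insert: grow by one slot, then shift every later element right. O(n)."""
    arr.append(None)                          # make room at the end
    for i in range(len(arr) - 1, index, -1):
        arr[i] = arr[i - 1]                   # shift each later element one slot right
    arr[index] = value

def to_list(head):
    """Collect a linked list's values into a Python list for printing."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values

# Linked list 1 -> 2 -> 4; insert 3 after the node holding 2
head = Node(1, Node(2, Node(4)))
insert_after(head.next, 3)
print(to_list(head))  # [1, 2, 3, 4]

arr = [1, 2, 4]
array_insert(arr, 2, 3)
print(arr)  # [1, 2, 3, 4]
```

Python's own `list.insert` does this shifting for you, which is why it is O(n) as well.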

In terms of accessing information, there are two types: random access and sequential access. With random access, we can jump directly to any element in the sequence. Arrays support random access because all of their elements are stored contiguously: given the address of the first element and the size of each element, we can compute the address of any element from its index. In contrast, sequential access data structures such as linked lists require that we go through all of the preceding n-1 elements to reach the nth element.
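NumPy arrays make this easy to check, since they store elements contiguously with a fixed number of bytes per element (`itemsize`). This sketch confirms that each element's address is the start address plus `index * itemsize`, which is why reading an array element by index is O(1).

```python
import numpy as np

arr = np.array([10, 20, 30, 40], dtype=np.int64)  # contiguous, 8 bytes per element
base = arr.ctypes.data                             # memory address of element 0

# Random access: address of element i = start address + i * element size.
# arr[i:] is a view whose data pointer sits exactly where element i lives.
offsets_match = [base + i * arr.itemsize == arr[i:].ctypes.data for i in range(len(arr))]
print(offsets_match)  # [True, True, True, True]
```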

## Exercises

**2.2) We're creating an app for restaurants to take customer orders. Servers add orders to a list, and chefs take orders off the list and make the dishes.**

In this situation, I would choose to use a linked list. We know from the prompt that chefs don't need random access to the orders in the queue, and servers will constantly be adding orders to the end of the list. In the worst case, inserting into an array is O(n) (for example, when the array has to be copied to a larger block of memory), while it's O(1) for a linked list. Deleting items (i.e., chefs pulling orders off the front of the queue) follows the same pattern: O(n) for arrays, since every remaining element shifts forward, and O(1) for linked lists.
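Python's standard library already has a structure suited to this pattern: `collections.deque`, whose appends and pops at either end are documented as roughly O(1) (CPython implements it as a doubly linked list of fixed-size blocks). By contrast, `list.pop(0)` on a regular Python list shifts every remaining element, which is O(n). A minimal sketch of the order queue, with made-up order names:

```python
from collections import deque

orders = deque()

# Servers add orders at the back of the queue
orders.append("pasta")
orders.append("salad")
orders.append("soup")

# Chefs take the oldest order from the front
first = orders.popleft()
print(first)         # pasta
print(list(orders))  # ['salad', 'soup']
```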

**2.3) In this scenario, we need to decide whether to implement a linked list or array for Facebook to check usernames when people try to log in.  We're also told that Facebook will implement binary search to scan through the available usernames.**

In this situation, we know that we'll use binary search, which has time complexity O(log n) on a sorted list. Binary search works by repeatedly jumping to the middle element of the remaining range, and the username we're looking for could be anywhere in the sequence, so we need random access. For these reasons, we'll need to use an array (kept in sorted order).
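A minimal binary search sketch over a sorted list of usernames (the function name and the usernames are made up). Every step reads `sorted_names[mid]`; on a linked list, each of those reads would itself cost O(n).

```python
def binary_search(sorted_names, target):
    """Return the index of `target` in `sorted_names`, or None if absent. O(log n)."""
    low, high = 0, len(sorted_names) - 1
    while low <= high:
        mid = (low + high) // 2
        guess = sorted_names[mid]  # random access: jump straight to the middle
        if guess == target:
            return mid
        if guess < target:
            low = mid + 1          # target can only be in the upper half
        else:
            high = mid - 1         # target can only be in the lower half
    return None

usernames = ["adit", "alice", "bob", "carol", "zed"]  # already sorted
print(binary_search(usernames, "carol"))  # 3
print(binary_search(usernames, "dave"))   # None
```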

**2.4) This question asks us to describe the pros and cons of using an array to store the list of users. Specifically, what happens when we add new users to the list?**

Arrays are great for storing data in situations that require random access: reading an element by index is O(1). However, arrays are stored in contiguous chunks of memory, which means that when we add a new user, we may have to copy the entire contents of the array to a new contiguous chunk of memory. Binary search on an array is O(log n), so search time grows very slowly as the number of users grows. However, binary search only works if the array stays sorted. We either have to re-sort after adding users, which could take a substantial amount of time depending on the algorithm used, or insert each new user at its sorted position, which means shifting every later element over, an O(n) operation.
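Python's `bisect` module illustrates the second option. `bisect.insort` uses binary search to find where a new item belongs, but the insertion into the list still shifts later elements, so the overall cost is O(n). A minimal sketch with made-up usernames:

```python
import bisect

usernames = ["alice", "bob", "dave"]  # kept sorted so binary search works

# insort finds the position in O(log n), but inserting into the list
# still shifts every later element, which is O(n)
bisect.insort(usernames, "carol")
print(usernames)  # ['alice', 'bob', 'carol', 'dave']

# The list stays sorted, so binary search still works without re-sorting
i = bisect.bisect_left(usernames, "carol")
print(i < len(usernames) and usernames[i] == "carol")  # True
```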

**2.5) Here, we're told that Facebook uses a hybrid data structure: an array of linked lists. The array has 26 slots, one for each letter of the alphabet, and each slot points to a linked list. For example, the slot at index 0 points to a linked list of all usernames starting with "A". We're asked whether this data structure would be faster or slower than a plain array and a plain linked list for searching and inserting.**

This is an interesting case, and we can break it down by qualitatively comparing search and insertion. For search, this structure is slower than a sorted array but faster than a single linked list. A sorted array supports binary search in O(log n) (and reading a known index is O(1)), while searching a linked list is O(n). With the hybrid, we can use random access to jump straight to the correct linked list, as with an array, but then we still need sequential access to walk through that list to find the username we want. Since that list only holds the usernames starting with one letter, the walk is shorter than searching one linked list of every username.

For insertion, it's about the same as a linked list, because the structure we're actually inserting into is a linked list. That also makes it faster than inserting into a sorted array, which can require shifting or copying elements.
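A minimal sketch of the hybrid, assuming every username starts with an ASCII letter and grouping by that letter case-insensitively. `LetterBuckets` and its methods are hypothetical names. Inserting pushes onto the front of one bucket's list in O(1); searching jumps to the right bucket in O(1) and then walks only that bucket's k names.

```python
import string

class Node:
    def __init__(self, name, next_node=None):
        self.name = name
        self.next = next_node

class LetterBuckets:
    """Hypothetical hybrid: a 26-slot array, each slot heading a linked list of names."""
    def __init__(self):
        self.slots = [None] * 26  # index 0 = 'a', 1 = 'b', ..., 25 = 'z'

    def _slot(self, name):
        # Assumption: names start with an ASCII letter (raises ValueError otherwise)
        return string.ascii_lowercase.index(name[0].lower())

    def insert(self, name):
        i = self._slot(name)
        self.slots[i] = Node(name, self.slots[i])  # push onto the front of that list: O(1)

    def contains(self, name):
        node = self.slots[self._slot(name)]        # O(1) jump to the right letter
        while node is not None:                    # then walk only that letter's list
            if node.name == name:
                return True
            node = node.next
        return False

users = LetterBuckets()
for name in ["adit", "alice", "bob", "zed"]:
    users.insert(name)
print(users.contains("alice"), users.contains("amy"))  # True False
```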

## Selection sort

Selection sort works by repeatedly "selecting" the smallest remaining element and appending it to a new, sorted list. Finding the smallest element means checking every remaining element, which is up to n checks, and we need to do that n times to build the sorted list, so the time complexity is O(n^2).
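To see where the O(n^2) comes from more precisely, we can count comparisons, assuming (as in the `find_smallest` function in this chapter) that finding the smallest of k remaining elements takes k - 1 comparisons:

$$
\sum_{k=1}^{n} (k - 1) = 0 + 1 + \cdots + (n - 1) = \frac{n(n-1)}{2} = O(n^2)
$$

For the 10-element test array, that's $10 \cdot 9 / 2 = 45$ comparisons.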

Below is some sample code that implements selection sort. I've broken the function down into individual components to describe what's happening in each section.


```python
# Define a test array
arr = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```

```python
# First, set the baseline smallest value and index
# Then, iterate over the array and compare each value against the baseline smallest value
# If a value is smaller than the baseline, reset the smallest value and index
smallest_value = arr[0]
smallest_index = 0
for i in range(1, len(arr)):
    print(f'Value at index {i} is {arr[i]}')
    if arr[i] < smallest_value:
        smallest_value = arr[i]  # reset baseline value
        smallest_index = i       # reset baseline index
        print(f'Smallest value is now {smallest_value}')
        print(f'Smallest index is now {smallest_index}\n')
```

```
Value at index 1 is 9
Smallest value is now 9
Smallest index is now 1

Value at index 2 is 8
Smallest value is now 8
Smallest index is now 2

Value at index 3 is 7
Smallest value is now 7
Smallest index is now 3

Value at index 4 is 6
Smallest value is now 6
Smallest index is now 4

Value at index 5 is 5
Smallest value is now 5
Smallest index is now 5

Value at index 6 is 4
Smallest value is now 4
Smallest index is now 6

Value at index 7 is 3
Smallest value is now 3
Smallest index is now 7

Value at index 8 is 2
Smallest value is now 2
Smallest index is now 8

Value at index 9 is 1
Smallest value is now 1
Smallest index is now 9
```

First, we set a baseline smallest value as the first element in the array (i.e., the element at index 0), and set the smallest index to 0. Then, we search through the array, starting at index 1, and ask whether each element is smaller than the baseline smallest value. If it is, we reset the baseline smallest value and its index. Since our array runs from 10 down to 1, every element is smaller than the one before it, so the baseline is updated on every iteration until it reaches 1 at index 9. Now let's turn this into a function.

```python
def find_smallest(arr):
    """Find the smallest value in an array and return its index."""

    # Set baselines
    smallest_value = arr[0]
    smallest_index = 0

    # Search through the array to find the smallest value and index
    for i in range(1, len(arr)):
        if arr[i] < smallest_value:
            smallest_value = arr[i]  # reset baseline value
            smallest_index = i       # reset baseline index

    return smallest_index
```

```python
# Test function
assert find_smallest(arr) == 9, 'Smallest index should be 9.'
```
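As a cross-check, Python's built-in `min` can do the same job. This sketch (the name `find_smallest_builtin` is hypothetical) compares the two on a few arrays, including ones with ties. Both pick the first occurrence of the smallest value: `find_smallest` only updates on a strict `<`, and `min` returns the first minimal item it encounters. Both also raise an error on an empty array.

```python
def find_smallest(arr):
    """Same logic as above: return the index of the first occurrence of the smallest value."""
    smallest_value = arr[0]
    smallest_index = 0
    for i in range(1, len(arr)):
        if arr[i] < smallest_value:  # strict <, so ties keep the earlier index
            smallest_value = arr[i]
            smallest_index = i
    return smallest_index

def find_smallest_builtin(arr):
    # min() over the indexes, keyed by the value at each index
    return min(range(len(arr)), key=arr.__getitem__)

for test_arr in [[10, 9, 8, 7, 6, 5, 4, 3, 2, 1], [3, 1, 2, 1], [5]]:
    print(find_smallest(test_arr), find_smallest_builtin(test_arr))
```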

Now we'll take the same step-by-step approach to build selection sort on top of the `find_smallest` function.

```python
# First, define the original array again and a new, empty sorted array
arr = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
sorted_arr = []

# Work through the original array:
# find the index of the smallest value,
# pop that value off the original array,
# and append it to the new sorted array
for i in range(len(arr)):
    smallest_index = find_smallest(arr)
    print(f'Smallest value at iteration {i+1}: {arr[smallest_index]}')

    sorted_arr.append(arr.pop(smallest_index))
    print(f'Contents of sorted array at iteration {i+1}:\n{sorted_arr}\n')
```

```
Smallest value at iteration 1: 1
Contents of sorted array at iteration 1:
[1]

Smallest value at iteration 2: 2
Contents of sorted array at iteration 2:
[1, 2]

Smallest value at iteration 3: 3
Contents of sorted array at iteration 3:
[1, 2, 3]

Smallest value at iteration 4: 4
Contents of sorted array at iteration 4:
[1, 2, 3, 4]

Smallest value at iteration 5: 5
Contents of sorted array at iteration 5:
[1, 2, 3, 4, 5]

Smallest value at iteration 6: 6
Contents of sorted array at iteration 6:
[1, 2, 3, 4, 5, 6]

Smallest value at iteration 7: 7
Contents of sorted array at iteration 7:
[1, 2, 3, 4, 5, 6, 7]

Smallest value at iteration 8: 8
Contents of sorted array at iteration 8:
[1, 2, 3, 4, 5, 6, 7, 8]

Smallest value at iteration 9: 9
Contents of sorted array at iteration 9:
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Smallest value at iteration 10: 10
Contents of sorted array at iteration 10:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Breaking down each of the steps: 1) first, we define an empty array to hold the sorted values; 2) next, we iterate n times, where n is the length of the original array; 3) on each iteration, we find the index of the smallest value with `find_smallest`; 4) we then pop that value off the original array and append it to the new array; 5) that completes one iteration, and the original array now has one fewer element. After n iterations, the original array is empty and the new array holds every value in sorted order.

Let's put all of this code (minus the print statements) into a function.

```python
def selection_sort(arr):
    """Return a new sorted array using the selection sort algorithm."""
    arr = list(arr)  # work on a copy so pop() doesn't empty the caller's array
    sorted_arr = []
    for _ in range(len(arr)):
        smallest_index = find_smallest(arr)
        sorted_arr.append(arr.pop(smallest_index))

    return sorted_arr
```

```python
# Test
arr = list(range(10, 0, -1))  # 10 to 1
sorted_arr = list(range(1, 11))  # 1 to 10

assert selection_sort(arr) == sorted_arr, f'Sorted array should be {sorted_arr}'
```
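The version above builds a separate sorted list and pops elements out of a list as it goes, and `list.pop(i)` itself shifts later elements. A common variant (not the code from this chapter) sorts within a single array instead: on pass `i`, it finds the smallest element in the unsorted tail and swaps it into position `i`. It still makes O(n^2) comparisons but needs no second list. A minimal sketch, with a hypothetical function name:

```python
import random

def selection_sort_in_place(arr):
    """Sort `arr` in place by swapping the smallest remaining element into position i."""
    n = len(arr)
    for i in range(n - 1):
        smallest_index = i
        for j in range(i + 1, n):  # scan only the unsorted tail arr[i:]
            if arr[j] < arr[smallest_index]:
                smallest_index = j
        arr[i], arr[smallest_index] = arr[smallest_index], arr[i]  # one swap per pass
    return arr

data = [random.randint(0, 100) for _ in range(20)]
print(selection_sort_in_place(data[:]) == sorted(data))  # True
```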