# Chapter 5 Searching and Sorting

## 5.1 Searching
* **Searching is the process of selecting particular information from a collection of data based on specific criteria.**
    * **A sequence search involves finding an item within a sequence using a search key to identify the specific item.**
    * **A key is a unique value used to identify the data elements of a collection.**
    * **A key may consist of multiple components, in which case it is known as a *compound key*.**
    
## 5.1.1 The Linear Search
* **The sequential or *linear search* algorithm iterates over the sequence, one item at a time, until the specific item is found or all items have been examined.**
    * **In Python, a target item can be found in a sequence using the** in **operator.**
    * **For sequences such as lists and tuples, the** in **operator is implemented as a linear search.**
![Screen%20Shot%202020-11-19%20at%207.22.56%20PM.png](attachment:Screen%20Shot%202020-11-19%20at%207.22.56%20PM.png)
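
As a small illustration of the `in` operator described above, the sketch below tests membership in a list. The sample data is hypothetical and chosen only for demonstration.

```python
# Hypothetical sample data, chosen only for illustration.
values = [31, 4, 15, 9, 26]

# For a list, `in` compares the target with each element in turn.
print(15 in values)      # True
print(7 in values)       # False
print(7 not in values)   # True
```

Because the list is scanned from the front, a failed lookup such as `7 in values` examines every element.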

### Finding a Specific Item
* **The function below implements the sequential search algorithm and returns a Boolean value indicating success or failure of the search.**
* **Assuming the sequence contains $n$ items, the linear search has a worst case time of $O(n)$.**

In [None]:
def linearSearch(theValues, target):
    n = len( theValues)
    for i in range(n):
        # If the target is in the ith element, return True
        if theValues[i] == target:
            return True
    
    # If not found, return False
    return False

### Searching a Sorted Sequence
* **A sorted sequence is a sequence containing values in a specific order. That is, for ascending order, each value in the array is larger than or equal to its predecessor.**
* **A linear search on a sorted sequence works in the same fashion as on an unsorted sequence, but the search can terminate early when the value is not in the sequence: once an element larger than the target is reached, the remaining elements need not be examined.**
* **The time complexity remains $O(n)$. The worst case occurs when the value is not in the sequence and is larger than the last element, which still requires a complete traversal.**

In [2]:
def sortedLinearSearch(theValues, item):
    n = len(theValues)
    for i in range(n):
        # If the target is found in the ith element, return True
        if theValues[i] == item:
            return True
        # If the ith element is larger than the target, it's not in the sequence
        elif theValues[i] > item:
            return False
        
    return False # The item is not in the sequence
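
To make the early-termination argument concrete, here is a minimal sketch that counts how many elements are examined. The function name `sorted_search_steps` and the sample data are hypothetical, introduced only for this illustration.

```python
def sorted_search_steps(values, item):
    """Return (found, number of elements examined) for an ascending list."""
    steps = 0
    for value in values:
        steps += 1
        if value == item:
            return True, steps
        if value > item:
            # Every later value is at least this large, so stop here.
            return False, steps
    return False, steps


data = [2, 4, 5, 10, 13, 18, 23, 29, 31, 51]
print(sorted_search_steps(data, 3))    # (False, 2): stops at the value 4
print(sorted_search_steps(data, 60))   # (False, 10): full traversal
```

The second call shows the worst case: a target larger than the last element still forces all $n$ comparisons.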

### Finding the Smallest Value
* **A linear search is performed and we must keep track of the smallest value found for each iteration through the loop.**
    * **To prime the loop, we assume the first value in the sequence is the smallest and start the comparisons at the second item.**
    * **Since the smallest value can occur anywhere in the sequence, we must always perform a complete traversal, resulting in a worst case time of $ O(n) $.**

In [5]:
def findSmallest(theValues):
    n = len(theValues)
    # Assume the first item is the smallest value
    smallest = theValues[0]
    # determine if any other item in the sequence is smaller
    for i in range(1, n):
        if theValues[i] < smallest:
            smallest = theValues[i]
            
    return smallest # Return the smallest found
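
The same traversal is what Python's built-in `min()` performs. The sketch below is one possible variant that also guards against an empty sequence, a case the priming step cannot handle. The name `find_smallest` and the choice of `ValueError` are assumptions made for this example.

```python
def find_smallest(values):
    """Return the smallest item, scanning every element once."""
    if len(values) == 0:
        # Assumed behavior for this sketch: reject empty input explicitly.
        raise ValueError("find_smallest() requires a non-empty sequence")
    smallest = values[0]
    for i in range(1, len(values)):
        if values[i] < smallest:
            smallest = values[i]
    return smallest


data = [10, 51, 2, 18, 4]
print(find_smallest(data))   # 2
print(min(data))             # 2, the built-in gives the same result
```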

### 5.1.2 The Binary Search
* **Divide and conquer: entails dividing a larger problem into smaller parts and conquering the smaller parts.**

#### Algorithm Description
* **The algorithm starts by examining the middle item of the sorted sequence, resulting in one of three possible conditions: the middle item is the target value, the target value is less than the middle item, or the target value is larger than the middle item. Since the sequence is ordered, we can eliminate half the values in the list when the target value is not found at the middle position.**

#### Implementation
* **The first step in each iteration is to determine the midpoint of the sequence.**
* **If the midpoint contains the target, we immediately return** True. **Otherwise, we determine if the target is less than the item at the midpoint or greater. If it is less, we adjust the** high **marker to be one less than the midpoint, and if it is greater, we adjust the** low **marker to be one greater than the midpoint.**
![Screen%20Shot%202020-11-24%20at%207.20.10%20AM.png](attachment:Screen%20Shot%202020-11-24%20at%207.20.10%20AM.png)

In [None]:
def binarySearch(theValues, target):
    # start with the entire sequence of elements
    low = 0
    high = len(theValues) - 1
    
    # Repeatedly subdivide the sequence in half until the target is found
    while low <= high:
        # Find the midpoint of the sequence
        mid = (high + low) // 2
        # Does the midpoint contain the target?
        if theValues[mid] == target:
            return True
        # Does the target precede the midpoint?
        elif target < theValues[mid]:
            high = mid - 1
        # Or does it follow the midpoint?
        else:
            low = mid + 1
            
    # If the sequence cannot be subdivided further, we're done
    return False

#### Run Time Analysis
* **Since the input size is reduced by half during each iteration of the loop, there will be about $\log_2 n$ iterations in the worst case. Thus, the binary search algorithm has a worst case time complexity of $O(\log n)$, which is more efficient than the linear search.**
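
The logarithmic bound can be checked empirically. The sketch below is an instrumented copy of the binary search that counts loop iterations and compares the count with $\lfloor \log_2 n \rfloor + 1$. The name `binary_search_steps` and the test sizes are hypothetical choices for this illustration.

```python
import math


def binary_search_steps(values, target):
    """Return (found, number of loop iterations) for a sorted list."""
    low, high = 0, len(values) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if values[mid] == target:
            return True, steps
        elif target < values[mid]:
            high = mid - 1
        else:
            low = mid + 1
    return False, steps


for n in (10, 1000, 1_000_000):
    data = list(range(n))
    # Searching for n fails, since n is larger than every element.
    found, steps = binary_search_steps(data, n)
    bound = math.floor(math.log2(n)) + 1
    print(n, found, steps, bound)
```

Even for a million items, the failed search needs only about twenty iterations, whereas a linear search would examine all one million.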

## 5.2 Sorting
* **Sorting is the process of arranging or ordering a collection of items such that each item and its successor satisfy a prescribed relationship.**
* **The ordering of the items is based on the value of a *sort key*. The key is the value itself when sorting simple types, or it can be a specific component or a combination of components when sorting complex types.**
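
As an illustration of sort keys, the sketch below uses the built-in `sorted()` function with its `key` parameter. The student records are hypothetical data invented for this example.

```python
# Hypothetical records: (last name, first name, id number).
students = [
    ("Smith", "John", 1042),
    ("Jones", "Amy", 1007),
    ("Smith", "Anna", 1015),
]

# Sort key is a single component: the id number.
by_id = sorted(students, key=lambda s: s[2])

# Sort key is a combination of components: last name, then first name.
by_name = sorted(students, key=lambda s: (s[0], s[1]))

print(by_id[0])     # ('Jones', 'Amy', 1007)
print(by_name[1])   # ('Smith', 'Anna', 1015)
```

Returning a tuple from the key function orders the records by the first component and uses the second component to break ties.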

### 5.2.1 Bubble Sort
* **Bubble sort rearranges the values by iterating over the list multiple times, causing larger values to bubble to the top or end of the list.**
* **The efficiency of the basic bubble sort algorithm depends only on the number of keys in the array and is independent of the specific values and the initial arrangement of those values.**
* **The run time complexity of bubble sort is $O(n^2)$. Bubble sort is considered one of the most inefficient sorting algorithms due to the total number of swaps required. Given an array of keys in reverse order, a swap is performed for every iteration of the inner loop, which can be costly in practice.**
* **The bubble sort algorithm can be improved by having it terminate early, so that it does not perform all of the roughly $n^2$ iterations when the sequence is already in sorted order.**
    * **We can determine that the sequence is in sorted order when no swaps are performed by the** if **statement within the inner loop during a complete pass. At that point, the function can return immediately without completing the remaining iterations.**
![Screen%20Shot%202020-11-24%20at%207.42.34%20AM.png](attachment:Screen%20Shot%202020-11-24%20at%207.42.34%20AM.png)

In [6]:
# Sorts a sequence in ascending order using the bubble sort algorithm
def bubbleSort(theSeq):
    n = len(theSeq)
    # Perform n-1 bubble operations on the sequence
    for i in range(n - 1):
        # Bubble the largest item to the end
        for j in range(n - 1 - i):
            if theSeq[j] > theSeq[j + 1]: # Swap the j and j + 1 items
                tmp = theSeq[j]
                theSeq[j] = theSeq[j + 1]
                theSeq[j + 1] = tmp
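
The early-termination improvement described in the notes could be sketched as follows. The function name `bubble_sort_early_exit` and the `swapped` flag are assumptions made for this example, not names from the original listing.

```python
def bubble_sort_early_exit(seq):
    """Sort seq in place in ascending order, stopping once a pass makes no swaps."""
    n = len(seq)
    for i in range(n - 1):
        swapped = False
        # After i passes, the last i items are already in their final positions.
        for j in range(n - 1 - i):
            if seq[j] > seq[j + 1]:
                seq[j], seq[j + 1] = seq[j + 1], seq[j]
                swapped = True
        if not swapped:
            # No swaps in a full pass means the sequence is sorted.
            return


data = [10, 51, 2, 18, 4, 31, 13, 5, 23, 64, 29]
bubble_sort_early_exit(data)
print(data)   # [2, 4, 5, 10, 13, 18, 23, 29, 31, 51, 64]
```

On input that is already sorted, this version makes a single pass of $n - 1$ comparisons and returns. The worst case remains $O(n^2)$.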

### 5.2.2 Selection Sort
* **The process starts by finding the smallest value in the sequence and swapping it with the value in the first position. The second smallest value is then found and swapped with the value in the second position, and so on until the sequence is sorted.**
* **The selection sort, which makes $n-1$ passes over the array to reposition $n-1$ values, is also $O(n^2)$.**
![Screen%20Shot%202020-11-24%20at%207.56.19%20AM.png](attachment:Screen%20Shot%202020-11-24%20at%207.56.19%20AM.png)

In [7]:
def selectionSort(theSeq):
    n = len(theSeq)
    for i in range(n-1):
        # Assume the ith element is the smallest
        smallNdx = i
        # Determine if any other element contains a smaller value
        for j in range(i + 1, n):
            if theSeq[j] < theSeq[smallNdx]:
                smallNdx = j
        # swap the ith value and the smallNdx value only if the 
        # smallest value is not already in its proper position. 
        if smallNdx != i:
            tmp = theSeq[i]
            theSeq[i] = theSeq[smallNdx]
            theSeq[smallNdx] = tmp

### 5.2.3 Insertion Sort
![Screen%20Shot%202020-11-24%20at%207.59.59%20AM.png](attachment:Screen%20Shot%202020-11-24%20at%207.59.59%20AM.png)
* **The insertion sort maintains a collection of sorted items and a collection of items to be sorted.**
* **The algorithm maintains both the sorted and unsorted collections within the same sequence structure.**
* **The algorithm keeps the list of sorted values at the front of the sequence and picks the next unsorted value from the first of those yet to be positioned.**
* **To position the next item, the correct spot within the sequence of sorted values is found by performing a search.**
* **After finding the proper position, the slot has to be opened by shifting the items down one position.**

In [8]:
# Sorts a sequence in ascending order using the insertion sort algorithm
def insertionSort(theSeq):
    n = len(theSeq)
    # Starts with the first item as the only sorted entry
    for i in range(1, n):
        # Save the value to be positioned
        value = theSeq[i]
        # Find the position where value fits in the ordered part of the list
        pos = i
        while pos > 0 and value < theSeq[pos - 1]:
            # shift the items to the right during the search
            theSeq[pos] = theSeq[pos -  1]
            pos -= 1
            
        # put the saved value into the open slot
        theSeq[pos] = value
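
To see how the sorted region at the front grows, the sketch below records a snapshot of the sequence after each item is positioned. The name `insertion_sort_trace` and the sample data are hypothetical, used only for this illustration.

```python
def insertion_sort_trace(seq):
    """Sort seq in place and return a copy of it after each item is positioned."""
    snapshots = []
    for i in range(1, len(seq)):
        value = seq[i]
        pos = i
        # Shift larger sorted items one slot to the right.
        while pos > 0 and value < seq[pos - 1]:
            seq[pos] = seq[pos - 1]
            pos -= 1
        seq[pos] = value
        snapshots.append(list(seq))
    return snapshots


for step in insertion_sort_trace([10, 51, 2, 18, 4]):
    print(step)
# [10, 51, 2, 18, 4]
# [2, 10, 51, 18, 4]
# [2, 10, 18, 51, 4]
# [2, 4, 10, 18, 51]
```

After pass `i`, the first `i + 1` items are in sorted order relative to one another, while the rest are untouched.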

## 5.3 Working with Sorted Lists

### 5.3.1 Maintaining a Sorted List
* **To maintain a sorted list in real time, new items must be inserted into their proper position.**
* **The new items cannot simply be appended at the end of the list as they may be out of order. Instead, we must locate the proper position within the list and use the** insert( ) **method to insert it into the indicated position.**
    * **To find the position of a new item within a sorted list, a modified version of the binary search algorithm can be used. Instead of returning** True **or** False **to indicate the existence of a value, we can modify the algorithm to return the index position of the target if it is actually in the list, or the position where the value should be placed if it were inserted into the list.**

In [None]:
def findSortedPosition(theList, target):
    low = 0
    high = len(theList) - 1
    while low <= high:
        mid = (high + low) // 2
        if theList[mid] == target:
            return mid             # Index of the target
        elif target < theList[mid]:
            high = mid - 1
        else: 
            low = mid + 1
            
    return low # Index where the target should be
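
Python's standard library offers a comparable facility in the `bisect` module. The sketch below combines `bisect.bisect_left` with `list.insert()` to keep a list sorted. The wrapper name `insert_sorted` is hypothetical. Note one difference from the function above: for a value already in the list, `bisect_left` returns the leftmost matching position rather than whichever match the search happens to hit first.

```python
import bisect


def insert_sorted(sorted_list, item):
    """Insert item into an ascending list, keeping it sorted; return the index used."""
    pos = bisect.bisect_left(sorted_list, item)   # binary search for the position
    sorted_list.insert(pos, item)                 # shifts later items one slot right
    return pos


data = [2, 4, 5, 10, 13, 18, 23, 29, 31, 51, 64]
print(insert_sorted(data, 25))   # 7
print(data)                      # [2, 4, 5, 10, 13, 18, 23, 25, 29, 31, 51, 64]
```

Finding the position takes $O(\log n)$ comparisons, but `insert()` must still shift the items that follow, so the insertion as a whole is $O(n)$ in the worst case.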

### 5.3.2 Merging Sorted Lists
* **The items in the original lists are not removed, but instead copied to the new list.**
* **The process of merging the two lists begins by creating a new empty list and initializing the two index variables to zero. A loop is used to repeat the process of selecting the next value, in ascending order, to be added to the new merged list.**
* **During each iteration of the loop, the value at $listA[a]$ is compared to the value at $listB[b]$. The smaller of these two values is appended to the new list. If the two values are equal, the value from $listB$ is chosen.**
* **As values are copied from the original lists to the merged list, one of the two index variables $a$ or $b$ is incremented to indicate the next value to be considered in the corresponding list.**
![Screen%20Shot%202020-11-24%20at%209.06.18%20AM.png](attachment:Screen%20Shot%202020-11-24%20at%209.06.18%20AM.png)

In [13]:
# Merge two sorted lists to create and return a new sorted list
def mergeSortedLists(listA, listB):
    # Create the new list and initialize the list markers
    newList = list()
    a = 0
    b = 0
    
    # Merge the two lists together until one is empty
    while a < len(listA) and b < len(listB):
        if listA[a] < listB[b]:
            newList.append(listA[a])
            a += 1
        else:
            newList.append(listB[b])
            b += 1
    
    # If listA contains more items, append them to newList
    while a < len(listA):
        newList.append(listA[a])
        a += 1
        
    # Or if listB contains more items, append them to newList
    while b < len(listB):
        newList.append(listB[b])
        b += 1
        
    return newList
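
For comparison, the standard library's `heapq.merge` performs the same kind of merge on already-sorted inputs. The sketch below uses hypothetical sample lists and checks that neither original list is modified.

```python
import heapq

# Hypothetical sample data: both lists must already be in ascending order.
list_a = [2, 8, 15, 23, 37]
list_b = [4, 6, 15, 20]

# heapq.merge returns an iterator, so wrap it in list() to build the result.
merged = list(heapq.merge(list_a, list_b))

print(merged)   # [2, 4, 6, 8, 15, 15, 20, 23, 37]
print(list_a)   # [2, 8, 15, 23, 37], unchanged
print(list_b)   # [4, 6, 15, 20], unchanged
```

Each item is copied exactly once, so merging lists of total length $n$ takes $O(n)$ time.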
