# DSCI 6001 1.3 Lecture


## An introduction to Sorting

### By the end of this lecture you will be able to:
1. Describe the general sorting problem in your own words
2. Describe and Implement the Selection Sort Algorithm
3. Describe and Implement the Insertion Sort Algorithm
4. Describe and Implement the Mergesort Algorithm

### The general sorting problem

Sorting an array of values:

$$\{4, 5, 3, 1, 9, 2, 8, 0, 7, 6\} \rightarrow \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$$

is one of the great classical problems in computer science. It has been solved many times and in many different ways, and developing and refining sorting algorithms is something of an art among academic computer scientists. As a data scientist you do not need to be able to invent a sorting algorithm off the cuff, but the skills and background covered here are absolutely necessary.


### Ordering

The items in the list to be sorted can vary in nature: alphanumeric characters, integers, or floating-point values, for example. We might also consider things such as vectors and geometric objects. In each of these cases, there must be a way of comparing any two items. For integers and floating-point values, comparisons are made using the standard comparison operators ( <, >, <=, >=, ==, != ), which work because numbers have a natural total order. Alphabetic characters are ordered naturally in the way that they are taught (i.e. a: 1, b: 2, c: 3, etc.). For other objects, such as vectors or geometric objects, we have to define the comparison ourselves.

In Python, we typically do this by overriding the comparison operators:

In [1]:

import math
class Circle(object):
    def __init__(self, x=0, y=0, r=0):
        # origin point in R2 (x, y)
        self.x = x
        self.y = y
        # radius
        self.r = r
    def __str__(self):
        return "Circle at (%d , %d). Radius: %f" % (self.x, self.y, self.r)
    def calcArea(self):
        # area computed from the stored radius; does not modify the circle
        return math.pi * self.r ** 2
    def __gt__(self, circle2):
        return self.r > circle2.r
    def __ge__(self, circle2):
        return self.r >= circle2.r
    def __lt__(self, circle2):
        return self.r < circle2.r
    def __le__(self, circle2):
        return self.r <= circle2.r
    def __ne__(self, circle2):
        return self.r != circle2.r
    def __eq__(self, circle2):
        return self.r == circle2.r
    
    #And so on for __lt__(), __le__(), __ne__(), etc
    
def main():
    A = Circle(3,4,1.5) 
    B = Circle(1,2,5.0)
    C = Circle(5,7,7) 
    D = Circle(9,8,3)
    print(A < B, B > C, A < C, A >= C)
main()
# Output should be "True False True False"

True False True False


## QUIZ: 

Finish filling out the above code stub.
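
As a possible shortcut, the standard library's `functools.total_ordering` decorator can derive the remaining comparison methods from `__eq__` and one ordering method. The sketch below is one way to write the class under that approach, not the only one. It assumes, as in the cell above, that circles are ordered by radius alone; the `area` method name is an illustrative choice.

```python
from functools import total_ordering
import math


@total_ordering
class Circle:
    """Circle in the plane, ordered by radius only (an assumption of this sketch)."""

    def __init__(self, x=0, y=0, r=0):
        self.x, self.y, self.r = x, y, r

    def __repr__(self):
        return f"Circle(x={self.x}, y={self.y}, r={self.r})"

    def area(self):
        # area from the stored radius
        return math.pi * self.r ** 2

    def __eq__(self, other):
        return self.r == other.r

    def __lt__(self, other):
        return self.r < other.r


circles = [Circle(3, 4, 1.5), Circle(1, 2, 5.0), Circle(5, 7, 7), Circle(9, 8, 3)]
# sorted() only needs __lt__; total_ordering fills in <=, >, and >=
radii = [c.r for c in sorted(circles)]
print(radii)  # [1.5, 3, 5.0, 7]
```

Once the comparison operators are defined, any comparison-based sort, including the built-in `sorted`, can order the objects without knowing what they are.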

# Sorting Algorithms

Charting [sorting algorithms](http://www.sorting-algorithms.com/) is important to understanding them, so you should learn to draw sorting charts by hand.

The main goal with sorting algorithms is to have an algorithm that is:

1) Efficient - minimizes the number of comparisons and swaps
2) Adaptable - takes advantage of partially ordered data
3) Flexible - can efficiently handle data with few unique values


## Selection Sort

The first algorithm that you will learn is [selection sort](http://www.sorting-algorithms.com/selection-sort). Selection sort is one of the most obvious methods of sorting, and one of the slowest in practical use. It works by going through the array, rank by rank, and exchanging the item currently in that rank with the smallest item remaining in the rest of the array. It only looks at items after the current rank, not before. The pseudocode is given below:


    def selection_sort(a):
        n = len(a)
        for i in range(n):
            # k tracks the index of the smallest item found in a[i:]
            k = i
            # look ahead in the array for a smaller item
            for j in range(i + 1, n):
                if a[j] < a[k]:
                    # mark the point at which we found the new minimum
                    k = j
            # make the swap once, after the lookahead has finished
            a[i], a[k] = a[k], a[i]
        return a

## QUIZ:

Chart the way this algorithm works by hand on a test array of 5 items.


The problem with selection sort is its performance. It does not adapt to partially ordered arrays: for each of the $N$ ranks indexed by $i$, the lookahead scans every remaining item, for a total of $N(N-1)/2$ comparisons regardless of the input. The algorithm therefore always runs in $O(N^2)$ time. However, it performs at most $N-1$ swaps, so selection sort might be used in extreme cases where swapping items is far more expensive than comparing them.
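
To make the cost concrete, here is a minimal instrumented sketch that counts comparisons and swaps. The function name `selection_sort_counted` and the counters are additions for illustration, and the function sorts a copy so the input is left alone.

```python
def selection_sort_counted(items):
    """Sort a copy of items; return (sorted_list, comparisons, swaps)."""
    a = list(items)
    comparisons = swaps = 0
    n = len(a)
    for i in range(n):
        k = i
        for j in range(i + 1, n):
            comparisons += 1
            if a[j] < a[k]:
                k = j
        # skip the swap when the item is already in place
        if k != i:
            a[i], a[k] = a[k], a[i]
            swaps += 1
    return a, comparisons, swaps


already_sorted = selection_sort_counted([0, 1, 2, 3, 4])
reversed_input = selection_sort_counted([4, 3, 2, 1, 0])
print(already_sorted)  # ([0, 1, 2, 3, 4], 10, 0)
print(reversed_input)  # ([0, 1, 2, 3, 4], 10, 2)
```

Both inputs of length 5 cost the same 10 comparisons, which is why the algorithm is not adaptive, while the swap count stays small.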

## Insertion Sort

The [insertion sort](http://www.sorting-algorithms.com/insertion-sort) algorithm also runs in $O(N^2)$ time in the worst case. It maintains a sorted sublist in the lower positions of the list. Each new item is then “inserted” back into the previous sublist such that the sorted sublist is one item larger. The outer loop visits each of the N items once in the forward direction, and each insertion may step backward over up to N-1 items, thus $O(N^2)$ time. On input that is already nearly sorted, the backward scans stop early and the running time approaches $O(N)$.

Pseudocode is given below:

    def insertion_sort(a):
        N = len(a)
        for i in range(N):
            current = a[i]
            j = i
            # look backward over the sorted sublist and shift every larger
            # element one position to the right
            while j > 0 and a[j-1] > current:
                a[j] = a[j-1]
                j = j-1

            # drop the current element into the gap that was opened
            a[j] = current
        return a
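
The adaptive behaviour of insertion sort can be observed by counting how many elements get shifted. The sketch below is an instrumented variant for illustration only; the name `insertion_sort_counted` and the `shifts` counter are not part of the pseudocode above, and the function works on a copy of its input.

```python
def insertion_sort_counted(items):
    """Sort a copy of items; return (sorted_list, number_of_shifts)."""
    a = list(items)
    shifts = 0
    for i in range(1, len(a)):
        current = a[i]
        j = i
        # shift larger elements right until the slot for current is found
        while j > 0 and a[j - 1] > current:
            a[j] = a[j - 1]
            j -= 1
            shifts += 1
        a[j] = current
    return a, shifts


best_case = insertion_sort_counted([0, 1, 2, 3, 4])
worst_case = insertion_sort_counted([4, 3, 2, 1, 0])
print(best_case)   # ([0, 1, 2, 3, 4], 0)
print(worst_case)  # ([0, 1, 2, 3, 4], 10)
```

A sorted input needs no shifts at all, while a reversed input of length 5 needs 1 + 2 + 3 + 4 = 10, the quadratic worst case.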


## QUIZ:

Given the list [15, 5, 4, 18, 12, 19, 14, 10, 8, 20], which list represents the partially sorted list after four complete passes of insertion sort?

# Higher Performance Sorts

Sorting performance can be enhanced by splitting the array to be sorted into partitions. There are 60 years of history of advanced sorts that we do not have time to cover, but there are three you should be aware of, one of which we shall cover.

The first, [shellsort](https://en.wikipedia.org/wiki/Shellsort), was invented by Don Shell in 1959. It is essentially an insertion sort that uses a series of gaps to break down the array into smaller sublists. 
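
A minimal sketch of the idea follows. It assumes the simplest gap sequence, halving the gap each round ($N/2, N/4, \ldots, 1$); other gap sequences exist and perform better, so treat this as an illustration rather than a tuned implementation.

```python
def shellsort(items):
    """Return a sorted copy of items using gapped insertion sorts."""
    a = list(items)
    gap = len(a) // 2
    while gap > 0:
        # insertion sort over elements that are `gap` positions apart
        for i in range(gap, len(a)):
            current = a[i]
            j = i
            while j >= gap and a[j - gap] > current:
                a[j] = a[j - gap]
                j -= gap
            a[j] = current
        # the final round uses gap == 1, which is a plain insertion sort
        gap //= 2
    return a


shell_result = shellsort([4, 5, 3, 1, 9, 2, 8, 0, 7, 6])
print(shell_result)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The early rounds with large gaps move items long distances cheaply, so the final insertion sort runs on nearly sorted data.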

The second is [quicksort](https://en.wikipedia.org/wiki/Quicksort).


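The third is mergesort, which we will cover in some detail today. It splits the list in half, sorts each half recursively, and then merges the two sorted halves.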

In [150]:
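def merge(left, right):
    # if either half is empty, the other half is already the merged result
    if not left or not right:
        return left or right

    result = []
    i, j = 0, 0
    while len(result) < len(left) + len(right):
        # take the smaller head item; <= prefers the left half on ties,
        # which keeps equal items in their original order (a stable sort)
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
        # once one half is exhausted, append the remainder of the other
        if i == len(left) or j == len(right):
            print(result)
            result.extend(left[i:] or right[j:])
            print(result,'post')
            break

    return result

def mergesort(items):
    # a list of zero or one items is already sorted
    if len(items) < 2:
        return items

    # split in half, sort each half recursively, then merge
    middle = len(items) // 2
    left = mergesort(items[:middle])
    right = mergesort(items[middle:])
    print(left,right)
    return merge(left, right)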


In [151]:
test = [1,4,2,5,2,6,3,4]
mergesort(test)

[1] [4]
[1]
[1, 4] post
[2] [5]
[2]
[2, 5] post
[1, 4] [2, 5]
[1, 2, 4]
[1, 2, 4, 5] post
[2] [6]
[2]
[2, 6] post
[3] [4]
[3]
[3, 4] post
[2, 6] [3, 4]
[2, 3, 4]
[2, 3, 4, 6] post
[1, 2, 4, 5] [2, 3, 4, 6]
[1, 2, 2, 3, 4, 4, 5]
[1, 2, 2, 3, 4, 4, 5, 6] post


[1, 2, 2, 3, 4, 4, 5, 6]
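
To see why mergesort counts as a higher performance sort, the following sketch counts comparisons on the same test list. It is a print-free variant written for illustration; the name `mergesort_counted` and the bound $N \lceil \log_2 N \rceil$ used for comparison are additions, not part of the cells above.

```python
import math


def mergesort_counted(items):
    """Return (sorted_copy, comparisons) using top-down mergesort."""
    if len(items) < 2:
        return list(items), 0
    middle = len(items) // 2
    left, left_count = mergesort_counted(items[:middle])
    right, right_count = mergesort_counted(items[middle:])

    merged, i, j = [], 0, 0
    comparisons = left_count + right_count
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # at most one of these two slices is non-empty
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


data = [1, 4, 2, 5, 2, 6, 3, 4]
result, comparisons = mergesort_counted(data)
bound = len(data) * math.ceil(math.log2(len(data)))
print(result, comparisons, bound)  # [1, 2, 2, 3, 4, 4, 5, 6] 17 24
```

For these 8 items mergesort needs 17 comparisons, against the 28 that selection sort always performs on a list of that length, and the gap widens quickly as $N$ grows.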

In [3]:
test = [1,4,2,5,2,6,3,4]


In [36]:
ll=[[2],[4],[5]]
len(ll)

3

In [16]:
ll[1]>ll[2]

False

In [17]:
ll.remove([2])
ll

[[4], [5]]

In [78]:
t = [2,4,5,5,2]
r=[4,3]

In [79]:
max(t,r)

[4, 3]