# Data Structures Course 
Shiraz University Fall 2024    
Reza Rezazadegan   
[www.dreamintelligent.com](www.dreamintelligent.com) 

# 5- Advanced sorting

Sorting algorithms can be divided into two categories:
- **Comparison sorts:** sorting is done using pairwise comparisons of keys. Examples: merge sort, insertion sort, selection sort and quick sort.
- **Distribution sorts:** the keys are distributed into intermediate groups based on their values. For example, integers can first be grouped according to their rightmost digit, then the next digit, and so on. Example: radix sort.


# Quick Sort

The quick sort algorithm is similar to merge sort, with two differences:

- It splits the array around a _pivot_ key rather than at the middle.
- It does not use temporary storage for merging.

Algorithm description:

1- The first key is used as the _pivot_ $p$, and the remaining keys are split into two segments: those less than $p$ (denoted $L$) and those greater than or equal to $p$ (denoted $G$).

2- The algorithm is applied recursively to both $L$ and $G$.

3- $L$, $p$ and $G$ are joined, in that order, back into the original array.
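
The three steps above can be sketched directly in Python if we allow ourselves extra lists for $L$ and $G$. This is only an illustration of the idea, not the in-place version developed below; the function name `quick_sort_simple` and the sample keys are chosen here for the example.

```python
def quick_sort_simple(keys):
    """Return a new sorted list; uses extra lists L and G for clarity."""
    # Base case: zero or one key is already sorted.
    if len(keys) <= 1:
        return list(keys)

    # Step 1: take the first key as the pivot and split the rest.
    p = keys[0]
    L = [x for x in keys[1:] if x < p]    # keys less than the pivot
    G = [x for x in keys[1:] if x >= p]   # keys greater than or equal to it

    # Steps 2 and 3: sort both segments recursively, then join L, p, G.
    return quick_sort_simple(L) + [p] + quick_sort_simple(G)


print(quick_sort_simple([10, 23, 51, 18, 4, 31, 13, 5]))
# prints [4, 5, 10, 13, 18, 23, 31, 51]
```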

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

To avoid using extra memory, we rearrange the keys around the pivot by swapping them in place.
More precisely, we use two pointers:
- `left` starts at the cell after the pivot and moves to the right until it reaches a key greater than or equal to the pivot
- `right` starts at the end of the array and moves to the left until it reaches a key smaller than the pivot

Whenever the item at `left` is greater than or equal to the pivot, the item at `right` is smaller than the pivot, and the two pointers have not crossed, we swap the two items.   
Once the pointers cross, we swap the pivot with the item at `right` so that it lies between L and G.


![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
![image-3.png](attachment:image-3.png)
![image-4.png](attachment:image-4.png)
![image-5.png](attachment:image-5.png)

![image-6.png](attachment:image-6.png)

![image-7.png](attachment:image-7.png)
![image-8.png](attachment:image-8.png)

In [None]:
# Sorts an array or list using the recursive quick sort algorithm.
def quickSort( theSeq ):
    n = len( theSeq )
    recQuickSort( theSeq, 0, n-1 ) 
    
    
# The recursive implementation using virtual segments.
def recQuickSort( theSeq, first, last ):
    # Check the base case.
    if first >= last :
        return
    else :
        # Partition the sequence around its first key and obtain the
        # final position of the pivot.
        pos = partitionSeq( theSeq, first, last )

        # Repeat the process on the two subsequences.
        recQuickSort( theSeq, first, pos - 1 )
        recQuickSort( theSeq, pos + 1, last )

# Partitions the subsequence using the first key as the pivot.
def partitionSeq( theSeq, first, last ):
    # Save a copy of the pivot value.
    pivot = theSeq[first]

    # Find the pivot position and move the elements around the pivot.
    left = first + 1
    right = last
    while left <= right :
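        # Find the first key greater than or equal to the pivot. The test
        # must be left <= right so that left can move past right; with
        # left < right the outer loop never ends when both pointers
        # meet at a key smaller than the pivot (e.g. for [2, 1]).
        while left <= right and theSeq[left] < pivot :
            left += 1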

        # Find the last key in the sequence that is smaller than the pivot.
        while right >= left and theSeq[right] >= pivot :
            right -= 1

        # Swap the two keys if we have not completed this partition.
        if left < right :
            theSeq[left], theSeq[right] = theSeq[right], theSeq[left]

    # Put the pivot in the proper position.
    if right != first :
        theSeq[first] = theSeq[right]
        theSeq[right] = pivot

    # Return the index position of the pivot value.
    return right

## Efficiency

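The quick sort algorithm has an average (expected) running time of $O(n \log n)$ but runs in
$O(n^2)$ time in the worst case; see Exercise 1. With the first key as the pivot, the worst case
occurs for example when the input is already sorted, since every partition then leaves one segment empty.
Even though quick sort is quadratic in the worst case, its running time is close to the average case for most inputs.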
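
The gap between the two cases can be observed by counting key comparisons. The sketch below is an illustration under stated assumptions: it uses the first key as the pivot, as in this chapter, but a simpler single-scan partition than `partitionSeq` and an explicit stack instead of recursion (so that the worst-case input does not hit Python's recursion limit). The name `quick_sort_counting` is hypothetical.

```python
import random

def quick_sort_counting(keys):
    """Sort a copy of keys with the first key as pivot.

    Returns the sorted list and the number of key comparisons made.
    """
    seq = list(keys)
    comparisons = 0
    # Explicit stack of (first, last) segments still waiting to be sorted.
    stack = [(0, len(seq) - 1)]
    while stack:
        first, last = stack.pop()
        if first >= last:
            continue
        pivot = seq[first]
        # boundary is the index of the last key known to be < pivot.
        boundary = first
        for i in range(first + 1, last + 1):
            comparisons += 1
            if seq[i] < pivot:
                boundary += 1
                seq[boundary], seq[i] = seq[i], seq[boundary]
        # Move the pivot between the two segments.
        seq[first], seq[boundary] = seq[boundary], seq[first]
        stack.append((first, boundary - 1))
        stack.append((boundary + 1, last))
    return seq, comparisons


n = 200
ordered = list(range(n))
shuffled = ordered[:]
random.shuffle(shuffled)

print(quick_sort_counting(ordered)[1])   # prints 19900, i.e. n(n-1)/2
print(quick_sort_counting(shuffled)[1])  # varies with the shuffle
```

On the already sorted input every partition compares the pivot with all remaining keys and removes only the pivot, which gives $(n-1) + (n-2) + \dots + 1 = n(n-1)/2$ comparisons.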

# Radix sort

Radix sort is a distribution sort algorithm and can be used to sort many types of keys, including positive integers,
strings, and floating-point values.

Radix sort is a special case of _tuple sort_, which sorts tuples according to the lexicographic order. In tuple sort we always start from the least significant component and finish with the most significant one.
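
As a minimal sketch of tuple sort, the function below sorts equal-length tuples by making one stable pass per component, starting from the last (least significant) one. It relies on Python's built-in `sorted`, which is stable; the name `tuple_sort` and the sample data are assumptions made for this example.

```python
def tuple_sort(tuples):
    """Sort equal-length tuples lexicographically, one stable pass per component."""
    result = list(tuples)
    if not result:
        return result
    num_components = len(result[0])
    # Process components from the least significant (last) to the
    # most significant (first).
    for c in reversed(range(num_components)):
        # sorted() is stable, so ties keep the order of the previous passes.
        result = sorted(result, key=lambda t: t[c])
    return result


dates = [(2024, 3, 15), (2023, 12, 1), (2024, 1, 20), (2023, 12, 25)]
print(tuple_sort(dates))
# prints [(2023, 12, 1), (2023, 12, 25), (2024, 1, 20), (2024, 3, 15)]
```

Stability is what makes this work: when two tuples agree on the component being sorted, the order established by the less significant components is preserved.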

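The process starts by distributing the values among the various bins based on
the digits in the ones column. If several keys have
the same digit in the ones column, they are placed in the bin in the order
in which they occur within the list. The keys are then gathered from the bins, from bin 0 to bin 9, and the process is repeated for the tens column, and so on.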
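
A single distribution pass can be sketched as follows, using plain Python lists as bins. The function name `distribute` and the sample keys are hypothetical and serve only to illustrate the step.

```python
def distribute(keys, column):
    """Place each key into one of 10 bins by its digit at the given
    column value (1 for ones, 10 for tens, 100 for hundreds, ...)."""
    bins = [[] for _ in range(10)]
    for key in keys:
        digit = (key // column) % 10
        # Appending keeps keys with the same digit in their original order.
        bins[digit].append(key)
    return bins


keys = [23, 10, 18, 51, 5, 13, 31, 48, 62, 29]
bins = distribute(keys, 1)
for digit, contents in enumerate(bins):
    print(digit, contents)
```

In this run bin 3 receives 23 before 13, and bin 8 receives 18 before 48, matching their order in the input list.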

![image.png](attachment:image.png)

In [1]:
# Sorts a sequence of positive integers using the radix sort algorithm.
# Queue and Array are the ADT classes from the modules named in the
# commented imports below; they are not part of the standard library.
#from llistqueue import Queue
#from array import Array
def radixSort( intList, numDigits ):
    # Create an array of queues to represent the bins.
    binArray = Array( 10 )
    for k in range( 10 ):
        binArray[k] = Queue()

    # The value of the current column.
    column = 1

    # Iterate over the number of digits in the largest value.
    for d in range( numDigits ):

        # Distribute the keys across the 10 bins.
        for key in intList :
            digit = (key // column) % 10
            binArray[digit].enqueue( key )

        # Gather the keys from the bins and place them back in intList.
        i = 0
        for bin in binArray :
            while not bin.isEmpty() :
                intList[i] = bin.dequeue()
                i += 1

        # Advance to the next column value.
        column *= 10
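
The listing above depends on the `Queue` and `Array` classes. As a self-contained variant that runs with the standard library only, the sketch below swaps in `collections.deque` for the queues and a Python list for the array of bins; the name `radix_sort` and the sample data are assumptions of this example, and the algorithm is otherwise the same.

```python
from collections import deque

def radix_sort(int_list, num_digits):
    """Sort non-negative integers in place, assuming every key has at
    most num_digits decimal digits."""
    # One FIFO queue per decimal digit.
    bins = [deque() for _ in range(10)]
    column = 1
    for _ in range(num_digits):
        # Distribute the keys across the 10 bins.
        for key in int_list:
            bins[(key // column) % 10].append(key)
        # Gather the keys back, bin 0 first, preserving queue order.
        i = 0
        for b in bins:
            while b:
                int_list[i] = b.popleft()
                i += 1
        # Advance to the next column value.
        column *= 10


values = [23, 10, 18, 51, 5, 13, 31, 48, 62, 29]
radix_sort(values, 2)
print(values)
# prints [5, 10, 13, 18, 23, 29, 31, 48, 51, 62]
```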



## Efficiency

Assume a sequence of $n$ keys in which the largest key has $d$ components (digits) and each component takes a
value between $0$ and $k-1$. Also assume we are using the linked list implementation
of the Queue ADT, in which each queue operation takes $O(1)$ time.

The array of $k$ queues, together with the queues themselves,
can be created in $O(k)$ time. The distribution and gathering of the keys involve two
steps, which are performed $d$ times, once for each component:

- Distributing the $n$ keys across the $k$ queues requires $O(n)$ time, since
an individual queue can be accessed directly by subscript.
- Gathering the $n$ keys from the queues and placing them back into the sequence requires $O(n)$ time. Even though the keys have to be gathered from $k$ queues, there are $n$ keys in total to be dequeued, so the `dequeue()` operation
is performed $n$ times. (Checking each of the $k$ queues for emptiness adds $O(k)$ per pass, which is dominated by $O(n)$ when $k \le n$.)

The distribution and gathering steps are performed $d$ times, resulting in a time
of $O(dn)$.


Combining this with the initialization step, we have an overall time of
$O(k+dn)$. Radix sort is a special-purpose algorithm, and in practice both $k$ and
$d$ are constants specific to the given problem, resulting in a linear-time algorithm.
For example, when sorting a list of decimal integers, $k = 10$, and $d$ can vary but commonly
$d < 10$. Thus, the sorting time grows linearly with the number of keys.


# Sorting linked lists
## Insertion sort
We unlink the nodes from the original linked list one at a time and insert each of them at its proper position in a new sorted list.
![image.png](attachment:image.png)
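
The listings in this section operate on singly linked nodes, but the node class itself is not shown. The following is a minimal sketch of what it might look like, assuming a `data` field and a `next` link as used by the merge listing (`ListNode( None )`, `.data`, `.next`). The helpers `from_list` and `to_list` are hypothetical conveniences added for experimenting.

```python
class ListNode:
    """A singly linked list node (assumed layout: a data field and a next link)."""
    def __init__(self, data):
        self.data = data
        self.next = None


def from_list(values):
    """Build a linked list from a Python list and return its head node."""
    head = None
    # Prepend the values in reverse so that the list keeps their order.
    for value in reversed(values):
        node = ListNode(value)
        node.next = head
        head = node
    return head


def to_list(head):
    """Collect the data of all nodes, in order, into a Python list."""
    values = []
    while head is not None:
        values.append(head.data)
        head = head.next
    return values


print(to_list(from_list([23, 2, 51, 18])))
# prints [23, 2, 51, 18]
```

These helpers make it easy to build an input list and to inspect the result returned by the sorting functions in this section.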

In [6]:
# Sorts a linked list using the technique of the insertion sort. A
# reference to the new ordered list is returned.
def llistInsertionSort( origList ):
    # Make sure the list contains at least one node.
    if origList is None :
        return None 
    # Iterate through the original list.
    newList = None
    while origList is not None :
        # Assign a temp reference to the first node.
        curNode = origList

        # Advance the original list reference to the next node.
        origList = origList.next

        # Unlink the first node and insert into the new ordered list.
        curNode.next = None
        newList = addToSortedList( newList, curNode )

    # Return the list reference of the new ordered list.
    return newList


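# Inserts newNode into the sorted list starting at head and returns the
# (possibly new) head. Keys are read from the node's data field, the
# same field used by the merge sort listing.
def addToSortedList( head, newNode ):
    # Find the insertion point: predNode ends up as the last node
    # whose key is smaller than the new key.
    predNode = None
    curNode = head
    while curNode is not None and curNode.data < newNode.data :
        predNode = curNode
        curNode = curNode.next

    # Link the new node in front of curNode (None at the end of the list).
    newNode.next = curNode
    if predNode is None :
        # Insert at the front; this also covers the empty list.
        head = newNode
    else :
        predNode.next = newNode

    return head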




![image.png](attachment:image.png)

## Merge sort

In each recursive call the linked list is split into two halves, until the sublists contain
a single node; the sorted sublists are then merged back into a new ordered linked list.

![image.png](attachment:image.png)

In [7]:
# Sorts a linked list using merge sort. A new head reference is returned.
def llistMergeSort( theList ): 
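    # Base case: an empty list or a single node is already sorted.
    # Without the single-node check the recursion would never end.
    if theList is None or theList.next is None :
        return theList
    # Split the linked list into two sublists of (nearly) equal size.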
    rightList = _splitLinkedList( theList )
    leftList = theList

    # Perform the same operation on the left half...
    leftList = llistMergeSort( leftList )

    # ... and the right half.
    rightList = llistMergeSort( rightList )

    # Merge the two ordered sublists.
    theList = _mergeLinkedLists( leftList, rightList )

    # Return the head pointer of the ordered sublist.
    return theList

# Splits a linked list at the midpoint to create two sublists. The
# head reference of the right sublist is returned. The left sublist is
# still referenced by the original head reference.
def _splitLinkedList( subList ):

    # Assign a reference to the first and second nodes in the list.
    midPoint = subList
    curNode = midPoint.next

    # Iterate through the list until curNode falls off the end. The
    # curNode reference advances twice as fast as the midPoint reference.
    while curNode is not None :
        # Advance curNode to the next node.
        curNode = curNode.next

        # If there are more nodes, advance curNode again and midPoint once.
        if curNode is not None :
            midPoint = midPoint.next
            curNode = curNode.next

    # Set rightList as the head pointer to the right sublist.
    rightList = midPoint.next
    # Unlink the right sub list from the left sublist.
    midPoint.next = None
    return rightList

# Merges two sorted linked lists; returns head reference for the new list.
def _mergeLinkedLists( subListA, subListB ):
    # Create a dummy node and insert it at the front of the list.
    newList = ListNode( None )
    newTail = newList

    # Append nodes to the new list until one list is empty.
    while subListA is not None and subListB is not None :
        if subListA.data <= subListB.data :
            newTail.next = subListA
            subListA = subListA.next
        else :
            newTail.next = subListB
            subListB = subListB.next

        newTail = newTail.next
        newTail.next = None

    # If either list still contains nodes, append them.
    if subListA is not None :
        newTail.next = subListA
    else :
        newTail.next = subListB

    # Return the new merged list, which begins with the first node after
    # the dummy node.
    return newList.next

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
![image-3.png](attachment:image-3.png)

# Exercises

1- Determine the time complexity of Quick Sort in the worst case.  

2- Using decision trees, show that any comparison sort requires $\Omega(n\log n)$ comparisons in the worst case, i.e. no comparison sort can be asymptotically faster than $O(n\log n)$.

3- Given the following sequence of keys (80, 7, 24, 16, 43, 91, 35, 2, 19, 72), trace the indicated algorithm to produce a recursive call tree when sorting the
values in descending order.
- merge sort
- quick sort  

4- Show the distribution steps performed by the radix sort when ordering the
following list of keys:   
- a) "MS", "VA", "AK", "LA", "CA", "AL", "GA", "TN", "WA", "DC"  
- b) 135, 56, 21, 89, 395, 7, 178, 19, 96, 257, 34, 29  

5- Prove that the Quick Sort algorithm is correct.

