# Fundamental Data Structures and Algorithms 03 - Sorting and Recursion

### Unit 2: Basic Algorithms
    2.1 Sorting
        2.1.1 Bubble Sort
        2.1.2 Insertion Sort
        2.1.3 Python's Built-In Sorting Functions
    2.2 Recursion
        2.2.1 Factorial
        2.2.2 Merge Sort
        2.2.3 Analysis of Merge Sort Algorithm

## 2.1 Sorting

**Sorting** refers to <u>bringing a set of items into some well-defined order</u>.

To do this, we first specify the order in which the items are to be sorted.

For example:
 - numbers: ascending or descending
 - strings: lexicographic or alphabetic order
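
As a small illustration of these orderings, the sketch below uses Python's built-in `sorted` function. The sample values and variable names are made up for this example.

```python
numbers = [42, 7, 19, 3]
words = ["pear", "Banana", "apple"]

ascending = sorted(numbers)                 # smallest value first
descending = sorted(numbers, reverse=True)  # largest value first

# Default string comparison is lexicographic by character code,
# so uppercase letters sort before lowercase letters.
lexicographic = sorted(words)

# Alphabetic (case-insensitive) order: compare lowercase copies of the strings.
alphabetic = sorted(words, key=str.lower)

print(ascending)
print(descending)
print(lexicographic)
print(alphabetic)
```

Note that the lexicographic and alphabetic results differ here only because one word starts with an uppercase letter.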

Sorting is important because having the items in order makes it much easier to find a given item, such as finding the most popular product in an e-commerce platform or a file corresponding to a particular client at a bank. 

If the data can be sorted before it is used as input to the program, the required item can be found much faster. This matters because many programs require large datasets to be searched in real time.
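
To make the benefit concrete, here is a minimal sketch of searching data that has been sorted once in advance. It uses binary search from Python's standard `bisect` module; the helper name `contains` and the client IDs are hypothetical, chosen only for this illustration.

```python
import bisect

def contains(sorted_items, target):
    """Return True if target occurs in sorted_items (must be sorted ascending)."""
    pos = bisect.bisect_left(sorted_items, target)  # binary search: O(log n) steps
    return pos < len(sorted_items) and sorted_items[pos] == target

# Hypothetical data, sorted once before any lookups happen.
client_ids = sorted([5021, 1007, 3310, 8842, 2954])

print(contains(client_ids, 3310))
print(contains(client_ids, 4000))
```

Each lookup inspects only about $\log_2 n$ elements instead of scanning the whole list, which is why sorting up front pays off when many searches follow.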

We will explore two common sorting algorithms in computer science:
1. Bubble sort
2. Insertion sort

### 2.1.1 Bubble Sort

Bubble sort, sometimes referred to as sinking sort, is a simple sorting algorithm that repeatedly steps through the list to be sorted, compares each pair of adjacent items and swaps them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted.

To sort a list in ascending order, bubble sort compares two adjacent values at a time. If the first value is greater than the second, the two values swap positions; otherwise they are left where they are. This is repeated for every adjacent pair in the list. Each such sweep through the list is called a pass, and a list of $n$ elements needs at most $n-1$ passes.

The algorithm, which is a comparison sort, is named for the way smaller elements "bubble" to the top of the list. Although the algorithm is simple, it is too slow and impractical for most problems, even when compared to insertion sort. It can be practical if the input is usually in sorted order but may occasionally have a few out-of-order elements that are nearly in position.

![image.png](attachment:image.png)

#### Algorithm:
Given a list $L$ of $n$ elements with values or records $L_0, L_1, …, L_{n-1}$, bubble sort is applied to sort the list $L$.
1. Compare the first two elements $L_0, L_1$ in the list.
2. If $L_1 < L_0$, swap them. Then continue with the next pair of adjacent elements, until the end of the list is reached.
3. Repeat the previous steps until the whole list is sorted, that is, until a pass needs no more swaps.
4. Return the final sorted list.

Number of passes = n-1, pass = 1,2,3,4,...,n-1

Number of comparisons in a particular pass = n-pass, e.g. if n = 4 and pass = 2, then the number of comparisons = 4-2 = 2

Maximum number of swaps in a particular pass = number of comparisons in that pass, so it is at most n-1 (in the first pass)
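
To see the passes in action, the following sketch records the list after every pass. The function name `bubble_sort_trace` is introduced here for illustration only; it works on a copy, so the input is left untouched.

```python
def bubble_sort_trace(items):
    """Bubble sort a copy of items and return a snapshot of the list after each pass."""
    arr = list(items)
    n = len(arr)
    snapshots = []
    for i in range(n - 1):              # one iteration per pass
        for j in range(n - 1 - i):      # compare the adjacent pair (j, j+1)
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
        snapshots.append(list(arr))     # state of the list after this pass
    return snapshots

for pass_number, state in enumerate(bubble_sort_trace([9, 0, 6, 4]), start=1):
    print("after pass", pass_number, ":", state)
```

After each pass, the largest value of the still-unsorted part has reached its final position at the end of that part.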

#### Program:

#### Program for implementation of Bubble Sort
**Input:**
    `arr = [9, 0, 6, 4]`
    
**Output:**
    Sorted array is: `[0, 4, 6, 9]`

In [4]:
'''bubbleSort definition'''
def bubbleSort(arr):
    n = len(arr)
    for i in range(n-1): # i counts the passes: 0, 1, ..., n-2 (n-1 passes in total)
        for j in range(n-1-i): # j is the index of the left element of the pair being compared
            if arr[j] > arr[j+1]:
                arr[j],arr[j+1] = arr[j+1],arr[j] # swap the two adjacent values

In [5]:
'''test bubbleSort function'''
arr = [9,0,6,4]
bubbleSort(arr)
print("Sorted array is: ", arr)

Sorted array is:  [0, 4, 6, 9]


##### Optimized Implementation:
The above function always takes $O(n^2)$ time, even if the array is already sorted. It can be optimized by stopping the algorithm as soon as a complete pass of the inner loop causes no swap.

In [13]:
'''optimized bubbleSort definition'''
def bubbleSort(arr):
    n = len(arr)
    for i in range(n-1): # i counts the passes: 0, 1, ..., n-2 (at most n-1 passes)
        swapped = False
        for j in range(n-1-i): # j is the index of the left element of the pair being compared
            if arr[j] > arr[j+1]:
                arr[j],arr[j+1] = arr[j+1],arr[j] # swap the two adjacent values
                swapped = True
        if not swapped: # or if swapped == False:
            break

In [14]:
'''test optimized bubbleSort function'''
arr = [9,0,6,4,5,6,5,4,5,6,7,8,9,10,15,65,4,36]
bubbleSort(arr)
print("Sorted array is: ", arr)

Sorted array is:  [0, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 8, 9, 9, 10, 15, 36, 65]


#### Time complexity
The worst-case time complexity of bubble sort is $O(n^2)$.

The time complexities can be categorized as:

- Worst case – this occurs when the provided list is in descending order. The algorithm performs the maximum number of comparisons and swaps, which is expressed as [Big-O] $O(n^2)$

- Best case – this occurs when the provided list is already sorted. The optimized implementation stops after a single pass, so the minimum number of executions is expressed as [Big-Omega] $\Omega (n)$. The unoptimized implementation still performs all $n-1$ passes.

- Average case – this occurs when the list is in random order. The average complexity is represented as [Big-Theta] $\Theta (n^2)$
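
The gap between the best and worst case can be checked empirically. The sketch below counts comparisons in the optimized (early-exit) bubble sort; the function name `count_comparisons` and the input size are assumptions made for this illustration.

```python
def count_comparisons(items):
    """Run early-exit bubble sort on a copy and return how many comparisons it made."""
    arr = list(items)
    n = len(arr)
    comparisons = 0
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            comparisons += 1
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:     # a pass without swaps means the list is sorted
            break
    return comparisons

n = 8
best = count_comparisons(range(n))          # already sorted input
worst = count_comparisons(range(n, 0, -1))  # descending input
print(best, worst)
```

For $n = 8$ this gives $n-1 = 7$ comparisons on sorted input and $n(n-1)/2 = 28$ on descending input, in line with the linear best case and quadratic worst case.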

### 2.1.2 Insertion Sort

Insertion sort is a simple sorting algorithm that works similarly to the way you sort playing cards in your hands. The array is virtually split into a sorted and an unsorted part. Values from the unsorted part are picked and placed at the correct position in the sorted part.

![image.png](attachment:image.png)

#### Algorithm
To sort an array $arr$ of size $n$ in ascending order:
1. Iterate from $arr[1]$ to $arr[n-1]$ over the array.
2. Compare the current element (key) to its predecessor.
3. If the key is smaller than its predecessor, compare it to the elements before that as well. Move each greater element one position up to make space, then insert the key into the gap.

#### Program for implementation of Insertion Sort
Input: `arr = [7, 8, 4, 1, 2, 6, 9, 5, 3]`

Output: Sorted array is: `[1, 2, 3, 4, 5, 6, 7, 8, 9]`

#### Program:

In [20]:
'''define insertionSort'''
def insertionSort(arr):
    
    # Traverse from index 1 to len(arr)-1
    for i in range(1, len(arr)): # i is the key index
        key = arr[i]
        
        # Move elements of arr[0...i-1] that are greater than key one position ahead of their current position
        j = i-1 # j is the index of predecessors of key
        while j >= 0 and key < arr[j]:
            arr[j+1] = arr[j]
            j -= 1
        arr[j+1] = key # insert key into the gap, once, after all shifting is done

In [21]:
'''test insertionSort'''
arr= [7,8,4,1,2,6,9,5,3]
insertionSort(arr)
print("Sorted array is: ",arr)

"""
Number of iterations (worst case) = 1 + 2 + ... + (n-1) = n*(n-1)/2

The for-loop over i runs n-1 times; for each i, the inner while-loop compares key with at most i predecessors.
"""

Sorted array is:  [1, 2, 3, 4, 5, 6, 7, 8, 9]


#### Time Complexities

Worst Case Complexity: $O(n^2)$

Suppose an array is in descending order and you want to sort it in ascending order (or vice versa). This is the worst case.

Each key has to be compared with every element before it, so the key at index $i$ needs $i$ comparisons.

Thus, the total number of comparisons = $1 + 2 + \dots + (n-1) = \frac{n(n-1)}{2} \approx \frac{n^2}{2}$, which grows as $n^2$.

Best Case Complexity: $\Omega (n)$  
When the array is already sorted, the outer loop runs $n-1$ times and the inner loop never shifts anything: each key is compared once with its predecessor and stays in place. So there are only $n-1$ comparisons, and the complexity is linear.

Average Case Complexity: $\Theta (n^2)$  
It occurs when the elements of an array are in jumbled order (neither ascending nor descending).
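
These cases can be checked by counting comparisons directly. The sketch below is an instrumented variant of insertion sort; the name `insertion_sort_comparisons` is introduced for this illustration, and the counts are simply what this particular sketch measures.

```python
def insertion_sort_comparisons(items):
    """Insertion sort a copy of items; return (sorted_list, number_of_key_comparisons)."""
    arr = list(items)
    comparisons = 0
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0:
            comparisons += 1        # one comparison of key against arr[j]
            if key >= arr[j]:
                break               # key is already in the right place
            arr[j + 1] = arr[j]     # shift the larger element one position up
            j -= 1
        arr[j + 1] = key
    return arr, comparisons

print(insertion_sort_comparisons([1, 2, 3, 4, 5, 6]))  # already sorted
print(insertion_sort_comparisons([6, 5, 4, 3, 2, 1]))  # reverse order
```

With $n = 6$, the sorted input needs $n-1 = 5$ comparisons, while the reversed input needs $1 + 2 + \dots + 5 = 15$.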

### 2.1.3 Python's Built-In Sorting Functions

Python provides two built-in ways to sort data:
1. The `sort` method of lists
2. The `sorted` function

The first is the sort method of the list class. As an example, suppose that we define the following list:

In [23]:
colors = [ "red" , "green" , "blue" , "cyan" , "magenta" , "yellow" ]

That method reorders the elements of the list in place, according to the natural meaning of the $<$ operator for those elements. In the above example the elements are strings, for which the natural order is alphabetical. Therefore, after a call to `colors.sort()`, the order of the list would become:

In [25]:
'''sort()'''
colors.sort()
print(colors)

['blue', 'cyan', 'green', 'magenta', 'red', 'yellow']


Python also supports a built-in function, named `sorted`, that can be used to produce a new ordered list containing the elements of any existing iterable container. Going back to our original example, the syntax `sorted(colors)` would return a new list of those colors, in alphabetical order, while leaving the contents of the original list unchanged. This second form is more general because it accepts any iterable object as its argument.

In [26]:
'''sorted()'''
sorted('green')

['e', 'e', 'g', 'n', 'r']

`sort()` basically works with the list itself. It modifies the original list in place. The return value is `None`.

`sorted()` works on any iterable, such as a list, tuple, string or dict (for a dict it sorts the keys). It returns a new list and doesn't modify the original.

The time complexity of `sort` is `O(n log n)` both on average and in the worst case. `sorted` has the same complexity; the difference is that `sorted` builds a new sorted list from an iterable, while `sort` sorts the list in place.
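
The differences described above can be seen side by side in the short sketch below. It also shows the optional `key` and `reverse` arguments that both `sort` and `sorted` accept; the `stock` dictionary is hypothetical data for this example.

```python
colors = ["red", "green", "blue", "cyan", "magenta", "yellow"]

by_length = sorted(colors, key=len)            # new list; ties keep their original order
reversed_alpha = sorted(colors, reverse=True)  # new list in descending order
result = colors.sort()                         # sorts in place and returns None

stock = {"pear": 3, "apple": 7, "fig": 1}      # hypothetical dictionary
sorted_keys = sorted(stock)                    # iterating a dict yields its keys

print(by_length)
print(reversed_alpha)
print(result, colors)
print(sorted_keys)
```

Because `sort` returns `None`, writing `colors = colors.sort()` is a common mistake that discards the list.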

## 2.2 Recursion

Any procedure that involves at least one step that invokes (or calls) the procedure itself is known as a recursive procedure.

Recursion is a common mathematical and programming concept. It means that a function calls itself. This lets you work through data step by step to reach a result, much as a loop does.

The developer should be very careful with recursion as it can be quite easy to slip into writing a function which never terminates, or one that uses excess amounts of memory or processor power. However, when written correctly recursion can be a very efficient and mathematically-elegant approach to programming.
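
The sketch below illustrates the termination problem. Both function names are made up for this example: the first has no base case and is deliberately broken, the second adds one. Python guards against runaway recursion with a depth limit and raises `RecursionError` when it is exceeded.

```python
import sys

def count_down_forever(n):
    """Deliberately broken: no base case, so the calls never stop on their own."""
    return count_down_forever(n - 1)

def count_down(n):
    """Correct version: the base case n <= 0 ends the recursion."""
    if n <= 0:
        return 0
    return count_down(n - 1)

print("recursion limit:", sys.getrecursionlimit())

try:
    count_down_forever(5)
except RecursionError as error:
    print("stopped by Python:", error)

print(count_down(5))
```

The only difference between the two functions is the base case, which is what every recursive function needs in order to terminate.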

![image.png](attachment:image.png)

#### Example with recursion and without recursion

In [None]:
'''Without recursion'''
def numberPrint(n):
    for num in range(n):
        print(num + 1)

In [None]:
numberPrint(4)

**Advantages of Python Recursion**
- Can shorten a program, because repeated work is expressed as one function calling itself.
- Very flexible with data structures and algorithms such as stacks, queues, linked lists and quick sort.
- Big and complex iterative solutions can become simpler and easier to read when written recursively.
- Algorithms can be defined recursively, making them much easier to visualize and prove.

**Disadvantages of Python Recursion**
- Often slower than an equivalent loop, because every function call has some overhead.
- Logical, but difficult to trace and debug.
- Requires extra storage space: for every recursive call, separate memory is allocated for its variables.
- Recursive functions can fail with a stack overflow when the recursion gets too deep; in Python this shows up as a `RecursionError`.

The `numberPrint` function shown earlier is a simple function that prints all numbers from 1 to $n$, inclusive. However, this function can also be written recursively, as follows:

In [None]:
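'''With recursion'''
def recursiveNumberPrint(n):
    if n < 1: # base case: nothing left to print
        return
    recursiveNumberPrint(n-1) # first print 1, 2, ..., n-1
    print(n) # then print n itself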

In [None]:
'''test'''
recursiveNumberPrint(4)

### Examples of a recursive function

### 2.2.1 Factorial

The factorial of a positive integer $n$ can be used to calculate the number of permutations of $n$ elements. The function is defined as:
  
              n! = n x (n-1) x (n-2) x ... x 1
         
with the special case of $0! = 1$. This problem can be solved easily using an iterative implementation that loops through the individual values and computes their product. But it can also be solved recursively, which provides a simple example of recursion. Consider the factorial function on different integer values:

               0! = 1
               1! = 1
               2! = 2 x 1 = 2
               3! = 3 x 2 x 1 = 6
               4! = 4 x 3 x 2 x 1 = 24
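
The iterative approach mentioned above can be sketched as follows. The function name `iterative_factorial` is chosen for this illustration, and the results are compared with `math.factorial` from the standard library.

```python
import math

def iterative_factorial(n):
    """Compute n! for a non-negative integer n with a simple loop."""
    product = 1
    for value in range(2, n + 1):   # empty range for n = 0 or 1, so product stays 1
        product *= value
    return product

for n in range(5):
    print(n, iterative_factorial(n), math.factorial(n))
```

The loop reproduces the values listed above, including the special case $0! = 1$.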
               
 

If we carefully inspect these equations, it becomes obvious that each of the successive equations, for $n > 0$, can be rewritten in terms of the previous equation:

                0! = 1
                1! = 1 x (1-1)!
                2! = 2 x (2-1)!
                3! = 3 x (3-1)!
                4! = 4 x (4-1)!

Since the function is defined in terms of itself and contains a base case, a recursive definition can be produced for the factorial function as shown here:

![image.png](attachment:image.png)
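
In case the figure does not render, the recursive definition that follows from the equations above can be written out as below. This is the standard textbook form and is assumed, not confirmed, to match the figure.

$$
n! =
\begin{cases}
1 & \text{if } n = 0 \\
n \times (n-1)! & \text{if } n > 0
\end{cases}
$$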

#### The recursive implementation of the factorial function
**Input:**
    `n=3`

**Output:**
    `6`

In [28]:
'''factorial using recursion'''
def fact(n):
    if n==0:
        return 1
    return n*fact(n-1)

arr = {} # cache of already computed factorials, shared by all calls to DPfact
def DPfact(N):
    if N in arr: # factorial(N) has already been calculated, so return the stored value
        return arr[N]
    elif N == 0 or N == 1: # 0! and 1! both equal 1
        factorial = 1
    else: # N>1, so factorial(N) = N * factorial(N-1)
        factorial = N*DPfact(N - 1)
    arr[N] = factorial # store factorial(N) in the cache for later calls
    return factorial

In [30]:
'''test factorial'''
fact(120)

6689502913449127057588118054090372586752746333138029810295671352301633557244962989366874165271984981308157637893214090552534408589408121859898481114389650005964960521256960000000000000000000000000000

![image.png](attachment:image.png)

### 2.2.2 Merge Sort

We now turn our attention to using a *divide and conquer strategy* as a way to improve the performance of sorting algorithms. The first algorithm we will study is the **merge sort**. Merge sort is a recursive algorithm that continually splits a list in half. If the list is empty or has one item, it is sorted by definition (the base case). If the list has more than one item, we split the list and recursively invoke a merge sort on both halves. Once the two halves are sorted, the fundamental operation, called a merge, is performed. Merging is the process of taking two smaller sorted lists and combining them together into a single, sorted, new list. 

The figure below shows an example list as it is being split by `mergeSort`.

![image.png](attachment:image.png)

#### Algorithm

- **Step 1** − if there is only one element in the list, it is already sorted; return.

- **Step 2** − otherwise, divide the list into two halves and sort each half recursively, until the sublists cannot be divided any further.

- **Step 3** − merge the smaller sorted lists into a new list in sorted order.
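
The merge in Step 3 can be looked at in isolation. The sketch below is a standalone helper, named `merge` for this illustration, that combines two lists that are already sorted. It builds a new list rather than writing back into the original one.

```python
def merge(left, right):
    """Combine two ascending lists into one new ascending list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:     # on ties, take from the left list first
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])         # at most one of these two slices is non-empty
    merged.extend(right[j:])
    return merged

print(merge([11, 12, 13], [5, 6, 7]))
print(merge([1, 4, 9], [2, 3, 10, 11]))
```

Every element is copied exactly once, so merging lists with $n$ elements in total takes time proportional to $n$.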

#### Python program for implementation of MergeSort
**Input:**

`nlist  = [12, 11, 13, 5, 6, 7]`    
    
    
**Output:**

Given list is
`12 11 13 5 6 7`

Sorted list is
`5 6 7 11 12 13`

#### Program:

In [None]:
'''Python program for implementation of MergeSort '''
def mergeSort(nlist):
    print("Splitting ", nlist)
    if len(nlist)> 1:
        mid = len(nlist)//2 # Finding the mid of the list
        L = nlist[:mid]
        R = nlist[mid:]
        
        #Recursive calls to mergeSort for Left and Right Sub-lists
        mergeSort(L) # Sorting the first half
        mergeSort(R) # Sorting the second half
        
        i = j = k = 0
        
        # Merge the two sorted halves L and R back into nlist
        
        while i<len(L) and j<len(R):
            if L[i] <= R[j]: # <= takes equal elements from the left half first (stable sort)
                nlist[k] = L[i]
                i+=1
            else:
                nlist[k] = R[j]
                j+=1
            k+=1
            
        # Checking if any element is left
        while i < len(L):
            nlist[k] = L[i]
            i+=1
            k+=1
        
        while j < len(R):
            nlist[k] = R[j]
            j+=1
            k+=1
        print("merging", nlist)
    return nlist # return the list whether or not it had to be split

In [None]:
# driver code to test the above code 
nlist  = [12, 11, 13, 5, 6, 7]  
print("Given List is:", nlist)
mergeSort(nlist) 
print("Sorted List is: ", end ="\n") 
print(nlist)    

#### Flowchart:

![image.png](attachment:image.png)

#### Time Complexity: 
The time complexity of merge sort is $O (n \log{n})$ in all three cases (worst, average and best), because merge sort always divides the array into two halves and takes linear time to merge the two halves.
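
One common way to justify this bound is the recurrence below. It is a sketch under two simplifying assumptions: $n$ is a power of 2, and merging $n$ elements costs $cn$ for some constant $c$. The symbols $T$, $c$ and $k$ are introduced only for this derivation.

$$
T(n) = 2\,T\!\left(\frac{n}{2}\right) + cn, \qquad T(1) = c
$$

Expanding the recurrence $k$ times gives

$$
T(n) = 2^k\,T\!\left(\frac{n}{2^k}\right) + k\,cn
$$

and choosing $k = \log_2 n$ (the number of times the list can be halved) yields

$$
T(n) = n\,T(1) + cn\log_2 n = cn + cn\log_2 n = O(n \log n)
$$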