# SC2001 Project 1 (Lab Group A33, Group 1)
## Integration of Mergesort & Insertion Sort


> In Mergesort, when the sizes of subarrays are small, the overhead of many recursive calls makes the algorithm inefficient. Therefore, in real use, we often combine Mergesort with Insertion Sort to come up with a hybrid sorting algorithm for better efficiency. The idea is to set a small integer S as a threshold for the size of subarrays. <br><br>
> Once the size of a subarray in a recursive call of Mergesort is less than or equal to S, the algorithm will switch to Insertion Sort, which is efficient for small-sized input.
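
The effect of the threshold can be sketched with the standard worst-case analysis (a textbook argument, not a result measured in this notebook). Assuming $n$ is the input size and the recursion stops at roughly $n/S$ subarrays of size about $S$, Insertion Sort costs $\Theta(S^2)$ per subarray in the worst case, and the remaining $\log_2(n/S)$ levels of merging cost $\Theta(n)$ each:

$$
T(n, S) = \underbrace{\frac{n}{S}\cdot\Theta(S^2)}_{\text{Insertion Sort on small subarrays}} + \underbrace{\Theta\!\left(n \log_2 \frac{n}{S}\right)}_{\text{merging}} = \Theta\!\left(nS + n\log_2\frac{n}{S}\right)
$$

With $S = 1$ this reduces to the usual $\Theta(n \log n)$ of Mergesort, and the worst case stays $O(n \log n)$ as long as $S = O(\log n)$.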


In [3]:
# Imports
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Fix the random seed so that the generated data is reproducible across runs.
np.random.seed(10)

## a) Algorithm Implementation

> Implement the above hybrid algorithm.

### Insertion Sort

In [4]:
def insertionSort(arr):
    # Sorts arr in place and returns it.
    # The first element is already a sorted prefix of length 1, so start from index 1.
    for i in range(1, len(arr)):
        key = arr[i]  # Element to insert into the sorted prefix arr[0..i-1]
        j = i - 1     # Running index over the sorted prefix, moving right to left

        # Shift every prefix element larger than key one position to the right.
        while j >= 0 and key < arr[j]:
            arr[j+1] = arr[j]
            j -= 1

        # The loop stops at the first element <= key, or at j = -1 if key is the smallest so far.
        # In both cases key belongs at index j+1.
        arr[j+1] = key
    return arr
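
To illustrate why Insertion Sort suits small or nearly sorted inputs, the sketch below counts key comparisons. The name `insertion_sort_count` is hypothetical and not part of the notebook. It sorts a copy rather than sorting in place, and it counts one comparison each time `key` is compared with an element of the sorted prefix.

```python
def insertion_sort_count(arr):
    """Sort a copy of arr; return (sorted_list, number_of_key_comparisons)."""
    a = list(arr)          # work on a copy so the input is left untouched
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1       # one key comparison: key < a[j]
            if key < a[j]:
                a[j + 1] = a[j]    # shift the larger element to the right
                j -= 1
            else:
                break
        a[j + 1] = key
    return a, comparisons

print(insertion_sort_count([1, 2, 3, 4, 5]))  # ([1, 2, 3, 4, 5], 4)
print(insertion_sort_count([5, 4, 3, 2, 1]))  # ([1, 2, 3, 4, 5], 10)
```

An already sorted input of size n needs n-1 comparisons, while a reversed input needs n(n-1)/2, which is still small when n is bounded by a small threshold S.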

### Hybrid Sort

In [5]:
# Merge function which will be used in the hybrid sort when merge sort is chosen over insertion sort.

def merge(arr1,arr2):
    # initialise indices for each array
    i = 0
    j = 0
    
    # initialise final sorted array
    sorted_arr = []
    
    # While neither list is exhausted, compare the current front elements of the two lists
    while i < len(arr1) and j < len(arr2):
        # If the front of arr1 is smaller, append it to the merged list
        if arr1[i] < arr2[j]:
            sorted_arr.append(arr1[i])
            i += 1
        # If the front of arr2 is smaller, append it to the merged list
        elif arr2[j] < arr1[i]:
            sorted_arr.append(arr2[j])
            j += 1
        # If they are equal, append both front elements and advance both indices
        else:
            sorted_arr.append(arr1[i])
            sorted_arr.append(arr2[j])
            i += 1
            j += 1
    # At most one list still has elements. Copy whatever remains of arr1 ...
    while i < len(arr1):
        sorted_arr.append(arr1[i])
        i += 1
    # ... and whatever remains of arr2
    while j < len(arr2):
        sorted_arr.append(arr2[j])
        j += 1
    return sorted_arr
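
As a reference point for checking `merge`, the standard library's `heapq.merge` performs the same kind of merge on already sorted inputs. The sketch below only shows the expected result for two small sorted lists. The sample values are made up for illustration.

```python
import heapq

left = [1, 3, 5]
right = [2, 3, 6]

# heapq.merge returns an iterator over the merged, sorted sequence
merged = list(heapq.merge(left, right))
print(merged)  # [1, 2, 3, 3, 5, 6]
```

Calling `merge(left, right)` from the cell above should give the same list.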

In [6]:
# Hybrid Sort takes in parameters of an array and array size limit S.
# If array size is <= S, insertion sort will be used 
# If array size is > S , merge sort will be used instead

def hybridSort(arr,S):
    # Base case: an array with at most one element is already sorted.
    if len(arr) <= 1:
        return arr
    
    # Recursive step
    
    # Merge Sort: the subarray is larger than the threshold S
    if len(arr) > S:
        # Use the middle index to divide the array into two halves
        m = len(arr)//2
        
        # Sort each half recursively, then merge the two sorted halves
        left = hybridSort(arr[:m], S)
        right = hybridSort(arr[m:], S)
        return merge(left, right)
    
    # Insertion Sort: the subarray has at most S elements
    else:
        return insertionSort(arr)

### Testing the sorting algorithms


In [7]:
arr = [4,2,10,100,3,59,43,-1,-8,0,7,12,11,3,3,3]

print(hybridSort(arr,3))

[-8, -1, 0, 2, 3, 3, 3, 3, 4, 7, 10, 11, 12, 43, 59, 100]
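
A single hand-picked array is a thin test, so a randomized check against Python's built-in `sorted` may be worth adding. The helper below is a sketch. `check_sorter` and `reference_sorter` are hypothetical names, and the thresholds, trial count and value range are arbitrary assumptions. It accepts any function with the signature `sort_fn(arr, S)`, so in the notebook `hybridSort` could be passed in directly.

```python
import random

def check_sorter(sort_fn, thresholds=(1, 3, 8, 32), trials=50, max_len=200, seed=0):
    """Return True if sort_fn(list, S) agrees with sorted() on random inputs."""
    rng = random.Random(seed)
    for S in thresholds:
        for _ in range(trials):
            n = rng.randint(0, max_len)                       # includes the empty list
            data = [rng.randint(-50, 50) for _ in range(n)]   # small range forces duplicates
            expected = sorted(data)
            result = list(sort_fn(list(data), S))             # pass a copy, keep data intact
            if result != expected:
                print("Mismatch for S =", S, "on input of length", n)
                return False
    return True

# Stand-in sorter so this block runs on its own; in the notebook, pass hybridSort instead.
def reference_sorter(arr, S):
    return sorted(arr)

print(check_sorter(reference_sorter))  # True
```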


## b) Generate input data

> Generate arrays of increasing sizes, in a range from 1,000 to 10 million. For each of the sizes, generate a random dataset of integers in the range of [1, …, x], where x is the largest number you allow for your datasets.

In [8]:
# Generate the required array sizes programmatically, covering the range from 1,000 to 10 million.

inputDataSizes = []

for i in range(10):
    inputDataSizes.append((i+1) * 1000)
    inputDataSizes.append((i+1) * 10000)
    inputDataSizes.append((i+1) * 100000)
    inputDataSizes.append((i+1) * 1000000)

# Remove duplicates and sort the input data sizes
inputDataSizes = sorted(set(inputDataSizes))
print(inputDataSizes)

[1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 200000, 300000, 400000, 500000, 600000, 700000, 800000, 900000, 1000000, 2000000, 3000000, 4000000, 5000000, 6000000, 7000000, 8000000, 9000000, 10000000]
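
The same list of sizes can also be built in one expression. This is an equivalent sketch, not a replacement for the cell above, and the variable name `input_data_sizes` is hypothetical. A set comprehension removes the duplicates that appear where one decade ends and the next begins (for example 10 × 1,000 and 1 × 10,000).

```python
# k * 10**e for k = 1..10 and e = 3..6 covers 1,000 up to 10,000,000
input_data_sizes = sorted({k * 10**e for e in range(3, 7) for k in range(1, 11)})

print(len(input_data_sizes), input_data_sizes[0], input_data_sizes[-1])  # 37 1000 10000000
```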


In [9]:
# List of arrays, one per input size
inputData = []

# Iterate through the data sizes array
for s in inputDataSizes:
    # For each size s, generate an array of s random integers in [1, s], so x = s.
    # np.random.randint excludes the upper bound, hence s+1.
    data = np.random.randint(1,s+1,size = s)
    inputData.append(data)
    
# Check every generated array: its length and its minimum and maximum values.
for i in range(len(inputData)):
    print("Array Size: ", len(inputData[i]))
    print("Min of this array:" , inputData[i].min())
    print("Max of this array:" , inputData[i].max())
    print()

Array Size:  1000
Min of this array: 1
Max of this array: 999

Array Size:  2000
Min of this array: 3
Max of this array: 1998

Array Size:  3000
Min of this array: 2
Max of this array: 2998

Array Size:  4000
Min of this array: 1
Max of this array: 4000

Array Size:  5000
Min of this array: 3
Max of this array: 4999

Array Size:  6000
Min of this array: 1
Max of this array: 5999

Array Size:  7000
Min of this array: 1
Max of this array: 6999

Array Size:  8000
Min of this array: 2
Max of this array: 7999

Array Size:  9000
Min of this array: 4
Max of this array: 9000

Array Size:  10000
Min of this array: 2
Max of this array: 10000

Array Size:  20000
Min of this array: 1
Max of this array: 19999

Array Size:  30000
Min of this array: 1
Max of this array: 30000

Array Size:  40000
Min of this array: 1
Max of this array: 39998

Array Size:  50000
Min of this array: 3
Max of this array: 50000

Array Size:  60000
Min of this array: 1
Max of this array: 59999

Array Size:  70000
Min of thi
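
The generation step can also be sketched with NumPy's newer `Generator` API. This is an alternative under stated assumptions, not the method used above. The seed value and the names `rng`, `demo_sizes` and `demo_data` are illustrative, only three small sizes are used to keep it quick, and a `Generator` seeded this way produces different values from `np.random.seed(10)` followed by `np.random.randint`.

```python
import numpy as np

rng = np.random.default_rng(10)      # assumed seed, independent of np.random.seed
demo_sizes = [1000, 2000, 5000]      # small illustrative subset of the sizes

# integers() excludes the upper bound by default, so s + 1 gives values in [1, s]
demo_data = [rng.integers(1, s + 1, size=s) for s in demo_sizes]

for s, data in zip(demo_sizes, demo_data):
    # Vectorised range check instead of printing every min and max
    assert len(data) == s
    assert 1 <= data.min() and data.max() <= s
    print(s, data.min(), data.max())
```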

In [12]:
print(inputData[0])

[266 126 997 528 321 370 124 157 986 734 497 926 882   9  74 257 491  41
 503 421 372 529 357 240 396  55 345 364 123 575 546 201 869 975 690 692
  55  78 454  14 756 410 383 654 861 343 799 671  90 653 322 544 826 805
 284 531  94  78 407 920 607 761 396 669  75 217 394  16 531 465 631  72
 345 396 658 431 136 716 797 802 469 763 865 473  45 646   5  72 345 857
 819 365 183 291 361 784 462 857 656 135 247 598 747 663 524 625 781 733
 481  63 503 186 592 365 233 555 750 314 748 866 819 558 937 986 714 678
 641 915 536 516 798 785 341 339 625 527  52  80 244 786 691  54 922 367
 945 235 786 417 338 746 465  42  91 269 499 287 210 402 145 129 672 358
 202 961 757 381 423 767 535 609 835  68 447  96 356 924  83  63 759 590
 507 364  49 374 990 750 460 628 471 422 780 918 930 608 684 985 737 742
  74 681 358 428 748 347 584 649 726 329 413 256 878 543 124 986 666 207
 466 214 959  14 682 930 517 728 351 541 552 348 104 778 264 151  33 772
 522 181 973 709 159 455 377 885 331 799 138 259 96