## Tier 1. Module 3: Basic Algorithms and Data Structures

## Topic 4 - Sorting algorithms

## Homework

### Task 1

Python has two built-in ways to sort: the `sorted()` function and the `list.sort()` method. Both use Timsort, a hybrid sorting algorithm that combines merge sort and insertion sort.

Compare three sorting algorithms (merge sort, insertion sort, and Timsort) by execution time. Support the analysis with empirical data obtained by testing the algorithms on data sets of different sizes. Check the theoretical complexity estimates empirically, for example by sorting large arrays. Use the `timeit` module to measure execution time.

Show that the combination of merge sort and insertion sort makes Timsort much more efficient, which is why programmers usually rely on Python's built-in sorting rather than writing their own. Draw conclusions.

1.1 - Merge sort algorithm

In [40]:
def merge_sort(arr: list) -> list:
    """
    Recursively split a list into two halves, sort each half, and
    combine the sorted halves with the `merge` helper

    :param arr: a list to be sorted
    :return: sorted list
    """
    if len(arr) <= 1:
        return arr

    mid = len(arr) // 2
    left_half = merge_sort(arr[:mid])
    right_half = merge_sort(arr[mid:])

    return merge(left_half, right_half)


def merge(left: list, right: list) -> list:
    """
    Merge two sorted lists into one sorted list

    :param left: first sorted list
    :param right: second sorted list
    :return: merged and sorted list
    """
    merged = []
    left_idx, right_idx = 0, 0

    while left_idx < len(left) and right_idx < len(right):
        # `<=` takes equal elements from the left list first, keeping the sort stable
        if left[left_idx] <= right[right_idx]:
            merged.append(left[left_idx])
            left_idx += 1
        else:
            merged.append(right[right_idx])
            right_idx += 1

    merged.extend(left[left_idx:])
    merged.extend(right[right_idx:])

    return merged

1.2 - Insertion sort algorithm

In [41]:
def insertion_sort(arr: list) -> list:
    """
    Sort a list in place using the insertion sort method

    :param arr: a list to be sorted (it is modified in place)
    :return: the same list, now sorted
    """
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and key < arr[j]:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key

    return arr

1.3 - Random test array generation function

In [42]:
import random


def generate_data(size: int) -> list:
    """Return a list of `size` random integers between 0 and 1000 inclusive"""
    return [random.randint(0, 1000) for _ in range(size)]

1.4 - Algorithm testing function

In [43]:
import timeit


def test_algorithms(algorithms: dict, data_sizes: list) -> dict:
    """
    Function for testing sorting algorithms on randomly generated lists of
    numbers

    :param algorithms: a dictionary, where the keys are algorithm names, and
    the values are functions or methods with an implemented sorting algorithm
    :param data_sizes: a list of integer values representing the length of
    random arrays to be generated for testing algorithms
    :return: a nested dictionary that maps each algorithm name to a dictionary
    of {array size: total time in seconds for 10 sorting runs}
    """
    results = {}
    for algo_name, algo_func in algorithms.items():
        results[algo_name] = {}
        for size in data_sizes:
            data = generate_data(size)
            time_taken = timeit.timeit(lambda: algo_func(data.copy()), number=10)
            results[algo_name][size] = time_taken
    return results
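
A single `timeit.timeit` call returns the total time for all `number` runs, so one burst of background activity can skew a measurement. One possible refinement, sketched here on the assumption that the minimum over several repeats is a fair summary, uses `timeit.repeat`. The helper name `best_time` and its default arguments are hypothetical and not part of the notebook above.

```python
import random
import timeit


def best_time(algo_func, data: list, repeats: int = 5, number: int = 10) -> float:
    """Return the best per-call time in seconds over several repeats."""
    # Each repeat times `number` calls; sorting a fresh copy every time
    # prevents an in-place algorithm from receiving already sorted input.
    totals = timeit.repeat(
        lambda: algo_func(data.copy()), repeat=repeats, number=number
    )
    return min(totals) / number


sample = [random.randint(0, 1000) for _ in range(1000)]
per_call = best_time(sorted, sample)
print(f"sorted: {per_call:.6f} s per call")
```

Dividing by `number` reports time per call rather than the total for ten calls, which makes results easier to compare across different settings.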

1.5 - Testing algorithms

In [44]:
algorithms = {
    "Merge Sort": merge_sort,
    "Insertion Sort": insertion_sort,
    "Timsort (Python built-in)": sorted,
}
data_sizes = [100, 1000, 10000]

results = test_algorithms(algorithms, data_sizes)
for algo, timings in results.items():
    print(f"Algorithm: {algo}")
    for size, time_taken in timings.items():
        print(f"Data size: {size:<7} Time taken: {time_taken:.6f} seconds")
    print()

Algorithm: Merge Sort
Data size: 100     Time taken: 0.001393 seconds
Data size: 1000    Time taken: 0.018338 seconds
Data size: 10000   Time taken: 0.237893 seconds

Algorithm: Insertion Sort
Data size: 100     Time taken: 0.001766 seconds
Data size: 1000    Time taken: 0.241139 seconds
Data size: 10000   Time taken: 22.663815 seconds

Algorithm: Timsort (Python built-in)
Data size: 100     Time taken: 0.000036 seconds
Data size: 1000    Time taken: 0.000431 seconds
Data size: 10000   Time taken: 0.011314 seconds



### Conclusions:


1. Initialization

As the test results show, merge sort is significantly faster than insertion sort on the larger arrays, and Timsort is faster than both. Note, however, that merge sort and insertion sort take almost the same time on the small array. Merge sort has linearithmic time complexity $O(n \log_2 n)$ and insertion sort has quadratic time complexity $O(n^2)$, so for $n = 100$ the execution times should differ by a factor of about $\frac{100^2}{100 \cdot \log_2 100} \approx 15$. In practice, constant factors such as the overhead of recursive function calls cancel out most of merge sort's advantage on small data sets.
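
The arithmetic behind this estimate can be reproduced in a few lines. This is only a sketch: the two timings are copied from the test output above, and the variable names are illustrative.

```python
import math

n = 100

# Theoretical ratio of quadratic to linearithmic work for n elements.
theoretical_ratio = n ** 2 / (n * math.log2(n))

# Timings for n = 100 copied from the test output (seconds for 10 runs).
insertion_time = 0.001766
merge_time = 0.001393
measured_ratio = insertion_time / merge_time

print(f"Theoretical ratio: {theoretical_ratio:.2f}")
print(f"Measured ratio:    {measured_ratio:.2f}")
```

The measured ratio is far below the theoretical one, which is the effect of the constant factors described above.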

Timsort significantly outperformed the other two algorithms even on the small array. This is largely because CPython implements it in C, while the merge sort and insertion sort functions here are written in pure Python.

2. Scalability

As expected, sorting time increases with the size of the data set. For merge sort the growth is roughly linearithmic: going from 100 to 10 000 elements, theory predicts a factor of $\frac{10\,000 \cdot \log_2 10\,000}{100 \cdot \log_2 100} = 200$, and the measured factor is about 171.
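
The same comparison can be made for all three algorithms. The sketch below copies the timings for 100 and 10 000 elements from the test output above. The function names `linearithmic_growth` and `quadratic_growth` are illustrative and not part of the notebook.

```python
import math

# Timings copied from the test output above (total seconds for 10 runs).
timings = {
    "Merge Sort": {100: 0.001393, 10000: 0.237893},
    "Insertion Sort": {100: 0.001766, 10000: 22.663815},
    "Timsort (Python built-in)": {100: 0.000036, 10000: 0.011314},
}


def linearithmic_growth(small: int, large: int) -> float:
    """Predicted time ratio if running time is proportional to n * log2(n)."""
    return (large * math.log2(large)) / (small * math.log2(small))


def quadratic_growth(small: int, large: int) -> float:
    """Predicted time ratio if running time is proportional to n ** 2."""
    return (large / small) ** 2


small, large = 100, 10000
print(f"Predicted n log n growth: {linearithmic_growth(small, large):.0f}x")
print(f"Predicted n^2 growth:     {quadratic_growth(small, large):.0f}x")
for name, by_size in timings.items():
    measured = by_size[large] / by_size[small]
    print(f"{name}: measured {measured:.0f}x")
```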

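Timsort is a hybrid that combines insertion sort and merge sort. From the small to the medium array its time grows by a factor of about 12, in line with linearithmic growth (about 15 predicted). From the medium to the large array it grows by a factor of about 26, which is steeper than the factor of roughly 13 that linearithmic growth predicts but far below the factor of 100 that quadratic growth would give. Timsort's worst case is $O(n \log n)$, so this excess is more plausibly explained by measurement noise or memory effects than by quadratic behaviour. On the large array Timsort is still about 21 times faster than merge sort, not least because of its C implementation.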
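
To separate the effect of the hybrid idea from the effect of the C implementation, one option is to test a pure-Python merge sort that switches to insertion sort on short sublists. The sketch below is a simplified illustration of that idea, not the actual Timsort algorithm, which also detects existing runs and uses further optimizations. The function names and the `cutoff` value of 32 are assumptions chosen for the example.

```python
import random


def insertion_sort_copy(arr: list) -> list:
    """Return a sorted copy of arr using insertion sort."""
    result = list(arr)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        while j >= 0 and key < result[j]:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def merge_sorted(left: list, right: list) -> list:
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def hybrid_sort(arr: list, cutoff: int = 32) -> list:
    """Merge sort that hands sublists of at most `cutoff` items to insertion sort."""
    if len(arr) <= cutoff:
        return insertion_sort_copy(arr)
    mid = len(arr) // 2
    return merge_sorted(hybrid_sort(arr[:mid], cutoff), hybrid_sort(arr[mid:], cutoff))


data = [random.randint(0, 1000) for _ in range(5000)]
print(hybrid_sort(data) == sorted(data))
```

Adding `hybrid_sort` to the `algorithms` dictionary would let the same test function compare it with plain merge sort. Any speed-up would then have to come from the combination itself, because both versions run in pure Python.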

### Task 2

You are given `k` sorted lists of integers. Your task is to combine them into one sorted list. You can rely on the merge sort algorithm from the course summary. Implement a `merge_k_lists` function that takes a list of sorted lists as input and returns a single sorted list.

2.1 - Function for merging sorted lists

In [54]:
#%%timeit
def merge_k_lists(lists: list[list[int]]) -> list[int]:
    """
    Merge k sorted lists into one sorted list by merging them in pairs

    :param lists: a list of sorted lists to be merged with the `merge`
    function defined above
    :return: merged and sorted list
    """
    if not lists:
        return []

    while len(lists) > 1:
        merged = []
        for i in range(0, len(lists), 2):
            if i + 1 < len(lists):
                merged.append(merge(lists[i], lists[i + 1]))
            else:
                merged.append(lists[i])
        lists = merged

    return lists[0]
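
For comparison, the standard library offers `heapq.merge`, which lazily merges any number of sorted iterables. The wrapper below is a minimal sketch of that alternative. The name `merge_k_lists_heap` is hypothetical and not part of the task.

```python
import heapq


def merge_k_lists_heap(lists: list[list[int]]) -> list[int]:
    """Merge sorted lists into one sorted list using heapq.merge."""
    # heapq.merge returns a lazy iterator, so it is materialized with list().
    return list(heapq.merge(*lists))


print(merge_k_lists_heap([[1, 4, 5], [1, 3, 4], [2, 6]]))
```

The pairwise approach above reuses the `merge` function from Task 1. The heap-based version avoids building intermediate lists and is a convenient reference for checking results.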

2.2 - Testing

In [55]:
lists = [[1, 4, 5], [1, 3, 4], [2, 6]]
merged_list = merge_k_lists(lists)
print("Sorted list:", merged_list)

Sorted list: [1, 1, 2, 3, 4, 4, 5, 6]
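
A single example does not cover edge cases such as an empty input or empty sublists. The sketch below compares a merge function with `sorted()` on randomly generated inputs. The helper `check_merge` and the stand-in `reference` function are hypothetical; in the notebook, `merge_k_lists` would be passed instead.

```python
import heapq
import random


def check_merge(merge_func, trials: int = 100) -> bool:
    """Return True if merge_func agrees with sorted() on random sorted inputs."""
    for _ in range(trials):
        k = random.randint(0, 6)
        lists = [
            sorted(random.randint(-50, 50) for _ in range(random.randint(0, 8)))
            for _ in range(k)
        ]
        expected = sorted(x for sub in lists for x in sub)
        # Pass copies so the function under test cannot mutate the generated inputs.
        if list(merge_func([list(sub) for sub in lists])) != expected:
            return False
    return True


# Stand-in so the snippet runs on its own; in the notebook, pass merge_k_lists.
reference = lambda lists: list(heapq.merge(*lists))
print(check_merge(reference))
```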
