Skip to content

Latest commit

 

History

History
90 lines (65 loc) · 4.25 KB

sorting.mdx

File metadata and controls

90 lines (65 loc) · 4.25 KB
title sidebar_label
Sorting algorithms
Sorting

Computers spend a lot of time sorting while processing data. Once sorting is done, many other problems becomes very easy. It is also one of the most studied problem in computer science. Many of the interesting ideas such as dive-and-conquer, randomized algorithms, lower bounds etc. can be found in the context of sorting. Here we will look into some of the sorting algorithms and how they scales with increasing problem size.

Permutation sort

Given an array, we create all possible permutations of the array items, then check if the one of those permutations are sorted. This algorithm is extremely inefficient, there are $n!$ possible permutations of $n$ items. Even for a relatively small number of input size 20, $(20)! \approx 2 \times 10^{18}$. For each step, we also need about $\mathcal{O}(n)$ checks to find out if one of the permutations is sorted.

Selection sort

Here is how the selection sort works: say we are given the following array to sort: [5, 8, 2, 3, 7, 9]

Step 1: find the biggest element in first (n - 1) element and swap it with the last element if it is bigger than the last element else do nothing

Step 2: find biggest element in first (n - 2) element, and swap it with (n-1)th element if it is bigger

Below is an implementation in C++.

import CodeBlock from '@theme/CodeBlock'; import selection_sort from '!!raw-loader!/src/cpp/misc/selection_sort.cpp';

{selection_sort}

Insertion sort

import insertion_sort from '!!raw-loader!/src/cpp/misc/insertion_sort.cpp';

{insertion_sort}

Both selection sort and insertion sort has time complexity $\mathcal{O}(n^2)$.

Merge sort

import merge_sort from '!!raw-loader!/src/cpp/misc/merge_sort.cpp';

{merge_sort}

Merge sort has time complexity $\mathcal{O}(n \log n)$. Note that for any comparison model, there are $n!$ possible outcomes for a problem size $n$. Minimum height of such as descision tree would be $\log(n!) \approx n \log n$. Therefore, merge sort is best possible (or close to best possible) solution for any sorting algorithm based on comparison model. Can we do better?

Linear sort

It takes constant amount of time to access a memory location (RAM) by its address. If we are able to build a direct access array (DAA), we can get its elements by its index. We can build key value pairs, where keys are index of an DAA. If we have to search through the array, we can get an item in constant time. Say we have $n$ items and the key space is $u$. If $n$ and $u$ are of the same order, then such an algorithm would be efficient. Example: say we have student IDs (keys) based on their ranking.

What if key space is of the order of $n^2$? In that case, we can break down each key in a tuple of (a, b), where a = u / n (integer division), and b = u % n (remainder of modulo division). Example: say we have a list of 5 numbers: {12, 17, 3, 22, 24}, we can transform them into {(2, 2), (3, 2), (0, 3), (4,2), (4, 4)}. Now, if we need to sort them (we can call tuple sort), we need to sort first based on the least significant tuple digit (here second digit), and we obtain: {(2, 2), (3, 2), (4, 2), (0, 3), (4, 4)}. Finally, we sort by the most significant tuple digit: {(0, 3), (2, 2), (3, 2), (4, 2), (4, 4)}.

:::info

Note that we need a stable sorting algorithm to sort above tuple, i.e., it maintains the order of numbers in the original input list. If the input tuple was {(2, 3), (2, 1)} after sorting based on most significant tuple digit (i.e., first digit), it should return {(2, 3), (2, 1)}.

:::

Resources