# **CountingSort, RadixSort, BucketSort and Tries**


In [1]:
# Import the necessary libraries
import random
import time
import math

## CountingSort

In this exercise, we implemented the CountingSort algorithm to sort 30,000 parts with distinct numeric codes in the range from 1 to 30,000.

CountingSort is ideal for this case because the data range is known and limited. The algorithm uses an auxiliary array to count the frequency of each value and then reconstructs the sorted array based on these counters.

The time complexity of CountingSort is linear: \(O(n + k)\), where:
- \(n\) is the number of elements,
- \(k\) is the largest possible value (in our case, 30,000).

Since the 30,000 codes are distinct and cover the whole range from 1 to 30,000, we have \(k = n\), so the running time is \(O(n + k) = O(n)\).
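
As a small illustration of the counting step, the sketch below runs the same idea on a five-element list. The function name `counting_sort_small` and the sample values are hypothetical and not part of the exercise; unlike the cell that follows, it returns a new list instead of sorting in place.

```python
def counting_sort_small(values):
    """Return (sorted copy, counter array) for a list of non-negative integers."""
    k = max(values)                 # the largest value fixes the counter size
    count = [0] * (k + 1)           # count[v] = how many times v occurs
    for v in values:
        count[v] += 1
    result = []
    for v, c in enumerate(count):
        result.extend([v] * c)      # emit each value as often as it was counted
    return result, count

sample = [4, 1, 3, 1, 0]
sorted_sample, counters = counting_sort_small(sample)
print(counters)        # [1, 2, 0, 1, 1]
print(sorted_sample)   # [0, 1, 1, 3, 4]
```

The counter array has \(k + 1\) slots, which is where the \(k\) term in \(O(n + k)\) comes from.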


In [2]:
# Generate an array of 30,000 distinct integers between 1 and 30000
array_cs = random.sample(range(1, 30001), 30000)

def counting_sort(arr):
    # Find the maximum value to define the counter size
    max_val = max(arr)
    count = [0] * (max_val + 1)

    # Count the frequency of each value in the array
    for num in arr:
        count[num] += 1

    # Reconstruct the sorted array
    index = 0
    for i in range(len(count)):
        while count[i] > 0:
            arr[index] = i
            index += 1
            count[i] -= 1

# Test and time measurement
arr_to_sort = array_cs.copy()
start = time.time()
counting_sort(arr_to_sort)
end = time.time()

print(f"CountingSort execution time: {end - start:.5f} seconds")

CountingSort execution time: 0.01168 seconds


## RadixSort

In this exercise, we implemented the RadixSort algorithm to sort 30,000 agent identification codes, which are integers between 1,000 and 999,999.

RadixSort is well suited to keys with a bounded number of digits (here, 4 to 6). It sorts the elements in multiple passes, one per digit, from the least significant (units) to the most significant (hundreds of thousands).

Each pass uses a stable CountingSort on a single digit as a subroutine, which costs \(O(n + b)\) for base \(b = 10\), that is, \(O(n)\). With \(d\) digits, the total complexity is:

\[
O(d \cdot n)
\]

In this case, \(d\) is at most 6, a constant, so the complexity is linear in the number of elements.
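
To make the digit-by-digit passes concrete, here is a minimal sketch that prints the list after each pass. It distributes values into per-digit lists instead of using the counting subroutine of the cell below, and `radix_passes` and the sample values are hypothetical names chosen for illustration.

```python
def radix_passes(values, base=10):
    """Yield (exp, list) after each least-significant-digit-first pass."""
    current = list(values)
    exp = 1
    while max(current) // exp > 0:
        buckets = [[] for _ in range(base)]
        for v in current:
            # Appending keeps equal digits in their previous order (stable)
            buckets[(v // exp) % base].append(v)
        current = [v for bucket in buckets for v in bucket]
        yield exp, current
        exp *= base

sample = [170, 45, 75, 90, 802, 24, 2, 66]
for exp, state in radix_passes(sample):
    print(exp, state)
# 1 [170, 90, 802, 2, 24, 45, 75, 66]
# 10 [802, 2, 24, 45, 66, 170, 75, 90]
# 100 [2, 24, 45, 66, 75, 90, 170, 802]
```

The largest value has three digits, so three passes are enough; stability of each pass is what lets later digits preserve the order established by earlier ones.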


In [3]:
# Generation of 30,000 random numbers with up to 6 digits (between 1000 and 999999)
array_rs = [random.randint(1000, 999999) for _ in range(30000)]

# Counting Sort by specific digit (base 10)
def counting_sort_digit(arr, exp):
    n = len(arr)
    output = [0] * n
    count = [0] * 10

    # Count the occurrences of each digit at the 'exp' position
    for i in range(n):
        index = (arr[i] // exp) % 10
        count[index] += 1

    # Accumulate positions
    for i in range(1, 10):
        count[i] += count[i - 1]

    # Build the output array sorted by this digit
    i = n - 1
    while i >= 0:
        index = (arr[i] // exp) % 10
        output[count[index] - 1] = arr[i]
        count[index] -= 1
        i -= 1

    # Copy the sorted array back to arr
    for i in range(n):
        arr[i] = output[i]

# Main RadixSort function
def radix_sort(arr):
    max_val = max(arr)
    exp = 1
    while max_val // exp > 0:
        counting_sort_digit(arr, exp)
        exp *= 10

# Test and time measurement
arr = array_rs.copy()
start = time.time()
radix_sort(arr)
end = time.time()

print(f"RadixSort execution time: {end - start:.5f} seconds")

RadixSort execution time: 0.08271 seconds


## BucketSort

In this exercise, we used the BucketSort algorithm to sort the arrival times of 30,000 spacecraft, where each time is a real number (float) between 0.0 and 200.0 seconds.

BucketSort is particularly efficient when the data is uniformly distributed within a known range. It divides the range into “buckets,” spreads the elements among them, and then sorts each bucket individually.

For BucketSort to work properly:
- We normalized the data (by dividing by 200.0) to produce values between 0 and 1.
- We used 10 buckets.
- Each bucket was sorted using Insertion Sort (efficient for small datasets).

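The average complexity of BucketSort is \(O(n)\) when the data is uniformly distributed and the number of buckets grows in proportion to \(n\). With a fixed 10 buckets, each bucket receives about 3,000 of the 30,000 values, so the quadratic Insertion Sort inside each bucket dominates the running time, which is why this cell runs much slower than the CountingSort and RadixSort cells.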
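
To see the effect of the bucket count, the following sketch lets the number of buckets scale with the input. It is only an illustration under stated assumptions: `bucket_sort_scaled`, `upper`, and `num_buckets` are hypothetical names, and defaulting to one bucket per element is an assumption, not the configuration used in this exercise.

```python
import random

def bucket_sort_scaled(values, upper=200.0, num_buckets=None):
    """Return a sorted copy of floats in [0, upper]."""
    n = len(values)
    if n == 0:
        return []
    if num_buckets is None:
        num_buckets = n                      # assumption: one bucket per element
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        # Scale to a bucket index and clamp so that v == upper lands in the last bucket
        index = min(int(v / upper * num_buckets), num_buckets - 1)
        buckets[index].append(v)
    result = []
    for bucket in buckets:
        # Insertion sort; cheap when a bucket holds only a few elements
        for i in range(1, len(bucket)):
            key = bucket[i]
            j = i - 1
            while j >= 0 and bucket[j] > key:
                bucket[j + 1] = bucket[j]
                j -= 1
            bucket[j + 1] = key
        result.extend(bucket)
    return result

data = [random.uniform(0, 200) for _ in range(1000)]
print(bucket_sort_scaled(data) == sorted(data))                  # True
print(bucket_sort_scaled(data, num_buckets=10) == sorted(data))  # True
```

Both calls produce the same result; only the work done inside each bucket changes. With `num_buckets=10` each bucket holds about a tenth of the input, whereas with one bucket per element the expected bucket size is constant.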



In [4]:
# Generation of 30,000 uniformly distributed real numbers between 0.0 and 200.0
array_bs = [random.uniform(0, 200) for _ in range(30000)]

# Insertion Sort function (used to sort buckets)
def insertion_sort(arr):
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key

# Main BucketSort function
def bucket_sort(arr):
    n = len(arr)
    buckets = [[] for _ in range(10)]  # Create 10 buckets

    # Normalization and distribution into buckets
    for value in arr:
        normalized = value / 200.0  # Normalize to [0, 1]
        index = int(normalized * 10)
        if index == 10:
            index = 9
        buckets[index].append(value)

    # Sort each bucket with Insertion Sort
    for bucket in buckets:
        insertion_sort(bucket)

    # Concatenate the sorted buckets
    index = 0
    for bucket in buckets:
        for value in bucket:
            arr[index] = value
            index += 1

# Test and time measurement
arr = array_bs.copy()
start = time.time()
bucket_sort(arr)
end = time.time()

print(f"BucketSort execution time: {end - start:.5f} seconds")

BucketSort execution time: 2.28731 seconds


## Trie with Binary Keys

In this exercise, we implemented a Trie (digital tree) to store alphabet letters mapped to 5-bit binary codes.

The Trie was built with the following functionalities:
- Insertion of binary keys;
- Calculation of the tree height;
- Textual printing of the structure;
- Removal of keys and recalculation of the height.

After inserting all the letters from the given sequence, we computed the height of the Trie and compared it with the expected limits:
- Maximum possible height: 5, the key length in bits (5 bits can encode \(2^5 = 32\) distinct keys)
- Ideal balanced height: \(\lceil \log_2 n \rceil = 5\), with \(n = 26\) and \(\log_2 26 \approx 4.7\)

Next, we removed some letters and observed the resulting changes in height.
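
Because every key has exactly 5 bits, the height is determined by the key length rather than by the number of keys. The sketch below illustrates this with a nested-dict trie; `build_trie` and `trie_height` are hypothetical helpers, separate from the `Trie` class in the next cell, and end-of-key markers are omitted on the assumption that all keys have the same length.

```python
def build_trie(keys):
    """Build a nested-dict trie; each level consumes one bit of the key."""
    root = {}
    for key in keys:
        node = root
        for bit in key:
            node = node.setdefault(bit, {})   # reuse or create the child for this bit
    return root

def trie_height(node):
    """Number of edges on the longest root-to-leaf path."""
    if not node:
        return 0
    return 1 + max(trie_height(child) for child in node.values())

codes = ['00001', '00010', '10000', '11010']   # codes of A, B, P, Z
print(trie_height(build_trie(codes)))          # 5
print(trie_height(build_trie(codes[:1])))      # 5
```

A single remaining 5-bit key already gives height 5, so removing letters leaves the height unchanged as long as at least one key remains.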


In [5]:
# Dictionary of binary codes
bin_codes = {
    'A': '00001', 'B': '00010', 'C': '00011', 'D': '00100', 'E': '00101',
    'F': '00110', 'G': '00111', 'H': '01000', 'I': '01001', 'J': '01010',
    'K': '01011', 'L': '01100', 'M': '01101', 'N': '01110', 'O': '01111',
    'P': '10000', 'Q': '10001', 'R': '10010', 'S': '10011', 'T': '10100',
    'U': '10101', 'V': '10110', 'W': '10111', 'X': '11000', 'Y': '11001',
    'Z': '11010'
}

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert_bin(self, bin_key):
        node = self.root
        for bit in bin_key:
            if bit not in node.children:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.is_end = True

    def insert_letter(self, letter):
        bin_key = bin_codes[letter]
        self.insert_bin(bin_key)

    def remove_bin(self, node, bin_key, depth=0):
        if depth == len(bin_key):
            if node.is_end:
                node.is_end = False
            return len(node.children) == 0
        bit = bin_key[depth]
        if bit in node.children:
            should_delete = self.remove_bin(node.children[bit], bin_key, depth + 1)
            if should_delete:
                del node.children[bit]
                return not node.is_end and len(node.children) == 0
        return False

    def remove_letter(self, letter):
        bin_key = bin_codes[letter]
        self.remove_bin(self.root, bin_key)

    def height(self, node=None):
        if node is None:
            node = self.root
        if not node.children:
            return 0
        return 1 + max(self.height(child) for child in node.children.values())

    def print_trie(self, node=None, prefix=""):
        if node is None:
            node = self.root
        if node.is_end:
            print(f"{prefix} *")
        for bit, child in node.children.items():
            self.print_trie(child, prefix + bit)

# Building the Trie and inserting letters
letters_to_insert = ['P','L','O','M','D','V','F','U','X','J','I','A','S','W','Q','T','K','B','N','Z','E','Y','R','C','H','G']

trie = Trie()
for letter in letters_to_insert:
    trie.insert_letter(letter)

print("Trie after insertion:")
trie.print_trie()
h1 = trie.height()
print(f"Trie Height: {h1}")
print(f"Comparison: 2^5 = 32, log2(26) ≈ {round(math.log2(26), 2)}")

# Removing letters: A E I O U K X W Z
for letter in ['A','E','I','O','U','K','X','W','Z']:
    trie.remove_letter(letter)

print("\nTrie after removal:")
trie.print_trie()
h2 = trie.height()
print(f"Trie Height after removal: {h2}")

Trie after insertion:
10000 *
10001 *
10011 *
10010 *
10110 *
10111 *
10101 *
10100 *
11000 *
11001 *
11010 *
01100 *
01101 *
01111 *
01110 *
01010 *
01011 *
01001 *
01000 *
00100 *
00101 *
00110 *
00111 *
00001 *
00010 *
00011 *
Trie Height: 5
Comparison: 2^5 = 32, log2(26) ≈ 4.7

Trie after removal:
10000 *
10001 *
10011 *
10010 *
10110 *
10100 *
11001 *
01100 *
01101 *
01110 *
01010 *
01000 *
00100 *
00110 *
00111 *
00010 *
00011 *
Trie Height after removal: 5
