# <font color="#418FDE" size="6.5" uppercase>**Efficient Sorting**</font>

>Last update: 20260117.
    
By the end of this lecture, you will be able to:
- Implement a divide-and-conquer sorting algorithm such as merge sort or quicksort in Python. 
- Compare the performance of O(n log n) sorts to O(n^2) sorts on large datasets. 
- Describe key properties of Python’s built-in sort, including stability and typical complexity. 


## **1. Merge Sort Essentials**

### **1.1. Recursive Problem Splitting**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master Python Algorithms/Module_05/Lecture_B/image_01_01.jpg?v=1768688832" width="250">



>* Break one big sorting task into smaller pieces
>* Use a recursive function until reaching tiny lists

>* Base case stops splitting tiny sublists further
>* Recursive calls split list halves, forming call tree

>* Break big tasks into smaller, manageable subtasks
>* Recursive splitting structures work into tiny solvable pieces



In [None]:
#@title Python Code - Recursive Problem Splitting

# Demonstrate recursive splitting using a simple merge sort style example.
# Show base case and recursive step clearly for beginner understanding.
# Print splitting steps and final sorted list for visual confirmation.

# pip install statements are unnecessary because this script uses only standard Python.

# Define a recursive function that splits the list into smaller halves.
def split_and_sort(numbers_list):
    # Check base case where list length is zero or one element only.
    if len(numbers_list) <= 1:
        # Return list directly because it is already trivially sorted.
        return numbers_list

    # Compute middle index to split list into left and right halves.
    middle_index = len(numbers_list) // 2
    # Slice original list into left half using computed middle index.
    left_half = numbers_list[:middle_index]
    # Slice original list into right half using computed middle index.
    right_half = numbers_list[middle_index:]

    # Print current splitting step to visualize recursive problem division.
    print("Splitting:", numbers_list, "into", left_half, "and", right_half)
    # Recursively split and sort the left half of the list.
    sorted_left = split_and_sort(left_half)
    # Recursively split and sort the right half of the list.
    sorted_right = split_and_sort(right_half)

    # Merge two sorted halves into one sorted result list.
    merged_result = []
    # Initialize index pointers for left and right halves respectively.
    left_index, right_index = 0, 0

    # Loop while both halves still contain unmerged elements.
    while left_index < len(sorted_left) and right_index < len(sorted_right):
        # Compare current elements and append smaller one into merged_result list.
        if sorted_left[left_index] <= sorted_right[right_index]:
            merged_result.append(sorted_left[left_index])
            left_index += 1
        else:
            merged_result.append(sorted_right[right_index])
            right_index += 1

    # Extend merged_result with any remaining elements from left half list.
    merged_result.extend(sorted_left[left_index:])
    # Extend merged_result with any remaining elements from right half list.
    merged_result.extend(sorted_right[right_index:])

    # Return fully merged and sorted list for this recursive call level.
    return merged_result

# Define an example list representing unsorted document page counts.
example_numbers = [7, 3, 9, 1, 4, 8]

# Call recursive splitting function and capture the final sorted result list.
final_sorted = split_and_sort(example_numbers)

# Print final sorted list after all recursive splitting and merging steps.
print("Final sorted list:", final_sorted)
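
The call tree built by the recursive splitting above stays shallow: each level halves the list, so the depth grows roughly with log2 of the length. The sketch below illustrates this under the same halving rule as `split_and_sort`. The helper name `split_depth` is hypothetical and not part of the lecture code.

```python
import math

def split_depth(length):
    """Return how many levels of splitting a list of this length needs."""
    # Base case: lists of length 0 or 1 are never split.
    if length <= 1:
        return 0
    # Mirror the slicing rule: left gets length // 2, right gets the rest.
    left_length = length // 2
    right_length = length - left_length
    return 1 + max(split_depth(left_length), split_depth(right_length))

# Compare the measured depth with ceil(log2(n)) for a few lengths.
for n in [1, 2, 6, 8, 1000]:
    log_bound = math.ceil(math.log2(n)) if n > 1 else 0
    print("length:", n, "depth:", split_depth(n), "ceil(log2 n):", log_bound)
```

A shallow call tree is why the recursion itself is rarely a problem for merge sort, even on long lists.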



### **1.2. Merging Sorted Sublists**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master Python Algorithms/Module_05/Lecture_B/image_01_02.jpg?v=1768688865" width="250">



>* Merge tiny sorted pieces into one sorted list
>* Repeatedly compare fronts, append smaller, then remaining

>* Only compare current front items from each list
>* Use one pass with pointers for efficiency

>* Merging can keep equal items in order
>* Tie-breaking rules preserve earlier sorting decisions



In [None]:
#@title Python Code - Merging Sorted Sublists

# Demonstrate merging two sorted sublists step by step.
# Show pointer movement while building a merged sorted result.
# Highlight stability when equal elements appear during merging.

# pip install statements are unnecessary because no external libraries used.

# Define a function that merges two sorted lists.
def merge_sorted_lists(left_list, right_list):
    # Initialize result list and index pointers for both lists.
    merged_result = []
    left_index = 0
    right_index = 0

    # Loop while both lists still have remaining elements.
    while left_index < len(left_list) and right_index < len(right_list):
        # Compare current elements and choose smaller or left one when equal.
        left_value = left_list[left_index]
        right_value = right_list[right_index]
        if left_value <= right_value:
            merged_result.append(left_value)
            left_index += 1
        else:
            merged_result.append(right_value)
            right_index += 1

    # Append any remaining elements from the left list if present.
    while left_index < len(left_list):
        merged_result.append(left_list[left_index])
        left_index += 1

    # Append any remaining elements from the right list if present.
    while right_index < len(right_list):
        merged_result.append(right_list[right_index])
        right_index += 1

    # Return the fully merged sorted result list.
    return merged_result

# Example sorted lists with repeated values to show stability clearly.
left_orders = [1, 3, 3, 7]
right_orders = [2, 3, 5, 8]

# Call the merge function and store the merged result list.
merged_orders = merge_sorted_lists(left_orders, right_orders)

# Print original lists and merged result to observe behavior.
print("Left sorted list:", left_orders)
print("Right sorted list:", right_orders)
print("Merged sorted list:", merged_orders)
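
With plain integers, equal values are indistinguishable, so stability is hard to see in the output above. The sketch below merges labeled records and compares only a key, which makes the tie-breaking visible. The function name `merge_by_key` and the labels are illustrative assumptions, not part of the lecture code.

```python
def merge_by_key(left, right, key):
    """Merge two lists already sorted by key, preferring left on ties."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Using <= means an equal-key item from the left list goes first.
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these slices is non-empty.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

# Each record is (sort value, label); labels show where an item came from.
left_records = [(1, "L"), (3, "L1"), (3, "L2")]
right_records = [(2, "R"), (3, "R1")]

merged_records = merge_by_key(left_records, right_records, key=lambda r: r[0])
print(merged_records)
```

Changing `<=` to `<` would still produce a sorted result, but equal-key items from the right list would then jump ahead of those from the left, and the merge would no longer be stable.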



### **1.3. Memory Usage Tradeoffs**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master Python Algorithms/Module_05/Lecture_B/image_01_03.jpg?v=1768688893" width="250">



>* Merge sort needs extra memory while merging lists
>* This extra space matters for very large datasets

>* Merge sort keeps original and large temporary list
>* Extra memory doubles footprint, impacting cost and scalability

>* Low-memory merge sorts save space but complicate work
>* Choose implementation based on data size and resources



In [None]:
#@title Python Code - Memory Usage Tradeoffs

# Demonstrate merge sort memory usage with simple timing and size comparison.
# Compare extra list allocation versus in place built in sorting behavior.
# Help visualize memory tradeoffs when sorting large lists of numbers.

# pip install commands are unnecessary because script uses only standard library.

# Import required modules for timing and memory estimation.
import random
import time
import sys

# Define a simple merge function using extra temporary list space.
def merge(left, right):
    merged = []
    i = 0
    j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

# Define recursive merge sort that always allocates new lists.
def merge_sort(data):
    if len(data) <= 1:
        return data
    mid = len(data) // 2
    left = merge_sort(data[:mid])
    right = merge_sort(data[mid:])
    return merge(left, right)

# Helper function estimating list memory usage using sys.getsizeof.
def estimate_list_bytes(lst):
    base = sys.getsizeof(lst)
    items = len(lst) * sys.getsizeof(0)
    return base + items

# Create a reasonably large list representing transaction dollar amounts.
size = 200000
random.seed(0)
transactions = [random.randint(1, 1000000) for _ in range(size)]

# Estimate memory for original list before any sorting operations.
original_bytes = estimate_list_bytes(transactions)

# Time merge sort and estimate extra memory for temporary merged list.
# time.perf_counter is a monotonic, high-resolution clock suited to timing code.
start_merge = time.perf_counter()
sorted_merge = merge_sort(transactions)
end_merge = time.perf_counter()
merge_bytes = estimate_list_bytes(sorted_merge)

# Time built in sort which sorts list in place without full copy.
copy_for_builtin = list(transactions)
start_builtin = time.perf_counter()
copy_for_builtin.sort()
end_builtin = time.perf_counter()

# Print summary showing approximate memory and time tradeoffs.
print("Original list approximate bytes:", original_bytes)
print("Merge sort extra list bytes:", merge_bytes)
print("Total bytes during merge sort:", original_bytes + merge_bytes)
print("Built in sort approximate bytes:", original_bytes)
print("Merge sort seconds elapsed:", round(end_merge - start_merge, 3))
print("Built in sort seconds elapsed:", round(end_builtin - start_builtin, 3))
print("Note: merge sort roughly doubles list memory while running.")
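
The byte counts above are estimates based on list sizes. As a complementary sketch, the standard library's `tracemalloc` module can report the peak memory allocated while a sort runs. The helper name `peak_extra_bytes` and the list size are assumptions chosen for illustration, and the exact numbers will vary by Python version and platform.

```python
import random
import tracemalloc

def merge_sort(data):
    """Top-down merge sort that allocates new lists at every level."""
    if len(data) <= 1:
        return data
    mid = len(data) // 2
    left = merge_sort(data[:mid])
    right = merge_sort(data[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def peak_extra_bytes(sort_callable, data):
    """Return the peak bytes allocated while sort_callable(data) runs."""
    tracemalloc.start()
    sort_callable(data)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

rng = random.Random(0)
values = [rng.randint(1, 1_000_000) for _ in range(20_000)]

# Copies are made before tracing starts, so only sort-time allocations count.
merge_peak = peak_extra_bytes(merge_sort, list(values))
builtin_peak = peak_extra_bytes(lambda lst: lst.sort(), list(values))

print("Merge sort peak extra bytes:", merge_peak)
print("list.sort peak extra bytes:", builtin_peak)
```

On typical CPython builds the merge sort peak is noticeably larger, because every level of recursion creates fresh sublists, while the built-in sort needs only a temporary merge buffer.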



## **2. Quicksort Performance Essentials**

### **2.1. Pivot Partition Mechanics**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master Python Algorithms/Module_05/Lecture_B/image_02_01.jpg?v=1768688923" width="250">



>* Pivot step splits data around a chosen element
>* This creates smaller subproblems, enabling scalable recursion

>* Partition scans array once, doing linear work
>* Balanced splits create n log n total effort

>* Balanced pivots greatly reduce total comparisons needed
>* Quicksort scales well; quadratic sorts quickly become impractical



In [None]:
#@title Python Code - Pivot Partition Mechanics

# Show how quicksort pivot partition rearranges elements around chosen pivot.
# Visualize elements moving left or right relative to pivot during partitioning.
# Connect single linear scan partition cost with overall quicksort performance behavior.

# pip install commands not required because script uses only built in libraries.

# Define a simple partition function using Lomuto style partition scheme.
def partition(arr, low, high):
    # Choose pivot as last element within current segment for clarity.
    pivot = arr[high]
    # Index i tracks boundary between smaller and larger elements.
    i = low - 1
    # Scan each element once and compare with pivot value.
    for j in range(low, high):
        # If current element smaller than pivot then expand smaller side.
        if arr[j] < pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    
    # Finally place pivot between smaller and larger element groups.
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    # Return final pivot index now correctly positioned.
    return i + 1

# Helper function to run partition and display intermediate information.
def demo_partition(values):
    # Copy list so original data remains unchanged for comparison.
    arr = values.copy()
    # Show original unsorted list before partitioning step.
    print("Original list:", arr)
    # Perform partition across entire list segment indices.
    pivot_index = partition(arr, 0, len(arr) - 1)
    
    # Show list after partition with pivot in final sorted position.
    print("After partition:", arr)
    # Show pivot value and its final index location within list.
    print("Pivot value:", arr[pivot_index], "at index", pivot_index)
    # Show left side elements which are strictly smaller than pivot.
    print("Left side < pivot:", arr[:pivot_index])
    # Show right side elements which are greater or equal pivot.
    print("Right side >= pivot:", arr[pivot_index + 1 :])

# Example dataset representing unsorted daily sales amounts in dollars.
data = [42, 7, 105, 23, 88, 16, 59]
# Run demonstration to observe single linear time partition behavior.
demo_partition(data)
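
A single partition only places one pivot. To connect it to a full sort, the sketch below applies the same Lomuto-style partition recursively to the segments on either side of the pivot. The function name `quicksort_in_place` is an illustrative assumption, and this is one common way to structure the recursion rather than the only one.

```python
def partition(arr, low, high):
    """Lomuto partition: place arr[high] at its sorted position."""
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] < pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1

def quicksort_in_place(arr, low=0, high=None):
    """Sort arr[low..high] in place by partitioning and recursing."""
    if high is None:
        high = len(arr) - 1
    # Segments with fewer than two elements are already sorted.
    if low < high:
        pivot_index = partition(arr, low, high)
        quicksort_in_place(arr, low, pivot_index - 1)
        quicksort_in_place(arr, pivot_index + 1, high)

sales = [42, 7, 105, 23, 88, 16, 59]
quicksort_in_place(sales)
print("Sorted sales:", sales)
```

Because the partition rearranges elements inside the original list, this version needs no temporary lists, only the recursion stack.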



### **2.2. Average vs worst case**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master Python Algorithms/Module_05/Lecture_B/image_02_02.jpg?v=1768688948" width="250">



>* Average case is efficient, about n log n
>* Bad pivots cause unbalanced splits and n² time

>* Balanced pivots keep work near n log n
>* Highly ordered data can trigger slow quadratic behavior

>* Average-case quicksort is usually fast and scalable
>* Worst case needs defenses like randomized or hybrid strategies



In [None]:
#@title Python Code - Average vs worst case

# Demonstrate quicksort average versus worst case performance clearly.
# Compare random input versus sorted input using simple timing measurements.
# Show how bad pivots can dramatically increase quicksort running time.

# pip install commands are unnecessary because this script uses only standard libraries.

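# Import required modules for timing, randomization, and recursion limit control.
import sys
import time
import random

# Raise the recursion limit because sorted input drives recursion depth to about size_records.
sys.setrecursionlimit(10000)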

# Define a simple quicksort using first element pivot choice.
def quicksort_first_pivot(data_list):
    if len(data_list) <= 1:
        return data_list
    pivot_value = data_list[0]
    left_part = [x for x in data_list[1:] if x <= pivot_value]
    right_part = [x for x in data_list[1:] if x > pivot_value]
    return quicksort_first_pivot(left_part) + [pivot_value] + quicksort_first_pivot(right_part)

# Define a helper function measuring elapsed time for sorting.
def measure_sort_time(input_list, label_text):
    start_time = time.perf_counter()
    quicksort_first_pivot(input_list)
    end_time = time.perf_counter()
    elapsed_seconds = end_time - start_time
    print(f"{label_text}: {elapsed_seconds:.6f} seconds")

# Create dataset size representing number of customer records.
size_records = 2000

# Create random order list representing typical mixed transaction order.
random_data = [random.randint(0, size_records) for _ in range(size_records)]

# Create already sorted list representing structured adversarial transaction order.
sorted_data = list(range(size_records))

# Time quicksort on random data approximating average case behavior.
measure_sort_time(random_data, "Random data approximate average case time")

# Time quicksort on sorted data approximating worst case behavior.
measure_sort_time(sorted_data, "Sorted data approximate worst case time")

# Print simple conclusion comparing both timings for quick understanding.
print("Notice worst case time is usually much slower than average case time.")
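
Timings vary from machine to machine, so it can help to count work directly. The sketch below counts one element-versus-pivot comparison per element at each partition step, using the same first-element pivot rule. The helper name `count_pivot_comparisons` and the input size are assumptions for illustration. The lecture's list-comprehension version scans each sublist twice, so its raw comparison count would be about double.

```python
import random

def count_pivot_comparisons(data_list):
    """Count element-versus-pivot comparisons for first-element-pivot quicksort."""
    if len(data_list) <= 1:
        return 0
    pivot_value = data_list[0]
    rest = data_list[1:]
    left_part = [x for x in rest if x <= pivot_value]
    right_part = [x for x in rest if x > pivot_value]
    # Each remaining element is compared with the pivot once per partition step.
    return (len(rest)
            + count_pivot_comparisons(left_part)
            + count_pivot_comparisons(right_part))

n = 200  # Kept small so the sorted case stays within the default recursion limit.
sorted_input = list(range(n))
shuffled_input = random.Random(0).sample(range(n), n)

sorted_count = count_pivot_comparisons(sorted_input)
shuffled_count = count_pivot_comparisons(shuffled_input)

print("Sorted input comparisons:", sorted_count)
print("Quadratic formula n * (n - 1) // 2:", n * (n - 1) // 2)
print("Shuffled input comparisons:", shuffled_count)
```

On sorted input every partition peels off only the pivot, so the count matches the quadratic formula exactly, while the shuffled input needs far fewer comparisons.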



### **2.3. Randomized Pivot Strategy**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master Python Algorithms/Module_05/Lecture_B/image_02_03.jpg?v=1768688976" width="250">



>* Fixed pivots can cause unbalanced, slow quicksort
>* Random pivots make worst-case splits very unlikely on large datasets

>* Random pivots usually create balanced quicksort partitions
>* Keeps expected work near n log n, even on already sorted input

>* Random pivots break harmful patterns in data
>* Keeps quicksort fast and reliable in practice



In [None]:
#@title Python Code - Randomized Pivot Strategy

# Demonstrate randomized pivot quicksort behavior on tricky ordered input arrays.
# Compare fixed pivot quicksort against randomized pivot quicksort performance.
# Show how randomness protects against worst case partition patterns in practice.

# pip install commands are unnecessary because this script uses only standard libraries.

# Import required random and time modules for experiment measurements.
import random
import time
import sys

# Increase recursion limit to avoid RecursionError for worst-case quicksort.
sys.setrecursionlimit(10000)

# Define quicksort using always first element pivot selection strategy.
def quicksort_fixed(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = [x for x in arr[1:] if x <= pivot]
    right = [x for x in arr[1:] if x > pivot]
    return quicksort_fixed(left) + [pivot] + quicksort_fixed(right)

# Define quicksort using randomized pivot selection strategy for robustness.
def quicksort_randomized(arr):
    if len(arr) <= 1:
        return arr
    pivot_index = random.randrange(len(arr))
    pivot = arr[pivot_index]
    rest = arr[:pivot_index] + arr[pivot_index + 1 :]
    left = [x for x in rest if x <= pivot]
    right = [x for x in rest if x > pivot]
    return quicksort_randomized(left) + [pivot] + quicksort_randomized(right)

# Helper function measuring execution time for provided sorting function.
def measure_time(sort_func, data):
    start = time.perf_counter()
    sort_func(data)
    end = time.perf_counter()
    return end - start

# Create already sorted list which hurts fixed pivot quicksort performance.
size = 6000
sorted_data = list(range(size))

# Measure fixed pivot quicksort time on sorted input list.
fixed_time = measure_time(quicksort_fixed, sorted_data)

# Measure randomized pivot quicksort time on same sorted input list.
random_time = measure_time(quicksort_randomized, sorted_data)

# Print timing comparison showing benefit of randomized pivot strategy.
print("Sorted input length:", size, "elements")
print("Fixed pivot quicksort time seconds:", round(fixed_time, 4))
print("Randomized pivot quicksort time seconds:", round(random_time, 4))
print("Randomized pivot usually avoids worst case partition behavior here.")
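
The benefit of random pivots can also be seen in the shape of the call tree, not only in timings. The sketch below measures recursion depth for a fixed first-element pivot and for a random pivot on the same sorted input. The helper name `recursion_depth`, the seed, and the input size are assumptions made for a repeatable illustration.

```python
import random

def recursion_depth(arr, choose_pivot_index):
    """Return the depth of the quicksort call tree for a given pivot rule."""
    if len(arr) <= 1:
        return 0
    pivot_index = choose_pivot_index(arr)
    pivot = arr[pivot_index]
    rest = arr[:pivot_index] + arr[pivot_index + 1:]
    left = [x for x in rest if x <= pivot]
    right = [x for x in rest if x > pivot]
    return 1 + max(recursion_depth(left, choose_pivot_index),
                   recursion_depth(right, choose_pivot_index))

rng = random.Random(42)  # Seeded only so the illustration is repeatable.
sorted_input = list(range(500))

# Fixed rule: always pick index 0. Random rule: pick any index uniformly.
fixed_depth = recursion_depth(sorted_input, lambda arr: 0)
random_depth = recursion_depth(sorted_input, lambda arr: rng.randrange(len(arr)))

print("Fixed pivot depth:", fixed_depth)
print("Random pivot depth:", random_depth)
```

With the fixed pivot the depth grows linearly with the input length, which is also why the cell above raises the recursion limit. With random pivots the depth typically stays within a small multiple of log2 of the length.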



## **3. Python Timsort Essentials**

### **3.1. Adaptive Runs in Timsort**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master Python Algorithms/Module_05/Lecture_B/image_03_01.jpg?v=1768689024" width="250">



>* Real data often contains partially ordered stretches
>* Timsort finds these runs and sorts more efficiently

>* Timsort scans data for ascending or strictly descending runs, extending short ones to a minimum length
>* Already sorted or block-ordered data needs minimal work

>* Timsort speeds up nearly sorted, lightly shuffled data
>* Runs make real performance near linear, worst n log n



In [None]:
#@title Python Code - Adaptive Runs in Timsort

# Demonstrate how Python Timsort benefits from adaptive runs in practice.
# Compare sorting times for random data versus nearly sorted adaptive runs.
# Show that nearly sorted data can be sorted much faster than random data.

# pip install commands are not required because we only use built in modules.

# Import required modules for timing and random list generation.
import random
import time

# Define a helper function that measures sort time for a given list.
def measure_sort_time(data_list, label_text):
    start_time = time.perf_counter()
    sorted_list = sorted(data_list)
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    print(f"{label_text}: {elapsed_time:.6f} seconds")
    return sorted_list

# Create a list that is completely random without any helpful adaptive runs.
random_list = [random.randint(0, 1000000) for value_index in range(50000)]

# Create a list that is mostly sorted with a small shuffled tail segment.
mostly_sorted_list = list(range(50000))

# Shuffle only a small tail part to keep one large adaptive run.
small_tail = mostly_sorted_list[-5000:]
random.shuffle(small_tail)
mostly_sorted_list[-5000:] = small_tail

# Time sorting for the completely random list using built in sorted.
sorted_random = measure_sort_time(random_list, "Completely random list time")

# Time sorting for the mostly sorted list that contains a long adaptive run.
sorted_mostly = measure_sort_time(mostly_sorted_list, "Mostly sorted list time")

# Print a short summary comparing the two measured sorting times.
print("Notice that the mostly sorted list often sorts significantly faster overall.")
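
To see why the mostly sorted list is cheaper, it helps to count how many runs each input contains. The sketch below is a simplified run counter, not CPython's actual implementation: it ignores the minimum run length and the merging strategy, and the name `count_natural_runs` is hypothetical.

```python
import random

def count_natural_runs(data):
    """Count maximal non-decreasing or strictly decreasing stretches, left to right."""
    runs = 0
    i = 0
    n = len(data)
    while i < n:
        end = i + 1  # Index of the candidate second element in this run.
        if end < n:
            if data[end] < data[i]:
                # Strictly descending run.
                while end + 1 < n and data[end + 1] < data[end]:
                    end += 1
            else:
                # Non-decreasing run.
                while end + 1 < n and data[end + 1] >= data[end]:
                    end += 1
            end += 1  # Move past the last element of the run.
        runs += 1
        i = end
    return runs

rng = random.Random(0)
random_values = [rng.randint(0, 1_000_000) for _ in range(1000)]

# Sorted list with only the last 100 items shuffled.
mostly_sorted = list(range(1000))
tail = mostly_sorted[-100:]
rng.shuffle(tail)
mostly_sorted[-100:] = tail

runs_random = count_natural_runs(random_values)
runs_mostly = count_natural_runs(mostly_sorted)

print("Runs in random list:", runs_random)
print("Runs in mostly sorted list:", runs_mostly)
```

Fewer, longer runs mean fewer merges, which is the intuition behind the faster timing for the mostly sorted list.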



### **3.2. Stability and key functions**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master Python Algorithms/Module_05/Lecture_B/image_03_02.jpg?v=1768689052" width="250">



>* Stable sort keeps equal items’ original order
>* Enables safe multi-step, hierarchical sorting workflows

>* Key functions choose values used for sorting
>* Stable sort keeps equal-key items in order

>* Stable sort enables simple multi-step, multi-key ordering
>* Tuple key functions keep equal-key order predictable



In [None]:
#@title Python Code - Stability and key functions

# Demonstrate stable sorting with key functions using simple employee records.
# Show how equal keys preserve original order across multiple sorting passes.
# Compare multi-step stable sorting with a single tuple based key function.

# pip install commands are unnecessary because this script uses only standard libraries.

# Create a list of employees with join order and department fields.
employees = [
    {"name": "Alice", "joined": 1, "department": "Sales"},
    {"name": "Bob", "joined": 2, "department": "Engineering"},
    {"name": "Carol", "joined": 3, "department": "Sales"},
    {"name": "Dave", "joined": 4, "department": "Engineering"},
    {"name": "Eve", "joined": 5, "department": "Sales"}
]

# Print original employees list showing join order and department fields.
print("Original employees:")
for e in employees:
    print(e["joined"], e["department"], e["name"])

# Sort by department using key function while preserving join order within departments.
by_department = sorted(employees, key=lambda e: e["department"])

# Print employees sorted by department showing stable ordering by join time.
print("\nSorted by department (stable):")
for e in by_department:
    print(e["department"], e["joined"], e["name"])

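# Multi-step sort: apply the secondary key first (newest joiners first), then the primary key.
by_joined_desc = sorted(employees, key=lambda e: e["joined"], reverse=True)
by_department_then_newest = sorted(by_joined_desc, key=lambda e: e["department"])

# Print result: within each department, the newest-first order from the first pass survives.
print("\nSorted by department, newest joiners first (two stable passes):")
for e in by_department_then_newest:
    print(e["department"], e["joined"], e["name"])

# Build the same ordering in one pass with a tuple key (joined negated for descending).
by_tuple_key = sorted(employees, key=lambda e: (e["department"], -e["joined"]))
print("Matches single tuple key sort:", by_department_then_newest == by_tuple_key)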



### **3.3. Mastering sort and sorted**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master Python Algorithms/Module_05/Lecture_B/image_03_03.jpg?v=1768689086" width="250">



>* sort changes the original list in place
>* sorted returns a new list from any iterable

>* sort and sorted share key, reverse options
>* Stable sorting enables flexible, multi-step custom ordering

>* Use in-place sort when original order unnecessary
>* Use sorted for temporary views and cleaner design



In [None]:
#@title Python Code - Mastering sort and sorted

# Demonstrate list sort versus sorted usage clearly and concisely.
# Show in place modification and new list creation differences.
# Highlight stability using multi level sorting with key functions.

# pip install commands are unnecessary because we use only built in features.

# Create original list of city temperature records in Fahrenheit.
records = [("Denver", 75), ("Boston", 80), ("Austin", 95), ("Boston", 70)]

# Show original records before any sorting operations are applied.
print("Original records:", records)

# Use sorted to create new list without changing original records list.
sorted_by_temp = sorted(records, key=lambda item: item[1])

# Display new sorted list and unchanged original list for comparison clarity.
print("Sorted by temperature (new list):", sorted_by_temp)
print("Original after sorted call:", records)

# Use in place sort when original ordering is no longer needed later.
records.sort(key=lambda item: item[1])

# Display records after in place sort by temperature.
print("In place sort by temperature:", records)

# Demonstrate stability by sorting again by city name after temperature.
records.sort(key=lambda item: item[0])

# Display final list showing Boston entries keep their temperature order (70 before 80).
print("Stable sort by city name after temperature:", records)
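
Two details from the points above are easy to miss in practice: `sorted` accepts any iterable, and `list.sort` returns `None`. The short sketch below illustrates both along with the shared `key` and `reverse` options. The sample data and variable names are illustrative assumptions.

```python
temps = {"Denver": 75, "Boston": 80, "Austin": 95}

# sorted accepts any iterable: here a dict's items view and a generator.
by_temp_desc = sorted(temps.items(), key=lambda item: item[1], reverse=True)
squares_sorted = sorted(x * x for x in [3, -1, 2])

# list.sort works in place and returns None, so do not assign its result.
cities = list(temps)
sort_return_value = cities.sort(key=str.lower)

print("By temperature, descending:", by_temp_desc)
print("Sorted squares:", squares_sorted)
print("Cities after in-place sort:", cities)
print("Return value of list.sort:", sort_return_value)
```

A common mistake is writing `cities = cities.sort()`, which replaces the list with `None`. Using `sorted` avoids this when a new list is wanted.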



# <font color="#418FDE" size="6.5" uppercase>**Efficient Sorting**</font>


In this lecture, you learned to:
- Implement a divide-and-conquer sorting algorithm such as merge sort or quicksort in Python. 
- Compare the performance of O(n log n) sorts to O(n^2) sorts on large datasets. 
- Describe key properties of Python’s built-in sort, including stability and typical complexity. 

In the next module (Module 6), we will cover 'Recursion And Backtracking'.