A comprehensive Python Code for benchmarking and comparing the performance of various sorting algorithms. This project allows users to evaluate the efficiency of different sorting methods across various data distributions and sizes, visualize the results, and understand the theoretical and practical differences between algorithms.
- Installation
- Usage
- Features
- Methodology
- Examples
- References
- Dependencies
- Algorithms/Mathematical Concepts Used
- License
- Acknowledgments
- Note
To set up the Sorting Algorithm Benchmark project, follow these steps:
- Install the required dependencies:
pip install numpy matplotlib timeit argparse warnings os abc heapq
The benchmarking and visualization tool can run in interactive mode in Jupyter Notebook.
- Simply run the cell.
- Follow the prompts to configure the benchmark parameters.
- Multiple Sorting Algorithms: Implements seven classic sorting algorithms with proper complexity annotations
- Performance Metrics: Measures execution time, comparison counts, and swap operations
- Various Data Distributions: Tests algorithms on random, ascending, descending, and partially sorted data
- Visualization Tools: Generates bar charts for comparing algorithm performance
- Detailed Analysis: Provides comprehensive tables of results with complexity information
- Flexible Configuration: Supports command-line arguments and interactive mode
- Error Handling: Includes validation, warnings for large datasets, and proper error reporting
The benchmarking process follows these key steps:
-
Algorithm Implementation:
- Each sorting algorithm is implemented as a class derived from the abstract
SortingAlgorithmbase class - Algorithms track comparison and swap operations during execution
- Each sorting algorithm is implemented as a class derived from the abstract
-
Dataset Generation:
- Creates datasets of specified size with different distributions:
- Random: Completely unsorted data
- Ascending: Pre-sorted in ascending order
- Descending: Pre-sorted in descending order
- Partially Sorted: Partially pre-sorted data (70% sorted)
- Creates datasets of specified size with different distributions:
-
Benchmarking Process:
- For each algorithm and dataset combination:
- Measures execution time using Python's
timeitmodule - Counts comparison and swap operations
- Skips excessively slow combinations (e.g., quadratic algorithms on large random datasets)
- Measures execution time using Python's
- For each algorithm and dataset combination:
-
Results Analysis:
- Generates tables showing execution times, comparison counts, and swap operations
- Creates bar charts comparing algorithm performance across different datasets
- Provides logarithmic scale option for better visualization of performance differences
-
Visualization:
- Produces time comparison charts
- Generates operation count visualizations
- Adds complexity annotations for reference
When run with a dataset size of 1000 elements, the benchmark and visualization will produce results similar to:
- Time Comparison Chart: Shows execution times across different algorithms and datasets

- Comparison Operations Chart: Displays the number of comparisons performed by each algorithm

- Swap Operations Chart: Illustrates the number of element swaps/moves performed

- Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to Algorithms (3rd ed.). MIT Press.
- Knuth, D. E. (1998). The Art of Computer Programming, Volume 3: Sorting and Searching (2nd ed.). Addison-Wesley.
- Sedgewick, R., & Wayne, K. (2011). Algorithms (4th ed.). Addison-Wesley.
- McIlroy, P. D., Bostic, K., & McIlroy, M. D. (1993). Engineering radix sort. Computing Systems, 6(1), 5-27.
- Python 3.6+: Core language
- NumPy: For efficient array operations and random number generation
- Matplotlib: For visualization and chart generation
- Standard Library Modules: argparse, timeit, warnings, os, abc, heapq
This project implements and analyzes the following sorting algorithms:
-
Bubble Sort
- Time Complexity: O(n²)
- Space Complexity: O(1)
- Mathematical Basis: Repeated pairwise comparison and swapping
- Optimization: Early termination when no swaps occur in a pass
-
Selection Sort
- Time Complexity: O(n²)
- Space Complexity: O(1)
- Mathematical Basis: Finding the minimum element in unsorted portion and moving it to the sorted portion
-
Insertion Sort
- Time Complexity: O(n²)
- Space Complexity: O(1)
- Mathematical Basis: Building a sorted array one item at a time
- Best case: O(n) for nearly sorted data
-
Merge Sort
- Time Complexity: O(n log n)
- Space Complexity: O(n)
- Mathematical Basis: Divide and conquer, recursive merging of sorted subarrays
- Stable sort with guaranteed worst-case performance
-
Quick Sort
- Time Complexity: O(n log n) average, O(n²) worst-case
- Space Complexity: O(log n)
- Mathematical Basis: Divide and conquer using partitioning
- Optimization: Median-of-three pivot selection to avoid worst-case performance
-
Heap Sort
- Time Complexity: O(n log n)
- Space Complexity: O(1)
- Mathematical Basis: Binary heap data structure, selection sort with efficient access to largest element
- In-place sorting algorithm with guaranteed O(n log n) performance
-
Timsort
- Time Complexity: O(n log n)
- Space Complexity: O(n)
- Mathematical Basis: Hybrid sorting algorithm derived from merge sort and insertion sort
- Adaptive algorithm that exploits existing order in the data
The benchmark also explores computational complexity theory, empirical performance analysis, and algorithm efficiency visualization.
This project is licensed under the MIT License - see the LICENSE file for details.
- Inspired by classic algorithm textbooks and courses
- Developers of NumPy and Matplotlib
- Algorithm implementations based on pseudocode from various academic sources
| AI was used to generate most of the docstrings and inline comments in the code. |
|---|
