This project is a Python tool to analyze and plot the performance of various sorting algorithms. It was built for an algorithms course to demonstrate the practical differences between algorithms based on time complexity and implementation.
The analyzer runs multiple sorting algorithms on arrays of different sizes and types (random, sorted, reversed). It records the execution time, number of comparisons, and number of swaps, then saves the raw data to a .csv file and generates a full set of performance plots.
- Multiple Algorithms: Compares Insertion Sort, Bubble Sort, Merge Sort, Quicksort, and Python's built-in Timsort.
- Detailed Metrics: Measures Time (ms), Comparisons, and Swaps/Movements for a comprehensive analysis.
- Case Analysis: Tests algorithms against Random (average case), Sorted (best case), and Reversed (worst case) arrays.
- Modular Package: Code is cleanly structured in a
sorting_analyzerpackage. - Data Export: Saves all raw experiment data to a
.csvfile. - Automatic Plotting: Uses
seabornandmatplotlibto automatically generate and save publication-quality plots to aplots/directory.
python-sorting-analyzer/
├── .gitignore
├── LICENSE
├── README.md # This documentation
├── requirements.txt # Project dependencies
├── main.py # Main runnable script (CLI)
└── sorting_analyzer/
├── __init__.py # Makes 'sorting_analyzer' a package
├── algorithms.py # All sorting function implementations
├── analyzer.py # The main experiment-running logic
└── plotter.py # Plot generation logic
The tool is orchestrated by main.py:
- Setup: The user runs
main.py, optionally specifying the max array size (--max-size) and number of steps (--steps). - Analysis (
analyzer.py):- The
run_analysisfunction iterates through every combination of algorithm, array size, and array type. - For each run, it generates the specified array (e.g., a 1000-element reversed list).
- It starts a high-precision timer (
time.perf_counter). - It calls the sorting function (e.g.,
algorithms.insertion_sort()). - It stops the timer and records the elapsed milliseconds.
- The sorting function itself returns the number of
comparisonsandswapsit performed, which are also recorded. - All results are collected into a
pandasDataFrame.
- The
- Data Export: The DataFrame is saved to a
.csvfile (e.g.,sorting_results.csv). - Plotting (
plotter.py):- The
generate_plotsfunction takes the DataFrame. - It iterates through each metric ("Time", "Comparisons", "Swaps") and array type ("Random", "Sorted").
- It uses
seaborn.lineplotto create a chart comparing all algorithms for that specific scenario (e.g., "Time vs. Array Size for Random Arrays"). - Each plot is saved as a
.pngfile to theplots/directory.
- The
-
Clone the repository:
git clone https://github.com/msmrexe/python-sorting-analyzer.git cd python-sorting-analyzer -
Create a virtual environment (Recommended):
python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate
-
Install dependencies:
pip install -r requirements.txt
-
Run the analysis:
# Run with default settings (max size 2000, 10 steps) python main.py # Run a larger analysis python main.py --max-size 10000 --steps 20 # Specify output files python main.py --csv my_data.csv --plots-dir my_plots
After running, check the
plots/directory for your.pnggraphs andsorting_results.csvfor the raw data.
The plots generated by this tool visually demonstrate core computer science concepts:
-
$O(n^2)$ vs.$O(n \log n)$ :- On Random arrays, the "Time" and "Comparisons" plots show the lines for Bubble Sort and Insertion Sort curving up quadratically (like
$x^2$ ), while Merge Sort, Quicksort, and Timsort look almost linear (log-linear,$n \log n$ ). This is the most important comparison.
- On Random arrays, the "Time" and "Comparisons" plots show the lines for Bubble Sort and Insertion Sort curving up quadratically (like
-
Best-Case Scenarios (Sorted Arrays):
-
Insertion Sort: The plot for Sorted arrays shows Insertion Sort is extremely fast. It becomes
$O(n)$ (linear time) because it only makes one comparison per element and no swaps. -
Bubble Sort: Our optimized version also becomes
$O(n)$ on a sorted array, as it will make one pass and then exit. -
Quicksort (Worst-Case): The Sorted array plot shows Quicksort's "Time" and "Comparisons" spike dramatically, performing as badly as (or worse than) Bubble Sort. This is the classic
$O(n^2)$ worst-case for Quicksort when the pivot is always the smallest/largest element.
-
Insertion Sort: The plot for Sorted arrays shows Insertion Sort is extremely fast. It becomes
-
Worst-Case Scenarios (Reversed Arrays):
-
Insertion Sort: The plot shows this is the worst-case. It requires the maximum number of comparisons and swaps, clearly showing its
$O(n^2)$ nature. -
Quicksort: This is also the
$O(n^2)$ worst-case, just like the sorted array.
-
Insertion Sort: The plot shows this is the worst-case. It requires the maximum number of comparisons and swaps, clearly showing its
-
Algorithm of Choice:
- Timsort (Python's built-in) is the fastest in all "Time" plots. It's a hybrid, highly-optimized algorithm (mergesort + insertion sort) written in C.
-
Merge Sort is be the most consistent. Its "Time" and "Comparisons" plots look nearly identical across Random, Sorted, and Reversed arrays, proving its
$O(n \log n)$ guarantee in all cases.
Feel free to connect or reach out if you have any questions!
- Maryam Rezaee
- GitHub: @msmrexe
- Email: ms.maryamrezaee@gmail.com
This project is licensed under the MIT License. See the LICENSE file for full details.