This project provides a complete benchmarking and visualization framework to evaluate sorting algorithms applied to large-scale log file processing. It extends traditional comparisons by capturing detailed metrics, enabling statistical validation, and supporting multiple log patterns:
- Quick Sort – optimized with a median-of-three pivot (sketched below)
- Merge Sort – stable and ready for external sorting
- Heap Sort – in-place and memory-efficient
- Radix Sort – optimized for timestamp-based structured logs
The framework supports synthetic log generation, tracking of comparisons & swaps, and detailed performance visualization.
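To make the Quick Sort variant concrete, here is a minimal sketch of median-of-three pivot selection; the actual implementation lives in `src/algorithms/quick_sort.py` and may differ in details:

```python
def median_of_three(data, lo, hi):
    """Pick the median of the first, middle, and last elements as the pivot index.

    This avoids the worst-case O(n^2) behavior of naive Quick Sort on
    already-sorted or reverse-sorted inputs (two of the log patterns above).
    """
    mid = (lo + hi) // 2
    # Order the three candidates so data[mid] holds their median
    if data[mid] < data[lo]:
        data[lo], data[mid] = data[mid], data[lo]
    if data[hi] < data[lo]:
        data[lo], data[hi] = data[hi], data[lo]
    if data[hi] < data[mid]:
        data[mid], data[hi] = data[hi], data[mid]
    return mid
```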
```
project_root/
│
├── src/
│   ├── algorithms/            # Sorting algorithm implementations
│   ├── log_generator/         # Generates log files in 4 patterns
│   ├── benchmarking/          # Metrics collector & dashboard visualizations
│   ├── optimizations/         # Advanced structures: multithreading, memory pool
│   └── stream_processing/     # Log stream handler
│
├── reports/                   # Results and generated plots
│   ├── benchmark_results.csv
│   └── plots/
│       ├── time_plot.png
│       ├── memory_data_plot.png
│       ├── memory_program_plot.png
│       ├── comparisons_plot.png
│       └── swaps_plot.png
│
├── main.py                    # Entry point: log generation, benchmarking, and plotting
├── plot_benchmark_results.py  # Optional CLI script to generate graphs from CSV
├── Log_Sorting_Demo_AutoPlot.ipynb  # Jupyter notebook to demo and visualize all metrics
└── requirements.txt           # Python dependencies
```
- 🔁 Log file generator supports 4 patterns: `random`, `sorted`, `reverse`, `partial` (see the sketch after this list)
- 🧪 Benchmarks with: execution time, memory (data + program), comparisons, swaps
- 📊 Repeated trials (configurable) with CSV export
- 📈 Auto-generated performance plots with `matplotlib` and `seaborn`
- 🧠 Visual and statistical comparison across algorithms and patterns
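To make the four patterns concrete, here is a hypothetical sketch of how a generator could arrange log lines for each one; the real logic lives in `src/log_generator/log_generator.py` and may differ (in particular, `partial_fraction` is an assumed knob):

```python
import random

def make_pattern(lines, pattern='random', partial_fraction=0.9):
    """Arrange log lines according to one of the four benchmark patterns."""
    lines = list(lines)  # work on a copy, leave the caller's data intact
    if pattern == 'sorted':
        return sorted(lines)
    if pattern == 'reverse':
        return sorted(lines, reverse=True)
    if pattern == 'partial':
        ordered = sorted(lines)
        cut = int(len(ordered) * partial_fraction)
        tail = ordered[cut:]
        random.shuffle(tail)          # only the tail is out of order
        return ordered[:cut] + tail
    random.shuffle(lines)             # default: fully random order
    return lines
```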
- Python 3.11+
- Install dependencies:

```bash
pip install -r requirements.txt
```
```bash
python main.py
```
This will:
- Generate logs in all 4 patterns
- Apply all 4 sorting algorithms
- Measure metrics and store them in `benchmark_results.csv`
- Automatically generate plots into `reports/plots/`
```bash
python plot_benchmark_results.py
```

This reads `reports/benchmark_results.csv` and generates:
- `time_plot.png`
- `memory_program_plot.png`
- `memory_data_plot.png`
- `comparisons_plot.png`
- `swaps_plot.png`
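If you want a custom plot straight from the CSV, a minimal sketch with `pandas` and `seaborn` might look like this. The column names `algorithm`, `pattern`, and `time_sec` are assumptions; check the CSV header for the actual names:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Assumed column names -- inspect reports/benchmark_results.csv for the real ones
df = pd.read_csv("reports/benchmark_results.csv")

# One bar per algorithm, grouped by log pattern, averaged over repeated trials
sns.barplot(data=df, x="algorithm", y="time_sec", hue="pattern")
plt.ylabel("Execution time (s)")
plt.tight_layout()
plt.savefig("reports/plots/custom_time_plot.png")
```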
```python
from src.log_generator.log_generator import LogGenerator

# Write 10,000 randomly ordered log lines to logs/random.txt
LogGenerator().generate_log_file("logs/random.txt", 10000, pattern='random')
```
```python
from src.algorithms.quick_sort import quick_sort
from src.benchmarking.metrics_collector import MetricsCollector, OperationTracker

logs = [...]  # your log data

tracker = OperationTracker()   # counts comparisons and swaps
metrics = MetricsCollector()   # measures execution time and memory

sorted_data = metrics.measure(quick_sort, logs, tracker)
print(metrics.results)
```
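If you are curious what such a collector does under the hood, here is a simplified, hypothetical sketch of the idea; the real classes in `src/benchmarking/metrics_collector.py` are more complete (for example, they distinguish data memory from program memory):

```python
import time
import tracemalloc

class OperationTracker:
    """Counts the element operations a sorting algorithm performs."""
    def __init__(self):
        self.comparisons = 0
        self.swaps = 0

class MetricsCollector:
    """Times a sort and samples peak memory with tracemalloc."""
    def __init__(self):
        self.results = {}

    def measure(self, sort_fn, data, tracker):
        tracemalloc.start()
        start = time.perf_counter()
        output = sort_fn(data, tracker)   # the algorithm reports ops via the tracker
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        self.results = {
            "time_sec": elapsed,
            "peak_memory_bytes": peak,
            "comparisons": tracker.comparisons,
            "swaps": tracker.swaps,
        }
        return output
```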
Use the notebook:
```bash
jupyter notebook Log_Sorting_Demo_AutoPlot.ipynb
```
This demonstrates the entire benchmarking and plotting flow interactively.
This project is licensed under the MIT License – see the LICENSE file for details.
Contributions are welcome! Fork the repo, make improvements, and submit a pull request.