Comparative Analysis of Sorting Algorithms for Large-Scale Log File Processing

📌 Project Overview

This project provides a benchmarking and visualization framework for evaluating sorting algorithms on large-scale log files. It extends traditional comparisons by capturing detailed metrics (execution time, memory, comparisons, swaps), supporting statistical validation over repeated trials, and testing multiple log patterns. Four algorithms are compared:

Quick Sort – Optimized with Median-of-Three Pivot
Merge Sort – Stable and External Sort-ready
Heap Sort – In-place, Memory-Efficient
Radix Sort – Optimized for Timestamp-Based Structured Logs
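
To illustrate why radix sort suits timestamp-keyed logs, here is a minimal sketch of a least-significant-digit radix sort over integer timestamps. The function name `radix_sort_by_timestamp` and the `(timestamp, message)` tuple layout are assumptions made for this example; they are not taken from the project's `src/algorithms/` code.

```python
from typing import List, Tuple

# Assumed layout for this sketch: (epoch seconds, message text).
LogEntry = Tuple[int, str]


def radix_sort_by_timestamp(entries: List[LogEntry], bits: int = 8) -> List[LogEntry]:
    """LSD radix sort on non-negative integer timestamps, `bits` bits per pass."""
    if not entries:
        return []
    radix = 1 << bits
    mask = radix - 1
    max_key = max(ts for ts, _ in entries)
    result = list(entries)
    shift = 0
    # One bucketing pass per digit, from least to most significant.
    while (max_key >> shift) > 0:
        buckets: List[List[LogEntry]] = [[] for _ in range(radix)]
        for entry in result:
            buckets[(entry[0] >> shift) & mask].append(entry)
        # Concatenating buckets in order keeps each pass stable.
        result = [entry for bucket in buckets for entry in bucket]
        shift += bits
    return result
```

Because every pass is stable, entries with equal timestamps keep their original relative order, and no key comparisons are performed at all.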

The framework supports synthetic log generation, tracking of comparisons & swaps, and detailed performance visualization.
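
As a rough illustration of how comparison and swap tracking can be combined with a median-of-three pivot, consider the sketch below. `OpCounter` and `quick_sort_mo3` are hypothetical names used only here; the project's own `quick_sort` and `OperationTracker` may count operations differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OpCounter:
    """Hypothetical stand-in for a tracker: counts comparisons and swaps."""
    comparisons: int = 0
    swaps: int = 0


def _swap(a: List, i: int, j: int, c: OpCounter) -> None:
    a[i], a[j] = a[j], a[i]
    c.swaps += 1


def _median_of_three(a: List, lo: int, hi: int, c: OpCounter) -> int:
    """Order a[lo], a[mid], a[hi] in place and return the index of the median."""
    mid = (lo + hi) // 2
    c.comparisons += 1
    if a[mid] < a[lo]:
        _swap(a, lo, mid, c)
    c.comparisons += 1
    if a[hi] < a[lo]:
        _swap(a, lo, hi, c)
    c.comparisons += 1
    if a[hi] < a[mid]:
        _swap(a, mid, hi, c)
    return mid


def quick_sort_mo3(a: List, c: OpCounter, lo: int = 0, hi: Optional[int] = None) -> None:
    """In-place quick sort using a median-of-three pivot and Lomuto partition."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    mid = _median_of_three(a, lo, hi, c)
    _swap(a, mid, hi, c)          # park the pivot at the end
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        c.comparisons += 1
        if a[j] < pivot:
            if i != j:
                _swap(a, i, j, c)
            i += 1
    _swap(a, i, hi, c)            # move the pivot to its final position
    quick_sort_mo3(a, c, lo, i - 1)
    quick_sort_mo3(a, c, i + 1, hi)
```

Choosing the median of the first, middle, and last elements avoids the worst-case splits that a fixed first or last pivot produces on already sorted or reversed input.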

📁 Project Directory Structure

project_root/
│
├── src/
│   ├── algorithms/                # Sorting algorithm implementations
│   ├── log_generator/             # Generates log files in 4 patterns
│   ├── benchmarking/              # Metrics collector & dashboard visualizations
│   ├── optimizations/             # Advanced structures: multithreading, memory pool
│   └── stream_processing/         # Log stream handler
│
├── reports/                       # Results and generated plots
│   ├── benchmark_results.csv
│   └── plots/
│       ├── time_plot.png
│       ├── memory_data_plot.png
│       ├── memory_program_plot.png
│       ├── comparisons_plot.png
│       └── swaps_plot.png
│
├── main.py                        # Entry point: log generation, benchmarking, and plotting
├── plot_benchmark_results.py      # Optional CLI script to generate graphs from CSV
├── Log_Sorting_Demo_AutoPlot.ipynb # Jupyter notebook to demo and visualize all metrics
└── requirements.txt               # Python dependencies

🚀 Key Features

🔁 Log file generator supports 4 patterns: random, sorted, reverse, partial
🧪 Benchmarks with: execution time, memory (data + program), comparisons, swaps
📊 Repeated trials (configurable) with CSV export
📈 Auto-generated performance plots with matplotlib and seaborn
🧠 Visual and statistical comparison across algorithms and patterns
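
The four log patterns can be pictured with a small sketch like the one below. This is not the project's `LogGenerator`; the function `make_log_lines`, the line format, and the choice of swapping roughly 10% of positions for the `partial` pattern are all assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta
from typing import List


def make_log_lines(n: int, pattern: str = "random", seed: int = 0) -> List[str]:
    """Return n synthetic log lines whose timestamps follow the given pattern."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1)
    stamps = [start + timedelta(seconds=i) for i in range(n)]  # ascending order
    if pattern == "random":
        rng.shuffle(stamps)
    elif pattern == "reverse":
        stamps.reverse()
    elif pattern == "partial":
        # Assumed definition: mostly sorted, with about n/10 random swaps.
        for _ in range(n // 10):
            i, j = rng.randrange(n), rng.randrange(n)
            stamps[i], stamps[j] = stamps[j], stamps[i]
    elif pattern != "sorted":
        raise ValueError(f"unknown pattern: {pattern!r}")
    return [f"{ts.isoformat(sep=' ')} INFO event-{k}" for k, ts in enumerate(stamps)]
```

Fixed-width timestamps at the start of each line mean that a plain string sort orders the lines chronologically, which keeps the sketch independent of any parsing step.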

⚡ Getting Started

Prerequisites

Python 3.11+
Install dependencies:

pip install -r requirements.txt

Running the Full Benchmarking Workflow

python main.py

This will:

Generate logs in all 4 patterns
Apply all 4 sorting algorithms
Measure metrics and store them in reports/benchmark_results.csv
Automatically generate plots into reports/plots/
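
The measurement step above can be sketched with the standard library alone. The helper names `run_trials` and `rows_to_csv` and the CSV column names are assumptions for this example and do not necessarily match what `main.py` writes.

```python
import csv
import io
import time
import tracemalloc
from typing import Callable, Dict, List, Sequence

FIELDS = ["trial", "seconds", "peak_bytes"]  # assumed column names


def run_trials(sort_fn: Callable[[list], list], data: Sequence, trials: int = 3) -> List[Dict]:
    """Time sort_fn on a fresh copy of data for each trial and record peak memory."""
    rows = []
    for trial in range(trials):
        work = list(data)                      # fresh copy so trials are independent
        tracemalloc.start()
        t0 = time.perf_counter()
        sort_fn(work)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        rows.append({"trial": trial, "seconds": elapsed, "peak_bytes": peak})
    return rows


def rows_to_csv(rows: List[Dict]) -> str:
    """Serialize trial rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, `rows_to_csv(run_trials(sorted, [3, 1, 2]))` yields one CSV row per trial. Note that `tracemalloc` only sees allocations made by Python itself, so the peak figure is an approximation of the sort's working memory.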

📈 Visualizing Results Independently

python plot_benchmark_results.py

This reads reports/benchmark_results.csv and generates:

time_plot.png
memory_program_plot.png
memory_data_plot.png
comparisons_plot.png
swaps_plot.png

🧪 Example: Generate One Log Type Manually

from src.log_generator.log_generator import LogGenerator
LogGenerator().generate_log_file("logs/random.txt", 10000, pattern='random')

🔍 Example: Custom Benchmark with Tracker

from src.algorithms.quick_sort import quick_sort
from src.benchmarking.metrics_collector import MetricsCollector, OperationTracker

logs = [...]  # placeholder: replace with your own list of log entries
tracker = OperationTracker()
metrics = MetricsCollector()
sorted_data = metrics.measure(quick_sort, logs, tracker)
print(metrics.results)

📓 Jupyter Demo

Use the notebook:

jupyter notebook Log_Sorting_Demo_AutoPlot.ipynb

This demonstrates the entire benchmarking and plotting flow interactively.

📄 License

This project is licensed under the MIT License – see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Fork the repo, make improvements, and submit a pull request.
