Skip to content

ragbhard/Sorting-Algorithms-Log-Processing

Repository files navigation

Comparative Analysis of Sorting Algorithms for Large-Scale Log File Processing

📌 Project Overview

This project provides a complete benchmarking and visualization framework to evaluate sorting algorithms applied to large-scale log file processing. It extends traditional comparisons by capturing detailed metrics, enabling statistical validation, and supporting multiple log patterns:

  • Quick Sort – Optimized with Median-of-Three Pivot
  • Merge Sort – Stable and External Sort-ready
  • Heap Sort – In-place, Memory-Efficient
  • Radix Sort – Optimized for Timestamp-Based Structured Logs

The framework supports synthetic log generation, tracking of comparisons & swaps, and detailed performance visualization.


📁 Project Directory Structure

project_root/
│
├── src/
│   ├── algorithms/                # Sorting algorithm implementations
│   ├── log_generator/             # Generates log files in 4 patterns
│   ├── benchmarking/              # Metrics collector & dashboard visualizations
│   ├── optimizations/             # Advanced structures: multithreading, memory pool
│   └── stream_processing/         # Log stream handler
│
├── reports/                       # Results and generated plots
│   ├── benchmark_results.csv
│   └── plots/
│       ├── time_plot.png
│       ├── memory_data_plot.png
│       ├── memory_program_plot.png
│       ├── comparisons_plot.png
│       └── swaps_plot.png
│
├── main.py                        # Entry point: log generation, benchmarking, and plotting
├── plot_benchmark_results.py      # Optional CLI script to generate graphs from CSV
├── Log_Sorting_Demo_AutoPlot.ipynb# Jupyter notebook to demo and visualize all metrics
└── requirements.txt               # Python dependencies

🚀 Key Features

  • 🔁 Log file generator supports 4 patterns: random, sorted, reverse, partial
  • 🧪 Benchmarks with: execution time, memory (data + program), comparisons, swaps
  • 📊 Repeated trials (configurable) with CSV export
  • 📈 Auto-generated performance plots with matplotlib and seaborn
  • 🧠 Visual and statistical comparison across algorithms and patterns

⚡ Getting Started

Prerequisites

  • Python 3.11+
  • Install dependencies:
pip install -r requirements.txt

Running the Full Benchmarking Workflow

python main.py

This will:

  • Generate logs in all 4 patterns
  • Apply all 4 sorting algorithms
  • Measure metrics and store them in benchmark_results.csv
  • Automatically generate plots into reports/plots/

📈 Visualizing Results Independently

python plot_benchmark_results.py

This reads reports/benchmark_results.csv and generates:

  • time_plot.png
  • memory_program_plot.png
  • memory_data_plot.png
  • comparisons_plot.png
  • swaps_plot.png

🧪 Example: Generate One Log Type Manually

from src.log_generator.log_generator import LogGenerator
LogGenerator().generate_log_file("logs/random.txt", 10000, pattern='random')

🔍 Example: Custom Benchmark with Tracker

from src.algorithms.quick_sort import quick_sort
from src.benchmarking.metrics_collector import MetricsCollector, OperationTracker

logs = [...]  # your log data
tracker = OperationTracker()
metrics = MetricsCollector()
sorted_data = metrics.measure(quick_sort, logs, tracker)
print(metrics.results)

📓 Jupyter Demo

Use the notebook:

jupyter notebook Log_Sorting_Demo_AutoPlot.ipynb

This demonstrates the entire benchmarking and plotting flow interactively.


📄 License

This project is licensed under the MIT License – see the LICENSE file for details.


🤝 Contributing

Contributions are welcome! Fork the repo, make improvements, and submit a pull request.

About

Comparative Analysis of Sorting Algorithms for Large-Scale Log File Processing

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published