faizan98515/Image-processing-using-Sequential-Parallel-and-distributed-computing

Image Processing Performance Analysis

This project compares Sequential, Parallel, and Distributed image preprocessing techniques using Python.


📂 Folder Structure

images_dataset/
│
├── cars/
├── Cat/
├── dogs/
└── Flowers/

Each folder holds the images for one class. Every image is resized and watermarked during processing.


🧩 Tasks Overview

Task 1 — Sequential Processing

Reads all images, resizes them to 128×128, adds a watermark, and saves them to output_seq/.

Run Command:

python sequential_process.py

Output Example:

Sequential Processing Time: 0.23 seconds
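
The sequential task described above can be sketched roughly as follows. This is a minimal illustration, not the contents of sequential_process.py: the helper names (iter_images, process_image, run_sequential), the extension list, the watermark text and position, and the use of Pillow are all assumptions. Pillow is imported inside process_image so the timing loop can be reused with any other per-image function.

```python
import time
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def iter_images(root):
    """Yield image files found anywhere under root, in sorted order."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            yield path


def process_image(src, dst, size=(128, 128), text="sample"):
    """Resize one image, draw a text watermark, and save it (requires Pillow)."""
    from PIL import Image, ImageDraw  # assumption: Pillow is the imaging library

    with Image.open(src) as img:
        out = img.convert("RGB").resize(size)
    ImageDraw.Draw(out).text((5, 5), text, fill=(255, 255, 255))
    dst.parent.mkdir(parents=True, exist_ok=True)
    out.save(dst)


def run_sequential(root, out_dir, process=process_image):
    """Process every image one after another; return (count, elapsed_seconds)."""
    root, out_dir = Path(root), Path(out_dir)
    start = time.perf_counter()
    count = 0
    for src in iter_images(root):
        # Keep the class folder (cars/, dogs/, ...) in the output path.
        process(src, out_dir / src.relative_to(root))
        count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    _, elapsed = run_sequential("images_dataset", "output_seq")
    print(f"Sequential Processing Time: {elapsed:.2f} seconds")
```

time.perf_counter() is used because it is a monotonic, high-resolution clock intended for measuring short durations.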

Task 2 — Parallel Processing

Performs the same operations in parallel using multiple worker processes.

Run Command:

python parallel_process.py

This script uses Python’s concurrent.futures.ProcessPoolExecutor to test configurations with 1, 2, 4, and 8 workers.

Actual Output:

Workers   Time (s)   Speedup (vs. 1 worker)
1         0.59       1.00x
2         0.68       0.87x
4         1.15       0.52x
8         1.31       0.45x

Results are saved in output_parallel/.
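
A benchmark over several worker counts might look like the sketch below. It is an illustration under assumptions, not the code in parallel_process.py: benchmark and square are hypothetical names, square stands in for the per-image work, and the executor_cls parameter is added here only so the same harness can be tried with a different executor.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def square(n):
    """Stand-in for the per-image work. Top-level so worker processes can pickle it."""
    return n * n


def benchmark(func, items, worker_counts=(1, 2, 4, 8), executor_cls=ProcessPoolExecutor):
    """Time func over items once per worker count; return {workers: seconds}."""
    timings = {}
    for workers in worker_counts:
        start = time.perf_counter()
        # Pool creation is inside the timed region, so startup cost is counted.
        with executor_cls(max_workers=workers) as pool:
            list(pool.map(func, items))  # list() waits for every result
        timings[workers] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    timings = benchmark(square, range(94))  # 94 items, matching 2 x 47 images
    baseline = timings[1]
    print(f"{'Workers':<10}{'Time (s)':<11}Speedup")
    for workers, seconds in timings.items():
        print(f"{workers:<10}{seconds:<11.2f}{baseline / seconds:.2f}x")
```

The `if __name__ == "__main__":` guard matters with process pools: on platforms that start workers by re-importing the main module, unguarded top-level code would run again in every worker.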


Task 3 — Simulated Distributed Task

Simulates a distributed environment using multiprocessing.Manager() and logical “nodes” within one system. Each node processes half of the dataset and reports its time.

Run Command:

python distributed_process.py

Actual Output:

Node 1 processed 47 images in 0.13s
Node 2 processed 47 images in 0.12s
Total distributed time: 0.61s
Efficiency: 0.38x over sequential
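
The node simulation can be sketched as below, assuming one process per logical node and a Manager dictionary for collecting reports. The function names (split_for_nodes, node_worker, run_nodes) are hypothetical, and the per-image work is left as a placeholder, so this shows the coordination pattern rather than the actual distributed_process.py.

```python
import time
from multiprocessing import Manager, Process


def split_for_nodes(items, n_nodes=2):
    """Split items into n_nodes contiguous shares of near-equal size."""
    items = list(items)
    share, extra = divmod(len(items), n_nodes)
    shares, start = [], 0
    for i in range(n_nodes):
        end = start + share + (1 if i < extra else 0)
        shares.append(items[start:end])
        start = end
    return shares


def node_worker(node_id, share, report):
    """Process one node's share and store (count, seconds) in the shared dict."""
    start = time.perf_counter()
    for _item in share:
        pass  # placeholder for the per-image resize + watermark step
    report[node_id] = (len(share), time.perf_counter() - start)


def run_nodes(items, n_nodes=2):
    """Run one process per logical node; return (per-node report, total seconds)."""
    start = time.perf_counter()
    with Manager() as manager:
        report = manager.dict()
        procs = [
            Process(target=node_worker, args=(i + 1, share, report))
            for i, share in enumerate(split_for_nodes(items, n_nodes))
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        results = dict(report)  # copy out before the manager shuts down
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, total = run_nodes(range(94))
    for node_id in sorted(results):
        count, seconds = results[node_id]
        print(f"Node {node_id} processed {count} images in {seconds:.2f}s")
    print(f"Total distributed time: {total:.2f}s")
```

In this sketch the total is measured around manager startup, process startup, and joins, so it can be much larger than either node's own time, which is the same pattern as in the output above.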

Task 4 — Report Generation

Generates a short performance report comparing all methods and configurations.

Run Command:

python generate_report.py

This creates a file named report.pdf summarizing:

  • Execution time comparison
  • Speedup table
  • Best configuration
  • Discussion on performance and bottlenecks

📊 Results Summary

Mode          Configuration   Time (s)   Speedup (vs. sequential)
Sequential    -               0.23       1.00x
Parallel      1 Worker        0.59       0.39x
Parallel      2 Workers       0.68       0.34x
Parallel      4 Workers       1.15       0.20x
Parallel      8 Workers       1.31       0.18x
Distributed   2 Nodes         0.61       0.38x
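
The Speedup column is the sequential time divided by each run's time, so values below 1.00x mean the run was slower than sequential. The short sketch below recomputes it from the times in the table; the function and variable names are illustrative only.

```python
def speedup(baseline_s, time_s):
    """Speedup relative to a baseline; below 1.0 means slower than the baseline."""
    return baseline_s / time_s


sequential_s = 0.23
runs = {
    "Parallel, 1 worker": 0.59,
    "Parallel, 2 workers": 0.68,
    "Parallel, 4 workers": 1.15,
    "Parallel, 8 workers": 1.31,
    "Distributed, 2 nodes": 0.61,
}

for label, seconds in runs.items():
    print(f"{label:<22}{speedup(sequential_s, seconds):.2f}x")
```

Run as written, this reproduces the five values in the table: 0.39x, 0.34x, 0.20x, 0.18x, and 0.38x.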

Best Configuration

Sequential execution achieved the best performance at 0.23 seconds.

Parallel and distributed versions were slower due to:

  • Small dataset size
  • Lightweight operations (resize + watermark)
  • High process and I/O overhead compared to actual computation time

For larger datasets or CPU-intensive image processing, parallelism and distribution would likely outperform sequential execution.


💬 Discussion

Parallelism did not provide a speedup here because the workload is too small and too light for the extra processes to pay for themselves.

  • The dataset was small, and image operations were I/O-bound (reading/writing files).
  • Multiprocessing overhead (process creation, data transfer) outweighed benefits.
  • The sequential version avoided these costs, completing faster overall.
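
A rough way to put numbers on that overhead is to subtract the useful work from the measured wall time. The sketch below does this with the timings reported in this README. It assumes the overhead is simply additive and that the two nodes ran at the same time. These are single-run measurements, so treat the results as estimates only.

```python
# Timings reported in this README, in seconds.
sequential_s = 0.23
parallel_one_worker_s = 0.59
node_times_s = [0.13, 0.12]
distributed_total_s = 0.61

# With one worker the pool does the same work as the sequential loop, so the
# difference approximates the fixed cost of using a process pool at all.
pool_overhead_s = parallel_one_worker_s - sequential_s

# Assumption: both nodes ran concurrently, so the slower node bounds the useful
# work and the rest of the wall time is startup and coordination cost.
distributed_overhead_s = distributed_total_s - max(node_times_s)

print(f"Pool overhead:        ~{pool_overhead_s:.2f}s")
print(f"Distributed overhead: ~{distributed_overhead_s:.2f}s")
```

Under these assumptions the estimates come out at about 0.36 s for the process pool and about 0.48 s for the two-node run, both larger than the 0.23 s the sequential version needs for the whole dataset.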

Remaining Bottlenecks

  • Process initialization overhead for each worker
  • Disk I/O contention during concurrent file access
  • Limited CPU workload per image
  • Python multiprocessing overhead on small tasks

Possible Improvements

  • Use ThreadPoolExecutor for I/O-heavy workloads
  • Increase dataset size or apply heavier transformations
  • Store data on SSD/RAM disk for faster I/O
  • Explore GPU acceleration (e.g., CuPy, CUDA) or frameworks like Dask/Ray
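
As an illustration of the first suggestion, here is a minimal thread-pool sketch. It is not part of the project: copy_one and process_with_threads are hypothetical names, and a plain file copy stands in for the per-image step so that the example is purely I/O-bound.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_one(src, out_dir):
    """I/O-bound stand-in for the per-image step: copy a file into out_dir."""
    dst = Path(out_dir) / Path(src).name
    shutil.copyfile(src, dst)
    return dst


def process_with_threads(paths, out_dir, workers=4):
    """Run copy_one over paths on a thread pool; return output paths in input order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threads share memory, so a lambda works here; nothing is pickled.
        return list(pool.map(lambda p: copy_one(p, out_dir), paths))
```

Threads live inside one process, so there is no per-worker process startup and no argument pickling, and blocking file reads and writes release the GIL while they wait. Whether this helps for the real resize and watermark step would need to be measured.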

🖼️ Sample Outputs

Example processed images are stored in the sample_output/ folder. It contains resized and watermarked images from each mode:

  • output_seq/
  • output_parallel/
  • output_distributed/

📄 Report

A detailed report file is included:
📘 report.pdf


👨‍💻 Author

Muhammad Faizan Sajid
Python Image Processing — Performance Benchmark Project (2025)


⚖️ License

This project is open for educational and research use.
