This repo covers various parallel programming techniques such as pthread, OpenMP, MPI, vectorization, and CUDA.
- Achieved a 2x speedup on a single machine using pthread (1 process vs. 8 processes).
- Achieved a 1.12x speedup across multiple machines using MPI (1 machine vs. 4 machines).
- Achieved a 6x speedup using multi-threading with pthread (1 core vs. 12 cores).
- Achieved a 6x speedup with MPI (2 machines vs. 12 machines).
- Achieved an 8x speedup over a single-core CPU baseline by combining vectorization with OpenMP.
- Achieved a 160x speedup over the CPU version with a CUDA implementation.
- Reduced execution time by 50% by running on 2 GPUs with OpenMP.
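
The figures above mix two ways of reporting a gain: speedup factors (such as 6x) and a percentage reduction in run time (50%). The sketch below shows how the two relate and how a speedup translates into parallel efficiency for a given number of workers. It is only an illustration: the function names are hypothetical, a "worker" stands for a core, process, or machine as appropriate, and the inputs are the factors and worker counts quoted in the list, not raw timings from this repo.

```python
def speedup(baseline_seconds: float, parallel_seconds: float) -> float:
    """Ratio of baseline run time to parallel run time."""
    return baseline_seconds / parallel_seconds


def efficiency(speedup_factor: float, baseline_workers: int, parallel_workers: int) -> float:
    """Speedup divided by the growth in worker count (1.0 means linear scaling)."""
    return speedup_factor / (parallel_workers / baseline_workers)


def time_reduction(speedup_factor: float) -> float:
    """Fraction of run time saved for a given speedup factor."""
    return 1.0 - 1.0 / speedup_factor


if __name__ == "__main__":
    # (label, reported speedup, baseline workers, parallel workers),
    # copied from the list above; the labels are illustrative only.
    reported = [
        ("pthread, single machine", 2.0, 1, 8),
        ("MPI, multiple machines", 1.12, 1, 4),
        ("pthread, multi-threading", 6.0, 1, 12),
        ("MPI, 2 vs. 12 machines", 6.0, 2, 12),
    ]
    for label, factor, base, workers in reported:
        eff = efficiency(factor, base, workers)
        print(f"{label}: {factor}x speedup, efficiency {eff:.0%}")

    # A 2x speedup is the same thing as a 50% reduction in run time.
    print(f"2x speedup saves {time_reduction(2.0):.0%} of the run time")
```

Read this way, the 6x pthread result on 12 cores corresponds to 50% parallel efficiency, while the 6x MPI result going from 2 to 12 machines is linear scaling relative to its 2-machine baseline.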