Parallel programming coursework at NTHU CS, fall 2019
- MPI
- optimized the algorithm to minimize message size
- asynchronous communication (non-blocking send/recv)
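The non-blocking send/recv pattern above can be sketched as follows. This is a minimal illustration, not the assignment's actual exchange: the ring neighbors, tag, and payload are assumptions, and it needs an MPI environment (`mpicc` / `mpirun`) to build and run.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank + size - 1) % size;
    int send_val = rank, recv_val = -1;
    MPI_Request reqs[2];

    /* Post the receive first, then the send; both calls return
       immediately so the transfer proceeds in the background. */
    MPI_Irecv(&recv_val, 1, MPI_INT, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&send_val, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Local computation would go here, overlapping the communication. */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    printf("rank %d received %d from rank %d\n", rank, recv_val, left);
    MPI_Finalize();
    return 0;
}
```

Posting the receive before the send also avoids the deadlock that blocking `MPI_Send`/`MPI_Recv` pairs can hit when every rank sends first.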
hw2/hw2_hybrid_dynamic_p_v.c
- MPI + pthreads + OpenMP
- leader/follower architecture
- load balance with dynamic scheduling
- overlapped computation with file writing
- vectorization with Intel SSE3 (SIMD)
3. all-pairs shortest path (cpu)
- OpenMP
- implemented the blocked Floyd-Warshall algorithm to exploit cache locality
4. all-pairs shortest path (gpu)
- utilized NVIDIA Pascal GPU memory hierarchy: shared memory, registers
- fine-tuned block size and kernel size
- resolved bank conflicts
- minimized peer-to-peer communication
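A minimal sketch of how shared memory and bank-conflict padding come together in a phase-1 (pivot block) kernel, assuming a 32x32 thread block and a row-major distance matrix; the kernel name and launch parameters are illustrative, and it needs `nvcc` plus a host-side driver to run.

```cuda
#define B 32

/* Phase 1 of blocked Floyd-Warshall: the pivot block depends only on
   itself, so one thread block loads it into shared memory, runs all B
   relaxation steps there, and writes it back once. */
__global__ void fw_phase1(int *d, int n, int bk) {
    /* The +1 padding column shifts each row's bank alignment, so
       column-wise accesses s[k][j] hit different banks (no conflicts). */
    __shared__ int s[B][B + 1];
    int i = threadIdx.y, j = threadIdx.x;
    int gi = bk * B + i, gj = bk * B + j;

    s[i][j] = d[gi * n + gj];
    __syncthreads();

    for (int k = 0; k < B; k++) {
        int via = s[i][k] + s[k][j];
        if (via < s[i][j]) s[i][j] = via;
        __syncthreads();  /* step k+1 reads values written in step k */
    }

    d[gi * n + gj] = s[i][j];
}

/* Host side (one pivot block per round):
   fw_phase1<<<1, dim3(B, B)>>>(d_dist, n, bk); */
```

Phases 2 and 3 follow the same pattern but load two tiles (the pivot row/column plus the tile being updated), keeping reused operands in shared memory and per-thread temporaries in registers.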