heyfey/Parallel-Programming


Parallel programming

Course at NTHU CS, fall 2019

  • MPI
  • optimized algorithm to minimize message size
  • asynchronous communication (non-blocking send/recv)

hw2/hw2_hybrid_dynamic_p_v.c

  • MPI + pthreads + OpenMP
  • leader/follower architecture
  • load balance with dynamic scheduling
  • overlapped computing and file writing
  • vectorization with Intel SSE3 (SIMD)
  • OpenMP
  • implemented the blocked Floyd-Warshall algorithm to exploit cache locality
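
To illustrate the blocked Floyd-Warshall idea mentioned above, here is a minimal serial sketch in C. It is not the code from this repository: the names `relax_tile` and `blocked_floyd_warshall`, the `INF` marker, and the row-major `int` matrix layout are all assumptions made for the example. It assumes non-negative edge weights and a zero diagonal. Each round processes the pivot tile first, then the tiles in the pivot row and column, then all remaining tiles, so every tile is reused while it is still in cache.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical "no edge" marker; INF + INF still fits in a 32-bit int. */
#define INF 1000000000

static int min_int(int a, int b) { return a < b ? a : b; }

/* Relax every entry of tile (bi, bj) through the pivot vertices of tile bk.
 * d is an n x n row-major distance matrix and B is the tile width.
 * Tile edges are clamped so n does not have to be a multiple of B. */
static void relax_tile(int *d, int n, int B, int bi, int bj, int bk)
{
    int i_end = min_int((bi + 1) * B, n);
    int j_end = min_int((bj + 1) * B, n);
    int k_end = min_int((bk + 1) * B, n);

    for (int k = bk * B; k < k_end; k++)
        for (int i = bi * B; i < i_end; i++)
            for (int j = bj * B; j < j_end; j++) {
                int via = d[i * n + k] + d[k * n + j];
                if (via < d[i * n + j])
                    d[i * n + j] = via;
            }
}

/* Three-phase blocked Floyd-Warshall (serial sketch). */
void blocked_floyd_warshall(int *d, int n, int B)
{
    int nb = (n + B - 1) / B;  /* number of tiles per dimension */

    for (int bk = 0; bk < nb; bk++) {
        /* Phase 1: the pivot tile depends only on itself. */
        relax_tile(d, n, B, bk, bk, bk);

        /* Phase 2: tiles sharing a row or column with the pivot tile. */
        for (int b = 0; b < nb; b++) {
            if (b == bk) continue;
            relax_tile(d, n, B, bk, b, bk);
            relax_tile(d, n, B, b, bk, bk);
        }

        /* Phase 3: all remaining tiles, which are independent of each other. */
        for (int bi = 0; bi < nb; bi++) {
            if (bi == bk) continue;
            for (int bj = 0; bj < nb; bj++) {
                if (bj == bk) continue;
                relax_tile(d, n, B, bi, bj, bk);
            }
        }
    }
}
```

Because the phase 3 tiles within one round do not depend on each other, that loop nest is the natural candidate for an OpenMP `parallel for`; the sketch leaves it serial to stay minimal.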

4. all-pairs shortest path (CUDA)

4-1: single-GPU

  • utilized the NVIDIA Pascal GPU memory hierarchy: shared memory, registers
  • fine-tuned block size and kernel size
  • resolved bank conflicts
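
The README does not say how the bank conflicts were resolved. One common technique is to pad each row of a shared-memory tile by one word, and the host-side C sketch below models why that helps. It is an assumption-laden illustration, not code from this repository: `column_conflict_degree` is a hypothetical helper, and the model assumes 32 banks of 4-byte words and a warp of 32 threads in which thread `t` reads element `[t][col]` of a row-major tile.

```c
#include <stdio.h>

#define NUM_BANKS 32  /* assumed: 32 shared-memory banks, one 4-byte word each */
#define WARP_SIZE 32  /* threads per warp */

/* Returns the largest number of threads in a warp that hit the same bank
 * when thread t reads word index t * stride + col.
 * `stride` is the row length of the tile in 4-byte words.
 * A result of 1 means conflict-free; 32 means fully serialized. */
int column_conflict_degree(int stride, int col)
{
    int hits[NUM_BANKS] = {0};
    int worst = 0;

    for (int t = 0; t < WARP_SIZE; t++) {
        int word = t * stride + col;   /* word offset inside the tile */
        int bank = word % NUM_BANKS;   /* consecutive words map to consecutive banks */
        if (++hits[bank] > worst)
            worst = hits[bank];
    }
    return worst;
}
```

Under this model, a 32 x 32 tile read by column (`stride` 32) sends every thread to the same bank, while padding the row length to 33 words spreads the same read across all 32 banks.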

4-2: multi-GPU

  • minimized peer-to-peer communication

About

Course 10810CS542200 by jchou
