hailinkim/CS273_FinalProject

Parallel Bellman-Ford Algorithm

This document compares and contrasts three parallel versions of the Bellman-Ford algorithm, highlighting key implementation decisions and performance observations.

What is Bellman-Ford?

The Bellman-Ford algorithm is a graph search algorithm that calculates the shortest paths from a single source vertex to all other vertices in a weighted graph. It is capable of handling graphs with negative weight edges, distinguishing it from algorithms like Dijkstra's, which cannot properly handle negative weights. By iteratively relaxing the edges of the graph, the Bellman-Ford algorithm efficiently updates the shortest path estimates until it achieves the final shortest path values or detects a negative weight cycle.
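For reference, the sequential algorithm described above can be sketched as follows (an illustrative minimal version, not this project's code; edges are stored as {from, to, weight} triples):

```java
import java.util.Arrays;

public class BellmanFordSeq {
    // A large "infinity" that survives additions without overflowing.
    static final int INF = Integer.MAX_VALUE / 2;

    // Returns the array of shortest-path costs from source, or null if a
    // negative-weight cycle is reachable.
    public static int[] shortestPaths(int n, int[][] edges, int source) {
        int[] dist = new int[n];
        Arrays.fill(dist, INF);
        dist[source] = 0;

        // Relax every edge n-1 times; after pass i, every shortest path
        // that uses at most i edges is final.
        for (int i = 0; i < n - 1; i++) {
            for (int[] e : edges) {
                if (dist[e[0]] + e[2] < dist[e[1]]) {
                    dist[e[1]] = dist[e[0]] + e[2];
                }
            }
        }
        // One extra pass: any further improvement implies a negative cycle.
        for (int[] e : edges) {
            if (dist[e[0]] + e[2] < dist[e[1]]) return null;
        }
        return dist;
    }
}
```

The inner loop over edges is the part that the approaches below parallelize; the outer loop of n-1 passes stays sequential.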

Framework for Parallelization

  • Parallelizable Tasks: The tasks involving the examination of outgoing neighbors for each vertex emerged as prime candidates for parallel execution. By dividing the vertices among multiple threads, each thread could independently assess the neighbors of its allocated subset of vertices.

  • What can't be parallelized: The outer loop is inherently sequential: each relaxation round depends on the distance values produced by the previous round, so the rounds themselves cannot run concurrently. Preserving this ordering ensures the integrity of the shortest-path calculations.
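The division of vertices among threads described above might be computed with a small helper like this (a hypothetical utility, not taken from the repo):

```java
public class Partition {
    // Splits vertex indices [0, n) into `workers` contiguous ranges.
    // Each row is {start inclusive, end exclusive}; the last range may be shorter.
    public static int[][] ranges(int n, int workers) {
        int chunk = (n + workers - 1) / workers; // ceiling division
        int[][] out = new int[workers][2];
        for (int w = 0; w < workers; w++) {
            out[w][0] = Math.min(n, w * chunk);
            out[w][1] = Math.min(n, (w + 1) * chunk);
        }
        return out;
    }
}
```

Each worker then examines the outgoing neighbors of only the vertices in its own range, so no two workers scan the same vertex.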

Approach 1: Futures

  • Design: Utilized futures to synchronize tasks, with the outer loop executed sequentially.
  • Implementation: Tasks were submitted for each iteration of the outer loop, examining neighboring vertices of specific subsets.
  • Observation: We realized that futures were largely unnecessary, since the outer loop's structure already enforces sequential execution between iterations. This led to assigning subsets of vertices to tasks concurrently within the same iteration, eliminating the wait for individual task completion.
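The design above might look roughly like this sketch (the structure is assumed, not the repo's exact code; for brevity it partitions the edge list rather than per-vertex neighbor sets, but the shape is the same: a sequential outer loop that submits one task per chunk and waits on every Future before the next iteration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.*;

public class BellmanFordFutures {
    static final int INF = Integer.MAX_VALUE / 2; // "infinity" that survives additions

    public static int[] shortestPaths(int n, int[][] edges, int source, int nTasks) {
        int[] dist = new int[n];
        Arrays.fill(dist, INF);
        dist[source] = 0;
        ExecutorService pool = Executors.newFixedThreadPool(nTasks);
        int chunk = (edges.length + nTasks - 1) / nTasks;
        try {
            for (int i = 0; i < n - 1; i++) { // outer loop stays sequential
                List<Future<?>> pending = new ArrayList<>();
                for (int t = 0; t < nTasks; t++) {
                    final int lo = t * chunk, hi = Math.min(edges.length, lo + chunk);
                    pending.add(pool.submit(() -> {
                        for (int j = lo; j < hi; j++) {
                            int[] e = edges[j]; // {from, to, weight}
                            // Unsynchronized writes: a race within one iteration
                            // may briefly use a stale value, but the barrier below
                            // publishes all writes to the next iteration. A more
                            // careful version would double-buffer dist.
                            if (dist[e[0]] + e[2] < dist[e[1]]) dist[e[1]] = dist[e[0]] + e[2];
                        }
                    }));
                }
                for (Future<?> f : pending) { // wait for this iteration's tasks
                    try { f.get(); } catch (InterruptedException | ExecutionException ex) {
                        throw new RuntimeException(ex);
                    }
                }
            }
        } finally {
            pool.shutdown();
        }
        return dist;
    }
}
```

The `f.get()` loop is exactly the per-iteration barrier the observation refers to: the futures add little beyond what the sequential outer loop already guarantees.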

Approach 2: One-Thread-Per-Task

  • Design: Attempted parallel execution by creating a thread for each task.
  • Implementation: Threads were assigned tasks in the inner loop and joined after it, synchronizing each iteration by waiting for every thread to complete its task.
  • Challenges: Significant overhead from high thread count and synchronization, leading to the slowest performance among the attempted methods.
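A sketch of this approach under the same assumptions as before (edge-list chunks instead of per-vertex neighbor sets; not the repo's exact code). Note that fresh threads are created on every one of the n-1 iterations, which is the source of the overhead described above:

```java
import java.util.Arrays;

public class BellmanFordThreads {
    static final int INF = Integer.MAX_VALUE / 2;

    public static int[] shortestPaths(int n, int[][] edges, int source, int nThreads) {
        int[] dist = new int[n];
        Arrays.fill(dist, INF);
        dist[source] = 0;
        int chunk = (edges.length + nThreads - 1) / nThreads;
        for (int i = 0; i < n - 1; i++) {
            Thread[] workers = new Thread[nThreads];
            for (int t = 0; t < nThreads; t++) {
                final int lo = t * chunk, hi = Math.min(edges.length, lo + chunk);
                workers[t] = new Thread(() -> {
                    for (int j = lo; j < hi; j++) {
                        int[] e = edges[j]; // {from, to, weight}
                        if (dist[e[0]] + e[2] < dist[e[1]]) dist[e[1]] = dist[e[0]] + e[2];
                    }
                });
                workers[t].start(); // per-iteration thread creation: the main overhead
            }
            for (Thread w : workers) { // join() is the barrier before the next iteration
                try { w.join(); } catch (InterruptedException ex) { throw new RuntimeException(ex); }
            }
        }
        return dist;
    }
}
```

Compared with a pool, every iteration pays the full cost of thread construction, scheduling, and teardown, which dominates the actual relaxation work on all but very large chunks.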

Approach 3: Thread Pools

  • Design: Employed thread pools to match the number of available processors, optimizing task execution.
  • Implementation: Used ExecutorService for task management, assigning tasks to the thread pool for immediate execution.
  • Performance:
    • Smaller Graphs: Setting tasks equal to the number of available processors showed improved performance, avoiding unnecessary task fragmentation.
    • Larger Graphs: Allocating more than 116 tasks improved performance, suggesting that dividing the workload into many smaller tasks let threads that finished early pick up the remaining work.
  • Key Insight: Subdividing workloads and utilizing a flexible thread pool model adapted to graph size and processor availability significantly improved algorithm efficiency.
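The thread-pool design might be sketched as follows (assumed structure, same edge-chunk simplification as the earlier sketches): the pool is sized to the number of available processors, while the task count `nTasks` is chosen independently, which is the knob the performance observations above refer to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.*;

public class BellmanFordPool {
    static final int INF = Integer.MAX_VALUE / 2;

    public static int[] shortestPaths(int n, int[][] edges, int source, int nTasks) {
        int[] dist = new int[n];
        Arrays.fill(dist, INF);
        dist[source] = 0;
        // Pool size follows the hardware; task count follows the workload.
        int procs = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(procs);
        int chunk = (edges.length + nTasks - 1) / nTasks;
        try {
            for (int i = 0; i < n - 1; i++) {
                List<Callable<Void>> tasks = new ArrayList<>();
                for (int t = 0; t < nTasks; t++) {
                    final int lo = t * chunk, hi = Math.min(edges.length, lo + chunk);
                    tasks.add(() -> {
                        for (int j = lo; j < hi; j++) {
                            int[] e = edges[j]; // {from, to, weight}
                            if (dist[e[0]] + e[2] < dist[e[1]]) dist[e[1]] = dist[e[0]] + e[2];
                        }
                        return null;
                    });
                }
                try {
                    pool.invokeAll(tasks); // blocks until all tasks finish: the barrier
                } catch (InterruptedException ex) {
                    throw new RuntimeException(ex);
                }
            }
        } finally {
            pool.shutdown();
        }
        return dist;
    }
}
```

With `nTasks` greater than the pool size, a worker that finishes a small chunk immediately pulls the next one from the queue, which matches the observed benefit of fine-grained tasks on larger graphs.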

Experiment Results (Tested on the Amherst College High-Performance Computing Cluster Using 116 Cores)

[Figure: experiment results]

Runtime Comparison for Thread Pool Approach across Graph Sizes:

[Figure: runtime comparison for the thread-pool approach across graph sizes]
