mcgizzle/Distributed-Argon

Distributed Argon

A distributed implementation of argon built using Cloud Haskell with a PostgreSQL database.

About

Distributed-Argon uses Cloud Haskell to distribute the workload of argon, a library that measures code complexity. It implements two distribution algorithms: work-stealing and master/slave.

The program accepts a GitHub repository and calculates the complexity of every file in every commit of the project, storing the results in a database. I created another repository, Charting-Complexity, to generate the graphs.

Implementation

I decided to implement two algorithms and graph their results against each other.

  1. Work-Stealing

Worker nodes steal work from the manager. The manager sends out each file on a first-come, first-served basis. The workers evaluate the complexity, return the result, and request more work from the manager. This implementation is often referred to as the self-scheduling or work-stealing pattern.

Link to implementation in the source
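As a rough illustration of the self-scheduling idea described above, here is a minimal single-process sketch. It is not the Cloud Haskell code from this repository: it uses plain threads and `MVar`s from `base` instead of distributed nodes, and `fakeComplexity`, `takeNext`, `worker`, and `selfSchedule` are hypothetical names invented for the example. The stand-in "complexity" is simply the length of the file path.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM)

-- Hypothetical stand-in for the real complexity measurement (assumption).
fakeComplexity :: FilePath -> Int
fakeComplexity = length

-- Pop the next item from the shared queue, if any is left.
takeNext :: MVar [a] -> IO (Maybe a)
takeNext queue = modifyMVar queue $ \xs ->
  case xs of
    []       -> return ([], Nothing)
    (y : ys) -> return (ys, Just y)

-- A worker keeps asking for the next file until the queue is empty,
-- then signals completion on its own 'done' variable.
worker :: MVar [FilePath] -> MVar [(FilePath, Int)] -> MVar () -> IO ()
worker queue results done = loop
  where
    loop = do
      next <- takeNext queue
      case next of
        Nothing -> putMVar done ()
        Just f  -> do
          let c = fakeComplexity f
          -- Force the result in the worker before storing it.
          c `seq` modifyMVar_ results (return . ((f, c) :))
          loop

-- Start n workers (at least one) and collect the results in arrival order.
selfSchedule :: Int -> [FilePath] -> IO [(FilePath, Int)]
selfSchedule n files = do
  queue   <- newMVar files
  results <- newMVar []
  dones   <- replicateM (max 1 n) newEmptyMVar
  forM_ dones $ \d -> forkIO (worker queue results d)
  forM_ dones takeMVar          -- wait for every worker to finish
  readMVar results
```

Calling `selfSchedule 2 ["a.hs", "bb.hs", "ccc.hs"]` returns one pair per file. The order of the pairs depends on which worker finishes first, which mirrors the non-deterministic arrival of results at the manager.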

  2. Master/Slave

A manager node decides on the distribution of the work. The manager splits the work evenly (on a per-file basis) and sends an equal share to each worker.

Link to implementation in the source
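The even split described above could look something like the following sketch. `splitEvenly` is a hypothetical helper written for this illustration, not a function taken from the repository. It assumes the files are handed out in contiguous batches whose sizes differ by at most one.

```haskell
-- Split a list into n batches whose sizes differ by at most one.
-- Returns no batches when n is not positive.
splitEvenly :: Int -> [a] -> [[a]]
splitEvenly n xs
  | n <= 0    = []
  | otherwise = go n xs
  where
    go 0 _  = []
    go k ys =
      let -- ceiling of (remaining items / remaining workers)
          size          = (length ys + k - 1) `div` k
          (batch, rest) = splitAt size ys
      in batch : go (k - 1) rest
```

For example, `splitEvenly 3 [1 .. 7]` gives `[[1,2,3],[4,5],[6,7]]`, so each of three workers would receive two or three files.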

The manager stores the results it receives from the workers in a database as they arrive, in a non-deterministic order.

Discussion

As I expected, the work-stealing pattern was the faster approach on average, as the sample results below show. In the master/slave pattern the work is assigned up front, so working time can be lost while the manager waits for a worker to finish a previous task. This does not happen with the work-stealing pattern, because the manager simply sends the next file to whichever worker is ready.
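The reasoning above can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not a measurement from this project: it ignores communication costs, treats each file as a fixed integer cost, and uses the hypothetical functions `staticMakespan` (work split up front into contiguous, evenly sized batches) and `dynamicMakespan` (each file goes to whichever worker becomes free first).

```haskell
import Data.List (foldl', sort)

-- Finish time when the files are split up front into even, contiguous batches.
staticMakespan :: Int -> [Int] -> Int
staticMakespan n costs = maximum (0 : map sum (go (max 1 n) costs))
  where
    go 0 _  = []
    go k ys =
      let size          = (length ys + k - 1) `div` k
          (batch, rest) = splitAt size ys
      in batch : go (k - 1) rest

-- Finish time when each file is given to the worker that is free earliest.
dynamicMakespan :: Int -> [Int] -> Int
dynamicMakespan n costs = maximum (foldl' assign (replicate (max 1 n) 0) costs)
  where
    -- Sorting puts the least-loaded worker first; it takes the next file.
    assign loads c = case sort loads of
      (l : ls) -> (l + c) : ls
      []       -> []
```

With the made-up costs `[8,1,1,1,1,1,1,1]` and two workers, `staticMakespan` gives 11 because one worker is stuck with the expensive file plus three more, while `dynamicMakespan` gives 8 because the other worker picks up all the cheap files in the meantime.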

Results

Work-Stealing

Master/Slave

The database

A PostgreSQL database is used to store the relevant information about a repository's complexity and the time taken with various numbers of nodes. The database maintains the relations listed below.

Relations

Repository

| Id | Url | Nodes | Start Time | End Time |
|----|-----|-------|------------|----------|
| 1 | https://github.com... | 2 | 2017-11-26 15:02:36.830273+00 | 2017-11-26 15:03:25.63044+00 |

Commit Info

| Id | Commit | Start Time | End Time |
|----|--------|------------|----------|
| 1 | 22939d... | 2017-11-26 15:02:36.830273+00 | 2017-11-26 15:02:36.830273+00 |

Commit Results

| Id | Commit | File Path | Complexity |
|----|--------|-----------|------------|
| 1 | 22939d... | Distributed-Argon/src/Lib.hs | JSON data |

Prerequisites

To build with stack

stack build

To run

Fire up two shells and execute the following scripts.

Start the worker nodes

bash workers.sh

Start the manager node

bash run.sh <Github Repository> <pattern>

The pattern can be work-stealing or master-slave.

Note

The number of workers, the host address, and the port numbers can be changed by editing the worker.sh and manager.sh scripts.

Viewing the results

I have built a graphical display of the results using Chart.js; it lives in the Charting-Complexity repository.

Alternatively, since all the necessary information is stored in the database, you can query and manipulate it in any way you see fit.

Thanks

To all the argon contributors for allowing me to display my distributed programming skills with their great library!
