
A Distributed Version of Hogwild!

Team: Grégoire Clément, Maxime Delisle, Sylvain Beaud

Description

Robust and reliable systems are a core component of any serious distributed setup; for this reason, we focused in particular on this side of the problem.

One of the main highlights of our implementation is the ability to add and remove workers at will, at any time. In the synchronous implementation, the coordinator monitors the number of workers: if a worker crashes, the computation continues without it, and if the user wants to add more workers, the new workers connect to the coordinator (or to other workers) and the computation continues with them. In the asynchronous version, when a new worker arrives, it retrieves the list of workers from another worker, then broadcasts its updates to them and receives their computations; this is the only phase where a locking mechanism is used. When a worker encounters an error, it broadcasts an error message to the other workers, which then stop communicating with the faulty node.
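
The sketch below illustrates this join/broadcast/failure flow in Scala. It is a simplified in-memory model under assumed names (AsyncWorker, Update, WorkerFailed), not the repository's actual networking code.

```scala
// Minimal sketch of the asynchronous worker protocol described above.
// All names here are hypothetical and not taken from the repository.
import scala.collection.concurrent.TrieMap

sealed trait Message
case class Update(weights: Map[Int, Double]) extends Message // sparse gradient update
case class WorkerFailed(id: Int) extends Message             // error broadcast by a faulty node

class AsyncWorker(val id: Int) {
  // Known peers; updated when workers join or fail.
  private val peers = TrieMap.empty[Int, AsyncWorker]

  // A new worker retrieves the peer list from an existing worker
  // (the only phase where locking is used in this sketch).
  def join(existing: AsyncWorker): Unit = peers.synchronized {
    peers ++= existing.peers
    peers.put(existing.id, existing)
    peers.values.foreach(_.peers.put(id, this)) // register the newcomer with every peer
  }

  // Broadcast a local update to every known peer.
  def broadcast(update: Update): Unit = peers.values.foreach(_.receive(update, from = id))

  // On failure, notify peers so they stop communicating with this node.
  def fail(): Unit = peers.values.foreach(_.receive(WorkerFailed(id), from = id))

  def receive(msg: Message, from: Int): Unit = msg match {
    case Update(w)       => applyUpdate(w)   // merge the peer's gradient step
    case WorkerFailed(i) => peers.remove(i)  // drop the faulty node
  }

  private def applyUpdate(w: Map[Int, Double]): Unit = () // model update omitted in this sketch
}
```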

Another useful feature of our implementation is that once the computation finishes, the logs and statistics are uploaded to transfer.sh and can be downloaded for later use. We also provide options to adjust the verbosity of the logs.
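
As an illustration, uploading a file to transfer.sh amounts to a plain HTTP PUT that returns the download link in the response body. The helper below is a hypothetical sketch of that step, not the code used in the repository.

```scala
// Sketch: upload a log file to transfer.sh and return the download link.
import java.net.{HttpURLConnection, URL}
import java.nio.file.{Files, Paths}
import scala.io.Source

object LogUploader {
  def upload(path: String): String = {
    val bytes = Files.readAllBytes(Paths.get(path))
    val name  = Paths.get(path).getFileName.toString
    val conn  = new URL(s"https://transfer.sh/$name").openConnection().asInstanceOf[HttpURLConnection]
    conn.setDoOutput(true)
    conn.setRequestMethod("PUT")
    val out = conn.getOutputStream
    out.write(bytes)
    out.close()
    val link = Source.fromInputStream(conn.getInputStream).mkString.trim
    conn.disconnect()
    link // download URL, e.g. to be printed to the console
  }
}
```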

How to run the project

$ sh run.sh $1 $2 $3

$1 is the execution mode: either sync or async

$2 is the number of worker replicas, from 1 to 100 (or more)

$3 is the log level (verbosity), from 0 (minimal) to 3 (maximal)
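
For example, to start an asynchronous run with 4 replicas and verbosity level 2:

$ sh run.sh async 4 2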

Results

Results are uploaded to transfer.sh (the link is displayed in the console). In case of failure (e.g. if the transfer.sh server is down), we also print them to the console (just to be sure!).

Reference

Niu, F., Recht, B., Ré, C., and Wright, S. J. "Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent." NIPS 2011.
