GitHub - ssarip1/BigJob-Azure: working code

BigJobService
BigjobAzureAgent
_UpgradeReport_Files
remd
winazurestorage
BigJobService.sln
BigJobService.suo
README
ReadMe
UpgradeLog.XML
WAPMMC20110505.exe
__init__.py
azurecheck.pem
bigjob_azure.conf
bigjob_azure.py
bigjob_azure.pyc
example-azure-performance.py
example-azure.py


Background:
Many applications are computationally intensive, requiring hundreds, if not thousands, of processors. Examples include weather forecasting,
molecular modeling, simulations, and banking. One such application is Replica-Exchange Molecular Dynamics (RE). RE methods
represent a class of algorithms that involve a large number of loosely coupled ensembles, called replicas. By loosely coupled, we mean that the replicas need to
communicate with each other and are bound to one another in some way. The replicas are essentially the same simulation, but with minor differences, such as
temperature. RE simulations are used to understand physical phenomena ranging from protein folding dynamics to binding affinity calculations.

In RE, each replica could be a molecular-dynamics (MD) application. RE is more complicated than a straightforward computationally intensive application,
in that after each run the replicas communicate with other replicas, exchange information such as energy or temperature, and restart. There could be tens or
hundreds of coupled ensembles in a simulation. In addition, each MD application typically needs a lot of processing power. To solve this kind of
computationally intensive problem, researchers use supercomputers or high-performance computers (HPC). These machines are very powerful and
expensive. Governments, militaries and corporations commission special-purpose machines, but individual researchers usually cannot afford them. Individual
researchers instead use shared resources provided by universities and national scientific research grids such as FutureGrid, TeraGrid and LONI. The shared
resources operate under certain policies and guidelines, which the user needs to follow.
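
The exchange step mentioned above (replicas swapping energy or temperature information after each run) is commonly decided with a Metropolis criterion in temperature-based RE. The following is a minimal sketch of that standard criterion, not the code from this repository: the function names, the dictionary layout of a replica, and the choice of kcal/mol energy units are all assumptions made for illustration.

```python
import math
import random

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption.
K_B = 0.0019872041


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for swapping replicas i and j.

    P = min(1, exp((beta_i - beta_j) * (E_i - E_j))), with beta = 1 / (k_B * T).
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_exchange(replica_i, replica_j, rng=random):
    """Swap the temperatures of two replicas if the exchange is accepted.

    Each replica is a hypothetical dict with "energy" and "temperature" keys.
    Returns True when the swap happened.
    """
    p = exchange_probability(
        replica_i["energy"], replica_i["temperature"],
        replica_j["energy"], replica_j["temperature"],
    )
    if rng.random() < p:
        replica_i["temperature"], replica_j["temperature"] = (
            replica_j["temperature"], replica_i["temperature"],
        )
        return True
    return False
```

After an accepted exchange, each replica would be restarted at its new temperature, which is the "restart" step described in the paragraph above.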

My Work:
I have evaluated the performance of the different algorithms and implementations on Microsoft Windows Azure. Because only a limited number of cores (20)
was available on Windows Azure, we ran the experiments by scaling up the number of replicas (up to 20) while keeping the number of replicas proportionate
to the number of machines. As the number of replicas increases, in the synchronous RE, the synchronization cost increases the total time to completion.
In the centralized asynchronous RE, the cost of managing many replicas in a centralized manner also increases the time to completion, but not as much
as in the synchronous RE. The decentralized asynchronous RE scales much better with an increasing number of replicas.
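
As a rough illustration of why synchronization adds to the time to completion, here is a toy model, not a measurement and not the repository's implementation. It assumes that a synchronous scheme waits for the slowest replica in every round, and that an idealized asynchronous scheme never makes a replica wait. The runtime range of 50 to 70 seconds and all function names are hypothetical, and the model ignores coordination and messaging overhead.

```python
import random


def sample_runtimes(num_replicas, num_rounds, seed=0):
    """Hypothetical per-round runtimes (seconds): runtimes[round][replica]."""
    rng = random.Random(seed)
    return [
        [rng.uniform(50.0, 70.0) for _ in range(num_replicas)]
        for _ in range(num_rounds)
    ]


def synchronous_time(runtimes):
    """Every round ends only when the slowest replica has finished."""
    return sum(max(round_times) for round_times in runtimes)


def asynchronous_time(runtimes):
    """Idealized case: each replica proceeds through its rounds without waiting."""
    per_replica = zip(*runtimes)  # transpose to runtimes per replica
    return max(sum(times) for times in per_replica)


for n in (2, 5, 10, 20):
    runs = sample_runtimes(num_replicas=n, num_rounds=10)
    print(n, round(synchronous_time(runs), 1), round(asynchronous_time(runs), 1))
```

In this model the synchronous total is never smaller than the asynchronous one, because a sum of per-round maxima is at least the maximum of the per-replica sums. Real asynchronous schemes still pay some cost to find exchange partners, so the gap observed in practice depends on the implementation.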

We have also run experiments with the centralized synchronous and centralized asynchronous replica-exchange algorithms on FutureGrid machines and analyzed the
runtimes. As expected, asynchronous replica exchange outperformed synchronous replica exchange as we scaled up the number of replicas on the FutureGrid machines.
Even though the main focus of my project is implementing the replica-exchange algorithms on Windows Azure, I would also like to differentiate the underlying
abstractions that Azure provides, in terms of infrastructure, messaging mechanism and queuing times, from those of the FutureGrid environment.