Dask on GridEngine system
Deploy a pool of Dask workers through an SGE/PBS system using the drmaa library.
This package extends dask.distributed, a lightweight library for distributed computing in Python that brings the dask APIs to moderate-sized clusters. A single object starts the scheduler and the workers, and also takes care of stopping them once the work is done.
By default, it runs as many workers as requested, with 1 thread each. The main class starts a scheduler similar to distributed.LocalCluster (which includes the web monitor, for instance) and submits an array of jobs to the SGE/PBS system to start the dask-worker processes.
Note that, for simplicity, the workers are started with the dask-worker command-line tool.
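To make the worker launch more concrete, here is a minimal sketch of how the command run by each job in the array could be assembled. The helper name `build_worker_command` and the example address are hypothetical and not part of this package; the only real pieces are the dask-worker executable, its positional scheduler address, and its `--nthreads` option.

```python
import shlex


def build_worker_command(scheduler_address, nthreads=1):
    """Return the argv list for one dask-worker process.

    Hypothetical helper for illustration; the package may build
    its command differently.
    """
    if nthreads < 1:
        raise ValueError("nthreads must be >= 1")
    # dask-worker takes the scheduler address as a positional argument
    # and --nthreads to set the number of threads per worker.
    return ["dask-worker", scheduler_address, "--nthreads", str(nthreads)]


# The address below is an assumed example; 8786 is the default
# port of the dask.distributed scheduler.
command = build_worker_command("tcp://192.168.0.1:8786")
print(shlex.join(command))
```

Returning a list rather than a single string keeps each argument separate, which avoids quoting problems when the command is handed to a job submission layer.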
A first example could be:

    sge = GridEngineScheduler()
    sge.start_pool(10)
    # ... work ...
    sge.stop_pool()
The provided scheduler can also be used as a context manager:

    with GridEngineScheduler(nworkers=10) as sge:
        # ... work ...
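The practical benefit of the context manager form is that the pool is stopped even when the work raises an exception. The sketch below illustrates that pattern with a stand-in class, `PoolSketch`, which is hypothetical: it only records calls and submits no jobs, and it assumes that the context manager simply wraps `start_pool` and `stop_pool`, which the package may implement differently.

```python
class PoolSketch:
    """Hypothetical stand-in for the scheduler object; starts no jobs."""

    def __init__(self, nworkers=0):
        self.nworkers = nworkers
        self.running = False
        self.events = []  # records the calls, for illustration only

    def start_pool(self, nworkers):
        self.nworkers = nworkers
        self.running = True
        self.events.append(("start", nworkers))

    def stop_pool(self):
        self.running = False
        self.events.append(("stop",))

    def __enter__(self):
        # Entering the block starts the requested number of workers.
        self.start_pool(self.nworkers)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the block always stops the pool, error or not.
        self.stop_pool()
        return False  # do not swallow the exception


def run_and_fail():
    """Run a failing workload and return the pool for inspection."""
    pool = PoolSketch(nworkers=10)
    try:
        with pool:
            raise RuntimeError("work failed")
    except RuntimeError:
        pass
    return pool


pool = run_and_fail()
print(pool.running, pool.events)
```

With the explicit `start_pool` / `stop_pool` form, the same guarantee requires wrapping the work in `try` / `finally`.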
pip install 'git+ssh://email@example.com/mfouesneau/dasksge.git'
This work was inspired by Matthew Rocklin's blog post.