# Dispy: Distributed Programming in Python

Dispy is a higher-level framework, built on `pycos`, for distributing computations across the nodes of a cluster. Before submitting any jobs, start an instance of `dispynode.py` on each node in your cluster.

In [1]:
# 'compute' is sent to and executed on each node running 'dispynode',
# so it imports the modules it needs inside the function body
def compute(n):
    import time, socket
    time.sleep(n)
    host = socket.gethostname()
    return (host, n)
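
Before handing a function to a cluster, it can help to call it once locally. The sketch below repeats `compute` so that it is self-contained and runs it with `n=0` to avoid sleeping; the local call is only a sanity check and says nothing about how the nodes are configured.

```python
import socket

def compute(n):
    # Imports live inside the function because the body runs on a remote node.
    import time, socket
    time.sleep(n)
    host = socket.gethostname()
    return (host, n)

# Local smoke test: no cluster involved, n=0 so the call returns immediately.
host, n = compute(0)
print(host, n)
```

If this call fails locally (for example because of a missing import), it would fail on the nodes as well.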

In [2]:
MY_IP = '192.168.1.70'

In [3]:
import dispy

In [4]:
cluster = dispy.JobCluster(compute, ip_addr=MY_IP)

2020-05-22 14:39:55 pycos - version 4.8.15 with kqueue I/O notifier
2020-05-22 14:39:55 dispy - dispy client version: 4.12.2
2020-05-22 14:39:55 dispy - Storing fault recovery information in "_dispy_20200522143955"


In [5]:
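import random

def schedule_jobs(cluster):
    # submit 10 jobs, each sleeping for a random 5-20 seconds; submit() does not block
    jobs = [cluster.submit(random.randint(5, 20)) for _ in range(10)]
    # 'id' is a user-settable field; tag each job with its submission index
    for i, job in enumerate(jobs):
        job.id = i
    return jobs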

def await_jobs(jobs):
    # cluster.wait() # waits for all scheduled jobs to finish
    for job in jobs:
        host, n = job() # waits for job to finish and returns results
        print('%s executed job %s at %s with %s' % (host, job.id, job.start_time, n))
        # other fields of 'job' that may be useful:
        # print(job.stdout, job.stderr, job.exception, job.ip_addr, job.start_time, job.end_time)
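
The job fields listed in the comment above can be combined into a more readable report than the raw epoch timestamps printed by `await_jobs`. The following is a minimal sketch: `summarize_job` is a hypothetical helper (not part of dispy), it assumes the job has already finished, and it is exercised here with a stand-in object carrying made-up values rather than a real job.

```python
from datetime import datetime, timezone
from types import SimpleNamespace

def summarize_job(job):
    """Return a one-line summary of a finished job (hypothetical helper)."""
    started = datetime.fromtimestamp(job.start_time, tz=timezone.utc)
    elapsed = job.end_time - job.start_time
    if job.exception:
        # job.exception holds the remote traceback as text; keep its last line
        outcome = 'failed: %s' % job.exception.strip().splitlines()[-1]
    else:
        outcome = 'result=%r' % (job.result,)
    return 'job %s on %s started %s UTC, ran %.1f s, %s' % (
        job.id, job.ip_addr, started.strftime('%H:%M:%S'), elapsed, outcome)

# Stand-in with made-up values, used only to exercise the helper locally.
fake_job = SimpleNamespace(id=0, ip_addr='192.168.1.70', start_time=0.0,
                           end_time=15.0, result=('faramir', 15), exception=None)
print(summarize_job(fake_job))
```

With a real cluster, the helper would be called on each job after `job()` or `cluster.wait()` has returned.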

In [6]:
jobs = schedule_jobs(cluster)

In [7]:
await_jobs(jobs)

faramir executed job 0 at 1590183596.316693 with 15
faramir executed job 1 at 1590183596.321036 with 17
faramir executed job 2 at 1590183596.321804 with 8
faramir executed job 3 at 1590183596.3238232 with 8
faramir executed job 4 at 1590183596.324147 with 16
faramir executed job 5 at 1590183596.326897 with 10
faramir executed job 6 at 1590183596.329204 with 11
faramir executed job 7 at 1590183596.331839 with 7
faramir executed job 8 at 1590183603.347698 with 16
faramir executed job 9 at 1590183604.336363 with 5


In [8]:
cluster.print_status()


            Node |  CPUs |    Jobs |  Sec/Job | Node Time Sec |    Sent |    Rcvd
---------------------------------------------------------------------------------
         faramir |     8 |      10 |     11.3 |         113.1 |   1.8 K |   2.4 K

Total job time: 113.078 sec, wall time: 23.832 sec, speedup: 4.745
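
The reported speedup is the total job time divided by the wall time. It stays well below 8 on this 8-CPU node because there are only 10 jobs: jobs 8 and 9 have to wait for a CPU, as their later start times in the output show. The sketch below replays the sleep durations from that run under a simplifying assumption (jobs start in submission order on whichever CPU frees up first, with no scheduling or network overhead); `simulate` is a hypothetical helper, not part of dispy.

```python
import heapq

def simulate(durations, cpus):
    """Greedy FIFO schedule: each job starts on the CPU that frees up first."""
    free_at = [0.0] * cpus      # time at which each CPU becomes idle
    heapq.heapify(free_at)
    makespan = 0.0
    for d in durations:
        start = heapq.heappop(free_at)   # earliest available CPU
        end = start + d
        heapq.heappush(free_at, end)
        makespan = max(makespan, end)
    return sum(durations), makespan

durations = [15, 17, 8, 8, 16, 10, 11, 7, 16, 5]   # sleep times from the run above
total, wall = simulate(durations, cpus=8)
print('total %d s, ideal wall %.0f s, ideal speedup %.2f' % (total, wall, total / wall))
print('reported speedup %.3f' % (113.078 / 23.832))
```

Under these assumptions the ideal wall time is 23 seconds, close to the 23.832 seconds reported; the difference is overhead that the simulation ignores.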



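## Managing clusters with a web front-end

`dispy.httpd.DispyHTTPServer` starts a web server for monitoring the cluster from a browser. As the log output below shows, it listens on port 8181 here.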

In [9]:
import dispy.httpd
http_server = dispy.httpd.DispyHTTPServer(cluster)

2020-05-22 14:40:26 dispy - Started HTTP server at ('0.0.0.0', 8181)


In [10]:
jobs = schedule_jobs(cluster)
await_jobs(jobs)

faramir executed job 0 at 1590183646.701706 with 17
faramir executed job 1 at 1590183646.709685 with 10
faramir executed job 2 at 1590183646.710319 with 13
faramir executed job 3 at 1590183646.713085 with 15
faramir executed job 4 at 1590183646.7135282 with 20
faramir executed job 5 at 1590183646.715692 with 6
faramir executed job 6 at 1590183646.716566 with 9
faramir executed job 7 at 1590183646.71979 with 6
faramir executed job 8 at 1590183652.731968 with 10
faramir executed job 9 at 1590183652.736709 with 10


In [11]:
cluster.close()

True

In [12]:
http_server.shutdown()

2020-05-22 14:42:01 dispy - HTTP server waiting for 10 seconds for client updates before quitting


## Transferring dependencies

Jobs can also specify Python objects or files on which they depend:

In [19]:
def make_df():
    import pandas as pd
    return pd.read_csv('./data/states.csv')

In [25]:
cluster = dispy.JobCluster(
    make_df, ip_addr='192.168.1.90', 
    depends=['data/states.csv']
)

In [26]:
job = cluster.submit()

In [27]:
job()

Unnamed: 0,Abbreviation,state,area,pop
0,AL,Alabama,135767,4874747.0
1,AK,Alaska,1723337,739795.0
2,AZ,Arizona,295234,7016270.0
3,AR,Arkansas,137732,3004279.0
4,CA,California,423967,39536653.0
5,CO,Colorado,269601,5607154.0
6,CT,Connecticut,14357,3588184.0
7,DE,Delaware,6446,961939.0
8,FL,Florida,170312,20984400.0
9,GA,Georgia,153910,10429379.0


In [28]:
cluster.print_status()


            Node |  CPUs |    Jobs |  Sec/Job | Node Time Sec |    Sent |    Rcvd
---------------------------------------------------------------------------------
         faramir |     8 |       1 |      0.4 |           0.4 |   1.9 K |   5.6 K

Total job time: 0.415 sec, wall time: 2.870 sec, speedup: 0.145



In [29]:
cluster.close()

True
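
The example above transfers a file. `depends` can also carry Python objects such as functions, so that the computation can call them on the node. The following is a minimal sketch: `normalize`, `clean_name`, and `make_cluster` are hypothetical names, and the cluster is only created when `make_cluster` is called with the IP address of a machine that has nodes available.

```python
def normalize(name):
    # Helper the computation relies on; shipped to the nodes via 'depends'.
    return name.strip().title()

def clean_name(name):
    # On the node, 'normalize' is available because it was listed in 'depends'.
    return normalize(name)

def make_cluster(ip_addr):
    # dispy is imported here so the helpers above can be tested without a cluster.
    import dispy
    return dispy.JobCluster(clean_name, ip_addr=ip_addr, depends=[normalize])

# Local check of the computation itself; no cluster is involved.
print(clean_name('  alabama '))
```

Usage would follow the same pattern as before: `cluster = make_cluster(MY_IP)`, then `cluster.submit('  alabama ')`, and finally `cluster.close()`.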

## Lab

Open [dispy lab](./dispy-lab.ipynb)