# Simple usage of Dask vectors and Dask operators

@Author: Ettore Biondi - ettore88@stanford.edu

In this notebook, we describe how to use the Dask-based classes. These objects are designed to exploit the computational power of multi-node computer clusters. To this end, we combine the existing classes with Dask (https://dask.org/). We show the syntax for instantiating Dask-based objects from existing constructors using a local Dask cluster. The same syntax applies to the other supported Dask clusters.

### Importing necessary libraries

In [1]:
import numpy as np
import occamypy
# Plotting library
import matplotlib
from matplotlib import pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
# %matplotlib inline
params = {
    'image.interpolation': 'nearest',
    'image.cmap': 'gray',
    'savefig.dpi': 300,  # to adjust notebook inline plot size
    'axes.labelsize': 14, # fontsize for x and y labels (was 10)
    'axes.titlesize': 14,
    'font.size': 14,
    'legend.fontsize': 14,
    'xtick.labelsize': 14,
    'ytick.labelsize': 14,
    'text.usetex':True
}
matplotlib.rcParams.update(params)



### Starting a Dask cluster and client
We begin by starting a local Dask client with 4 workers, and then show how to query it for information.

In [2]:
help(occamypy.DaskClient)

Help on class DaskClient in module occamypy.dask.utils:

class DaskClient(builtins.object)
 |  Class useful to construct a Dask Client to be used with Dask vectors and operators
 |  
 |  Methods defined here:
 |  
 |  __init__(self, **kwargs)
 |      Constructor for obtaining a client to be used when Dask is necessary
 |      1) Cluster with shared file system and ssh capability:
 |      :param hostnames : - list; list of strings containing the host names or IP addresses of the machines that
 |      the user wants to use in their cluster/client (First hostname will be running the scheduler!) [None]
 |      :param scheduler_file_prefix : string; prefix to used to create dask scheduler-file.
 |      :param logging : - boolean; Logging scheduler and worker stdout to files within dask_logs folder [True]
 |      Must be a mounted path on all the machines. Necessary if hostnames are provided [$HOME/scheduler-]
 |      2) Local cluster:
 |      :param local_params : - dict; dictionary contain

In [3]:
client_params = {"processes":True}
client = occamypy.DaskClient(local_params=client_params,n_wrks=4)

In [4]:
print("Number of workers = %d"%client.getNworkers())
print("Workers Ids = %s"%client.getWorkerIds())

Number of workers = 4
Workers Ids = ['tcp://127.0.0.1:54403', 'tcp://127.0.0.1:54404', 'tcp://127.0.0.1:54405', 'tcp://127.0.0.1:54409']


### Dask vectors
Now that we have a Dask client, we can instantiate vectors using the Dask interface. Two methods are currently supported for creating such objects:
1. Instantiate a vector template and spread it using the `chunks` parameter
2. Instantiate multiple vectors and spread them to the given workers

In [5]:
# Method 1
vec_temp = occamypy.VectorIC((200,300))
chunks = (3,4,6,2) # 3 vectors to worker 1; 4 vectors to worker 2; ...
vecD = occamypy.DaskVector(client, vector_template=vec_temp, chunks=chunks)

vecD inherits all the methods from the abstract vector class. Let's try some of them.

In [6]:
# shape
print("List of shapes: %s"%vecD.shape)
# Randomize
vecD.rand()
# Norm
print("Dask vector norm = %s"%vecD.norm())
# Scaling
vecD.scale(10)
print("Scaled Dask vector norm = %s"%vecD.norm())
# Cloning
vecD1 = vecD.clone()
# Summing two vectors (the doubled norm printed below shows that + updates vecD1 in place)
vecD1+vecD
# Check norm
print("Sum Dask vector norm = %s"%vecD1.norm())

List of shapes: [(300, 200), (300, 200), (300, 200), (300, 200), (300, 200), (300, 200), (300, 200), (300, 200), (300, 200), (300, 200), (300, 200), (300, 200), (300, 200), (300, 200), (300, 200)]
Dask vector norm = 547.4544448450604
Scaled Dask vector norm = 5474.544448450603
Sum Dask vector norm = 10949.088896901207
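
The relations between the three norms printed above follow from how a norm is assembled over chunks: the global L2 norm is the square root of the sum of the squared chunk norms. The sketch below reproduces this with plain NumPy arrays standing in for the chunks. It does not use occamypy or Dask, and the helper `chunked_norm` and the uniform sampling in [-1, 1] are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# 15 local arrays stand in for the 15 chunks of the Dask vector
# (assumption: samples drawn uniformly in [-1, 1])
chunks = [rng.uniform(-1.0, 1.0, size=(300, 200)) for _ in range(15)]


def chunked_norm(chunk_list):
    """L2 norm of all chunks taken together (hypothetical helper)."""
    return np.sqrt(sum(np.vdot(c, c).real for c in chunk_list))


norm0 = chunked_norm(chunks)

# Scaling every chunk by 10 scales the global norm by 10
scaled = [10.0 * c for c in chunks]
norm_scaled = chunked_norm(scaled)

# Adding the scaled vector to a copy of itself doubles the norm
summed = [a + b for a, b in zip(scaled, scaled)]
norm_sum = chunked_norm(summed)

print("norm = %s" % norm0)
print("scaled norm = %s" % norm_scaled)
print("sum norm = %s" % norm_sum)
```

With 900000 samples of variance 1/3, the first norm lands near sqrt(300000) ≈ 547.7, which is close to the value printed by the notebook.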


The Dask vector holds a list of future objects pointing to the vector chunks. Let's check which worker holds a given chunk.

In [7]:
print("Future object to first chunk: %s"%vecD.vecDask[0])
print("Worker having given chunk: %s"%client.getClient().who_has(vecD.vecDask[0]))

Future object to first chunk: <Future: status: finished, type: VectorIC, key: call_clone-418c8d68-fb46-4138-b7b5-f6ca69905d07>
Worker having given chunk: {'call_clone-418c8d68-fb46-4138-b7b5-f6ca69905d07': ('tcp://127.0.0.1:54403',)}


Let's now create a vector using a different Dask-vector constructor. Here, we instantiate all the chunks and then spread them onto the given workers.

In [8]:
vec1 = occamypy.VectorIC((200,300))
vec2 = occamypy.VectorIC((10,30)) 
vec3 = occamypy.VectorIC((250,1))
# We use the parameter chunks to select which worker will have a given vector instance
vecD = occamypy.DaskVector(client, vectors=[vec1,vec2,vec3], chunks=(1,1,0,1))

Let's run the same tests as before.

In [9]:
# shape
print("List of shapes: %s"%vecD.shape)
# Randomize
vecD.rand()
# Norm
print("Dask vector norm = %s"%vecD.norm())
# Scaling
vecD.scale(10)
print("Scaled Dask vector norm = %s"%vecD.norm())
# Cloning
vecD1 = vecD.clone()
# Summing two vectors (the doubled norm printed below shows that + updates vecD1 in place)
vecD1+vecD
# Check norm
print("Sum Dask vector norm = %s"%vecD1.norm())
print("Future object to third chunk: %s"%vecD.vecDask[2])
print("Worker having given chunk: %s"%client.getClient().who_has(vecD.vecDask[2]))

List of shapes: [(300, 200), (30, 10), (1, 250)]
Dask vector norm = 141.5051962280391
Scaled Dask vector norm = 1415.051962280391
Sum Dask vector norm = 2830.103924560782
Future object to third chunk: <Future: status: finished, type: VectorIC, key: VectorIC-5fc6c93802952e9fd5eca53fdf64897f>
Worker having given chunk: {'VectorIC-5fc6c93802952e9fd5eca53fdf64897f': ('tcp://127.0.0.1:54409',)}
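
The `chunks` tuple can be read as "how many vectors each worker receives". The pure-Python sketch below expands such a tuple into one owner per chunk. The helper `assign_chunks` is hypothetical and is not part of occamypy. It also assumes that the entries of `chunks` follow the order of the worker list printed earlier, which is consistent with the `who_has` outputs shown in this notebook.

```python
def assign_chunks(chunks, worker_ids):
    """Return the worker id owning each vector chunk, in chunk order.

    Hypothetical helper: chunks[i] is the number of vectors placed
    on worker_ids[i].
    """
    if len(chunks) != len(worker_ids):
        raise ValueError("need exactly one chunk count per worker")
    owners = []
    for worker, count in zip(worker_ids, chunks):
        owners.extend([worker] * count)
    return owners


# Worker addresses copied from the output printed earlier in the notebook
workers = [
    "tcp://127.0.0.1:54403",
    "tcp://127.0.0.1:54404",
    "tcp://127.0.0.1:54405",
    "tcp://127.0.0.1:54409",
]

# Method 1: a template replicated 3 + 4 + 6 + 2 = 15 times
owners_template = assign_chunks((3, 4, 6, 2), workers)
# Method 2: three vectors, with the third worker receiving none
owners_list = assign_chunks((1, 1, 0, 1), workers)

print(len(owners_template), owners_template[0])
print(owners_list)
```

Under these assumptions, the first chunk of the template-based vector sits on the first worker, and the third vector of the list-based one sits on the fourth worker because the third worker is assigned zero vectors.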


### Dask operators
Now, let's instantiate Dask operators. These objects are useful for solving large-scale problems. The main idea behind the interface is to pass an operator constructor and its required parameters so that the object is instantiated directly within the Dask workers of a client.

In [10]:
# Construct a simple scaling operator acting on each chunk of a Dask Vector
vec = occamypy.VectorIC((100,25))
chunks = (2,3,5,10)
sc = 10.0
vecD = occamypy.DaskVector(client, vector_template=vec, chunks=chunks)
# Creating a list of argument tuples (one per chunk) for the operator's constructor
scal_op_args = [(vec_i, sc) for vec_i in vecD.vecDask]

# Instantiating Dask operator
scaleOpD = occamypy.DaskOperator(client, occamypy.scalingOp, scal_op_args, chunks)

Like the Dask vector class, a Dask operator object inherits all the methods of the corresponding abstract class. Let's try some of those methods.

In [11]:
# Dot-product test
scaleOpD.dotTest(True)
# Power method
max_eig = scaleOpD.powerMethod()
print("\nMaximum eigenvalue = %s"%max_eig)

Dot-product tests of forward and adjoint operators
--------------------------------------------------
Applying forward operator add=False
 Runs in: 0.11086535453796387 seconds
Applying adjoint operator add=False
 Runs in: 0.11270880699157715 seconds
Dot products add=False: domain=2.459442e+02 range=2.459442e+02 
Absolute error: 3.126388e-13
Relative error: 1.271178e-15 

Applying forward operator add=True
 Runs in: 0.12662506103515625 seconds
Applying adjoint operator add=True
 Runs in: 0.12821316719055176 seconds
Dot products add=True: domain=4.918885e+02 range=4.918885e+02 
Absolute error: 3.979039e-13
Relative error: 8.089312e-16 

-------------------------------------------------

Maximum eigenvalue = 10.000000000000004
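
To make the two checks above concrete, here is a minimal NumPy sketch of a dot-product test and of a power iteration for a scaling operator. It is written from the textbook definitions and is not the occamypy implementation. The function names and the normalization of the error by the product of the norms are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
sc = 10.0
shape = (25, 100)


def forward(x):
    """Scaling operator: y = sc * x."""
    return sc * x


def adjoint(y):
    """Adjoint of a real scaling operator is the same scaling."""
    return sc * y


# Dot-product test: <A x, y> must equal <x, A^T y> for random x and y
x = rng.standard_normal(shape)
y = rng.standard_normal(shape)
dot_range = np.vdot(forward(x), y)
dot_domain = np.vdot(x, adjoint(y))
abs_err = abs(dot_range - dot_domain)
# Error scaled by the norms, which avoids dividing by a tiny dot product
scaled_err = abs_err / (np.linalg.norm(forward(x)) * np.linalg.norm(y))

# Power iteration: repeatedly apply A and renormalize
v = rng.standard_normal(shape)
v /= np.linalg.norm(v)
for _ in range(20):
    w = forward(v)
    eig = float(np.vdot(v, w))  # Rayleigh quotient, since ||v|| = 1
    v = w / np.linalg.norm(w)

print("scaled dot-product error = %s" % scaled_err)
print("estimated maximum eigenvalue = %s" % eig)
```

Since every vector is an eigenvector of a pure scaling, the iteration returns the scale factor up to rounding, which matches the value of about 10 reported above.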


Let's now try to apply this Dask operator.

In [12]:
vecD.rand()
vecD1 = scaleOpD.getRange().clone()
scaleOpD.forward(False, vecD, vecD1)
print("Norm of the input = %s"%vecD.norm())
print("Norm of the output = %s"%vecD1.norm())

Norm of the input = 128.78129341595502
Norm of the output = 1287.8129341595502


Finally, let's build an operator that spreads a local vector onto the chunks of a Dask vector and collects it back, and then combine it with the scaling operator. Such an operator is useful when the same vector is fed to several different operators that can run in an embarrassingly parallel fashion.

In [13]:
S = occamypy.DaskSpreadOp(client, vec, chunks)
S.dotTest(True) # checking dot-product

Dot-product tests of forward and adjoint operators
--------------------------------------------------
Applying forward operator add=False
 Runs in: 0.30388712882995605 seconds
Applying adjoint operator add=False
 Runs in: 0.0908358097076416 seconds
Dot products add=False: domain=3.663421e+01 range=3.663421e+01 
Absolute error: 2.131628e-14
Relative error: 5.818683e-16 

Applying forward operator add=True
 Runs in: 0.2698941230773926 seconds
Applying adjoint operator add=True
 Runs in: 0.07140302658081055 seconds
Dot products add=True: domain=7.326841e+01 range=7.326841e+01 
Absolute error: 1.421085e-14
Relative error: 1.939561e-16 

-------------------------------------------------
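
Mathematically, a spreading operator copies one local vector into every chunk, and its adjoint sums the chunks back into a single vector. The NumPy sketch below shows this pair and checks it with a dot-product test. It is an illustration under that assumption rather than the `DaskSpreadOp` source, and the names `spread_forward` and `spread_adjoint` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n_chunks = 2 + 3 + 5 + 10  # total number of chunks for chunks = (2, 3, 5, 10)
shape = (25, 100)


def spread_forward(x):
    """Copy the local vector into every chunk."""
    return [x.copy() for _ in range(n_chunks)]


def spread_adjoint(ys):
    """Sum all chunks back into one local vector."""
    return np.sum(ys, axis=0)


# Dot-product test: <S x, y> over all chunks must equal <x, S^T y>
x = rng.standard_normal(shape)
ys = [rng.standard_normal(shape) for _ in range(n_chunks)]

dot_range = sum(np.vdot(sx, y) for sx, y in zip(spread_forward(x), ys))
dot_domain = np.vdot(x, spread_adjoint(ys))

print("range  dot product = %s" % dot_range)
print("domain dot product = %s" % dot_domain)
```

One consequence of this pairing is that applying the adjoint after the forward returns the input multiplied by the number of chunks.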


In [14]:
# Chain of scaling and spreading operators
scale_S = scaleOpD*S
scale_S.dotTest(True) # checking dot-product
# Testing product of Dask Operators
x = vec.rand()
y = scale_S.getRange().clone()
scale_S.forward(False,x,y)
print("\nFirst element of x = %s"%x.getNdArray()[0,0])
print("First element of y = %s"%y.getNdArray()[0][0,0])

Dot-product tests of forward and adjoint operators
--------------------------------------------------
Applying forward operator add=False
 Runs in: 0.48972487449645996 seconds
Applying adjoint operator add=False
 Runs in: 0.2069098949432373 seconds
Dot products add=False: domain=4.147137e+02 range=4.147137e+02 
Absolute error: 6.252776e-13
Relative error: 1.507733e-15 

Applying forward operator add=True
 Runs in: 0.4320719242095947 seconds
Applying adjoint operator add=True
 Runs in: 0.23174500465393066 seconds
Dot products add=True: domain=8.294273e+02 range=8.294273e+02 
Absolute error: 1.591616e-12
Relative error: 1.918933e-15 

-------------------------------------------------

First element of x = 0.624938433540196
First element of y = 6.24938433540196
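
The factor of 10 between the two printed elements is what a chain of operators should produce: the forward applies the spreading first and the scaling second, while the adjoint applies the two adjoints in reverse order. The sketch below illustrates this with NumPy lists standing in for Dask chunks. All function names are hypothetical, and the code is not taken from occamypy.

```python
import numpy as np

rng = np.random.default_rng(3)
sc = 10.0
n_chunks = 20
shape = (25, 100)


def spread_forward(x):
    """Copy the local vector into every chunk."""
    return [x.copy() for _ in range(n_chunks)]


def spread_adjoint(ys):
    """Sum all chunks back into one local vector."""
    return np.sum(ys, axis=0)


def scale_chunks(ys):
    """Scale every chunk; this operator is its own adjoint."""
    return [sc * y for y in ys]


def chain_forward(x):
    """(scale * spread) x: spread first, then scale."""
    return scale_chunks(spread_forward(x))


def chain_adjoint(ys):
    """Adjoint of the chain: scale first, then collect."""
    return spread_adjoint(scale_chunks(ys))


x = rng.uniform(-1.0, 1.0, size=shape)
y = chain_forward(x)

# Dot-product test for the chained operator
r = [rng.standard_normal(shape) for _ in range(n_chunks)]
dot_range = sum(np.vdot(yi, ri) for yi, ri in zip(y, r))
dot_domain = np.vdot(x, chain_adjoint(r))

print("First element of x = %s" % x[0, 0])
print("First element of y = %s" % y[0][0, 0])
```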
