# Alchemist: A Python <=> MPI Interface

We start the Alchemist tutorial by exploring its Python interface. The interface itself runs on a single core, but it connects to Alchemist, which runs on multiple machines or cores and performs the distributed computations.

Let us use Alchemist to perform a distributed operation on a dataset. In line with the tutorials for the Ristretto package, we will compute the first 200 singular values of a large dataset, the 36GB ocean temperature HDF5 file. In Ristretto we did this by loading chunks of the data into memory. Here we will instead send the data to Alchemist (in chunks) and then fetch the results.
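
To make the idea of working through a matrix in chunks concrete, here is a minimal NumPy sketch. It is not how Ristretto or Alchemist compute singular values; it simply accumulates the Gram matrix A^T A one row block at a time and takes square roots of its largest eigenvalues. The function name `singular_values_chunked` and the random test matrix are made up for illustration, and the approach only makes sense when the number of columns is small enough for A^T A to fit in memory.

```python
import numpy as np

def singular_values_chunked(row_chunks, k):
    """Top-k singular values of a matrix supplied as an iterable of row blocks."""
    gram = None
    for chunk in row_chunks:
        chunk = np.asarray(chunk, dtype=np.float64)
        block = chunk.T @ chunk                  # contribution of this block to A^T A
        gram = block if gram is None else gram + block
    # The eigenvalues of A^T A are the squared singular values of A.
    eigvals = np.linalg.eigvalsh(gram)           # returned in ascending order
    top = np.clip(eigvals[::-1][:k], 0.0, None)  # largest k, guarded against tiny negatives
    return np.sqrt(top)

# Small random matrix standing in for a dataset that would not fit in memory
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 30))
chunks = (A[i:i + 100] for i in range(0, A.shape[0], 100))

s_chunked = singular_values_chunked(chunks, k=5)
s_direct = np.linalg.svd(A, compute_uv=False)[:5]
print(np.allclose(s_chunked, s_direct))
```

Forming A^T A squares the condition number, so this sketch loses accuracy for small singular values; it is meant only to show that a pass over row blocks can be enough.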

## Starting Alchemist

Let us start Alchemist:

1) Go to the Jupyter tab in your browser (the one to the left of this one)

2) Click on the 'New' button (top right)

3) Select 'Terminal' under 'Other' - this will open a new tab with a terminal

4) Enter 'cd /usr/local/Alchemist'

5) Enter './start.sh'

This will start Alchemist on your machine. You should see some preamble, concluding with "Accepting connections ...". This means that Alchemist is ready.

## Connecting to Alchemist

We now ignore the terminal and work in this notebook. First, let us load some dependencies:

In [1]:
from alchemist import *
import numpy as np

Next, we need to start an AlchemistSession instance. 

In [2]:
als = AlchemistSession()                  # Start AlchemistSession instance

Starting Alchemist session
Alchemist session ready


Connect to Alchemist server at address "0.0.0.0" and port 24960. 

Note 1: The Alchemist server would normally not be running locally.

Note 2: The Alchemist interface would usually read this information automatically from a file provided by the network administrator, but for the purposes of this tutorial we input the information directly.

In [3]:
address = "0.0.0.0"
port = 24960                     # Enter the correct port number here

als.connect_to_alchemist(address, port)   

Connecting to Alchemist at 0.0.0.0:24960 ...
Connected to Alchemist!


Now that we're connected to Alchemist, we need to request some workers. In general, if Alchemist is running on N nodes, then one of those will be the driver node, so there will be N-1 worker nodes. Some of these nodes could be used by other jobs, so we can ask Alchemist to list the status of the workers using the 'list_alchemist_workers()' command.

In [4]:
als.list_alchemist_workers()

List of workers:
    Worker-001 running on KaisMacBookPro.local at 0.0.0.0:24961 - idle
    Worker-002 running on KaisMacBookPro.local at 0.0.0.0:24962 - idle
    Worker-003 running on KaisMacBookPro.local at 0.0.0.0:24963 - idle



All workers are currently idle, which isn't surprising since nobody else is using this particular Alchemist instance. Let's request some of the available workers.

In [5]:
num_workers = 2                         # Try a sensible number of workers here

als.request_workers(num_workers)        # Request 'num_workers' workers from Alchemist

Requesting 2 workers from Alchemist
Allocated 2 workers:
  Worker-1 on KaisMacBookPro.local at 0.0.0.0:24961
  Worker-2 on KaisMacBookPro.local at 0.0.0.0:24962
Connecting to Alchemist at 0.0.0.0:24961 ...
Connected to Alchemist!
Connecting to Alchemist at 0.0.0.0:24962 ...
Connected to Alchemist!


As you can see by the output, the interface automatically connects to the allocated workers. 
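
To illustrate the bookkeeping described above (N nodes, one driver, N-1 workers, some of which may be busy), here is a small self-contained sketch. It is a toy model, not Alchemist's implementation: the `Worker` class, `make_workers`, and the `request_workers` function below are hypothetical, and the port numbering just mirrors the output shown in this notebook.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    worker_id: int
    address: str
    port: int
    status: str = "idle"

def make_workers(num_nodes, address="0.0.0.0", base_port=24960):
    """One node acts as the driver, leaving num_nodes - 1 workers."""
    return [Worker(i, address, base_port + i) for i in range(1, num_nodes)]

def request_workers(workers, num_requested):
    """Mark the first `num_requested` idle workers as busy and return them."""
    idle = [w for w in workers if w.status == "idle"]
    if num_requested > len(idle):
        raise ValueError(f"only {len(idle)} idle workers available")
    allocated = idle[:num_requested]
    for w in allocated:
        w.status = "busy"
    return allocated

workers = make_workers(num_nodes=4)               # 1 driver + 3 workers
allocated = request_workers(workers, 2)
print([(w.worker_id, w.port) for w in allocated]) # [(1, 24961), (2, 24962)]
print([w.status for w in workers])                # ['busy', 'busy', 'idle']
```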

This isn't necessary, but we can check the connections by sending a test string to each worker and displaying Alchemist's response.

In [6]:
als.workers.send_test_string()          # Send test string and display response

Sending test message: 'This is a test message from client 1'
Alchemist returned: 'Alchemist worker received: 'This is a test message from client 1''
Sending test message: 'This is a test message from client 1'
Alchemist returned: 'Alchemist worker received: 'This is a test message from client 1''
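
For readers curious what such a round trip looks like at the socket level, here is a minimal sketch using only the Python standard library. It does not use Alchemist's protocol or message format; the `echo_worker` function is a hypothetical stand-in for a worker, and a local socket pair replaces the real network connection.

```python
import socket
import threading

def echo_worker(conn):
    """Receive one message and send it back with a prefix (stand-in for a worker)."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"worker received: " + data)

# A connected pair of sockets stands in for the client/worker connection
client_sock, worker_sock = socket.socketpair()
thread = threading.Thread(target=echo_worker, args=(worker_sock,))
thread.start()

with client_sock:
    client_sock.sendall(b"This is a test message from client 1")
    reply = client_sock.recv(1024).decode()
thread.join()

print(reply)
```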


## Loading data and sending it to Alchemist

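We are now ready to load some data and have Alchemist work on it. We'll be using the 'sstHD.hdf5' dataset used in the Ristretto tutorial. The 'read_from_hdf5' function in Alchemist requires the file name as input. (The cell below was run with a different local file; the commented-out line shows the path for 'sstHD.hdf5'.)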

In [7]:
# Read H5 file
file_name = "/Users/kai/Downloads/NEON-DS-Imaging-Spectrometer-Data.h5"
# file_name = "/mnt/data/sstHD.hdf5"

f = als.read_from_hdf5(file_name)                # Fix this

Loaded /Users/kai/Downloads/NEON-DS-Imaging-Spectrometer-Data.h5


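There's only one dataset in the 'sstHD.hdf5' file, called 'sstHD', so let's extract it. (In the run shown here, a single 2D slice of the stand-in file's 'Reflectance' dataset is used instead; the commented-out line shows the 'sstHD' version.)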

In [8]:
# sstHD = f['sstHD']
sstHD = np.float64(f['Reflectance'][:,:,20])

The dataset can now be sent from the interface to Alchemist for processing. As before, it is assumed to be too large to fit in memory, so behind the scenes the dataset is split up into chunks and sent to Alchemist in pieces.

In [9]:
alA = als.send_hdf5(sstHD)

Sending matrix info to Alchemist ...
 
                         Client ID:           1
                         Session ID:          0
                         Command code:        16 (MATRIX_INFO)
                         Message body length: 885
                         ----------------------------------------------
                         Datatype (length):   SHORT (1)
                         Data:                1 
 
                         Datatype (length):   LONG (2)
                         Data:                426 502 
 
                         Datatype (length):   SHORT (426)
                         Data:                1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1
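
The SHORT (426) field in the output above alternates between 1 and 2, which is consistent with each of the 426 rows being assigned to one of the two allocated workers in turn. That reading is an interpretation of the output, not documented behavior. Assuming it is right, a round-robin row layout can be sketched as follows; `row_owners` and `split_rows` are hypothetical helper names.

```python
import numpy as np

def row_owners(num_rows, num_workers):
    """Worker ID (1-based) owning each row under a round-robin layout."""
    return [(i % num_workers) + 1 for i in range(num_rows)]

def split_rows(matrix, num_workers):
    """Group the rows of `matrix` by owning worker."""
    owners = np.array(row_owners(matrix.shape[0], num_workers))
    return {w: matrix[owners == w] for w in range(1, num_workers + 1)}

A = np.arange(12, dtype=np.float64).reshape(6, 2)   # tiny 6 x 2 example matrix
parts = split_rows(A, num_workers=2)

print(row_owners(6, 2))     # [1, 2, 1, 2, 1, 2]
print(parts[1])             # rows 0, 2, 4 go to worker 1
print(parts[2])             # rows 1, 3, 5 go to worker 2
```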

When a dataset gets sent to Alchemist, a matrix handle gets returned, in this case called 'alA'. This matrix handle includes meta-data such as the ID assigned to the dataset by Alchemist and the dimensions of the dataset. We can see this information using the handle's 'meta' function.

In [None]:
alA.meta()

Congratulations, you have successfully sent an HDF5 dataset to Alchemist! Now we can do something with it, but before that, we need to tell Alchemist what library has the MPI-based function that we want to use.

## Loading a library

To use Alchemist, we must have one or more MPI-based libraries in mind. We tell Alchemist to load the library using the 'load_library' command. 

Behind the scenes, every MPI-based library that Alchemist has access to is interfaced with an Alchemist-Library Interface (ALI), which is a shared object file that is loaded dynamically at run time by the workers allocated to the current job. In our case, we have 'num_workers' workers assigned to this job, and only these workers load the requested library. But we, as users, don't have to worry about any of this.

For the purposes of this tutorial, there is just one library available, a little testing library called 'TestLib' that has the distributed SVD implemented.

In [None]:
lib_name = "TestLib"

testlib = als.load_library(lib_name)    # The allocated workers load the library 'TestLib'

What gets returned here is a handle to the library that we have called 'testlib' in this case. This handle provides the Alchemist interface with meta-data about the library, such as available methods and their input and output parameters.
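
As a loose Python analogy for loading a library at run time and inspecting what it offers, the sketch below imports a module by name and lists its public callables. This is not how an ALI works (that is a shared object loaded by the workers); the `load_library` and `available_methods` functions here are hypothetical and use the standard `math` module purely as an example.

```python
import importlib

def load_library(name):
    """Import a module by name at run time and return it as the 'library handle'."""
    return importlib.import_module(name)

def available_methods(lib):
    """Names of the public callables the loaded module exposes."""
    return sorted(
        n for n in dir(lib)
        if not n.startswith("_") and callable(getattr(lib, n))
    )

lib = load_library("math")
methods = available_methods(lib)
print("sqrt" in methods)
print(lib.sqrt(16.0))
```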

## Running a task on Alchemist

Now we can finally run the SVD ... again! Because we clearly didn't get enough of it in the Ristretto tutorial.

The 'truncated_SVD' method is in the 'TestLib' library. AlchemistSession has a method 'run_task' that takes in the library handle, the name of the method as a string, and a variable length list of parameters that are the input parameters for the method. 

In the case of the 'truncated_SVD' method, it must know what matrix to operate on, and the number of singular values it should compute (called the 'rank'), and these need to be provided, in that order. 

The output of 'run_task' is a tuple with all the output parameters of the method. In the case of the SVD, these are the factors U and V, and the singular values (stored in the vector S). But we're not ready to receive all the output, nor would we necessarily want to have all of it. Alchemist instead returns matrix handles to all output matrices, so in this case we'd have the matrix handles 'alU', 'alS', and 'alV'.

In [None]:
lib_handle = testlib
method_name = "truncated_SVD"
mat_handle = alA
k = 20

alU, alS, alV = als.run_task(lib_handle, method_name, mat_handle, k)
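
To see what a call of this shape does conceptually, here is a local stand-in that runs entirely in NumPy. The `LocalLib` class and this `run_task` function are hypothetical and are not part of Alchemist or TestLib; they only illustrate looking up a method by its string name, passing a variable-length parameter list, and getting back the three outputs of a rank-k truncated SVD.

```python
import numpy as np

class LocalLib:
    """Hypothetical stand-in for a library handle; runs locally with NumPy."""

    @staticmethod
    def truncated_SVD(A, k):
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return U[:, :k], s[:k], Vt[:k].T      # keep only the leading k components

def run_task(lib, method_name, *args):
    """Look up `method_name` on `lib` and call it with the given parameters."""
    return getattr(lib, method_name)(*args)

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8))

U, S, V = run_task(LocalLib, "truncated_SVD", A, 3)
print(U.shape, S.shape, V.shape)              # (50, 3) (3,) (8, 3)
```

The difference in Alchemist is that the outputs stay on the workers and only handles come back, so nothing large is transferred until we ask for it.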

We'll pretend that we're only interested in the first k singular values, so we'll ignore the matrix handles to U and V. 

To get Alchemist to send the data to us, we call AlchemistSession's 'get_array' method with the appropriate matrix handle (in this case, 'alS').

In [None]:
S = als.get_array(alS)

We can now do whatever we want with these first k singular values. Let's plot them using matplotlib:

In [None]:
import matplotlib.pyplot as plt

print(S)

plt.plot(S)

We can also ask Alchemist to return subarrays to us. Let's ask it to return only 5 of the k singular values, here those in rows k-5 to k-1:

In [None]:
S = als.get_array(alS, rows=range(k-5,k))

print(S)
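
To see which entries a given range picks out, here is the same selection done on a plain NumPy array. This assumes the 'rows' argument uses zero-based indices like NumPy does, which the tutorial does not state; the values in `S_full` are made up.

```python
import numpy as np

k = 20
S_full = np.linspace(100.0, 5.0, k)          # made-up descending values standing in for S

first_five = S_full[list(range(0, 5))]       # indices 0..4: the five largest values
last_five = S_full[list(range(k - 5, k))]    # indices k-5..k-1: the five smallest values

print(first_five)
print(last_five)
```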

## Disconnecting from and stopping Alchemist

Once done with Alchemist, it is important to stop the AlchemistSession instance using the 'stop()' command. This disconnects from Alchemist and frees up resources that Alchemist can then allocate to other jobs.

In [None]:
als.stop()
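
Since forgetting to stop a session keeps workers tied up, one common pattern is to wrap the work in try/finally so that 'stop()' runs even if something fails along the way. The sketch below uses a hypothetical `DemoSession` stand-in rather than a real AlchemistSession, so it can be run without a server.

```python
class DemoSession:
    """Hypothetical stand-in for a session object; only tracks whether it was stopped."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

def run_job(session, job):
    """Run `job(session)` and stop the session afterwards, even on error."""
    try:
        return job(session)
    finally:
        session.stop()

session = DemoSession()
try:
    run_job(session, lambda s: 1 / 0)    # the job fails ...
except ZeroDivisionError:
    pass

print(session.stopped)                   # ... but the session was still stopped: True
```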

We are done with the current instance of Alchemist, so let us go back to the terminal in the other tab and stop this instance of Alchemist. There isn't an elegant way of stopping Alchemist (at least not from this interface), so we're going to have to kill it using brute force (i.e. Ctrl-C).

The intention is for Alchemist to keep running and be available for other jobs, but for the purposes of this tutorial, we're done with this instance.

For any questions regarding Alchemist, e-mail kai.rothauge@berkeley.edu