# Using cloudknot to perform matrix-vector multiplication of random matrices

This example uses cloudknot to perform matrix-vector multiplication of some random matrices with varying standard deviations.

In [1]:
import cloudknot as ck
import logging
logger = logging.getLogger()
logger.setLevel(
    logging.DEBUG
)  # Change this to logging.INFO if you want less verbose output

First, we write the Python function that we want to run on AWS Batch. Note that the Python packages it needs are imported inside the function `random_mv_prod` itself.

In [2]:
def random_mv_prod(b):    
    import numpy as np
    import pandas as pd
    import s3fs
    import json
    import logging
    import os.path as op
    import nibabel as nib
    import dipy.data as dpd
    import dipy.tracking.utils as dtu
    import dipy.tracking.streamline as dts
    from dipy.io.streamline import save_tractogram, load_tractogram
    from dipy.stats.analysis import afq_profile, gaussian_weights
    from dipy.io.stateful_tractogram import StatefulTractogram
    from dipy.io.stateful_tractogram import Space
    import dipy.core.gradients as dpg
    from dipy.segment.mask import median_otsu
    
    x = np.random.normal(0, b, 1024)
    A = np.random.normal(0, b, (1024, 1024))
    
    return np.dot(A, x)
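
Before sending anything to AWS, it can help to run the same computation locally. The sketch below is not part of the original workflow: `random_mv_prod_local`, its `n` and `seed` parameters, and the seeded generator are assumptions introduced here for a reproducible check. It confirms the output shape and compares the spread of the result with a back-of-the-envelope estimate.

```python
import numpy as np


def random_mv_prod_local(b, n=1024, seed=0):
    """Hypothetical local stand-in for `random_mv_prod` with a fixed seed."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0, b, n)       # random vector, standard deviation b
    A = rng.normal(0, b, (n, n))  # random matrix, standard deviation b
    return A @ x                  # same product as np.dot(A, x)


b = 2.0
y = random_mv_prod_local(b)

# Each entry of y sums n products of independent N(0, b^2) draws,
# so its standard deviation should be roughly sqrt(n) * b**2.
expected_std = np.sqrt(1024) * b**2
ratio = y.std() / expected_std
print(y.shape, ratio)
```

The ratio should be close to 1, which suggests that the spread of each job's output grows with the square of its input `b`.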

Create a knot using the `random_mv_prod` function and a job definition memory of 128 MiB.
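
As a rough check on the 128 MiB setting, the following sketch estimates how much memory the arrays in `random_mv_prod` occupy, assuming NumPy's default `float64` dtype. It counts only the arrays; the Python interpreter and the imported packages also need memory and are not included.

```python
import numpy as np

n = 1024
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per element

matrix_mib = n * n * itemsize / 2**20     # the (1024, 1024) matrix A
vector_kib = n * itemsize / 2**10         # the length-1024 vector x

print(f"A: {matrix_mib:.1f} MiB, x: {vector_kib:.1f} KiB")
```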

In [3]:
knot = ck.Knot(name='random-mv-prod-04', base_image="python:slim", func=random_mv_prod, memory=128, retries=3)

DEBUG:root:Found packages: {'json', 'logging', 'pickle', 'cloudpickle', 'os', 'nibabel', 'functools', 'dipy', 's3fs', 'boto3', 'numpy', 'argparse', 'pandas'}
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pypi.python.org:443
DEBUG:urllib3.connectionpool:https://iam.amazonaws.com:443 "POST / HTTP/1.1" 200 454
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): cloudknot-root-4e3592e1-36c8-4295-88f1-4a45a1f08bc6.s3.amazonaws.com:443
DEBUG:urllib3.connectionpool:https://pypi.python.org:443 "GET /pypi/cloudpickle/json HTTP/1.1" 301 122
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pypi.org:443
DEBUG:urllib3.connectionpool:https://pypi.org:443 "GET /pypi/cloudpickle/json HTTP/1.1" 200 14981
Please, verify manually the final list of requirements.txt to avoid possible dependency confusions.
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pypi.python.org:443
DEBUG:urllib3.connectionpool:https://pypi.python.org:443 "GET /pypi/nibabe

Submit 17 batch jobs to the knot, one for each value in the input array. The `map()` method returns a future for the results of the batch jobs (stored below as `result_future`). You can optionally supply a list of environment variables to each job.
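
To see what is being mapped over, the inputs can be inspected locally before submission. This sketch only builds the arguments; the variable names `scales` and `env_vars` are chosen here for illustration, and nothing is sent to AWS.

```python
import numpy as np

# One standard deviation per job: 17 evenly spaced values from 0.1 to 100
scales = np.linspace(0.1, 100, 17)

# Environment variables are given as a list of name/value dictionaries
env_vars = [{"name": "MY_ENV_VAR", "value": "foo"}]

step = scales[1] - scales[0]
print(len(scales), scales[0], scales[-1], step)
```

Each element of `scales` becomes the argument `b` of one call to `random_mv_prod`.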

In [4]:
# import numpy since it was only imported in the `random_mv_prod` function above
import numpy as np

In [5]:
# Submit the jobs
result_future = knot.map(np.linspace(0.1, 100, 17), env_vars=[{'name': 'MY_ENV_VAR', 'value': 'foo'}])

DEBUG:urllib3.connectionpool:Resetting dropped connection: cloudknot-root-4e3592e1-36c8-4295-88f1-4a45a1f08bc6.s3.amazonaws.com
DEBUG:urllib3.connectionpool:https://cloudknot-root-4e3592e1-36c8-4295-88f1-4a45a1f08bc6.s3.amazonaws.com:443 "PUT / HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://iam.amazonaws.com:443 "POST / HTTP/1.1" 200 454
DEBUG:urllib3.connectionpool:https://cloudknot-root-4e3592e1-36c8-4295-88f1-4a45a1f08bc6.s3.amazonaws.com:443 "PUT /?tagging HTTP/1.1" 204 0
DEBUG:urllib3.connectionpool:https://iam.amazonaws.com:443 "POST / HTTP/1.1" 409 376
DEBUG:urllib3.connectionpool:https://iam.amazonaws.com:443 "POST / HTTP/1.1" 200 2263
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/submitjob HTTP/1.1" 200 169
DEBUG:urllib3.connectionpool:https://cloudknot-root-4e3592e1-36c8-4295-88f1-4a45a1f08bc6.s3.amazonaws.com:443 "PUT /cloudknot.jobs/random-mv-prod-04-ck-jd/8eeaff40-3e8f-43c8-a181-0518954b3d28/input.pickle HTTP/1.1" 200 0
DEBUG:urllib

We can query the jobs associated with this knot by calling `knot.view_jobs()`, which prints detailed job info and provides a concise summary of job statuses.

In [6]:
# Rerun this cell as often as you like to update your job status info
knot.view_jobs()

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (2): batch.us-east-1.amazonaws.com:443
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1649
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1649


Job ID                                      Name                        Status
---------------------------------------------------------------------------------
8eeaff40-3e8f-43c8-a181-0518954b3d28        random-mv-prod-04-0         SUBMITTED


We can also inspect individual jobs through `knot.jobs`, which returns a list of `BatchJob` instances, one for each submitted job, e.g.:

In [7]:
last_job = knot.jobs[-1]

In [8]:
print(last_job.done)
print(last_job.result(timeout=15))

DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1649


False


DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1649
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819


CKTimeoutError: The job with job-id 8eeaff40-3e8f-43c8-a181-0518954b3d28 did not finish within the requested timeout period

DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HTTP/1.1" 200 1819
DEBUG:urllib3.connectionpool:https://batch.us-east-1.amazonaws.com:443 "POST /v1/describejobs HT

In [None]:
last_job.status

The object returned by `Knot.map()` behaves like a future, so you can use any of the futures methods to query the results, e.g. `done()` or `result()`.

In [None]:
print(result_future.done())

In [None]:
print(result_future.result())
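
The `done()` / `result(timeout=...)` pattern can be tried without AWS using the standard library. The sketch below is a local analogy only: `slow_square` is a hypothetical stand-in for a batch job, and the standard library raises `concurrent.futures.TimeoutError`, whereas the cloudknot job above raised `CKTimeoutError`.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def slow_square(v):
    """Hypothetical stand-in for a job that takes a while to finish."""
    time.sleep(0.5)
    return v * v


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_square, 3)

    # done() returns immediately and never blocks
    finished_immediately = future.done()

    # result(timeout=...) blocks for at most `timeout` seconds
    try:
        future.result(timeout=0.05)
        timed_out = False
    except TimeoutError:
        timed_out = True

    # Without a timeout, result() blocks until the work is finished
    value = future.result()

print(finished_immediately, timed_out, value)
```

A timeout does not cancel the underlying work; calling `result()` again later still returns the value once it is available.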

Once you're all done, clobber the knot, including the underlying PARS, the remote repo, and the Docker image.

In [None]:
knot.clobber(clobber_pars=True, clobber_repo=True, clobber_image=True)