# Using cloudknot to batch write files to an S3 bucket

This example uses cloudknot to run a function on AWS Batch that returns an array of varying length whose elements are all the string 'hello'.

In [1]:
import cloudknot as ck

First, we write the Python function that we want to run on AWS Batch. Note that we import the necessary Python packages within the function `hello_array`. Also note that we convert the input to an `int`. By default, all arguments are treated as strings because they arrive as command line arguments via the `docker run` command in the AWS ECS task. You can change this behavior by [converting arguments](https://clize.readthedocs.io/en/stable/basics.html#converting-arguments) (links to the `clize` docs).

In [2]:
def hello_array(length):
    import platform

    host = platform.node()
    result = {
        'host': host,
        'array': ['hello'] * int(length)
    }

    return result
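
Because every argument arrives as a string, it can help to check the function locally with a string input before sending it to AWS Batch. The following is a minimal sketch: it repeats the function from the cell above, calls it directly (not through cloudknot), and shows what would happen without the `int` conversion.

```python
def hello_array(length):
    import platform

    host = platform.node()
    return {
        'host': host,
        # int() is needed because the argument arrives as a string
        'array': ['hello'] * int(length),
    }


# Simulate the string argument that the command line would deliver
local_result = hello_array('3')
print(local_result['array'])

# Without the int() conversion, multiplying a list by a string fails
try:
    ['hello'] * '3'
except TypeError as err:
    print('TypeError:', err)
```

This local call only exercises the function body; it does not test the Docker image or the AWS Batch job definition.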

Create a knot using the `hello_array` function and a job definition memory of 1000 MiB.

In [3]:
#knot = ck.Knot(name='hello_array', func=hello_array, memory=1000)

# If you previously created this knot but didn't clobber it, then just supply
# the name in order to retrieve the knot info from the cloudknot config file
knot = ck.Knot(name='hello_array')

Submit 10 batch jobs to the knot. `commands` must be a sequence of commands, and each command must be a sequence of strings. In other words, `commands` must be a sequence of sequences of strings. For example, if you wanted to pass the three commands `echo 1`, `echo 2`, and `echo 3`, then you would pass `commands=[["echo", "1"], ["echo", "2"], ["echo", "3"]]`. The commands must be strings because they will eventually be passed to our command line interface via the `docker run` command.

In [4]:
n_commands = 10
commands = [[str(i)] for i in range(n_commands)]
print(commands)

[['0'], ['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]
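
The "sequence of sequences of strings" requirement can be sketched with two small helpers. Both `to_commands` and `is_valid_commands` are hypothetical names introduced here for illustration; they are not part of cloudknot. The sketch assumes that a function taking several arguments would receive one string per argument in each command.

```python
def to_commands(arg_tuples):
    """Convert an iterable of argument tuples into a list of lists of strings."""
    return [[str(arg) for arg in args] for args in arg_tuples]


def is_valid_commands(commands):
    """Return True if commands is a sequence of sequences of strings."""
    return all(
        isinstance(cmd, (list, tuple))
        and all(isinstance(arg, str) for arg in cmd)
        for cmd in commands
    )


# Two commands, each with two arguments (hypothetical values)
commands_2arg = to_commands([(1, 0.5), (2, 0.25)])
print(commands_2arg)

print(is_valid_commands(commands_2arg))  # every element is a string
print(is_valid_commands([[0], [1]]))     # ints are not strings
```

Checking the shape locally is cheaper than discovering a malformed command after the jobs have been submitted.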


In [5]:
result_futures = knot.map(commands=commands)

We can query the jobs associated with this knot by calling `knot.get_jobs()` and `knot.view_jobs()`.

`view_jobs()` prints a concise summary of job IDs, names, and statuses.

In [6]:
# Rerun this cell as often as you like to update your job status info
knot.view_jobs()

Job ID                                      Name                        Status
---------------------------------------------------------------------------------
23984207-6a30-458c-ba00-06bcb0b952bd        hello_array-0               SUCCEEDED
4ca741b3-9adb-4802-9d9e-233ff86eb264        hello_array-1               SUCCEEDED
976c1fa1-921e-4db0-aeb8-63a8a38467a6        hello_array-2               SUCCEEDED
0abb88df-74d2-40bb-a000-aac3fe385b33        hello_array-3               SUCCEEDED
0174b52a-c04e-457c-ae37-7a19db5da831        hello_array-4               SUCCEEDED
3f8c2483-954a-45fa-b0ae-fc04084c7d33        hello_array-5               SUCCEEDED
025fd87f-73d2-4b2a-8acd-09f063e06daf        hello_array-6               SUCCEEDED
56d5fdb2-27f4-473f-bff6-d661a7ffd692        hello_array-7               SUCCEEDED
9c04c7da-0927-469e-979c-eb371a319799        hello_array-8               SUCCEEDED
31f0b02f-11ad-4870-a4ab-5c7a17945f6d        hello_array-9               SUCCEEDED


`get_jobs()` returns a list of dictionaries of job info, each containing a `BatchJob` instance that you can interact with programmatically.

In [7]:
knot.get_jobs()

[{'attempts': [{'container': {'containerInstanceArn': 'arn:aws:ecs:us-east-1:455598791984:container-instance/70e858fb-ef54-4118-b230-7a004ecd16c0',
     'exitCode': 0,
     'logStreamName': 'hello_array-cloudknot-job-definition/default/1bd6b10d-9625-49a9-8c31-1b83855964a4',
     'taskArn': 'arn:aws:ecs:us-east-1:455598791984:task/1bd6b10d-9625-49a9-8c31-1b83855964a4'},
    'startedAt': 1510005348501,
    'statusReason': 'Essential container in task exited',
    'stoppedAt': 1510005349768}],
  'id': '23984207-6a30-458c-ba00-06bcb0b952bd',
  'job': <cloudknot.aws.batch.BatchJob at 0x10a6b19e8>,
  'name': 'hello_array-0',
  'status': 'SUCCEEDED',
  'status-reason': 'Essential container in task exited'},
 {'attempts': [{'container': {'containerInstanceArn': 'arn:aws:ecs:us-east-1:455598791984:container-instance/ed916765-2d0e-4c5a-ad17-6bb6b469561e',
     'exitCode': 0,
     'logStreamName': 'hello_array-cloudknot-job-definition/default/9ce2db7d-5ccf-454e-9b57-f30366086053',
     'taskArn':
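
Since each entry is a plain dictionary, the job info can be summarized with ordinary Python. The sketch below works on a hand-built list shaped like the output above: the first entry copies values from that output, and the second entry is hypothetical, made up to show a job with no attempts yet. `attempt_durations_ms` is a helper name introduced here, not a cloudknot function. AWS Batch reports `startedAt` and `stoppedAt` in milliseconds since the Unix epoch.

```python
from collections import Counter

jobs_info = [
    {   # Values copied from the first entry of the output above
        'id': '23984207-6a30-458c-ba00-06bcb0b952bd',
        'name': 'hello_array-0',
        'status': 'SUCCEEDED',
        'attempts': [{'startedAt': 1510005348501, 'stoppedAt': 1510005349768}],
    },
    {   # Hypothetical entry: a job that has not started running yet
        'id': 'hypothetical-id',
        'name': 'hello_array-x',
        'status': 'RUNNABLE',
        'attempts': [],
    },
]

# Count how many jobs are in each status
status_counts = Counter(job['status'] for job in jobs_info)
print(status_counts)


def attempt_durations_ms(job):
    """Return the run time of each attempt in milliseconds."""
    return [a['stoppedAt'] - a['startedAt'] for a in job['attempts']]


for job in jobs_info:
    print(job['name'], attempt_durations_ms(job))
```

With a live knot, replacing the hand-built list with the return value of `knot.get_jobs()` should work the same way, assuming the entries keep the keys shown in the output above.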

We can also inspect individual jobs through `knot.jobs`, which is a list of `BatchJob` instances, one per submitted job, e.g.:

In [8]:
last_job = knot.jobs[-1]

In [9]:
print(last_job.done)
print(last_job.result(timeout=5))

True
{'array': ['hello', 'hello', 'hello', 'hello', 'hello', 'hello', 'hello', 'hello', 'hello'], 'host': 'ip-172-31-63-96'}


In [11]:
print(result_futures[0].done())
print(result_futures[0].result())

True
{'array': [], 'host': 'ip-172-31-2-170'}
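
The `done()` and `result()` calls above follow the same pattern as futures in Python's standard `concurrent.futures` module. As a local stand-in (an analogy only, not how cloudknot runs jobs), the sketch below runs `hello_array` in a thread pool and collects the results with a timeout. It assumes the futures returned by the knot behave similarly.

```python
from concurrent.futures import ThreadPoolExecutor


def hello_array(length):
    import platform

    return {
        'host': platform.node(),
        'array': ['hello'] * int(length),
    }


commands = [[str(i)] for i in range(3)]

with ThreadPoolExecutor(max_workers=2) as pool:
    # One future per command; each command is unpacked into string arguments
    local_futures = [pool.submit(hello_array, *cmd) for cmd in commands]
    # result() blocks until the job finishes or the timeout (seconds) expires
    local_results = [f.result(timeout=5) for f in local_futures]

print(all(f.done() for f in local_futures))
print([len(r['array']) for r in local_results])
```

Passing a `timeout` keeps a notebook cell from blocking indefinitely if a job is still queued or running.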


Once you're all done, clobber the knot, including the underlying PARS, the remote repo, and the Docker image.

In [None]:
knot.clobber(clobber_pars=True, clobber_repo=True, clobber_image=True)