# Using cloudknot to batch write files to an S3 bucket

This example uses cloudknot to return an array of varying length whose elements are all 'hello' strings.

In [1]:
import cloudknot as ck

First, we write the Python function that we want to run on AWS Batch. Note that we import the necessary Python packages within the function `hello_array`. Also note that we convert the input to an `int`. By default, all arguments are treated as strings because they arrive as command line arguments via the `docker run` command in the AWS ECS task. You can change this behavior by [converting arguments](https://clize.readthedocs.io/en/stable/basics.html#converting-arguments) (links to the `clize` docs).

In [2]:
def hello_array(length):
    import platform

    host = platform.node()
    result = {
        'host': host,
        'array': ['hello'] * int(length)
    }

    return result
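As a quick local check of the string-to-`int` conversion described above, the sketch below calls the function directly with a string argument, the way it would arrive from the command line. It runs entirely on your own machine and does not touch AWS; the function is repeated only so that the block is self-contained.

```python
def hello_array(length):
    import platform

    host = platform.node()
    result = {
        'host': host,
        # length arrives as a string from the command line, so convert it
        'array': ['hello'] * int(length)
    }

    return result


# Simulate the string argument that the command line would pass in
local_result = hello_array('3')
print(local_result['array'])
# ['hello', 'hello', 'hello']

# Without the int() conversion, multiplying a list by a string raises TypeError
try:
    ['hello'] * '3'
except TypeError as err:
    print('TypeError:', err)
```

Testing the function locally like this before creating the knot is a cheap way to catch argument-handling mistakes.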

Create a knot using the `hello_array` function and a job definition memory of 1000 MiB.

In [3]:
knot = ck.Knot(name='hello_array', func=hello_array, memory=1000)

# If you previously created this knot but didn't clobber it, then just supply
# the name in order to retrieve the knot info from the cloudknot config file
# knot = ck.Knot(name='hello_array')

Submit 10 batch jobs to the knot. `commands` must be a sequence of commands, and each command must be a sequence of strings. In other words, `commands` must be a sequence of sequences of strings. For example, if you wanted to pass three commands `echo 1`, `echo 2`, `echo 3`, then you would use `submit(commands=[["echo", "1"], ["echo", "2"], ["echo", "3"]])`. The commands must be strings because they will eventually be passed to our command line interface via the `docker run` command.

In [4]:
n_commands = 10
commands = [[str(i)] for i in range(n_commands)]
print(commands)

[['0'], ['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]
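The same shape applies when each command carries more than one argument: every inner list holds one string per argument. The following is a minimal sketch for a hypothetical two-argument function; `to_commands` and `param_grid` are names invented for illustration and are not part of cloudknot.

```python
def to_commands(argument_tuples):
    """Convert an iterable of argument tuples into a list of lists of strings."""
    return [[str(arg) for arg in args] for args in argument_tuples]


# Hypothetical argument pairs for an imagined two-argument function
param_grid = [(1, 2), (3, 4), (5, 6)]
multi_commands = to_commands(param_grid)
print(multi_commands)
# [['1', '2'], ['3', '4'], ['5', '6']]

# Check the required shape: a sequence of sequences of strings
assert all(isinstance(arg, str) for cmd in multi_commands for arg in cmd)
```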


In [5]:
result_futures = knot.map(commands=commands)

We can query the jobs associated with this knot by calling `knot.get_jobs()` and `knot.view_jobs()`.

`view_jobs()` prints a concise table of job IDs, names, and statuses.

In [6]:
# Rerun this cell as often as you like to update your job status info
knot.view_jobs()

Job ID              Name                        Status   
---------------------------------------------------------
12f3e2f4-eec7-4b94-ae04-ce5c59429031        hello_array-0               SUCCEEDED
dc752333-adb7-41f0-a327-4bc2897a51b4        hello_array-1               SUCCEEDED
5c517986-f7b4-4fc0-ac80-6a91a77fd7a8        hello_array-2               SUCCEEDED
9a6f1a8c-b9e1-46eb-a9b5-5ee01cbbc345        hello_array-3               SUCCEEDED
a4c14239-a9a4-48d9-8fda-4d6d9a3556da        hello_array-4               SUCCEEDED
be9e1bbd-c233-4dd2-83bd-fe6c6bf7f429        hello_array-5               SUCCEEDED
80add267-5774-4cdf-8f99-dd7e431fe4ea        hello_array-6               SUCCEEDED
069121fd-03df-4540-92be-1e115bdbf2ad        hello_array-7               SUCCEEDED
b1519e12-73f5-48f9-974b-927a302081aa        hello_array-8               SUCCEEDED
11385165-e746-4ca5-8d36-f29536a27926        hello_array-9               SUCCEEDED


`get_jobs()` returns a list of dictionaries, one per job, each containing job info and a `BatchJob` instance that you can interact with programmatically.

In [7]:
knot.get_jobs()

[{'attempts': [{'container': {'containerInstanceArn': 'arn:aws:ecs:us-east-1:455598791984:container-instance/ad1a7d9c-9286-442f-87be-9bb602a30de1',
     'exitCode': 0,
     'logStreamName': 'hello_array-cloudknot-job-definition/default/ee801bf5-d8a3-47c3-ad7f-752ee16882b2',
     'taskArn': 'arn:aws:ecs:us-east-1:455598791984:task/ee801bf5-d8a3-47c3-ad7f-752ee16882b2'},
    'startedAt': 1510016241929,
    'statusReason': 'Essential container in task exited',
    'stoppedAt': 1510016243401}],
  'id': '12f3e2f4-eec7-4b94-ae04-ce5c59429031',
  'job': <cloudknot.aws.batch.BatchJob at 0x10679cac8>,
  'name': 'hello_array-0',
  'status': 'SUCCEEDED',
  'status-reason': 'Essential container in task exited'},
 {'attempts': [{'container': {'containerInstanceArn': 'arn:aws:ecs:us-east-1:455598791984:container-instance/ad1a7d9c-9286-442f-87be-9bb602a30de1',
     'exitCode': 0,
     'logStreamName': 'hello_array-cloudknot-job-definition/default/19e01977-810a-48db-beff-991aa0df1c03',
     'taskArn':
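Because each entry is a plain dictionary, the list can be summarized with ordinary Python. The sketch below works on a small stand-in list rather than a live `knot.get_jobs()` call: the first entry copies a few keys from the output above, and the second entry is invented for illustration. It also assumes that `startedAt` and `stoppedAt` are milliseconds since the epoch.

```python
from collections import Counter

jobs_info = [
    {   # a few keys copied from the first entry of the output above
        'attempts': [{'startedAt': 1510016241929, 'stoppedAt': 1510016243401}],
        'name': 'hello_array-0',
        'status': 'SUCCEEDED',
    },
    {   # hypothetical entry, timestamps invented for illustration
        'attempts': [{'startedAt': 1510016242000, 'stoppedAt': 1510016243000}],
        'name': 'hello_array-1',
        'status': 'SUCCEEDED',
    },
]

# Count how many jobs are in each status
status_counts = Counter(job['status'] for job in jobs_info)
print(status_counts)


def runtime_seconds(job):
    """Run time of the most recent attempt, assuming millisecond timestamps."""
    last_attempt = job['attempts'][-1]
    return (last_attempt['stoppedAt'] - last_attempt['startedAt']) / 1000.0


for job in jobs_info:
    print(job['name'], runtime_seconds(job))
```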

We can also inspect individual `BatchJob` instances through `knot.jobs`, which is a list containing one `BatchJob` instance per submitted job, e.g.:

In [8]:
last_job = knot.jobs[-1]

In [9]:
print(last_job.done)
print(last_job.result(timeout=5))

True
{'array': ['hello', 'hello', 'hello', 'hello', 'hello', 'hello', 'hello', 'hello', 'hello'], 'host': 'ip-172-31-32-84'}


In [10]:
print(last_job.log_urls)

['https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/batch/job;stream=hello_array-cloudknot-job-definition/default/f1f4bdc3-8a33-448f-b4b4-979ca31279a5']


In [11]:
print(result_futures[0].done())
print(result_futures[0].result())

True
{'array': [], 'host': 'ip-172-31-32-84'}
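The empty array is expected here, since the first command is `['0']`. To preview the results for all ten commands without AWS, the sketch below runs the same function locally using the standard library's `ThreadPoolExecutor`. It is only a stand-in for illustration and is not how cloudknot executes jobs; `local_futures` and `array_lengths` are names introduced for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def hello_array(length):
    import platform

    return {'host': platform.node(), 'array': ['hello'] * int(length)}


commands = [[str(i)] for i in range(10)]

# Each command is a list of string arguments, unpacked into the function call
with ThreadPoolExecutor(max_workers=4) as executor:
    local_futures = [executor.submit(hello_array, *cmd) for cmd in commands]

# Leaving the with-block waits for all submitted work to finish
array_lengths = [len(future.result()['array']) for future in local_futures]
print(array_lengths)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```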


Once you're all done, clobber the knot, including the underlying PARS, the remote repo, and the image.

In [12]:
knot.clobber(clobber_pars=True, clobber_repo=True, clobber_image=True)