# Using cloudknot to batch write files to an S3 bucket

This example uses cloudknot to write files to an Amazon S3 bucket.

In [1]:
import cloudknot as ck

First, we write the Python function that we want to run on AWS Batch. Note that the necessary Python packages are imported inside the function `write_to_bucket` itself. Change `bucket_name` to the S3 bucket that you would like to write to.

In [2]:
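def write_to_bucket(index):
    import boto3
    import platform

    # boto3.resource returns a high-level S3 resource (not a low-level client)
    s3 = boto3.resource('s3')

    # Name of the machine that runs this job
    host = platform.node()

    # Write a small text file named after the job index
    fn = 'temp_{i:03d}.txt'.format(i=int(index))
    with open(fn, 'w') as f:
        f.write("Hello World from index {i:s} on host {host:s}!".format(
            i=str(index), host=host))

    # Upload the file, using the local file name as the S3 object key
    bucket_name = 'escience.washington.edu.public'
    b = s3.Bucket(bucket_name)
    b.upload_file(fn, fn)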
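
To see what each job will produce before anything is sent to AWS, the string formatting from `write_to_bucket` can be run locally. The following is a minimal sketch: `describe_output` is a hypothetical helper that is not part of cloudknot, and the default host name is a placeholder. It takes the index as a string, which is how the commands submitted later in this example supply it.

```python
def describe_output(index, host="example-host"):
    """Return the (file name, message) pair that write_to_bucket builds.

    Hypothetical helper for local inspection only; it does not touch S3.
    """
    fn = 'temp_{i:03d}.txt'.format(i=int(index))
    msg = "Hello World from index {i:s} on host {host:s}!".format(
        i=str(index), host=host)
    return fn, msg


# Indices are given as strings, matching the command-line arguments
for index in ['0', '7', '42']:
    print(describe_output(index))
```

The `03d` format pads the index to three digits, so the uploaded object keys sort in job order.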

By default, cloudknot does not attach any additional policies to its IAM roles. Since we are writing to an S3 bucket, our IAM roles need S3 access, so we add the `AmazonS3FullAccess` policy to the roles in the PARS (persistent AWS resource set) that our knot is based on. First we try to create or retrieve the default VPC for this PARS. As a fallback option, we create a non-default VPC.

In [3]:
s3_access_pars = ck.Pars(name='s3_access', policies=('AmazonS3FullAccess',))

Create a knot using the `write_to_bucket` function and the PARS that we just created.

In [4]:
knot = ck.Knot(name='test_s3_knot', func=write_to_bucket, pars=s3_access_pars)

# If you previously created this knot but didn't clobber it, then just supply
# the name in order to retrieve the knot info from the cloudknot config file
# knot = ck.Knot(name='test_s3_knot')

Submit 10 batch jobs to the knot. `commands` must be a sequence of commands, and each command must be a sequence of strings. In other words, `commands` must be a sequence of sequences of strings. For example, to pass the three commands `echo 1`, `echo 2`, and `echo 3`, you would use `submit(commands=[["echo", "1"], ["echo", "2"], ["echo", "3"]])`. The commands must be strings because they will eventually be passed to our command line interface via the `docker run` command.

In [5]:
n_commands = 10
commands = [[str(i)] for i in range(n_commands)]
print(commands)

[['0'], ['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']]
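
A common mistake is to pass a flat list of strings, or a list of integers, instead of a sequence of sequences of strings. The sketch below checks the shape locally before submitting. `check_commands` is a hypothetical helper written for this example, not a cloudknot function, and cloudknot may perform its own validation.

```python
from collections.abc import Sequence


def check_commands(commands):
    """Raise TypeError unless commands is a sequence of sequences of strings.

    Hypothetical helper; returns commands unchanged when the shape is valid.
    """
    # A bare string is itself a Sequence, so reject it explicitly
    if isinstance(commands, str) or not isinstance(commands, Sequence):
        raise TypeError("commands must be a sequence of commands")
    for command in commands:
        if isinstance(command, str) or not isinstance(command, Sequence):
            raise TypeError(
                "each command must be a sequence of strings: {!r}".format(command))
        if not all(isinstance(arg, str) for arg in command):
            raise TypeError(
                "every argument must be a string: {!r}".format(command))
    return commands


commands = check_commands([[str(i)] for i in range(10)])
```

Both `check_commands(['0', '1'])` and `check_commands([[0], [1]])` raise `TypeError`: the first is a flat list of strings and the second contains integers rather than strings.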


In [None]:
knot.submit(commands=commands)

We can query the jobs associated with this knot by calling `knot.get_jobs()` and `knot.view_jobs()`. `get_jobs()` returns a dictionary of job info with `BatchJob` instances that you can interact with programmatically.

In [None]:
knot.get_jobs()

`view_jobs()` prints the job info and provides a more concise summary of job statuses.

In [None]:
# Rerun this cell as often as you like to update your job status info
knot.view_jobs()

To check the results, log in to the S3 console at https://s3.console.aws.amazon.com/s3/home and verify that cloudknot created one text file per job in your S3 bucket.
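
The same check can be scripted with boto3 instead of the console. This is a sketch under stated assumptions: `missing_keys` and `check_bucket` are hypothetical helpers, the jobs are assumed to have written the `temp_NNN.txt` names produced by `write_to_bucket`, and your local AWS credentials are assumed to have permission to list the bucket.

```python
def missing_keys(bucket, n_commands=10, prefix='temp_'):
    """Return the expected file names that are not yet in the bucket.

    `bucket` is a boto3 S3 Bucket resource (or any object that offers the
    same `objects.filter(Prefix=...)` interface).
    """
    expected = {'temp_{i:03d}.txt'.format(i=i) for i in range(n_commands)}
    found = {obj.key for obj in bucket.objects.filter(Prefix=prefix)}
    return sorted(expected - found)


def check_bucket(bucket_name, n_commands=10):
    """List missing output files in the named bucket (requires AWS access)."""
    # Imported here so that missing_keys can be exercised without boto3
    import boto3

    bucket = boto3.resource('s3').Bucket(bucket_name)
    return missing_keys(bucket, n_commands=n_commands)
```

Calling `check_bucket` with your own bucket name returns an empty list once all ten jobs have uploaded their files.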

Once you're all done, clobber the knot, including the underlying PARS and the remote repo.

In [6]:
knot.clobber(clobber_pars=True, clobber_repo=True)

