# AWS system commands
I use this notebook to test, develop, record, and document system administration commands that are useful for managing the AWS Batch integration with various GenePattern Server instances. Most examples run locally as shell commands by way of the ```%%bash``` built-in cell magic.

### initialize awscli

In [None]:
%%bash -e
# fail on first error
#   bash -e ...
#   set -e
source activate awscli
which aws

## aws batch commands

### describe job queues
command: [describe-job-queues](https://docs.aws.amazon.com/cli/latest/reference/batch/describe-job-queues.html)

In [None]:
%%bash -e
# list all job queues
aws batch describe-job-queues

### describe job definitions
command: [describe-job-definitions](https://docs.aws.amazon.com/cli/latest/reference/batch/describe-job-definitions.html)

In [None]:
%%bash -e
aws batch describe-job-definitions

In [None]:
%%bash -e

# show all job definitions, sorted 
#   sorted by jobDefinitionName asc
#   sorted by revision descending
#   see: https://stackoverflow.com/questions/39203630/linux-shell-sort-column-1-in-ascending-order-column-3-in-descending-order
aws batch describe-job-definitions \
  --query 'jobDefinitions[*].[jobDefinitionName,revision,containerProperties.image]' \
  --output text \
| sort -k1,1f -k2,2nr
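
To see what the two sort keys do without calling AWS, the same `sort` options can be tried on a few made-up rows. This is only a sketch: the job definition names and image tags below are hypothetical, and `LC_ALL=C` is an added assumption to keep the ordering the same across locales.

```bash
# hypothetical rows in the same three-column, tab-separated layout
# as the '--output text' result (name, revision, image)
rows=$(printf '%s\t%s\t%s\n' \
  "beta_job"  1  "example/image:1" \
  "alpha_job" 2  "example/image:2" \
  "alpha_job" 10 "example/image:10")

# key 1: name, ascending, case-insensitive
# key 2: revision, numeric, descending
sorted=$(printf '%s\n' "${rows}" | LC_ALL=C sort -k1,1f -k2,2nr)
printf '%s\n' "${sorted}"
```

With these keys the highest revision of each job definition is listed first within its name group.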


In [None]:
%%bash -e
aws batch describe-job-definitions \
  --query 'jobDefinitions[*].{jobDefinitionName:jobDefinitionName}'

In [None]:
%%bash -e
aws batch describe-job-definitions \
  --query 'jobDefinitions[*].{jobDefinitionName:jobDefinitionName,containerProperties_image:containerProperties.image}' \
  --output text | sort

### get container details
For a given completed GenePattern job, how do I know which version of the Docker image was used?

In [None]:
%%bash -e
job_id="73543554-6065-4b06-b540-4c8b0521bc2b"
aws batch describe-jobs --jobs ${job_id}
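
One way to answer this is to narrow the `describe-jobs` output to the image field with `--query`. The following is a minimal sketch, assuming the job is still returned by `describe-jobs`. `job_image` is a hypothetical helper name, and `container.image` is the same field used by the failed-job queries elsewhere in this notebook.

```bash
# print the container image for one AWS Batch job id
# (job_image is a hypothetical helper name, not part of the aws cli)
job_image() {
  aws batch describe-jobs --jobs "$1" \
    --query 'jobs[*].container.image' \
    --output text
}

# usage:
#   job_image "73543554-6065-4b06-b540-4c8b0521bc2b"
```

The image tag or digest in the output identifies the version that the job ran with.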

### list failed jobs
command: [list-jobs](https://docs.aws.amazon.com/cli/latest/reference/batch/list-jobs.html)

Example:
```bash
aws batch list-jobs --job-status FAILED --job-queue my-job-queue
```
`--job-status` = SUBMITTED | PENDING | RUNNABLE | STARTING | RUNNING | SUCCEEDED | FAILED

In [None]:
# must set jobQueue as a python variable
jobQueue = "gpbeta-default"

In [None]:
%%bash -e -s "$jobQueue"

# set bash variable from python variable
jobQueue="${1}"

# init awscli
source activate awscli

# list failed jobs
aws batch list-jobs \
  --job-queue "$jobQueue" \
  --job-status FAILED

### use-case: job details for a failed job
Given a GenePattern job number (e.g. 11813), make the necessary aws batch cli calls to track down the error message.

Example command ('?jobName=='):
```bash
aws batch list-jobs \
  --job-queue gpbeta-default \
  --job-status FAILED \
  --query 'jobSummaryList[?jobName==`GP_Job_11813`]'
```

Example command ('?ends_with'):
```bash
aws batch list-jobs \
  --job-queue gpbeta-default \
  --job-status FAILED \
  --query 'jobSummaryList[?ends_with(jobName,`_11813`) == `true`].{jobId:jobId}'
```

Example command (with xargs):
```bash
aws batch list-jobs --job-queue gpbeta-default \
  --job-status 'FAILED' \
  '--query' "jobSummaryList[*].{jobId:jobId}" \
  --output text \
| xargs -I{} \
  aws batch describe-jobs --jobs {} \
    --query "jobs[*].{jobName:jobName,image:container.image,reason:container.reason}" \
    --output table
```

In [None]:
# must set gpJobNo
gpJobNo = "11458"

In [None]:
%%bash -e -s "$gpJobNo"

# set bash variable from python variable
gpJobNo="${1}"
jobName="GP_Job_${gpJobNo}"

# init awscli
source activate awscli

#
# step 1: get the aws batch jobId from the gp jobId
#
jobId=$(aws batch list-jobs \
  '--job-queue' 'gpbeta-default' \
  '--job-status' 'FAILED' \
  '--query' "jobSummaryList[?ends_with(jobName,\`_$gpJobNo\`) == \`true\`].{jobId:jobId}" \
  '--output' 'text')
echo "jobId=$jobId"

#
# step 2: get the job details, including the container exit reason
#
aws batch describe-jobs --jobs $jobId

# alternative queries
# take 1
#   '--query' 'jobSummaryList[?jobName==`GP_Job_11813`]'
# take 2
#   query="jobSummaryList[?jobName==\`$jobName\`]"
#   '--query' "${query}"
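
Escaping backticks inside a double-quoted `--query` string is easy to get wrong. As a sketch of one alternative, a small helper can build the JMESPath filter with `printf`, where the single-quoted format keeps the backticks literal. `job_query` is a hypothetical helper name; the filter itself is the `ends_with` expression from the cell above.

```bash
# build the JMESPath filter for a GenePattern job number
# (job_query is a hypothetical helper name)
job_query() {
  printf 'jobSummaryList[?ends_with(jobName,`_%s`) == `true`].{jobId:jobId}' "$1"
}

query="$(job_query "11813")"
echo "${query}"
```

The result can then be passed to `aws batch list-jobs` as `--query "${query}"`.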

In [None]:
%%bash -e

#
# list recently FAILED jobs
#
aws batch list-jobs --job-queue gpbeta-default \
  --job-status 'FAILED' \
  '--query' "jobSummaryList[*].{jobId:jobId}" \
  --output text \
| xargs -I{} \
  aws batch describe-jobs --jobs {} \
    --query "jobs[*].{jobName:jobName,image:container.image,reason:container.reason}" \
    --output text
#    \
#    --output table

## aws s3 commands

### aws s3 sync

template:  

```bash
aws s3 sync <LocalPath> <S3Uri> [--exclude exclude-pattern] [--include include-pattern] [--profile <AwsProfile>]
```

<div class="alert alert-block alert-info">
For the GenePattern AWS Batch integration, files are copied from the local file system to an S3 bucket with an optional prefix, e.g. <br/>
&nbsp;&nbsp;&nbsp;&nbsp;<b>S3Prefix</b>=s3://gpbeta <br/><br/>
The LocalPath is the fully qualified path on the server head node, e.g. <br/>
&nbsp;&nbsp;&nbsp;&nbsp;<b>LocalPath</b>=/opt/gp/gp_home/jobResults/1 <br/><br/>
The S3Uri is the S3Prefix prepended to the LocalPath, <br/>
&nbsp;&nbsp;&nbsp;&nbsp;<b>S3Uri</b>=&lt;<b>S3Prefix</b>&gt;&lt;<b>LocalPath</b>&gt;, e.g.,<br/>
&nbsp;&nbsp;&nbsp;&nbsp;<b>S3Uri</b>=s3://gpbeta/opt/gp/gp_home/jobResults/1 <br/>
</div>

See the [aws s3 reference](https://docs.aws.amazon.com/cli/latest/reference/s3/index.html) for more details about the [path-argument-type](https://docs.aws.amazon.com/cli/latest/reference/s3/index.html#path-argument-type).
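
The S3Uri rule described above can be sketched as a small shell helper. `s3_uri` is a hypothetical name, and trimming a trailing slash from the prefix is an added assumption to avoid a doubled `/` in the result.

```bash
# join an S3 prefix and a fully qualified local path into an S3Uri
# (s3_uri is a hypothetical helper name)
s3_uri() {
  local s3Prefix="${1%/}"   # assumption: drop one trailing slash, if any
  local localPath="$2"
  printf '%s%s\n' "${s3Prefix}" "${localPath}"
}

s3_uri "s3://gpbeta" "/opt/gp/gp_home/jobResults/1"
# prints: s3://gpbeta/opt/gp/gp_home/jobResults/1
```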

### directory upload  
template:  

```
aws s3 sync <LocalPath> <S3Prefix><LocalPath> \
  --exclude ".DS_Store" \
  --exclude "*~"
```

example:
```
localPath="/shared_data/gp_tutorial_files/all_aml"
s3Prefix="s3://gpbeta"
aws s3 sync "${localPath}" "${s3Prefix}${localPath}" \
  --exclude ".DS_Store" \
  --exclude "*~"
```

### file upload  

template:  

```
aws s3 sync <LocalDir> <S3Prefix><LocalDir> \
  --exclude "*" \
  --include "<FileName>" \
  [--profile <AwsProfile>]

```

example:  

```
aws s3 sync \
  /shared_data/gp_tutorial_files/all_aml \
  s3://gpbeta/shared_data/gp_tutorial_files/all_aml \
  --exclude "*" \
  --include all_aml_test.gct
```

example (with variables):  

```
localPath="/shared_data/gp_tutorial_files/all_aml/all_aml_test.gct"
localDir="/shared_data/gp_tutorial_files/all_aml"
fileName="all_aml_test.gct"
s3Prefix="s3://gpbeta"

aws s3 sync \
  "${localDir}" \
  "${s3Prefix}${localDir}" \
  --exclude "*" \
  --include "${fileName}"
```
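
In the example with variables, `localDir` and `fileName` repeat parts of `localPath`. As a sketch, both can be derived with the standard `dirname` and `basename` utilities so that only `localPath` needs to be set by hand.

```bash
localPath="/shared_data/gp_tutorial_files/all_aml/all_aml_test.gct"

# derive the parent directory and the file name from the full path
localDir="$(dirname "${localPath}")"
fileName="$(basename "${localPath}")"

echo "localDir=${localDir}"
echo "fileName=${fileName}"
```

The two derived values can be used in the `aws s3 sync` call exactly as in the example above.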


### delete directory

template:  
```
aws s3 rm <S3Uri> --recursive
aws s3 rm <S3Prefix><LocalDir> --recursive
```


example:  

```
# init variables
localDir="/shared_data/gp_tutorial_files/all_aml"
s3Prefix="s3://gpbeta"
s3Uri="${s3Prefix}${localDir}"

# sanity check
aws s3 ls "${s3Uri}"
# check exitCode, expecting 0 when the directory exists
echo $?

# remove the directory from s3
aws s3 rm "${s3Uri}" --recursive

# sanity check
aws s3 ls "${s3Uri}"
# check exitCode, expecting non-zero when the directory does not exist
echo $?
```
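
The exit-code checks from the example can also be folded into the control flow, so that the delete only runs when the listing succeeds. This is a sketch rather than a tested procedure: `s3_exists` and `s3_rm_dir` are hypothetical helper names, and the logic relies on `aws s3 ls` returning a non-zero exit code when nothing matches, as described in the comments above.

```bash
# succeed when 'aws s3 ls' finds something at the given S3Uri
# (s3_exists is a hypothetical helper name)
s3_exists() {
  aws s3 ls "$1" > /dev/null 2>&1
}

# remove an S3 "directory" only if it is present
# (s3_rm_dir is a hypothetical helper name)
s3_rm_dir() {
  if s3_exists "$1"; then
    aws s3 rm "$1" --recursive
  else
    echo "not found: $1" >&2
    return 1
  fi
}

# usage:
#   s3_rm_dir "s3://gpbeta/shared_data/gp_tutorial_files/all_aml"
```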


## SQL Queries
Connect to the MySQL database with the ```pymysql.cursors``` library.

* see: https://pymysql.readthedocs.io/en/latest/index.html

In [None]:
#
# Note: must change to actual credentials
#
db_host = "my-db-host"
db_port = 3306
db_user = "mydbuser"
db_password = "mydbpassword"
db_schema = "mydbschema"
%who

In [None]:
import pymysql.cursors

# connect to the database
db = pymysql.connect(
    host=db_host,
    port=db_port,
    user=db_user,
    password=db_password,
    database=db_schema
)

try:
    with db.cursor() as cursor:
        sql = "SELECT user_id, email FROM gp_user"
        cursor.execute(sql)
        result = cursor.fetchall()
        print(result)
finally:
    # always release the connection, even if the query fails
    db.close()

## Parking lot
Parking lot for example code and other snippets.

### (bash) command array

In [None]:
%%bash

# declare command line as an array
cmd=()

# add items to cmd
cmd+=("echo" "Hello, World!")

# print the command
echo "Command args ..."
printf '    %s\n' "${cmd[@]}"

# run the command
echo "Running command ..."
"${cmd[@]}"

## Hints

### list magics
```
%lsmagic
```


### display content in a colored block
```html
<div class="alert alert-block alert-info">
Content goes here.
</div>
```

### display README.md
```
from IPython.display import display, Markdown

with open('README.md', 'r') as fh:
    content = fh.read()

display(Markdown(content))
```

## Links

### AWS
* See: https://docs.aws.amazon.com/cli/
* See: https://docs.aws.amazon.com/cli/latest/reference/batch/index.html
* See: https://docs.aws.amazon.com/cli/latest/userguide/controlling-output.html#controlling-output-filter
* See: http://jmespath.org/specification.html
* See: http://opensourceconnections.com/blog/2015/07/27/advanced-aws-cli-jmespath-query/
* See: https://docs.aws.amazon.com/cli/latest/userguide/cli-environment.html
* See: https://docs.aws.amazon.com/cli/latest/reference/s3/index.html#path-argument-type
* See: https://docs.aws.amazon.com/cli/latest/reference/s3/index.html
* See: https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html


### Other
* See: https://docs.python.org/3/
* See: https://jupyter.readthedocs.io/en/latest/
* See: https://jupyter-notebook.readthedocs.io/en/latest/
* See: https://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/Working%20With%20Markdown%20Cells.html
* See: https://datascience.ibm.com/docs/content/analyze-data/markd-jupyter.html
* See: https://blog.dominodatalab.com/lesser-known-ways-of-using-notebooks/, example of how to use 'bash'
* See: http://ipython.readthedocs.io/en/stable/interactive/magics.html
