# AWS Batch system commands
I use this notebook to test, develop, record, and document system administration commands that are useful for managing the AWS Batch integration with various GenePattern Server instances. Most examples are run locally as shell commands by way of the ```%%bash``` built-in magic command.

Hint: list all available magics
```
%lsmagic
```
Hint: display content in a colored block
```html
<div class="alert alert-block alert-info">
```

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#Links" data-toc-modified-id="Links-1">Links</a></span><ul class="toc-item"><li><span><a href="#AWS" data-toc-modified-id="AWS-1.1">AWS</a></span></li><li><span><a href="#Other" data-toc-modified-id="Other-1.2">Other</a></span></li></ul></li><li><span><a href="#Commands" data-toc-modified-id="Commands-2">Commands</a></span><ul class="toc-item"><li><span><a href="#initialize-awscli" data-toc-modified-id="initialize-awscli-2.1">initialize awscli</a></span></li><li><span><a href="#describe-job-queues" data-toc-modified-id="describe-job-queues-2.2">describe job queues</a></span></li><li><span><a href="#describe-job-definitions" data-toc-modified-id="describe-job-definitions-2.3">describe job definitions</a></span></li><li><span><a href="#list-failed-jobs" data-toc-modified-id="list-failed-jobs-2.4">list failed jobs</a></span></li><li><span><a href="#use-case:-job-details-for-a-failed-job" data-toc-modified-id="use-case:-job-details-for-a-failed-job-2.5">use-case: job details for a failed job</a></span></li></ul></li><li><span><a href="#SQL-Queries" data-toc-modified-id="SQL-Queries-3">SQL Queries</a></span></li><li><span><a href="#Parking-lot" data-toc-modified-id="Parking-lot-4">Parking lot</a></span><ul class="toc-item"><li><span><a href="#(bash)-command-array" data-toc-modified-id="(bash)-command-array-4.1">(bash) command array</a></span></li></ul></li></ul></div>

## Links

### AWS
* See: https://docs.aws.amazon.com/cli/
* See: https://docs.aws.amazon.com/cli/latest/reference/batch/index.html
* See: https://docs.aws.amazon.com/cli/latest/userguide/controlling-output.html#controlling-output-filter
* See: http://jmespath.org/specification.html
* See: http://opensourceconnections.com/blog/2015/07/27/advanced-aws-cli-jmespath-query/

### Other
* See: https://docs.python.org/3/
* See: https://jupyter.readthedocs.io/en/latest/
* See: https://jupyter-notebook.readthedocs.io/en/latest/
* See: https://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/Working%20With%20Markdown%20Cells.html
* See: https://datascience.ibm.com/docs/content/analyze-data/markd-jupyter.html
* See: https://blog.dominodatalab.com/lesser-known-ways-of-using-notebooks/, example of how to use 'bash'
* See: http://ipython.readthedocs.io/en/stable/interactive/magics.html


## Commands

### initialize awscli

In [None]:
%%bash -e
# fail on first error
#   bash -e ...
#   set -e
source activate awscli
which aws

### describe job queues
Command: [describe-job-queues](https://docs.aws.amazon.com/cli/latest/reference/batch/describe-job-queues.html)

In [None]:
%%bash -e
# list all job queues
aws batch describe-job-queues

### describe job definitions
Command: [describe-job-definitions](https://docs.aws.amazon.com/cli/latest/reference/batch/describe-job-definitions.html)

In [None]:
%%bash -e
aws batch describe-job-definitions

In [None]:
%%bash -e

# show all job definitions, sorted
#   first by jobDefinitionName, ascending (case-insensitive)
#   then by revision, descending (numeric)
#   see: https://stackoverflow.com/questions/39203630/linux-shell-sort-column-1-in-ascending-order-column-3-in-descending-order
aws batch describe-job-definitions \
  --query 'jobDefinitions[*].[jobDefinitionName,revision]' \
  --output text \
| sort -k1,1f -k2,2nr
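
The same two-key ordering can be reproduced in Python once the query output is loaded in the notebook. This is a minimal sketch under assumptions: `job_definitions` is made-up sample data shaped like the `[jobDefinitionName, revision]` pairs selected by the query above, and `sort_job_definitions` is a hypothetical helper name. It only approximates `sort -f`, whose exact ordering also depends on the locale.

```python
def sort_job_definitions(rows):
    """Sort [name, revision] pairs by name (case-insensitive, ascending),
    then by revision (numeric, descending)."""
    return sorted(rows, key=lambda row: (row[0].lower(), -int(row[1])))


# made-up sample data, not real job definitions
job_definitions = [
    ["beta-job", 1],
    ["Alpha-job", 2],
    ["alpha-job", 3],
    ["beta-job", 4],
]

for name, revision in sort_job_definitions(job_definitions):
    print(name, revision)
```

Negating the revision inside the key tuple gives descending order for that column while the name column stays ascending, all in a single `sorted` call.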


In [None]:
%%bash -e
aws batch describe-job-definitions \
  --query 'jobDefinitions[*].{jobDefinitionName:jobDefinitionName}'

In [None]:
%%bash -e
aws batch describe-job-definitions \
  --query 'jobDefinitions[*].{jobDefinitionName:jobDefinitionName,containerProperties_image:containerProperties.image}'

### list failed jobs
Command: [list-jobs](https://docs.aws.amazon.com/cli/latest/reference/batch/list-jobs.html)

Example:
```bash
aws batch list-jobs --job-status FAILED --job-queue my-job-queue
```
Valid `--job-status` values: `SUBMITTED` | `PENDING` | `RUNNABLE` | `STARTING` | `RUNNING` | `SUCCEEDED` | `FAILED`

In [None]:
# must set jobQueue as a python variable
jobQueue = "gpbeta-default"

In [None]:
%%bash -e -s "$jobQueue"

# set bash variable from python variable
jobQueue="${1}"

# init awscli
source activate awscli

# list failed jobs
aws batch list-jobs \
  --job-queue "$jobQueue" \
  --job-status FAILED
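
The `-s "$jobQueue"` trick works because the cell body is piped to `bash` on standard input while the expanded Python variable becomes the first positional parameter (`$1`). The sketch below imitates that mechanism with `subprocess` outside the magic. It is an approximation, not IPython's actual implementation; `run_bash_with_args` is a hypothetical helper, and the `--` separator is an addition to guard against values that start with a dash.

```python
import subprocess


def run_bash_with_args(script, *args):
    """Pipe `script` to bash on stdin and pass `args` as positional parameters."""
    completed = subprocess.run(
        ["bash", "-e", "-s", "--", *args],
        input=script,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if bash exits non-zero
    )
    return completed.stdout


job_queue = "gpbeta-default"

# the script reads its first positional parameter, as the cell above does
script = 'jobQueue="${1}"\necho "job-queue=${jobQueue}"\n'

print(run_bash_with_args(script, job_queue), end="")
```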

### use-case: job details for a failed job
Given a GenePattern job number (e.g. 11813), make the necessary aws batch cli calls to track down the error message.

Example command ('?jobName=='):
```bash
aws batch list-jobs \
  --job-queue gpbeta-default \
  --job-status FAILED \
  --query 'jobSummaryList[?jobName==`GP_Job_11813`]'
```

Example command ('?ends_with'):
```bash
aws batch list-jobs \
  --job-queue gpbeta-default \
  --job-status FAILED \
  --query 'jobSummaryList[?ends_with(jobName,`_11813`) == `true`].{jobId:jobId}'
```
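
When the full `list-jobs` response is already available as JSON, the same suffix match can be done client-side in plain Python instead of with a JMESPath `--query`. This is a sketch under assumptions: `find_job_ids` is a hypothetical helper, and `sample_response` is made-up data that only mirrors the `jobSummaryList`, `jobId`, and `jobName` fields used in the queries above.

```python
def find_job_ids(list_jobs_response, gp_job_no):
    """Return the AWS Batch jobIds whose jobName ends with '_<gp_job_no>'."""
    suffix = "_{}".format(gp_job_no)
    return [
        job["jobId"]
        for job in list_jobs_response.get("jobSummaryList", [])
        if job.get("jobName", "").endswith(suffix)
    ]


# made-up sample response; the jobId values are placeholders
sample_response = {
    "jobSummaryList": [
        {"jobId": "example-id-1", "jobName": "GP_Job_11813"},
        {"jobId": "example-id-2", "jobName": "GP_Job_11458"},
    ]
}

print(find_job_ids(sample_response, 11813))
```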

Example command (with xargs):
```bash
aws batch list-jobs --job-queue gpbeta-default \
  --job-status 'FAILED' \
  '--query' "jobSummaryList[*].{jobId:jobId}" \
  --output text \
| xargs -I{} \
  aws batch describe-jobs --jobs {} \
    --query "jobs[*].{jobName:jobName,image:container.image,reason:container.reason}" \
    --output table
```
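
The projection done by the second `--query` can also be applied in Python to the JSON printed by `describe-jobs`, which is convenient when the result is processed further in the notebook. This is a minimal sketch: `summarize_failed_jobs` is a hypothetical helper, and `sample_output` is made-up data limited to the `jobName`, `container.image`, and `container.reason` fields referenced in the query above.

```python
import json


def summarize_failed_jobs(describe_jobs_json):
    """Extract jobName, container image, and failure reason for each job."""
    jobs = json.loads(describe_jobs_json).get("jobs", [])
    return [
        {
            "jobName": job.get("jobName"),
            "image": job.get("container", {}).get("image"),
            "reason": job.get("container", {}).get("reason"),
        }
        for job in jobs
    ]


# made-up sample output; image and reason are placeholder strings
sample_output = json.dumps({
    "jobs": [
        {
            "jobName": "GP_Job_11813",
            "container": {"image": "example/image:1", "reason": "example reason"},
        }
    ]
})

for row in summarize_failed_jobs(sample_output):
    print(row["jobName"], row["image"], row["reason"], sep="\t")
```

Using `.get` with defaults keeps the helper from raising when a job has no `container` section or no `reason` yet; those fields simply come back as `None`.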

In [None]:
# must set gpJobNo
gpJobNo = "11458"

In [None]:
%%bash -e -s "$gpJobNo"

# set bash variable from python variable
gpJobNo="${1}"
jobName="GP_Job_${gpJobNo}"

# init awscli
source activate awscli

#
# step 1: get the aws batch jobId from the gp jobId
#
jobId=$(aws batch list-jobs \
  '--job-queue' 'gpbeta-default' \
  '--job-status' 'FAILED' \
  '--query' "jobSummaryList[?ends_with(jobName,\`_$gpJobNo\`) == \`true\`].{jobId:jobId}" \
  '--output' 'text')
echo jobId=$jobId
aws batch describe-jobs --jobs $jobId

# alternative queries
# take 1
#   '--query' 'jobSummaryList[?jobName==`GP_Job_11813`]'
# take 2
#   query="jobSummaryList[?jobName==\`$jobName\`]"
#   '--query' "${query}"

In [None]:
%%bash -e

#
# list recently FAILED jobs
#
aws batch list-jobs --job-queue gpbeta-default \
  --job-status 'FAILED' \
  '--query' "jobSummaryList[*].{jobId:jobId}" \
  --output text \
| xargs -I{} \
  aws batch describe-jobs --jobs {} \
    --query "jobs[*].{jobName:jobName,image:container.image,reason:container.reason}" \
    --output text
#    \
#    --output table

## SQL Queries
Connect to the MySQL database with the ```pymysql.cursors``` library.

* see: https://pymysql.readthedocs.io/en/latest/index.html

In [None]:
#
# Note: must change to actual credentials
#
db_host = "my-db-host"
db_port = 3306
db_user = "mydbuser"
db_password = "mydbpassword"
db_schema = "mydbschema"
%who

In [None]:
import pymysql.cursors

# connect to the database
db = pymysql.connect(
    host=db_host,
    port=db_port,
    user=db_user,
    password=db_password,
    database=db_schema
)

try:
    with db.cursor() as cursor:
        sql = "SELECT user_id, email FROM gp_user"
        cursor.execute(sql)
        result = cursor.fetchall()
        print(result)
finally:
    # always close the connection, even if the query fails
    db.close()

## Parking lot
Parking lot for example code and other snippets.

### (bash) command array

In [None]:
%%bash

# declare command line as an array
cmd=()

# add items to cmd (quote the expansion so args with spaces stay intact)
cmd=("${cmd[@]}" "echo" "Hello, World!")

# print the command
echo "Command args ..."
printf '    %s\n' "${cmd[@]}"

# run the command
echo "Running command ..."
"${cmd[@]}"
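
For comparison, the same pattern in a Python cell uses a list of arguments passed to `subprocess.run`. This is a sketch of an equivalent, not a replacement for the bash version. As with the bash array, each list element reaches the program as exactly one argument, so no shell quoting is needed.

```python
import subprocess

# declare the command line as a list
cmd = []

# add items to cmd
cmd += ["echo", "Hello, World!"]

# print the command, one argument per line
print("Command args ...")
for arg in cmd:
    print("    " + arg)

# run the command without a shell; check=True raises on a non-zero exit
print("Running command ...")
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout, end="")
```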