#  2 Setting Up StarCluster for QIIME
Amanda Birmingham, CCBB, UCSD (abirmingham@ucsd.edu)

<a name = "table-of-contents"></a>

## Table of Contents

* [Creating a Configuration File](#creating-a-configuration-file)
* [Creating a Volume in StarCluster](#creating-a-volume-in-starcluster)
* [Launching a QIIME StarCluster](#launching-a-qiime-starcluster)
* [Running Screen](#running-screen)

Related Notebooks:
* 1 Introducing 16S Microbiome Primary Analysis
* 3 Validation, Demultiplexing, and Quality Control
* 4 OTU Picking and Rarefaction Depth Selection
* 5 Analyzing Core Diversity

<a name="creating-a-configuration-file"></a>

## Creating a Configuration File

To run this SOP, you need the most recent version of StarCluster installed on your system.  You also need a QIIME-specific StarCluster config file, modeled on the one below.  Fields that must be modified are highlighted in yellow, while those that may optionally be modified are highlighted in aqua:

`### QIIME StarCluster Configuration File ###`
    
`[global]`
    
`DEFAULT_TEMPLATE = 3node_c3-2xlarge #set per project`
    
`ENABLE_EXPERIMENTAL = True`

`## AWS Info ##  `
    
`[aws info]  `
    
`AWS_ACCESS_KEY_ID =  ` <span style = "background-color:yellow;">XXXXXXXXXXXXXXXXXXX</span> `#set per analyst; MUST MODIFY`

`AWS_SECRET_ACCESS_KEY = ` <span style = "background-color:yellow;">XXXXXXXXXXXXXXXXXXXX</span> `#set per analyst; MUST MODIFY`

`AWS_USER_ID = ` <span style = "background-color:yellow;">XXXXXXXXXXXXXXXXXXXX</span>  `#set per user or institute; MUST MODIFY`
    
`AWS_REGION_NAME = us-west-2 #set per project`   
    
`AWS_REGION_HOST = ec2.us-west-2.amazonaws.com #set to be consistent with AWS_REGION_NAME`
 
`## EC2 Keypairs ##`  

`[key uswest]  `
    
`KEY_LOCATION = ` <span style = "background-color:yellow;">~/XXX/XXXXXX.pem</span> `#set per analyst*region; MUST MODIFY`

`## EBS Volumes ##  `
    
`#NOTE: best to create volumes via starcluster so file system is automatically installed` 

`[volume QIIME_cluster_volume]   `

`VOLUME_ID = ` <span style = "background-color:aqua;">vol-fbb144ba</span> `#must reside in AWS_REGION_NAME; OPTIONALLY MODIFY`

`MOUNT_PATH = /data #set per cluster`
  
`## Cluster Templates ##`

`[cluster 3node_c3-2xlarge] `

`KEYNAME = ` <span style = "background-color:yellow;">uswest</span> `#must reside in AWS_REGION_NAME; MUST MODIFY`

`CLUSTER_SIZE = 3  `
    
`CLUSTER_USER = sgeadmin`  
    
`CLUSTER_SHELL = bash  `
    
`NODE_IMAGE_ID = ami-d181b5e1 #created from QIIME snapshot for AWS_REGION_NAME  `

`NODE_INSTANCE_TYPE = ` <span style = "background-color:aqua;">c3.2xlarge</span> `#OPTIONALLY MODIFY`				
    
`MASTER_INSTANCE_TYPE = ` <span style = "background-color:aqua;">c3.2xlarge</span> `#OPTIONALLY MODIFY`			

`VOLUMES = QIIME_cluster_volume`
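
Before launching, it can help to confirm that none of the highlighted placeholder values were left in place. The following is a minimal sketch and not part of StarCluster: the function name `check_placeholders` is hypothetical, and the check assumes that leftover placeholders look like the `XXX...` strings shown in the template above.

```shell
# Hypothetical helper: report config lines that still hold XXX-style placeholders.
# Returns non-zero if at least one such line is found.
check_placeholders() {
    config_file=$1
    # Match "KEY = ...XXX..." lines; -n prints the line number of each hit.
    if grep -nE '^[A-Z_]+[[:space:]]*=.*XXX' "$config_file"; then
        echo "Placeholder values remain in $config_file" >&2
        return 1
    fi
    return 0
}

# Example (path is the one used in the launch examples of this document):
#   check_placeholders "$HOME/Virtualenvs/QiimeStarClusterEnv/config" && echo "config looks filled in"
```

A real key that happens to contain `XXX` would trigger a false alarm, so treat the output as a prompt to look at the listed lines rather than as a definitive verdict.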


[Table of Contents](#table-of-contents)

<a name = "creating-a-volume-in-starcluster"></a>

## Creating a Volume in StarCluster

The volume referenced above already exists in the us-west-2 region and is available for shared use.  It is a 10 GB drive, which is generally adequate for microbiome analyses.  It is already formatted, and the "/data" directory used for the mount has already been created on it.  If you create your own drive, ensure that both of these conditions (formatting and mount directory) are met; formatting is most easily accomplished by creating the drive through StarCluster:

    starcluster createvolume --name=[myvolumename] [size in GB] [availability-zone]

For example, the volume referenced above was created with the command:

    starcluster createvolume --name=AB_QIIME_10GB_volume 10 us-west-2b
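
The same pattern can be wrapped in a small helper that checks its arguments before printing the command to run. This is a hedged sketch: `build_createvolume_cmd` and the `REGION` variable are hypothetical names, and the function only prints the command rather than calling StarCluster.

```shell
# Region the cluster config points at (AWS_REGION_NAME in the config above).
REGION="us-west-2"

# Hypothetical helper: validate arguments, then print the createvolume command.
build_createvolume_cmd() {
    name=$1
    size_gb=$2
    zone=$3
    # Size must be a non-empty string of digits (whole GB).
    case "$size_gb" in
        ''|*[!0-9]*) echo "size must be a whole number of GB" >&2; return 1 ;;
    esac
    # The availability zone must be the region name plus one letter.
    case "$zone" in
        "$REGION"[a-z]) ;;
        *) echo "zone '$zone' is not in region $REGION" >&2; return 1 ;;
    esac
    echo "starcluster createvolume --name=$name $size_gb $zone"
}

# Prints the example command from this section.
build_createvolume_cmd AB_QIIME_10GB_volume 10 us-west-2b
```

Checking the zone against the region up front reflects the config comment that the volume must reside in `AWS_REGION_NAME`.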

Creating the mount directory is simple, but it should be done before the cluster is created: attach the volume to a running instance, ssh into the instance, and run

    sudo mkdir /data

A directory name other than "data" can be used, but the config's `MOUNT_PATH` value must then be updated to match.

In my experience, both `c3.2xlarge` and `m3.xlarge` are acceptable instance types for typical microbiome analysis work, and the choice between them can be made based on the current spot instance price.  If considering other instance types, note that:

* QIIME requires at least 8 GB of memory
* The QIIME AMI can only be run on instances using paravirtualization, not those using HVM virtualization
    * This rules out many instance types, including the d\*, g\*, i\*, r\*, and c[even number]\* instances
    
Changing to a region other than us-west-2 requires pervasive changes to the config, as noted in the config comments above.  Don't attempt this unless you are already comfortable with StarCluster and AWS regions.
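
The instance-type exclusions listed earlier in this section can be expressed as a quick pre-check. This is only a sketch under stated assumptions: `is_pv_candidate` is a hypothetical name, and it encodes just the families named in the list above (d\*, g\*, i\*, r\*, and even-numbered c\*). Passing the check does not guarantee that a type supports paravirtualization or meets the 8 GB memory requirement.

```shell
# Hypothetical helper: return 0 if the instance type is NOT in one of the
# families this document lists as incompatible with the QIIME AMI.
is_pv_candidate() {
    family=${1%%.*}          # e.g. "c3.2xlarge" -> "c3"
    case "$family" in
        d*|g*|i*|r*) return 1 ;;   # families named in the list above
        c[02468])    return 1 ;;   # even-numbered c families, e.g. c4
        *)           return 0 ;;   # not excluded by this (incomplete) list
    esac
}

for t in c3.2xlarge m3.xlarge c4.2xlarge r3.large; do
    if is_pv_candidate "$t"; then
        echo "$t: not excluded"
    else
        echo "$t: excluded (HVM-only family per the list above)"
    fi
done
```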

[Table of Contents](#table-of-contents)

<a name = "launching-a-qiime-starcluster"></a>

## Launching a QIIME StarCluster

Launching the QIIME StarCluster defined in this config using spot instances requires a terminal command of the following format:

    starcluster --config=[config file path] start -b [bid price in dollars] [cluster name]

(The `--config` switch can be omitted if you keep the QIIME config parameters in the default config file at `~/.starcluster/config`, but this is unlikely to be convenient if you need to run multiple clusters.)  An example command, which bids 28 cents for each spot instance, is shown below:

    starcluster --config=~/Virtualenvs/QiimeStarClusterEnv/config start -b 0.28 AB_qiime_cluster

Note that the cluster master node will always be an on-demand instance, so the three-node cluster defined in the example config will have one on-demand node and two spot nodes (assuming the spot requests are filled successfully).
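
The node split described above, and the most the spot portion could cost per hour at a given bid, can be worked out with a few lines of shell. This is a minimal sketch: `spot_summary` is a hypothetical name, the bid is treated as a per-instance hourly ceiling, and the on-demand master and storage charges are deliberately left out because their prices are not given here.

```shell
# Hypothetical helper: summarize node split and the upper bound on spot spend.
spot_summary() {
    cluster_size=$1   # CLUSTER_SIZE from the config
    bid=$2            # bid price in dollars, as passed to -b
    spot_nodes=$((cluster_size - 1))   # every node except the master
    # awk handles the decimal arithmetic; LC_ALL=C keeps "." as the separator.
    max_hourly=$(LC_ALL=C awk -v n="$spot_nodes" -v b="$bid" 'BEGIN { printf "%.2f", n * b }')
    echo "on-demand nodes: 1, spot nodes: $spot_nodes, max spot spend per hour: \$$max_hourly"
}

spot_summary 3 0.28
# prints: on-demand nodes: 1, spot nodes: 2, max spot spend per hour: $0.56
```

The actual spot charge is normally below the bid, so the figure is a ceiling rather than an estimate.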

Once the cluster is launched, you will need to ssh into the master node, as in this example:

    starcluster --config=~/Virtualenvs/QiimeStarClusterEnv/config sshmaster AB_qiime_cluster
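
Because every command repeats the same `--config` argument, a small shell function can save typing. This is a sketch under assumptions: the names `QIIME_SC_CONFIG` and `qsc` are hypothetical, and the path is the one from the examples above, written with `$HOME` so that the shell expands it regardless of how the `--config=` argument itself is handled.

```shell
# Hypothetical variable holding the per-project config path.
QIIME_SC_CONFIG="$HOME/Virtualenvs/QiimeStarClusterEnv/config"

# Hypothetical wrapper: forward all arguments to starcluster with --config set.
qsc() {
    starcluster --config="$QIIME_SC_CONFIG" "$@"
}

# Equivalent to the two example commands in this section:
#   qsc start -b 0.28 AB_qiime_cluster
#   qsc sshmaster AB_qiime_cluster
```

Placing the two definitions in a per-project file and sourcing it keeps separate clusters from sharing a config by accident.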

[Table of Contents](#table-of-contents)

<a name = "running-screen"></a>

## Running Screen

Once you are logged into the master node, I highly recommend that you run

    screen

This program (see http://www.rackaid.com/blog/linux-screen-tutorial-and-how-to/) allows creation of multiple shell windows from a single ssh session and, more importantly for long-running analysis work, keeps the shell active even if the network connection to it is interrupted.  Screen starts with one shell window; additional ones can be created with 

    ctrl-a c

and you can cycle through the existing shells with 

    ctrl-a n

If your connection drops and you need to re-attach to screen, ssh into the master node again and run

    screen -r
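
If you tend to keep more than one screen session on the master node, naming the session makes re-attachment unambiguous. The following is a hedged sketch: the session name `qiime` and the function `start_or_resume` are hypothetical, and it relies only on the standard `screen -S`, `screen -ls`, and `screen -r` options.

```shell
# Hypothetical session name for the analysis work.
SESSION="qiime"

# Re-attach to the named session if it exists; otherwise create it.
start_or_resume() {
    # "screen -ls" lists sessions as "<pid>.<name>"; look for ".qiime".
    if screen -ls | grep -q "[.]${SESSION}[[:space:]]"; then
        screen -r "$SESSION"
    else
        screen -S "$SESSION"
    fi
}

# After ssh-ing into the master node, run:
#   start_or_resume
```

If the session is still marked as attached from a dropped connection, `screen -d -r qiime` detaches it first and then re-attaches.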

Further usage details can be found at the link above.

[Table of Contents](#table-of-contents)