### Assignment: assemble an ipyrad example data set

Follow the instructions here: http://ipyrad.readthedocs.io/API_user-guide.html to assemble a dataset using the ipyrad API. You will need to download the dataset as instructed below; it is different from the one in the linked tutorial. Be sure to download the data into your scratch space and to set the project directory for your ipyrad analysis to your scratch directory. You can use any of the datasets in the downloaded directory. If you have questions, read the ipyrad docs or ask in the gitter chatroom.

**When finished, copy this notebook to your assignments/ dir, push it, and make a pull request.**

In [None]:
# Note: none of the code in this notebook has been executed, because my machine lacks the capacity to run it

import ipyrad as ip
import ipyparallel as ipp

### Download the data
You will probably want to move the data to your scratch directory. You can run this code here to download it, or from a terminal. 

In [None]:
%%bash
## The curl command needs a capital O, not a zero
curl -LkO https://github.com/dereneaton/ipyrad/raw/master/tests/ipsimdata.tar.gz
tar -xvzf ipsimdata.tar.gz

In [None]:
ls ipsimdata/
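
If you would rather unpack the archive from Python than from bash, the step can be sketched roughly as follows. This is a minimal sketch using only the standard library; `extract_archive` is a hypothetical helper name, not part of ipyrad, and it assumes the tarball has already been downloaded.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the sorted member names.

    Hypothetical helper for illustration; not part of ipyrad.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = sorted(tar.getnames())
        tar.extractall(dest_dir)
    return names
```

For example, `extract_archive("ipsimdata.tar.gz", scratch_dir)` would unpack the data into `scratch_dir`, where `scratch_dir` stands for the path to your own scratch space.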

### Connect to an ipcluster instance

In [None]:
# Start an ipcluster instance in a separate terminal before running this cell:
# ipcluster start --n=4

# Create the client first; the %px magic needs an active client to send code to the engines
ipyclient = ipp.Client()

%px import time, os
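
Engines can take a few seconds to register after `ipcluster start`, so it may help to wait for them before running any assembly steps. The sketch below polls the client's `ids` attribute (the list of connected engine IDs); `wait_for_engines` and its `timeout` and `poll` parameters are hypothetical names chosen for illustration.

```python
import time


def wait_for_engines(client, n_engines, timeout=30.0, poll=0.5):
    """Block until client.ids lists at least n_engines engines.

    Hypothetical helper: returns the number of connected engines,
    or raises RuntimeError if the timeout passes first.
    """
    deadline = time.time() + timeout
    while True:
        connected = len(client.ids)
        if connected >= n_engines:
            return connected
        if time.time() >= deadline:
            raise RuntimeError(
                "only {} of {} engines connected".format(connected, n_engines))
        time.sleep(poll)
```

With the cluster started as above, `wait_for_engines(ipyclient, 4)` would return once all four engines have connected.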

### Assemble the dataset from step 1 to step 7

In [None]:
ipsimdata = ip.Assembly("ipsimdata")

## setting/modifying parameters for this Assembly object
ipsimdata.set_params('project_dir', "pedicularis")
ipsimdata.set_params('sorted_fastq_path', "./example_empirical_rad/*.gz") # Path needs to be modified accordingly
ipsimdata.set_params('filter_adapters', 2)
ipsimdata.set_params('datatype', 'rad')

## prints the parameters to the screen
ipsimdata.get_params()


## Run step 1, which loads the data.
## Before running it, make sure sorted_fastq_path above points to the right folder.
ipsimdata.run("1", ipyclient=ipyclient, force=True)

## print full stats summary
print(ipsimdata.stats)

## let's create a dictionary to hold the finished assemblies
## Creating small for-loops as suggested by the ipyrad tutorial to run each step of the assembly process
adict = {}

## iterate over parameters settings creating a new named assembly
for filter_setting in [1, 2]:
    ## create a new name for the assembly and branch
    newname = ipsimdata.name + "_f{}".format(filter_setting)
    child1 = ipsimdata.branch(newname)
    child1.set_params("filter_adapters", filter_setting)
    child1.run("2")

    ## iterate over clust thresholds
    for clust_threshold in ['0.85', '0.90']:
        newname = child1.name + "_c{}".format(clust_threshold[2:])
        child2 = child1.branch(newname)
        child2.set_params("clust_threshold", clust_threshold)
        child2.run("3456")

        ## iterate over min_sample coverage
        for min_samples_locus in [4, 12]:
            newname = child2.name + "_m{}".format(min_samples_locus)
            child3 = child2.branch(newname)
            child3.set_params("min_samples_locus", min_samples_locus)
            child3.run("7")

            ## store the complete assembly in the dictionary by its name
            ## so it is easy to access and retrieve later, since the
            ## variables 'child1', 'child2', and 'child3' are overwritten on
            ## each pass through the loops. You can do this using
            ## dictionaries, lists, etc., or, as you'll see below, use the
            ## 'load_json()' command to load a finished assembly from its
            ## saved file.
            adict[newname] = child3
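
To see which assemblies the nested loops above produce without running ipyrad, the naming scheme alone can be reproduced in plain Python. This is only a sketch of the names; it creates no Assembly objects.

```python
from itertools import product

base = "ipsimdata"
filter_settings = [1, 2]
clust_thresholds = ["0.85", "0.90"]
min_samples = [4, 12]

# Same naming rule as the loops: _f<filter>_c<threshold digits>_m<min samples>
names = [
    "{}_f{}_c{}_m{}".format(base, f, c[2:], m)
    for f, c, m in product(filter_settings, clust_thresholds, min_samples)
]

for name in names:
    print(name)
```

The loops yield 2 x 2 x 2 = 8 final assemblies, from `ipsimdata_f1_c85_m4` to `ipsimdata_f2_c90_m12`. The last of these is the one still bound to `child3` when the loops finish.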



### Print the final assembly stats

In [None]:
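## save assembly object (also auto-saves after every run() command)
child3.save()

## load assembly object from its saved JSON file; the file is named after
## the assembly (child3.name), not after the Python variable
child3 = ip.load_json("pedicularis/{}.json".format(child3.name))

## write params file for use by the CLI
child3.write_params()

## print the final stats summary
print(child3.stats)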
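
If you lose track of which assemblies have been saved, you can list the JSON files in the project directory. The sketch below assumes each assembly is saved as `<name>.json` directly inside the project directory; `list_saved_assemblies` is a hypothetical helper, not an ipyrad function.

```python
import glob
import os


def list_saved_assemblies(project_dir):
    """Return the names of assemblies with a <name>.json file in project_dir.

    Hypothetical helper; assumes one JSON file per saved assembly.
    """
    paths = glob.glob(os.path.join(project_dir, "*.json"))
    return sorted(os.path.splitext(os.path.basename(p))[0] for p in paths)
```

Calling `list_saved_assemblies("pedicularis")` would then return the assembly names you could load from their saved files.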

### Show the location of your assembled output files

In [None]:
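## step 7 writes the final output files (it already ran for child3 in the loop above)
child3.run("7")

## directory that holds the assembled output files
print(child3.dirs.outfiles)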