### Update BIOM file with data from STOQS
*Given a .biom file and multiple STOQS databases, explore Next Generation Sequence and environmental data*


Executing this Notebook requires a personal STOQS database. Follow the [steps to build your own development system](https://github.com/stoqs/stoqs/blob/master/README.md); this will take a few hours and depends on a good Internet connection. Once your server is up, log into it (after a `cd ~/Vagrants/stoqsvm`) and activate your virtual environment with the usual commands:

    vagrant ssh -- -X
    cd ~/dev/stoqsgit
    source venv-stoqs/bin/activate
    
Then load all of the SIMZ databases with the commands below. To load all of the subsample analysis data (Sampled Parameters), the `SIMZ<month><year>` directories containing those .csv files must be present. (See the `subsample_csv_files` attribute setting in the load script for the campaign.)

    cd stoqs
    ln -s mbari_campaigns.py campaigns.py
    export DATABASE_URL=postgis://stoqsadm:CHANGEME@127.0.0.1:5432/stoqs
    loaders/load.py --db stoqs_simz_aug2013 stoqs_simz_oct2013 \
    stoqs_simz_spring2014 stoqs_simz_jul2014 stoqs_simz_oct2014 
    loaders/load.py --db stoqs_simz_aug2013 stoqs_simz_oct2013 \
    stoqs_simz_spring2014 stoqs_simz_jul2014 stoqs_simz_oct2014 --updateprovenance
   
Loading these databases will take a few hours. Once loading has finished, you can interact with the data quite efficiently, as this Notebook demonstrates. Launch Jupyter Notebook with:

    cd contrib/notebooks
    ../../manage.py shell_plus --notebook
    
Navigate to this file and open it. You will then be able to execute the cells and experiment with different settings and code.

Let's make a list of all SIMZ databases from the campaigns on our system:

In [1]:
from campaigns import campaigns
dbs = [c for c in campaigns if 'simz' in c]
print dbs

['stoqs_simz_aug2013', 'stoqs_simz_oct2013', 'stoqs_simz_spring2014', 'stoqs_simz_jul2014', 'stoqs_simz_oct2014']


Open a .biom file that contains sequence data from Net Tows conducted on these campaigns

In [2]:
biom_file = '../../loaders/MolecularEcology/BIOM/otu_table_newsier_90nounclass.biom'
from biom import load_table
table = load_table(biom_file)
print table.ids(axis='sample')
print table.ids(axis='observation')[:10]

[u'SIMZ1' u'SIMZ11' u'SIMZ10' u'SIMZ13' u'SIMZ17' u'SIMZ6' u'SIMZ2'
 u'SIMZ5' u'SIMZ4' u'SIMZ18' u'SIMZ3' u'SIMZ14' u'SIMZ12' u'SIMZ7' u'SIMZ9'
 u'SIMZ8' u'SIMZ16']
[u'denovo3239' u'denovo1173' u'denovo1774' u'denovo1778' u'denovo1779'
 u'denovo2765' u'denovo2601' u'denovo1516' u'denovo1517' u'denovo1510']


Find all the VerticalNetTow Sample identifiers for all our SIMZ campaigns. These will be our links to the environmental and other sample data.

In [3]:
nettows = {}
for db in dbs:
    for s in Sample.objects.using(db).filter(sampletype__name='VerticalNetTow'
                ).order_by('instantpoint__activity__name'):
        print s.instantpoint.activity.name, db
        nettows[s.instantpoint.activity.name] = db

simz2013c01_NetTow1 stoqs_simz_aug2013
simz2013c02_NetTow1 stoqs_simz_aug2013
simz2013c03_NetTow1 stoqs_simz_aug2013
simz2013c04_NetTow1 stoqs_simz_aug2013
simz2013c05_NetTow1 stoqs_simz_aug2013
simz2013c06_NetTow1 stoqs_simz_aug2013
simz2013c07_NetTow1 stoqs_simz_aug2013
simz2013c08_NetTow1 stoqs_simz_aug2013
simz2013c09_NetTow1 stoqs_simz_aug2013
simz2013c10_NetTow1 stoqs_simz_aug2013
simz2013c11_NetTow1 stoqs_simz_aug2013
simz2013c12_NetTow1 stoqs_simz_aug2013
simz2013c13_NetTow1 stoqs_simz_aug2013
simz2013c14_NetTow1 stoqs_simz_aug2013
simz2013c15_NetTow1 stoqs_simz_aug2013
simz2013c16_NetTow1 stoqs_simz_aug2013
simz2013c17_NetTow1 stoqs_simz_aug2013
simz2013c18_NetTow1 stoqs_simz_aug2013
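
One subtlety of the loop above is that `nettows` keeps a single database per activity name, so if the same name ever appeared in two databases the later one would silently replace the earlier. The following is a minimal sketch of how such collisions could be detected. It uses plain `(name, database)` pairs in place of the STOQS query, and both the `build_lookup` helper and the third pair are hypothetical, made up for illustration; the recorded output above shows no such collision.

```python
def build_lookup(pairs):
    """Build a name -> database dict and report names seen in more than one database."""
    lookup = {}
    collisions = []
    for name, db in pairs:
        # Record a collision before overwriting, mirroring the notebook's last-one-wins behavior
        if name in lookup and lookup[name] != db:
            collisions.append((name, lookup[name], db))
        lookup[name] = db
    return lookup, collisions


pairs = [
    ('simz2013c01_NetTow1', 'stoqs_simz_aug2013'),
    ('simz2013c02_NetTow1', 'stoqs_simz_aug2013'),
    # Hypothetical duplicate, added only to exercise the collision check
    ('simz2013c01_NetTow1', 'stoqs_simz_oct2013'),
]
lookup, collisions = build_lookup(pairs)
print(collisions)
```

Each collision entry holds the activity name, the database recorded first, and the database that replaced it. With the real data, an empty `collisions` list would confirm that activity names are unique across the SIMZ databases.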


It looks as though the BIOM table ids (SIMZ1, SIMZ2, ...) correspond to the STOQS `s.instantpoint.activity.name` values (simz2013c01_NetTow1, simz2013c02_NetTow1, ...). Let's loop through the BIOM file sample ids and pull data from STOQS for each BIOM file sample.
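
The id correspondence described here can be tried in isolation before querying any database. Below is a minimal sketch, assuming every BIOM sample id has the form `SIMZ<number>` and every net tow is named `simz2013c<NN>_NetTow1`, as the outputs above suggest. The `biom_id_to_activity` helper and its `cruise` and `suffix` parameters are hypothetical names introduced for illustration, not part of STOQS or biom-format.

```python
import re


def biom_id_to_activity(sample_id, cruise='simz2013', suffix='NetTow1'):
    """Map a BIOM sample id such as 'SIMZ7' to 'simz2013c07_NetTow1'."""
    match = re.match(r'^SIMZ(\d+)$', sample_id)
    if match is None:
        # Fail loudly instead of building a name that cannot exist in STOQS
        raise ValueError('Unexpected BIOM sample id: {!r}'.format(sample_id))
    return '{}c{:02d}_{}'.format(cruise, int(match.group(1)), suffix)


# A few ids in the order the BIOM table reported them
biom_ids = ['SIMZ1', 'SIMZ11', 'SIMZ10']
activities = [biom_id_to_activity(i) for i in biom_ids]
print(activities)
```

Validating the id with a regular expression, rather than slicing off the first four characters, makes an unexpected id raise a clear error before any STOQS lookup is attempted.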

In [4]:
samples = ['simz2013c{:02d}_NetTow1'.format(int(n[4:])) for n in table.ids()]
for s in sorted(samples):
    print s
    sps = SampledParameter.objects.using(nettows[s]).filter(sample__instantpoint__activity__name=s)
    for sp in sps:
        print '{:>30s}: {:.2f} ({})'.format(sp.parameter.name, sp.datavalue, sp.parameter.units)
    print

simz2013c01_NetTow1
          B1006_barnacles_A650: 0.07 (OD A650 nm)
              M2B_mussels_A650: 0.05 (OD A650 nm)
           GCRAB_Carcinus_A650: 0.04 (OD A650 nm)
         CAL903_calanoida_A650: 0.07 (OD A650 nm)
        CAL1939_calanoida_A650: 4.00 (OD A650 nm)
         POD1951_podoplea_A650: 0.08 (OD A650 nm)
    SAB1182_sabellariidae_A650: 0.05 (OD A650 nm)
        CRAB903_brachyura_A650: 0.05 (OD A650 nm)
        OS1022_polychaeta_A650: 0.13 (OD A650 nm)
        SPI1181_spionidae_A650: 0.04 (OD A650 nm)
          B1006_barnacles_A450: 0.12 (OD A450 nm)
              M2B_mussels_A450: 0.07 (OD A450 nm)
           GCRAB_Carcinus_A450: 0.06 (OD A450 nm)
         CAL903_calanoida_A450: 0.12 (OD A450 nm)
        CAL1939_calanoida_A450: 4.00 (OD A450 nm)
         POD1951_podoplea_A450: 0.14 (OD A450 nm)
    SAB1182_sabellariidae_A450: 0.07 (OD A450 nm)
        CRAB903_brachyura_A450: 0.08 (OD A450 nm)
        OS1022_polychaeta_A450: 0.24 (OD A450 nm)
        SPI1181_spionidae_A450