<h1>Interacting with the VAMPS-API</h1>
<h2>This notebook:</h2>
<ul>
<li>Logs into a VAMPS account; if you don't have an account, use the guest account (Username: 'guest', Password: 'guest')</li>
<li>Produces a config file for MVCO eukaryote data, selecting the Alpha Diversity visualization</li>
<li>Displays visualization using configuration file</li>
<li>Takes the same configuration file and changes the project to the MVCO bacteria data</li>
<li>Displays the same visualization using the same constraints, with the different project</li>
</ul>
<h2>To use this notebook:</h2>
<ul><li>Press the run cell button</li>
<li>Enter information if prompted</li>
<li>Do not run the next cell while the previous cell shows an asterisk next to "In" ("In [*]:"); this means it is still processing</li>
<li>When the asterisk turns into a number, you can run the next cell</li></ul>

<h3>Import relevant Python packages; allow both Python 2 and 3</h3>

In [82]:
import os,sys
import requests
from bs4 import BeautifulSoup   # parser for html
import json, string, getpass

#to allow both Python 2 and 3
try:
    input = raw_input  
except NameError: #Python 3
    pass

<h3>Gets VAMPS username and password, then attempts login to VAMPS</h3>

In [83]:
#get VAMPS username and password
user = input("Enter your VAMPS username: ")
pw = getpass.getpass("Enter your VAMPS password: ")

conn = {'user': user,
        'passwd': pw,
         # vamps:             https://vamps2.mbl.edu
         # vampsdev (private) http://vampsdev.jbpc-np.mbl.edu:8124 
         # localhost:         http://localhost:3000 
        'hosturl':'https://vamps2.mbl.edu'
       }

#attempt login to VAMPS
s = requests.Session()
r = s.post(conn['hosturl']+'/users/login', data={'username':conn['user'], 'password':conn['passwd']})

Enter your VAMPS username: ruzics
Enter your VAMPS password: ········


<h3>If username/password is incorrect, exit program</h3>

In [84]:
#exit program if login unsuccessful
# a failed login stays on the login page; a successful one lands on the home page
if r.url == conn['hosturl']+'/users/login':
    sys.exit('Login not successful')
elif r.url == conn['hosturl']+'/':
    print('Login successful')

Login successful
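
<p>The check above can be wrapped as a small reusable function. This is only a sketch: it assumes, as the cell above does, that a failed login leaves the session on <code>/users/login</code>, and the name <code>login_succeeded</code> is hypothetical rather than part of VAMPS.</p>

```python
def login_succeeded(final_url, host_url):
    """Return True when the post-login URL is not the login page.

    Assumption (taken from the cell above): a failed login ends up
    back on <host>/users/login.
    """
    login_page = host_url.rstrip('/') + '/users/login'
    return final_url.rstrip('/') != login_page


# Literal URLs for illustration; in the notebook you would pass r.url and conn['hosturl']
print(login_succeeded('https://vamps2.mbl.edu/', 'https://vamps2.mbl.edu'))
print(login_succeeded('https://vamps2.mbl.edu/users/login', 'https://vamps2.mbl.edu'))
```

<p>Building the login URL from the host keeps the check valid if <code>hosturl</code> is switched to one of the other servers listed in <code>conn</code>.</p>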


<h3>Set original configuration file, using MVCO eukaryote data as project</h3>

In [85]:
#original config; Emily B.'s MVCO eukaryote data
config = {
        "api":"1",
        "source":"VAMPS-API",
        "update_data":1,
        "normalization":"none",              
        "selected_distance":"morisita-horn",  
        "tax_depth":"phylum",                 
        "domains":["Eukarya"],  #["Archaea","Bacteria","Eukarya","Organelle","Unknown"] 
        "include_nas":"yes",                  
        "min_range":0,                        
        "max_range":100,                      

          # Emily B.'s MVCO eukaryote data
        'project':'MVCO_ciliate_timeseries2',   

          # Alpha Diversity visualization
        'image':'adiversity'
    }

<h3>Get and display dataset IDs for the project</h3>

In [86]:
# get dataset ids for the project:
r = s.post(conn['hosturl']+'/api/get_dids_from_project', timeout=15, data=config)  
config['ds_order'] = r.text
print(config['ds_order'])

[353191,353192,353193,353194,353195,353196,353197,353198,353199,353200,353201,353202,353203,353204,353205,353206,353207,353208,353209,353210,353211,353212,353213,353214,353215,353216,353217]
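
<p>The response is kept as text in <code>config['ds_order']</code>. If you want to count or inspect the IDs locally, a minimal sketch is to parse that text as JSON; this assumes the reply is always a JSON array like the one printed above, and the sample string below is shortened from that output.</p>

```python
import json

# Sample response text, shortened from the output above
ds_order_text = '[353191,353192,353193]'

# The reply looks like a JSON array, so json.loads turns it into a list of ints
dataset_ids = json.loads(ds_order_text)
print(len(dataset_ids), 'datasets, first ID:', dataset_ids[0])
```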


<h3>Create remote configuration and get timestamp (file_prefix)</h3>

In [87]:
# Get timestamp (filename prefix):
r = s.post(conn['hosturl']+'/visuals/view_selection', timeout=15, data=config)
soup = BeautifulSoup(r.text, "lxml")  # html5lib  lxml html.parser

ts = soup.find(id="ts_for_bs").string

<h3>Save the matrix file, which is integral to VAMPS images</h3>

In [88]:
# download the BIOM count matrix named after the timestamp prefix
biom_matrix_file = ts+'_count_matrix.biom'
url = conn['hosturl']+"/"+biom_matrix_file
response = requests.get(url, stream=True)
response.raise_for_status()  # stop here on an HTTP error
# stream the file to disk in 1 KB blocks
with open(biom_matrix_file, "wb") as handle:
    for block in response.iter_content(1024):
        handle.write(block)
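
<p>Because the same download loop is needed again for the second project, it could be factored into a helper. The sketch below is an illustration only: <code>save_stream</code> and <code>FakeResponse</code> are hypothetical names, and the fake response stands in for a <code>requests</code> response so the example runs without a network connection.</p>

```python
import os
import tempfile


def save_stream(response, path, block_size=1024):
    """Write a streamed response to `path` block by block; return bytes written."""
    written = 0
    with open(path, 'wb') as handle:
        for block in response.iter_content(block_size):
            handle.write(block)
            written += len(block)
    return written


class FakeResponse(object):
    """Stand-in for a requests response so the sketch runs offline."""
    def __init__(self, payload):
        self.payload = payload

    def iter_content(self, block_size):
        for start in range(0, len(self.payload), block_size):
            yield self.payload[start:start + block_size]


demo_path = os.path.join(tempfile.mkdtemp(), 'demo_count_matrix.biom')
n_bytes = save_stream(FakeResponse(b'x' * 2500), demo_path)
print(n_bytes)
```

<p>In the notebook, the call would be <code>save_stream(response, biom_matrix_file)</code> after <code>response.raise_for_status()</code>.</p>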

<h3>Save image file</h3>

In [89]:
r = s.post(conn['hosturl']+'/api/create_image', timeout=30, data=config)

try:
    result = json.loads(r.text)
except ValueError:  # the reply was not valid JSON; show it and stop
    print(r.text)
    sys.exit()
local_filename = result['filename']
return_result = result['html']
print(local_filename)
remote_file_name = conn['hosturl']+"/"+local_filename

r = requests.get(remote_file_name, stream=True)
with open(local_filename, 'wb') as f:
    f.write(r.content)
print('Done writing local file:',local_filename)

ruzics_1501864934492-adiversity-api.csv
Done writing local file: ruzics_1501864934492-adiversity-api.csv


<h3>Show Alpha Diversity output for MVCO eukaryote data</h3>

In [90]:
from IPython.display import display, HTML
# widen the notebook container so the full table is visible
out = HTML("<style>.container { width:100% !important; }</style>"+return_result)
out

0,1,2,3,4,5
Dataset,observed richness,ACE,chao1,Shannon,Simpson
MVCO_ciliate_timeseries2--RG10join,3,4.125,3.0,0.00104311787583,0.0001297636222
MVCO_ciliate_timeseries2--RG11join,6,12.0,7.5,0.04560676973,0.00951792891686
MVCO_ciliate_timeseries2--RG12join,7,7.66666666667,7.0,0.0354457589366,0.00640060925154
MVCO_ciliate_timeseries2--RG13join,11,11.5204081633,11.0,0.260846073267,0.0568602997546
MVCO_ciliate_timeseries2--RG14join,3,3.0,3.0,0.00134989442722,0.000169181660708
MVCO_ciliate_timeseries2--RG15join,7,10.0106846063,10.0,0.00540072764325,0.000718990523862
MVCO_ciliate_timeseries2--RG16join,7,11.9393939394,10.0,0.00502377048831,0.000680516529676
MVCO_ciliate_timeseries2--RG17join,6,8.23140495868,7.0,0.00917918876043,0.00133804122773
MVCO_ciliate_timeseries2--RG18join,7,13.0,9.0,0.00670204393226,0.00088875304526
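
<p>The same values can be summarized programmatically. The sketch below assumes a comma-separated layout with the header row shown above (<code>Dataset,observed richness,...</code>); whether the saved <code>-adiversity-api.csv</code> file has exactly this layout is an assumption, so the example reads three rows copied from the output instead of the file.</p>

```python
import csv
import io

# Three rows copied from the table above
sample_csv = """Dataset,observed richness,ACE,chao1,Shannon,Simpson
MVCO_ciliate_timeseries2--RG10join,3,4.125,3.0,0.00104311787583,0.0001297636222
MVCO_ciliate_timeseries2--RG11join,6,12.0,7.5,0.04560676973,0.00951792891686
MVCO_ciliate_timeseries2--RG13join,11,11.5204081633,11.0,0.260846073267,0.0568602997546
"""

# DictReader keys each row by the header names
rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Pick the dataset with the highest observed richness
richest = max(rows, key=lambda row: int(row['observed richness']))
print(richest['Dataset'], richest['observed richness'])
```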


<h3>Get project Metadata</h3>

In [95]:
data = {"project": config['project']}
r = s.post(conn['hosturl']+'/api/get_metadata_from_project', timeout=15, data=data)  
result = json.loads(r.text)

<h3>Format and show Metadata, saving Latitude and Longitude</h3>

In [96]:
data_lat = ''
data_long = ''
for ids in result:
    print("Project ID",ids)
    print("    ","Collection Date:",result[ids]['collection_date'])
    print("    ", "DNA Region:",result[ids]['dna_region'])
    print("    ","Environment Biome:",result[ids]["env_biome"])
    print("    ","Environment Package:",result[ids]["env_package"])
    print("    ","Latitude:",result[ids]['latitude'])
    data_lat = result[ids]['latitude']
    print("    ","Longitude:",result[ids]['longitude'])
    data_long = result[ids]['longitude']
    print("    ","Primer Suite:",result[ids]['primer_suite'])
    print("    ","Sequencing Platform:",result[ids]["sequencing_platform"])
    print("    ","Target Gene:",result[ids]["target_gene"])
    print("")

Project ID 353191
     Collection Date: 2014-12-19
     DNA Region: v9v6
     Environment Biome: ocean
     Environment Package: water-marine
     Latitude: 41.325
     Longitude: -70.565
     Primer Suite: Vibrio V4
     Sequencing Platform: illumina
     Target Gene: 18s

Project ID 353192
     Collection Date: 2015-03-10
     DNA Region: v9v6
     Environment Biome: ocean
     Environment Package: water-marine
     Latitude: 41.325
     Longitude: -70.565
     Primer Suite: Vibrio V4
     Sequencing Platform: illumina
     Target Gene: 18s

Project ID 353193
     Collection Date: 2015-03-25
     DNA Region: v9v6
     Environment Biome: ocean
     Environment Package: water-marine
     Latitude: 41.325
     Longitude: -70.565
     Primer Suite: Vibrio V4
     Sequencing Platform: illumina
     Target Gene: 18s

Project ID 353194
     Collection Date: 2015-04-02
     DNA Region: v9v6
     Environment Biome: ocean
     Environment Package: water-marine
     Latitude: 41.325
     Longit
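
<p>The loop above indexes each field directly, so it would raise a <code>KeyError</code> for a dataset that lacks one of them. A hedged sketch of a more tolerant formatter follows; <code>format_metadata</code> is a hypothetical helper, and the sample record reuses values from the output above with the longitude left out on purpose.</p>

```python
def format_metadata(dataset_id, fields, wanted):
    """Build display lines for one dataset, marking absent fields as 'n/a'."""
    lines = ['Dataset ID %s' % dataset_id]
    for label, key in wanted:
        # dict.get avoids a KeyError when the field is missing
        lines.append('     %s: %s' % (label, fields.get(key, 'n/a')))
    return lines


# Sample record; storing the values as strings is an assumption
sample = {'353191': {'collection_date': '2014-12-19', 'latitude': '41.325'}}
wanted = [('Collection Date', 'collection_date'),
          ('Latitude', 'latitude'),
          ('Longitude', 'longitude')]
for dataset_id, fields in sample.items():
    print('\n'.join(format_metadata(dataset_id, fields, wanted)))
```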

<h3>Use MVCO Eukaryote data latitude/longitude metadata to find MVCO Bacteria dataset name (if lat/long metadata available)</h3>
<h3>If MVCO Bacteria dataset is found, change config file</h3>

In [97]:
data = {'nw_lat': data_lat,'nw_lon': data_long,'se_lat':'','se_lon':''}
r = s.post(conn['hosturl']+'/api/find_projects_in_geo_area', timeout=15, data=data)  
result = json.loads(r.text)

found = 'N'
for sets in result:
    if sets == 'AFP_MVCO_Bv6':
        #if MVCO bacteria data is found, change config project
        config['project'] = sets
        found = 'Y'

<h3>Change configuration file to be for MVCO bacteria data and domain to be Bacteria, keeping everything else the same</h3>

In [98]:
#change config file project to Kristen's MVCO bacteria data if project wasn't found using lat/long metadata
if found == 'N':
    config['project'] = 'AFP_MVCO_Bv6'
config['domains'] = ["Bacteria"]
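
<p>The cell above edits <code>config</code> in place, so the eukaryote settings are overwritten. If both configurations should stay available for comparison, one option is to copy the dictionary first. This is a sketch with a trimmed stand-in for the notebook's config; the names <code>euk_config</code> and <code>bac_config</code> are hypothetical.</p>

```python
import copy

# Trimmed stand-in for the notebook's config dict
euk_config = {'project': 'MVCO_ciliate_timeseries2',
              'domains': ['Eukarya'],
              'image': 'adiversity'}

# deepcopy so that nested lists such as 'domains' are not shared
bac_config = copy.deepcopy(euk_config)
bac_config['project'] = 'AFP_MVCO_Bv6'
bac_config['domains'] = ['Bacteria']

print(euk_config['project'], bac_config['project'])
```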

<h3>Now use the same method as before to produce the visualization, first getting and displaying dataset IDs</h3>

In [99]:
# get dataset ids for the project:
r = s.post(conn['hosturl']+'/api/get_dids_from_project', timeout=15, data=config)  
config['ds_order'] = r.text
print(config['ds_order'])

[336411,336422,336433,336444,336455,336460,336461,336462,336463,336412,336413,336414,336415,336416,336418,336417,336419,336420,336421,336423,336424,336425,336426,336427,336428,336429,336430,336431,336432,336434,336435,336436,336437,336438,336439,336440,336441,336442,336443,336445,336446,336447,336448,336449,336450,336451,336452,336453,336454,336456,336457,336458,336459,336408,336409,336407,336410]


<h3>Create remote configuration and get timestamp (file_prefix)</h3>

In [100]:
# Get timestamp (filename prefix):
r = s.post(conn['hosturl']+'/visuals/view_selection', timeout=15, data=config)
soup = BeautifulSoup(r.text, "lxml")  # html5lib  lxml html.parser

ts = soup.find(id="ts_for_bs").string

<h3>Save the matrix file, which is integral to VAMPS images</h3>

In [101]:
# download the BIOM count matrix named after the timestamp prefix
biom_matrix_file = ts+'_count_matrix.biom'
url = conn['hosturl']+"/"+biom_matrix_file
response = requests.get(url, stream=True)
response.raise_for_status()  # stop here on an HTTP error
# stream the file to disk in 1 KB blocks
with open(biom_matrix_file, "wb") as handle:
    for block in response.iter_content(1024):
        handle.write(block)

<h3>Save image file</h3>

In [102]:
r = s.post(conn['hosturl']+'/api/create_image', timeout=30, data=config)

try:
    result = json.loads(r.text)
except ValueError:  # the reply was not valid JSON; show it and stop
    print(r.text)
    sys.exit()
local_filename = result['filename']
return_result = result['html']
print(local_filename)
remote_file_name = conn['hosturl']+"/"+local_filename

r = requests.get(remote_file_name, stream=True)
with open(local_filename, 'wb') as f:
    f.write(r.content)
print('Done writing local file:',local_filename)

ruzics_1501865539654-adiversity-api.csv
Done writing local file: ruzics_1501865539654-adiversity-api.csv
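
<p>Since the request sequence is identical for both projects, it could be collected into one function. The sketch below mirrors the three POST calls used in this notebook and leaves out the BIOM download and the local file save; <code>create_visual</code> and <code>FakeSession</code> are hypothetical names, and the fake session returns canned replies so the example runs offline.</p>

```python
import json


def create_visual(session, host_url, config):
    """Repeat the three POST calls used above; return the parsed create_image reply.

    `session` is anything with a requests-style post(url, data=..., timeout=...).
    """
    cfg = dict(config)  # shallow copy so the caller's dict is not modified
    r = session.post(host_url + '/api/get_dids_from_project', timeout=15, data=cfg)
    cfg['ds_order'] = r.text
    session.post(host_url + '/visuals/view_selection', timeout=15, data=cfg)
    r = session.post(host_url + '/api/create_image', timeout=30, data=cfg)
    return json.loads(r.text)


class FakeSession(object):
    """Offline stand-in that returns canned replies; not a VAMPS client."""
    def __init__(self):
        self.urls = []

    def post(self, url, data=None, timeout=None):
        self.urls.append(url)
        reply = type('Reply', (object,), {})()
        if url.endswith('get_dids_from_project'):
            reply.text = '[1,2]'
        else:
            reply.text = '{"filename": "demo.csv", "html": "<table></table>"}'
        return reply


fake = FakeSession()
result = create_visual(fake, 'https://example.invalid', {'project': 'demo'})
print(result['filename'])
```

<p>In the notebook, the call would be <code>create_visual(s, conn['hosturl'], config)</code>, once per project.</p>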


<h3>Show Alpha Diversity output for MVCO bacteria data</h3>

In [103]:
out = HTML("<style>.container { width:100% !important; }</style>"+return_result)
out

0,1,2,3,4,5
Dataset,observed richness,ACE,chao1,Shannon,Simpson
AFP_MVCO_Bv6--MVCO_254,29,34.6730324074,32.3333333333,1.62495258096,0.518883000781
AFP_MVCO_Bv6--MVCO_257,28,28.5853658537,28.5,2.18715870548,0.7125100003
AFP_MVCO_Bv6--MVCO_258,26,26.3430882541,26.0,2.13766882129,0.694597927615
AFP_MVCO_Bv6--MVCO_259,28,28.0,28.0,2.24073767471,0.71908800898
AFP_MVCO_Bv6--MVCO_260,27,27.2284754732,27.0,2.26900233154,0.730918884404
AFP_MVCO_Bv6--MVCO_261,29,30.929375,30.5,2.26350751433,0.723522308981
AFP_MVCO_Bv6--MVCO_262,29,29.3070956622,29.0,2.23718728732,0.728145007451
AFP_MVCO_Bv6--MVCO_263,28,28.9314634146,28.5,2.08852046801,0.684876266827
AFP_MVCO_Bv6--MVCO_265,27,27.4888311688,28.0,2.02154711628,0.67640099354


<h3>Get project Metadata</h3>

In [104]:
data = {"project": config['project']}
r = s.post(conn['hosturl']+'/api/get_metadata_from_project', timeout=15, data=data)  
result = json.loads(r.text)

<h3>Format and show Metadata</h3>

In [105]:
for ids in result:
    print("Project ID",ids)
    print("    ", "DNA Region:",result[ids]['dna_region'])
    print("    ","Domain:",result[ids]["domain"])
    print("    ","Geographic Location:",result[ids]["geo_loc_name"])
    print("    ","Illumina Index:",result[ids]['illumina_index'])
    print("    ","Primer Suite:",result[ids]['primer_suite'])
    print("    ","Sequencing Platform:",result[ids]["sequencing_platform"])
    print("    ","Target Gene:",result[ids]["target_gene"])
    print("")

Project ID 336407
     DNA Region: v6
     Domain: Bacteria
     Geographic Location: United States of America
     Illumina Index: CTTGTA
     Primer Suite: Bacterial V6 Suite
     Sequencing Platform: illumina
     Target Gene: 16s

Project ID 336408
     DNA Region: v6
     Domain: Bacteria
     Geographic Location: United States of America
     Illumina Index: ATCACG
     Primer Suite: Bacterial V6 Suite
     Sequencing Platform: illumina
     Target Gene: 16s

Project ID 336409
     DNA Region: v6
     Domain: Bacteria
     Geographic Location: United States of America
     Illumina Index: ATCACG
     Primer Suite: Bacterial V6 Suite
     Sequencing Platform: illumina
     Target Gene: 16s

Project ID 336410
     DNA Region: v6
     Domain: Bacteria
     Geographic Location: United States of America
     Illumina Index: CTTGTA
     Primer Suite: Bacterial V6 Suite
     Sequencing Platform: illumina
     Target Gene: 16s

Project ID 336411
     DNA Region: v6
     Domain: Bacteria
