# Interacting with the VAMPS-API
## Requirements:
* **[Python](https://www.python.org/downloads/)**  
 * required external packages: BeautifulSoup, lxml, requests
* **[Anaconda](https://www.continuum.io/downloads)** 
 * needed to run Jupyter notebook environment  
   
## This notebook:
* Logs into VAMPS account  
 * If you don't have an account, use the guest account (Username: 'guest', Password: 'guest')
* Produces a configuration file for a selected project 
 * Default: Eukaryote data from the Martha's Vineyard Coastal Observatory (MVCO), selecting one of several visualizations (default: piecharts)
* Displays visualization and saves time-stamped matrix and image files to local computer
* Uses geographic metadata for the first project to discover a second project at the same location  
 * Default: MVCO bacteria data
* Modifies the first configuration file to be able to use it for the second project
* Displays same visualization and saves files for the second project

## To use this notebook:
* Run the Jupyter environment locally by typing "jupyter notebook" into Terminal
* Make sure cells are cleared by clicking "Cell" -> "All Output" -> "Clear"
* Press run cell button
* Enter information if prompted   
* Do not run the next cell if the previous cell has an asterisk next to "In" ("In [*]:"); the asterisk means the cell is still processing  
* When the asterisk turns into any number, you can run the next cell  

<h3>Import relevant Python packages; allow both Python 2 and 3</h3>

In [1]:
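from __future__ import print_function  # so print() with several arguments behaves the same on Python 2
import os,sys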
import requests
from bs4 import BeautifulSoup   # parser for html
import json, string, getpass

#to allow both Python 2 and 3
try:
    input = raw_input  
except NameError: #Python 3
    pass

<h3>Gets VAMPS username and password, then attempts login to VAMPS</h3>

In [2]:
#get VAMPS username and password
user = input("Enter your VAMPS username: ")
pw = getpass.getpass("Enter your VAMPS password: ")

conn = {'user': user,
        'passwd': pw,
         # vamps:             https://vamps2.mbl.edu
         # vampsdev (private) http://vampsdev.jbpc-np.mbl.edu:8124 
         # localhost:         http://localhost:3000 
        'hosturl':'https://vamps2.mbl.edu'
       }

#attempt login to VAMPS
s = requests.Session()
r = s.post(conn['hosturl']+'/users/login', data={'username':conn['user'], 'password':conn['passwd']})

Enter your VAMPS username: ruzics
Enter your VAMPS password: ········


<h3>If username/password is incorrect, exit program</h3>

In [3]:
#exit program if login unsuccessful
#compare against conn['hosturl'] so the check still works if the host is changed above
if r.url == conn['hosturl'] + '/users/login':
    sys.exit('Login not successful')
elif r.url == conn['hosturl'] + '/':
    print('Login successful')

Login successful
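
The check above depends on where the server redirects after the login POST. A minimal sketch of the same comparison as a reusable function follows. `login_succeeded` is a hypothetical helper, not part of VAMPS or `requests`. The assumption that a successful login ends on the site root is taken only from the cell above.

```python
def login_succeeded(final_url, host_url):
    """Return True if the final URL after the login POST is the site root.

    Assumption (from the notebook cell, not from VAMPS documentation):
    a successful login redirects to the host root, and anything else
    (for example '/users/login') means the login was rejected.
    """
    return final_url.rstrip('/') == host_url.rstrip('/')


print(login_succeeded('https://vamps2.mbl.edu/', 'https://vamps2.mbl.edu'))             # True
print(login_succeeded('https://vamps2.mbl.edu/users/login', 'https://vamps2.mbl.edu'))  # False
```

In the notebook this would be called as `login_succeeded(r.url, conn['hosturl'])`.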


<h3>Option to upload an existing config file or see list of datasets (if not using guest account)</h3>

In [4]:
if user != 'guest':
    upload = input("Do you want to use an already existing config file? ('Y' or 'N'): ")

    #to upload config: 
    if upload[0].capitalize() == "Y":
        file = input('Enter JSON Config File: ')
        with open(file) as f:        
            config = json.load(f)
        id_list = 'N'
    else:
        id_list = input("Do you want to search through datasets or see all you have access to? ('Y' or 'N'): ")

Do you want to use an already existing config file? ('Y' or 'N'): N
Do you want to search through datasets or see all you have access to? ('Y' or 'N'): Y


<h3>If selected, search through datasets</h3>

In [5]:
if user == 'guest':
    upload = "N"
    id_list = input("Do you want to search through datasets or see all you have access to? ('Y' or 'N'): ")
    print("Edit 'config' below to match preferences before running cell")
    
if id_list[0].capitalize() == "Y":
    search = input("Enter dataset you are looking for to get a list of matches: ")
    data = {
        # If not empty, searches for projects with this string in the
        # project name, title or description (case insensitive)
        'search_string': search,
        # Uncomment the line below to include project information
        #'include_info':''        # if present, data will include project information
    }
    r = s.post(conn['hosturl']+'/api/find_user_projects', timeout=15, data=data) 
    result = json.loads(r.text)
    print(result)

Enter dataset you are looking for to get a list of matches: MVCO
['AFP_MVCO_Bv6', 'MVCO_ciliate_timeseries2']


<h3>If a configuration file was not uploaded or you are using guest account, set config using MVCO eukaryote data as default project</h3>

In [25]:
if upload[0].capitalize() == "N":
    #default config (if not uploaded); Emily B.'s MVCO eukaryote data
    config = {
        "api":"1",
        "source":"VAMPS-API",
        "update_data":1,
        "normalization":"none",               # none, maximum, frequency             
        "selected_distance":"morisita-horn",  # morisita-horn, jaccard, kulczynski, canberra, bray-curtis
        "tax_depth":"family",                  # domain, phylum, klass, order, family, genus, species, strain
        "domains":["Eukarya"],                #["Archaea","Bacteria","Eukarya","Organelle","Unknown"] 
        "include_nas":"yes",                  # yes or no             
        "min_range":0,                        # integer 0-99
        "max_range":100,                      # integer 1-100

        # Must be a valid project - with correct permissions for the above user. 
        # Default is Emily B.'s MVCO eukaryote data
        'project':'MVCO_ciliate_timeseries2',   
        
        # Currently available: "dheatmap", "fheatmap", "piecharts", "barcharts", "counts_matrix", "metadata_csv", "adiversity", "dendrogram" 
        'image':'counts_matrix'
        } 
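
Because the config is edited by hand, a mistyped option is easy to miss until the server rejects it. The sketch below checks a config against the option lists given in the comments above. `ALLOWED` and `config_problems` are hypothetical names, and the server may apply different or additional rules.

```python
# Allowed values copied from the comments in the config cell above.
ALLOWED = {
    'normalization': {'none', 'maximum', 'frequency'},
    'selected_distance': {'morisita-horn', 'jaccard', 'kulczynski',
                          'canberra', 'bray-curtis'},
    'tax_depth': {'domain', 'phylum', 'klass', 'order', 'family',
                  'genus', 'species', 'strain'},
    'image': {'dheatmap', 'fheatmap', 'piecharts', 'barcharts',
              'counts_matrix', 'metadata_csv', 'adiversity', 'dendrogram'},
}


def config_problems(cfg):
    """Return a list of readable problems; an empty list means nothing was flagged."""
    problems = []
    for key in sorted(ALLOWED):
        value = cfg.get(key)
        if value not in ALLOWED[key]:
            problems.append("%s=%r is not one of %s"
                            % (key, value, sorted(ALLOWED[key])))
    low, high = cfg.get('min_range'), cfg.get('max_range')
    if not (isinstance(low, int) and isinstance(high, int)
            and 0 <= low < high <= 100):
        problems.append("min_range/max_range must be integers "
                        "with 0 <= min < max <= 100")
    return problems


sample = {'normalization': 'none', 'selected_distance': 'morisita-horn',
          'tax_depth': 'family', 'image': 'counts_matrix',
          'min_range': 0, 'max_range': 100}
print(config_problems(sample))  # []
```

Calling `config_problems(config)` before the first request would surface typos such as `'piechart'` locally.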

<h3>Get and display dataset IDs for selected project</h3>

In [26]:
# get project ids:
r = s.post(conn['hosturl']+'/api/get_dids_from_project', timeout=15, data=config)  
config['ds_order'] = r.text
print(config['ds_order'])

[353191,353192,353193,353194,353195,353196,353197,353198,353199,353200,353201,353202,353203,353204,353205,353206,353207,353208,353209,353210,353211,353212,353213,353214,353215,353216,353217]
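
`config['ds_order']` is kept as the raw response text. If the IDs are needed as Python values (for counting or filtering), the text can be parsed with `json`. This is a small sketch that assumes the response is always a JSON array of integers, as in the output above; the sample string is shortened from that output.

```python
import json

# Shortened sample of the text returned by /api/get_dids_from_project
ds_order_text = '[353191,353192,353193]'

dataset_ids = json.loads(ds_order_text)   # list of ints
print(len(dataset_ids), dataset_ids[0])   # 3 353191
```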


<h3>Create remote configuration and get timestamp (file_prefix)</h3>

In [27]:
# Get timestamp to be used as a prefix for files:
r = s.post(conn['hosturl']+'/visuals/view_selection', timeout=15, data=config)
soup = BeautifulSoup(r.text, "lxml")  # html5lib  lxml html.parser

ts = soup.find(id="ts_for_bs").string
print("Timestamp/file prefix:",ts)

Timestamp/file prefix: ruzics_1503430421441


<h3>Save matrix file which is integral to VAMPS images</h3>

In [28]:
import json
biom_matrix_file = ts+'_count_matrix.biom'
url = conn['hosturl']+"/"+biom_matrix_file
response = requests.get(url, stream=True)
response.raise_for_status()
out_file = biom_matrix_file
with open(out_file, "wb") as handle:
    for block in response.iter_content(1024):
        handle.write(block)

<h3>Save VAMPS visualization output file</h3>

In [29]:
r = s.post(conn['hosturl']+'/api/create_image', timeout=30, data=config)

try:
    result = json.loads(r.text)
except ValueError:  # response was not valid JSON; show it and stop
    print(r.text)
    sys.exit()
local_filename = result['filename']
return_result = result['html']
print(local_filename)
remote_file_name = conn['hosturl']+"/"+local_filename

r = requests.get(remote_file_name, stream=True)
with open(local_filename, 'wb') as f:
    f.write(r.content)
print('Done writing local file:',local_filename)

ruzics_1503430421441-counts_table-api.html
Done writing local file: ruzics_1503430421441-counts_table-api.html


<h3>Show visualization output for first dataset</h3>

In [30]:
from IPython.display import display, HTML
out = HTML("<style>.container { width:100% !important; }</style>"+return_result)
out

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37
,Domain,Phylum,Class,Order,Family,1) MVCO_ciliate_timeseries2--RG10join,2) MVCO_ciliate_timeseries2--RG11join,3) MVCO_ciliate_timeseries2--RG12join,4) MVCO_ciliate_timeseries2--RG13join,5) MVCO_ciliate_timeseries2--RG14join,6) MVCO_ciliate_timeseries2--RG15join,7) MVCO_ciliate_timeseries2--RG16join,8) MVCO_ciliate_timeseries2--RG17join,9) MVCO_ciliate_timeseries2--RG18join,10) MVCO_ciliate_timeseries2--RG19join,11) MVCO_ciliate_timeseries2--RG1join,12) MVCO_ciliate_timeseries2--RG20join,13) MVCO_ciliate_timeseries2--RG21join,14) MVCO_ciliate_timeseries2--RG22join,15) MVCO_ciliate_timeseries2--RG23join,16) MVCO_ciliate_timeseries2--RG24join,17) MVCO_ciliate_timeseries2--RG25join,18) MVCO_ciliate_timeseries2--RG26join,19) MVCO_ciliate_timeseries2--RG27join,20) MVCO_ciliate_timeseries2--RG2join,21) MVCO_ciliate_timeseries2--RG3join,22) MVCO_ciliate_timeseries2--RG4join,23) MVCO_ciliate_timeseries2--RG5join,24) MVCO_ciliate_timeseries2--RG6join,25) MVCO_ciliate_timeseries2--RG7join,26) MVCO_ciliate_timeseries2--RG8join,27) MVCO_ciliate_timeseries2--RG9join,Total,Avg,Min,Max,Std Dev
1.0,Eukarya,Apicomplexa,Coccidia,Eucoccidiorida,Cryptosporidiidae,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0.30,0,8,1.51
2.0,Eukarya,Apicomplexa,empty_class,empty_order,Colpodellidae,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,3,0.11,0,2,0.42
3.0,Eukarya,Arthropoda,Insecta,Coleoptera,Melolonthidae,0,92,56,24,4,8,1,15,2,0,13,0,5,2,1,0,0,0,0,1111,17,40,0,1,0,3,7,1402,51.93,0,1111,208.71
4.0,Eukarya,Brachiopoda,empty_class,empty_order,Productidae,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0.07,0,2,0.38
5.0,Eukarya,Cercozoa,Imbricatea,Euglyphida,Euglyphidae,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,3,0.11,0,3,0.57
6.0,Eukarya,Cercozoa,Imbricatea,Thaumatomonadida,Peregriniidae,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,2,0.07,0,2,0.38
7.0,Eukarya,Cercozoa,class_NA,order_NA,family_NA,0,0,0,0,0,0,16,0,0,0,0,0,0,0,12,2,0,0,0,0,0,10,1,0,0,0,3,44,1.63,0,16,4.05
8.0,Eukarya,Cercozoa,empty_class,empty_order,Protaspidae,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0.11,0,2,0.42
9.0,Eukarya,Chlorophyta,Mamiellophyceae,Mamiellales,Mamiellaceae,0,0,0,0,0,0,0,0,0,0,0,0,0,0,80,0,0,0,0,0,0,0,0,0,0,1,0,81,3.00,0,80,15.10


<h3>Get project Metadata</h3>

In [12]:
data = {"project": config['project']}
r = s.post(conn['hosturl']+'/api/get_metadata_from_project', timeout=15, data=data)  
result = json.loads(r.text)
print("Loaded metadata")

Loaded metadata


<h3>Format and show sample of Metadata, saving Latitude and Longitude information</h3>

In [13]:
data_lat = ''
data_long = ''
count = 0

for ids in result:
    if count == 3:
        break  # only the first three datasets are shown
    count += 1
    for mdname in result[ids]:
        print(mdname + ": " + result[ids][mdname])
        if mdname == "longitude":
            data_long = result[ids][mdname]
        elif mdname == "latitude":
            data_lat = result[ids][mdname]
    print()

adapter_sequence: unknown
geo_loc_name: unknown
run: unknown
collection_date: 2014-12-19
dna_region: v9v6
longitude: -70.565
target_gene: 18s
env_package: water-marine
illumina_index: unknown
env_biome: ocean
sequencing_platform: illumina
primer_suite: Vibrio V4
latitude: 41.325
domain: Eukarya

adapter_sequence: unknown
geo_loc_name: unknown
run: unknown
collection_date: 2015-03-10
dna_region: v9v6
longitude: -70.565
target_gene: 18s
env_package: water-marine
illumina_index: unknown
env_biome: ocean
sequencing_platform: illumina
primer_suite: Vibrio V4
latitude: 41.325
domain: Eukarya

adapter_sequence: unknown
geo_loc_name: unknown
run: unknown
collection_date: 2015-03-25
dna_region: v9v6
longitude: -70.565
target_gene: 18s
env_package: water-marine
illumina_index: unknown
env_biome: ocean
sequencing_platform: illumina
primer_suite: Vibrio V4
latitude: 41.325
domain: Eukarya
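
The loop above both prints metadata and picks up the coordinates as a side effect. The sketch below separates out the coordinate lookup. `first_coordinates` is a hypothetical helper, and the metadata shape (dataset ID mapped to a dict of string fields) is inferred from the loop and its output, not from API documentation.

```python
def first_coordinates(metadata):
    """Return (latitude, longitude) as floats from the first dataset
    that has both values, or None if no dataset does."""
    missing = (None, '', 'unknown')
    for fields in metadata.values():
        lat = fields.get('latitude')
        lon = fields.get('longitude')
        if lat not in missing and lon not in missing:
            return float(lat), float(lon)
    return None


# Sample data shaped like the output above (dataset IDs are illustrative)
sample_metadata = {
    '353191': {'latitude': 'unknown', 'longitude': 'unknown'},
    '353192': {'latitude': '41.325', 'longitude': '-70.565', 'domain': 'Eukarya'},
}
print(first_coordinates(sample_metadata))
```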



### Use first project latitude/longitude metadata to search for next project name; if MVCO Bacteria dataset is found, change config file

In [17]:
found = 'N'

data = {'nw_lat': data_lat,'nw_lon': data_long,'se_lat':data_lat,'se_lon':data_long}
r = s.post(conn['hosturl']+'/api/find_projects_in_geo_area', timeout=15, data=data)  
result = json.loads(r.text)
print(result)
for sets in result:
    if sets == 'AFP_MVCO_Bv6':
        print("Found second project dataset using latitude/longitude data")
        #if MVCO bacteria data is found, change config project
        config['project'] = sets
        found = 'Y'

{'AFP_MVCO_Bv6': {'latitude': 41.325, 'longitude': -70.565}, 'MVCO_ciliate_timeseries2': {'latitude': 41.325, 'longitude': -70.565}}
Found second project dataset using latitude/longitude data
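
The search above uses the same point for both corners of the area, so it only matches projects with identical coordinates. A sketch of building a slightly larger box and listing the other projects in the result follows. `bounding_box` and `other_projects` are hypothetical helpers. Treating `nw_*` and `se_*` as the north-west and south-east corners, and whether the API accepts a non-degenerate box, are assumptions based on the parameter names only.

```python
def bounding_box(lat, lon, pad=0.0):
    """Build the form data for a box centred on (lat, lon).

    pad is in decimal degrees; pad=0.0 reproduces the single-point
    search used in the cell above.
    """
    lat, lon = float(lat), float(lon)
    return {'nw_lat': lat + pad, 'nw_lon': lon - pad,
            'se_lat': lat - pad, 'se_lon': lon + pad}


def other_projects(result, current_project):
    """Return the project names in a geo search result, minus the current one."""
    return sorted(name for name in result if name != current_project)


# Result copied from the output above
geo_result = {'AFP_MVCO_Bv6': {'latitude': 41.325, 'longitude': -70.565},
              'MVCO_ciliate_timeseries2': {'latitude': 41.325, 'longitude': -70.565}}
print(other_projects(geo_result, 'MVCO_ciliate_timeseries2'))  # ['AFP_MVCO_Bv6']
```

Listing candidates this way avoids hard-coding `'AFP_MVCO_Bv6'` in the loop, at the cost of having to choose among several matches when more than one project shares the location.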


### If second project wasn't found using latitude/longitude metadata, use the same configuration file and modify it as necessary

In [15]:
#change config file project to second project if not found using lat/long metadata
if found[0].capitalize() == 'N':
    config['project'] = 'AFP_MVCO_Bv6'
config['domains'] = ["Bacteria"]
config["tax_depth"] = "family"
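
The cell above changes `config` in place, so the first project's settings are lost. If both configurations should stay available, one option is to derive the second from a copy. `derive_config` is a hypothetical helper, and the sample config is trimmed to three keys.

```python
import copy


def derive_config(base, **overrides):
    """Return a deep copy of base with the given keys replaced."""
    new_config = copy.deepcopy(base)
    new_config.update(overrides)
    return new_config


first = {'project': 'MVCO_ciliate_timeseries2',
         'domains': ['Eukarya'],
         'tax_depth': 'family'}
second = derive_config(first, project='AFP_MVCO_Bv6', domains=['Bacteria'])

print(first['project'], first['domains'])    # MVCO_ciliate_timeseries2 ['Eukarya']
print(second['project'], second['domains'])  # AFP_MVCO_Bv6 ['Bacteria']
```

A copied config still carries the first project's `ds_order`, so it has to be refreshed for the new project, as the next cell does.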

### Now use the same method as before to produce the visualization, first getting and displaying dataset IDs; exclude 4 datasets from second project

In [18]:
# get project ids:
r = s.post(conn['hosturl']+'/api/get_dids_from_project', timeout=15, data=config)  
config['ds_order'] = r.text
# exclude 4 datasets
if config['project'] == 'AFP_MVCO_Bv6':
    #get datasets (string form)
    temp = config['ds_order']
    #convert string to list in order to remove datasets
    temp = temp.strip('[').strip(']').split(',')
    #remove each excluded dataset, skipping any that are not in the list
    for did in ('336408', '336409', '336407', '336410'):
        if did in temp:
            temp.remove(did)
    #convert back to string form
    temp = (",").join(temp)
    temp = '[' + temp + ']'
    #set temp datasets to replace ds_order in config
    config['ds_order'] = temp
print(config['ds_order'])

[336411,336422,336433,336444,336455,336460,336461,336462,336463,336412,336413,336414,336415,336416,336418,336417,336419,336420,336421,336423,336424,336425,336426,336427,336428,336429,336430,336431,336432,336434,336435,336436,336437,336438,336439,336440,336441,336442,336443,336445,336446,336447,336448,336449,336450,336451,336452,336453,336454,336456,336457,336458,336459]
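
The string handling above can also be done by parsing the ID list as JSON, filtering it, and serialising it back. A minimal sketch follows. `exclude_datasets` is a hypothetical helper, and it assumes the server accepts the same compact `[id,id,...]` text that it returned.

```python
import json


def exclude_datasets(ds_order_text, excluded):
    """Drop the given dataset IDs from a '[1,2,3]'-style string.

    IDs that are not present are ignored, and the order of the
    remaining IDs is preserved.
    """
    excluded = set(int(did) for did in excluded)
    kept = [did for did in json.loads(ds_order_text) if did not in excluded]
    # separators=(',', ':') produces compact output without spaces
    return json.dumps(kept, separators=(',', ':'))


print(exclude_datasets('[336407,336408,336411,336422]', [336407, 336408, 336409]))
# [336411,336422]
```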


<h3>Create remote configuration and get timestamp (file_prefix)</h3>

In [19]:
# Get timestamp (filename prefix):
r = s.post(conn['hosturl']+'/visuals/view_selection', timeout=15, data=config)
soup = BeautifulSoup(r.text, "lxml")  # html5lib  lxml html.parser

ts = soup.find(id="ts_for_bs").string
print("Timestamp/file prefix:",ts)

Timestamp/file prefix: ruzics_1503429703147


<h3>Save matrix file which is integral to VAMPS images</h3>

In [20]:
import json
biom_matrix_file = ts+'_count_matrix.biom'
url = conn['hosturl']+"/"+biom_matrix_file
response = requests.get(url, stream=True)
response.raise_for_status()
out_file = biom_matrix_file
with open(out_file, "wb") as handle:
    for block in response.iter_content(1024):
        handle.write(block)

<h3>Save image file</h3>

In [21]:
r = s.post(conn['hosturl']+'/api/create_image', timeout=30, data=config)

try:
    result = json.loads(r.text)
except ValueError:  # response was not valid JSON; show it and stop
    print(r.text)
    sys.exit()
local_filename = result['filename']
return_result = result['html']
print(local_filename)
remote_file_name = conn['hosturl']+"/"+local_filename

r = requests.get(remote_file_name, stream=True)
with open(local_filename, 'wb') as f:
    f.write(r.content)
print('Done writing local file:',local_filename)

ruzics_1503429703147-piecharts-api.svg
Done writing local file: ruzics_1503429703147-piecharts-api.svg


<h3>Show visualization output for second project</h3>

In [22]:
out = ''
out = HTML("<style>.container { width:100% !important; }</style>"+return_result)
out

<h3>Get project Metadata</h3>

In [23]:
data = {"project": config['project']}
r = s.post(conn['hosturl']+'/api/get_metadata_from_project', timeout=15, data=data)  
result = json.loads(r.text)
print("Loaded metadata")

Loaded metadata


<h3>Format and show sample of Metadata</h3>

In [24]:
data_lat = ''
data_long = ''
count = 0

for ids in result:
    if count == 3:
        break  # only the first three datasets are shown
    count += 1
    for mdname in result[ids]:
        print(mdname + ": " + result[ids][mdname])
        if mdname == "longitude":
            data_long = result[ids][mdname]
        elif mdname == "latitude":
            data_lat = result[ids][mdname]
    print()

adapter_sequence: NNNNGTATC
geo_loc_name: United States of America
run: 20130322
collection_date: unknown
dna_region: v6
longitude: -70.565
target_gene: 16s
env_package: unknown
illumina_index: CTTGTA
sequencing_platform: illumina
primer_suite: Bacterial V6 Suite
latitude: 41.325
domain: Bacteria

adapter_sequence: NNNNGCTAC
geo_loc_name: United States of America
run: 20130322
collection_date: unknown
dna_region: v6
longitude: -70.565
target_gene: 16s
env_package: unknown
illumina_index: ATCACG
sequencing_platform: illumina
primer_suite: Bacterial V6 Suite
latitude: 41.325
domain: Bacteria

adapter_sequence: NNNNGCTAC
geo_loc_name: United States of America
run: 20130322
collection_date: unknown
dna_region: v6
longitude: -70.565
target_gene: 16s
env_package: unknown
illumina_index: ATCACG
sequencing_platform: illumina
primer_suite: Bacterial V6 Suite
latitude: 41.325
domain: Bacteria

