# Interacting with the VAMPS-API
## Requirements:
* **[Python](https://www.python.org/downloads/)**  
 * required external packages: BeautifulSoup, lxml, requests
* **[Anaconda](https://www.continuum.io/downloads)** 
 * needed to run Jupyter notebook environment  
   
## This notebook:
* Logs into VAMPS account  
 * if you don't have an account, use guest account (Username: 'guest', Password: 'guest')
* Produces a configuration file for a selected project 
 * Default: Eukaryote data from the Martha's Vineyard Coastal Observatory (MVCO), selecting one of several visualizations (default: piecharts)
* Displays visualization and saves time-stamped matrix and image files to local computer
* Uses geographic metadata for the first project to discover a second project at the same location  
 * Default: MVCO bacteria data
* Modifies the first configuration file to be able to use it for the second project
* Displays same visualization and saves files for the second project

## To use this notebook:
* Run the Jupyter environment locally by typing "jupyter notebook" into Terminal
* Make sure cells are cleared by clicking "Cell" -> "All Output" -> "Clear"
* Press the run cell button
* Enter information if prompted
* Do not run the next cell while the previous cell shows an asterisk next to "In" ("In [*]:"); this means it is still processing
* When the asterisk turns into a number, you can run the next cell

<h3>Import relevant python packages; Allow both Python 2 and 3</h3>

In [2]:
import os,sys
import requests
from bs4 import BeautifulSoup   # parser for html
import json, string, getpass

#to allow both Python 2 and 3
try:
    input = raw_input  
except NameError: #Python 3
    pass

<h3>Gets VAMPS username and password, then attempts login to VAMPS</h3>

In [3]:
#get VAMPS username and password
user = input("Enter your VAMPS username: ")
pw = getpass.getpass("Enter your VAMPS password: ")

conn = {'user': user,
        'passwd': pw,
         # vamps:             https://vamps2.mbl.edu
         # vampsdev (private) http://vampsdev.jbpc-np.mbl.edu:8124 
         # localhost:         http://localhost:3000 
        'hosturl':'https://vamps2.mbl.edu'
       }

#attempt login to VAMPS
s = requests.Session()
r = s.post(conn['hosturl']+'/users/login', data={'username':conn['user'], 'password':conn['passwd']})

Enter your VAMPS username: ruzics
Enter your VAMPS password: ········


<h3>If username/password is incorrect, exit program</h3>

In [4]:
#exit program if login unsuccessful
if r.url == conn['hosturl'] + '/users/login':
    sys.exit('Login not successful')
elif r.url == conn['hosturl'] + '/':
    print('Login successful')

Login successful
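
The check above decides success by looking at the URL the session ends up on after the login POST. A minimal sketch of the same idea as a reusable function follows. `login_succeeded` is a hypothetical helper, not part of VAMPS or `requests`, and it assumes (as the cell above implies) that a failed login lands back on `<host>/users/login`.

```python
def login_succeeded(final_url, host_url):
    """Return True when the post-login URL is not the login page.

    Assumption taken from the cell above: a failed login ends on
    <host>/users/login, while a successful one ends elsewhere.
    """
    login_url = host_url.rstrip('/') + '/users/login'
    return final_url.rstrip('/') != login_url


# Example URLs in the same shape as the ones used in this notebook
print(login_succeeded('https://vamps2.mbl.edu/', 'https://vamps2.mbl.edu'))
print(login_succeeded('https://vamps2.mbl.edu/users/login', 'https://vamps2.mbl.edu'))
```

In the notebook this could be called as `login_succeeded(r.url, conn['hosturl'])`, so the check follows whichever host is set in `conn`.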


<h3>Option to upload an existing config file or see list of datasets (if not using guest account)</h3>

In [5]:
if user != 'guest':
    upload = input("Do you want to use an already existing config file? ('Y' or 'N'): ")

    #to upload config: 
    if upload[0].capitalize() == "Y":
        file = input('Enter JSON Config File: ')
        with open(file) as f:        
            config = json.load(f)
        id_list = 'N'
    else:
        id_list = input("Do you want to search through datasets or see all you have access to? ('Y' or 'N'): ")

Do you want to use an already existing config file? ('Y' or 'N'): N
Do you want to search through datasets or see all you have access to? ('Y' or 'N'): Y
Edit 'config' below to match preferences before running cell


<h3>If selected, search through datasets</h3>

In [7]:
if user == 'guest':
    upload = "N"
    id_list = input("Do you want to search through datasets or see all you have access to? ('Y' or 'N'): ")
    print("Edit 'config' below to match preferences before running cell")
    
if id_list[0].capitalize() == "Y":
    search = input("Enter dataset you are looking for to get a list of matches: ")
    data = {
        # If not empty, searches for projects with this string in the
        # project name, title or description (case insensitive)
        'search_string': search,
        # Uncomment the line below to include project information
        #'include_info': ''    # if present, data will include project information
    }
    r = s.post(conn['hosturl']+'/api/find_user_projects', timeout=15, data=data) 
    result = json.loads(r.text)
    print(result)

Enter dataset you are looking for to get a list of matches: MVCO
['AFP_MVCO_Bv6', 'MVCO_ciliate_timeseries2']


<h3>If a configuration file was not uploaded or you are using guest account, set config using MVCO eukaryote data as default project</h3>

In [8]:
if upload[0].capitalize() == "N":
    #default config (if not uploaded); Emily B.'s MVCO eukaryote data
    config = {
        "api":"1",
        "source":"VAMPS-API",
        "update_data":1,
        "normalization":"none",               # none, maximum, frequency             
        "selected_distance":"morisita-horn",  # morisita-horn, jaccard, kulczynski, canberra, bray-curtis
        "tax_depth":"family",                  # domain, phylum, klass, order, family, genus, species, strain
        "domains":["Eukarya"],                #["Archaea","Bacteria","Eukarya","Organelle","Unknown"] 
        "include_nas":"yes",                  # yes or no             
        "min_range":0,                        # integer 0-99
        "max_range":100,                      # integer 1-100

        # Must be a valid project - with correct permissions for the above user. 
        # Default is Emily B.'s MVCO eukaryote data
        'project':'MVCO_ciliate_timeseries2',   
        
        # Currently available: "dheatmap", "piecharts", "barcharts", "counts_matrix", "metadata_csv", "adiversity", "fheatmap", "dendrogram" 
        # Default is the piecharts visualization
        'image':'piecharts'
        } 
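
The comments in the config above list the values each setting accepts. Since the notebook asks you to edit `config` by hand, a quick local check before sending it can catch typos. The sketch below is only an illustration: `check_config` and the `ALLOWED` tables are hypothetical names, the allowed values are copied from the comments above (treating `canberra` and `bray-curtis` as two separate options), and the rule that `min_range` must be below `max_range` is an assumption rather than documented API behavior.

```python
# Allowed values copied from the comments in the config cell above
ALLOWED = {
    'normalization': {'none', 'maximum', 'frequency'},
    'selected_distance': {'morisita-horn', 'jaccard', 'kulczynski',
                          'canberra', 'bray-curtis'},
    'tax_depth': {'domain', 'phylum', 'klass', 'order', 'family',
                  'genus', 'species', 'strain'},
    'include_nas': {'yes', 'no'},
    'image': {'dheatmap', 'piecharts', 'barcharts', 'counts_matrix',
              'metadata_csv', 'adiversity', 'fheatmap', 'dendrogram'},
}
ALLOWED_DOMAINS = {'Archaea', 'Bacteria', 'Eukarya', 'Organelle', 'Unknown'}


def check_config(cfg):
    """Return a list of problems found in cfg; an empty list means none."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in cfg and cfg[key] not in allowed:
            problems.append("{}: unexpected value {!r}".format(key, cfg[key]))
    for domain in cfg.get('domains', []):
        if domain not in ALLOWED_DOMAINS:
            problems.append("domains: unexpected value {!r}".format(domain))
    low = cfg.get('min_range', 0)
    high = cfg.get('max_range', 100)
    if not 0 <= low <= 99:
        problems.append("min_range: expected an integer 0-99")
    if not 1 <= high <= 100:
        problems.append("max_range: expected an integer 1-100")
    if low >= high:  # assumption: the range must not be empty
        problems.append("min_range should be smaller than max_range")
    return problems


example = {'normalization': 'none', 'tax_depth': 'family',
           'domains': ['Eukarya'], 'image': 'piecharts',
           'min_range': 0, 'max_range': 100}
print(check_config(example))
```

Calling `check_config(config)` right after editing the cell above would list any values that do not match the documented options.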

<h3>Get and display dataset IDs for selected project</h3>

In [9]:
# get project ids:
r = s.post(conn['hosturl']+'/api/get_dids_from_project', timeout=15, data=config)  
config['ds_order'] = r.text
print(config['ds_order'])

[353191,353192,353193,353194,353195,353196,353197,353198,353199,353200,353201,353202,353203,353204,353205,353206,353207,353208,353209,353210,353211,353212,353213,353214,353215,353216,353217]


<h3>Create remote configuration and get timestamp (file_prefix)</h3>

In [10]:
# Get timestamp to be used as a prefix for files:
r = s.post(conn['hosturl']+'/visuals/view_selection', timeout=15, data=config)
soup = BeautifulSoup(r.text, "lxml")  # html5lib  lxml html.parser

ts = soup.find(id="ts_for_bs").string
print("Timestamp/file prefix:",ts)

Timestamp/file prefix: ruzics_1503411724269


<h3>Save matrix file which is integral to VAMPS images</h3>

In [11]:
# The matrix file is named after the timestamp prefix obtained above
biom_matrix_file = ts+'_count_matrix.biom'
url = conn['hosturl']+"/"+biom_matrix_file
response = requests.get(url, stream=True)
response.raise_for_status()
out_file = biom_matrix_file
with open(out_file, "wb") as handle:
    for block in response.iter_content(1024):
        handle.write(block)
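
The cell above streams the matrix file to disk in 1024-byte blocks. The same write loop can be kept in a small function so that both projects in this notebook share it. This is a sketch only: `save_chunks` is a hypothetical helper, and the list of byte strings below stands in for `response.iter_content(1024)` so the example runs without a network connection.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written."""
    total = 0
    with open(path, 'wb') as handle:
        for block in chunks:
            if block:  # skip empty chunks
                handle.write(block)
                total += len(block)
    return total


# Stand-in data; with requests the call would look like
# save_chunks(response.iter_content(1024), biom_matrix_file)
demo_path = os.path.join(tempfile.gettempdir(), 'demo_count_matrix.biom')
written = save_chunks([b'{"rows":', b'', b'[]}'], demo_path)
print("Wrote", written, "bytes to", demo_path)
```

Returning the byte count gives a cheap sanity check that the download was not empty.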

<h3>Save VAMPS visualization output file</h3>

In [12]:
r = s.post(conn['hosturl']+'/api/create_image', timeout=30, data=config)

try:
    result = json.loads(r.text)
except ValueError:  # response was not valid JSON; show it and stop
    print(r.text)
    sys.exit()
local_filename = result['filename']
return_result = result['html']
print(local_filename)
remote_file_name = conn['hosturl']+"/"+local_filename

r = requests.get(remote_file_name, stream=True)
with open(local_filename, 'wb') as f:
    f.write(r.content)
print('Done writing local file:',local_filename)

ruzics_1503411724269-piecharts-api.svg
Done writing local file: ruzics_1503411724269-piecharts-api.svg


<h3>Show visualization output for first dataset</h3>

In [13]:
from IPython.display import display, HTML
# widen the notebook container so the visualization is not clipped
out = HTML("<style>.container { width:100% !important; }</style>"+return_result)
out

<h3>Get project Metadata</h3>

In [14]:
data = {"project": config['project']}
r = s.post(conn['hosturl']+'/api/get_metadata_from_project', timeout=15, data=data)  
result = json.loads(r.text)
print("Loaded metadata")

Loaded metadata


<h3>Format and show sample of Metadata, saving Latitude and Longitude information</h3>

In [15]:
data_lat = ''
data_long = ''
count = 0
for ids in result:
    if count >= 3:
        break   # only a sample of three datasets is shown
    print("Dataset ID",ids)
    print("    ","Adapter Sequence:",result[ids]['adapter_sequence'])
    print("    ","Geo Location Name:",result[ids]['geo_loc_name'])
    print("    ","Run:",result[ids]['run'])
    print("    ","Collection Date:",result[ids]['collection_date'])
    print("    ","Environment Material:",result[ids]['env_material'])
    print("    ", "DNA Region:",result[ids]['dna_region'])
    print("    ","Environment Biome:",result[ids]["env_biome"])
    print("    ","Environment Package:",result[ids]["env_package"])
    print("    ","Environment Feature:",result[ids]["env_feature"])
    print("    ","Latitude:",result[ids]['latitude'])
    data_lat = result[ids]['latitude']
    print("    ","Longitude:",result[ids]['longitude'])
    data_long = result[ids]['longitude']
    print("    ","Primer Suite:",result[ids]['primer_suite'])
    print("    ","Sequencing Platform:",result[ids]["sequencing_platform"])
    print("    ","Target Gene:",result[ids]["target_gene"])
    print("    ","Illumina Index:",result[ids]["illumina_index"])
    print("    ","Domain:",result[ids]["domain"])
    print("")
    count += 1

Dataset ID RG10join
     Adapter Sequence: unknown
     Geo Location Name: unknown
     Run: unknown
     Collection Date: 2014-12-19
     Environment Material: unknown
     DNA Region: v9v6
     Environment Biome: ocean
     Environment Package: water-marine
     Environment Feature: unknown
     Latitude: 41.325
     Longitude: -70.565
     Primer Suite: Vibrio V4
     Sequencing Platform: illumina
     Target Gene: 18s
     Illumina Index: unknown
     Domain: Eukarya

Dataset ID RG11join
     Adapter Sequence: unknown
     Geo Location Name: unknown
     Run: unknown
     Collection Date: 2015-03-10
     Environment Material: unknown
     DNA Region: v9v6
     Environment Biome: ocean
     Environment Package: water-marine
     Environment Feature: unknown
     Latitude: 41.325
     Longitude: -70.565
     Primer Suite: Vibrio V4
     Sequencing Platform: illumina
     Target Gene: 18s
     Illumina Index: unknown
     Domain: Eukarya

Dataset ID RG12join
     Adapter Sequence: unk
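
The metadata cell above prints each field with its own `print` call and indexes the result directly, so a dataset that lacks one of the keys would raise a `KeyError`. A more compact sketch is shown below. `format_metadata` and `FIELDS` are hypothetical names, only a subset of the fields is listed to keep the example short, and the `(missing)` placeholder is an illustrative choice. The sample dictionary is made up in the shape the output above suggests.

```python
from itertools import islice

# (label, key) pairs; a subset of the fields printed in the cell above
FIELDS = [
    ("Collection Date", "collection_date"),
    ("DNA Region", "dna_region"),
    ("Latitude", "latitude"),
    ("Longitude", "longitude"),
    ("Domain", "domain"),
]


def format_metadata(metadata, limit=3):
    """Return a printable summary of the first `limit` datasets."""
    lines = []
    for dataset_id, fields in islice(metadata.items(), limit):
        lines.append("Dataset ID {}".format(dataset_id))
        for label, key in FIELDS:
            # .get avoids a KeyError when a dataset lacks a field
            lines.append("     {}: {}".format(label, fields.get(key, "(missing)")))
        lines.append("")
    return "\n".join(lines)


sample = {
    "RG10join": {"collection_date": "2014-12-19", "latitude": "41.325",
                 "longitude": "-70.565", "domain": "Eukarya"},
    "RG11join": {"collection_date": "2015-03-10", "latitude": "41.325",
                 "longitude": "-70.565", "domain": "Eukarya"},
}
print(format_metadata(sample, limit=1))
```

With the notebook's variables this would be `print(format_metadata(result))`.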

### Use first project latitude/longitude metadata to search for next project name
### If MVCO Bacteria dataset is found, change config file

In [18]:
found = 'N'

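# bounding box (north-west and south-east corners) around the first project's
# coordinates (latitude 41.325, longitude -70.565)
data = {'nw_lat':'42','nw_lon':'-75','se_lat':'40','se_lon':'-70'}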
r = s.post(conn['hosturl']+'/api/find_projects_in_geo_area', timeout=15, data=data)  
result = json.loads(r.text)
print(result)

for sets in result:
    if sets == 'AFP_MVCO_Bv6':
        print("Found second project dataset using latitude/longitude data")
        #if MVCO bacteria data is found, change config project
        config['project'] = sets
        found = 'Y'

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
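
The search area in the cell above is typed in by hand, although the latitude and longitude of the first project were saved earlier in `data_lat` and `data_long`. One possible way to derive the box from those values is sketched below. `bounding_box` and its `margin` parameter are hypothetical; only the key names `nw_lat`, `nw_lon`, `se_lat` and `se_lon` come from the notebook, and the values are kept as strings to match the cell above.

```python
def bounding_box(lat, lon, margin=1.0):
    """Build a search box of +/- margin degrees around a point.

    The north-west corner has the larger latitude and the smaller
    longitude; the south-east corner has the opposite.
    """
    lat, lon = float(lat), float(lon)
    return {
        'nw_lat': str(round(min(lat + margin, 90.0), 6)),
        'nw_lon': str(round(max(lon - margin, -180.0), 6)),
        'se_lat': str(round(max(lat - margin, -90.0), 6)),
        'se_lon': str(round(min(lon + margin, 180.0), 6)),
    }


# Coordinates reported for the first project's datasets
print(bounding_box('41.325', '-70.565'))
```

In the notebook, `data = bounding_box(data_lat, data_long)` could then replace the hand-written dictionary.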

### If second project wasn't found using latitude/longitude metadata, use the same configuration file and modify it as necessary

In [None]:
#change config file project to second project if not found using lat/long metadata
if found[0].capitalize() == 'N':
    config['project'] = 'AFP_MVCO_Bv6'
config['domains'] = ["Bacteria"]
config["tax_depth"] = "family"

### Now use the same method as before to produce the visualization, first getting and displaying dataset IDs
### Exclude 4 datasets from second project

In [None]:
# get project ids:
r = s.post(conn['hosturl']+'/api/get_dids_from_project', timeout=15, data=config)  
config['ds_order'] = r.text
# exclude 4 datasets
if config['project'] == 'AFP_MVCO_Bv6':
    #get datasets (string form)
    temp = config['ds_order']
    #convert string to list in order to remove datasets
    temp = temp.strip('[').strip(']').split(',')
    temp.remove('336408')
    temp.remove('336409')
    temp.remove('336407')
    temp.remove('336410')
    #convert back to string form
    temp = (",").join(temp)
    temp = '[' + temp + ']'
    #set temp datasets to replace ds_order in config
    config['ds_order'] = temp
print(config['ds_order'])
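
The cell above removes the four datasets by stripping the brackets from the ID string, splitting on commas and calling `remove`, which raises a `ValueError` if an ID is not in the list. Because the ID string has the form of a JSON list (see the output for the first project), the same step can be sketched with the `json` module. `exclude_datasets` is a hypothetical helper, and the IDs in the example are for illustration only.

```python
import json


def exclude_datasets(ds_order_text, excluded_ids):
    """Drop excluded_ids from a JSON-style ID list and return it as text."""
    ids = json.loads(ds_order_text)
    excluded = {int(i) for i in excluded_ids}
    kept = [i for i in ids if int(i) not in excluded]
    # compact separators reproduce the "[1,2,3]" form without spaces
    return json.dumps(kept, separators=(',', ':'))


# IDs that are absent from the list are simply ignored
print(exclude_datasets('[336406,336407,336408,336411]',
                       [336407, 336408, 336409, 336410]))
```

With the notebook's variables this would read `config['ds_order'] = exclude_datasets(config['ds_order'], [336407, 336408, 336409, 336410])`.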

<h3>Create remote configuration and get timestamp (file_prefix)</h3>

In [None]:
# Get timestamp (filename prefix):
r = s.post(conn['hosturl']+'/visuals/view_selection', timeout=15, data=config)
soup = BeautifulSoup(r.text, "lxml")  # html5lib  lxml html.parser

ts = soup.find(id="ts_for_bs").string
print("Timestamp/file prefix:",ts)

<h3>Save matrix file which is integral to VAMPS images</h3>

In [None]:
# The matrix file is named after the timestamp prefix obtained above
biom_matrix_file = ts+'_count_matrix.biom'
url = conn['hosturl']+"/"+biom_matrix_file
response = requests.get(url, stream=True)
response.raise_for_status()
out_file = biom_matrix_file
with open(out_file, "wb") as handle:
    for block in response.iter_content(1024):
        handle.write(block)

<h3>Save image file</h3>

In [None]:
r = s.post(conn['hosturl']+'/api/create_image', timeout=30, data=config)

try:
    result = json.loads(r.text)
except ValueError:  # response was not valid JSON; show it and stop
    print(r.text)
    sys.exit()
local_filename = result['filename']
return_result = result['html']
print(local_filename)
remote_file_name = conn['hosturl']+"/"+local_filename

r = requests.get(remote_file_name, stream=True)
with open(local_filename, 'wb') as f:
    f.write(r.content)
print('Done writing local file:',local_filename)

<h3>Show visualization output for second project</h3>

In [None]:
# HTML was imported in the cell that displayed the first project's visualization
out = HTML("<style>.container { width:100% !important; }</style>"+return_result)
out

<h3>Get project Metadata</h3>

In [None]:
data = {"project": config['project']}
r = s.post(conn['hosturl']+'/api/get_metadata_from_project', timeout=15, data=data)  
result = json.loads(r.text)
print("Loaded metadata")

<h3>Format and show sample of Metadata</h3>

In [None]:
count = 0
for ids in result:
    if count >= 3:
        break   # only a sample of three datasets is shown
    print("Dataset ID",ids)
    print("    ","Adapter Sequence:",result[ids]['adapter_sequence'])
    print("    ","Geo Location Name:",result[ids]['geo_loc_name'])
    print("    ","Run:",result[ids]['run'])
    print("    ","Collection Date:",result[ids]['collection_date'])
    print("    ","Environment Material:",result[ids]['env_material'])
    print("    ", "DNA Region:",result[ids]['dna_region'])
    print("    ","Environment Biome:",result[ids]["env_biome"])
    print("    ","Environment Package:",result[ids]["env_package"])
    print("    ","Environment Feature:",result[ids]["env_feature"])
    print("    ","Latitude:",result[ids]['latitude'])
    print("    ","Longitude:",result[ids]['longitude'])
    print("    ","Primer Suite:",result[ids]['primer_suite'])
    print("    ","Sequencing Platform:",result[ids]["sequencing_platform"])
    print("    ","Target Gene:",result[ids]["target_gene"])
    print("    ","Illumina Index:",result[ids]["illumina_index"])
    print("    ","Domain:",result[ids]["domain"])
    print("")
    count += 1