### User Perspective: Bioinformatician
The use case: “A recent GWAS paper reported that genetic variant rs1361754 is important for heart health; I want to know whether that rs-number affects gene expression, i.e. has any eQTLs.”

#### Overview of Steps

#1) Use the correct GUID to download the GTEx file “GTEx_Analysis_v7_eQTL.tar.gz”
#2) tar xf GTEx_Analysis_v7_eQTL.tar.gz
#3) Convert the dbSNP variant id “rs1361754” to GTEx’s variant id “1_205801872_A_G_b37”
#4) Grep all significant eQTLs with the variant id “1_205801872_A_G_b37” from all tissue files
#5) Convert Gencode gene ids in the grep results to HGNC gene symbols
#6) Create a new GUID for the resulting file
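The variant-id conversion and grep steps above can be sketched in Python. This is a minimal sketch, not the notebook's own method: the helper names `gtex_variant_id` and `find_eqtls` are hypothetical, and the code assumes each tissue file is a gzipped, tab-separated table with a header row containing a `variant_id` column. Looking up the chromosome, position and alleles for an rs-number is left to an external source such as dbSNP.

```python
import csv
import glob
import gzip
import os


def gtex_variant_id(chrom, pos, ref, alt, build="b37"):
    """Format a GTEx-style variant id such as 1_205801872_A_G_b37."""
    return f"{chrom}_{pos}_{ref}_{alt}_{build}"


def find_eqtls(variant_id, pattern):
    """Yield (file_name, row) for every row whose variant_id column matches.

    Assumption: every file matched by `pattern` is a gzipped TSV with a
    header row that includes a 'variant_id' column.
    """
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                if row["variant_id"] == variant_id:
                    yield os.path.basename(path), row
```

A call such as `find_eqtls(gtex_variant_id(1, 205801872, "A", "G"), "GTEx_Analysis_v7_eQTL/*.txt.gz")` would then scan every matching tissue file; the glob pattern here is an assumption about how the archive unpacks.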

## Behind the Scenes: Minting the Identifiers

### Invisible to the user, data stewards are responsible for minting GUIDs

In [3]:
ADDR = "http://54.242.133.97"
BAGIT = "http://52.204.34.204"

In [14]:
# Build Test Identifiers
import requests
import json

# Data Catalog
gtex = {"@context" : "http://schema.org", 
        "@id" : "ark:/99999/fk4DownloadTestDC", 
        "@type": "DataCatalog", 
        "identifier": "https://www.gtexportal.org/home/", 
        "name": "GTEx Portal"}  

# Dataset
gtex_eQTL = {"@context": "http://schema.org",
        "@type": "Dataset",
        "@id": "ark:/99999/fk4DownloadTestDS", 
        "identifier": "ark:/99999/fk4DownloadTestDS", 
        "includedInDataCatalog": "ark:/99999/fk4DownloadTestDC", 
        "dateCreated": "01-29-2018"}

# DataDownload
gtex_eQTL_download = {"@context": "http://schema.org",
        "@type": "DataDownload",
        "@id": "ark:/99999/fk4DownloadTestDD",
        "identifier": "ark:/99999/fk4DownloadTestDD", 
        "version": "1.0.0", 
        "includedInDataset": "ark:/99999/fk4DownloadTestDS", 
        "contentSize": " bytes", 
        "fileFormat": ".tar.gz",
        "contentUrl": "http://s3.amazonaws.com/dcppc/test.txt",
        "checksum": "madeupchecksum123",
        "checksumMethod": "md5",
        "filename": "test.txt"}

# basic authentication
basicAuth = requests.auth.HTTPBasicAuth('apitest', 'apitest')

# Mint each record in order: data catalog, dataset, then download
for record in (gtex, gtex_eQTL, gtex_eQTL_download):
    response = requests.put(
        url = ADDR+'/mint',
        auth = basicAuth,
        data = json.dumps(record)
    )
    print(response.status_code)

201
201
201


In [10]:
# Clean up: delete the test identifiers (download, dataset, then catalog)
requests.delete(
    url=ADDR+'/ark:/99999/fk4DownloadTestDD',
    auth=basicAuth
)

requests.delete(
    url=ADDR+'/ark:/99999/fk4DownloadTestDS',
    auth=basicAuth
)

requests.delete(
    url=ADDR+'/ark:/99999/fk4DownloadTestDC',
    auth=basicAuth
)

<Response [200]>

### Finding the File
- Landing page for the service, generated from identifier-level metadata
    - http://54.242.133.97/ark:/99999/fk4DownloadTestDD
- Resolve to a bucket using the API
    - download directly or using cloud provider tools
- Build a Bag using the BagIt service
    - fetch the files

In [17]:
# Use the API to find the content URL,
# then transfer the file using cloud provider tools
api_response = requests.get(url = ADDR+'/ark:/99999/fk4DownloadTestDD', headers= {"Accept":"application/json"})

In [20]:
data = api_response.json()
data.get('contentUrl', None)

'http://s3.amazonaws.com/dcppc/test.txt'
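The identifier metadata also carries `checksum` and `checksumMethod` fields, so a downloaded file can be checked against them. The following is a minimal sketch; `file_digest` and `matches_metadata` are hypothetical helper names, and it assumes the `checksumMethod` value is an algorithm name that `hashlib` recognises (such as `md5`). The test record above uses a made-up checksum, so the comparison would return `False` for it.

```python
import hashlib


def file_digest(path, method="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(method)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_metadata(path, metadata):
    """Compare a local file with the 'checksum'/'checksumMethod' fields."""
    expected = metadata["checksum"].lower()
    return file_digest(path, metadata["checksumMethod"]) == expected
```

With the response from the cell above, `matches_metadata('test.txt', data)` would report whether the local copy agrees with the registered checksum.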

In [45]:
# transfer using AWS tools
import boto3
s3_resource = boto3.resource("s3")
bucket = s3_resource.Bucket('dcppctest')

with open('test.txt', 'wb') as data:
    bucket.download_fileobj('test.txt', data)
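The cell above hard-codes the bucket name, while the `contentUrl` returned by the API is a path-style S3 URL whose first path segment names the bucket (`dcppc` for the URL shown earlier, not `dcppctest`). Whether that difference is intentional is not clear from the notebook. A small sketch for deriving the bucket and key from the URL instead, assuming path-style URLs only; `split_path_style_s3_url` is a hypothetical helper:

```python
from urllib.parse import urlparse


def split_path_style_s3_url(url):
    """Split http://s3.amazonaws.com/<bucket>/<key> into (bucket, key).

    Assumption: the URL is path-style; virtual-hosted-style URLs
    (<bucket>.s3.amazonaws.com/<key>) are not handled here.
    """
    path = urlparse(url).path.lstrip("/")
    bucket, _, key = path.partition("/")
    if not bucket or not key:
        raise ValueError(f"not a path-style S3 URL: {url!r}")
    return bucket, key
```

The resulting pair could then be passed to `s3_resource.Bucket(bucket)` and `download_fileobj(key, ...)` in place of the literals.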


In [35]:
# Build a Bag for the dataset using the BagIt service
response = requests.get(url = BAGIT+'/bag/ark:/99999/fk4DownloadTestDS')
assert response.status_code == 200

In [41]:
# save response content as bagit.zip
with open('bagit.zip', 'wb') as bag:
    bag.write(response.content)
    
# unzip file
import zipfile
with zipfile.ZipFile("bagit.zip","r") as zip_ref:
    zip_ref.extractall()

# use bag utilities to fetch files
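The fetch step is left open above. In the BagIt format, remote payload files are listed in a `fetch.txt` file, one per line, as a URL, a length in bytes (or `-` when unknown) and a path inside the bag. Whether the bag returned by this service includes a `fetch.txt` is an assumption. A minimal sketch of parsing such a file, with `parse_fetch_txt` as a hypothetical helper:

```python
def parse_fetch_txt(text):
    """Parse BagIt fetch.txt content into (url, length, path) tuples.

    The length is an int, or None when the file gives '-'.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two runs of whitespace only, so the path
        # field is kept intact.
        url, length, path = line.split(None, 2)
        entries.append((url, None if length == "-" else int(length), path))
    return entries
```

Each entry could then be downloaded to its `path` relative to the extracted bag directory, for example with `requests.get(url)`.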

In [46]:
#Import seven bridges

# Do some amazing research that outputs a file

with open("research.txt", 'w') as research:
    research.write('My Awesome Research')

In [50]:
# Upload the resulting analysis to an S3 bucket in the cloud
s3_client = boto3.client("s3")

with open("research.txt", "rb") as f:
    s3_client.upload_fileobj(f, "dcppctest", "analysis")

In [33]:
# Mint a Minid Identifier for our new analysis file


analysis = {"identifier": "ark:/99999/fk4r8776t",
            "created": "2015-11-10 04:44:44.387671",
            "creator": "0000-0003-2129-5269",
            "checksum": "cacc1abf711425d3c554277a5989df269cefaa906d27f1aaa72205d30224ed5f",
            "checksumMethod": "sha256",
            "status": "ACTIVE",
            "locations": ["http://s3.amazonaws.com/dcppctest/analysis"],
            "titles": ["minid: A BD2K Minimal Viable Identifier Pilot v0.1"]}

response = requests.put(
    url = ADDR+'/mint',
    auth = basicAuth,
    data = json.dumps(analysis)
)


In [34]:
response.status_code

201
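The record minted above uses placeholder values for the checksum and creation time. A sketch of building the same fields from the actual output file follows; `build_analysis_record` is a hypothetical helper, the field names are copied from the cell above, and whether the minting service accepts `sha256` as a `checksumMethod` value is an assumption. The file is read into memory in one go, which is fine for small outputs only.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def build_analysis_record(path, identifier, creator, location, title):
    """Build a metadata record whose checksum reflects the file at `path`."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    created = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")
    return {
        "identifier": identifier,
        "created": created,
        "creator": creator,
        "checksum": digest,
        "checksumMethod": "sha256",  # assumption: accepted by the service
        "status": "ACTIVE",
        "locations": [location],
        "titles": [title],
    }
```

The returned dictionary can be passed to `json.dumps` and sent to the `/mint` endpoint in the same way as the hand-written record.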

# The analysis now has a landing page!

http://54.242.133.97/ark:/99999/fk4r8776t