This notebook explores the use of the SRA DRS server. It is derived from FASPScript14.py but has been adapted to use a Seven Bridges WES service.

DRS ids can be mapped to SRA accessions in several ways, and the process for doing so is still in flux.

The approach taken below uses the mapping available through the subject and specimen data exposed by the Search API. In this case the SRR accession is shown only for information. The query is formulated in terms of a particular population and the requirement that the files be mapped BAM files. This gives us DRS ids directly. Alternatively, a list of SRR accessions could be used.

In [25]:
from fasp.search import DiscoverySearchClient

# Step 1 - Discovery
# query for relevant DRS objects
searchClient = DiscoverySearchClient('https://ga4gh-search-adapter-presto-public.prod.dnastack.com/')

query = '''SELECT f.sample_name, drs_id bam_drs_id, acc
FROM thousand_genomes.onek_genomes.ssd_drs s 
join thousand_genomes.onek_genomes.sra_drs_files f on f.sample_name = s.su_submitter_id 
where filetype = 'bam' and mapped = 'mapped' 
and sequencing_type ='exome' and  population = 'JPT' LIMIT 3'''

resultRows = searchClient.runQuery(query, returnType='dataframe')
resultRows

_Retrieving the query_
____Page1_______________
____Page2_______________
____Page3_______________
____Page4_______________
____Page5_______________
____Page6_______________


   sample_name  bam_drs_id                        acc
0  NA18948      fb1cfb04d3ef99d07c21f9dbf87ccc68  SRR1601121
1  NA18945      9327fb44eb81b49a41e38c8d86eb3b3a  SRR1601115
2  NA18943      9f38253b281c7e9c99e4bdbececd8e2f  SRR1606910
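
The alternative mentioned earlier, starting from a list of SRR accessions rather than a population, might look roughly like the sketch below. It only builds the query string. It assumes that the `acc` column can be filtered in the same join as the original query, which has not been checked against the live tables. The helper name `build_accession_query` and the accession format check are illustrative additions, not part of fasp.

```python
import re

# Base of the query used earlier in this notebook, without the exome/population filters.
BASE_QUERY = """SELECT f.sample_name, drs_id bam_drs_id, acc
FROM thousand_genomes.onek_genomes.ssd_drs s
join thousand_genomes.onek_genomes.sra_drs_files f on f.sample_name = s.su_submitter_id
where filetype = 'bam' and mapped = 'mapped'"""


def build_accession_query(accessions):
    """Return a query restricted to the given SRR accessions (hypothetical helper)."""
    if not accessions:
        raise ValueError("at least one accession is required")
    for acc in accessions:
        # Only accept plain SRR accessions so nothing unexpected is spliced into the SQL.
        if not re.fullmatch(r"SRR\d+", acc):
            raise ValueError("not an SRR accession: {!r}".format(acc))
    in_list = ", ".join("'{}'".format(acc) for acc in accessions)
    return "{}\nand acc in ({})".format(BASE_QUERY, in_list)


accession_query = build_accession_query(["SRR1601121", "SRR1601115"])
print(accession_query)
```

The resulting string could then be passed to `searchClient.runQuery` in the same way as `query`.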


Calling the Search client with returnType='dataframe', as above, returns a dataframe. This is convenient for many purposes, including listing the results. By default, runQuery returns a list of lists.

In [33]:
resultRows = searchClient.runQuery(query)
#df = pd.DataFrame(resultRows,  index=None)
#df
resultRows

_Retrieving the query_
____Page1_______________
____Page2_______________
____Page3_______________
____Page4_______________
____Page5_______________
____Page6_______________


[['NA18948', 'fb1cfb04d3ef99d07c21f9dbf87ccc68', 'SRR1601121'],
 ['NA18945', '9327fb44eb81b49a41e38c8d86eb3b3a', 'SRR1601115'],
 ['NA18943', '9f38253b281c7e9c99e4bdbececd8e2f', 'SRR1606910']]
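
With the list-of-lists form, fields are addressed by position (for example `resultRows[0][1]` for the first DRS id). If named access is preferred without pandas, a small sketch like the following pairs each row with the column names from the SELECT clause. The `COLUMNS` list is written by hand here and is an assumption about column order taken from the query text.

```python
# Column names in the order they appear in the SELECT clause of the query.
COLUMNS = ["sample_name", "bam_drs_id", "acc"]

# Rows copied from the default runQuery output shown above.
rows = [
    ["NA18948", "fb1cfb04d3ef99d07c21f9dbf87ccc68", "SRR1601121"],
    ["NA18945", "9327fb44eb81b49a41e38c8d86eb3b3a", "SRR1601115"],
    ["NA18943", "9f38253b281c7e9c99e4bdbececd8e2f", "SRR1606910"],
]

# Pair each value with its column name so fields can be read by name, not position.
records = [dict(zip(COLUMNS, row)) for row in rows]

for record in records:
    print(record["sample_name"], record["bam_drs_id"])
```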

The SRA DRS server can be used to determine where the files can be obtained. The following shows this for the first DRS id from the query results.

In [29]:
from fasp.loc import DRSClient

#drsClient = DRSMetaResolver()
drsClient = DRSClient('https://locate.ncbi.nlm.nih.gov', public=True, debug=True)
test_id = resultRows[0][1]
print(test_id)
objDetails = drsClient.getObject(test_id)
objDetails


fb1cfb04d3ef99d07c21f9dbf87ccc68
https://locate.ncbi.nlm.nih.gov/ga4gh/drs/v1/objects/fb1cfb04d3ef99d07c21f9dbf87ccc68


{'access_methods': [{'access_id': 'b5f46aadbcb48d7141104db0440feb63cd4e61c8',
   'region': 's3.us-east-1',
   'type': 'https'},
  {'access_id': '1bc0bc010f0edf4ef18af594acdba5db864db67e',
   'region': 'gs.US',
   'type': 'https'},
  {'access_id': '722d3466edf7ad5f6797f9774e21b368c45ad5b1', 'type': 'https'}],
 'checksums': [{'checksum': 'fb1cfb04d3ef99d07c21f9dbf87ccc68',
   'type': 'md5'}],
 'created_time': '2013-02-25T23:24:10Z',
 'id': 'fb1cfb04d3ef99d07c21f9dbf87ccc68',
 'name': 'NA18948.mapped.ILLUMINA.bwa.JPT.exome.20121211.bam',
 'self_url': 'drs://locate.md-be.ncbi.nlm.nih.gov/fb1cfb04d3ef99d07c21f9dbf87ccc68',
 'size': 8752606127}
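
The response lists several access methods, some with a `region` and one without. A minimal sketch of choosing among them by region prefix is shown below. The helper `pick_access_id` and its fallback to the first listed method are assumptions made for illustration, not fasp behaviour.

```python
def pick_access_id(obj_details, region_prefix):
    """Return the access_id of the first access method whose region starts with region_prefix.

    Falls back to the first listed access method when no region matches.
    (Hypothetical helper; not part of fasp.)
    """
    methods = obj_details["access_methods"]
    if not methods:
        raise ValueError("object has no access methods")
    for method in methods:
        # Some access methods carry no 'region' key, so use .get with a default.
        if method.get("region", "").startswith(region_prefix):
            return method["access_id"]
    return methods[0]["access_id"]


# Trimmed copy of the access_methods from the response shown above.
example_details = {
    "access_methods": [
        {"access_id": "b5f46aadbcb48d7141104db0440feb63cd4e61c8",
         "region": "s3.us-east-1", "type": "https"},
        {"access_id": "1bc0bc010f0edf4ef18af594acdba5db864db67e",
         "region": "gs.US", "type": "https"},
        {"access_id": "722d3466edf7ad5f6797f9774e21b368c45ad5b1", "type": "https"},
    ]
}

print(pick_access_id(example_details, "gs"))
```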

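A second DRS call can be used to obtain a URL for accessing the file from one of the above locations. In the run recorded below, the server answered with a 400 Bad Request response, so no URL was obtained.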

In [39]:
access_id = objDetails['access_methods'][0]['access_id']
print('access_id:{}'.format(access_id))
url = drsClient.getAccessURL(test_id, access_id=access_id)
print('url:{}'.format(url))

access_id:b5f46aadbcb48d7141104db0440feb63cd4e61c8
https://locate.ncbi.nlm.nih.gov/ga4gh/drs/v1/objects/fb1cfb04d3ef99d07c21f9dbf87ccc68/access/b5f46aadbcb48d7141104db0440feb63cd4e61c8
<Response [400]>
<Response [400]>
b'{\n  "detail": "The browser (or proxy) sent a request that this server could not understand.",\n  "status": 400,\n  "title": "Bad Request",\n  "type": "about:blank"\n}\n'
url:None
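
Because the call above came back with `None`, one option is to try each access method in turn until one yields a URL. The sketch below shows that pattern with a stand-in function so it runs offline. `first_available_url` and `fake_get_url` are hypothetical names, and whether any other access method would succeed against the real server is not known from this run.

```python
def first_available_url(get_url, object_id, access_methods):
    """Try each access method in order; return (access_id, url) for the first that yields a URL.

    get_url is any callable taking (object_id, access_id) and returning a URL or None.
    Returns (None, None) if every access method fails.
    """
    for method in access_methods:
        access_id = method["access_id"]
        url = get_url(object_id, access_id)
        if url is not None:
            return access_id, url
    return None, None


# Stand-in for the real client so the sketch runs offline: the first access_id fails.
def fake_get_url(object_id, access_id):
    if access_id == "first":
        return None
    return "https://example.org/{}/{}".format(object_id, access_id)


demo_methods = [{"access_id": "first"}, {"access_id": "second"}]
print(first_available_url(fake_get_url, "obj1", demo_methods))
```

With the client used in this notebook, `get_url` could be `lambda oid, aid: drsClient.getAccessURL(oid, access_id=aid)`, matching the call shown above.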


In the following, FASPRunner encapsulates most of the steps above for convenience. It also reads values from a settings file, which contains details of which Seven Bridges instance we want to use as a WES server and which project within that instance.

In [None]:
from fasp.runner import FASPRunner
from fasp.workflow import sbWESClient

# The program value is used simply to log which script or notebook submitted WES requests via FASPRunner
faspRunner = FASPRunner(program='SRAExample')
settings = faspRunner.settings

# Settings for which instance and project to use are stored in settings, or you may enter values directly
wesClient = sbWESClient(settings['SevenBridgesInstance'], settings['SevenBridgesProject'],
                        '~/.keys/sbcgc_key.json')

Finally, the FASPRunner is configured with the three clients we have set up and is passed the query we tested above.

In [None]:
faspRunner.configure(searchClient, drsClient, wesClient)

faspRunner.runQuery(query, 'One k query SRA DRS')
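
The overall flow that the runner is described as encapsulating (search, then DRS lookup, then workflow submission) can be pictured roughly as below. This is an illustrative sketch with stand-in callables, not FASPRunner's actual implementation. The names `run_pipeline`, `resolve_url` and `submit`, and the arguments passed to `submit`, are all assumptions.

```python
def run_pipeline(rows, resolve_url, submit):
    """Resolve each query row to a file URL and submit it; return the submitted run ids.

    rows: [sample_name, drs_id, acc] lists, as returned by the search step.
    resolve_url: callable drs_id -> URL or None (stands in for the DRS calls).
    submit: callable (sample_name, url) -> run id (stands in for the WES call).
    """
    run_ids = []
    for sample_name, drs_id, acc in rows:
        url = resolve_url(drs_id)
        if url is None:
            # Skip files that could not be resolved rather than submitting a bad input.
            print("skipping {} ({}): no URL".format(sample_name, acc))
            continue
        run_ids.append(submit(sample_name, url))
    return run_ids


# Offline stand-ins so the sketch can be run without any service.
demo_rows = [["NA18948", "id1", "SRR1601121"], ["NA18945", "id2", "SRR1601115"]]
demo_urls = {"id1": "https://example.org/id1"}  # id2 is deliberately left unresolved
demo_runs = run_pipeline(demo_rows, demo_urls.get, lambda name, url: "run-" + name)
print(demo_runs)
```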