<img src="../fasp/runner/credits/images/nb1.jpg" style="float: right;">

### TCGA and GTEx

This variant of the GTEx/TCGA workflow uses FASPRunner, which is simply called twice in succession, each time with the relevant Search and WES clients. Because the DRS ids returned by the searches carry CURIE prefixes, DRSMetaResolver can be used as the DRS client in both cases.



In [1]:
from fasp.search import DiscoverySearchClient, Gen3ManifestClient
from fasp.loc import DRSMetaResolver


from fasp.runner import FASPRunner

faspRunner = FASPRunner(program='GTEX_TCGA_viaFASPRunner.ipynb')
runNote = 'GTEX and TCGA via FASPRunner'

The following sets up clients to handle the TCGA data. Note that the DRS ids carry CURIE prefixes (crdc for the Cancer Research Data Commons and anv for AnVIL). The prefix indicates which namespace an id comes from and allows the referenced file to be retrieved from the correct DRS server.
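The prefix-based routing described above can be illustrated with a minimal sketch. This is not the actual DRSMetaResolver implementation: `split_curie`, `route` and the `CLIENT_FOR_PREFIX` table are hypothetical names introduced here, and the client labels are simply the ones that appear in this notebook's run output.

```python
def split_curie(drs_id):
    """Split a 'prefix:local_id' string into (prefix, local_id)."""
    prefix, sep, local_id = drs_id.partition(':')
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefixed id: {drs_id!r}")
    return prefix, local_id


# Hypothetical lookup table: prefix -> label of the DRS client to use.
# The labels mirror the "sending id ... to:" lines in the run output.
CLIENT_FOR_PREFIX = {
    'crdc': 'crdcDRSClient',
    'anv': 'anvilDRSClient',
}


def route(drs_id):
    """Return (client label, un-prefixed id) for a prefixed DRS id."""
    prefix, local_id = split_curie(drs_id)
    if prefix not in CLIENT_FOR_PREFIX:
        raise ValueError(f"no DRS client registered for prefix {prefix!r}")
    return CLIENT_FOR_PREFIX[prefix], local_id


if __name__ == '__main__':
    for drs_id in ('crdc:030e5e74-6461-4f05-a399-de8e470bc056',
                   'anv:dg.ANV0/76bb893d-12da-41ca-8828-ff89551d3e15'):
        client, local_id = route(drs_id)
        print(f'sending id {local_id} to: {client}')
```

Splitting on the first colon only keeps local ids that contain slashes or dots (such as the `dg.ANV0/...` ids) intact.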

Note that for the data in Google Cloud we are using GCPLSsamtools, a fasp class which accesses the Google Cloud Life Sciences API. The plan is to replace it with the DNAstack WES server once that server is updated.

In [3]:
# TCGA Query - CRDC
crdcquery = """
    SELECT 'case_'||associated_entities__case_gdc_id case_id, 'crdc:'||file_id drs_id
    FROM search_cloud.cshcodeathon.gdc_rel24_filedata_active 
    where data_format = 'BAM' 
    and project_disease_type = 'Breast Invasive Carcinoma'
    limit 3"""

searchClient = DiscoverySearchClient('https://ga4gh-search-adapter-presto-public.prod.dnastack.com/')
drsClient = DRSMetaResolver()

from fasp.workflow import GCPLSsamtools
settings = faspRunner.settings
gcplocation = 'projects/{}/locations/{}'.format(settings['GCPProject'], settings['GCPPipelineRegion'])
wesClient = GCPLSsamtools(gcplocation, settings['GCPOutputBucket'])

faspRunner.configure(searchClient, drsClient, wesClient)
runList = faspRunner.runQuery(crdcquery, runNote)


Running query

    SELECT 'case_'||associated_entities__case_gdc_id case_id, 'crdc:'||file_id drs_id
    FROM search_cloud.cshcodeathon.gdc_rel24_filedata_active 
    where data_format = 'BAM' 
    and project_disease_type = 'Breast Invasive Carcinoma'
    limit 3
_Retrieving the query_
____Page1_______________
____Page2_______________
____Page3_______________
____Page4_______________
____Page5_______________
____Page6_______________
subject=case_1b703058-e596-45bc-80fe-8b98d545c2e2, drsID=crdc:030e5e74-6461-4f05-a399-de8e470bc056
sending id 030e5e74-6461-4f05-a399-de8e470bc056 to: crdcDRSClient
workflow submitted, run:6626509063595843058
____________________________________________________________
subject=case_a947a945-4721-45cc-bc45-13b8ea41c10e, drsID=crdc:04c68898-ddac-4e15-9f9a-5bf278d55e4a
sending id 04c68898-ddac-4e15-9f9a-5bf278d55e4a to: crdcDRSClient
workflow submitted, run:9907653859890778715
____________________________________________________________
subject=case_c462e422-

A Search client and a WES client are then set up to work with the AnVIL data.

The Search client here is a placeholder that searches a local file. That file contains file ids downloaded as a manifest from the Gen3 AnVIL portal. The list of files in the manifest had already been filtered to the relevant samples. The anv: DRS prefix was added in an edited version of the file.
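As an illustration of the manifest edit described above, the following sketch adds a prefix to every id in a manifest. It assumes the manifest is a JSON list of objects whose id field is named `object_id`; that key, the `file_name` key and the `add_curie_prefix` helper are assumptions made for this example, not details confirmed by the notebook.

```python
import json


def add_curie_prefix(entries, prefix, id_key='object_id'):
    """Return a copy of the manifest entries with 'prefix:' added to each id.

    Ids that already start with the prefix are left unchanged, so the
    function can safely be applied to an already edited manifest.
    """
    prefixed = []
    for entry in entries:
        updated = dict(entry)  # copy so the input list is not modified
        object_id = updated[id_key]
        if not object_id.startswith(prefix + ':'):
            updated[id_key] = f'{prefix}:{object_id}'
        prefixed.append(updated)
    return prefixed


# Hypothetical one-entry manifest; the key names are assumed.
manifest = [
    {'object_id': 'dg.ANV0/76bb893d-12da-41ca-8828-ff89551d3e15',
     'file_name': 'GTEX-1GTWX-0001-SM-7J3A5.cram'},
]

prefixed_manifest = add_curie_prefix(manifest, 'anv')
print(json.dumps(prefixed_manifest, indent=2))
```

To process a real file, the list would be read with `json.load` and the result written back with `json.dump`.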

TODO: check which access_ids DRSMetaResolver is using for each run.

In [4]:
from fasp.workflow import sbcgcWESClient
searchClient = Gen3ManifestClient('../fasp/data/gtex/gtex-cram-manifest_wCuries.json')
# drsClient: no need to reset this. DRSMetaResolver will pick the right client.
#wesClient = sbWESClient(settings['SevenBridgesInstance'], settings['SevenBridgesProject'],
                    #'~/.keys/sbcgc_key.json')

wesClient = sbcgcWESClient(settings['SevenBridgesProject'])

faspRunner.configure(searchClient, drsClient, wesClient)
runList2 = faspRunner.runQuery(3, runNote)


Running query
3
subject=GTEX-1GTWX-0001-SM-7J3A5.cram, drsID=anv:dg.ANV0/76bb893d-12da-41ca-8828-ff89551d3e15
sending id dg.ANV0/76bb893d-12da-41ca-8828-ff89551d3e15 to: anvilDRSClient
workflow submitted, run:37c14324-4e81-413d-9b86-0eca72c9e24d
____________________________________________________________
subject=GTEX-14PQA-0003-SM-7DLH4.cram, drsID=anv:dg.ANV0/66352de8-4b50-4cae-881d-b76d03df5ac8
sending id dg.ANV0/66352de8-4b50-4cae-881d-b76d03df5ac8 to: anvilDRSClient
workflow submitted, run:92d21ed2-53ba-402a-b077-32cb12ac9beb
____________________________________________________________
subject=GTEX-1B98T-0004-SM-7J38T.cram, drsID=anv:dg.ANV0/ed9ac9ae-02da-4e97-93da-ad86aa77d227
sending id dg.ANV0/ed9ac9ae-02da-4e97-93da-ad86aa77d227 to: anvilDRSClient
workflow submitted, run:7dc06985-3e58-4466-9b00-a799b99759f0
____________________________________________________________
