This variant of the GTEX TCGA workflow uses FASPRunner, which is simply called twice in succession, each time with the relevant Search and WES clients. Because the DRS ids returned by the searches carry CURIE prefixes, DRSMetaResolver can be used as the DRS client in both cases.
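
The search, resolve, submit sequence that the runner goes through for each combination of clients can be pictured with the sketch below. The stub classes and the `run_pipeline` function are hypothetical stand-ins written for illustration only; they do not reproduce the FASPRunner code or the real client interfaces, and the example.org URLs are placeholders.

```python
# Hypothetical stand-ins for illustration; these are not the fasp classes.
class StubSearch:
    def __init__(self, rows):
        self.rows = rows

    def run_query(self, query):
        # A real search client would execute the query; this returns canned rows.
        return self.rows


class StubResolver:
    def get_url(self, drs_id):
        # A real resolver would contact the DRS server selected by the prefix.
        prefix, _, local_id = drs_id.partition(":")
        return f"https://{prefix}.example.org/{local_id}"


class StubWES:
    def __init__(self):
        self.submitted = []

    def run_workflow(self, url, out_name):
        # Record the submission and hand back a made-up run id.
        self.submitted.append((url, out_name))
        return f"run-{len(self.submitted)}"


def run_pipeline(search, resolver, wes, query):
    """Search for files, resolve each DRS id to a URL, submit one workflow per file."""
    runs = []
    for subject, drs_id in search.run_query(query):
        url = resolver.get_url(drs_id)
        run_id = wes.run_workflow(url, subject + ".txt")
        runs.append({"subject": subject, "drs_id": drs_id, "run_id": run_id})
    return runs
```

Swapping in a different search or WES stub while keeping the same resolver mirrors what the two cells below do with the real clients.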

In [7]:
from fasp.search import BigQuerySearchClient, Gen3ManifestClient
from fasp.loc import DRSMetaResolver
from fasp.workflow import GCPLSsamtools, sbWESClient

from fasp.runner import FASPRunner

faspRunner = FASPRunner(program='GTEX_TCGA_viaFASPRunner.ipynb')
runNote = 'GTEX and TCGA via FASPRunner'

Running <ipython-input-7-bf605e16a3f9>


The following sets up the clients that handle the TCGA data. Note that the DRS ids are prefixed with CURIEs (crdc for the Cancer Research Data Commons and anv for Anvil). The prefix indicates which namespace an id comes from and allows the referenced file to be retrieved from the correct DRS server.
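
The prefix-based routing described above can be sketched as follows. This is a minimal illustration, not DRSMetaResolver's actual code: `split_curie`, `route` and `CLIENT_NAMES` are hypothetical names, and the client labels simply mirror the ones printed in the run output.

```python
# Hypothetical sketch of CURIE-prefix routing; not the fasp implementation.
CLIENT_NAMES = {
    "crdc": "crdcDRSClient",   # Cancer Research Data Commons
    "anv": "anvilDRSClient",   # Anvil
}


def split_curie(drs_id):
    """Split 'prefix:local_id' into (prefix, local_id)."""
    prefix, sep, local_id = drs_id.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefixed DRS id: {drs_id!r}")
    return prefix, local_id


def route(drs_id):
    """Return (client name, id to send) for a prefixed DRS id."""
    prefix, local_id = split_curie(drs_id)
    if prefix not in CLIENT_NAMES:
        raise KeyError(f"no DRS client registered for prefix {prefix!r}")
    return CLIENT_NAMES[prefix], local_id
```

For example, `route('crdc:03b2f263-0567-4524-9116-4f7aa4f5f645')` yields the client label `crdcDRSClient` together with the bare id, which matches the "sending id ... to:" lines in the output.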

Note that for the data in Google Cloud we are using GCPLSsamtools, a fasp class which accesses Google Cloud's Life Science Pipeline API. The plan is to replace that with the DNA Stack WES server when that is updated.

In [8]:
# TCGA Query - CRDC
crdcquery = """
    SELECT 'case_'||associated_entities__case_gdc_id , 'crdc:'||file_id
    FROM `isb-cgc.GDC_metadata.rel24_fileData_active` 
    where data_format = 'BAM' 
    and project_disease_type = 'Breast Invasive Carcinoma'
    limit 3"""

searchClient = BigQuerySearchClient()
drsClient = DRSMetaResolver()

settings = faspRunner.settings
gcplocation = 'projects/{}/locations/{}'.format(settings['GCPProject'], settings['GCPPipelineRegion'])
wesClient = GCPLSsamtools(gcplocation, settings['GCPOutputBucket'])

faspRunner.configure(searchClient, drsClient, wesClient)
runList = faspRunner.runQuery(crdcquery, runNote)


Running query

    SELECT 'case_'||associated_entities__case_gdc_id , 'crdc:'||file_id
    FROM `isb-cgc.GDC_metadata.rel24_fileData_active` 
    where data_format = 'BAM' 
    and project_disease_type = 'Breast Invasive Carcinoma'
    limit 3
subject=case_b9c6c069-2a2b-4cc7-a398-dc45c132d979, drsID=crdc:03b2f263-0567-4524-9116-4f7aa4f5f645
sending id 03b2f263-0567-4524-9116-4f7aa4f5f645 to: crdcDRSClient
workflow submitted, run:3295866049710983754
subject=case_eb93df3d-9e25-4e4c-9a5d-350f296e5acf, drsID=crdc:04bd94e2-78d1-4dac-b365-13833eccf467
sending id 04bd94e2-78d1-4dac-b365-13833eccf467 to: crdcDRSClient
workflow submitted, run:6541266407094919341
subject=case_c462e422-eb8d-4daf-9897-2a9c6cbd783a, drsID=crdc:00589653-5840-4c11-8572-5aa7d00a73f8
sending id 00589653-5840-4c11-8572-5aa7d00a73f8 to: crdcDRSClient
workflow submitted, run:6709262275696804824


A Search and WES client are then set up to work with the Anvil data.

The Search client here is a placeholder that searches a local file. That file contains file ids downloaded as a manifest from the Gen3 Anvil portal. The list of files in that manifest had already been filtered to the relevant samples. The anv: DRS prefix was added in an edited version of the file.
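
Adding the prefix to a manifest could look like the sketch below. It assumes each manifest entry is a JSON object with an `object_id` field holding the DRS id, which may not match the actual file; `add_curie_prefix` is a hypothetical helper, not part of fasp.

```python
import json


def add_curie_prefix(entries, prefix="anv", id_field="object_id"):
    """Return copies of the manifest entries with 'prefix:' prepended to each id.

    Entries that already carry the prefix are left unchanged.
    """
    tag = prefix + ":"
    result = []
    for entry in entries:
        updated = dict(entry)  # copy so the caller's data is not mutated
        if not updated[id_field].startswith(tag):
            updated[id_field] = tag + updated[id_field]
        result.append(updated)
    return result


# In-memory example; the field names are assumptions about the manifest layout.
manifest = [
    {"file_name": "GTEX-1GTWX-0001-SM-7J3A5.cram",
     "object_id": "dg.ANV0/76bb893d-12da-41ca-8828-ff89551d3e15"},
]
print(json.dumps(add_curie_prefix(manifest), indent=2))
```

Returning copies rather than editing in place keeps the downloaded manifest intact, so the prefixed version can be written to a separate file.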

#Todo: check which access_ids DRSMetaResolver is using for each run

In [9]:
searchClient = Gen3ManifestClient('../fasp/data/gtex/gtex-cram-manifest_wCuries.json')
# drsClient: no need to reset this. DRSMetaResolver will pick the right client.
wesClient = sbWESClient(settings['SevenBridgesInstance'], settings['SevenBridgesProject'],
                    '~/.keys/sbcgc_key.json')
faspRunner.configure(searchClient, drsClient, wesClient)
# For the manifest client the "query" appears to be the number of entries to process (3 here).
runList2 = faspRunner.runQuery(3, runNote)


Running query
3
subject=GTEX-1GTWX-0001-SM-7J3A5.cram, drsID=anv:dg.ANV0/76bb893d-12da-41ca-8828-ff89551d3e15
sending id dg.ANV0/76bb893d-12da-41ca-8828-ff89551d3e15 to: anvilDRSClient
workflow submitted, run:9dd9e09b-4c6d-4507-a76e-0d76e219b814
subject=GTEX-14PQA-0003-SM-7DLH4.cram, drsID=anv:dg.ANV0/66352de8-4b50-4cae-881d-b76d03df5ac8
sending id dg.ANV0/66352de8-4b50-4cae-881d-b76d03df5ac8 to: anvilDRSClient
workflow submitted, run:2b3d485f-7c52-4b89-a0e0-ac4e64fe1fa9
subject=GTEX-1B98T-0004-SM-7J38T.cram, drsID=anv:dg.ANV0/ed9ac9ae-02da-4e97-93da-ad86aa77d227
sending id dg.ANV0/ed9ac9ae-02da-4e97-93da-ad86aa77d227 to: anvilDRSClient
workflow submitted, run:dd6b1efe-17cc-4a15-8aa0-13aea875243a
