## Running a Workflow on a Seven Bridges WES server
I'm setting out to use the Seven Bridges WES client to run samtools stats on a CRAM file, starting from the instructions at https://docs.cancergenomicscloud.org/docs/run-a-workflow.


In [2]:
#from fasp.workflow import sbWESClient
from fasp.workflow import sbcgcWESClient

cl = sbcgcWESClient('forei/CNest', debug=True)

### CNest Step 1 via WES
Reverse engineering what we can see above, we can run CNest via WES as follows:

In [5]:
params = {
    "project": "forei/cnest",
    "inputs": {
        "bed": {
            "path": "drs://cgc-ga4gh-api.sbgenomics.com/626bfb1bf26c93517368984e",
            "name": "hg38.1kb.baits.bed",
            "class": "File"
        },
        "project": "test_proj"
    }
}


Now that the request body is in the right shape, it can be passed to a client function as follows.

In [6]:
import json

# Submit the run, passing the parameters as a JSON string
run_id = cl.runGenericWorkflow(
    workflow_url='sbg://forei/cnest/cnest-step1',
    workflow_params=json.dumps(params),
    workflow_type="CWL",
    workflow_type_version="v1.1",
    verbose=False
)
run_id

'1c344a90-0e97-4309-baeb-1b367a4098af'

In [11]:
cl.getTaskStatus(run_id)

'QUEUED'
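
A run that reports `QUEUED` has to be checked again later. The following is a minimal polling sketch, not part of the fasp client: `wait_for_run` and its parameters are hypothetical names, and it assumes the status strings returned by `getTaskStatus` follow the GA4GH WES state names (the terminal ones being `COMPLETE`, `EXECUTOR_ERROR`, `SYSTEM_ERROR` and `CANCELED`).

```python
import time

# Terminal states from the GA4GH WES State enum (assumed to match what the server reports).
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def wait_for_run(get_status, run_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call get_status(run_id) until a terminal state is seen or max_polls is used up.

    Returns the last status seen, which may be non-terminal if polling gave up.
    """
    status = None
    for _ in range(max_polls):
        status = get_status(run_id)
        if status in TERMINAL_STATES:
            break
        sleep(poll_seconds)
    return status
```

With the client above this would be called as `wait_for_run(cl.getTaskStatus, run_id)`. Passing the status function in as an argument keeps the helper independent of any particular client.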

### Running Step 2
Get the details of the manual run of step 2

### Run CNest Step 2 via WES

Set up the parameters as above.

In [4]:
params = {
    "project": "forei/cnest",
    "inputs": {
        "index_txt": {
            "path": "drs://cgc-ga4gh-api.sbgenomics.com/627653faf26c9351737f92ac",
            "name": "index.txt",
            "class": "File"
        },
        "index_bed": {
            "path": "drs://cgc-ga4gh-api.sbgenomics.com/627653faf26c9351737f92ae",
            "name": "index.bed",
            "class": "File"
        },
        "project": "test_proj",
        "index_tab": {
            "path": "drs://cgc-ga4gh-api.sbgenomics.com/627653faf26c9351737f92af",
            "name": "index_tab.txt",
            "class": "File"
        },
        "sample": "test_bam",
        "bam": {
            "path": "drs://cgc-ga4gh-api.sbgenomics.com/6272e873d125a52cff9b0247",
            "name": "TCGA-3X-AAVA-01A-11R-A41D-13_mirna_gdc_realn.bam",
            "secondaryFiles": [
                {
                    "path": "drs://cgc-ga4gh-api.sbgenomics.com/6272ec5df26c93517378730b",
                    "name": "TCGA-3X-AAVA-01A-11R-A41D-13_mirna_gdc_realn.bam.bai",
                    "class": "File"
                }
            ],
            "class": "File"
        }
    }
}
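
Every file input in the parameters above has the same `path`/`name`/`class` shape, optionally with `secondaryFiles`. A small helper can cut down the repetition. This is only a sketch: `file_input` is a hypothetical name, not a function provided by fasp or Seven Bridges.

```python
def file_input(path, name, secondary_files=None):
    """Build a File input dict in the same shape as the entries in params."""
    entry = {"path": path, "name": name, "class": "File"}
    if secondary_files:
        # Index files (for example .bai) travel with the main file.
        entry["secondaryFiles"] = list(secondary_files)
    return entry


# The BAM and its index, using the DRS ids from the cell above.
bai = file_input(
    "drs://cgc-ga4gh-api.sbgenomics.com/6272ec5df26c93517378730b",
    "TCGA-3X-AAVA-01A-11R-A41D-13_mirna_gdc_realn.bam.bai",
)
bam = file_input(
    "drs://cgc-ga4gh-api.sbgenomics.com/6272e873d125a52cff9b0247",
    "TCGA-3X-AAVA-01A-11R-A41D-13_mirna_gdc_realn.bam",
    secondary_files=[bai],
)
```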



In [10]:
# import json  (already imported above)
run_id = cl.runGenericWorkflow(
    workflow_url='sbg://forei/cnest/cnest-step2/14',
    workflow_params=json.dumps(params),
    workflow_type="CWL",
    workflow_type_version="sbg:draft-2",
    verbose=False
)
run_id

'1bb836cb-7905-476d-8dbf-278a8fbf6394'

### Can we access the BioData Catalyst file directly via DRS?

In [5]:
params['inputs']['bam'] = {
    "path": "drs://ga4gh-api.sb.biodatacatalyst.nhlbi.nih.gov/626c079e645ccb7324c671d1",
    "name": "HG00445.final.cram",
    "secondaryFiles": [
        {
            "path": "drs://ga4gh-api.sb.biodatacatalyst.nhlbi.nih.gov/626c079e645ccb7324c671cf",
            "name": "HG00445.final.cram.crai",
            "class": "File"
        }
    ],
    "class": "File"
}

In [14]:
params

{'project': 'forei/cnest',
 'inputs': {'index_txt': {'path': 'drs://cgc-ga4gh-api.sbgenomics.com/627653faf26c9351737f92ac',
   'name': 'index.txt',
   'class': 'File'},
  'index_bed': {'path': 'drs://cgc-ga4gh-api.sbgenomics.com/627653faf26c9351737f92ae',
   'name': 'index.bed',
   'class': 'File'},
  'project': 'test_proj',
  'index_tab': {'path': 'drs://cgc-ga4gh-api.sbgenomics.com/627653faf26c9351737f92af',
   'name': 'index_tab.txt',
   'class': 'File'},
  'sample': 'test_bam',
  'bam': {'path': 'drs://ga4gh-api.sb.biodatacatalyst.nhlbi.nih.gov/626c079e645ccb7324c671d1',
   'name': 'HG00445.final.cram',
   'secondaryFiles': [{'path': 'drs://ga4gh-api.sb.biodatacatalyst.nhlbi.nih.gov/626c079e645ccb7324c671cf',
     'name': 'HG00445.final.cram.crai',
     'class': 'File'}],
   'class': 'File'}}}

In [15]:
run_id = cl.runGenericWorkflow(
    workflow_url='sbg://forei/cnest/cnest-step2/14',
    workflow_params=json.dumps(params),
    workflow_type="CWL",
    workflow_type_version="sbg:draft-2",
    verbose=False
)
run_id

Full response status:
<Response [400]>
Full response content:
b'{"msg":"Following file references can not be resolved: drs://ga4gh-api.sb.biodatacatalyst.nhlbi.nih.gov/626c079e645ccb7324c671d1","status_code":400}'
Full response headers:
{'Server': 'nginx', 'Date': 'Wed, 18 May 2022 12:54:11 GMT', 'Content-Type': 'application/json', 'Content-Length': '148', 'Connection': 'keep-alive', 'X-Frame-Options': 'DENY', 'X-Xss-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'X-Download-Options': 'noopen', 'Content-Security-Policy': "frame-ancestors 'none'; report-uri https://sbgenomics.report-uri.com/r/d/csp/enforce", 'Strict-Transport-Security': 'max-age=63072000'}


RuntimeError: WES run submission failed. Response status:400

So we cannot pass a BDC DRS ID to a WES task run on CGC.

I validated that CGC is capable of "importing" the file using the same DRS ID as above. The import validates that I have access to the file (though note this is a public file). It's just passing it via WES that doesn't work.
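
Since the 400 response only arrives after submission, it can help to check the parameters beforehand for DRS references that point at another server. The sketch below walks a params dict and reports DRS hosts other than the CGC one. The names `drs_hosts` and `foreign_drs_hosts` are hypothetical, and the premise that the CGC WES endpoint only resolves ids from its own DRS host is an assumption drawn from the single failure above, not documented behaviour.

```python
from urllib.parse import urlparse


def drs_hosts(value):
    """Recursively collect the host of every drs:// string found in value."""
    hosts = set()
    if isinstance(value, dict):
        for item in value.values():
            hosts |= drs_hosts(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            hosts |= drs_hosts(item)
    elif isinstance(value, str) and value.startswith("drs://"):
        hosts.add(urlparse(value).netloc)
    return hosts


def foreign_drs_hosts(params, local_host="cgc-ga4gh-api.sbgenomics.com"):
    """Return DRS hosts referenced in params other than local_host."""
    return drs_hosts(params) - {local_host}


# A cut-down version of the params that failed above.
example = {
    "inputs": {
        "index_txt": {
            "path": "drs://cgc-ga4gh-api.sbgenomics.com/627653faf26c9351737f92ac",
            "class": "File",
        },
        "bam": {
            "path": "drs://ga4gh-api.sb.biodatacatalyst.nhlbi.nih.gov/626c079e645ccb7324c671d1",
            "class": "File",
        },
    }
}
foreign = foreign_drs_hosts(example)
print(foreign)
```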

### Running via a signed URL obtained from DRS

In [None]:
# drs://ga4gh-api.sb.biodatacatalyst.nhlbi.nih.gov/626c079e645ccb7324c671d1

Can we run the above with a BAM file from a URL obtained via DRS?

We'll try with the Gen3 ID of the same file as above.



In [21]:
params

{'project': 'forei/cnest',
 'inputs': {'index_txt': {'path': 'drs://cgc-ga4gh-api.sbgenomics.com/627653faf26c9351737f92ac',
   'name': 'index.txt',
   'class': 'File'},
  'index_bed': {'path': 'drs://cgc-ga4gh-api.sbgenomics.com/627653faf26c9351737f92ae',
   'name': 'index.bed',
   'class': 'File'},
  'project': 'test_proj',
  'index_tab': {'path': 'drs://cgc-ga4gh-api.sbgenomics.com/627653faf26c9351737f92af',
   'name': 'index_tab.txt',
   'class': 'File'},
  'sample': 'test_bam2',
  'bam': {'path': 'drs://cgc-ga4gh-api.sbgenomics.com/6272e873d125a52cff9b0247',
   'name': 'TCGA-3X-AAVA-01A-11R-A41D-13_mirna_gdc_realn.bam',
   'secondaryFiles': [{'path': 'drs://cgc-ga4gh-api.sbgenomics.com/6272ec5df26c93517378730b',
     'name': 'TCGA-3X-AAVA-01A-11R-A41D-13_mirna_gdc_realn.bam.bai',
     'class': 'File'}],
   'class': 'File'}},
 'bam': {'path': 'drs://cgc-ga4gh-api.sbgenomics.com/6272e873d125a52cff9b0247',
  'name': 'TCGA-3X-AAVA-01A-11R-A41D-13_mirna_gdc_realn.bam',
  'secondaryFiles

#### Instantiate DRS client

In [24]:
from fasp.loc import bdcDRSClient
drs_client = bdcDRSClient("~/.keys/bdc_credentials.json", 's3')

#### Show the details for the file using the DRS ID on BioData Catalyst

In [21]:
drs_id = '3e691d82-7e39-43d3-8380-38e926396827'
drs_client.getObject(drs_id)

{'access_methods': [{'access_id': 'gs',
   'access_url': {'url': 'gs://nih-nhlbi-biodata-catalyst-1000-genomes/CCDG_13607/Project_CCDG_13607_B01_GRM_WGS.cram.2019-02-06/Sample_HG00445/analysis/HG00445.final.cram'},
   'region': '',
   'type': 'gs'},
  {'access_id': 's3',
   'access_url': {'url': 's3://nih-nhlbi-biodata-catalyst-1000-genomes-high-coverage/CCDG_13607/Project_CCDG_13607_B01_GRM_WGS.cram.2019-02-06/Sample_HG00445/analysis/HG00445.final.cram'},
   'region': '',
   'type': 's3'}],
 'aliases': [],
 'checksums': [{'checksum': 'c6ff85a8086fbfab1ad52d6c5bf10841',
   'type': 'md5'}],
 'created_time': '2020-01-15T16:25:14.063426',
 'description': None,
 'form': 'object',
 'id': 'dg.4503/3e691d82-7e39-43d3-8380-38e926396827',
 'mime_type': 'application/json',
 'name': '',
 'self_uri': 'drs://dg.4503:3e691d82-7e39-43d3-8380-38e926396827',
 'size': 18276728210,
 'updated_time': '2020-01-15T16:25:14.063432',
 'version': 'f54a954f'}
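
The object above lists one access method per cloud. A short sketch of picking the URL for a given cloud out of such a response follows. `access_url_for` is a hypothetical helper, not part of the fasp DRS client, and the example dict is trimmed from the output above.

```python
def access_url_for(drs_object, access_type):
    """Return the URL of the first access method of the given type, or None."""
    for method in drs_object.get("access_methods", []):
        if method.get("type") == access_type:
            # access_url can be absent when only an access_id is offered.
            return (method.get("access_url") or {}).get("url")
    return None


# Trimmed copy of the getObject response shown above.
drs_object = {
    "access_methods": [
        {"access_id": "gs",
         "access_url": {"url": "gs://nih-nhlbi-biodata-catalyst-1000-genomes/CCDG_13607/Project_CCDG_13607_B01_GRM_WGS.cram.2019-02-06/Sample_HG00445/analysis/HG00445.final.cram"},
         "region": "",
         "type": "gs"},
        {"access_id": "s3",
         "access_url": {"url": "s3://nih-nhlbi-biodata-catalyst-1000-genomes-high-coverage/CCDG_13607/Project_CCDG_13607_B01_GRM_WGS.cram.2019-02-06/Sample_HG00445/analysis/HG00445.final.cram"},
         "region": "",
         "type": "s3"},
    ]
}
s3_url = access_url_for(drs_object, "s3")
print(s3_url)
```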

#### Get a URL to pass to the compute

In [22]:
access_url = drs_client.getAccessURL(drs_id)
access_url

'https://nih-nhlbi-biodata-catalyst-1000-genomes-high-coverage.s3.amazonaws.com/CCDG_13607/Project_CCDG_13607_B01_GRM_WGS.cram.2019-02-06/Sample_HG00445/analysis/HG00445.final.cram?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAYXPGNV6OBS25EQMY%2F20220602%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220602T114938Z&X-Amz-Expires=3600&X-Amz-Security-Token=FwoGZXIvYXdzEHUaDNq1pApNJEB8N0013CLNAWQeEgljR8%2F3Uubp6exZRUAFo%2BkcL5bGPjRh8IdX%2B84ey76zOAQi1i7IpfrJpt5usd0U75W4HLXJEQQ8VrhskJ42DLubKi2tUHHYW%2Fgk6F45cqBTPXuHX0GYNoydfdhlshtscI7BNg%2FTFxoVDDMFFgt8F0HM0ZTj2zR7abRN8cibCDlKmAz92OIvthIZEtVzR%2Buzxr81c4us%2FGT0rS2I4DwMD2Ll%2BFj169k1q818REt8sv5eqph9DJjS0aQssOckuMuc16aj%2B7JBEy0%2Fm7co0sfilAYyLd3LragMU2%2FvMR3tixA2blMS%2F1M0OzoWFHSb4bFzyeepVuAMbp%2FB6YvgOzORgA%3D%3D&X-Amz-SignedHeaders=host&user_id=968&username=forei&X-Amz-Signature=537f1cb7cddc036df75e03f2443fe18efca592123eae3cde17e3ae2e56ee12ac'
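
The signed URL carries its own lifetime in the `X-Amz-Date` and `X-Amz-Expires` query parameters, so a queued task may start after the URL has stopped working. The sketch below computes the expiry time from those two parameters. `presigned_expiry` is a hypothetical helper, and the example URL is a shortened stand-in (hypothetical host and path) that keeps only the two relevant values from the output above.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_expiry(url):
    """Return the UTC expiry time of an AWS SigV4 presigned URL."""
    query = parse_qs(urlparse(url).query)
    # X-Amz-Date is the signing time, X-Amz-Expires the lifetime in seconds.
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))


# Shortened stand-in for the URL above, keeping its date and lifetime.
example_url = (
    "https://example.org/HG00445.final.cram"
    "?X-Amz-Date=20220602T114938Z&X-Amz-Expires=3600"
)
expiry = presigned_expiry(example_url)
print(expiry.isoformat())
```

Comparing `expiry` with `datetime.now(timezone.utc)` just before submission shows whether the URL is still usable.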

#### Set up to run

In [20]:
params['inputs']['bam'] = {
    "path": access_url,
    "name": "HG00445.final.cram",
    "secondaryFiles": [
        {
            "path": "to be found",
            "name": "HG00445.final.cram.crai",
            "class": "File"
        }
    ],
    "class": "File"
}
