## Running samtools view via the WES API


In [2]:
#from fasp.workflow import sbWESClient
from fasp.workflow import sbcgcWESClient

cl = sbcgcWESClient('forei/ismb-tutorial', debug=True)

In [None]:
ids = ['drs://cgc-ga4gh-api.sbgenomics.com/5832fef8507c17de5bfc5806',
       'drs://cgc-ga4gh-api.sbgenomics.com/5772b6f8507c175267448709']

params = {
    "project": "forei/ismb-tutorial",
    "name": "Samtools View test via WES",
    "inputs": {
        "in_alignments": {
            "path": ids[0],
            "basename": "G26234.HCC1187_1M.aligned.bam",
            "nameext": ".bam",
            "class": "File",
            "nameroot": "G26234.HCC1187_1M.aligned"
        }
    }
}
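The `basename`, `nameext` and `nameroot` fields above are all derived from one file name, so they can be computed rather than typed by hand. The following is a minimal sketch; `cwl_file_input` is a hypothetical helper, not part of the fasp library, and it assumes the extension is everything from the last period onward.

```python
import os


def cwl_file_input(drs_uri, basename):
    """Build a CWL File object for a workflow input (hypothetical helper)."""
    # Split at the last period: "x.aligned.bam" -> ("x.aligned", ".bam")
    nameroot, nameext = os.path.splitext(basename)
    return {
        "class": "File",
        "path": drs_uri,
        "basename": basename,
        "nameroot": nameroot,
        "nameext": nameext,
    }


example = cwl_file_input(
    "drs://cgc-ga4gh-api.sbgenomics.com/5832fef8507c17de5bfc5806",
    "G26234.HCC1187_1M.aligned.bam",
)
```

The resulting dictionary could be used as the value of `"in_alignments"` in `params`.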

## Calling WES from Python

The request body is now in a form that can be passed to a client function, as follows.

In [26]:
import json
run_id = cl.runGenericWorkflow(
    workflow_url='sbg://forei/ismb-tutorial/samtools-view-1-9-cwl1-0',
    workflow_params=json.dumps(params),
    workflow_type="CWL",
    workflow_type_version="v1.0",
    verbose=False
)
run_id

'ffd7bb7a-e66b-4fe5-a1fe-144420a34a3d'

In [5]:
run_id = 'ffd7bb7a-e66b-4fe5-a1fe-144420a34a3d'

In [6]:
cl.getTaskStatus(run_id)

'COMPLETE'
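A run will not usually be complete the first time its status is checked, so the status call can be wrapped in a polling loop. This is a minimal sketch: `wait_for_run` is a hypothetical helper, and the only terminal state it knows by default is `'COMPLETE'`, the value seen above. The names of any failure states are an assumption to be filled in from what your WES server reports.

```python
import time


def wait_for_run(get_status, run_id, terminal_states=("COMPLETE",),
                 poll_seconds=30, max_polls=120):
    """Poll get_status(run_id) until it returns a terminal state.

    get_status is any callable that takes a run id and returns a status
    string, for example cl.getTaskStatus. Add the failure states your
    server reports to terminal_states; only 'COMPLETE' is assumed here.
    """
    for _ in range(max_polls):
        status = get_status(run_id)
        if status in terminal_states:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

With the client created earlier, this would be called as `wait_for_run(cl.getTaskStatus, run_id)`.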

## Getting the results - via DRS
Once the run is complete, further steps can use DRS to obtain the file output from the workflow.

In [8]:
runLog = cl.getRunLog(run_id)
runLog['outputs']

{'reads_not_selected_by_filters': None,
 'alignement_count': None,
 'out_alignments': {'path': 'drs://cgc-ga4gh-api.sbgenomics.com/62b0a98f4e3edb6b1c23ecf4',
  'basename': 'G20479.HCC1143_1M.aligned.sam',
  'nameext': '.sam',
  'class': 'File',
  'nameroot': 'G20479.HCC1143_1M.aligned'}}

In [10]:
resultsDRSID = runLog['outputs']['out_alignments']['path']
resultsDRSID

'drs://cgc-ga4gh-api.sbgenomics.com/62b0a98f4e3edb6b1c23ecf4'

We'll pass over the question of how to determine which DRS server that URI should be sent to, because:
* In this case it's fairly obvious: it's the CGC DRS server
* We want to get something up and working

In [11]:
from fasp.loc import sbcgcDRSClient
drsClient = sbcgcDRSClient('~/.keys/sevenbridges_keys.json', 's3')

### DRS GetObject
Here's how we then get details of the file. Note that only the id portion of the DRS URI is passed. It is the job of a metaresolver to look at the URI and determine where to send the id. As noted, we are skipping the metaresolver here and entering the id manually.

In [15]:
out_alignments = '62b0a98f4e3edb6b1c23ecf4'
fileDetails = drsClient.getObject(out_alignments)
fileDetails

{'id': '62b0a98f4e3edb6b1c23ecf4',
 'name': 'G20479.HCC1143_1M.aligned.sam',
 'size': 767901505,
 'checksums': [{'type': 'etag',
   'checksum': '4c32011a61a5e50f6247bf2359fbb824-1'}],
 'self_uri': 'drs://cgc-ga4gh-api.sbgenomics.com/62b0a98f4e3edb6b1c23ecf4',
 'created_time': '2022-06-20T17:08:31Z',
 'updated_time': '2022-06-20T17:08:31Z',
 'mime_type': 'application/json',
 'access_methods': [{'type': 's3',
   'region': 'us-east-1',
   'access_id': 'aws-us-east-1'}]}
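The id passed to `getObject` above was typed in by hand. As a rough sketch of the first thing a metaresolver would have to do, the URI can be split into its server and id parts with the standard library. `split_drs_uri` is a hypothetical helper; it assumes a hostname-based DRS URI of the form shown in this notebook and does not attempt to handle other URI forms.

```python
from urllib.parse import urlparse


def split_drs_uri(drs_uri):
    """Return (host, object_id) for a hostname-based drs:// URI."""
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs":
        raise ValueError(f"not a DRS URI: {drs_uri}")
    # The path is "/<id>"; drop the leading slash to get the bare id
    return parsed.netloc, parsed.path.lstrip("/")


host, object_id = split_drs_uri(
    "drs://cgc-ga4gh-api.sbgenomics.com/62b0a98f4e3edb6b1c23ecf4"
)
```

Here `object_id` is the value that was assigned to `out_alignments` manually, and `host` identifies the server a resolver would route the request to.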

In [19]:
url = drsClient.getAccessURL(out_alignments,'s3')

### Warning - the results files are approx 700-800 MB

### Downloading the file
Now we can use the url obtained to download the file. We'll create a small function to encapsulate the download.

In [15]:
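import requests
import os
def download(url, file_path):
    # Stream to disk in chunks so a large file is never held in memory whole
    with requests.get(url, stream=True) as response:
        response.raise_for_status()  # stop on an HTTP error rather than saving the error body
        with open(os.path.expanduser(file_path), "wb") as file:
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                file.write(chunk)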

In [16]:
fullPath = '~/Downloads/' + fileDetails['name']
download(url, fullPath)
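With a file this large it is worth checking that the download finished. A simple check is to compare the size on disk with the `size` field returned by `getObject`. This is only a sketch: `size_matches` is a hypothetical helper, and it compares byte counts only. The checksum reported above is an S3 etag, which may not be a plain MD5 of the file contents, so it is not used here.

```python
import os


def size_matches(file_path, expected_size):
    """Return True if the file on disk has exactly expected_size bytes."""
    return os.path.getsize(os.path.expanduser(file_path)) == expected_size
```

For the file downloaded above, the call would be `size_matches(fullPath, fileDetails['size'])`.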