## Running samtools view via the WES API


In [1]:
#from fasp.workflow import sbWESClient
from fasp.workflow import sbcgcWESClient
project_name = 'forei/ismb-tutorial'
cl = sbcgcWESClient(project_name, debug=True)

In [2]:
# DRS URIs of candidate input files on the CGC; only ids[0] is used below
ids = ['drs://cgc-ga4gh-api.sbgenomics.com/5832fef8507c17de5bfc5806',
       'drs://cgc-ga4gh-api.sbgenomics.com/5772b6f8507c175267448709']

# WES workflow_params: run samtools view on ids[0], emitting only the header
params = {
    "project": project_name,
    "name": "Samtools View test via WES - header only",
    "inputs": {
        "output_header_only": True,
        "include_header": True,
        # CWL File object pointing at the input BAM via its DRS URI
        "in_alignments": {
            "path": ids[0],
            "basename": "G26234.HCC1187_1M.aligned.bam",
            "nameext": ".bam",
            "class": "File",
            "nameroot": "G26234.HCC1187_1M.aligned"
        }
    }
}
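In the `in_alignments` CWL File object, `nameroot` and `nameext` are just `basename` split at its last dot. When building inputs for several files, a small helper can derive them so they never drift out of sync. `cwl_file` below is a hypothetical helper, not part of fasp; it only reproduces the dict shape used above.

```python
import os


def cwl_file(drs_uri, basename):
    """Build a CWL File object (hypothetical helper) for a DRS-hosted input.

    nameroot/nameext are derived from basename with os.path.splitext,
    which splits at the last dot.
    """
    nameroot, nameext = os.path.splitext(basename)
    return {
        "path": drs_uri,
        "basename": basename,
        "nameext": nameext,
        "class": "File",
        "nameroot": nameroot,
    }


in_alignments = cwl_file(
    "drs://cgc-ga4gh-api.sbgenomics.com/5832fef8507c17de5bfc5806",
    "G26234.HCC1187_1M.aligned.bam",
)
print(in_alignments)
```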

## Calling WES from Python

Now that the request body is formulated, it can be passed to a client function as follows.

In [4]:
import json

sam_view_app = 'sbg://yasasvinip/test-1/samtools-view-1-9-cwl1-0'
#sam_view_app = 'sbg://forei/ismb-tutorial/samtools-view-1-9-cwl1-0'

run_id = cl.runGenericWorkflow(
    workflow_url=sam_view_app,  # the app defined above
    workflow_params=json.dumps(params),
    workflow_type="CWL",
    workflow_type_version="v1.0",
    verbose=False
)
run_id

'889b379f-a3f0-4610-aa94-d7f4f98105ee'
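`runGenericWorkflow` hides the HTTP call. In the GA4GH WES specification, a run is submitted as a `multipart/form-data` POST to `{base}/runs` with the fields `workflow_url`, `workflow_params` (a JSON string), `workflow_type` and `workflow_type_version`, and the response carries the `run_id`. The sketch below only *builds* such a request with `requests` and does not send it; the base URL, the bearer-token header and the function name are assumptions, and fasp's own implementation may differ.

```python
import json

import requests


def build_wes_run_request(base_url, token, workflow_url, params):
    """Prepare (but do not send) a WES POST /runs request as multipart/form-data."""
    form_fields = {
        "workflow_url": workflow_url,
        "workflow_params": json.dumps(params),  # WES expects a JSON string
        "workflow_type": "CWL",
        "workflow_type_version": "v1.0",
    }
    # (None, value) tuples make requests emit plain form fields, not file uploads
    files = {name: (None, value) for name, value in form_fields.items()}
    request = requests.Request(
        "POST",
        f"{base_url}/runs",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        files=files,
    )
    return request.prepare()


# Hypothetical base URL and token; fasp configures the real ones in its client
prepared = build_wes_run_request(
    "https://wes.example.org/ga4gh/wes/v1", "MY_TOKEN",
    "sbg://example/samtools-view", {"project": "example/project"},
)
print(prepared.method, prepared.url)
# Sending it would look like: requests.Session().send(prepared).json()["run_id"]
```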

In [9]:
cl.getTaskStatus(run_id)

'COMPLETE'
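Here the status was checked once, after the run had already finished. For longer runs you would poll until the run reaches a terminal state. The sketch below assumes `getTaskStatus` returns GA4GH WES state names such as `'COMPLETE'` (as shown above); `wait_for_run`, its polling interval and its timeout are hypothetical choices.

```python
import time

# Terminal states from the GA4GH WES State enum; this sketch keeps polling
# on any other state (e.g. QUEUED, INITIALIZING, RUNNING, CANCELING).
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def wait_for_run(get_status, run_id, interval=30.0, timeout=3600.0):
    """Call get_status(run_id) until it returns a terminal state.

    Raises TimeoutError if the run is still active after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_status(run_id)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {state} after {timeout} s")
        time.sleep(interval)
```

With the notebook's client this would be called as `wait_for_run(cl.getTaskStatus, run_id)`. Passing the status function in, rather than the client, keeps the helper independent of fasp and easy to test.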

## Getting the results via DRS
Once the run is complete, further steps can use DRS to obtain the file output from the workflow.

In [10]:
runLog = cl.getRunLog(run_id)
runLog['outputs']

{'reads_not_selected_by_filters': None,
 'alignement_count': None,
 'out_alignments': {'path': 'drs://cgc-ga4gh-api.sbgenomics.com/62b0e74bf08fea4770471080',
  'basename': '_2_G20479.HCC1143_1M.aligned.header.sam',
  'nameext': '.sam',
  'class': 'File',
  'nameroot': '_2_G20479.HCC1143_1M.aligned.header'}}
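This run log has one populated File output and two outputs that are `None`. When a workflow produces several outputs, a small helper can collect every DRS URI instead of indexing a single key. `output_drs_uris` is a hypothetical helper; it assumes single-File outputs shaped like `out_alignments` and does not descend into array outputs.

```python
def output_drs_uris(outputs):
    """Map each WES output name to its DRS URI, skipping outputs that are None."""
    uris = {}
    for name, value in outputs.items():
        # Single CWL File outputs are dicts with "class": "File"; unset outputs are None
        if isinstance(value, dict) and value.get("class") == "File":
            uris[name] = value["path"]
    return uris


# The outputs dict shown above, copied for a self-contained example
outputs = {
    'reads_not_selected_by_filters': None,
    'alignement_count': None,
    'out_alignments': {
        'path': 'drs://cgc-ga4gh-api.sbgenomics.com/62b0e74bf08fea4770471080',
        'basename': '_2_G20479.HCC1143_1M.aligned.header.sam',
        'nameext': '.sam',
        'class': 'File',
        'nameroot': '_2_G20479.HCC1143_1M.aligned.header',
    },
}
print(output_drs_uris(outputs))
```

In the notebook the equivalent call is `output_drs_uris(runLog['outputs'])`.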

In [11]:
resultsDRSID = runLog['outputs']['out_alignments']['path']
resultsDRSID

'drs://cgc-ga4gh-api.sbgenomics.com/62b0e74bf08fea4770471080'

We'll set aside the question of which DRS server this URI should be sent to, because:
* In this case it's fairly obvious: it's the CGC DRS server.
* We want to get something up and running first.
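For reference, a hostname-based DRS URI names its server directly: the DRS specification maps `drs://<hostname>/<id>` to `https://<hostname>/ga4gh/drs/v1/objects/<id>`. A minimal standard-library sketch of that mapping follows. `resolve_drs_uri` is a hypothetical helper; compact-identifier URIs (`drs://<prefix>:<accession>`) need a real metaresolver and are rejected, and whether a given server actually answers at that path is an assumption.

```python
from urllib.parse import urlparse


def resolve_drs_uri(drs_uri):
    """Split a hostname-based DRS URI into its HTTPS object endpoint and bare id.

    Follows the DRS convention drs://<hostname>/<id> ->
    https://<hostname>/ga4gh/drs/v1/objects/<id>.
    """
    parsed = urlparse(drs_uri)
    object_id = parsed.path.strip("/")
    # Compact identifiers (drs://prefix:accession) have no path component
    if parsed.scheme != "drs" or not parsed.netloc or not object_id:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri}")
    endpoint = f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{object_id}"
    return endpoint, object_id


endpoint, object_id = resolve_drs_uri(
    'drs://cgc-ga4gh-api.sbgenomics.com/62b0e74bf08fea4770471080')
print(endpoint)
print(object_id)
```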

In [12]:
from fasp.loc import sbcgcDRSClient
drsClient = sbcgcDRSClient('~/.keys/sbcgc_key.json', 's3')

### DRS GetObject
Here's how we get the details of the file. Note that only the id portion of the DRS URI is passed. Normally a metaresolver would examine the full URI and determine which server to send the id to; as noted above, we skip the metaresolver here and extract the bare id ourselves:

In [13]:
# get the id part of the URI
out_alignments = resultsDRSID.split('/')[-1]
print(f"Getting {out_alignments} from DRS Client")
fileDetails = drsClient.getObject(out_alignments)
fileDetails

Getting 62b0e74bf08fea4770471080 from DRS Client


{'id': '62b0e74bf08fea4770471080',
 'name': '_2_G20479.HCC1143_1M.aligned.header.sam',
 'size': 3598,
 'checksums': [{'type': 'etag',
   'checksum': '159acae2ce81efee40f0f89ee58f2f95-1'}],
 'self_uri': 'drs://cgc-ga4gh-api.sbgenomics.com/62b0e74bf08fea4770471080',
 'created_time': '2022-06-20T21:31:55Z',
 'updated_time': '2022-06-20T21:31:55Z',
 'mime_type': 'application/json',
 'access_methods': [{'type': 's3',
   'region': 'us-east-1',
   'access_id': 'aws-us-east-1'}]}

In [14]:
url = drsClient.getAccessURL(out_alignments, 's3')
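`getAccessURL` turns the object's access method into a URL that can be fetched directly. In the DRS specification this is a separate request keyed by the `access_id` listed under `access_methods` (here `aws-us-east-1`), and it returns an AccessURL object whose `url` field is the download link; how fasp's `getAccessURL` uses its `'s3'` argument is not shown in this notebook. The sketch below only builds the spec's access endpoint from the `getObject` response, assuming the server uses the standard `/ga4gh/drs/v1` path; `access_endpoint` is a hypothetical helper.

```python
def access_endpoint(drs_object, host, method_type="s3"):
    """Build the DRS access endpoint for the first access method of method_type.

    Per the DRS spec, GET /ga4gh/drs/v1/objects/{object_id}/access/{access_id}
    returns an AccessURL JSON object whose "url" field is the download URL.
    """
    for method in drs_object.get("access_methods", []):
        if method.get("type") == method_type and method.get("access_id"):
            return (f"https://{host}/ga4gh/drs/v1/objects/"
                    f"{drs_object['id']}/access/{method['access_id']}")
    raise LookupError(f"no {method_type!r} access method with an access_id")


# Trimmed copy of the fileDetails shown above
file_details = {
    'id': '62b0e74bf08fea4770471080',
    'access_methods': [{'type': 's3', 'region': 'us-east-1',
                        'access_id': 'aws-us-east-1'}],
}
print(access_endpoint(file_details, 'cgc-ga4gh-api.sbgenomics.com'))
```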

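### Warning: result files can be approx. 700-800 MB
The header-only output from this run is just 3,598 bytes (see the `size` field above); the warning applies to larger outputs, such as ones containing the alignments themselves.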

### Downloading the file
Now we can use the URL obtained to download the file. We'll create a small function to encapsulate the download.

In [15]:
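import os
import requests

def download(url, file_path):
    """Stream url to file_path in 1 MiB chunks so large results are not held in memory."""
    with requests.get(url, stream=True) as response:
        response.raise_for_status()  # fail loudly on an expired or invalid access URL
        with open(os.path.expanduser(file_path), "wb") as file:
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                file.write(chunk)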

In [16]:
fullPath = '~/Downloads/' + fileDetails['name']
download(url, fullPath)
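After downloading, a quick sanity check is to compare the local file size with the `size` field returned by `getObject` (3598 bytes here). The `etag` checksum ending in `-1` looks like an S3 multipart-upload ETag, which is not a plain MD5 of the file contents, so this sketch sticks to size. `check_download_size` is a hypothetical helper.

```python
import os


def check_download_size(file_path, expected_size):
    """Raise OSError if the file's size differs from the DRS 'size' field."""
    actual = os.path.getsize(os.path.expanduser(file_path))
    if actual != expected_size:
        raise OSError(f"{file_path}: expected {expected_size} bytes, got {actual}")
    return actual
```

In the notebook this would be `check_download_size(fullPath, fileDetails['size'])`.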