First, we load the sevenbridges Python library and authenticate on the CGC API. The authentication token identifies you on the CGC platform, and obtaining one requires a registered CGC account. To get your token, open the user menu (top right corner) of the CGC and go to the Developer Panel.

In [None]:
import sevenbridges as sbg

In [None]:
# Replace 'MY AUTH TOKEN' with the authentication token from the Developer Panel
api = sbg.Api(url='https://cgc-api.sbgenomics.com/v2', token='MY AUTH TOKEN')
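Hard-coding the token in the notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the token has been saved in an environment variable beforehand; the variable name CGC_AUTH_TOKEN and the helper cgc_api_kwargs are hypothetical choices for this example, not a CGC or sevenbridges convention:

```python
import os

CGC_API_URL = 'https://cgc-api.sbgenomics.com/v2'


def cgc_api_kwargs(env_var='CGC_AUTH_TOKEN', environ=None):
    """Build keyword arguments for sbg.Api from an environment variable.

    CGC_AUTH_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    environ = os.environ if environ is None else environ
    token = environ.get(env_var)
    if not token:
        raise RuntimeError('Set the {} environment variable to your CGC '
                           'authentication token'.format(env_var))
    return {'url': CGC_API_URL, 'token': token}
```

With the variable set, the connection is created with api = sbg.Api(**cgc_api_kwargs()), and the token never appears in the notebook itself.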

Next, we set up volumes. To transfer data between the user’s S3 bucket and the CGC platform, the API uses volumes, which behave much like hard drives mounted on a laptop. A volume can be read-only (RO) or read-write (RW). To create a volume, the user must supply AWS credentials (an access key ID and a secret access key) after applying the policy produced by the policy generator in their AWS console.


In [None]:
# Read-only volume: used to import files from the bucket
volume_import = api.volumes.create_s3_volume(name='my_import_volume',
                                             bucket='bucket-name',
                                             access_key_id='XXXXXXXXX',
                                             secret_access_key='XXXXX',
                                             access_mode='RO')


In [None]:
# Read-write volume: needed later to export task outputs to the bucket
volume_export = api.volumes.create_s3_volume(name='my_export_volume',
                                             bucket='bucket-name',
                                             access_key_id='XXXXXXXXX',
                                             secret_access_key='XXXXXXXXXX',
                                             access_mode='RW')
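The AWS keys above are placeholders, and real keys committed to a notebook are easily leaked. A minimal sketch of reading them from the environment instead, assuming they are exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (the variable names the AWS CLI and SDKs read); s3_volume_kwargs is a hypothetical helper that collects the same arguments passed to create_s3_volume above and rejects access modes other than RO and RW:

```python
import os

VALID_ACCESS_MODES = {'RO', 'RW'}


def s3_volume_kwargs(name, bucket, access_mode, environ=None):
    """Collect the create_s3_volume arguments, reading AWS keys from the environment."""
    if access_mode not in VALID_ACCESS_MODES:
        raise ValueError('access_mode must be RO or RW, got {!r}'.format(access_mode))
    environ = os.environ if environ is None else environ
    return {
        'name': name,
        'bucket': bucket,
        'access_key_id': environ['AWS_ACCESS_KEY_ID'],
        'secret_access_key': environ['AWS_SECRET_ACCESS_KEY'],
        'access_mode': access_mode,
    }
```

Usage: volume_import = api.volumes.create_s3_volume(**s3_volume_kwargs('my_import_volume', 'bucket-name', 'RO')). Validating the access mode up front catches a lowercase 'rw' before any request is sent.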

Once the volumes are created, files can be imported from the AWS bucket to the platform. Imported files are stored in a project, so first we create a project on the CGC.

In [None]:
new_project_name = 'Protocol 4'
# Use the first billing group available to this account
billing_groups = api.billing_groups.query()
billing_group = billing_groups[0]
print(billing_group.name +
      ' will be charged for computation and storage (if applicable) for your new project')

# create() returns the new project, so there is no need to query for it again
my_project = api.projects.create(name=new_project_name,
                                 billing_group=billing_group.id)


Now, we can import each file from the AWS bucket to the platform. In this case, we will import the files based on their location within the bucket.


In [None]:
file_list = ['TCRBOA1-T-WEX.bam',
             'TCRBOA1-N-WEX.bam']
# Loop through the selected files and start one import job for each.
for f_name in file_list:
    import_job = api.imports.submit_import(volume=volume_import,
                                           project=my_project,
                                           location=f_name)
    print("File {} is in state {} \n"
          .format(f_name, import_job.reload().state))
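Imports run asynchronously, so the state printed above is usually not yet final, and the BAM files may not be in the project when the task inputs are looked up later. A minimal sketch of waiting for them, assuming each import_job from the loop is also appended to a list (called import_jobs here, a hypothetical name) and that an import ends in either COMPLETED or FAILED; wait_for_jobs is a hypothetical helper that relies only on the reload() method and state attribute already used above:

```python
import time


def wait_for_jobs(jobs, poll_seconds=10, sleep=time.sleep):
    """Poll jobs until each one is COMPLETED or FAILED.

    Each job must provide reload() (returning the refreshed job) and a
    .state string. Returns the list of jobs that ended in FAILED.
    """
    pending = list(jobs)
    failed = []
    while pending:
        still_pending = []
        for job in pending:
            refreshed = job.reload()
            if refreshed.state == 'FAILED':
                failed.append(refreshed)
            elif refreshed.state != 'COMPLETED':
                still_pending.append(refreshed)
        pending = still_pending
        if pending:
            sleep(poll_seconds)  # only wait if something is still running
    return failed
```

Usage: failed = wait_for_jobs(import_jobs). An empty result means every file arrived; passing sleep as a parameter keeps the helper testable without real delays.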


The tumor-normal BAM pair can be analyzed for somatic variants with the VarScan2 workflow, which the CGC provides as a public app; the platform currently offers more than 240 commonly used tools and workflows. We are now ready to copy the public VarScan2 somatic variant calling workflow into this project.

In [None]:
app_name = 'VarScan2 Workflow from BAM'
public_app = [a for a in api.apps.query(visibility='public', limit=100).all()
              if a.name == app_name][0]
# copy() returns the copy of the app that now lives in this project
my_app = public_app.copy(project=my_project.id)
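The [...][0] pattern used throughout this notebook raises a bare IndexError when nothing matches, for example if an app has been renamed. A minimal sketch of a stricter lookup; find_by_name is a hypothetical helper that works with any objects exposing a .name attribute, such as the apps, projects and files returned by the queries here:

```python
def find_by_name(items, name):
    """Return the single item whose .name equals name.

    Raises LookupError if there is no match or more than one match,
    instead of silently picking the first item or raising IndexError.
    """
    matches = [item for item in items if item.name == name]
    if not matches:
        raise LookupError('No item named {!r}'.format(name))
    if len(matches) > 1:
        raise LookupError('{} items named {!r}'.format(len(matches), name))
    return matches[0]
```

Usage: public_app = find_by_name(api.apps.query(visibility='public', limit=100).all(), app_name). Treating duplicate names as an error is a deliberate choice: it surfaces ambiguity rather than guessing.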

Besides the tumor and normal BAM files, VarScan2 requires a reference genome. For this task we use human_g1k_v37_decoy.fasta from the public reference files. This is a GRCh37 (b37) reference with decoy sequences, often loosely called hg19, although its contig names (1, 2, ...) differ from those of UCSC hg19 (chr1, chr2, ...). The public reference files include a number of commonly used references as well as example files for testing.


In [None]:
f_name = 'human_g1k_v37_decoy.fasta'
source_project_id = 'admin/sbg-public-data'
source_file = [f for f in api.files.query(limit = 100, project = source_project_id).all()
               if f.name == f_name][0]
my_new_file = source_file.copy(project = my_project.id,
                               name = source_file.name)

Now, we can start a task with the appropriate inputs. The input ports for these files are Tumor_BAM, Normal_BAM, and input_fasta_file. These input ports have to be set to the corresponding file in this project before the task can be started.


In [None]:
inputs = {}
inputs['input_fasta_file'] = my_new_file
all_files = list(api.files.query(project=my_project.id, limit=100).all())
tumor_bam_file = [curr_file for curr_file in all_files 
                  if curr_file.name =='TCRBOA1-T-WEX.bam'][0]
normal_bam_file = [curr_file for curr_file in all_files 
                   if curr_file.name =='TCRBOA1-N-WEX.bam'][0] 
inputs['Tumor_BAM'] = tumor_bam_file
inputs['Normal_BAM'] = normal_bam_file
task_name = 'VarScan2 with volumes API'

my_task = api.tasks.create(name=task_name, 
                           project=my_project.id, 
                           app=my_app.id, 
                           inputs=inputs,
                           run=True)
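The input mapping above looks up each BAM file separately. A minimal sketch of building the same inputs dictionary from a table of port IDs (Tumor_BAM, Normal_BAM and input_fasta_file come from the text above) and reporting every port whose file is missing at once; build_inputs is a hypothetical helper that accepts any objects exposing .name, such as the project files:

```python
def build_inputs(files, port_to_name, extra=None):
    """Map app input port IDs to file objects looked up by name.

    files: iterable of objects with a .name attribute.
    port_to_name: e.g. {'Tumor_BAM': 'TCRBOA1-T-WEX.bam'}.
    extra: optional dict of inputs that are already resolved.
    Raises KeyError listing every port whose file was not found.
    """
    by_name = {f.name: f for f in files}
    inputs = dict(extra or {})
    missing = []
    for port, file_name in port_to_name.items():
        if file_name in by_name:
            inputs[port] = by_name[file_name]
        else:
            missing.append(port)
    if missing:
        raise KeyError('No file found for port(s): ' + ', '.join(missing))
    return inputs
```

Usage: inputs = build_inputs(api.files.query(project=my_project.id, limit=100).all(), {'Tumor_BAM': 'TCRBOA1-T-WEX.bam', 'Normal_BAM': 'TCRBOA1-N-WEX.bam'}, extra={'input_fasta_file': my_new_file}).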


We can then use the API to check the task status every 30 seconds. The loop exits as soon as the task completes and stops with an error if the task fails.


In [None]:
from time import sleep

loop_time = 30  # seconds between status checks

while True:
    details = my_task.get_execution_details()
    print('Your task is in %s status' % details.status)
    if details.status == 'COMPLETED':
        print('Task has completed, life is beautiful')
        break
    elif details.status in ('FAILED', 'ABORTED'):
        raise RuntimeError('Task status is %s, can not continue' % details.status)
    sleep(loop_time)
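The loop above waits indefinitely while the task is queued or running. A minimal sketch of the same polling with an optional time limit; poll_status is a hypothetical helper, and it assumes a finished task reports COMPLETED, FAILED or ABORTED. Passing in sleep and clock makes it testable without real waiting:

```python
import time


def poll_status(get_status, poll_seconds=30, timeout=None,
                sleep=time.sleep, clock=time.monotonic):
    """Call get_status() every poll_seconds until the task finishes.

    Returns 'COMPLETED'; raises RuntimeError on 'FAILED' or 'ABORTED', and
    TimeoutError once timeout seconds have passed (None means no limit).
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        status = get_status()
        if status == 'COMPLETED':
            return status
        if status in ('FAILED', 'ABORTED'):
            raise RuntimeError('Task ended with status ' + status)
        if deadline is not None and clock() >= deadline:
            raise TimeoutError('Task still {} after {} s'.format(status, timeout))
        sleep(poll_seconds)
```

Usage: poll_status(lambda: my_task.get_execution_details().status, timeout=24 * 3600) gives up after one day instead of blocking the notebook forever.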

Once the task completes, its output files can be exported to the S3 bucket through the read-write export volume (volume_export) created earlier; exporting requires a volume with RW access.

In [None]:
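exports = []
# Reload the task so that its outputs are populated
my_task = api.tasks.get(id=my_task.id)
for port, value in my_task.outputs.items():
    # An output port may hold a single file, a list of files, or nothing
    files = value if isinstance(value, list) else [value]
    for f in files:
        if f is None:
            continue
        export = api.exports.submit_export(file=f,
                                           volume=volume_export,
                                           location=f.name)
        exports.append(export)

for j in exports:
    print('File {} export started {}; it is {}'
          .format(j.destination, j.started_on, j.state))
    print('\n')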
