# How can I make, validate, and run many tasks with tumor-normal bam file matching?

In this tutorial, you will learn how to match BAM files from primary tumor and solid tissue normal samples that share the same TCGA case ID, and how to run tasks with the matched tumor-normal files. To learn how to import these files using the Datasets API, see the **Tumor Tissue Normal Matched TCGA.ipynb** tutorial.

## Objective
This tutorial shows how to run an analysis on tumor-normal BAM files matched for the same patient, using the API through the sevenbridges-python bindings library.

## Procedure
We assume that you already have a project containing the appropriate BAM files.

 1. Find an existing project.
 2. Copy app from public apps to the project.
 3. Print an app's input ports.
 4. Copy reference file from public files.
 5. Create tasks with tumor-normal matched bam files.

## Prerequisites
You need your authentication token, and the API needs to know about it. See Setup_API_environment.ipynb for details.
You also need the tumor-normal TCGA BAM files imported into an existing project on the platform. The **Tumor Tissue Normal Matched TCGA.ipynb** tutorial provides a method to perform this import.

## Imports
We import the official sevenbridges-python bindings (which provide the Api class) and the Conflict error class below.


In [None]:
import sevenbridges as sbg
from sevenbridges.errors import Conflict

## Initialize the object
The Api object needs to know your auth_token and the correct API endpoint. Here we assume you are using the .sbgrc file in your home directory. For other options, see Setup_API_environment.ipynb.

In [None]:
# [USER INPUT] Specify the .sbgrc profile to use (e.g. 'default', or a platform profile such as 'cgc' or 'sbg')
prof = 'default'


config_file = sbg.Config(profile=prof)
api = sbg.Api(config=config_file)
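For orientation, the `.sbgrc` file is an INI-style file with one section per profile, and the `prof` value above selects one of those sections. The sketch below parses an illustrative file with Python's standard `configparser` module. The endpoint and token are placeholder values, `read_profile` is a hypothetical helper, and this is not how the library itself reads the file.

```python
import configparser

# Illustrative .sbgrc contents; the token is a placeholder, not a real credential.
SAMPLE_SBGRC = """
[default]
api_endpoint = https://cgc-api.sbgenomics.com/v2
auth_token = <YOUR_TOKEN_HERE>
"""


def read_profile(text, profile):
    """Return the (endpoint, token) pair stored under one profile section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if profile not in parser:
        raise KeyError("Profile '{}' not found".format(profile))
    section = parser[profile]
    return section["api_endpoint"], section["auth_token"]


endpoint, token = read_profile(SAMPLE_SBGRC, "default")
print(endpoint)
```

If the profile name passed to `sbg.Config` does not match a section in your own file, the configuration cannot be loaded, so it is worth checking the section names first.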

## 1) Find an existing project

Find the project within your project space using its project ID, which appears in the project's URL.

In [None]:
my_project = api.projects.get('username/project-name')

## 2) Find an app within the public tools and copy it to the project

We will find the VarScan2 Workflow from BAM app within the public tools and copy it to the project.

In [None]:
app_name = "VarScan2 Workflow from BAM"

public_app = [a for a in api.apps.query(visibility='public', limit=100).all() if a.name == app_name]

# Double-check that source app exists among the public apps
if not public_app:
    print('Target app (%s) not found, check spelling' % app_name)
    raise KeyboardInterrupt
else:
    public_app = public_app[0]

In [None]:
try:
    new_app = public_app.copy(project=my_project)
    print('App {} copied to Project {}.'.format(public_app.name, my_project.name))
except Conflict:
    new_app = [a for a in api.apps.query(project=my_project.id, limit=100).all() if a.name == public_app.name][0]
    print('App already exists in the destination project, reusing existing app.')
        
# re-list apps in target project to verify the copy worked
my_app_names = [a.name for a in
                api.apps.query(project=my_project.id, limit=100).all()]

if app_name in my_app_names:
    print('Successfully copied or reused one app!')
else:
    print('Something went wrong...')

## 3) Get details of the inputs for the app

In this step, we will identify the input ports of the VarScan2 workflow so that we can assign the appropriate input files to each port for every task.

In [None]:
print("This app has {} input ports".format(len(new_app.raw["inputs"])))
# Number the ports from 1 and strip the leading '#' from each port id
for idx, inp in enumerate(new_app.raw["inputs"], start=1):
    print("Input port {} with input id {}".format(idx, inp["id"].lstrip('#')))
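Before building tasks, it can help to confirm that the port ids used later in this tutorial (`input_fasta_file`, `Tumor_BAM`, `Normal_BAM`) are among the app's inputs. Here is a minimal sketch, using a hand-written stand-in for `new_app.raw`. The dictionary below is illustrative rather than real platform output, and `missing_ports` is a hypothetical helper.

```python
def missing_ports(raw_app, required_ids):
    """Return the required port ids that the app description does not list."""
    available = {inp["id"].lstrip("#") for inp in raw_app["inputs"]}
    return [port for port in required_ids if port not in available]


# Stand-in for new_app.raw; a real app description has many more fields.
example_raw = {
    "inputs": [
        {"id": "#input_fasta_file"},
        {"id": "#Tumor_BAM"},
        {"id": "#Normal_BAM"},
    ]
}

required = ["input_fasta_file", "Tumor_BAM", "Normal_BAM"]
print(missing_ports(example_raw, required))
```

In the notebook, you would pass `new_app.raw` in place of `example_raw`. An empty list means every port used below is present.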

## 4) Copy the reference fasta file for the tasks

In this step, we will copy the ucsc.hg19.fasta file from the public reference files to the project. We then use this copied file as the reference file (input_fasta_file input port) for each task.

In [None]:
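ref_file_name = 'ucsc.hg19.fasta'
source_project_id = 'admin/sbg-public-data'
# Look up the reference file by name in the public data project
source_files = list(api.files.query(project=source_project_id, names=[ref_file_name], limit=100))
if not source_files:
    raise ValueError('Reference file (%s) not found in %s' % (ref_file_name, source_project_id))
copied_file = source_files[0].copy(project=my_project.id)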

## 5) Create tasks for each tumor-normal matched file pair

In this step, we will identify all the BAM files from Primary Tumor samples and Solid Tissue Normal samples that belong to the same patient (case ID in TCGA). We then use each matched pair as inputs to the Tumor_BAM and Normal_BAM input ports of VarScan2 and create one draft VarScan2 task per pair. Uncomment the `my_task.run()` line to start the tasks.

In [None]:
all_files = list(api.files.query(project=my_project.id, limit=100).all())
tumor_bam_files = [curr_file for curr_file in all_files if curr_file.name.endswith(".bam") and curr_file.metadata["sample_type"] == "Primary Tumor"]
normal_bam_files = [curr_file for curr_file in all_files if curr_file.name.endswith(".bam") and curr_file.metadata["sample_type"] == "Solid Tissue Normal"]
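The matching rule used in the next cell pairs every Primary Tumor BAM file with every Solid Tissue Normal BAM file that shares its `case_id`. It can be illustrated without the API. The sketch below uses plain dictionaries as stand-ins for file objects (the file names and case IDs are made up) and indexes the normal files by case ID, so each tumor file is matched with a single lookup instead of a scan over all normal files.

```python
from collections import defaultdict


def match_tumor_normal(tumors, normals):
    """Pair each tumor record with every normal record sharing its case_id."""
    normals_by_case = defaultdict(list)
    for normal in normals:
        normals_by_case[normal["case_id"]].append(normal)

    pairs = []
    for tumor in tumors:
        # Tumor files without a matching normal are skipped.
        for normal in normals_by_case.get(tumor["case_id"], []):
            pairs.append((tumor["name"], normal["name"]))
    return pairs


# Made-up records standing in for file objects and their metadata.
tumors = [
    {"name": "t1.bam", "case_id": "CASE-A"},
    {"name": "t2.bam", "case_id": "CASE-B"},
]
normals = [
    {"name": "n1.bam", "case_id": "CASE-A"},
    {"name": "n2.bam", "case_id": "CASE-A"},
    {"name": "n3.bam", "case_id": "CASE-C"},
]

print(match_tumor_normal(tumors, normals))
```

In this example, `t1.bam` is paired with both normal files from the same case, while `t2.bam` produces no pair because no normal file shares its case ID.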

In [None]:
all_tasks = []
for curr_tumor_file in tumor_bam_files:
    # Normal BAM files that share this tumor file's TCGA case ID
    matched_normal_files = [curr_file for curr_file in normal_bam_files
                            if curr_file.metadata["case_id"] == curr_tumor_file.metadata["case_id"]]
    for curr_matched_normal_file in matched_normal_files:
        # Build a fresh inputs dictionary for every tumor-normal pair
        inputs = {
            "input_fasta_file": copied_file,
            "Tumor_BAM": curr_tumor_file,
            "Normal_BAM": curr_matched_normal_file,
        }
        task_name = "VarScan2 with " + curr_tumor_file.name + " and " + curr_matched_normal_file.name
        # run=False creates the task as a draft without starting it
        my_task = api.tasks.create(name=task_name, project=my_project.id,
                                   app=new_app.id, inputs=inputs, run=False)
        # my_task.run()  # uncomment to start the task
        all_tasks.append(my_task)
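Because the tasks are created with `run=False`, `all_tasks` holds drafts that can be reviewed before anything runs. One possible way to validate them is to separate drafts that report errors from those that do not. The sketch below assumes each task object exposes a `name` and an `errors` list; please verify this against the `Task` resource in your installed sevenbridges-python version. `SimpleNamespace` objects stand in for real tasks so the snippet runs offline, and `split_by_errors` is a hypothetical helper.

```python
from types import SimpleNamespace


def split_by_errors(tasks):
    """Split tasks into (ready, blocked) based on their reported errors."""
    ready, blocked = [], []
    for task in tasks:
        # A missing or empty errors attribute is treated as "no errors reported".
        if getattr(task, "errors", None):
            blocked.append(task)
        else:
            ready.append(task)
    return ready, blocked


# Stand-ins for draft tasks; the names and error messages are made up.
example_tasks = [
    SimpleNamespace(name="VarScan2 with t1.bam and n1.bam", errors=[]),
    SimpleNamespace(name="VarScan2 with t1.bam and n2.bam",
                    errors=["Required input is not set"]),
]

ready, blocked = split_by_errors(example_tasks)
print([task.name for task in ready])
print([task.name for task in blocked])
```

With real draft tasks, you could pass `all_tasks` to the helper and call `run()` only on the tasks in `ready`, then inspect the messages of the tasks in `blocked`.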