# How do I get started with RNA-seq?
### Overview
This tutorial runs an API version of the GUI [quickstart](http://docs.cancergenomicscloud.org/docs/quickstart), which we _assume_ you've **already completed**. Note that it works with **TCGA-controlled** data. We have written this example in Python, but the concepts can be adapted to your preferred programming language. We encourage you to try this analysis yourself.

The CGC is organized around the *user*, who owns or is a member of multiple *projects*. Each *project* contains multiple *files* and *apps*. Users can run *tasks* by selecting input *files* and *configuration parameters* for an *app* within their project.

<img src="images/CGC_overview-02.png"> 

### Prerequisites
 1. You need your _authentication token_ and the API needs to know about it. See <a href="set_AUTH_TOKEN.ipynb">**set_AUTH_TOKEN.ipynb**</a> for details.
 2. You should have completed the GUI [quickstart](http://docs.cancergenomicscloud.org/docs/quickstart)
 
## Imports and Definitions
We will use a Python class (API) as a wrapper for API calls. All classes and methods are defined in <a href="defs/apimethods.py" target="_blank">_defs/apimethods.py_</a>.

In [None]:
from defs.apimethods import *
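
The cells below rely on two behaviors of the `API` wrapper: a list request exposes each field as a list (for example `existing_projects.name`), and a detail request exposes fields as plain attributes (for example `my_project.id`). The sketch below is not the code in _defs/apimethods.py_. It is a hypothetical, offline stand-in (`ResponseView` and the sample payloads are invented for illustration, and the `items` key is an assumption about the response shape) showing one way a wrapper could turn a JSON response into those attributes.

```python
class ResponseView:
    """Hypothetical stand-in for the tutorial's API wrapper (illustration only)."""

    def __init__(self, payload):
        if 'items' in payload:
            # List response: collect each field across all items into a list
            keys = {key for item in payload['items'] for key in item}
            for key in keys:
                setattr(self, key, [item.get(key) for item in payload['items']])
        else:
            # Detail response: expose each field directly as an attribute
            for key, value in payload.items():
                setattr(self, key, value)


# Invented sample payloads shaped like a list response and a detail response
project_list = ResponseView({'items': [
    {'id': 'user/project-a', 'name': 'Quickstart'},
    {'id': 'user/project-b', 'name': 'Life is Beautiful'},
]})
project_detail = ResponseView({'id': 'user/project-b', 'name': 'Life is Beautiful'})

# Look up a project id by name, the same pattern used throughout this tutorial
p_index = project_list.name.index('Life is Beautiful')
print(project_list.id[p_index])
print(project_detail.name)
```

Because names and ids are parallel lists, an index found in `.name` can be reused to read the matching entry in `.id`.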

## Create a project
Projects are the foundation of any analysis on the CGC. We can either use a project that has already been created, or we can use the API to create one. Here we create a new project, but first check whether a project with the same name already exists, which demonstrates both approaches. The *project name*, Pilot Fund *billing group*, and a project *description* will be sent in our API call.

#### PROTIPS
* The recipe for _creating a project_ is [here](../../Recipes/CGC/projects_makeNew.ipynb)
* Detailed documentation of this particular REST architectural style request is available [here](http://docs.cancergenomicscloud.org/docs/create-a-new-project)

In [None]:
# [USER INPUT] Set project name here:
new_project_name = 'Life is Beautiful'

# LIST all projects
existing_projects = API('projects')                            
        
# What are my funding sources?
billing_groups = API('billing/groups')
# Pick the first group (arbitrary)
print((billing_groups.name[0] + \
       ' will be charged for computation and storage (if applicable) for your new project'))

# Set up the information for your new project
new_project = {
        'billing_group': billing_groups.id[0],
        'description': """A project created by the API tutorial (quickstart.ipynb).
                      This also supports **markdown**
                      _Pretty cool_, right?
                   """,        
        'name': new_project_name, 
        'tags': ['tcga']
}
    
if new_project['name'] in existing_projects.name:
    # Your project (might) already exist
    print('A project with the same name already exists, you are good to go')
    p_index = existing_projects.name.index(new_project['name'])
    my_project = API(path=('projects/' + existing_projects.id[p_index])) 

else:
    # CREATE the new project
    my_project = API(method='POST', data=new_project, path='projects')
    # (re)list all projects, to check that new project posted
    existing_projects = API(path='projects')
    # get ADDITIONAL new project details, using the id returned by the POST
    #  (safer than assuming the new project is first in the list)
    my_project = API(path=('projects/' + my_project.id)) 
    
    print('Your new project %s has been created.' % (my_project.name))
    if hasattr(my_project, 'description'): # need to check if description has been entered
        print('Project description: %s \n' % (my_project.description))

## Add files
Here we will take advantage of the already created Quickstart project from the GUI tutorial. This code will look for our three input files from that project and copy them over. 

#### PROTIPS
* The recipe for _copying files to a project_ is [here](../../Recipes/CGC/files_copyFromMyProject.ipynb)
* Detailed documentation of this particular REST architectural style request is available [here](http://docs.cancergenomicscloud.org/docs/copy-a-file)

In [None]:
# [USER INPUT] Set project name; project (p_) and file (f_) indices here:
p_name = 'Quickstart'                     # source project name
input_ext = ['tar.gz',                    # input file types to copy
            'gtf',
            'fasta']                       

# LIST all files in the source and target project
if p_name in existing_projects.name:
    p_index = existing_projects.name.index(p_name)
else:
    print("""
    Project not found. 
    Please check that p_name matches the name you used in the GUI Quickstart.
    """)
    raise KeyboardInterrupt

my_files_source = API(path='files', \
                      query={'project':existing_projects.id[p_index], \
                            'limit':100})
my_files_target = API(path='files', \
                      query={'project': my_project.id})

# Loop through files in first project, 
#  check if they are needed AND don't exist in your project
for f_index, f_name in enumerate(my_files_source.name):
    flag = {'match': False}
    # find candidate files with the correct file extension
    for f_ext in input_ext:
        if f_name[-len(f_ext):] == f_ext:
            flag['match'] = True
            break
    if flag['match']:
        # find files that do not exist in your project
        if f_name not in my_files_target.name:
            print('File (%s) does not exist in Project (%s); copying now' % \
                  (f_name, my_project.id))

            # COPY the selected file from source to target project
            API(path=('files/' + my_files_source.id[f_index] + '/actions/copy'), \
                method='POST', \
                data={'project': my_project.id,\
                      'name': f_name}) 

            # re-list files in target project to verify the copy worked
            my_files_target = API(path='files', \
                                  query={'project': my_project.id})

            if f_name in my_files_target.name:
                print('Successfully copied one file!')
            else:
                print('Something went wrong...')
                
# We are done copying files, let's clean up a little
del my_files_source, my_files_target
my_files = API(path='files', query={'project': my_project.id})
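
The slicing test `f_name[-len(f_ext):] == f_ext` used above is equivalent to `str.endswith`, which also accepts a tuple of suffixes (a list must be converted with `tuple(...)` first). A minimal sketch of the same filter, using invented file names rather than real project contents:

```python
input_ext = ('tar.gz', 'gtf', 'fasta')     # same extensions as above, as a tuple

# Invented file names, for illustration only
source_names = ['sample_A.tar.gz', 'annotation.gtf', 'genome.fasta', 'notes.txt']
target_names = ['genome.fasta']            # already present in the target project

# Keep files with a wanted extension that are not yet in the target project
to_copy = [name for name in source_names
           if name.endswith(input_ext) and name not in target_names]
print(to_copy)
```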

## Add the _RNA-seq STAR_ workflow
There are more than 150 public apps available on the Seven Bridges CGC. Here we query the public apps (up to the `limit` set in the request), then copy the target workflow to our project.

#### PROTIPS
* The recipe for _copying apps from Public Reference apps_ is [here](../../Recipes/CGC/apps_copyFromPublicApps.ipynb)
* Detailed documentation of this particular REST architectural style request is available [here](http://docs.cancergenomicscloud.org/docs/copy-an-app-secondary-method)

In [None]:
# [USER INPUT] Set app name:
a_name = 'RNA-seq Alignment - STAR for TCGA PE tar'
       
# LIST all Public Apps using VISIBILITY and searching by NAME
my_apps_source = API(path='apps', query={'visibility': 'public', 'limit': 100})
my_apps_target = API(path='apps', query={'project': my_project.id})
if a_name not in my_apps_source.name:
    print('Target app (%s) does not exist in the public repository. Please double-check the spelling' \
          % (a_name))
    raise KeyboardInterrupt
else:
    a_index = my_apps_source.name.index(a_name)

# Check if app already exists in the target project
if my_apps_source.name[a_index] in my_apps_target.name:
    print('App already exists in target project, you are good to go')
else:
    print('App (%s) does not exist in Project (%s); copying now' % \
          (my_apps_source.name[a_index], my_project.id))
    
    # COPY the selected app from the public apps to the target project
    API(path=('apps/' + my_apps_source.id[a_index] + '/actions/copy'), \
        method='POST', \
        data={'project': my_project.id,\
              'name': my_apps_source.name[a_index]})

    # re-list the apps in the target project to verify the copy worked
    my_apps_target = API(path='apps', query={'project': my_project.id})
    
    if my_apps_source.name[a_index] in my_apps_target.name:
        print('Successfully copied one app!')
    else:
        print('Something went wrong...')
    
# We are done copying apps, let's clean up a little
del my_apps_source, my_apps_target
my_apps = API(path='apps', query={'project': my_project.id})
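
Because the query above asks for at most 100 apps while more than 150 public apps exist, the target app may not appear in the first page of results. One way to handle this, sketched below, is to keep requesting pages until a short page comes back. This assumes the endpoint accepts `offset` and `limit` query parameters. `list_all`, `fetch_page`, and `fake_fetch` are hypothetical helpers; in the notebook, `fetch_page` would wrap a call such as `API(path='apps', query=...)`.

```python
def list_all(fetch_page, limit=100):
    """Collect items from every page; fetch_page(offset, limit) returns a list."""
    items, offset = [], 0
    while True:
        page = fetch_page(offset, limit)
        items.extend(page)
        if len(page) < limit:          # a short (or empty) page is the last one
            return items
        offset += limit


# Offline stand-in for the real request: 250 invented app names
all_apps = ['app-%03d' % n for n in range(250)]

def fake_fetch(offset, limit):
    """Return one slice of the invented app list, like one page of results."""
    return all_apps[offset:offset + limit]

found = list_all(fake_fetch, limit=100)
print(len(found))
```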

## Build a file processing list
Most likely, we will only have one input file and two reference files in the project. However, if multiple input files were imported, this will create a batch of *single-input-single-output tasks*, one for each file. This code builds the list of files.

#### PROTIPS
* We don't have a recipe for this, but you can _follow your bliss_ here. Maybe you want to use metadata ([get metadata](../../Recipes/CGC/files_detailOne.ipynb)) to decide which files fit in.

In [None]:
# Build .fileProcessing (inputs) and .fileIndex (references) lists [for workflow]
file_proc_list = ['Files to Process']
gtf_ind = None
fasta_ind = None

for ii,f_name in enumerate(my_files.name):
    # this conditional is for 'RNA seq STAR alignment' in Quickstart_API. 
    #  Adapt appropriately for other workflows. Also the order of 
    #  input_ext has been HARD-CODED
    if f_name[-len(input_ext[0]):] == input_ext[0]:
        file_proc_list.append(ii)
    elif f_name[-len(input_ext[1]):] == input_ext[1]:
        gtf_ind = ii
    elif f_name[-len(input_ext[2]):] == input_ext[2]:
        fasta_ind = ii
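
If either reference file is missing, `gtf_ind` or `fasta_ind` stays `None` and the task-building cell fails with a less obvious error. A small guard such as the hypothetical `check_inputs` below (not part of _defs/apimethods.py_) can catch this before any task is submitted:

```python
def check_inputs(file_proc_list, gtf_ind, fasta_ind):
    """Return a list of problems; an empty list means the inputs look complete."""
    problems = []
    if len(file_proc_list) < 2:        # element 0 is the header string
        problems.append('no input archive (.tar.gz) found')
    if gtf_ind is None:
        problems.append('no .gtf reference found')
    if fasta_ind is None:
        problems.append('no .fasta reference found')
    return problems


# Example with invented indices: one input file, GTF present, FASTA missing
print(check_inputs(['Files to Process', 0], gtf_ind=1, fasta_ind=None))
```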

## Build & Start tasks
Next we will iterate through the File Processing List (file_proc_list) to generate one task for each input file (note the gtf and fasta files are _references_ and will be re-used for each task). Tasks will start running immediately.

#### PROTIPS
* The closest recipe for _creating and starting tasks_ is [here](../../Recipes/CGC/tasks_create.ipynb)
* Detailed documentation of this particular REST architectural style request is available [here](http://docs.cancergenomicscloud.org/docs/create-a-new-task)

In [None]:
my_task_list = [None]
for ii,f_ind in enumerate(file_proc_list[1:]):                  # Start at 1 because file_proc_list[0] is header
    new_task = {'description': 'APIs are awesome, look what they can make',
        'name': ('Task (#%i) created with quickstart.ipynb_' %  (ii)),
        'app': (my_apps.id[0]),                                  # ASSUMES only single workflow in project
        'project': my_project.id,
        'inputs': {
            'genomeFastaFiles': {                               # .fasta reference file
                'class': 'File',
                'path': my_files.id[fasta_ind],
                'name': my_files.name[fasta_ind]
            },
            'input_archive_file': {                             # File Processing List
                'class': 'File',
                'path': my_files.id[f_ind],
                'name': my_files.name[f_ind]
            },
            # .gtf reference file, !NOTE: this workflow expects a _list_ for this input
            'sjdbGTFfile': [
               {
                'class': 'File',
                'path': my_files.id[gtf_ind],
                'name': my_files.name[gtf_ind]
               }
            ]
        }
    }
    # Create and RUN tasks
    my_task = API(method='POST', data=new_task, path='tasks/', query = {'action': 'run'})
    my_task_list.append(my_task.id)
    # ALTERNATIVE: make a DRAFT task and start it later
#     myTask = API(method='POST', data=new_task, path='tasks/')        # task created in DRAFT state
#     myTask = API(method='POST', path=('tasks/' + myTask.id + '/actions/run'))
my_task_list.pop(0)

print("""
%i tasks have been created. Enjoy a break, treat yourself to a coffee, 
and come back to us once you've gotten an email that tasks are done.
(alternatively, use the task monitoring cells below)""" % len(my_task_list))
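
The task body is a plain dictionary, so it can be built by a small function and inspected before anything is sent. The sketch below mirrors the input structure used above; `file_ref`, `build_task`, and all ids and file names are hypothetical and serve only as illustration.

```python
import json

def file_ref(file_id, file_name):
    """Describe one project file in the form used by the task inputs above."""
    return {'class': 'File', 'path': file_id, 'name': file_name}


def build_task(project_id, app_id, archive, fasta, gtf, number):
    """Build one task body; archive, fasta and gtf are (id, name) pairs."""
    return {
        'name': 'Task (#%i) created with quickstart.ipynb' % number,
        'project': project_id,
        'app': app_id,
        'inputs': {
            'genomeFastaFiles': file_ref(*fasta),
            'input_archive_file': file_ref(*archive),
            'sjdbGTFfile': [file_ref(*gtf)],     # this input expects a list
        },
    }


# Invented ids and names, for illustration only
task = build_task('user/project', 'user/project/app',
                  archive=('id-1', 'sample.tar.gz'),
                  fasta=('id-2', 'genome.fasta'),
                  gtf=('id-3', 'annotation.gtf'),
                  number=0)
print(json.dumps(task, indent=2))
```

Printing the body first makes it easier to spot a missing reference or a wrongly shaped input before a task is created.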

## Check task completion
These tasks may take a long time to complete. Here are two ways to check in on them:
* Wait for email confirmation <sup>1</sup>
* Ping the task to see its _status_. Here we use a 10 min interval; adjust it appropriately for longer or shorter workflows

<sup>1</sup> Emails will arrive regardless of whether the task was started by GUI or API.

#### PROTIPS
* The closest recipe for _monitoring tasks_ is [here](../../Recipes/CGC/tasks_monitorAndGetResults.ipynb)
* Detailed documentation of this particular REST architectural style request is available [here](http://docs.cancergenomicscloud.org/docs/perform-an-action-on-a-specific-task)

In [None]:
from time import sleep   # explicit import, in case defs.apimethods does not re-export it

# [USER INPUT] Set loop time (seconds):
loop_time = 600

for t_id in my_task_list:
    # Check on one task at a time, 
    #  if ANY running, we are not done (no sense to query others)
    flag = {'taskRunning': True}
    while flag['taskRunning']:
        task = api_call(('tasks/' + t_id))
        if task['status'] == 'COMPLETED':
            flag['taskRunning'] = False
            print('Task has completed, life is beautiful')
        elif (task['status'] == 'FAILED') or (task['status'] == 'ABORTED'):
            print('Task (%s) failed, check it out' \
                  % (t_id))
            flag['taskRunning'] = False
        else:
            sleep(loop_time) 
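
The loop above waits indefinitely if a task never reaches a final state. A hedged variant with an upper bound on the number of checks is sketched below. `wait_for_task`, `get_status`, and the `'TIMEOUT'` marker are hypothetical; in the notebook, `get_status` would wrap `api_call('tasks/' + t_id)['status']`.

```python
from time import sleep

FINAL_STATES = ('COMPLETED', 'FAILED', 'ABORTED')

def wait_for_task(get_status, interval=600, max_checks=100):
    """Poll get_status() until a final state or max_checks is reached."""
    for check in range(max_checks):
        status = get_status()
        if status in FINAL_STATES:
            return status
        if check < max_checks - 1:
            sleep(interval)            # pause before the next check
    return 'TIMEOUT'                   # hypothetical marker, not an API status


# Offline stand-in: the task reports RUNNING twice, then COMPLETED
statuses = iter(['RUNNING', 'RUNNING', 'COMPLETED'])
print(wait_for_task(lambda: next(statuses), interval=0))
```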

## Check task outputs
Here we list the outputs of every task in `my_task_list` (adapt as needed).

In [None]:
for ii, t_id in enumerate(my_task_list):
    my_task = API(method='GET', path=('tasks/' + t_id))
    print('Your task (#%i) created %i outputs' % (ii, len(my_task.outputs.keys())))
    for f_name in my_task.outputs:
        print(' task output (%s) is the file (%s)' % (f_name, my_task.outputs[f_name]['name']))

### (optional) Download output files
You already have all of these files **saved in your project** (and an _email_ for each completed task). You may also download some of them.

#### PROTIPS
* Detailed documentation of this particular REST architectural style request is available [here](http://docs.cancergenomicscloud.org/docs/get-download-url-for-a-file)

In [None]:
# [USER INPUT] Set file extension(s) to download here:
output_ext = 'bam' 

dl_list = ["links to file downloads"]

my_files = API(path='files', query={'project': my_project.id})
for ii, f_name in enumerate(my_files.name):
    if (f_name[-len(output_ext):] == output_ext):
        dl_list.append(api_call(path=('files/' + my_files.id[ii] + '/download_info'))['url'])
        
download_files(dl_list)
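
`download_files` comes from _defs/apimethods.py_ and is not shown here. As a rough, standard-library-only idea of what saving one download URL could involve, the sketch below derives a local file name from the URL and fetches it with `urllib`. `local_name`, `download`, and the example URL are hypothetical, and the assumption that the file name is the last path segment of the URL may not hold for every link.

```python
import os
from urllib.parse import urlparse, unquote
from urllib.request import urlretrieve

def local_name(url):
    """Derive a local file name from a download URL, ignoring the query string."""
    return os.path.basename(unquote(urlparse(url).path))


def download(url, folder='.'):
    """Save one URL into folder and return the local path."""
    target = os.path.join(folder, local_name(url))
    urlretrieve(url, target)
    return target


# Invented URL, for illustration only (nothing is downloaded here)
print(local_name('https://example.com/bucket/sample%20A.bam?signature=abc'))
```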

We hope this tutorial has been helpful for you. If you have any feedback (especially _positive_), we would cherish it. Please share your thoughts on our [forum](http://docs.cancergenomicscloud.org/discuss).

**Good luck & have fun!**