# Quickstart for the Seven Bridges Platform
### Overview
To introduce you to the major features of the Seven Bridges Platform, this Quickstart walks you through an RNA-seq alignment analysis: creating a project, copying files and an app, running a task, and inspecting its outputs.

### Prerequisites
 1. You need an account on the Seven Bridges Platform ([sign up](https://www.sbgenomics.com/login) here for free).
 2. You need your _authentication token_ and the API needs to know about it. See <a href="set_AUTH_TOKEN.ipynb">**set_AUTH_TOKEN.ipynb**</a> for details.
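
As a rough sketch of what "the API needs to know about it" means: the token travels with every request as an HTTP header. The snippet below assumes the token is stored in an environment variable named `AUTH_TOKEN` (a hypothetical name chosen for illustration) and that the header is named `X-SBG-Auth-Token`. The mechanism this notebook actually uses is described in set_AUTH_TOKEN.ipynb.

```python
import os


def build_headers(token=None):
    """Return HTTP headers that carry the authentication token.

    Assumption: the token lives in the AUTH_TOKEN environment variable
    unless it is passed in explicitly.
    """
    token = token or os.environ.get('AUTH_TOKEN')
    if not token:
        raise ValueError('No authentication token found')
    return {
        'X-SBG-Auth-Token': token,           # assumed header name
        'Content-Type': 'application/json',
    }


# Placeholder token, for illustration only
headers = build_headers('0123456789abcdef')
print(sorted(headers))
```

Never paste a real token into a notebook that you plan to share.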
 
## Imports and Definitions
We will use a Python class (API) as a wrapper for API calls. All classes and methods are defined in <a href="defs/apimethods.py" target="_blank">_defs/apimethods.py_</a>.
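
Throughout this notebook, list responses are read column-wise, for example `existing_projects.name` and `existing_projects.id[...]`. The class below is a hypothetical stand-in (not the real wrapper from defs/apimethods.py) that shows how such an object could behave, assuming each response is a list of dictionaries. The project ids and names are made up.

```python
class ColumnView:
    """Hypothetical stand-in for the list responses of the API wrapper."""

    def __init__(self, items):
        # Collect every key that appears in any item
        keys = set()
        for item in items:
            keys.update(item)
        # Store one list per key, aligned by item position
        for key in keys:
            setattr(self, key, [item.get(key) for item in items])


projects = ColumnView([
    {'id': 'user/project-a', 'name': 'Project A'},
    {'id': 'user/project-b', 'name': 'Project B'},
])

# Look up the position by name, then read the id at the same position
position = projects.name.index('Project B')
print(projects.id[position])
```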

In [None]:
from defs.apimethods import *

## User Input
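We need you to pick a project name here. The list of file extensions tells the later cells which files are inputs and which are references.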

In [None]:
project_name = 'Michael Diamond'       # Name of new project
input_ext = ['fastq',                  # input file types to copy
            'gtf',
            'fasta']    

## Create a project
_Projects_ are the foundation of any analysis on the platform. We can either work inside a project that has already been created or create a new project. Here we **create a new project**, but first **check that it doesn't already exist** to show both methods. The *project name*, Pilot Fund *billing group*, and a project *description* will be sent in our API call.

We start by listing all of your projects and your billing groups. Next we create the JSON that will be passed to the API to create the project. The dictionary should include:
* **billing_group** *Billing group* that will be charged for this project
* **description**   (optional) Project description
* **name**   Name of the project, may be *non-unique*<sup>2</sup>
* **type**   Set this to 'v2' always. Other project types may summon a pale horse on the horizon

**After** creating the project, you can re-check the project list and get *additional* details assigned by the platform, including:

* **id**     _Unique_ identifier for the project, generated based on Project Name
* **href**   Address<sup>3</sup> of the project.
* **flag**   (unimportant) this is set by the object constructor, here always 'longList':False 
* **tags**   List of tags, currently NOT used. 

<sup>2</sup> Please **don't** use non-unique *project names*. However, if you insist, the backend will allow it and assign a unique **id** to your project.

<sup>3</sup> This *address* is for the API, but will not work in a browser.

#### PROTIPS
* The recipe for _creating a project_ is [here](../../Recipes/SBPLAT/projects_makeNew.ipynb)
* Detailed documentation of this particular REST architectural style request is available [here](http://docs.sevenbridges.com/docs/create-a-new-project)
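
The check-then-create logic described above can be sketched without touching the network. The helper names (`find_project_id`, `build_project_payload`) and the sample ids are hypothetical. Only the dictionary keys come from the list above.

```python
def find_project_id(names, ids, wanted_name):
    """Return the id of the first project called wanted_name, or None."""
    for name, project_id in zip(names, ids):
        if name == wanted_name:
            return project_id
    return None


def build_project_payload(name, billing_group_id, description=''):
    """Build the dictionary that is sent when creating a project."""
    payload = {
        'name': name,
        'billing_group': billing_group_id,
        'type': 'v2',
    }
    if description:                      # description is optional
        payload['description'] = description
    return payload


# Made-up project list for illustration
names = ['Old project', 'Michael Diamond']
ids = ['user/old-project', 'user/michael-diamond']

existing_id = find_project_id(names, ids, 'Michael Diamond')
if existing_id is None:
    print(build_project_payload('Michael Diamond', 'billing-group-id'))
else:
    print('Project already exists: %s' % existing_id)
```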

In [None]:
# LIST all projects
existing_projects = API('projects')    

# what are my funding sources?
billing_groups = API('billing/groups')
# pick the first group (arbitrary)
print((billing_groups.name[0] + \
       ' will be charged for computation and storage (if applicable) for your new project'))

# set up the information for your new project
new_project = {
        'billing_group': billing_groups.id[0],
        'description': """A project created by quickstart.ipynb.
                          This also supports **markdown**
                          _Pretty cool_, right?
                       """,
        'name': project_name,
        'type': 'v2'
}
    
if new_project['name'] in existing_projects.name:
    # Your project (might) already exist
    print('A project with the same name already exists, moving right along')
    # Look up the existing project by its NAME (a string), not by the whole dictionary
    my_project = API(path=('projects/' \
                          + existing_projects.id[existing_projects.name.index(new_project['name'])])) 
else:
    # CREATE the new project
    my_project = API(method='POST', data=new_project, path='projects')
    # (re)list all projects, to check that new project posted
    existing_projects = API(path='projects')
    # get ADDITIONAL new project details 
    my_project = API(path=('projects/' + existing_projects.id[0])) 
    
    print('Your new project %s has been created.' % (my_project.name))
    if hasattr(my_project, 'description'): # need to check if description has been entered
        print('Project description: %s \n' % (my_project.description))

## Copy input files from the _Public Reference_
We will first list all our projects, then list the files within the Public Reference project<sup>4</sup>, and copy files from the Public Reference to the target project. We've hard-coded a list of file names to copy based on the tutorial.

The critical information for this POST is the **file_id**. Note that you are allowed to copy the same file as many times as you like. However, duplicates will automatically have a prefix attached (\_1\_, \_2\_, etc.) depending on how many times you copy the file.

<sup>4</sup> Remember, files are only accessible **within** a project - here the Public Reference project

#### PROTIPS 
* The recipe for _copying files to a project_ is [here](../../Recipes/SBPLAT/files_copyFromPublicReference.ipynb)
* Detailed documentation of this particular REST architectural style request is available [here](http://docs.sevenbridges.com/docs/copy-a-file)
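
To make the duplicate-naming rule concrete, here is a small local simulation. It assumes the prefix has the form `_N_` followed by the original name, as the text above suggests. The function `name_after_copy` is hypothetical and the platform's exact behavior may differ.

```python
def name_after_copy(existing_names, file_name):
    """Predict the name a copied file receives in the target project.

    Assumption: the first copy keeps its name, and later copies get
    the prefixes _1_, _2_, ... in order.
    """
    if file_name not in existing_names:
        return file_name
    n = 1
    while '_%d_%s' % (n, file_name) in existing_names:
        n += 1
    return '_%d_%s' % (n, file_name)


target = []
for _ in range(3):
    target.append(name_after_copy(target, 'Sample1_RNASeq_chr20.pe_1.fastq'))
print(target)
```

The copy cell below avoids duplicates altogether by checking the target project before each copy.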

In [None]:
# [USER INPUT] Set project and file names:
p_name = 'admin/sbg-public-data'
# Files to copy
files_list = ['Sample1_RNASeq_chr20.pe_1.fastq',
             'Sample1_RNASeq_chr20.pe_2.fastq'
]

# LIST all files in the source and target project
my_files_source = API(path='files', \
                      query={'project':p_name, 'limit':100})
my_files_target = API(path='files', \
                      query={'project': my_project.id})

for f_name in files_list:
    if f_name not in my_files_source.name:
        print('File (%s) not found. Where do we go from here?' % (f_name))
        raise KeyboardInterrupt
    else:
        f_index = my_files_source.name.index(f_name)
        if f_name not in my_files_target.name:
            print('File (%s) does not exist in Project (%s); copying now' % \
                  (f_name, my_project.id))

            # COPY the selected file from source to target project
            API(path=('files/' + my_files_source.id[f_index] + '/actions/copy'), \
                method='POST', \
                data={'project': my_project.id,\
                      'name': f_name}) 

            # re-list files in target project to verify the copy worked
            my_files_target = API(path='files', \
                                  query={'project': my_project.id})

            if f_name in my_files_target.name:
                print('Successfully copied one file!')
            else:
                print('Something went wrong...')
                
# We are done copying files, let's clean up a little
del my_files_source, my_files_target
my_files = API(path='files', query={'project': my_project.id})

## What is the meaning of this?
Files are great, but without **metadata** they can be hard to manage. So here we are going to add *metadata* to these files. We will add one field that is _needed for the task_ and one to show _generality_.

We've already listed all your files in the last cell. Here we will check the metadata for each one. A **detail**-call for files returns the following *attributes*:
* **created_on** File creation date
* **id**     _Unique_ identifier for the file
* **name**   Name of the file, note this **is** metadata and can be _changed_
* **href**   Address<sup>3</sup> of the file.
* **modified_on** File modification date
* **metadata** Dictionary of metadata
* **origin**  Will link back to a *task* if this is an output file
* **project** Project the file is in
* **size** file size in bytes
* **flag**   (unimportant) this is set by the object constructor, here always 'longList':False 

The **metadata** dictionary is both _changeable_ and _expandable_, but initially rather sparse with:
* sample_id
* platform
* paired_end
* library_id

#### PROTIP
* Detailed documentation of this particular REST architectural style request is available [here](http://docs.sevenbridges.com/docs/get-file-details)
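
The metadata update in the next cell uses PATCH. The sketch below illustrates the usual PATCH semantics with plain dictionaries, assuming the platform merges the supplied fields into the existing metadata and leaves other fields untouched. The sample values are made up.

```python
def patch_metadata(current, changes):
    """Return a copy of current with the fields in changes merged in."""
    updated = dict(current)      # leave the original dictionary untouched
    updated.update(changes)      # supplied fields are added or overwritten
    return updated


before = {'sample_id': 'Sample1', 'paired_end': '1'}
after = patch_metadata(before, {'platform_unit_id': '1', 'hasFlair': 'True'})

print(before)
print(after)
```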

In [None]:
for f_id in my_files.id:
    single_file = API(path=('files/' + f_id))
    print('You have selected file %s (size %s [bytes]).' % (single_file.name, single_file.size))
    print('The metadata in this file was: \n %s' % (single_file.metadata))
    metadata = {
        'platform_unit_id': '1',
        'hasFlair':'True'
    }

    API(path=('files/' + f_id + '/metadata'), method='PATCH', data = metadata)

    single_file = API(path=('files/' + f_id))
    print('After the update, file metadata is: \n %s \n' % (single_file.metadata))

## Copy reference files from the _Public Reference_
This is the same operation as in **Copy input files from the _Public Reference_**; we are just looking for other file names.

In [None]:
# [USER INPUT] Set project and file names:
p_name = 'admin/sbg-public-data'
# Files to copy
input_files_list = ['ucsc.hg19.fasta',
                  'human_hg19_genes.gtf'
]

# LIST all files in the source and target project
my_files_source = API(path='files', \
                      query={'project':p_name, 'limit':100})
my_files_target = API(path='files', \
                      query={'project': my_project.id})

for f_name in input_files_list:
    if f_name not in my_files_source.name:
        print('File (%s) not found. Where do we go from here?' % (f_name))
        raise KeyboardInterrupt
    else:
        f_index = my_files_source.name.index(f_name)
        if f_name not in my_files_target.name:
            print('File (%s) does not exist in Project (%s); copying now' % \
                  (f_name, my_project.id))

            # COPY the selected file from source to target project
            API(path=('files/' + my_files_source.id[f_index] + '/actions/copy'), \
                method='POST', \
                data={'project': my_project.id,\
                      'name': f_name}) 

            # re-list files in target project to verify the copy worked
            my_files_target = API(path='files', \
                                  query={'project': my_project.id})

            if f_name in my_files_target.name:
                print('Successfully copied one file!')
            else:
                print('Something went wrong...')
                
# We are done copying files, let's clean up a little
del my_files_source, my_files_target
my_files = API(path='files', query={'project': my_project.id})

## Add an App to process our data
We will list the apps within the Public Reference project<sup>5</sup>, and copy an app from the Public Reference to our project.

The critical information for this POST is the **app_id**. Note, you are **NOT** allowed to copy the same app **and** assign the same name more than once. If you change the name, it is ok.

<sup>5</sup> Unlike files, apps are accessible **both** *within* a project (here the Public Reference project) **and** by a *visibility* property (which may be set to 'public')

#### PROTIP
* Here we also explicitly set _'limit':100_ inside the _query_. This helps speed up the auto-pagination feature within the object constructor.
* The recipe for _copying apps from Public Reference apps_ is [here](../../Recipes/SBPLAT/apps_copyFromPublicApps.ipynb)
* Detailed documentation of this particular REST architectural style request is available [here](http://docs.sevenbridges.com/docs/copy-an-app-secondary-method)
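
Why does a larger _limit_ help? With offset-based pagination, every page costs one request. The sketch below simulates this with a local list instead of the network. `fetch_page` and `list_all` are hypothetical names, and the real auto-pagination inside the object constructor may work differently.

```python
def fetch_page(items, offset, limit):
    """Stand-in for one API request that returns a single page."""
    return items[offset:offset + limit]


def list_all(items, limit=100):
    """Collect every item page by page and count the requests made."""
    collected = []
    offset = 0
    calls = 0
    while True:
        page = fetch_page(items, offset, limit)
        calls += 1
        collected.extend(page)
        if len(page) < limit:        # a short page means we reached the end
            break
        offset += limit
    return collected, calls


public_apps = ['app-%d' % i for i in range(250)]   # made-up app names
for limit in (25, 100):
    _, calls = list_all(public_apps, limit)
    print('limit=%d needed %d requests' % (limit, calls))
```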

In [None]:
# [USER INPUT] Set app name:
a_name = 'RNA-seq Alignment - STAR'
       
# LIST all Public Apps using VISIBILITY and searching by NAME
my_apps_source = API(path='apps', query={'visibility': 'public', 'limit': 100})
my_apps_target = API(path='apps', query={'project': my_project.id})
if a_name not in my_apps_source.name:
    print('Target app (%s) does not exist in the public repository. Please double-check the spelling' \
          % (a_name))
    # Stop here: without a match there is no app index to copy from
    raise KeyboardInterrupt
else:
    a_index = my_apps_source.name.index(a_name)

# Check if app already exists in the second project
if my_apps_source.name[a_index] in my_apps_target.name:
    print('App already exists in second project, you are good to go')
else:
    print('App (%s) does not exist in Project (%s); copying now' % \
          (my_apps_source.name[a_index], my_project.id))
    
    # COPY the selected app from first to second project
    API(path=('apps/' + my_apps_source.id[a_index] + '/actions/copy'), \
        method='POST', \
        data={'project': my_project.id,\
              'name': my_apps_source.name[a_index]})

    # re-list the apps in secondProject to verify the copy worked
    my_apps_target = API(path='apps', query={'project': my_project.id})
    
    if my_apps_source.name[a_index] in my_apps_target.name:
        print('Successfully copied one app!')
    else:
        print('Something went wrong...')
    
# We are done copying apps, let's clean up a little
del my_apps_source, my_apps_target
my_apps = API(path='apps', query={'project': my_project.id})

## Build a file processing list
Most likely, the project holds only the input and reference files copied above. This code sorts them by extension: FASTQ files go into the list of files to process, and the GTF and FASTA files are recorded as references for the task.

#### PROTIPS
* We don't have a recipe for this, but you can _follow your bliss_ here. Maybe you want to use metadata ([get metadata](../../Recipes/SBPLAT/files_detailOne.ipynb)) to decide which files fit in.
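
One possible way to sort files by extension is `str.endswith`, which avoids the slicing arithmetic. This is only a sketch: `classify_files` is a hypothetical helper and the file names are the ones copied earlier in this tutorial.

```python
def classify_files(names, input_ext='fastq', gtf_ext='gtf', fasta_ext='fasta'):
    """Return (input indices, gtf index, fasta index) for a list of names."""
    inputs = []
    gtf_ind = None
    fasta_ind = None
    for index, name in enumerate(names):
        if name.endswith('.' + input_ext):
            inputs.append(index)
        elif name.endswith('.' + gtf_ext):
            gtf_ind = index
        elif name.endswith('.' + fasta_ext):
            fasta_ind = index
    return inputs, gtf_ind, fasta_ind


names = ['Sample1_RNASeq_chr20.pe_1.fastq',
         'Sample1_RNASeq_chr20.pe_2.fastq',
         'ucsc.hg19.fasta',
         'human_hg19_genes.gtf']
print(classify_files(names))
```

Matching on the leading dot keeps a name such as `notes_fastq` from being mistaken for a FASTQ file.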

In [None]:
# Build .fileProcessing (inputs) and .fileIndex (references) lists [for workflow]
file_proc_list = ['Files to Process']
gtf_ind = None
fasta_ind = None

for ii,f_name in enumerate(my_files.name):
    # this conditional is for 'RNA seq STAR alignment' in Quickstart_API. 
    #  Adapt appropriately for other workflows. Also the order of 
    #  input_ext has been HARD-CODED
    if f_name[-len(input_ext[0]):] == input_ext[0]:
        file_proc_list.append(ii)
    elif f_name[-len(input_ext[1]):] == input_ext[1]:
        gtf_ind = ii
    elif f_name[-len(input_ext[2]):] == input_ext[2]:
        fasta_ind = ii
        
print(my_files.name)
print(file_proc_list)

## Build & Start tasks
Next we iterate through the file processing list (*file_proc_list*) to describe each input file, then create one task that takes all of them as input. Because the request is sent with the query *'action':'run'*, the task starts running immediately.
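
Every file handed to a task is described by the same three keys, so a tiny helper keeps the task dictionary readable. `as_file_input` is a hypothetical name and the ids below are placeholders. The keys mirror the ones used in the cell that follows.

```python
import json


def as_file_input(file_id, file_name):
    """Describe one project file in the form the task inputs expect."""
    return {'class': 'File', 'path': file_id, 'name': file_name}


# Placeholder ids, for illustration only
fastq_inputs = [
    as_file_input('id-1', 'Sample1_RNASeq_chr20.pe_1.fastq'),
    as_file_input('id-2', 'Sample1_RNASeq_chr20.pe_2.fastq'),
]
inputs = {
    'fastq': fastq_inputs,                                          # list of files
    'genomeFastaFiles': as_file_input('id-3', 'ucsc.hg19.fasta'),   # single file
    'sjdbGTFfile': [as_file_input('id-4', 'human_hg19_genes.gtf')], # list of one
}
print(json.dumps(inputs, indent=2))
```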

In [None]:
# Describe every input file as a 'File' object.
# file_proc_list[0] is a text label, so the file indices start at position 1.
f_in = [
    {
        'class': 'File',
        'path': my_files.id[f_ind],
        'name': my_files.name[f_ind]
    }
    for f_ind in file_proc_list[1:]
]

new_task = {
    'description': 'APIs are awesome',
    'name': ('task created with quickstart_RNAseq.ipynb'),
    'app': (my_apps.id[0]),                                   # App should be at index 0 since we just added it
    'project': my_project.id,
    'inputs': {
        'fastq': f_in,
            'genomeFastaFiles': {                               # .fasta reference file
                'class': 'File',
                'path': my_files.id[fasta_ind],
                'name': my_files.name[fasta_ind]
            },
            # .gtf reference file, !NOTE: this workflow expects a _list_ for this input
            'sjdbGTFfile': [
               {
                'class': 'File',
                'path': my_files.id[gtf_ind],
                'name': my_files.name[gtf_ind]
               }
            ]
    }
}
my_task = API(method='POST', data=new_task, path='tasks/', query={'action':'run'})

## Check task status
These tasks may take a long time to complete. Here are two ways to check in on them:
* Wait for email confirmation
* Query the task status
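
A polling loop is safer with an upper bound, so that it cannot spin forever. The sketch below is generic: `get_status` is any function that returns the current status string, `'ABORTED'` is assumed to be a terminal status alongside the two used in this notebook, and `'TIMED_OUT'` is a local marker rather than a platform status.

```python
import time


def wait_for_task(get_status, poll_seconds=120, max_polls=30, sleep=time.sleep):
    """Poll get_status() until the task finishes or max_polls is reached."""
    terminal = ('COMPLETED', 'FAILED', 'ABORTED')   # 'ABORTED' is an assumption
    for _ in range(max_polls):
        status = get_status()
        if status in terminal:
            return status
        sleep(poll_seconds)
    return 'TIMED_OUT'                              # local marker, not an API status


# Simulated status sequence; sleeping is stubbed out so the demo runs instantly
statuses = iter(['QUEUED', 'RUNNING', 'RUNNING', 'COMPLETED'])
result = wait_for_task(lambda: next(statuses), sleep=lambda seconds: None)
print(result)
```

With the wrapper used in this notebook, `get_status` could be a small function that re-fetches the task and returns its `status`.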

In [None]:
my_task = API(method='GET', path=('tasks/' + my_task.id))
print('Your task is in %s status' % my_task.status)

In [None]:
from time import sleep   # make sure sleep() is available for the wait below

# [USER INPUT] Set loop time (seconds):
loop_time = 120

flag = {'taskRunning': True}
while flag['taskRunning']:
    print('Pinging SBPLAT for task completion, will download summary files once all tasks completed.')
    my_task = API(method='GET', path=('tasks/' + my_task.id))
    if my_task.status == 'COMPLETED':
        flag['taskRunning'] = False
        print('Task has completed, life is beautiful')
    elif my_task.status  == 'FAILED':  
        flag['taskRunning'] = False
        print('Task (%s) failed, check it out' \
                  % (my_task.id))
    else:
        sleep(loop_time)

## What did this task make?
Here we poll the recently created task. 
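
Task outputs are keyed by output name. The sketch below assumes an output value can be a single file dictionary, a list of them, or empty (`None`). That assumption is worth verifying against your own task. `output_file_names` is a hypothetical helper and the sample data is made up.

```python
def output_file_names(outputs):
    """Map each output name to the list of file names it produced."""
    summary = {}
    for port, value in outputs.items():
        if value is None:                       # output not produced
            names = []
        elif isinstance(value, list):           # several files on one output
            names = [item['name'] for item in value if item]
        else:                                   # exactly one file
            names = [value['name']]
        summary[port] = names
    return summary


# Made-up outputs, for illustration only
example_outputs = {
    'aligned_reads': {'name': 'Sample1.bam', 'path': 'id-10'},
    'logs': [{'name': 'run.log', 'path': 'id-11'},
             {'name': 'final.log', 'path': 'id-12'}],
    'unmapped_reads': None,
}
for port, names in sorted(output_file_names(example_outputs).items()):
    print(' task output (%s) holds %i file(s): %s' % (port, len(names), names))
```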

In [None]:
my_task = API(method='GET', path=('tasks/' + my_task.id))
print('Your task created %i outputs' % (len(my_task.outputs.keys())))
for f_name in my_task.outputs:
    print(' task output (%s) is the file (%s)' % (f_name, my_task.outputs[f_name]['name']))

We hope this tutorial has been helpful for you. If you have any feedback (especially _positive_), we would cherish it. Please share your thoughts on our [forum](http://docs.sevenbridges.com/discuss).

**Good luck & have fun!**