# Quickstart for the Seven Bridges Platform
### Overview
To introduce you to the major features of the Seven Bridges Platform, this QuickStart walks you through the process of a [Whole Exome Sequencing Analysis](https://igor.sbgenomics.com/public/apps#workflow/sevenbridges/public-apps/whole-exome-sequencing-gatk-2-3-9-lite). This API tutorial mirrors the [tutorial for the visual interface](http://docs.sevenbridges.com/docs/quickstart).

This tutorial makes use of the [sevenbridges-python bindings](http://sevenbridges-python.readthedocs.io/en/latest/installation/).

### Prerequisites
 1. You need an account on one of the Seven Bridges Platforms. 
 2. You need your **authentication token**, and you need to pass this credential to the API. See <a href="Setup_API_environment.ipynb">**the tutorial on setting up the API environment**</a> for details.
 
## Imports
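We import the `sevenbridges` package (which provides the _Api_ class) from the official sevenbridges-python bindings, along with the `Conflict` error class and the standard `time` module, both of which are used later in this tutorial.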

In [None]:
import sevenbridges as sbg
from sevenbridges.errors import Conflict
import time

## Initialize the object
The `Api` object needs to know your **auth\_token** and the correct path. Here we assume you are using the .sbgrc file in your home directory. For other options see <a href="Setup_API_environment.ipynb">Setup_API_environment.ipynb</a>.

In [None]:
# [USER INPUT] specify the profile (section name in your .sbgrc file), e.g. 'sbpla' or 'cgc'
prof = 'sbpla'

config_file = sbg.Config(profile=prof)
api = sbg.Api(config=config_file)

## Create a project
_Projects_ are the foundation of any analysis on the Platform. We can either work inside a project that has already been created or create a new project. Here we **create a new project**, but first **check that it doesn't already exist** to show both methods. The project name, billing group (we will use our free credits in the **Pilot Fund** billing group), and a project description will be sent in our API call.

We start by listing all of your projects and your billing groups. Next, we create the JSON that will be passed to the API to create the project. The dictionary should include:
* **billing_group** *Billing group* that will be charged for this project
* **description**   (optional) Project description
* **name**   Name of the project, may be *non-unique*<sup>1</sup>

**After** creating the project, you can re-check the project list and get *additional* details assigned by the Platform, including (but not limited to):

* **id**     _Unique_ identifier for the project, generated based on Project Name
* **href**   Address<sup>2</sup> of the project.
* **name**   Project name.

<sup>1</sup> Please **don't** use non-unique *project names*. However, if you insist, the backend will allow it and assign a unique `id` to your project. This `id` is known as a [short name](http://docs.sevenbridges.com/docs/the-api#section-project-short-names).

<sup>2</sup> This is the address where, by using the API, you can get this resource.

#### PROTIPS
 * A detailed _recipe_ for creating projects is [here](../../Recipes/SBPLAT/projects_makeNew.ipynb)
 * Detailed documentation of this particular REST architectural style request is available [here](http://docs.sevenbridges.com/docs/create-a-new-project)

In [1]:
# [USER INPUT] Set project name here:
new_project_name = 'Azzurri'                          
      
# check if this project already exists. LIST all projects and check for name match
# Note that you can have more than one project with the same name. It is best practice to find things by ID.
my_project_exists = [p for p in api.projects.query(limit=100).all() 
              if p.name == new_project_name]      
              
if my_project_exists:    # exploit fact that empty list is False
    # If a project with the same name already exists, reuse the existing one
    my_project = my_project_exists[0]

    print('Project {} will be reused for next steps.'.format(my_project.id))
    if hasattr(my_project, 'description'): # need to check if description has been entered
        print('Project description: {} \n'.format(my_project.description)) 
    
else: 
    # What are my funding sources?
    billing_groups = api.billing_groups.query()  

    # Pick the first group (arbitrary)
    print((billing_groups[0].name +
           ' will be charged for computation and storage (if applicable) for your new project'))

    # Set up the information for your new project
    new_project = {
            'billing_group': billing_groups[0].id,
            'description': """A project created by the API recipe (projects_makeNew.ipynb).
                          This also supports **markdown**
                          _Pretty cool_, right?
                       """,
            'name': new_project_name
    }
    
    # CREATE the new project
    my_project = api.projects.create(
        name=new_project['name'], 
        billing_group=new_project['billing_group'],
        description=new_project['description'],
    )

    print('Your new project {} has been created.'.format(my_project.name))
    if hasattr(my_project, 'description'): # need to check if description has been entered
        print('Project description: {} \n'.format(my_project.description)) 
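
Because project names are not guaranteed to be unique, the name lookup above can return several matches. The following is a minimal sketch of a lookup helper that makes this explicit. `pick_by_name` is a hypothetical helper (it is not part of sevenbridges-python), the project IDs are made up, and `SimpleNamespace` objects stand in for the project objects returned by `api.projects.query(limit=100).all()`.

```python
from types import SimpleNamespace


def pick_by_name(items, name):
    """Return (first_match, all_matches) for objects whose .name equals name.

    first_match is None when nothing matches.
    """
    matches = [item for item in items if item.name == name]
    first = matches[0] if matches else None
    return first, matches


# Stand-in objects with made-up IDs; real code would pass the queried projects
projects = [
    SimpleNamespace(id='user/azzurri', name='Azzurri'),
    SimpleNamespace(id='user/azzurri-1', name='Azzurri'),
    SimpleNamespace(id='user/other', name='Other'),
]

project, duplicates = pick_by_name(projects, 'Azzurri')
if len(duplicates) > 1:
    print('Warning: {} projects share this name; using {}'.format(
        len(duplicates), project.id))
```

Returning all matches alongside the first one lets the caller decide whether a duplicate name is acceptable or whether the project should be selected by `id` instead.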

## Copy input files from the _Public Reference Files_ repository
[Public Reference Files](http://docs.sevenbridges.com/docs/file-repositories) is a repository of files maintained by the Seven Bridges Platform. It contains the latest and most frequently used reference genomes and annotation files, so you won't have to upload your own reference files every time you run a task. You can access this repository via the API as you would a project.

Below, we will first list all our projects, then we'll list the files within the Public Reference Files repository<sup>3</sup>, and copy a file from Public Reference Files to your target project. We've hard-coded a list of file names to copy based on the tutorial.

The critical information for this POST is the **file_id**. Note that you are allowed to copy the same file as many times as you like. However, duplicates will automatically have a prefix attached (\_1\_, \_2\_, etc.) depending on how many times you have copied the file.

#### PROTIPS 
 * A detailed _recipe_ for copying Public Files is [here](../../Recipes/SBPLAT/files_copyFromPublicReference.ipynb)
 * Detailed documentation of this particular REST architectural style request is available [here](http://docs.sevenbridges.com/docs/copy-a-file)

<sup>3</sup> Remember, files are only accessible **within** a project - here the Public Reference Files project

In [None]:
# [USER INPUT] Set files to copy here:  
files_list = ['example_human_Illumina.pe_1.fastq',
              'example_human_Illumina.pe_2.fastq']

# Public reference and test files on the platform are available in the same project 'admin/sbg-public-data'
source_project_id = 'admin/sbg-public-data'  

# LIST all file names in your project and file objects from source project
# Note that listing files in a project does not list subfolders
my_file_names = [f.name for f in 
    api.files.query(limit=100, project=my_project).all()]

source_files = api.files.query(
    limit=100, project=source_project_id, names=files_list
)
    
# Copy files if they don't already exist in my_project
for f in source_files:
    if f.name in my_file_names:
        print('File {} already exists in second project, skipping'.format(f.name))
    else:
        print('File {} does not exist in {}; copying now'.format(
              f.name, my_project.name))

        new_file = f.copy(project=my_project, name=f.name)

        # re-list files in target project to verify the copy worked
        my_files = [f.name for f in api.files.query(
                limit=100,project=my_project).all()]

        if f.name in my_files:
            print('Successfully copied one file!')
        else:
            print('Something went wrong...')
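
The "copy only what is missing" decision in the cell above can be separated from the API calls. The sketch below does this with plain lists of names. `split_by_presence` is a hypothetical helper, and the `existing` list is made-up sample data standing in for the file names queried from your project.

```python
def split_by_presence(wanted_names, existing_names):
    """Split wanted_names into (to_copy, to_skip), preserving the input order."""
    existing = set(existing_names)  # set lookup avoids rescanning the list
    to_copy = [n for n in wanted_names if n not in existing]
    to_skip = [n for n in wanted_names if n in existing]
    return to_copy, to_skip


wanted = ['example_human_Illumina.pe_1.fastq',
          'example_human_Illumina.pe_2.fastq']
existing = ['example_human_Illumina.pe_1.fastq']  # pretend one file was copied earlier

to_copy, to_skip = split_by_presence(wanted, existing)
print('To copy:', to_copy)
print('To skip:', to_skip)
```

Computing both lists up front means the target project only needs to be listed once before copying and once afterwards to verify, rather than once per copied file.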

## What is the meaning of this?
Files are great, but without **metadata** they can be hard to manage. So here we are going to add metadata to these files. We will add one field that is _needed for the task_ and one to show _generality_.

We've already listed all your files in the last cell. Here we will check the metadata for each one. A **detail**-call for files returns the following *attributes*:
* **created_on** File creation date
* **id**     _Unique_ identifier for the file
* **name**   Name of the file, note this **is** metadata and can be _changed_
* **href**   Address<sup>4</sup> of the file.
* **modified_on** File modification date
* **tags** File tags
* **metadata** Dictionary of metadata
* **origin**  Will link back to a *task* if this is an output file 
* **project** ID of the project the file is in
* **parent** ID of the folder the file is in
* **type** file or folder
* **size** File size in bytes
* **storage.type** Indicates whether the file is on platform built-in storage or on a connected cloud storage (Volume)
* **storage.hosted_on_locations** Shows which cloud regions the file is stored in

The **metadata** dictionary is both _changeable_ and _expandable_, but initially rather sparse with:
* sample_id
* platform
* paired_end
* library_id

<sup>4</sup> This is the address where, by using API, you can get this resource.

#### PROTIP
 * A detailed _recipe_ for detailing files is [here](../../Recipes/SBPLAT/files_detailOne.ipynb).
 * Detailed documentation of this particular REST architectural style request is available [here](http://docs.sevenbridges.com/docs/get-file-details).

In [None]:
my_files = api.files.query(project=my_project, limit=100, names=files_list)

for single_file in my_files.all():
    print('You have selected file {} (size {} [bytes]). \n'.format(
        single_file.name, single_file.size
    ))
    print('The metadata in this file was: \n %s \n' % (single_file.metadata))
    md = {
        'platform_unit_id': '1',
        'hasFlair':'True'
    }

    for k in md.keys():
        single_file.metadata[k] = md[k]
    
    single_file.save()
    
    print('After the update, file metadata is: \n %s \n' % (single_file.metadata))
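
To see what the update loop does to the dictionary itself, here is a small sketch that uses a plain `dict` as a stand-in for `single_file.metadata`. The initial values are made up, and no API call is involved.

```python
# Stand-in for single_file.metadata; the values are made up for illustration
metadata = {'sample_id': 'example_sample', 'platform': 'Illumina'}

md = {
    'platform_unit_id': '1',
    'hasFlair': 'True',   # a string, not the boolean True
}

# Key-by-key assignment, as in the cell above:
# existing keys are kept, new keys are added
for key, value in md.items():
    metadata[key] = value

print(metadata)
```

Running the loop a second time leaves the dictionary unchanged, so the cell above is safe to re-run.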

## Copy reference files from the _Public Reference_
This is equivalent to the operation in **Copy input files from the _Public Reference Files_ repository**; we are just looking for other file names.

In [None]:
# Files to copy
ref_files = ['GRCh38.GRAF.Genome_Intervals.v1.bed',
             'GRCh38.GRAF.Linear_Reference.v1.fa',
             'GRCh38.GRAF.Linear_Reference.v1.fa.fai',
             'GRCh38.GRAF.Pan_Genome_Reference.v1.vcf.gz',
             'GRCh38.GRAF.Pan_Genome_Reference.v1.vcf.gz.tbi']


# LIST all file names in your project and file objects from source project
# Note that listing files in a project does not list subfolders
my_file_names = [f.name for f in 
    api.files.query(limit=100, project=my_project).all()]

source_files = api.files.query(
    limit=100, project=source_project_id, names=ref_files
)

# Copy files if they don't already exist in my_project
for f in source_files:
    if f.name in my_file_names:
        print('File {} already exists in second project, skipping'.format(f.name))
    else:
        print('File {} does not exist in {}; copying now'.format(
              f.name, my_project.name))

        new_file = f.copy(project=my_project, name=f.name)

        # re-list files in target project to verify the copy worked
        my_files = [f.name for f in api.files.query(
                limit=100,project=my_project).all()]

        if f.name in my_files:
            print('Successfully copied one file!')
        else:
            print('Something went wrong...')

## Copy a Public App
Seven Bridges maintains a [repository of publicly available apps](http://docs.sevenbridges.com/docs/public-apps) suitable for many different types of data analysis. These public apps, including tools and workflows, can be accessed via the API as part of the Public Reference project. They are also accessible via the *visibility* property, which can be set to `public`. Below, we use the second method and query the apps with `visibility='public'`.

We list the public apps and pick the one whose name matches. Then we copy that app to my\_project. Here we also explicitly set _limit=100_ inside the _query_. This helps speed up the auto-pagination feature within the object constructor.

The critical information for this `POST` request is the **app_id**. Note, you are **not** allowed to copy the same app **and** assign the same name<sup>6</sup> more than once. If you change the name, it is ok. 

To make these results very obvious, use an empty project as your my\_project (e.g. your <a href=projects_makeNew.ipynb> cookbook example project</a>) or set the _name_ argument to something ridiculous like 'Orange Mocha Frapachino Maker'. In this example, we handle the duplicate case by catching the predefined `Conflict` error.

<sup>6</sup> Note that setting the **name** of a copied app defines the app **id**, which must be unique on the platform. Saving over an existing ID will raise a `Conflict` error.

In [None]:
# [USER INPUT] Set app name here
# Note that you can have multiple apps or projects with the same name. It is best practice to reference entities by ID.
a_name = 'GRAF Germline Variant Detection Workflow'

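public_app = None
for app in api.apps.query(visibility='public', limit=100).all():
    if app.name == a_name:
        public_app = app
        break

# Stop early with a clear message instead of a NameError further down
if public_app is None:
    raise ValueError('No public app named {!r} was found'.format(a_name))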

try:
    new_app = public_app.copy(project=my_project)
    print('App {} copied to Project {}.'.format(public_app.name, my_project.name))
except Conflict:
    new_app = [a for a in api.apps.query(project=my_project) if a.name == a_name][0]
    print('App already exists in the destination project, reusing existing app.')
        
# re-list apps in target project to verify the copy worked
my_app_names = [a.name for a in api.apps.query(project=my_project.id, limit=100).all()]

if a_name in my_app_names:
    print('Successfully copied or reused one app!')
else:
    print('Something went wrong...')

## Check app inputs

First, we check which inputs our app expects. The app property `raw` contains a `dict` that holds all of the app information. Iterating over `raw['inputs']` shows the expected input IDs and types.
Hints for input types: 
- The `[]` suffix, e.g. `File[]`, is a syntax denoting a list of items, in this case Files. 
- The `?` suffix, e.g. `File?`, is a syntax denoting an optional input, in this case an optional File. Another way to represent an optional type is, e.g. `['null', 'int']` for optional integer input.

In [None]:
print('Expected inputs for app {}:'.format(new_app.name))
for inp in new_app.raw['inputs']:
    print('id: {}{}type: {}'.format(inp['id'].lstrip('#'), ' ' * (30 - len(inp['id'])), inp['type']))
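
The type hints above can also be read programmatically. The sketch below covers only the three notations mentioned in the hints (the `[]` suffix, the `?` suffix, and a union with `'null'`). `describe_type` is a hypothetical helper, and other forms that an app may use, such as nested dictionaries, are returned unchanged.

```python
def describe_type(type_spec):
    """Return (base_type, is_list, is_optional) for a type spec from raw['inputs']."""
    optional = False
    is_list = False

    # Union form, e.g. ['null', 'int']: 'null' marks the input as optional
    if isinstance(type_spec, list):
        non_null = [t for t in type_spec if t != 'null']
        optional = len(non_null) < len(type_spec)
        # This sketch only unwraps unions with a single non-null member
        type_spec = non_null[0] if len(non_null) == 1 else non_null

    # String shorthand, e.g. 'File?', 'File[]' or 'File[]?'
    if isinstance(type_spec, str):
        if type_spec.endswith('?'):
            optional = True
            type_spec = type_spec[:-1]
        if type_spec.endswith('[]'):
            is_list = True
            type_spec = type_spec[:-2]

    return type_spec, is_list, optional


for spec in ['File', 'File[]', 'File?', ['null', 'int']]:
    print(spec, '->', describe_type(spec))
```

Inputs for which `is_optional` is false are the ones that need a value before the task can run.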

## Build & Start tasks
Here, we use the copied reference files and set the inputs. Note that input files are passed as a _file_ (or a _list_ of _files_). This app has no string, number, or boolean inputs. For apps that require such inputs, you would pass the values directly, such as:

```python
inputs = {
    'num_repetitions' : 8,
    'mask' : False, 
    'file_name' : 'output_backup.txt'
}
```

In [None]:
task_name = 'task created with quickstart.ipynb'

# get the file objects and set them as file inputs
# this is how a single file is set
in_intervals = api.files.query(project=my_project, limit=100,
                       names = [ref_files[0]])[0]
in_linear_reference = api.files.query(project=my_project, limit=100,
                       names=[ref_files[1]])[0]
in_graph_reference = api.files.query(project=my_project, limit=100,
                       names=[ref_files[3]])[0]

# and here an array of files is set
in_reads = api.files.query(project=my_project, limit=100,
                       names=files_list)

inputs = {
    'in_intervals': in_intervals,
    'in_linear_reference': in_linear_reference,
    'in_graph_reference': in_graph_reference,
    'in_reads': in_reads
}

my_task = api.tasks.create(name=task_name, project=my_project,
                           app=new_app, inputs=inputs)

# Check for errors and warnings
if my_task.errors:
    print(my_task.errors)
# elif my_task.warnings:        # feature is in staging
#     print(my_task.warnings)
else:
    print('Your task is good to go, launching!')
    
    # Start the task
    my_task.run()

## Print task status
Here we poll the recently created task. With `my_task.reload()` we get the latest information.

In [None]:
my_task.reload()
print('Your task is in %s status' % (my_task.status))

## Wait for task completion
Simple loop to ping for task completion.

In [None]:
# [USER INPUT] Set loop time (seconds):
loop_time = 30
flag = {'taskRunning': True}

print('Pinging SBPLAT for task completion.')
while flag['taskRunning']:
    my_task.reload()
    if my_task.status == 'COMPLETED':
        flag['taskRunning'] = False
        print('Task has completed, life is beautiful')
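    elif my_task.status in ('FAILED', 'ABORTED'):
        # Without this branch an aborted task would keep the loop running forever
        print('Task ended with status {}, cannot continue'.format(my_task.status))
        raise RuntimeError('Task ended with status {}'.format(my_task.status))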
    else:
        print('Task status: {}'.format(my_task.status))
        time.sleep(loop_time)
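
A polling loop like the one above can run indefinitely if the task never reaches a final state. The sketch below adds an upper bound on the waiting time. `wait_for_task` and `FakeTask` are hypothetical names, the set of terminal statuses is an assumption, and `FakeTask` only imitates the `reload()` and `status` members used in this tutorial so that the example runs without the API.

```python
import time


def wait_for_task(task, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll task.reload() until a terminal status is seen or the timeout expires."""
    terminal = {'COMPLETED', 'FAILED', 'ABORTED'}  # assumed terminal statuses
    waited = 0
    while True:
        task.reload()
        if task.status in terminal:
            return task.status
        if waited >= timeout_seconds:
            raise TimeoutError(
                'Task still {} after {} s'.format(task.status, waited))
        sleep(poll_seconds)
        waited += poll_seconds


class FakeTask:
    """Stand-in that steps through a fixed list of statuses on each reload()."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.status = 'DRAFT'

    def reload(self):
        # Keep the last status once the list is exhausted
        self.status = next(self._statuses, self.status)


final = wait_for_task(FakeTask(['QUEUED', 'RUNNING', 'COMPLETED']),
                      poll_seconds=0, sleep=lambda seconds: None)
print('Final status:', final)
```

With a real task you would call `wait_for_task(my_task)` and then branch on the returned status.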

## Get task outputs
Here we reload the task and print task outputs. 

In [None]:
my_task.reload()
print(my_task.outputs)
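
Task outputs are keyed by output ID, and a value may be a single file, a list of files, or empty. The sketch below flattens such a structure into one line per file. `flatten_outputs` is a hypothetical helper, and the output IDs and file names are made up; `SimpleNamespace` objects stand in for the file objects.

```python
from types import SimpleNamespace


def flatten_outputs(outputs):
    """Yield (output_id, item) pairs, expanding lists and skipping empty outputs."""
    for output_id, value in outputs.items():
        if value is None:
            continue
        items = value if isinstance(value, list) else [value]
        for item in items:
            yield output_id, item


# Made-up outputs; real code would pass my_task.outputs
outputs = {
    'out_variants': SimpleNamespace(name='sample.vcf.gz'),
    'out_reports': [SimpleNamespace(name='report_1.html'),
                    SimpleNamespace(name='report_2.html')],
    'out_optional': None,
}

for output_id, item in flatten_outputs(outputs):
    print('{}: {}'.format(output_id, item.name))
```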

That’s it! We've executed a data analysis and obtained some results. We encourage you to try this procedure for yourself before getting started on your own data analyses. You can also visit our [API documentation](http://docs.sevenbridges.com/v1.0/page/api) to learn more about the Seven Bridges Platform and bringing your own tools.