# Rough draft of getting _vcf_ files
### Overview
Tertiary analysis software often works with _vcf_ or _gene expression_ files. Depending on the API it uses, you may want to _point to the actual file_ **or** pass _an http download link_. We show both approaches here, focusing on download links, since adding a download step is trivial<sup>1</sup>.

We will show **two** approaches to get **vcf download links**. You may want to do this locally or as a _push-button_ action from within a tertiary analysis GUI.
 
 * Specify a **project** and grab **all** vcf files within that project (a.k.a. the _lazy way_).
 * Specify both a **task** and a **project**, and look for _outputs_ of that task that are in vcf format.
 
Obvious extensions would be to keep track of previously imported file names, or only take tasks more recent than a certain date. 
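
The first extension can be sketched as a small helper that remembers which file names were already handed over. This is only a minimal sketch under assumptions: the helper name `select_new_files` and the log file `imported_vcfs.json` are hypothetical, and the only thing required of a file object is a `.name` attribute.

```python
import json
from pathlib import Path


def select_new_files(files, log_path='imported_vcfs.json'):
    """Return files whose names are not yet in the log, then update the log.

    `files` is any iterable of objects with a `.name` attribute.
    `log_path` is a hypothetical local JSON file holding imported names.
    """
    log = Path(log_path)
    seen = set(json.loads(log.read_text())) if log.exists() else set()

    new_files = [f for f in files if f.name not in seen]

    # Record the new names so the next run skips them
    seen.update(f.name for f in new_files)
    log.write_text(json.dumps(sorted(seen)))
    return new_files
```

Because file names may be non-unique, tracking file ids instead of names would likely be more robust; the sketch follows the name-based idea described above.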

### Prerequisites
 1. You need your _authentication token_ and the API needs to know about it. See <a href="Setup_API_environment.ipynb">**Setup_API_environment.ipynb**</a> for details.
 2. You understand how to <a href="../../Recipes/SBPLAT/projects_listAll.ipynb" target="_blank">list</a> **projects** you are a member of (we will just use that call directly and pick one here).
 3. You understand how to <a href="../../Recipes/SBPLAT/files_listAll.ipynb" target="_blank">list</a> **files** within one of your projects.
 4. You understand how to <a href="../../Recipes/SBPLAT/tasks_monitorAndGetResults.ipynb" target="_blank">deal with</a> **tasks** within one of your projects. 

<sup>1</sup> the **.download()** method for a _file object_ will do this directly. You are welcome to use your favorite flavor of downloader.

## Imports
Below we import the official sevenbridges-python bindings, which provide the _Api_ class.

In [None]:
import sevenbridges as sbg

## Initialize the object
The _Api_ object needs to know your **auth\_token** and the correct API endpoint. Here we assume you are using the .sbgrc file in your home directory. For other options see <a href="Setup_API_environment.ipynb">Setup_API_environment.ipynb</a>.

In [None]:
# [USER INPUT] specify the profile to read from your config file (e.g. one per platform: cgc, sbg)
prof = 'default'


config_file = sbg.Config(profile=prof)
api = sbg.Api(config=config_file)

# Approach 1: Get all files from a project

___

**NOTE**

This tutorial does not descend into subfolders of a project. Only top-level files are captured.

___
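
If your project does contain folders, one way to reach nested files is to walk the folder tree. The sketch below assumes that file objects expose `is_folder()` and, for folders, `list_files()` returning a collection with `.all()`; recent sevenbridges-python releases appear to offer these, but please check against your installed version. The helper name `walk_files` is hypothetical.

```python
def walk_files(items):
    """Yield every non-folder file reachable from `items`, depth first.

    Assumes each item offers `is_folder()` and, for folders, `list_files()`
    returning a collection with `.all()`. Verify against your version
    of the bindings.
    """
    for item in items:
        if item.is_folder():
            # Descend into the folder and yield whatever it contains
            yield from walk_files(item.list_files().all())
        else:
            yield item
```

Given a project object `my_project`, it could be called as `list(walk_files(api.files.query(project=my_project.id).all()))`.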

This is a more _brute-force_ approach to get all files in a project. This does not discriminate between _reference_ or _output_ vcf files. **Approach 1** consists of three steps:

 * Find my project
 * Find all files in my project
 * Get download links


## Find all files in my project
A **list**-call for files returns the following *attributes*:

 * **id**     _Unique_ identifier for each file
 * **name**   Name of the file, may be _non-unique_
 * **href**   Address<sup>2</sup> of the file.

<sup>2</sup> This *address* is for the API and will not work in a browser.

In [None]:
# [USER INPUT] Set project name:
my_project = api.projects.get('username/project-name')

# [USER INPUT] Set the input file extension we are looking for
input_ext = 'vcf'

# LIST all files in the project and keep those with the chosen extension
my_files = [f for f in api.files.query(limit=100, project=my_project.id).all()
            if f.name.endswith(input_ext)]

print('Project {} has {} matching files:'.format(my_project.name, len(my_files)))
for f in my_files:
    print(f.name)
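
The `endswith(input_ext)` test above skips compressed files such as `sample.vcf.gz`, and it also matches any name that merely ends in the letters `vcf`. A minimal sketch of a stricter check follows; the helper name `is_vcf` and the set of accepted suffixes are assumptions you may want to adapt.

```python
VCF_SUFFIXES = ('.vcf', '.vcf.gz')  # assumed set of accepted suffixes


def is_vcf(name, suffixes=VCF_SUFFIXES):
    """Return True if `name` ends with one of the accepted VCF suffixes."""
    # str.endswith accepts a tuple; lower() makes the check case-insensitive
    return name.lower().endswith(tuple(suffixes))
```

It could replace the `f.name.endswith(input_ext)` condition in the list comprehension, i.e. `if is_vcf(f.name)`.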

## Get download links
File objects have two methods for downloading:

 * **.download_info()**
 * **.download()**
 
Here we use the first method to build and print a list of download links to pass to a tertiary analysis provider.

In [None]:
# Make download list (download_info() returns an object; its .url attribute holds the link)
list_of_urls = []
for f in my_files:
    list_of_urls.append(f.download_info().url)

for url in list_of_urls:
    print(url)
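
To hand the links to another tool it can help to keep each file name together with its URL. The sketch below assumes that `download_info()` returns an object with a `url` attribute; the helper names `build_manifest` and `manifest_to_json` are hypothetical.

```python
import json


def build_manifest(files):
    """Return a list of {'id', 'name', 'url'} dicts, one per file.

    Assumes `f.download_info()` returns an object with a `.url` attribute.
    """
    manifest = []
    for f in files:
        info = f.download_info()
        manifest.append({'id': f.id, 'name': f.name, 'url': info.url})
    return manifest


def manifest_to_json(manifest):
    """Serialize the manifest so it can be passed to another service."""
    return json.dumps(manifest, indent=2)
```

For example, `print(manifest_to_json(build_manifest(my_files)))` would print one entry per matching file.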

# Approach 2: Get all files from a particular task in a project
This is a _targeted_ approach to get _specific_ output files from a _particular_ task. **Approach 2** consists of four steps:

 * Find my project
 * Find a **particular** task
 * Find specific output files in the task
 * Get download links

## Find a particular task
Here we fetch a particular task directly by its ID and read the project it belongs to from the task itself.

#### NOTE
It would be **much cleaner** to work with a list of tasks (my_tasks) and a cutoff date.
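
A minimal sketch of that idea: list the tasks and keep only those that finished on or after a cutoff. It assumes each task has an `end_time` attribute holding a `datetime`, or `None` while the task has not finished; the helper name `tasks_since` and the cutoff value are hypothetical.

```python
from datetime import datetime


def tasks_since(tasks, cutoff):
    """Return tasks whose `end_time` is at or after `cutoff`.

    Tasks without an end time (e.g. still running) are skipped.
    `cutoff` must be comparable with `end_time`: both naive or both
    timezone-aware datetimes.
    """
    return [t for t in tasks
            if t.end_time is not None and t.end_time >= cutoff]


# [USER INPUT] hypothetical cutoff date
cutoff = datetime(2017, 1, 1)
```

With a project selected, this could be used as `my_tasks = tasks_since(api.tasks.query(project=my_project).all(), cutoff)`.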

In [None]:
# [USER INPUT] Set the task we are looking for (ID can be found in task URL)
single_task = api.tasks.get('cdb84a30-8367-4219-a013-47032a632c21')

# The project this task belongs to is stored on the task itself
my_project = single_task.project


In [None]:
single_task.outputs

## Find specific output files in the task
First use the **.outputs** attribute of the task to find the particular files you want to download. Each selected output already holds the corresponding file object, so no separate file query is needed.

In [None]:
# [USER INPUT] Specify which task outputs you like
outputs = ['out_filtered_variants']

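# get the file objects assigned to those outputs
my_files = []
for out in outputs:
    value = single_task.outputs[out]
    # an output port may hold a single file or a list of files
    if isinstance(value, list):
        my_files.extend(value)
    elif value is not None:
        my_files.append(value)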

my_file_names = [f.name for f in my_files]
print('There are {} files for your selected outputs:'
      .format(len(my_file_names)))
for f in my_file_names:
    print(f)

## Get download links
File objects have two methods for downloading:

 * **.download_info()**
 * **.download()**
 
Here we also show the second method and download the first file to pass to a tertiary analysis provider.

In [None]:
# Make download list (download_info() returns an object; its .url attribute holds the link)
list_of_urls = []
for f in my_files:
    list_of_urls.append(f.download_info().url)

for url in list_of_urls:
    print(url)
    
# Download first file to local directory
f = my_files[0]
f.download(path=f.name)
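
To download every selected file rather than just the first, a loop such as the following could be used. It is a sketch that relies only on `download(path=...)` as shown above; the helper name `download_all` and the target directory `vcf_downloads` are hypothetical, and files already present locally are skipped.

```python
import os


def download_all(files, target_dir='vcf_downloads'):
    """Download each file into `target_dir`, skipping existing local copies.

    Returns the list of local paths, whether newly downloaded or not.
    """
    os.makedirs(target_dir, exist_ok=True)
    local_paths = []
    for f in files:
        path = os.path.join(target_dir, f.name)
        if not os.path.exists(path):
            f.download(path=path)
        local_paths.append(path)
    return local_paths
```

Since file names may be non-unique, two files with the same name would map to the same local path here; prefixing the name with the file id is one way around that.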

# Additional Information

Detailed documentation of this REST API is available [here](http://docs.sevenbridges.com/docs/the-api). Details of particular API calls are in the linked _recipes_. The sevenbridges-python bindings are available on [GitHub](https://github.com/sbg/sevenbridges-python), with their documentation [here](http://sevenbridges-python.readthedocs.io/en/latest/quickstart).