# How can I make, validate, and run a batch task?
### Overview
Batching allows you to run identical analyses on different data by supplying multiple input files and grouping them by specified metadata criteria. For instance, you can group input files by File, Sample, Library, Platform unit, or File segment. By using Batch Input, you can process multiple datasets with a single workflow and the same parameter settings, without having to set up the workflow multiple times. Batching creates one parent task containing multiple child tasks: one for each group of files.

Learn more about [performing a batch analysis](http://docs.sevenbridges.com/docs/perform-batch-analysis) from our Knowledge Center.
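
To make the grouping concrete, here is a minimal pure-Python sketch of how a batch criterion partitions input files into groups, with one child task per group. It only illustrates the idea and is not how the Platform implements batching; the file records, the `sample_id` key, and the `group_for_batch` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical file records: a name plus a metadata dictionary
files = [
    {'name': 'a_1.bam', 'metadata': {'sample_id': 'A'}},
    {'name': 'a_2.bam', 'metadata': {'sample_id': 'A'}},
    {'name': 'b_1.bam', 'metadata': {'sample_id': 'B'}},
]

def group_for_batch(file_records, criterion=None):
    """Group files by a metadata key; with no criterion, each file is its own group."""
    groups = defaultdict(list)
    for record in file_records:
        key = record['name'] if criterion is None else record['metadata'].get(criterion)
        groups[key].append(record['name'])
    return dict(groups)

by_file = group_for_batch(files)                 # batch by File
by_sample = group_for_batch(files, 'sample_id')  # batch by Sample
print(len(by_file), 'child tasks when batching by file')
print(len(by_sample), 'child tasks when batching by sample')
```

Batching by file, as in this tutorial, is the simplest case: every input file becomes its own group.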

### Objective
This tutorial introduces you to performing an analysis where you batch by file using the API with the `sevenbridges-python` bindings library.

### Cost
This will burn through some processing credits (**about \$0.48** per file). Note that when you signed up for the Seven Bridges Platform, your account was automatically credited with \$100 in free funds (your Pilot Funds) to use for data analyses. 

If you don't want to use your credits, you can [create a DRAFT task](http://docs.sevenbridges.com/docs/create-a-new-task) without running it, just to see how batching works. To do this, comment out the following line in step 5 below:

```python
    my_task.run()
```

### Procedure
We are going to start from scratch in this tutorial. Below, find a list of procedures with links to okAPI recipes containing example Python scripts and the relevant API requests from our API reference library.
 
 1. Create a project. [[recipe](../../Recipes/SBPLAT/projects_makeNew.ipynb)]  [[reference](http://docs.sevenbridges.com/docs/create-a-new-project)]
 2. (optional) Add members. [[recipe](../../Recipes/SBPLAT/projects_addMembers.ipynb)]  [[reference](http://docs.sevenbridges.com/docs/add-a-member-to-a-project)]
 3. Copy Whole Genome Sequencing (WGS) bam files from the [CCLE](https://igor.sbgenomics.com/u/sevenbridges/cancer-cell-line-encyclopedia-ccle/) public project. [[recipe](../../Recipes/SBPLAT/files_copyFromMyProject.ipynb)]  [[reference 1](http://docs.sevenbridges.com/docs/list-files-primary-method)] [[reference 2](http://docs.sevenbridges.com/docs/copy-a-file) ]
 4. Copy the workflow *CNVnator Analysis* from the Seven Bridges [Public Apps](http://docs.sevenbridges.com/docs/public-apps) repository. [[recipe](../../Recipes/SBPLAT/apps_copyFromPublicApps.ipynb)]  [[reference 1](http://docs.sevenbridges.com/docs/list-all-apps-available-to-you)] [[reference 2](http://docs.sevenbridges.com/docs/copy-an-app)]
 5. Create, check, and start a batch task:
  * Find task inputs. [[recipe](../../Recipes/SBPLAT/apps_detailOne.ipynb)]  [[reference](http://docs.sevenbridges.com/docs/get-raw-cwl-for-an-app-revision)]
  * Create a batch task where you batch by `item`. [[reference](http://docs.sevenbridges.com/docs/create-a-new-task)]
  * Check our draft task for errors.
  * Run the analysis. [[recipe](../../Recipes/SBPLAT/tasks_create.ipynb)]  [[reference](http://docs.sevenbridges.com/docs/perform-an-action-on-a-specific-task)]
 
Throughout this tutorial, we will link back to different recipes in case you need more detail about the calls. We will also link to our API reference, a list of comprehensive API requests in our documentation, for each call. Both links will be under the **PROTIPS** section heading at the end of the markdown section.

### Prerequisites
1. You need your **authentication token** and the API needs to know about it. See <a href="Setup_API_environment.ipynb">Setup_API_environment.ipynb</a> for details.
2. You have cloned the Cancer Cell Line Encyclopedia (CCLE) public project. We will walk through that in the markdown of Step 3.


## Imports
We import the official sevenbridges-python bindings, which provide the _Api_ class, below.

```python
import sevenbridges as sbg
```

## Initialize the object
The _Api_ object needs to know your **auth\_token** and the correct path. Here we assume you are using the .sbgrc file in your home directory. For other options, see <a href="Setup_API_environment.ipynb">Setup_API_environment.ipynb</a>.

```python
# [USER INPUT] Specify the profile name from your .sbgrc file (for example 'cgc' or 'sbpla')
prof = 'sbpla'

config_file = sbg.Config(profile=prof)
api = sbg.Api(config=config_file)
```

## 1) Create a new project
We create a project using your first billing group. The project is described by a small dictionary including the following:
* **billing_group**   *Billing group* that will be charged for this project.
* **description**   (optional) Project description
* **name**   Name of the project. The Platform allows *non-unique* names, but this tutorial stops if the name is already taken.
* **tags**   List of tags, currently _unused_. Tags **cannot** be set while creating a project.

#### PROTIPS
 * A detailed recipe for creating projects is [here](../../Recipes/SBPLAT/projects_makeNew.ipynb)
 * Detailed documentation of this particular REST architectural style request is available [here](http://docs.sevenbridges.com/docs/create-a-new-project)

```python
# [USER INPUT] Set project name here:
new_project_name = 'cici_pici'

# What are my funding sources?
billing_groups = api.billing_groups.query()

# Pick the first group (arbitrary)
print(billing_groups[0].name +
      ' will be charged for computation and storage (if applicable) for your new project')

# Set up the information for your new project
new_project = {
    'billing_group': billing_groups[0].id,
    'description': """A project created by the API recipe (projects_makeNew.ipynb).
                      This also supports **markdown**
                      _Pretty cool_, right?
                   """,
    'name': new_project_name
}

# Check whether this project already exists: LIST all projects and look for a name match
my_project = [p for p in api.projects.query(limit=100).all()
              if p.name == new_project_name]

if my_project:    # an empty list is falsy; a non-empty list is truthy
    print('A project with the name (%s) already exists, please choose a unique name'
          % new_project_name)
    raise KeyboardInterrupt
else:
    # CREATE the new project
    my_project = api.projects.create(name=new_project['name'],
                                     billing_group=new_project['billing_group'],
                                     description=new_project['description'])

    # (Re)list all projects and get your new project
    my_project = [p for p in api.projects.query(limit=100).all()
                  if p.name == new_project_name][0]
```
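
The "list everything, then match on name" pattern above recurs later in this tutorial, so it can be factored into a small helper. The sketch below uses stand-in objects instead of live API results so that it runs without credentials; `find_by_name` is a hypothetical helper and not part of sevenbridges-python.

```python
from types import SimpleNamespace

def find_by_name(resources, name):
    """Return the first resource whose .name matches, or None if there is no match."""
    return next((r for r in resources if r.name == name), None)

# Stand-ins for the objects returned by api.projects.query(limit=100).all()
projects = [SimpleNamespace(id='me/alpha', name='alpha'),
            SimpleNamespace(id='me/cici-pici', name='cici_pici')]

existing = find_by_name(projects, 'cici_pici')
missing = find_by_name(projects, 'not_there')
print(existing.id if existing else 'safe to create')
print(missing.id if missing else 'safe to create')
```

Returning `None` rather than an empty list makes the "does it exist?" test and the "give me the object" step a single call.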

## 2) (optional) Add project members
Teamwork - it gets stuff done! You might want to add some members to your project. If so please follow the next cell. Otherwise, skip forward to step 3.

#### PROTIPS
 * A detailed recipe for adding members to project is [here](../../Recipes/SBPLAT/projects_addMembers.ipynb).
 * Detailed documentation of this particular REST architectural style request is available [here](http://docs.sevenbridges.com/docs/add-a-member-to-a-project).

```python
# [USER INPUT] List names of members to add (prefilled with Jacqueline & Fede):
user_names = ['jrosains',
              'ftorri']

# Permissions - here we are assigning all users the same permissions (could also be a list)
user_permissions = {'write': True,
                    'read': True,
                    'copy': True,
                    'execute': False,
                    'admin': False
                    }

for name in user_names:
    my_project.add_member(user=name, permissions=user_permissions)
```
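
The code comment above notes that the permissions could also be a list. A minimal sketch of that variant follows, assuming one permissions dictionary per user; the `contributor` and `read_only` names and the values chosen for them are illustrative only, and the `add_member` call is left commented out so the block runs without an API connection.

```python
user_names = ['jrosains', 'ftorri']

contributor = {'write': True, 'read': True, 'copy': True,
               'execute': False, 'admin': False}
read_only = dict(contributor, write=False, copy=False)

# One permissions dictionary per user, in the same order as user_names
user_permissions = [contributor, read_only]

assignments = dict(zip(user_names, user_permissions))
for name, permissions in assignments.items():
    granted = sorted(key for key, value in permissions.items() if value)
    print(name, granted)
    # my_project.add_member(user=name, permissions=permissions)
```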

## 3) Copy WGS bam files from the CCLE project
The Cancer Cell Line Encyclopedia (CCLE) public project contains Open Access sequencing data (in the form of reads aligned to the hg19 broad variant reference genome) for nearly 1000 cancer cell line samples. You can use the data contained within this project for your analyses on the Platform. Learn more about the [CCLE public project](http://docs.sevenbridges.com/docs/ccle).

For this tutorial, we will obtain our files from the CCLE public project. To do so, we will first clone this project on the visual interface. This step cannot be done with the API.

### Clone the project (GUI)
Log in to the Seven Bridges [Platform]() and click on **Public projects**. From the drop-down menu, select **Cancer Cell Line Encyclopedia (CCLE)**. Near the top of the screen, click **Copy this project**.

<img src = "images/CCLE_0.png" height="462" width="780"> 

A dialog box prompts you for the new project name. Rename the project or simply press the **Copy** button.

<img src = "images/CCLE_1.png" height="288" width="405"> 

You will be taken to your new project.

<img src = "images/CCLE_2.png" height="416" width="780"> 

### Search and copy files
Now that we have the project copied, we can access all of its files. We will search files within that project and copy the files containing:

 * an experimental strategy of **WXS**
 * a file extension of **bam**

#### PROTIPS
 * A detailed, related recipe for copying files from a project is [here](../../Recipes/SBPLAT/files_copyFromMyProject.ipynb).
 * Detailed documentation of these particular REST architectural style requests is available [here (list files)](http://docs.sevenbridges.com/v1.0/docs/list-files-primary-method) and [here (copy files)](http://docs.sevenbridges.com/docs/copy-a-file).

```python
# [USER INPUT] Set the source project name:
source_project_name = 'Copy of Cancer Cell Line Encyclopedia (CCLE)'
files_to_copy = 10
reference_genome = 'HG19_Broad_variant.fasta'

# Get details of your source project
source_project = [p for p in api.projects.query(limit=100).all()
                  if p.name == source_project_name]

if not source_project:  # an empty list is falsy; a non-empty list is truthy
    print('Source project (%s) not found, check spelling' % source_project_name)
    raise KeyboardInterrupt
else:
    source_project = source_project[0]

# List the files in the source project that match the experimental strategy filter,
# then keep only the BAM files
source_files = api.files.query(limit=100, project=source_project,
                               metadata={'experimental_strategy': 'WXS'})
source_files = [f for f in source_files.all() if f.name.endswith('.bam')]

# List the files you already have
my_file_names = [f.name for f in
                 api.files.query(limit=100, project=my_project.id).all()]

# Copy files to your project
bam_files = []    # will use this list later as an input
count = 0
for f in source_files:
    if f.name in my_file_names:
        print('file already exists in your project, skipping')
        bam_files.append(api.files.query(project=my_project,
                                         names=[f.name])[0])
    else:
        print('File (%s) does not exist in Project (%s); copying now' %
              (f.name, my_project.name))
        new_f = f.copy(project=my_project)
        bam_files.append(new_f)
    count += 1
    if count >= files_to_copy:
        break

# Get the reference genome from the same project
ref_file = api.files.query(limit=100,
                           project=source_project,
                           names=[reference_genome])[0]

if ref_file.name in my_file_names:
    print('file already exists in your project, skipping')
    # Reuse the existing copy so that ref_genome is always defined for step 5
    ref_genome = api.files.query(project=my_project,
                                 names=[ref_file.name])[0]
else:
    print('File (%s) does not exist in Project (%s); copying now' %
          (ref_file.name, my_project.name))
    ref_genome = ref_file.copy(project=my_project)
```
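
The selection logic in the cell above (keep BAM files, reuse names already present, stop after `files_to_copy`) can be sketched on plain data, which makes it easy to check before touching real files. `plan_copies` is a hypothetical helper that works on file names only; it makes no API calls.

```python
def plan_copies(source_names, existing_names, limit, extension='.bam'):
    """Split up to `limit` matching source files into (to_copy, already_present)."""
    to_copy, already_present = [], []
    for name in source_names:
        if not name.endswith(extension):
            continue   # ignore index files and other non-BAM files
        if len(to_copy) + len(already_present) >= limit:
            break      # files that are reused also count toward the limit
        if name in existing_names:
            already_present.append(name)
        else:
            to_copy.append(name)
    return to_copy, already_present

source_names = ['s1.bam', 's1.bam.bai', 's2.bam', 's3.bam', 's4.bam']
existing_names = {'s2.bam'}
to_copy, already_present = plan_copies(source_names, existing_names, limit=3)
print('copy:', to_copy)
print('reuse:', already_present)
```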

## 4) Create a workflow from the Application JSON
We will load a tool from its JSON ([located here](files/CNVnator_WF.json)) because it has been modified from the version in **Public Apps**. This is _not_ the most common user-flow, but it may be useful to see. We need to import `json` here to do this correctly. Please be **careful** when exporting and importing Apps as normal copy-paste operations may induce JSON formatting errors.

#### PROTIPS
 * Detailed documentation of this particular REST architectural style request is available [here](http://docs.sevenbridges.com/docs/add-an-app-using-raw-cwl).

```python
# Load the Application JSON
import json

# The context manager closes the file once it has been read
with open('files/CNVnator_WF.json', 'r') as f:
    tool = json.load(f)

# Create the app
a_id = my_project.id + '/cnvnator'
my_app = api.apps.install_app(id=a_id, raw=tool)
```
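
Because copy-paste can corrupt the JSON, it may help to validate the text before installing the app. The sketch below is one possible pre-check using only the standard library; `load_app_json` is a hypothetical helper, and the sample JSON strings (including the `#bam_files` id) are made up for illustration.

```python
import json

def load_app_json(raw_text):
    """Parse raw app JSON and fail early with a readable message if it is malformed."""
    try:
        app = json.loads(raw_text)
    except json.JSONDecodeError as err:
        raise ValueError('App JSON is malformed at line %d, column %d: %s'
                         % (err.lineno, err.colno, err.msg))
    if 'inputs' not in app:
        raise ValueError("App JSON has no 'inputs' section")
    return app

good = '{"class": "Workflow", "inputs": [{"id": "#bam_files"}]}'
bad = '{"class": "Workflow", "inputs": [{"id": "#bam_files"},]}'   # trailing comma

app = load_app_json(good)
print([entry['id'] for entry in app['inputs']])

try:
    load_app_json(bad)
except ValueError as err:
    error_message = str(err)
    print(error_message)
```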

## 5) Create, check, and start a _batch_ of tasks
We need to take a few steps here to properly execute a batch task. 

 1. Get the task inputs from the raw CWL.
 2. Set up the task, feed a _list_ to one input, and set the task to be a **batch** task.
 3. Check for any _warnings_ or _errors_ in the created batch task.
 4. Start the batch task. Child tasks will be created automatically.
 
#### PROTIPS
 * Detailed documentation of this particular REST architectural style request is available [here (get inputs)](http://docs.sevenbridges.com/docs/get-raw-cwl-for-an-app-revision), [here (create a draft task)](http://docs.sevenbridges.com/docs/create-a-new-task), and [here (run task)](http://docs.sevenbridges.com/docs/perform-an-action-on-a-specific-task).
 * Learn more about what happens when you run a task from [our documentation](http://docs.sevenbridges.com/blog/what-happens-when-i-run-a-task).

```python
# Get task inputs
print("  Task (%s) inputs:" % (my_app.name))
for in_a in my_app.raw['inputs']:
    print(in_a['id'])

# Set up a task
task_name = 'task created with batch_o_tasks_standard.ipynb'
inputs = {
    'ref_genome': ref_genome,
    'bam_files': bam_files,   # we set this up a few cells ago
    'histogram': 100,
    'evaluation': 100,
    'calling': 100,
    'partitioning': 100,
    'no_gc_correction': False,
    'statistics': 100
}

# Create a draft batch task: one child task per item in the 'bam_files' list
my_task = api.tasks.create(name=task_name, project=my_project,
                           app=my_app, inputs=inputs,
                           batch_input='bam_files',
                           batch_by={"type": "ITEM"})

# Check for errors and warnings
if my_task.errors:
    print(my_task.errors)
# elif my_task.warnings:        # feature is in staging
#     print(my_task.warnings)
else:
    print('Your tasks are good to go, launching!')

    # Start the task
    my_task.run()
```
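
Before creating the draft task, you may want a quick local check that the `inputs` dictionary matches the ids printed from the raw CWL and that the batched input really is a list. The sketch below is a hypothetical pre-flight helper, not part of sevenbridges-python; it assumes the ids in `my_app.raw['inputs']` may carry a leading `#`, and it uses stand-in data so that it runs without an API connection.

```python
def check_batch_inputs(app_inputs, inputs, batch_input):
    """Return a list of problems found before creating a batch task (empty if none)."""
    known_ids = {entry['id'].lstrip('#') for entry in app_inputs}
    problems = ['unknown input: %s' % key
                for key in sorted(inputs) if key not in known_ids]
    if batch_input not in inputs:
        problems.append('batch input %s is not set' % batch_input)
    elif not isinstance(inputs[batch_input], list):
        problems.append('batch input %s must be a list' % batch_input)
    return problems

# Stand-in for my_app.raw['inputs']; the '#'-prefixed ids are an assumption
app_inputs = [{'id': '#ref_genome'}, {'id': '#bam_files'}, {'id': '#histogram'}]

ok = check_batch_inputs(app_inputs,
                        {'ref_genome': 'ref.fasta', 'bam_files': ['a.bam', 'b.bam']},
                        'bam_files')
not_ok = check_batch_inputs(app_inputs,
                            {'bam_file': 'a.bam', 'histogram': 100},
                            'bam_files')
print(ok)
print(not_ok)
```

This complements, rather than replaces, the `my_task.errors` check above, which reports what the Platform itself finds in the draft task.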

**Good luck, have fun!**