# How can I make, validate, and run a batch task?
### Overview
We are going to _start from scratch_ in this tutorial. Specifically, we will:
 
 1. Create a project
 2. (optional) Add members
 3. Copy WGS bam files from Public [CCLE project](https://igor.sbgenomics.com/u/sevenbridges/cancer-cell-line-encyclopedia-ccle/)
 4. Install the workflow _CNVnator Analysis_ from a JSON file (a modified version of the app in Public Apps)
 5. Create a batch task
 6. Check for errors
 7. Spin it up
 
Throughout this **tutorial**, we will link back to different **recipes** in case you need more detail about the calls. We will also link to the **documentation** for each call. Both links will be under the **PROTIPS** section heading at the end of the markdown section.

### Prerequisites
 1. You need your _authentication token_ and the API needs to know about it. See <a href="Setup_API_environment.ipynb">**Setup_API_environment.ipynb**</a> for details.
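 2. You have copied the Public Project _Cancer Cell Line Encyclopedia (CCLE)_ into your account. The code in Step 3 expects the copy to be named _Copy of Cancer Cell Line Encyclopedia (CCLE)_; change `source_project_name` there if yours differs.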

 
### WARNING
This will burn through some processing credits (**about \$0.48** per file). You can create _DRAFT_ tasks without running them, just to see how it works. To do this, comment out the line:

```python
my_task.run()
```
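To put the warning in numbers, here is a back-of-the-envelope estimate. It is a minimal sketch that assumes the ~\$0.48 figure is a flat per-file cost; real charges depend on instance type, file size, and runtime. `estimate_batch_cost` is a hypothetical helper, not part of sevenbridges-python.

```python
# ASSUMPTION: the ~$0.48 per-file figure from the WARNING is a flat per-file cost.
COST_PER_FILE_USD = 0.48


def estimate_batch_cost(n_files, cost_per_file=COST_PER_FILE_USD):
    """Return the approximate cost (USD) of running one child task per file."""
    if n_files < 0:
        raise ValueError("n_files must be non-negative")
    return round(n_files * cost_per_file, 2)


# files_to_copy = 10 in Step 3, so the whole batch costs roughly:
print(estimate_batch_cost(10))  # 4.8
```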

## Imports
We import the official sevenbridges-python bindings below; the _Api_ and _Config_ classes are then reached through the `sbg` alias.

In [None]:
import sevenbridges as sbg

## Initialize the object
The _Api_ object needs to know your **auth\_token** and the correct API endpoint. Here we assume you are using the .sbgrc file in your home directory. For other options, see <a href="Setup_API_environment.ipynb">Setup_API_environment.ipynb</a>.

In [None]:
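# [USER INPUT] Specify the profile name from your credentials file (e.g. cgc, sbg)
prof = 'sbpla'


# Read the endpoint and token for that profile, then build the API client
config_file = sbg.Config(profile=prof)
api = sbg.Api(config=config_file)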
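To see what a "profile" is, here is a hedged sketch that reads one profile out of an INI-style credentials text. It assumes the layout sevenbridges-python uses for its credentials file (a `[profile]` section with `api_endpoint` and `auth_token` keys); check your own file, as the contents below are placeholders and `read_profile` is a hypothetical helper.

```python
import configparser

# Hypothetical file contents; never paste a real token into a notebook.
EXAMPLE_CREDENTIALS = """
[sbpla]
api_endpoint = https://api.sbgenomics.com/v2
auth_token = <your-token-here>
"""


def read_profile(text, profile):
    """Return the endpoint and token stored under one profile of an INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(profile):
        raise KeyError("profile %r not found; available: %s"
                       % (profile, parser.sections()))
    section = parser[profile]
    return {"api_endpoint": section["api_endpoint"],
            "auth_token": section["auth_token"]}


print(read_profile(EXAMPLE_CREDENTIALS, "sbpla")["api_endpoint"])
```

A `KeyError` here is the same kind of mistake as a mistyped `prof` value in the cell above.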

## 1) Create a new project
We create a project using your first billing group. The project is described by a small dictionary:
* **billing_group** *Billing group* that will be charged for this project
* **description**   (optional) Project description
* **name**   Name of the project; names may be *non-unique*, so the code below first checks for an existing project with the same name
* **tags**   List of tags; currently _unused_, and they **cannot** be set while creating a project
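The rules in this list can be checked locally before any request is sent. This is a minimal sketch; `check_project_payload` is a hypothetical helper that only mirrors the bullets above, and the platform remains the authority on what a valid request looks like.

```python
# Mirrors the bullet list above: name and billing_group required,
# description optional, tags not allowed at creation time.
REQUIRED_KEYS = {"name", "billing_group"}
OPTIONAL_KEYS = {"description"}


def check_project_payload(payload):
    """Return a list of human-readable problems with a new-project dict (empty if none)."""
    keys = set(payload)
    problems = []
    missing = REQUIRED_KEYS - keys
    if missing:
        problems.append("missing required keys: %s" % sorted(missing))
    if "tags" in keys:
        problems.append("'tags' cannot be set while creating a project")
    unknown = keys - REQUIRED_KEYS - OPTIONAL_KEYS - {"tags"}
    if unknown:
        problems.append("unknown keys: %s" % sorted(unknown))
    return problems


print(check_project_payload({"name": "cici_pici", "billing_group": "<billing-group-id>"}))  # []
print(check_project_payload({"name": "cici_pici", "tags": ["x"]}))
```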

#### PROTIPS
 * A detailed _recipe_ for creating projects is [here](../../Recipes/SBPLAT/projects_makeNew.ipynb)
 * Detailed documentation of this REST API request is available [here](http://docs.sevenbridges.com/docs/create-a-new-project)

In [None]:
# [USER INPUT] Set project name here:
new_project_name = 'cici_pici'


# What are my funding sources?
billing_groups = api.billing_groups.query()

# Pick the first group (arbitrary)
print(billing_groups[0].name +
      ' will be charged for computation and storage (if applicable) for your new project')

# Set up the information for your new project
new_project = {
        'billing_group': billing_groups[0].id,
        'description': """A project created by the API recipe (projects_makeNew.ipynb).
                      This also supports **markdown**
                      _Pretty cool_, right?
                   """,
        'name': new_project_name
}

# Check whether this project already exists: LIST all projects and look for a name match
my_project = [p for p in api.projects.query(limit=100).all()
              if p.name == new_project_name]

if my_project:    # an empty list is falsy, a non-empty list is truthy
    print('A project with the name (%s) already exists, please choose a unique name'
          % new_project_name)
    raise KeyboardInterrupt    # stop the notebook here
else:
    # CREATE the new project; create() returns the new Project object
    my_project = api.projects.create(name=new_project['name'],
                                     billing_group=new_project['billing_group'],
                                     description=new_project['description'])
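The name check above builds a full list and then indexes it. A hedged alternative is to stop at the first match with `next()`, which returns `None` instead of raising `IndexError` when nothing matches. `find_by_name` is a hypothetical helper; the stand-in objects below replace the real `api.projects.query(limit=100).all()` results so the sketch runs offline.

```python
from types import SimpleNamespace


def find_by_name(resources, name):
    """Return the first resource whose .name equals `name` (case-sensitive), or None."""
    return next((r for r in resources if r.name == name), None)


# Stand-ins; with the real client, pass api.projects.query(limit=100).all()
projects = [SimpleNamespace(name="demo"), SimpleNamespace(name="cici_pici")]
print(find_by_name(projects, "cici_pici").name)  # cici_pici
print(find_by_name(projects, "missing"))         # None
```

The same helper would also fit the source-project lookup in Step 3.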

## 2) (optional) Add project members
Teamwork - it gets stuff done! You might want to add some members to your project; if so, run the next cell.

#### PROTIPS
 * A detailed _recipe_ for adding members to project is [here](../../Recipes/SBPLAT/projects_addMembers.ipynb)
 * Detailed documentation of this REST API request is available [here](http://docs.sevenbridges.com/docs/add-a-member-to-a-project)

In [None]:
# [USER INPUT] List names of members to add (prefilled with Jacqueline & Fede):
user_names = ['jrosains',
              'ftorri']


# Permissions - here we are assigning all users the same permissions (could also be a list)
user_permissions = {'write': True,
                    'read': True,
                    'copy': True,
                    'execute': False,
                    'admin': False
                    }

for name in user_names:
    my_project.add_member(user = name, permissions = user_permissions)
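The comment in the cell notes that permissions need not be identical for everyone. A minimal sketch of per-user permissions, assuming only the five keys used above: each user starts from the defaults and optional overrides are merged on top. `permissions_for` is a hypothetical helper.

```python
DEFAULT_PERMISSIONS = {"write": True, "read": True, "copy": True,
                       "execute": False, "admin": False}


def permissions_for(user_names, overrides=None):
    """Return {user: permissions}, merging optional per-user overrides onto the defaults."""
    overrides = overrides or {}
    table = {}
    for name in user_names:
        perms = dict(DEFAULT_PERMISSIONS)     # fresh copy, so users never share one dict
        perms.update(overrides.get(name, {}))
        table[name] = perms
    return table


table = permissions_for(["jrosains", "ftorri"], overrides={"ftorri": {"execute": True}})
print(table["ftorri"]["execute"], table["jrosains"]["execute"])  # True False

# With the real client you would then loop:
# for name, perms in table.items():
#     my_project.add_member(user=name, permissions=perms)
```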

## 3) Copy WGS bam files from the CCLE project
There is a helpful Public Project on the Seven Bridges Platform called CCLE. We are going to take all of our files from there. 

**@Jack, deleted the section about cloning below, you might have to adjust the code block below this text to match**

### Search and copy files
We have copy permissions for all of the files in the CCLE project. We will search files within that project and copy the ones which fit our criteria - listed here:

 * experimental strategy is **WXS**
 * file extension is **bam**
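Both criteria can be expressed as one predicate. In the cell below, the platform applies the strategy filter server-side and Python applies the extension check; this offline sketch applies both locally to stand-in objects (a hypothetical `matches_criteria` helper, with a `metadata` dict assumed on each file) so you can see which names survive. Requiring the `.bam` suffix also excludes index files such as `.bam.bai`.

```python
from types import SimpleNamespace


def matches_criteria(f, strategy="WXS", extension=".bam"):
    """True if a file's experimental strategy and extension match the criteria above."""
    metadata = f.metadata or {}
    return (metadata.get("experimental_strategy") == strategy
            and f.name.lower().endswith(extension))


files = [
    SimpleNamespace(name="a.bam", metadata={"experimental_strategy": "WXS"}),
    SimpleNamespace(name="a.bam.bai", metadata={"experimental_strategy": "WXS"}),
    SimpleNamespace(name="b.bam", metadata={"experimental_strategy": "WGS"}),
    SimpleNamespace(name="c.bam", metadata=None),
]
print([f.name for f in files if matches_criteria(f)])  # ['a.bam']
```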

#### PROTIPS
 * A detailed, related _recipe_ for copying files from a project is [here](../../Recipes/SBPLAT/files_copyFromMyProject.ipynb)
 * Detailed documentation of these REST API requests is available [here (list files)](http://docs.sevenbridges.com/v1.0/docs/list-files-primary-method) and [here (copy files)](http://docs.sevenbridges.com/docs/copy-a-file)

In [None]:
# [USER INPUT] Set the source project name:
source_project_name = 'Copy of Cancer Cell Line Encyclopedia (CCLE)'
files_to_copy = 10
reference_genome = 'HG19_Broad_variant.fasta'


# get details of your source project
source_project = [p for p in api.projects.query(limit=100).all()
                  if p.name == source_project_name]

if not source_project:  # an empty list is falsy, so this catches "no match"
    print('Source project (%s) not found, check spelling' % source_project_name)
    raise KeyboardInterrupt    # stop the notebook here
else:
    source_project = source_project[0]

# list the source-project files whose experimental_strategy is WXS, keep only the BAM files
source_files = api.files.query(limit=100, project=source_project,
                               metadata={'experimental_strategy': 'WXS'})
source_files = [f for f in source_files.all() if f.name.endswith('.bam')]

# List the files you already have
my_file_names = [f.name for f in
                 api.files.query(limit=100, project=my_project.id).all()]

# Copy files to your project
bam_files = []    # will use this list later as an input
count = 0
for f in source_files:
    if f.name in my_file_names:
        print('File (%s) already exists in your project, skipping' % f.name)
        bam_files.append(api.files.query(project=my_project,
                                         names=[f.name])[0])
    else:
        print('File (%s) does not exist in Project (%s); copying now' %
              (f.name, my_project.name))
        new_f = f.copy(project=my_project)
        bam_files.append(new_f)
    count += 1
    if count >= files_to_copy:
        break

# Get the reference_genome from the same project
ref_file = api.files.query(limit=100,
                           project=source_project,
                           names=[reference_genome])[0]

if ref_file.name in my_file_names:
    print('File (%s) already exists in your project, skipping' % ref_file.name)
    # reuse the existing copy so ref_genome is defined for Step 5
    ref_genome = api.files.query(project=my_project,
                                 names=[ref_file.name])[0]
else:
    print('File (%s) does not exist in Project (%s); copying now' %
          (ref_file.name, my_project.name))
    ref_genome = ref_file.copy(project=my_project)
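Because copies count against storage, it can help to decide what the loop above will do before any API call. This is a hedged sketch: `plan_copies` is a hypothetical helper that mirrors the loop's logic (the first `files_to_copy` matches count toward the limit whether they are copied or already present) using plain file names.

```python
def plan_copies(source_names, existing_names, limit):
    """Split the first `limit` source files into (to_copy, already_present), keeping order."""
    existing = set(existing_names)
    selected = list(source_names)[:limit]
    to_copy = [n for n in selected if n not in existing]
    already_present = [n for n in selected if n in existing]
    return to_copy, already_present


to_copy, present = plan_copies(["s1.bam", "s2.bam", "s3.bam"], ["s2.bam"], limit=2)
print(to_copy, present)  # ['s1.bam'] ['s2.bam']
```

Printing this plan first gives a dry run; the API calls can then be made only for `to_copy`.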

## 4) Create a workflow from the Application JSON
We will load a tool from its JSON here because it has been modified from the version in _Public Apps_. This is _not_ the most common user flow, but it may be useful to see. We need to import _json_ to do this. Please be **careful** when exporting and importing apps, as normal _copy-paste_ operations may introduce JSON formatting errors.

#### PROTIPS
 * Detailed documentation of this REST API request is available [here](http://docs.sevenbridges.com/docs/add-an-app-using-raw-cwl)

In [None]:
# Load the application JSON
import json

with open('files/CNVnator_WF.json', 'r') as f:    # the file is closed automatically
    tool = json.load(f)

# Install the app into your project as <project id>/cnvnator
a_id = my_project.id + '/cnvnator'
my_app = api.apps.install_app(id=a_id, raw=tool)
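Since copy-paste can corrupt an exported app, a quick local check before `install_app` can save a confusing API error. A minimal sketch, assuming the export is a JSON object with an `inputs` list whose entries carry an `id` (in some CWL versions these ids start with `#`). `load_app_json` is a hypothetical helper; it reports the line and column where parsing failed.

```python
import json


def load_app_json(text):
    """Parse an exported app and return its input ids; raise ValueError on bad JSON."""
    try:
        app = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError("app JSON is malformed at line %d, column %d: %s"
                         % (err.lineno, err.colno, err.msg)) from err
    if not isinstance(app, dict) or "inputs" not in app:
        raise ValueError("app JSON has no 'inputs' list; is this really an app export?")
    return [inp["id"] for inp in app["inputs"]]


print(load_app_json('{"class": "Workflow", "inputs": [{"id": "#bam_files"}]}'))
# ['#bam_files']
```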

## 5) Create, check, and start a _batch_ of tasks
We need to take a few steps here to properly execute a batch task. 

 1. Get the task inputs
 2. Set up the task, feed a _list_ to one input, and set the task to be a **batch** task
 3. Check for any _warnings_ or _errors_ in the created batch task
 4. Start the batch task, _child tasks_ will be created automatically
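Conceptually, an `ITEM` batch over `bam_files` produces one child task per file while every other input is shared. The sketch below only illustrates that idea locally; it is not how the platform implements batching. `expand_batch` is hypothetical, and whether each child receives a one-element list or the bare file is an assumption (a list is used here because `bam_files` is a list input).

```python
def expand_batch(inputs, batch_input):
    """Illustrate ITEM batching: one child-input dict per element of inputs[batch_input]."""
    items = inputs[batch_input]
    if not isinstance(items, list) or not items:
        raise ValueError("batch input %r must be a non-empty list" % batch_input)
    children = []
    for item in items:
        child = dict(inputs)          # shallow copy: non-batched inputs are shared
        child[batch_input] = [item]   # ASSUMPTION: each child gets a one-element list
        children.append(child)
    return children


example = {"ref_genome": "HG19_Broad_variant.fasta",
           "bam_files": ["s1.bam", "s2.bam", "s3.bam"],
           "histogram": 100}
for child in expand_batch(example, "bam_files"):
    print(child["bam_files"], child["ref_genome"])
```

With `files_to_copy = 10`, this view predicts ten child tasks, which is also what drives the cost estimate in the WARNING.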
 
#### PROTIPS
 * Detailed documentation of these REST API requests is available [here (get inputs)](http://docs.sevenbridges.com/docs/get-task-inputs), [here (create task)](http://docs.sevenbridges.com/docs/create-a-new-task), and [here (run task)](http://docs.sevenbridges.com/docs/perform-an-action-on-a-specific-task)

In [None]:
# Get task inputs
print("  Task (%s) inputs:" % my_app.name)
for in_a in my_app.raw['inputs']:
    print(in_a['id'])

# Set up a task
task_name = 'task created with batch_o_tasks_standard.ipynb'
inputs = {
    'ref_genome': ref_genome,
    'bam_files': bam_files,   # we set this up in Step 3
    'histogram': 100,
    'evaluation': 100,
    'calling': 100,
    'partitioning': 100,
    'no_gc_correction': False,
    'statistics': 100
}

# Batch on bam_files: one child task per item in the list
my_task = api.tasks.create(name=task_name, project=my_project,
                           app=my_app, inputs=inputs,
                           batch_input='bam_files',
                           batch_by={"type": "ITEM"})

# Check for errors and warnings
if my_task.errors:
    print(my_task.errors)
# elif my_task.warnings:        # feature is in staging
#     print(my_task.warnings)
else:
    print('Your tasks are good to go, launching!')

    # Start the task (comment this out to leave it as a DRAFT)
    my_task.run()
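A mistyped key in `inputs` is easy to miss. The cell above already prints `my_app.raw['inputs']`, and the same data can be compared with the keys you send before creating the task. A hedged sketch: `check_inputs` is hypothetical, strips a leading `#` from ids (an assumption about the app's CWL version), and uses a stand-in `app_raw`. Inputs reported as "missing" may simply be optional.

```python
def check_inputs(app_raw, inputs):
    """Compare app input ids with the keys you plan to send; return (missing, unknown)."""
    # ASSUMPTION: ids may carry a leading '#', so strip it before comparing
    app_ids = {inp["id"].lstrip("#") for inp in app_raw.get("inputs", [])}
    provided = set(inputs)
    missing = sorted(app_ids - provided)    # not sent (possibly optional)
    unknown = sorted(provided - app_ids)    # sent but not an app input: likely a typo
    return missing, unknown


app_raw = {"inputs": [{"id": "#ref_genome"}, {"id": "#bam_files"}, {"id": "#histogram"}]}
planned = {"ref_genome": "ref.fasta", "bam_files": ["s1.bam"], "statistic": 100}
print(check_inputs(app_raw, planned))  # (['histogram'], ['statistic'])
```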

**Good luck, have fun!**