# RNA sequencing workflow example

In this notebook, we'll walk through an RNA sequencing workflow in Ovation for Service Labs. Although every activity in the workflow can be completed in the web app (https://lab.ovation.io) by downloading and uploading files there, this notebook shows how to complete the workflow through the API, using existing bioinformatics tools.

## Setup

In [None]:
import os
import glob
import texttable

import ovation.lab.workflows as workflows
import ovation.lab.download as download
import ovation.lab.upload as upload

from ovation.session import connect
from tqdm import tqdm
from pprint import pprint

## Connection

This notebook starts with an interactive `Session` connection. If you already have a long-lived API token, you can create a session directly with:

    s = ovation.session.Session(token, api='https://lab-services.ovation.io')
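
To keep a long-lived token out of the notebook itself, you could read it from an environment variable and pass the result as `token` to the `Session` call above. This is a minimal sketch: the variable name `OVATION_TOKEN` and the helper `get_token` are hypothetical and not part of the Ovation client.

```python
import os
from getpass import getpass


def get_token(env_var="OVATION_TOKEN"):
    """Return an API token from the environment, or prompt for it.

    OVATION_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    # Fall back to a hidden prompt so the token is neither echoed nor saved in the notebook.
    return getpass("API token: ")
```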

In [None]:
s = connect(input('Email: '), api='https://services-staging.ovation.io', token='/api/v1/sessions')

## Workflow

We'll need to know which workflow to post data to.

In [None]:
workflow_id = input('Workflow ID: ')

In [None]:
r = s.get(s.entity_path('workflows', workflow_id))
workflow = r.workflow

Here's the full workflow: 
![RNA sequencing workflow](workflow.png)

The burnt-orange activities are most easily completed in the web app, so we'll assume they're done there. The sections below show the API calls for the light-orange activities.

What samples are in the pool?

In [None]:
samples = s.get(workflow.links.samples)

table = texttable.Texttable()
table.set_deco(texttable.Texttable.HEADER)
table.add_rows([["Identifier", "Date received"]] +
               [[sample.identifier, sample.date_received] for sample in samples])
print(table.draw())

What activities are in the workflow? The keys of `workflow.relationships` are the activity labels:

In [None]:
pprint(list(workflow.relationships.keys()))
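
Because every cell below refers to an activity by its label, it can help to confirm up front that the workflow defines all of them. A minimal sketch; `NOTEBOOK_LABELS` and `missing_labels` are hypothetical helpers, not part of `ovation.lab`, and the label list is simply collected from the cells in this notebook.

```python
# Activity labels used by the cells in this notebook (an assumption: adjust the
# list to match your own workflow template).
NOTEBOOK_LABELS = [
    "sequencing",
    "demultiplex",
    "sequencing_qc_prep_sortmerna",
    "sequencing_qc_prep_fastqc",
    "alignment_prep_trimmomatic",
    "alignment_prep_sortmerna",
    "alignment_prep_fastqc",
    "alignment",
    "bam_qc_prep",
]


def missing_labels(available, required=NOTEBOOK_LABELS):
    """Return the labels in `required` that are not in `available`, preserving order."""
    available = set(available)
    return [label for label in required if label not in available]


# Usage with the workflow loaded above:
#     missing_labels(workflow.relationships.keys())
```

An empty list means every label used below exists in the workflow.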

### Downloading files

In many activities, you'll want to download files produced by earlier activities (e.g., in the Sequencing QC Prep activities, the `fastq` files produced by demultiplexing). Use `ovation.lab.download.download_resources` to download the resources from a labeled activity. For example:

    # Download the `xml-file` from the sequencing activity to the current working directory
    download.download_resources(s, workflow, 'sequencing', 'xml-file', output=os.getcwd(), progress=tqdm)

### Create batch, pool

*Complete in web app*

### Sequencing

You can create the activity entirely through the API:

In [None]:
activity_label = 'sequencing'
metadata = {
    'clusterDensity': 2,
    'q30Read1': 1.5,
    'errorRateRead1': 0.23,
}  # read2, q30Read2, and errorRateRead2 are also required for paired-end runs

# Resources for Illumina sequencer
resources = {'nextseq-run-info': ['RunInfo.xml'], 
             'nextseq-run-completion-status': ['RunCompletionStatus.xml']
            }
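
The comment in the metadata dict notes that paired-end runs need three more fields. A small check like the one below, run before `create_activity`, can catch a missing field early. It is a sketch only: `missing_metadata_keys` is a hypothetical helper, and Ovation's server-side validation may require other fields as well.

```python
# Keys used in the single-read example above.
BASE_KEYS = ("clusterDensity", "q30Read1", "errorRateRead1")
# Additional keys that the comment above says paired-end runs require.
PAIRED_END_KEYS = ("read2", "q30Read2", "errorRateRead2")


def missing_metadata_keys(metadata, paired_end=False):
    """Return the expected metadata keys that are absent, in a stable order."""
    required = BASE_KEYS + (PAIRED_END_KEYS if paired_end else ())
    return [key for key in required if key not in metadata]
```

For the single-read `metadata` above, `missing_metadata_keys(metadata)` returns an empty list; with `paired_end=True` it lists the three read 2 fields.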

In [None]:
sequencing = workflows.create_activity(s,
                                       workflow_id, 
                                       activity_label, 
                                       activity=metadata, 
                                       resources=resources, 
                                       progress=tqdm)

Alternatively, if the lab team has already completed the activity in the web app, fetch it instead and look up the run type (single-read or paired-end) and the flow cell ID:

In [None]:
sequencing = workflows.get_activity(s, workflow, activity_label)

# Is this a single or paired-end read?
if sequencing.custom_attributes.singleRead:
    print('Single read')
else:
    print('Paired-end read')
    
# What's the flow cell ID?
print("Flow cell: {}".format(sequencing.custom_attributes.flowCellId))

Then upload the `RunInfo.xml` and `RunCompletionStatus.xml` files to the existing activity:

In [None]:
upload.upload_resources(s, sequencing, resources, progress=tqdm)
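
Both `create_activity` and `upload_resources` read the local files listed in `resources`, so a typo in a path only surfaces partway through the call. A quick preflight check can catch this before anything is sent. This is a sketch; `missing_files` is a hypothetical helper, not an `ovation.lab` function.

```python
import os


def missing_files(resources):
    """Return (label, path) pairs for every listed path that is not an existing file."""
    missing = []
    for label, paths in resources.items():
        for path in paths:
            if not os.path.isfile(path):
                missing.append((label, path))
    return missing


# Usage before creating an activity or uploading:
#     problems = missing_files(resources)
#     if problems:
#         raise FileNotFoundError(problems)
```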

### Demultiplex

In [None]:
# Download the `xml-file` from the sequencing activity to the current working directory
download.download_resources(s, workflow, 'sequencing', 'xml-file', output=os.getcwd(), progress=tqdm)

In [None]:
activity_label = 'demultiplex'
metadata = {}

# Optional: upload the fastq files and associate them with the correct samples, so they can
#           be provided automatically for project analysis.
resources = {'sample-sheet': ['files/sample-sheet.txt'],
             'reports-tar': ['Reports.tar.gz'],
             'fastq-file': glob.glob("*.fastq")  # Optional; glob.glob already returns a list
            }
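
Every `resources` value in this notebook is a flat list of path strings. Wrapping `glob.glob(...)` in an extra list produces `[['a.fastq', 'b.fastq']]` instead, which is easy to miss. A sketch of a shape check, assuming the flat list-of-strings form used throughout this notebook is what the upload helpers expect; `check_resources` is hypothetical.

```python
def check_resources(resources):
    """Raise TypeError unless every value in `resources` is a flat list of path strings."""
    for label, paths in resources.items():
        if not isinstance(paths, list):
            raise TypeError(f"{label!r}: expected a list of paths, got {type(paths).__name__}")
        for path in paths:
            if not isinstance(path, str):
                # Catches nested lists such as [glob.glob("*.fastq")].
                raise TypeError(f"{label!r}: expected path strings, got {path!r}")
```

Note that an empty list passes the check, so if `glob.glob("*.fastq")` finds nothing, the optional `fastq-file` entry is simply empty.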

In [None]:
demultiplex = workflows.create_activity(s, 
                                        workflow_id, 
                                        activity_label, 
                                        activity=metadata, 
                                        resources=resources, 
                                        progress=tqdm)

### Sequencing QC Prep: SortMeRNA

In [None]:
activity_label = 'sequencing_qc_prep_sortmerna'
metadata = {}
# Replace with the paths to your own SortMeRNA report and log archive
resources = {'sortmerna-report': ['files/sortmerna_single_end.xls'],
             'sortmerna-log-tar': ['files/sortmerna-log.tar.gz']}

In [None]:
seq_qc_prep_sortmerna = workflows.create_activity(s, 
                                                  workflow_id, 
                                                  activity_label, 
                                                  activity=metadata, 
                                                  resources=resources,
                                                  progress=tqdm)

### Sequencing QC Prep: FastQC

In [None]:
activity_label = 'sequencing_qc_prep_fastqc'
metadata = {'singleRead': False}  # False for paired-end runs, True for single-read runs
resources = {'fastqc-report': ['files/fastqc_single_end.xls']}

# Resource groups represent folders. Here we're uploading the "Lib-Sample" fastqc output folder. Ovation automatically parses the
# file name to associate each folder with the correct sample, assuming <sample>_fastqc or <sample>_[12]_fastqc
resource_groups = {'fastqc-output': ['files/Lib-Sample_fastqc']}
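
To check folder names locally before uploading, here is one reading of the naming rule from the comment above, written as a regular expression. This is not Ovation's parser; treating the `_1`/`_2` suffix as the read number of a paired-end run is an assumption, and `parse_fastqc_dir` is a hypothetical helper.

```python
import os
import re

# <sample>_fastqc (single-read) or <sample>_1_fastqc / <sample>_2_fastqc (paired-end).
FASTQC_DIR = re.compile(r"^(?P<sample>.+?)(?:_(?P<read>[12]))?_fastqc$")


def parse_fastqc_dir(path):
    """Return (sample, read) for a FastQC output folder, or None if the name doesn't match.

    `read` is None for single-read folders and 1 or 2 for paired-end folders.
    """
    name = os.path.basename(os.path.normpath(path))
    match = FASTQC_DIR.match(name)
    if match is None:
        return None
    read = match.group("read")
    return match.group("sample"), (int(read) if read else None)
```

For example, `parse_fastqc_dir('files/Lib-Sample_fastqc')` gives `('Lib-Sample', None)`. A single-read sample whose own name ends in `_1` or `_2` is ambiguous under this rule, so such names are worth checking by hand.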

In [None]:
seq_qc_prep_fastqc = workflows.create_activity(s, 
                                               workflow_id, 
                                               activity_label, 
                                               activity=metadata,
                                               resources=resources,
                                               resource_groups=resource_groups,
                                               progress=tqdm)

### Sequencing QC

*Complete in web app*

### Trimmomatic

In [None]:
activity_label = 'alignment_prep_trimmomatic'
metadata = {}
resources = {'trimmomatic-report': ['trimmomatic.xls'],
             'trimmomatic-log-tar': ['trimmomatic-log.tar.gz']}
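
Several activities expect a `.tar.gz` archive of tool logs (`trimmomatic-log-tar` here; also `sortmerna-log-tar`, `stats-tar`, and `rnaseqc-tar`). A minimal sketch of building one with the standard-library `tarfile` module; `make_log_tar` and the `trimmomatic_logs` folder name are hypothetical.

```python
import os
import tarfile


def make_log_tar(output_path, log_dir):
    """Pack `log_dir` and everything under it into a gzip-compressed tar at `output_path`.

    Entries are stored under the folder's own name, so the archive unpacks
    into a single top-level directory.
    """
    with tarfile.open(output_path, "w:gz") as tar:
        tar.add(log_dir, arcname=os.path.basename(os.path.normpath(log_dir)))
    return output_path


# Usage (hypothetical log folder):
#     make_log_tar('trimmomatic-log.tar.gz', 'trimmomatic_logs')
```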

In [None]:
trimmomatic = workflows.create_activity(s,
                                        workflow_id, 
                                        activity_label, 
                                        activity=metadata, 
                                        resources=resources,
                                        progress=tqdm)

### SortMeRNA

In [None]:
activity_label = 'alignment_prep_sortmerna'
metadata = {}
resources = {'sortmerna-report': ['sequencing-sortmerna.xls'],
             'sortmerna-log-tar': ['sortmerna-log.tar.gz']}

In [None]:
sortmerna = workflows.create_activity(s,
                                      workflow_id,
                                      activity_label,
                                      activity=metadata,
                                      resources=resources,
                                      progress=tqdm)

### FastQC

In [None]:
activity_label = 'alignment_prep_fastqc'
metadata = {}
resources = {'fastqc-report': ['files/fastqc_single_end.xls']}
# Resource groups represent folders. Here we're uploading the "Lib-Sample" fastqc output folder. Ovation automatically parses the
# file name to associate each folder with the correct sample, assuming <sample>_fastqc or <sample>_[12]_fastqc
resource_groups = {'fastqc-output': ['files/Lib-Sample_fastqc']}
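
With many samples, listing each FastQC folder by hand gets tedious. One option is to discover every `*_fastqc` folder in a results directory and build `resource_groups` from that. A sketch, assuming all FastQC output folders sit directly under one directory; `fastqc_resource_groups` is hypothetical.

```python
import glob
import os


def fastqc_resource_groups(root, label="fastqc-output"):
    """Return a resource_groups dict listing every *_fastqc folder directly under `root`."""
    folders = sorted(
        path for path in glob.glob(os.path.join(root, "*_fastqc"))
        # Skip plain files; FastQC's .zip and .html outputs don't match the pattern anyway.
        if os.path.isdir(path)
    )
    return {label: folders}


# Usage:
#     resource_groups = fastqc_resource_groups('files')
```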

In [None]:
fastqc = workflows.create_activity(s,
                                   workflow_id, 
                                   activity_label, 
                                   activity=metadata, 
                                   resources=resources,
                                   resource_groups=resource_groups,
                                   progress=tqdm)

### Alignment

In [None]:
activity_label = 'alignment'
metadata = {}
resources = {'alignment-stats': ['star_alignmentRates.xls'],
             'stats-tar':['stats.tar.gz'],
             'bam-file': glob.glob("*.bam") # Optional
            }
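
If you upload the optional BAM files, you may want to list local files that don't appear to belong to any sample in the pool. This sketch assumes each file name begins with its sample identifier. That prefix rule is our assumption, not documented Ovation behavior, and `unmatched_files` is hypothetical.

```python
import os


def unmatched_files(paths, sample_ids):
    """Return the paths whose file name does not start with any sample identifier."""
    prefixes = tuple(sample_ids)
    return [path for path in paths if not os.path.basename(path).startswith(prefixes)]


# Usage with the samples fetched earlier:
#     unmatched_files(resources['bam-file'], [sample.identifier for sample in samples])
```

Prefix matching is loose: an identifier such as `S1` also matches `S10.bam`, so treat an empty result as a sanity check, not proof of a correct mapping.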

In [None]:
alignment = workflows.create_activity(s,
                                      workflow_id, 
                                      activity_label, 
                                      activity=metadata, 
                                      resources=resources,
                                      progress=tqdm)

### BAM QC Prep

In [None]:
activity_label = 'bam_qc_prep'
metadata = {}
resources = {'rnaseqc-metrics': ['rnaseqc_combined_metrics.xls'],
             'rnaseqc-tar': ['rnaseqc.tar.gz']}

In [None]:
bamqc_prep = workflows.create_activity(s,
                                       workflow_id,
                                       activity_label,
                                       activity=metadata,
                                       resources=resources,
                                       progress=tqdm)
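
The prep and alignment cells above all repeat one pattern: a label, metadata, resources, and sometimes resource groups, passed to `workflows.create_activity`. If you run the same steps for many pools, you could drive them from a list. A sketch: `run_steps` and the step dict keys (`label`, `metadata`, `resources`, `resource_groups`) are hypothetical, and `create_activity` is passed in so the loop can be tried with a stub.

```python
def run_steps(create_activity, session, workflow_id, steps, progress=None):
    """Create one activity per step and return the results keyed by label."""
    created = {}
    for step in steps:
        kwargs = {
            "activity": step.get("metadata", {}),
            "resources": step.get("resources", {}),
        }
        # Only pass the optional arguments a step actually uses.
        if "resource_groups" in step:
            kwargs["resource_groups"] = step["resource_groups"]
        if progress is not None:
            kwargs["progress"] = progress
        created[step["label"]] = create_activity(session, workflow_id, step["label"], **kwargs)
    return created


# Usage:
#     run_steps(workflows.create_activity, s, workflow_id, steps, progress=tqdm)
```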

### BAM QC

*Complete in web app*