
#Transrate
##Exploring the new quality assessment for de novo transcriptome assemblies. 

<img src="images/Logo.png">

Website here: http://hibberdlab.com/transrate/index.html

Transrate tries to address the major issues that arise during de novo assembly. 

<img src="images/Misassembly.png">

###Basic pipeline
####Input assembly contigs, paired-end reads
<img src="images/Pipeline.png">


###Downloading transrate
It is an easy enough download. There are two download options:
1. Install from source. 
2. Install with gem (sort of like pip, but for Ruby). 

I found that option 1 worked and option 2 did not. As an aside, transrate has to be run on a 64-bit system, so make sure that your setup matches. 

``` bash 
wget https://bintray.com/artifact/download/blahah/generic/transrate-1.0.1-osx.tar.gz
tar -xvzf transrate-1.0.1-osx.tar.gz 
cd transrate-1.0.1-osx

```
Now all you need to do is edit your PATH to include the transrate folder. This can be done by editing either your .bash_profile or .bashrc file in your home directory. Adding the following to the end should suffice:

``` bash 
export PATH="/PATH/TO/FOLDER/transrate-1.0.1-osx:$PATH"

```
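
If you are working from a notebook, a quick way to confirm that the PATH edit took effect is to ask Python where it finds the executable. This is only a small sketch: `find_tool` is a helper name made up for this lesson, and it relies on nothing but the standard-library `shutil.which`.

```python
import shutil


def find_tool(name="transrate"):
    """Return the full path of an executable found on PATH, or None if it is missing."""
    return shutil.which(name)


location = find_tool()
if location is None:
    # Most often this means the export line was not picked up by the current shell.
    print("transrate not found on PATH; re-check the export line and open a new terminal")
else:
    print("transrate found at", location)
```

Note that a notebook server started before you edited .bash_profile or .bashrc will not see the new PATH until it is restarted.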
Transrate has a nice feature that will install all of the dependencies for you. To do this, run: 

``` bash 
sudo transrate --install-deps read

```
It should return the following message. 
``` bash 
Checking dependencies
All dependences installed
```

###Running transrate
There are three levels of transrate metrics:
1. Input *only* the assembly and get basic stats back (akin to TrinityStats.pl... so we will skip it)
2. Input the assembly and your reads and get a more detailed quantitative explanation back
3. Input the assembly and a "closely related" genome 
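
The first two levels differ only in whether reads are supplied, which can be sketched as a small command builder. This is an illustration rather than part of transrate: `transrate_command` is a hypothetical helper, and it uses only the `--assembly`, `--left` and `--right` flags that appear in this lesson. Level 3 is left out because the flag for it is not shown here.

```python
def transrate_command(assembly, left=None, right=None):
    """Build the argument list for a transrate run.

    Assembly only        -> level 1 (basic contig stats).
    Assembly + read pair -> level 2 (read-mapping metrics as well).
    """
    # Paired-end reads only make sense when both files are given.
    if (left is None) != (right is None):
        raise ValueError("paired-end reads need both a left and a right file")

    cmd = ["transrate", "--assembly", assembly]
    if left is not None:
        cmd += ["--left", left, "--right", right]
    return cmd


# Level 1: assembly only.
print(" ".join(transrate_command("transcripts.fa")))
# Level 2: assembly plus paired-end reads.
print(" ".join(transrate_command("transcripts.fa", "left.fq", "right.fq")))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once transrate is on your PATH.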

###Let's work with the transcriptome example from transrate


In [6]:
%%bash 
transrate --assembly transcripts.fa --left left.fq --right right.fq 

[ INFO] 2015-07-13 13:05:36 : Loading assembly: /Users/harrietalexander/Analysis/Transrate/example_data/transcripts.fa
[ INFO] 2015-07-13 13:05:36 : Analysing assembly: /Users/harrietalexander/Analysis/Transrate/example_data/transcripts.fa
[ INFO] 2015-07-13 13:05:36 : Results will be saved in /Users/harrietalexander/Analysis/Transrate/example_data/transrate_results/transcripts
[ INFO] 2015-07-13 13:05:36 : Calculating contig metrics...
[ INFO] 2015-07-13 13:05:36 : Contig metrics:
[ INFO] 2015-07-13 13:05:36 : -----------------------------------
[ INFO] 2015-07-13 13:05:36 : n seqs                           15
[ INFO] 2015-07-13 13:05:36 : smallest                        849
[ INFO] 2015-07-13 13:05:36 : largest                        2396
[ INFO] 2015-07-13 13:05:36 : n bases                       28562
[ INFO] 2015-07-13 13:05:36 : mean len                    1904.13
[ INFO] 2015-07-13 13:05:36 : n under 200                       0
[ INFO] 2015-07-13 13:05:36 : n over 1k            

##What does the output mean? 



###Basic stats on transcriptome contigs (ignoring mapping)

<img src="images/Table.png">

##Transrate-specific metrics

###**Contig score**: 

The contig score considers:
1. Whether each base has been called correctly 
2. Whether each base is truly part of the transcript
3. The probability that the contig is derived from a single transcript (rather than pieces of two or more transcripts)
4. The probability that the contig is structurally complete and correct.
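
In the `contigs.csv` output shown later in this lesson, each contig has four probability-like columns (`p_good`, `p_bases_covered`, `p_seq_true`, `p_not_segmented`) plus a `score` column. For the rows shown there, `score` matches the product of those four columns, which the sketch below checks for one contig. Which column corresponds to which item in the list above is not spelled out in this lesson, so the code makes no assumption about that pairing.

```python
from math import prod


def contig_score(p_good, p_bases_covered, p_seq_true, p_not_segmented):
    """Combine the four per-contig components by multiplying them together."""
    return prod([p_good, p_bases_covered, p_seq_true, p_not_segmented])


# Values copied from the NM_001168316 row of contigs.csv in this lesson.
score = contig_score(
    p_good=1.0,
    p_bases_covered=0.968463,
    p_seq_true=1.0,
    p_not_segmented=0.918791,
)

reported = 0.889815  # the `score` column for the same row
print(f"computed {score:.6f}, reported {reported:.6f}")
```

Because the components are multiplied, a single weak component is enough to drag the whole contig score down.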

###**Transrate score**: 

Geometric mean of all of the contig scores, multiplied by the proportion of input reads that support the assembly
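
The geometric-mean step can be sketched as follows. `geometric_mean` is a helper written for this lesson, not a transrate function, and the ten scores are the rows of `contigs.csv` displayed later on. The example assembly has 15 contigs, so the number printed here is for illustration only and is not the assembly's actual value.

```python
import math


def geometric_mean(scores):
    """Geometric mean of non-negative scores, computed via the mean of their logs."""
    scores = list(scores)
    if not scores:
        raise ValueError("need at least one score")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    # A single zero makes the whole product (and so the mean) zero.
    if any(s == 0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# `score` column for the ten contigs displayed in this lesson.
shown_scores = [
    0.889815, 0.954779, 0.941532, 0.900239, 0.930615,
    0.851496, 0.911184, 0.867795, 0.765217, 0.92538,
]
print(f"geometric mean of the displayed contigs: {geometric_mean(shown_scores):.4f}")
```

Compared with an arithmetic mean, the geometric mean is pulled down more strongly by a few poorly scoring contigs.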

###Stats on transcriptome contigs (considering mapping)


<img src="images/Table2.png">


###What files were produced? 

In [12]:
%%bash
ls -l 


total 5112
-rw-r--r--  1 harrietalexander  staff    10343 Jul 14 09:03 TransrateLesson.ipynb
drwxr-xr-x  9 harrietalexander  staff      306 Jul 14 08:45 images
drwxr-xr-x  2 harrietalexander  staff       68 Jul 13 11:38 ipythonNotebook
-rw-r--r--@ 1 harrietalexander  staff  1283063 Feb  9  2014 left.fq
-rw-r--r--@ 1 harrietalexander  staff  1283063 Feb  9  2014 right.fq
-rw-r--r--@ 1 harrietalexander  staff    29213 Feb  9  2014 transcripts.fa
drwxr-xr-x  4 harrietalexander  staff      136 Jul 13 13:03 transrate_results
-rw-r--r--  1 harrietalexander  staff        0 Jul 13 11:32 untitled.txt


In [16]:
%%bash
ls -l transrate_results/

total 8
-rw-r--r--   1 harrietalexander  staff  813 Jul 13 13:05 assemblies.csv
drwxr-xr-x  14 harrietalexander  staff  476 Jul 13 13:05 transcripts


In [15]:
%%bash
ls -l transrate_results/transcripts/

total 2880
-rw-r--r--  1 harrietalexander  staff     250 Jul 13 13:05 assembly_score_optimisation.csv
-rw-r--r--  1 harrietalexander  staff    1652 Jul 13 13:05 bad.transcripts.fa
-rw-r--r--  1 harrietalexander  staff    2163 Jul 13 13:05 contigs.csv
-rw-r--r--  1 harrietalexander  staff   27093 Jul 13 13:05 good.transcripts.fa
-rw-r--r--  1 harrietalexander  staff       6 Jul 13 13:05 left.fq-right.fq-read_count.txt
-rw-r--r--  1 harrietalexander  staff  650074 Jul 13 13:05 left.fq.right.fq.transcripts.assigned.bam
-rwxr-----  1 harrietalexander  staff  766295 Jul 13 13:05 left.fq.right.fq.transcripts.bam
drwxr-xr-x  3 harrietalexander  staff     102 Jul 13 13:05 libParams
drwxr-xr-x  4 harrietalexander  staff     136 Jul 13 13:05 logs
drwxr-xr-x  6 harrietalexander  staff     204 Jul 13 13:05 transcripts
-rw-r--r--  1 harrietalexander  staff     818 Jul 13 13:05 transcripts.fa_bam_info.csv
-rw-r--r--  1 harrietalexander  staff     967 Jul 13 13:05 transcripts.fa_quant.sf


In [30]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
contigs=pd.read_csv("transrate_results/transcripts/contigs.csv")
contigs


Unnamed: 0,contig_name,length,prop_gc,gc_skew,at_skew,cpg_count,cpg_ratio,orf_length,linguistic_complexity_6,in_bridges,p_good,p_bases_covered,p_seq_true,score,p_not_segmented,eff_length,eff_count,tpm,coverage
0,NM_001168316,2283,0.487516,0.044025,0.097436,177,1.307349,533,0.37915,0,1.0,0.968463,1.0,0.889815,0.918791,2283,133.201,10886.6,2.92
1,NM_174914,2385,0.485535,0.044905,0.093725,182,1.297416,567,0.390625,0,0.998062,1.0,0.999889,0.954779,0.956739,2385,1577.15,123389.0,33.06
2,NR_031764,1853,0.507285,0.053191,0.082147,154,1.295481,221,0.323975,0,1.0,0.981112,1.0,0.941532,0.959658,1853,53.6536,5402.76,1.45
3,NM_004503,1681,0.527662,-0.127396,0.06801,180,1.563721,235,0.299072,0,1.0,0.998215,0.999913,0.900239,0.901927,1681,332.157,36869.6,9.88
4,NM_006897,1541,0.552239,-0.043478,0.115942,222,1.89312,260,0.287109,0,1.0,0.999351,1.0,0.930615,0.931219,1541,664.001,80400.5,21.54
5,NM_014212,2037,0.566519,-0.067591,-0.023783,272,1.671849,304,0.347168,0,1.0,0.860579,1.0,0.851496,0.989445,2037,55.0092,5038.92,1.35
6,NM_014620,2300,0.508261,-0.055603,0.036251,225,1.519449,264,0.366455,0,0.998382,1.0,0.999815,0.911184,0.91283,2300,609.519,49448.4,13.25
7,NM_017409,1959,0.533435,-0.100478,0.094092,227,1.645488,342,0.340576,0,1.0,0.873915,1.0,0.867795,0.992997,1959,47.0093,4477.57,1.2
8,NM_017410,2396,0.57596,-0.05942,0.037402,339,1.712082,330,0.388916,0,1.0,0.769199,1.0,0.765217,0.994823,2396,42.0094,3271.54,0.88
9,NM_018953,1612,0.591811,-0.121593,0.051672,244,1.754636,229,0.297852,0,1.0,0.998139,1.0,0.92538,0.927105,1612,228.007,26392.2,7.07
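
One way to read this table is to ask, for each contig, which component is holding its score back. The sketch below does that for three rows copied from the output above. It uses the standard-library `csv` module on an inline sample so that it runs without the results folder; `weakest_component` is a name made up for this example. With the real file you could do the same thing on the pandas DataFrame loaded above.

```python
import csv
import io

# Three rows copied from the contigs.csv output above (only the columns needed here).
SAMPLE = """contig_name,p_good,p_bases_covered,p_seq_true,p_not_segmented,score
NM_001168316,1.0,0.968463,1.0,0.918791,0.889815
NM_014212,1.0,0.860579,1.0,0.989445,0.851496
NM_017410,1.0,0.769199,1.0,0.994823,0.765217
"""

COMPONENTS = ["p_good", "p_bases_covered", "p_seq_true", "p_not_segmented"]


def weakest_component(row):
    """Return the name of the component column with the lowest value for this contig."""
    return min(COMPONENTS, key=lambda name: float(row[name]))


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# List contigs from lowest to highest score with their limiting component.
ranked = sorted(rows, key=lambda r: float(r["score"]))
for row in ranked:
    print(row["contig_name"], row["score"], weakest_component(row))
```

In this sample the lowest-scoring contig, NM_017410, is limited by `p_bases_covered` rather than by any of the other components.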


##Looking at pre-existing data from NCBI

<img src="images/Spearman.png">

<img src="images/Comparison4.png">

<img src="images/AllData.png">

