
Commit

Merge branch 'ha_work' of github.com:ngs-docs/2016-metagenomics-sio into work
ctb committed Oct 12, 2016
2 parents 611d85d + 6d0b552 commit d33e0b1
Showing 3 changed files with 48 additions and 9 deletions.
4 changes: 2 additions & 2 deletions circos_tutorial.rst
@@ -25,7 +25,7 @@ Circos runs within Perl and as such does not need to be compiled to run. So, we
::
export PATH=~/circos/circos-0.69-3/bin:$PATH

Circos does, however, require quite a few additional Perl modules to operate correctly. To see what modules are missing and need to be downloaded, type the following:
::
circos -modules > modules

@@ -64,7 +64,7 @@ And with that, circos should be up and ready to go. Run the example by navigatin

This will take a little bit to run but should generate a file called ``circos.png``. Open it and you can get an idea of the huge variety of things that are possible with circos and a lot of patience. We will not be attempting anything that complex today, however.

Comparing our assembly
=======================
Create a reference database for blastn:
::
2 changes: 1 addition & 1 deletion prokka_tutorial.rst
@@ -48,7 +48,7 @@ Now it is time to run Prokka! There are tons of different ways to specialize the

This will generate a new folder called ``prokka_annotation`` in which will be a series of files, which are detailed `here <https://github.com/tseemann/prokka/blob/master/README.md#output-files>`__.

In particular, we will be using the ``*.ffn`` file to assess the relative read coverage within our metagenomes across the predicted genomic regions.

References
===========
51 changes: 45 additions & 6 deletions salmon_tutorial.rst
@@ -2,17 +2,19 @@
Gene Abundance Estimation with Salmon
======================================

Salmon is one of a breed of new, very fast RNAseq counting packages. Like Kallisto and Sailfish, Salmon counts fragments without doing up-front read mapping. Salmon can be used with edgeR and others to do differential expression analysis (if you are quantifying RNAseq data).

Today we will use it to get a handle on the relative distribution of genomic reads across the predicted protein regions.

The goals of this tutorial are to:

* Install salmon
* Use salmon to estimate gene coverage in our metagenome dataset

Installing Salmon
==================================================

Download and extract the latest version of Salmon and add it to your PATH:
::
wget https://github.com/COMBINE-lab/salmon/releases/download/v0.7.2/Salmon-0.7.2_linux_x86_64.tar.gz
tar -xvzf Salmon-0.7.2_linux_x86_64.tar.gz
@@ -29,13 +29,50 @@ Make a new directory for the quantification of data with Salmon:

Grab the nucleotide (``*ffn``) predicted protein regions from Prokka and link them here. Also grab the trimmed sequence data (``*fq``)
::
ln -fs annotation/prokka_annotation/*ffn .
ln -fs data/*.abundtrim.subset.pe.fq.gz .

Create the salmon index:
::
salmon index -t metag_10112016.ffn -i transcript_index --type quasi -k 31

Salmon requires that paired reads be separated into two files. We can split the reads using the ``split-reads.py`` script:
::
for file in *.abundtrim.subset.pe.fq.gz
do
    split-reads.py $file
done
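
To illustrate what the splitting step does (this is a rough sketch of the idea only, not the actual ``split-reads.py`` script; the ``demo`` filenames are made up): an interleaved FASTQ holds four-line records alternating read 1 / read 2, and splitting sends odd-numbered records to the ``.1.fq`` file and even-numbered records to the ``.2.fq`` file.

```shell
# Illustration only, not the real split-reads.py.
# Build a tiny interleaved FASTQ: one read-1 record, then its read-2 mate.
printf '@r1/1\nACGT\n+\nIIII\n@r1/2\nTTTT\n+\nIIII\n' > demo.pe.fq

# FASTQ records are 4 lines each; alternate records between the two outputs.
awk 'NR % 4 == 1 { rec++ }
     rec % 2 == 1 { print > "demo.1.fq"; next }
                  { print > "demo.2.fq" }' demo.pe.fq
```
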

Now, we can run our reads against this reference:
::
for file in *1.fq
do
    BASE=${file/.1.fq/}
    salmon quant -i transcript_index --libType IU \
          -1 $BASE.1.fq -2 $BASE.2.fq -o $BASE.quant
done

(Note that --libType must come before the read files!)

This will create a bunch of directories named after the fastq files that we just pushed through. Take a look at what files there are within one of these directories:
::
find SRR1976948.quant -type f

Working with count data
=======================

Now, the ``quant.sf`` files actually contain the relevant abundance information. Take a look:
::
head -10 SRR1976948.quant/quant.sf

The first column contains the transcript names, and the fourth column is what we will want down the road: the normalized counts (TPM). However, they're not in a convenient location or format for use; let's fix that.
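
As a quick sketch of where those columns sit (assuming the salmon 0.7.x ``quant.sf`` layout of ``Name``, ``Length``, ``EffectiveLength``, ``TPM``, ``NumReads``, tab-separated with one header line; the ``quant_demo.sf`` file and its values below are made up for illustration):

```shell
# Demo file mimicking the quant.sf layout (values are made up):
printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\n' >  quant_demo.sf
printf 'gene_1\t900\t750.0\t12.5\t100\n'                >> quant_demo.sf
printf 'gene_2\t300\t150.0\t3.1\t10\n'                  >> quant_demo.sf

# Column 1 is the transcript name, column 4 the TPM value:
cut -f 1,4 quant_demo.sf
```
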

Download the gather-counts.py script:
::
curl -L -O https://github.com/ngs-docs/2016-aug-nonmodel-rnaseq/raw/master/files/gather-counts.py
and run it:
::
python ./gather-counts.py

This will give you a bunch of ``.counts`` files, which are processed from the ``quant.sf`` files and named for the directory from which they come.
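
The real ``gather-counts.py`` handles this for you; as a minimal sketch of the same idea (one ``.counts`` file per ``*.quant`` directory, holding transcript name and TPM — the exact columns and naming of the real script may differ, and the ``DEMO.quant`` fixture here is made up):

```shell
# Sketch only; the real gather-counts.py output format may differ.
# Build a demo quant directory so the loop has something to process:
mkdir -p DEMO.quant
printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\n' >  DEMO.quant/quant.sf
printf 'gene_1\t900\t750.0\t12.5\t100\n'                >> DEMO.quant/quant.sf

# One .counts file per quant directory: transcript name + TPM,
# named for the directory it came from.
for dir in *.quant
do
    awk 'NR > 1 { print $1 "\t" $4 }' "$dir/quant.sf" > "$dir.counts"
done
```
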

References
===========
