Merge branch 'ha_work' of github.com:ngs-docs/2016-metagenomics-sio
ctb committed Oct 12, 2016
2 parents 61c4956 + 6ece531 commit ab7d9ba
Showing 3 changed files with 63 additions and 3 deletions.
59 changes: 59 additions & 0 deletions gather-counts.py
@@ -0,0 +1,59 @@
#! /usr/bin/env python
"""
This script gathers & converts Salmon output counts into something that
edgeR can read ("counts files").

Run it in a directory above all of your Salmon output directories, and
it will create a bunch of '.counts' files that you can load into R.

See https://github.com/ngs-docs/2015-nov-adv-rna/ for background info.

C. Titus Brown, 11/2015
"""
from __future__ import print_function
import os, os.path
import sys


def process_quant_file(root, filename, outname):
    """
    Convert individual quant.sf files into .counts files (transcripts\tcount).
    """
    print('Loading counts from:', root, filename, file=sys.stderr)
    full_file = os.path.join(root, filename)

    # 'with' closes both files even if a line fails to parse
    with open(full_file) as infp, open(outname, 'w') as outfp:
        print("transcript\tcount", file=outfp)
        for line in infp:
            # skip the quant.sf header row
            if line.startswith('Name'):
                continue
            name, length, eff_length, tpm, count = line.strip().split('\t')

            # NOTE: the TPM column, not the read count, is what gets written
            print("%s\t%s" % (name, float(tpm)), file=outfp)


def main():
    """
    Find all the quant.sf files, convert them into properly named .counts
    files.

    Here, "proper name" means "directory.counts".
    """
    quantlist = []

    start_dir = '.'
    print('Starting in:', os.path.abspath(start_dir), file=sys.stderr)
    for root, dirs, files in os.walk(start_dir):
        for filename in files:
            if filename.endswith('quant.sf'):
                dirname = os.path.basename(root)
                outname = dirname + '.counts'
                process_quant_file(root, filename, outname)
                quantlist.append(outname)

                # one quant.sf per directory; move on to the next directory
                break

    # print a quoted, comma-separated list of the files that were written
    print(",\n".join([ "\"%s\"" % i for i in sorted(quantlist)]))


if __name__ == '__main__':
    main()
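
To make the per-line conversion concrete, here is a minimal sketch that applies the same transformation as `process_quant_file` to an in-memory excerpt instead of a file on disk. The helper name `convert_quant` and the sample rows (transcript names and numbers) are invented for illustration; only the five-column layout and the "skip the `Name` header, emit name and TPM" logic come from the script above.

```python
import io

# Hypothetical quant.sf excerpt; transcript names and numbers are made up.
SAMPLE_QUANT = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "contig_1\t1200\t1000.5\t12.5\t30\n"
    "contig_2\t800\t610.2\t0.0\t0\n"
)


def convert_quant(infp, outfp):
    """Write 'transcript<TAB>value' lines, mirroring process_quant_file."""
    outfp.write("transcript\tcount\n")
    for line in infp:
        if line.startswith("Name"):  # skip the header row
            continue
        name, length, eff_length, tpm, count = line.strip().split("\t")
        # Like the committed script, this emits the TPM column.
        outfp.write("%s\t%s\n" % (name, float(tpm)))


out = io.StringIO()
convert_quant(io.StringIO(SAMPLE_QUANT), out)
print(out.getvalue())
```

Note that the output column is headed `count` while the values are TPMs; anyone loading these files downstream should keep that in mind.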
4 changes: 2 additions & 2 deletions prokka_tutorial.rst
@@ -39,9 +39,9 @@ Make a new directory for the annotation:
mkdir annotation
cd annotation

-Gunzip the metagenome assembly file into this directory:
+Link the metagenome assembly file into this directory:
::
-gunzip subset_assembly.fa.gz
+ln -fs /mnt/assembly/combined/final.contigs.fa

Now it is time to run Prokka! There are tons of different ways to specialize the running of Prokka. We are going to keep it simple for now, though. It will take a little bit to run.
::
3 changes: 2 additions & 1 deletion salmon_tutorial.rst
@@ -40,7 +40,7 @@ Create the salmon index:

Salmon requires that paired reads be separated into two files. We can split the reads using the XXX script XXX: *CHECK ME!*
::
-for file in *.abundtrim.subset.pe.fq.gz
+for file in *.abundtrim.subset.pe.fq
do
split-reads.py $file
done
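
For readers unsure what "separating paired reads into two files" involves, the following is a rough sketch of the idea, assuming strictly interleaved FASTQ input with four lines per record. It is not the `split-reads.py` script referenced in the tutorial (whose exact name is still marked for checking); the function `split_interleaved` and the example reads are hypothetical.

```python
import io


def split_interleaved(infp, out1, out2):
    """Send alternating 4-line FASTQ records to out1 and out2.

    Assumes strictly interleaved input (read 1, read 2, read 1, ...)
    with exactly four lines per record; returns the number of pairs.
    """
    record = []
    n_records = 0
    for line in infp:
        record.append(line)
        if len(record) == 4:
            # even-numbered records are first mates, odd-numbered are second
            target = out1 if n_records % 2 == 0 else out2
            target.writelines(record)
            record = []
            n_records += 1
    if record or n_records % 2:
        raise ValueError("input is not a whole number of read pairs")
    return n_records // 2


# Hypothetical two-pair example; read names and sequences are invented.
interleaved = io.StringIO(
    "@r1/1\nACGT\n+\nIIII\n"
    "@r1/2\nTTGA\n+\nIIII\n"
    "@r2/1\nGGCC\n+\nIIII\n"
    "@r2/2\nCATG\n+\nIIII\n"
)
left, right = io.StringIO(), io.StringIO()
pairs = split_interleaved(interleaved, left, right)
```

A real splitter would also need to handle orphaned reads and compressed input, which this sketch deliberately leaves out.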
@@ -74,6 +74,7 @@ Download the gather-counts.py script:
and run it:

python ./gather-counts.py

This will give you a bunch of .counts files, which are processed from the quant.sf files and named for the directory from which they emanate.
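
As a small illustration of the naming rule, the sketch below derives the `.counts` filename from the directory that holds each `quant.sf`, the same way `gather-counts.py` does with `os.path.basename(root)`. The helper `counts_name` and the example directory names are assumptions made up for this example.

```python
import os


def counts_name(quant_path):
    """Return the .counts filename for a quant.sf path (directory name + '.counts')."""
    root = os.path.dirname(quant_path)
    return os.path.basename(root) + ".counts"


# Hypothetical Salmon output directories, for illustration only.
paths = ["./sampleA.quant/quant.sf", "./runs/sampleB.quant/quant.sf"]
names = sorted(counts_name(p) for p in paths)

# Same quoted, comma-separated listing that gather-counts.py prints at the end.
print(",\n".join('"%s"' % n for n in names))
```

Because only the last path component is used and every `.counts` file is written to the current directory, two Salmon output directories with the same name under different parents would overwrite each other's output.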

References
