Manual for Tn-seq: A pipeline for processing next-generation DNA sequence files (fastq files) generated by Tn-seq methods (transposon-insertion sequencing), a powerful technique for quantitatively profiling complex populations of transposon mutant bacteria (e.g., Gallagher et al. 2011. mBio.00315-10)
Overview
The Tn-seq pipeline is run using two master scripts. The first step generates lists of read counts per genomic location and can be run separately for each Tn-seq run. The second step annotates the locations, tabulates hits per gene, and can combine input files from multiple Tn-seq runs for comparison.
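The two-step data flow can be pictured with a small example. This is purely illustrative and is not the pipeline's code. In-memory dictionaries stand in for the reads-list files, whose exact format is not described here. The hypothetical `merge_samples` function lines up several samples by location, with one column per sample and 0 where a sample has no reads. This is the kind of side-by-side view that step 2 makes possible across runs.

```python
def merge_samples(samples):
    """Merge {sample_name: {location: count}} into one table.

    Returns (sample_names, rows). Each row is
    (location, [count for each sample in sample_names]), sorted by location.
    A sample with no reads at a location contributes 0.
    """
    names = sorted(samples)
    locations = set()
    for counts in samples.values():
        locations.update(counts)
    rows = []
    for loc in sorted(locations):
        rows.append((loc, [samples[name].get(loc, 0) for name in names]))
    return names, rows


if __name__ == "__main__":
    names, rows = merge_samples({"run2": {5: 1, 9: 4}, "run1": {5: 2}})
    print(names)
    for row in rows:
        print(row)
```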
Software requirements
Python: we use version 2.6.6, but any later 2.x version should work.
Read mapping software: BWA (tested with version 0.7.4) or Bowtie (tested with version 0.12.7)
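Before a first run, it can save time to check quickly that the required software is available. The sketch below is not part of the pipeline. It assumes the mappers are installed as `bwa` and `bowtie` on your `PATH`, although common.py may point elsewhere. `pipeline_python_ok` encodes the version rule above and accepts any version tuple. The mapper lookup uses `shutil.which`, which needs Python 3.3 or later, so run this check with a Python 3 interpreter.

```python
import shutil


def pipeline_python_ok(version_info):
    """Return True for Python 2.6 or any later 2.x release.

    The manual names 2.6.6 and later 2.x versions; Python 3 is treated
    as unsupported here because the manual only mentions 2.x.
    """
    major, minor = version_info[0], version_info[1]
    return major == 2 and minor >= 6


def find_mappers(names=("bwa", "bowtie")):
    """Map each (assumed) executable name to its full path, or None."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_mappers().items():
        print("%s: %s" % (name, path or "not found on PATH"))
```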
Running the scripts
Step 1: mapping (process_map.py)
Usage: process_map.py [options] firstend_fastq_1 index_fastq_1 secondend_fastq_1 ...
Options:
-h, --help show this help message and exit
-r, --reffile path to reference genome fasta
-j, --tn_verify_by_read1 use 1st-end read to verify transposon end (default: False)
-i, --tn_verify_by_index use index read to verify transposon end (default: False)
-t, --tn_end expected transposon end sequence
-d, --demux_index use index read to demultiplex (default: False)
-e, --demux_read2 use 2nd end read to demultiplex (default: False)
-b, --barcodefile path to file listing expected barcode sequences
-c, --chastity run chastity filter (default: False)
-n, --normfactor read count normalization factor (default: 10,000,000)
(0 = don't normalize)
-s, --merge_slipped merge slipped reads (default: False)
-u, --use_bowtie map reads using Bowtie (default: use BWA)
-w, --workingdir working directory for input and output files (default: work)
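Two details of step 1 lend themselves to a short illustration.

The usage line appears to take fastq files in groups of three (1st-end read, index read, 2nd-end read), with `...` allowing more groups. That reading is an assumption.

The manual also does not say exactly how `--normfactor` is applied. A common convention, assumed here, is to scale each location's count so that a sample's counts sum to the factor (reads per 10 million by default). This makes samples sequenced to different depths comparable.

Both functions below are hypothetical helpers, not part of process_map.py.

```python
def group_fastq_triplets(paths):
    """Split a flat argument list into (firstend, index, secondend) tuples.

    Assumes the files come in groups of three, as the usage line suggests.
    Raises ValueError if the list is empty or not a multiple of three.
    """
    if not paths or len(paths) % 3 != 0:
        raise ValueError(
            "expected fastq files in groups of three, got %d" % len(paths))
    return [tuple(paths[i:i + 3]) for i in range(0, len(paths), 3)]


def normalize_counts(counts, normfactor=10000000):
    """Scale per-location read counts so that they sum to normfactor.

    counts maps a location to a raw read count. normfactor == 0 returns
    the counts unchanged, mirroring "0 = don't normalize". The scaling
    rule itself is an assumption, not taken from process_map.py.
    """
    if normfactor == 0:
        return dict(counts)
    total = sum(counts.values())
    if total == 0:
        return dict((loc, 0.0) for loc in counts)
    scale = float(normfactor) / total
    return dict((loc, n * scale) for loc, n in counts.items())


if __name__ == "__main__":
    print(group_fastq_triplets(["r1.fq", "ind.fq", "r2.fq"]))
    print(normalize_counts({100: 30, 250: 70}, normfactor=1000))
```

With 30 and 70 reads at two locations and a factor of 1000, the normalized values are 300.0 and 700.0.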
Step 2: annotating and tabulating (process_annotate_tabulate.py)
Usage: process_annotate_tabulate.py [options] reads_list_1 reads_list_2 ...
Options:
-h, --help show this help message and exit
-r, --reffile path to reference genome fasta
-a, --annofiles path to reference .ptt annotation file(s) (comma-separated list if
using more than one; order must match sequences in reference fasta)
-o, --outfile_anno path to final annotated output file (default: work/AnnotatedHits.txt)
-p, --outfile_tab path to final counts tabulated by gene (default: work/HitsPerGene.txt)
-w, --workingdir working directory for input and output files (default: work)
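Step 2 assigns each insertion location to a gene using the .ptt files. The sketch below shows one straightforward way to do this and is not the pipeline's own code. It reads the standard NCBI .ptt columns (Location, Strand, Length, PID, Gene, Synonym, Code, COG, Product) and keys genes by the Synonym (locus tag) column. It then counts a hit for every gene whose coordinates contain the location. Hits outside any gene are dropped. How the real scripts treat gene ends, overlapping genes or intergenic hits is not covered here.

```python
def parse_ptt(lines):
    """Return a list of (start, end, locus_tag) tuples from .ptt text lines.

    Lines before the 'Location' column header (title and protein count)
    are skipped.
    """
    genes = []
    in_table = False
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not in_table:
            in_table = fields[0] == "Location"
            continue
        if len(fields) < 6 or ".." not in fields[0]:
            continue  # skip blank or malformed rows
        start, end = fields[0].split("..")
        genes.append((int(start), int(end), fields[5]))
    return genes


def hits_per_gene(location_counts, genes):
    """Sum read counts for locations within each gene (inclusive bounds).

    A location inside overlapping genes is counted for each of them.
    """
    totals = dict((tag, 0) for _, _, tag in genes)
    for pos, count in location_counts.items():
        for start, end, tag in genes:
            if start <= pos <= end:
                totals[tag] += count
    return totals
```

With several replicons, each .ptt file would be matched to its own sequence in the combined fasta. For genome-sized data, the linear scan would normally be replaced by a search over genes sorted by start position.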
Examples
python process_map.py --barcodefile barcodes.txt --chastity --demux_read2 --tn_verify_by_index --reffile combined.fna --tn_end AGACAG --workingdir work r1.fq ind.fq r2.fq
python process_annotate_tabulate.py --annofiles CP000086.ptt,CP000085.ptt --reffile combined.fna --workingdir work work/r1_ch_iPass_ACGTGA_sum_norm.txt work/r1_ch_iPass_CTAGTG_sum_norm.txt work/r1_ch_iPass_GATCAC_sum_norm.txt work/r1_ch_iPass_TGCACT_sum_norm.txt
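Step 1 writes one reads list per barcode, and those files are the inputs to step 2. A small helper could build the step 2 command from the barcode list. The reads-list naming rule used here is inferred only from the file names in the example above and will presumably differ with other options. The barcode file is also assumed to hold one sequence per line. Both `read_barcodes` and `step2_command` are hypothetical.

```python
import os


def read_barcodes(path):
    """Read barcodes, assumed to be one sequence per line (blanks ignored)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def step2_command(read1_fastq, barcodes, annofiles, reffile, workdir="work"):
    """Build an argument list for process_annotate_tabulate.py.

    The reads-list pattern <read1 stem>_ch_iPass_<barcode>_sum_norm.txt is
    copied from the manual's example and is an assumption for other runs.
    """
    stem = os.path.splitext(os.path.basename(read1_fastq))[0]
    reads_lists = [
        os.path.join(workdir, "%s_ch_iPass_%s_sum_norm.txt" % (stem, bc))
        for bc in barcodes
    ]
    return (["python", "process_annotate_tabulate.py",
             "--annofiles", ",".join(annofiles),
             "--reffile", reffile,
             "--workingdir", workdir] + reads_lists)
```

The returned list can be passed to `subprocess.check_call` to run step 2.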
Additional notes
Before running the scripts for the first time, check common.py to make sure paths and constants are correct for your environment. For example, you may need to change the path to the BWA executable.
The comma-separated list of .ptt annotation files should not contain spaces; separate the file names with a comma only.
If your reference genome contains multiple replicons, combine their fasta files into a single fasta file before running this software. The sequences should appear in the same order as the .ptt files in the comma-separated list.
For best results, give the header lines of your combined fasta file simple names, such as the accession number of each replicon. BWA parses these headers, so simple headers are the most compatible.
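The notes above on .ptt lists and fasta preparation can be sketched as small helpers. These helpers are hypothetical and not part of the pipeline.

- `split_annofiles` enforces the comma-only rule. An unquoted space makes the shell split the value into two arguments, so the second .ptt file would be read as a reads list.
- `combine_fasta` concatenates replicon fasta files in the order you intend to list their .ptt files.
- `simple_header` shortens each header to its first word, or to the last `|` field of an NCBI-style `gi|...|ref|ACCESSION|` name. BWA, like other SAM-producing tools, generally names each reference sequence by the header text up to the first whitespace. This is one reason short accession-style names are the safest choice.

```python
def split_annofiles(value):
    """Split a --annofiles value into .ptt file names (commas only, no spaces)."""
    if any(ch.isspace() for ch in value):
        raise ValueError("--annofiles must not contain spaces: %r" % value)
    names = value.split(",")
    if "" in names:
        raise ValueError("empty entry in --annofiles: %r" % value)
    return names


def combine_fasta(in_paths, out_path):
    """Concatenate fasta files in order and return their header texts.

    The order of in_paths should match the order of the .ptt files.
    Blank lines are dropped and a missing final newline is added.
    """
    headers = []
    with open(out_path, "w") as out:
        for path in in_paths:
            with open(path) as handle:
                for line in handle:
                    if not line.strip():
                        continue
                    if line.startswith(">"):
                        headers.append(line[1:].strip())
                    if not line.endswith("\n"):
                        line += "\n"
                    out.write(line)
    return headers


def simple_header(header):
    """Shorten a fasta header to '>NAME' (first word, last '|' field)."""
    words = header.lstrip(">").split()
    if not words:
        raise ValueError("empty fasta header")
    fields = [f for f in words[0].split("|") if f]
    if not fields:
        raise ValueError("no usable name in header: %r" % header)
    return ">" + fields[-1]


def simplify_fasta_headers(lines):
    """Yield fasta lines with each header replaced by simple_header()."""
    for line in lines:
        if line.startswith(">"):
            yield simple_header(line) + "\n"
        else:
            yield line
```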