Merge branch 'devel' into devel-merge
Conflicts:
- bcftbx/JobRunner.py: fixed version number in devel version.
pjbriggs committed Apr 15, 2015
2 parents 621b766 + 223e9eb commit 39ff057
Showing 99 changed files with 6,853 additions and 442 deletions.
133 changes: 114 additions & 19 deletions NGS-general/README.markdown
@@ -3,48 +3,130 @@ NGS-general

General NGS scripts that are used for both ChIP-seq and RNA-seq.

* `boxplotps2png.sh`: make PNGs of PS plots from `qc_boxplotter.sh`
* `explain_sam_flag.sh`: decodes bit-wise flag from SAM file
* `qc_boxplotter.sh`: generate QC boxplot from SOLiD qual file
* `extract_reads.py`: write out subsets of reads from input data files
* `fastq_edit.py`: edit FASTQ files and data
* `fastq_sniffer.py`: "sniff" FASTQ file to determine quality encoding
* `manage_seqs.py`: handle sets of named sequences (e.g. FastQC contaminants file)
* `SamStats`: counts uniquely mapped reads per chromosome/contig
* `splitBarcodes.pl`: separate multiple barcodes in SOLiD data
* `remove_mispairs.pl`: remove "singleton" reads from paired end fastq
* `remove_mispairs.py`: remove "singleton" reads from paired end fastq
* `sam2soap.py`: convert from SAM file to SOAP format
* `separate_paired_fastq.pl`: separate F3 and F5 reads from fastq
* `split_fasta.py`: extract individual chromosome sequences from fasta file
* `trim_fastq.pl`: trim down sequences in fastq file from 5' end
* `uncompress_fastqgz.sh`: create ungzipped version of a compressed FASTQ file


boxplotps2png.sh
----------------
Utility to generate PNGs from PS boxplots produced from `qc_boxplotter.sh`.
explain_sam_flag.sh
-------------------
Convert a decimal bitwise SAM flag value to binary representation and
interpret each bit.
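
The decoding that `explain_sam_flag.sh` performs can be sketched in Python. This is a minimal illustration, not the script's own implementation: the function name `explain_sam_flag` and the wording of the descriptions are assumptions, while the bit values follow the SAM format specification.

```python
# Bit values as defined in the SAM format specification; the
# description strings are paraphrased for this sketch.
SAM_FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read on reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails platform/vendor quality checks"),
    (0x400, "read is PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_sam_flag(flag):
    """Return (binary string, list of descriptions) for a decimal SAM flag.

    Hypothetical helper: the real shell script may format its output
    differently.
    """
    binary = format(flag, "012b")  # zero-padded 12-bit representation
    meanings = [desc for bit, desc in SAM_FLAG_BITS if flag & bit]
    return binary, meanings


binary, meanings = explain_sam_flag(99)
print(binary)
for meaning in meanings:
    print(" -", meaning)
```

Keeping the bit table as data rather than a chain of conditionals makes it easy to check against the specification.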

Usage:

boxplotps2png.sh BOXPLOT1.ps [ BOXPLOT2.ps ... ]
extract_reads.py
----------------

Outputs:
Usage: `extract_reads.py OPTIONS infile [infile ...]`

PNG versions of the input postscript files as BOXPLOT1.png, BOXPLOT2.png etc.
Extract subsets of reads from each of the supplied files according to
specified criteria (e.g. random, matching a pattern etc). Input files can be
any mixture of FASTQ (.fastq, .fq), CSFASTA (.csfasta) and QUAL (.qual).
Output file names will be the input file names with '.subset' appended.

Options:

explain_sam_flag.sh
-------------------
Convert a decimal bitwise SAM flag value to binary representation and
interpret each bit.
--version show program's version number and exit
-h, --help show this help message and exit
-m PATTERN, --match=PATTERN
Extract records that match Python regular expression
PATTERN
-n N Extract N random records from the input file(s)
(default 500). If multiple input files are specified,
the same subsets will be extracted for each.
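
The `-n` behaviour described above (the same subset extracted from every input file) can be sketched by choosing the record indices once and reusing them. This is only an illustration under assumptions: the helper names are hypothetical, records are held in memory as lists, and `extract_reads.py` itself may work differently.

```python
import random


def choose_subset_indices(n_records, n_subset=500, seed=None):
    """Pick sorted random record indices (hypothetical helper).

    The default of 500 mirrors the documented default for -n.
    """
    rng = random.Random(seed)
    n_subset = min(n_subset, n_records)  # cannot take more than exist
    return sorted(rng.sample(range(n_records), n_subset))


def extract_subset(records, indices):
    """Return the records at the chosen positions, in file order."""
    return [records[i] for i in indices]


# Two parallel inputs (e.g. a CSFASTA file and its matching QUAL file),
# modelled here as lists of record names.
csfasta_records = ["read%d" % i for i in range(10)]
qual_records = ["qual%d" % i for i in range(10)]

# Choose the indices once, then apply them to every input so that the
# subsets stay in step with each other.
indices = choose_subset_indices(len(csfasta_records), n_subset=3, seed=42)
subset_a = extract_subset(csfasta_records, indices)
subset_b = extract_subset(qual_records, indices)
```

Reusing one index list is what keeps paired files consistent; sampling each file independently would break the pairing.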


qc_boxplotter
fastq_edit.py
-------------
Generates a QC boxplot from a SOLiD .qual file.

Usage: `fastq_edit.py [options] <fastq_file>`

Perform various operations on a FASTQ file.

Options:

--version show program's version number and exit
-h, --help show this help message and exit
--stats Generate basic stats for input FASTQ
--instrument-name=INSTRUMENT_NAME
Update the 'instrument name' in the sequence
identifier part of each read record and write updated
FASTQ file to stdout


fastq_sniffer.py
----------------

Usage: `fastq_sniffer.py <fastq_file>`

"Sniff" FASTQ file to try and determine likely format and quality encoding.

Attempts to identify FASTQ format and quality encoding, and suggests likely datatype
for import into Galaxy.

Use the `--subset` option to only use a subset of reads from the file for the type
determination (using a smaller set speeds up the process at the risk of not being able
to accurately determine the encoding convention).

See <http://en.wikipedia.org/wiki/FASTQ_format> for information on the different
quality encoding standards used in different FASTQ formats.

Options:

--version show program's version number and exit
-h, --help show this help message and exit
--subset=N_SUBSET try to determine encoding from a subset consisting of
the first N_SUBSET reads. (Quicker than using all reads
but may not be accurate if subset is not representative
of the file as a whole.)
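
One common way to "sniff" a quality encoding is to look at the range of ASCII codes in the quality lines. The sketch below illustrates that heuristic only; it is not the logic of `fastq_sniffer.py`, the function names are hypothetical, and it assumes simple four-line FASTQ records. The thresholds come from the usual encoding ranges (Phred+33 starts at ASCII 33, Solexa+64 at 59, Phred+64 at 64).

```python
from itertools import islice


def quality_range(lines, n_subset=None):
    """Return (min, max) ASCII codes seen in the quality lines.

    Assumes four-line FASTQ records, so the quality string is every
    fourth line starting from the fourth. If n_subset is given, only
    the first n_subset reads are examined (as with --subset).
    """
    quality_lines = islice(lines, 3, None, 4)
    if n_subset is not None:
        quality_lines = islice(quality_lines, n_subset)
    codes = [ord(ch) for line in quality_lines for ch in line.rstrip("\r\n")]
    if not codes:
        raise ValueError("no quality data found")
    return min(codes), max(codes)


def guess_encoding(min_code):
    """Heuristic guess from the lowest quality character observed."""
    if min_code < 59:
        return "Phred+33 (Sanger, Illumina 1.8+)"
    if min_code < 64:
        return "Solexa+64"
    # Ambiguous: high-quality Phred+33 data can also land here.
    return "Phred+64 (Illumina 1.3 to 1.7), or uniformly high-quality Phred+33"


fastq = ["@r1", "ACGT", "+", "hhhh", "@r2", "ACGT", "+", "!!II"]
low, high = quality_range(fastq)
print(low, high, guess_encoding(low))
```

The example also shows why a subset can mislead: looking only at the first read (`n_subset=1`) misses the low-scoring characters in the second read and would suggest a different encoding.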


manage_seqs.py
--------------

Read sequences and names from one or more INFILEs (which can be a mixture of
FastQC 'contaminants' format and/or Fasta format), check for redundancy (i.e.
sequences with multiple associated names) and contradictions (i.e. names with
multiple associated sequences).

Usage:

qc_boxplotter.sh <solid.qual>
manage_seqs.py OPTIONS FILE [FILE...]

Outputs:
To append a

Options:

--version show program's version number and exit
-h, --help show this help message and exit
-o OUT_FILE write all sequences to OUT_FILE in FastQC 'contaminants'
format
-a APPEND_FILE append sequences to existing APPEND_FILE (not compatible
with -o)
-d DESCRIPTION supply arbitrary text to write to the header of the output
file

Intended to help create/update files with lists of "contaminant" sequences to
input into the `FastQC` program (using `FastQC`'s `--contaminants` option).

Two files (PostScript and PDF format) with the boxplot, called
`<solid.qual>_seq-order_boxplot.ps` and `<solid.qual>_seq-order_boxplot.pdf`
To create a contaminants file using sequences from a FASTA file do e.g.:

% manage_seqs.py -o custom_contaminants.txt sequences.fa

To append sequences to an existing contaminants file do e.g.

% manage_seqs.py -a my_contaminants.txt additional_seqs.fa
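
The redundancy and contradiction checks described for `manage_seqs.py` can be sketched with two dictionaries, one keyed by sequence and one keyed by name. This is a minimal illustration assuming the input has already been parsed into `(name, sequence)` pairs; `check_sequences` is a hypothetical name and not a function from the script.

```python
from collections import defaultdict


def check_sequences(pairs):
    """Find redundant sequences and contradictory names.

    pairs: iterable of (name, sequence) tuples.
    Returns (redundant, contradictory) where
      redundant maps a sequence to its multiple names, and
      contradictory maps a name to its multiple sequences.
    """
    names_by_seq = defaultdict(set)
    seqs_by_name = defaultdict(set)
    for name, seq in pairs:
        names_by_seq[seq].add(name)
        seqs_by_name[name].add(seq)
    redundant = {seq: sorted(names)
                 for seq, names in names_by_seq.items() if len(names) > 1}
    contradictory = {name: sorted(seqs)
                     for name, seqs in seqs_by_name.items() if len(seqs) > 1}
    return redundant, contradictory


pairs = [
    ("Adapter A", "ACGTACGT"),
    ("Adapter A copy", "ACGTACGT"),  # same sequence, second name
    ("Primer B", "TTTTGGGG"),
    ("Primer B", "TTTTGGGA"),        # same name, different sequence
]
redundant, contradictory = check_sequences(pairs)
print(redundant)
print(contradictory)
```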


SamStats
@@ -119,3 +201,16 @@ by the user.
Usage:

trim_fastq.pl <single end FASTQ> <desired length>


uncompress_fastqgz.sh
---------------------
Create an uncompressed copy of a fastq.gz file (if the input is fastq.gz).

Usage:

uncompress_fastqgz.sh <fastq>

`<fastq>` can be either a fastq or a fastq.gz file.

The original file will not be removed or altered.
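
The documented behaviour (write an ungzipped copy, leave the original untouched) could be sketched in Python as follows. This is an illustration only: the function name and the `out_dir` parameter are hypothetical, and returning a plain fastq path unchanged is an assumption about what should happen for uncompressed input.

```python
import gzip
import os
import shutil


def uncompress_fastqgz(path, out_dir="."):
    """Write an uncompressed copy of a .gz file into out_dir.

    Returns the path of the uncompressed file. The original file is
    only read, never removed or modified. For input that is not
    gzipped, the path is returned unchanged (an assumption made for
    this sketch).
    """
    name = os.path.basename(path)
    if not name.endswith(".gz"):
        return path
    target = os.path.join(out_dir, name[:-len(".gz")])
    # Stream the data so large FASTQ files are not loaded into memory.
    with gzip.open(path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, `uncompress_fastqgz("sample.fastq.gz")` would create `sample.fastq` in the current directory and leave `sample.fastq.gz` in place.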
6 files renamed without changes.
17 changes: 7 additions & 10 deletions utils/uncompress_fastqgz.sh → NGS-general/uncompress_fastqgz.sh
@@ -15,18 +15,15 @@ function usage() {
echo "The original file will not be removed or altered."
echo ""
}
export PATH=$PATH:$(dirname $0)/../share
function import_functions() {
if [ -z "$1" ] ; then
echo ERROR no filename supplied to import_functions >&2
else
if [ -f $1 ] ; then
# Import local copy
echo Sourcing `pwd`/$1
. $1
else
# Import version in share
echo Sourcing `dirname $0`/../share/$1
. `dirname $0`/../share/$1
echo Sourcing $1
. $1
if [ $? -ne 0 ] ; then
echo ERROR failed to import $1 >&2
fi
fi
}
@@ -36,10 +33,10 @@ function import_functions() {
#===========================================================================
#
# General shell functions
import_functions functions.sh
import_functions bcftbx.functions.sh
#
# NGS-specific functions
import_functions ngs_utils.sh
import_functions bcftbx.ngs_utils.sh
#
#===========================================================================
# Main script
File renamed without changes.
11 changes: 4 additions & 7 deletions QC-pipeline/fastq_screen.sh
@@ -58,13 +58,10 @@ if [ "$datadir" == "." ] ; then
fi
#
# Set up environment
QC_SETUP=`dirname $0`/qc.setup
if [ -f "${QC_SETUP}" ] ; then
echo Sourcing qc.setup to set up environment
. ${QC_SETUP}
else
echo WARNING qc.setup not found in `dirname $0`
fi
export PATH=$(dirname $0)/../share:${PATH}
. bcftbx.functions.sh
. bcftbx.ngs_utils.sh
import_qc_settings
#
# Set the programs
# Override these defaults by setting them in qc.setup
14 changes: 4 additions & 10 deletions QC-pipeline/fastq_stats.sh
@@ -12,15 +12,9 @@
# specified <stats_file>
#
# Import function libraries
if [ -f functions.sh ] ; then
# Import local copies
. functions.sh
. lock.sh
else
# Import versions in share
. `dirname $0`/../share/functions.sh
. `dirname $0`/../share/lock.sh
fi
export PATH=$(dirname $0)/../share:${PATH}
. bcftbx.functions.sh
. bcftbx.lock.sh
#
# Local functions
#
@@ -118,4 +112,4 @@ else
fi
fi
##
#
#
25 changes: 15 additions & 10 deletions QC-pipeline/filtering_stats.sh
@@ -12,15 +12,20 @@
# specified <stats_file>
#
# Import function libraries
if [ -f functions.sh ] ; then
# Import local copies
. functions.sh
. lock.sh
else
# Import versions in share
. `dirname $0`/../share/functions.sh
. `dirname $0`/../share/lock.sh
fi
export PATH=$PATH:$(dirname $0)/../share
function import_functions() {
if [ -z "$1" ] ; then
echo ERROR no filename supplied to import_functions >&2
else
echo Sourcing $1
. $1
if [ $? -ne 0 ] ; then
echo ERROR failed to import $1 >&2
fi
fi
}
import_functions bcftbx.functions.sh
import_functions bcftbx.lock.sh
#
# Local functions
#
@@ -98,4 +103,4 @@ else
echo Unable to get lock on ${stats_file}
fi
##
#
#
27 changes: 9 additions & 18 deletions QC-pipeline/illumina_qc.sh
@@ -34,18 +34,15 @@ function usage() {
echo " available to the script (default"
echo " is N=1)"
}
export PATH=$PATH:$(dirname $0)/../share
function import_functions() {
if [ -z "$1" ] ; then
echo ERROR no filename supplied to import_functions >&2
else
if [ -f $1 ] ; then
# Import local copy
echo Sourcing `pwd`/$1
. $1
else
# Import version in share
echo Sourcing `dirname $0`/../share/$1
. `dirname $0`/../share/$1
echo Sourcing $1
. $1
if [ $? -ne 0 ] ; then
echo ERROR failed to import $1 >&2
fi
fi
}
@@ -67,13 +64,13 @@ fi
#===========================================================================
#
# General shell functions
import_functions functions.sh
import_functions bcftbx.functions.sh
#
# NGS-specific functions
import_functions ngs_utils.sh
import_functions bcftbx.ngs_utils.sh
#
# Program version functions
import_functions versions.sh
import_functions bcftbx.versions.sh
#
#===========================================================================
# Main script
@@ -127,13 +124,7 @@ done
datadir=`dirname $FASTQ`
#
# Set up environment
QC_SETUP=`dirname $0`/qc.setup
if [ -f "${QC_SETUP}" ] ; then
echo Sourcing qc.setup to set up environment
. ${QC_SETUP}
else
echo WARNING qc.setup not found in `dirname $0`
fi
import_qc_settings
#
# Get the data directory i.e. location of the input file
datadir=`dirname $FASTQ`
File renamed without changes.
6 changes: 4 additions & 2 deletions QC-pipeline/qcreporter.py
@@ -15,7 +15,7 @@
"""

__version__ = "0.2.1"
__version__ = "0.2.2"

#######################################################################
# Import modules that this module depends on
@@ -124,15 +124,17 @@
status = qcreporter.verify()
if not status:
logging.error("QC failed for one or more samples in %s" % d)
sys.exit(1)
else:
print "QC verified for all samples in %s" % d
else:
logging.error("QC failed: no samples identified in %s" % d)
sys.exit(1)
else:
if qcreporter.samples:
# Generate report
print "Generating report for %s" % d
qcreporter.zip()
else:
logging.error("No samples identified in %s" % d)

sys.exit(1)
2 changes: 2 additions & 0 deletions QC-pipeline/run_qc_pipeline.py
@@ -247,6 +247,8 @@ def SendEmail(subject,recipient,message):
elif options.runner == 'ge':
if options.ge_args:
ge_extra_args = str(options.ge_args).split(' ')
else:
ge_extra_args = None
runner = JobRunner.GEJobRunner(queue=options.ge_queue,
ge_extra_args=ge_extra_args)
elif options.runner == 'drmaa':
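
The two lines added in this hunk ensure that `ge_extra_args` is always bound before it is passed to `JobRunner.GEJobRunner`; previously it was only assigned when `options.ge_args` was set. The same logic can be isolated in a small helper, sketched here with a hypothetical function name (`parse_ge_args` does not exist in the repository):

```python
def parse_ge_args(ge_args):
    """Split a space-separated argument string into a list.

    Returns None when no arguments were supplied, so callers always
    receive a defined value. Hypothetical helper for illustration.
    """
    if ge_args:
        return str(ge_args).split(' ')
    return None


print(parse_ge_args("-j y -l short"))
print(parse_ge_args(None))
```

Returning an explicit `None` avoids the unbound-variable error that a missing `else` branch would otherwise cause.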
