Home
To check out the code, you will need git: http://git-scm.com/
On Ubuntu, run: sudo apt-get install git
To clone the repository: git clone git@github.com:mozack/ubu.git
To build, you will need maven: http://maven.apache.org/
On Ubuntu, run: sudo apt-get install maven2
NOTE: The HEAD may be unstable. Unless you have special circumstances, please download a pre-built jar.
Available actions:
- sam-xlate - Translate from genome to transcriptome coordinates
- sam-diff - Diff two SAM/BAM files, writing discrepant reads to corresponding SAM/BAM files
- sam-filter - Filter reads from a paired-end SAM or BAM file (only outputs paired reads)
- sam-summary - Output summary statistics per reference for a SAM/BAM file (aligned reads only)
- sam-convert - Convert SAM/BAM file content (e.g. convert quality from phred64 to phred33)
- sam-junc - Count splice junctions in a SAM or BAM file
- fastq-format - Format a single FASTQ file (clean up read ids and/or convert quality scoring)
- sam2fastq - Convert a SAM/BAM file to FASTQ (tested against paired-end MapSplice output)
Run java -jar ubu.jar for an up-to-date list of available actions.
Run java -jar ubu.jar [action] for usage details on each action.
512 MB of RAM should be more than enough for the above commands, with the exception of sam-xlate, which has been tested against hg19 using 3 GB of RAM.
Write reads specific to first.bam to out1.bam and reads specific to second.bam to out2.bam
java -Xmx512M -jar ubu.jar sam-diff --in1 first.bam --in2 second.bam --out1 out1.bam --out2 out2.bam
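The idea of "reads specific to one file" can be illustrated with a multiset difference. This is only a sketch of the concept, not ubu's implementation: the function name `diff_reads` is hypothetical, and the choice of fields that make two reads "the same" (here name, flag, reference, position and CIGAR) is an assumption.

```python
from collections import Counter

def diff_reads(first, second):
    """Return (only_in_first, only_in_second) for two lists of read records.

    Each record is a hashable tuple. Which fields identify a read is an
    assumption of this sketch, not ubu's documented comparison rule.
    """
    a, b = Counter(first), Counter(second)
    # Counter subtraction keeps only records with a positive surplus.
    only_first = list((a - b).elements())
    only_second = list((b - a).elements())
    return only_first, only_second

# Hypothetical records: (name, flag, reference, position, cigar)
first = [("r1", 99, "chr1", 100, "50M"), ("r2", 99, "chr1", 500, "50M")]
second = [("r1", 99, "chr1", 100, "50M"), ("r2", 99, "chr1", 510, "50M")]
only_first, only_second = diff_reads(first, second)
```

Here `r1` is identical in both inputs and is dropped, while the two differing alignments of `r2` end up in their respective outputs.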
Output summary information about alignments.bam (aligned bases, NM count, error rate, reads, zero mapping quality stats) to summary.txt
java -Xmx512M -jar ubu.jar sam-summary --header --in alignments.bam --out summary.txt
Filter out reads from alignments.bam with mapping quality < 1, insert length > 10000, or containing indels. Output the remaining reads to filtered.bam.
java -Xmx512M -jar ubu.jar sam-filter --in alignments.bam --mapq 1 --max-insert 10000 --strip-indels --out filtered.bam
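The three filter criteria above can be written as a simple predicate. This is a minimal sketch, assuming the insert length corresponds to the SAM TLEN field and that a pair is kept only when both mates pass (in line with "only outputs paired reads"). The names `has_indel`, `keep_read` and `keep_pair` are hypothetical and do not come from ubu.

```python
import re

def has_indel(cigar):
    """True if the CIGAR string contains an insertion (I) or deletion (D)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return any(op in "ID" for _, op in ops)

def keep_read(mapq, insert_len, cigar, min_mapq=1, max_insert=10000):
    """Keep a read with mapq >= min_mapq, |insert| <= max_insert and no indels."""
    return (mapq >= min_mapq
            and abs(insert_len) <= max_insert
            and not has_indel(cigar))

def keep_pair(read1, read2):
    """Assumption: a pair is written only if both mates pass the filter.

    Each read is a (mapq, insert_len, cigar) tuple.
    """
    return keep_read(*read1) and keep_read(*read2)
```

With the thresholds from the command above, a read with mapping quality 0, an insert length of 20000, or a CIGAR such as `30M2I44M` would be dropped.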
Translate from genome to transcriptome coordinates
java -Xmx3G -jar ubu.jar sam-xlate --bed hg19.bed --order hg19_M_rCRS_ref.transcripts.fa --in sorted_by_ref_and_name.bam --xgtags --reverse --out translated.bam
Sample bed file: https://raw.github.com/mozack/ubu/master/src/test/java/edu/unc/bioinf/ubu/sam/testdata/unc_hg19.bed
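To make the coordinate translation concrete, here is a minimal sketch of mapping a single genomic position onto a transcript defined by its exon blocks, as a BED file would describe them. It illustrates the general technique only. The function `genome_to_transcript` is hypothetical, coordinates are assumed to be 0-based and half-open, and sam-xlate itself handles whole alignments rather than single positions.

```python
def genome_to_transcript(pos, exons, strand="+"):
    """Map a 0-based genomic position to a 0-based transcript position.

    exons: list of (start, end) half-open genomic intervals, sorted by start.
    Returns None when pos falls outside every exon (e.g. in an intron).
    """
    total = sum(end - start for start, end in exons)
    offset = 0  # transcript bases covered by the exons seen so far
    for start, end in exons:
        if start <= pos < end:
            plus_pos = offset + (pos - start)
            # On the minus strand the transcript runs from the highest
            # genomic coordinate downwards, so the position is mirrored.
            return plus_pos if strand == "+" else total - 1 - plus_pos
        offset += end - start
    return None

# Hypothetical two-exon transcript with a 100 bp intron.
exons = [(100, 200), (300, 400)]
```

For this transcript, position 350 lies 50 bases into the second exon, so it maps to transcript position 150 on the plus strand, while position 250 is intronic and has no transcript coordinate.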
Convert phred64.bam from phred64 to phred33, storing the output in phred33.bam
java -Xmx512M -jar ubu.jar sam-convert --in phred64.bam --out phred33.bam --phred64to33
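For reference, the phred64 to phred33 conversion amounts to shifting every quality character down by 31 (64 - 33) in ASCII. The sketch below shows that arithmetic on a single quality string. The function name `phred64_to_phred33` is hypothetical, and rejecting characters below `@` is a choice made for this sketch, not necessarily what sam-convert does.

```python
def phred64_to_phred33(qual):
    """Shift a phred64-encoded quality string to phred33 encoding."""
    shifted = []
    for ch in qual:
        code = ord(ch) - 31  # 64 - 33 = 31
        if code < 33:
            # A character below '@' cannot be a valid phred64 quality.
            raise ValueError("not a phred64 quality character: %r" % ch)
        shifted.append(chr(code))
    return "".join(shifted)
```

For example, `h` (ASCII 104, quality 40 in phred64) becomes `I` (ASCII 73, quality 40 in phred33).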
Convert from phred33 to phred64, strip whitespace (and following characters) from read id, and append /1 to read id (if necessary). Input is read from 1.fastq and output is written to formatted_1.fastq.
java -Xmx512M -jar ubu.jar fastq-format --in 1.fastq --out formatted_1.fastq --phred33to64 --strip --suffix /1
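The read id clean-up described above can be sketched as follows. This is an illustration rather than ubu's code: the function name `format_read_id` is hypothetical, and "if necessary" is assumed to mean "only when the id does not already end with the suffix".

```python
def format_read_id(read_id, suffix="/1", strip=True):
    """Trim a FASTQ read id at the first whitespace and ensure a suffix."""
    if strip:
        # Keep only the first whitespace-delimited token of the id line.
        parts = read_id.split(None, 1)
        read_id = parts[0] if parts else ""
    # Assumption: the suffix is appended only when it is not already present.
    if suffix and not read_id.endswith(suffix):
        read_id += suffix
    return read_id
```

Under these assumptions, `@HWI-ST1:1:1101:1234:5678 1:N:0:ACGT` becomes `@HWI-ST1:1:1101:1234:5678/1`, and an id that already ends in `/1` is left unchanged.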
Convert the input BAM file (sorted by name) to FASTQ. Sorting by name is necessary to handle multi-mapped reads.
java -Xmx512M -jar ubu.jar sam2fastq --in sorted_by_name.bam --fastq1 1.fastq --fastq2 2.fastq --end1 /1 --end2 /2
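As a rough illustration of what turning one alignment record back into a FASTQ entry involves, the sketch below uses the standard SAM FLAG bits (0x10 reverse strand, 0x40 first in pair, 0x80 second in pair). The function `sam_to_fastq_record` is hypothetical, and restoring the original read orientation for reverse-strand alignments is an assumption of this sketch, not a documented behavior of sam2fastq.

```python
# Translation table for complementing bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def sam_to_fastq_record(name, flag, seq, qual, end1="/1", end2="/2"):
    """Format one SAM record as a four-line FASTQ entry."""
    if flag & 0x10:
        # The record is stored reverse-complemented; restore the read as
        # sequenced (assumption of this sketch).
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    if flag & 0x40:
        suffix = end1  # first read of the pair
    elif flag & 0x80:
        suffix = end2  # second read of the pair
    else:
        suffix = ""
    return "@{}{}\n{}\n+\n{}\n".format(name, suffix, seq, qual)
```

With name-sorted input, all alignments of a multi-mapped read are adjacent, so a converter can emit each end once and skip the remaining alignments of the same read.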
- SamReadPairReader - Provides the ability to iterate over paired reads in a BAM file. Accounts for multiple mappings.
- SamMultiMappingReader - Provides the ability to iterate over lists of reads in a BAM file containing multiple mappings.
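The grouping idea behind these readers can be sketched in a few lines. This is not the Java API of either class (their method signatures are not shown here); it only illustrates how name-sorted records can be consumed as lists of mappings per read. The function `iter_multi_mappings` and the record layout are hypothetical.

```python
from itertools import groupby

def iter_multi_mappings(records):
    """Yield (read_name, [records]) for input sorted by read name.

    Each record is a tuple whose first element is the read name. groupby
    only merges adjacent items, which is why name-sorted input is required.
    """
    for name, group in groupby(records, key=lambda rec: rec[0]):
        yield name, list(group)

# Hypothetical name-sorted records: (name, reference, position)
records = [("r1", "chr1", 100), ("r1", "chr5", 900), ("r2", "chr2", 42)]
grouped = list(iter_multi_mappings(records))
```

In this example `r1` has two mappings and is yielded as one list of two records, followed by `r2` with a single record.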
Contact: lmose at unc dot edu