jpritt/boiler

compress-alignments

You can find the full manual as well as a simple tutorial at http://boiler.readthedocs.io/.

'boiler.py' is the main script for both compression and decompression. Boiler requires Python 3, and the input SAM file must be sorted by read start position. To compress, run:

./boiler.py compress [--frag-len-z-cutoff 0.125] [--split-discordant] [--split-diff-strands] [--preprocess tophat | stringtie] path/to/alignments.sam path/to/compressed.bl

--frag-len-z-cutoff sets, as a z-score of the paired-end fragment length, the cutoff at which mates are placed in different bundles; 0.125 seems to work well. Alternatively, use --frag-len-cutoff to set the cutoff length directly. If --split-discordant is present, discordant reads are treated as unpaired reads. If --split-diff-strands is present, reads with contradicting XS values are treated as unpaired reads.
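Because compression expects input sorted by read start position, it can be useful to check a SAM file before running Boiler. The following is a minimal sketch, not part of Boiler itself: the function name `is_sorted_by_start` is hypothetical, and it assumes that "sorted" means each reference sequence appears in one contiguous block with non-decreasing start positions inside it.

```python
def is_sorted_by_start(lines):
    """Return True if SAM alignment lines are grouped by reference
    sequence and ordered by start position within each group."""
    seen = []      # reference names in order of first appearance
    last_pos = 0   # start position of the previous mapped read
    for line in lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[2], int(fields[3])
        if chrom == "*":
            # Unmapped read: it has no position to compare.
            continue
        if not seen or chrom != seen[-1]:
            if chrom in seen:
                # A reference reappeared after another one started.
                return False
            seen.append(chrom)
            last_pos = 0
        if pos < last_pos:
            return False
        last_pos = pos
    return True


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        print("sorted" if is_sorted_by_start(handle) else "NOT sorted")
```

Saved under a name of your choice, the script takes the SAM path as its only argument.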

To decompress, run the following:

./boiler.py decompress [--force-xs] path/to/compressed.bl path/to/expanded.sam

--force-xs assigns XS tags to all spliced reads, as required by Cufflinks. If spliced reads are found without XS tags, the tags will be assigned at random. The decompressed SAM file is written to the given output path (expanded.sam in the example above).
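To illustrate what forcing XS tags amounts to, here is a rough sketch. It is not Boiler's code. It assumes that a read counts as spliced when its CIGAR string contains an N operation, and that only spliced reads lacking an XS tag receive a randomly chosen strand. The function name `force_xs` is hypothetical.

```python
import random


def force_xs(sam_line, rng=random):
    """Return the SAM line, adding a random XS:A tag if the read is
    spliced (CIGAR contains 'N') and has no XS tag yet."""
    if sam_line.startswith("@"):
        return sam_line  # header lines pass through untouched
    fields = sam_line.rstrip("\n").split("\t")
    cigar = fields[5]
    spliced = "N" in cigar
    # Optional tags start at the 12th column of a SAM record.
    has_xs = any(tag.startswith("XS:A:") for tag in fields[11:])
    if spliced and not has_xs:
        fields.append("XS:A:" + rng.choice("+-"))
    return "\t".join(fields) + "\n"
```

Passing a seeded `random.Random` instance as `rng` makes the assignment reproducible.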

To sort and convert to BAM, run:

samtools view -bS expanded.sam | samtools sort - expanded

To compare two Cufflinks GTF files, run:

./compareGTFs.py transcripts1.gtf transcripts2.gtf

To query a compressed file for bundles, coverage, or reads:

./boiler.py query [--bundles | --coverage | --reads] --chrom c [--start s] [--end e] path/to/compressed.bl path/to/output

If no output argument is provided, standard output will be used. If --start or --end is absent, Boiler will use the beginning or end of the chromosome, respectively, as the bound on the query.
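When queries are scripted, the optional arguments can be assembled programmatically. The helper below is a small sketch that builds the query command line using only the flags listed above. The function name `build_query_cmd` is hypothetical, and it assumes boiler.py is in the current directory.

```python
def build_query_cmd(compressed, chrom, mode="bundles",
                    start=None, end=None, output=None):
    """Build the argument list for a 'boiler.py query' call.

    --start, --end and the output path are omitted when not given,
    so the defaults (chromosome bounds, standard output) apply.
    """
    if mode not in ("bundles", "coverage", "reads"):
        raise ValueError("mode must be bundles, coverage, or reads")
    cmd = ["./boiler.py", "query", "--" + mode, "--chrom", str(chrom)]
    if start is not None:
        cmd += ["--start", str(start)]
    if end is not None:
        cmd += ["--end", str(end)]
    cmd.append(compressed)
    if output is not None:
        cmd.append(output)
    return cmd


# To run the query, pass the list to subprocess.run, for example:
#   import subprocess
#   subprocess.run(build_query_cmd("compressed.bl", "2L"), check=True)
```

The chromosome name "2L" in the comment is only a placeholder.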

About

Boiler: a software tool for highly efficient, lossy compression of RNA-seq alignments
