
Quark

semi-reference-based short read compression

Assumption

The read files must be gzipped FASTQ files, e.g. 1.fastq.gz and 2.fastq.gz.
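
A quick way to confirm that the inputs meet this assumption is to test them with gzip before running Quark. The sketch below is only an illustration: `check_gzipped` is a hypothetical helper, not part of Quark, and the file names in the commented example are the ones used above.

```shell
# check_gzipped is a hypothetical helper (not part of Quark).
# It returns 0 only if every given file is a valid gzip stream.
check_gzipped() {
    local f status=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "ok: $f"
        else
            echo "not gzipped: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example: compress plain FASTQ files first, then verify them.
# gzip -k 1.fastq 2.fastq      # -k keeps the uncompressed originals
# check_gzipped 1.fastq.gz 2.fastq.gz
```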

The software has been tested on paired-end and single-end data in a bash-compatible shell; redirection might not work in other shells such as fish. Single-end support will be added to the "quark.sh" script soon.

Dependency

Quark depends on plzip for downstream compression. See the plzip documentation for more information and an installation guide.
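
Before compiling, it can help to confirm that plzip and the other tools used in this README (cmake, make, snakemake) are on the PATH. This is a minimal sketch; `require_tool` is a hypothetical helper, not something Quark ships.

```shell
# require_tool is a hypothetical helper (not part of Quark).
# It reports whether a command is available on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Check every tool and report all gaps instead of stopping at the first one.
for tool in plzip snakemake cmake make; do
    require_tool "$tool" || true
done
```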

Compile

$ git clone https://github.com/COMBINE-lab/quark.git
$ cd quark
$ mkdir build
$ cd build
$ cmake ..
$ make
$ cd ..

Running Quark

To see the options

$ ./quark.sh -h

To build the index with k-mer size k

snakemake -s quark.snake make_index --config out="<output dir>" fasta="<fasta file>" kmer=<#k>

To Encode

Single End

snakemake -s quark.snake encode --config out="<output dir>" index="<index dir>" r="<mate>" p=<#threads> lib="single" quality=0

Paired End

snakemake -s quark.snake encode --config out="<output dir>" index="<index dir>" m1="<mate1>" m2="<mate2>" p=<#threads> lib="paired" quality=0
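
The two encode commands differ only in how the reads are passed (r for single end, m1 and m2 for paired end) and in the lib value. The sketch below shows one way to assemble the right command from the number of read files given. `quark_encode_cmd` is a hypothetical helper, not part of Quark; it only prints the command so it can be inspected before being run, and it uses no config keys beyond those shown above.

```shell
# quark_encode_cmd is a hypothetical helper (not part of Quark).
# It prints, but does not run, the snakemake encode command for one
# read file (single end) or two read files (paired end).
# Usage: quark_encode_cmd <output dir> <index dir> <#threads> <reads> [<mate2>]
quark_encode_cmd() {
    if [ "$#" -lt 4 ] || [ "$#" -gt 5 ]; then
        echo "usage: quark_encode_cmd <output dir> <index dir> <#threads> <reads> [<mate2>]" >&2
        return 2
    fi
    local out=$1 index=$2 threads=$3
    if [ "$#" -eq 4 ]; then
        # One read file: single-end library.
        printf 'snakemake -s quark.snake encode --config out="%s" index="%s" r="%s" p=%s lib="single" quality=0\n' \
            "$out" "$index" "$4" "$threads"
    else
        # Two read files: paired-end library.
        printf 'snakemake -s quark.snake encode --config out="%s" index="%s" m1="%s" m2="%s" p=%s lib="paired" quality=0\n' \
            "$out" "$index" "$4" "$5" "$threads"
    fi
}

# Example (paths are placeholders):
# quark_encode_cmd encoded/ index/ 8 1.fastq.gz 2.fastq.gz
```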

To Decode

snakemake -s quark.snake decode --config in="<in dir>" out="<out dir>" lib="<paired or single>" quality=0

To check that the encoded and decoded sequences are the same (the compression is lossless)

$ ./check_pair.sh <original left end> <original right end> <quark left end> <quark right end>
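
For an independent sanity check on a single pair of files, the idea of "same sequences" can be sketched as follows. This is not a description of what check_pair.sh does. It assumes both files are FASTQ (gzipped or plain) and compares only the sequence lines, ignoring read order, headers, and quality values. `fastq_seqs` and `same_sequences` are hypothetical helpers.

```shell
# Hypothetical helpers, not part of Quark.

# fastq_seqs: print the sequence line (2nd of every 4 lines) of a FASTQ
# file in sorted order. Accepts gzipped or plain input.
fastq_seqs() {
    if gzip -t "$1" 2>/dev/null; then
        gzip -cd "$1"
    else
        cat "$1"
    fi | awk 'NR % 4 == 2' | LC_ALL=C sort
}

# same_sequences: return 0 if two FASTQ files contain the same multiset
# of read sequences, 1 if they differ, 2 if temp files cannot be created.
same_sequences() {
    local a b status
    a=$(mktemp) && b=$(mktemp) || return 2
    fastq_seqs "$1" > "$a"
    fastq_seqs "$2" > "$b"
    if cmp -s "$a" "$b"; then status=0; else status=1; fi
    rm -f "$a" "$b"
    return "$status"
}

# Example (file names are placeholders):
# same_sequences 1.fastq.gz decoded_1.fastq && echo "sequences match"
```

Sorting before comparing means the check passes even if the decoder emits reads in a different order; drop the sort to require identical order as well.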

Link to the preprint

Quark enables semi-reference-based compression of RNA-seq data by Hirak Sarkar, Rob Patro
