Illumina (and SOLiD) sensitive read mapping tool (cloned from svn://scm.gforge.inria.fr/svnroot/storm/)
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
documentation Added directory for documentation (PDFs, etc.) Oct 29, 2013
seeds updating old seed filenames with "-number-weight.in" Nov 16, 2017
CMakeLists.txt adding the CMakeLists.txt of Karel Nov 27, 2014
Makefile updating the compiler version Nov 14, 2017
Makefile.avx2 updating the compiler version Nov 14, 2017
Makefile.avx2.icc updating the compiler version Nov 14, 2017
Makefile.avx512bw.icc updating the compiler version Nov 14, 2017
Makefile.sse updating the compiler version Nov 14, 2017
Makefile.sse.icc updating the compiler version Nov 14, 2017
Makefile.sse2 updating the compiler version Nov 14, 2017
Makefile.sse2.icc updating the compiler version Nov 14, 2017
Makefile.win32.sse2 adding some crosscompiling Makefiles : http://sourceforge.net/project… Mar 6, 2015
Makefile.win64.sse2 adding some crosscompiling Makefiles : http://sourceforge.net/project… Mar 6, 2015
README updating samtools examples and command lines Nov 16, 2017
alignment.c updating some static, inline, and double variable that produce some w… Dec 3, 2014
alignment.h updating three line for _mm_cvtsi64_m64 possible replacement on buggy… Mar 6, 2015
alignment_simd.c avx512 score selection by mask (update 2/2) Jan 3, 2016
alignment_simd.h moving some SIMD defined constants in alignment_simd.h, adding one fo… Jan 22, 2015
codes.c updating some static, inline, and double variable that produce some w… Dec 3, 2014
codes.h 0.0093 Jul 22, 2014
doxygenconfig doxygen update v Dec 6, 2014
genome_map.c genome_map cleaning Dec 6, 2014
genome_map.h genome_map cleaning Dec 6, 2014
hit_map.c disabling the star symbol for a set of consecutive mappings for the s… Jan 22, 2015
hit_map.h updating hitmap comments and atomic part Jan 21, 2015
index.c warnings on unused variables, initialisation of structures Nov 30, 2014
index.h warnings on unused variables, initialisation of structures Nov 30, 2014
load_data.c output to stderr warnings/messages/info (second pass) Sep 23, 2014
load_data.h 0.0093 Jul 22, 2014
main.c replacing index,rindex by strchr,strrchr Mar 6, 2015
read_quality.c stop checking maximal bound for quality: assuming being always best o… Dec 6, 2014
read_quality.h adding two 'extern' on some variable in .h files : icc complains abou… Nov 26, 2014
reads_data.c modifying the psort __attribute(weak) to have this behavior on APPLE … Mar 6, 2015
reads_data.h adding a tag to avoid recomputing the maximum/minimum common subword … Sep 12, 2014
reference_data.c output the error/warnings/... messages to stderr only Sep 23, 2014
reference_data.h maj doxygen inside files Jul 23, 2014
run.c cache level update Jan 3, 2016
run.h comments added Dec 6, 2014
seed.c enabling some AVX512BW in run.c ... correct Makefile and correct CPU … Dec 11, 2015
seed.h updating old seed filenames with "-number-weight.in" Nov 16, 2017
stats.c ajout storm 0.0091 Oct 2, 2013
stats.h ajout storm 0.0091 Oct 2, 2013
util.c updating small parts of the code to support non OpenMP compiling Nov 27, 2014
util.h main.c cleaning Nov 27, 2014

README

SToRM - Seed-based Read Mapping tool for SOLiD or Illumina sequencing data

[ABOUT]

 SToRM is a read mapping tool designed first for the SOLiD technology, but
 also available now for the classical Illumina technology. First, SToRM is 
 based on seeding techniques adapted to the statistical characteristics of
 reads: the seeds and the alignment algorithms are designed to comply with
 the properties of the SOLiD color encoding or Illumina substitution model 
 as well as the observed reading error distribution along the read.

 More information at  http://bioinfo.cristal.univ-lille.fr/storm/storm.php


[CONTENTS OF THIS ARCHIVE]

    README
    Makefile
    *.c, *.h - source code
    seeds/   - several seed families adapted for SOLiD/Illumina reads


[COMPILING THE SOURCES]

 Requirements: 
    gcc (>=4.2), OpenMP support, Linux or MacOS

 Command line: 
    "make" / "make storm-color" / "make storm-nucleotide" (multithreaded)



[RUNNING SToRM]

  Command line: 
    "./storm -M 1 -g <path_2_reference.fas> -r <path_2_reads.fastq> [opts]"
    "./storm -h" for full parameter list

   A classic command line:
    "./storm-color      -M 1 -g reference.fas -r reads.cs -o out.sam"
    "./storm-nucleotide -M 1 -g reference.fas -r reads.fq -o out.sam"

  Advanced command line: 

    for "faster" /or/ "more sensitive" mapping :

     [0] obviously, use -M <int> to have all mapable reads being mapped !!

         Otherwise,  only the best read is kept per reference position,
         which implies that not all the reads (duplicate, lower quality
         ones at the same reference position) are mapped.

     [1] use "larger" /or/ "smaller" scoring thresholds
      ...
      -z 150 -t 150
      -z 120 -t 120
      -z 100 -t 100
      ...
      By default, setting the same value for both -z/-t is a good idea,
      unless you have ">15 indels" or "SOLiD color correction" reasons.
      ...
      -z 105 -t 120
      -z 85 -t 100
      -z 75 -t 90
      ...
     
      note that E-value "may be somehow" checked with the ALP tool:
         http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html_ncbi/html/software/program.html?uid=6


     [2] "increase" /or/ "decrease" the number of indels :
      ...
      -i 1
      -i 3 (default value)
      -i 7
      -i 15 
         for full SIMD support.
      ...
      -i 20
      ...
         for final alignment support only : in that case, the SIMD code
         generates a warning explaining that it will compute at most 15
         diagonals : you can thus change the SIMD threshold -z <int> to
         a smaller value than the final alignment threshold -t <int>.
         (see [1] before)

     [3] use "less" /or/ "more" seeds of "large" /or/ "small" weight:  
      ...
      -s "####-##-#-####;###-#--##-##--###;####-#---#----#--####"
      -s "##-##-###-###;###--#-#--#--####;####--#--#---#---###
      ...

      note first that iedera can help you in the seed design :
         http://bioinfo.cristal.univ-lille.fr/yass/iedera.php
             ./iedera -spaced -n 3 -w 11,11 -s 11,22 -r 10000 -k -l 40
             ./iedera -spaced -n 3 -w 10,10 -s 10,20 -r 10000 -k -l 40
      => you can send requests to Laurent Noé for specific seeding

      note also that "chain processing" can be now applied in SToRM :
        unmmaped reads from the first step can be mapped in extra steps
        ...
        "./storm-nucleotide -M 1 -s <...   big seed(s) ...> -g ref.fas -r reads.fq -o out.sam -O reads2.fq"
        "./storm-nucleotide -M 1 -s <... small seed(s) ...> -g ref.fas -r reads2.fq -o out2.sam -O reads3.fq"
        ...

        e.g. an example for single seed:
          "###-##--#-##-#-###" (weight 12)
               -then-> 
          "####-#-#--##-###"   (weight 11)
               -then->
          "#####-##-###"       (weight 10)
            
        theses 3 seeds are generated with :

          ./iedera -spaced -l 50 -w 12,12 -s 12,24 -r 10000 -k   
             ->  "###-##--#-##-#-###" 
          ./iedera -spaced -l 45 -w 11,11 -s 11,22 -r 10000 -k -mx "###-##--#-##-#-###"
             ->  "####-#-#--##-###"
          ./iedera -spaced -l 40 -w 10,10 -s 10,20 -r 10000 -k -mx "###-##--#-##-#-###,####-#-#--##-###"
             ->  "#####-##-###"

     [4] "use" the -l option with smaller value (e.g. -l 10) /or/ "use"
          with bigger value (e.g. -l 1000)/"disable" with "<= 0" value

     [5] check your quality and change the -b 33 to -b 0 to disable the 
         score scaling algorithm, or to -b 64 (for Illumina 1.5 to 1.7) 
         accordingly ...


[READING THE RESULT]

 SToRM generates its output in the SAM format. To visualise the result, we 
 recommend use of the SAMtools package (http://samtools.sourceforge.net/ - http://www.htslib.org/).

 For both a SToRM output file "out.sam" and a reference file "ref.fas", the 
 samtools command lines usually are:


     samtools faidx ref.fas

     samtools view -bT ref.fas out.sam > out.bam

     ##
     ## Only when needed, merge several bam files into one single : 
     samtools merge -f out.bam   out{1}.bam out{2}.bam ... out{42}.bam

     ##
     ## Only if the "-M <int>" option is set: the "out.sam"/"out.bam"
     ## are "read sorted" and not "ref sorted" in that case, so : 
     samtools sort out.bam -o out_sorted.bam

     samtools index out_sorted.bam

     samtools tview out_sorted.bam ref.fas


 Please refer to the SAMtools documentation for more usage details.