This repository has been archived by the owner on Jul 6, 2023. It is now read-only.

sguizard/slaMEM

 
 


slaMEM

slaMEM is a tool for efficiently retrieving MEMs (Maximal Exact Matches) between a reference genome sequence and one or more query sequences, in the same way as other MEM-finding software tools.
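To make the definition concrete: a MEM is an exact match between the reference and a query that cannot be extended to the left or to the right without introducing a mismatch. The brute-force sketch below illustrates only that definition. It is not how slaMEM works internally (it runs in quadratic time and is practical only for tiny inputs), and the function name `find_mems` and its 0-based `(ref_pos, query_pos, length)` output are assumptions made for this example, not slaMEM's output format.

```python
def find_mems(ref: str, query: str, min_len: int = 20):
    """Return (ref_pos, query_pos, length) for every maximal exact match
    of at least min_len characters. Positions are 0-based."""
    mems = []
    for i in range(len(ref)):
        for j in range(len(query)):
            if ref[i] != query[j]:
                continue
            # Left-maximal: the preceding characters must differ,
            # or one of the two sequences must start here.
            if i > 0 and j > 0 and ref[i - 1] == query[j - 1]:
                continue
            # Right-maximal: extend until a mismatch or the end of a sequence.
            k = 0
            while (i + k < len(ref) and j + k < len(query)
                   and ref[i + k] == query[j + k]):
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


if __name__ == "__main__":
    for ref_pos, query_pos, length in find_mems("ACGTACGT", "TACGTT", min_len=3):
        print(ref_pos, query_pos, length)
```

The `min_len` parameter plays the same role as the minimum match length option described in the usage section.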

slaMEM relies on an FM-Index together with a new data structure called SSILCP (Sampled Search Intervals from Longest Common Prefixes) to store information about parent intervals in a time- and space-efficient way.
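For readers unfamiliar with the FM-Index, the following minimal sketch shows the backward search it enables: the pattern is processed from its last character to its first, and each step narrows an interval of the suffix array. This is a teaching example under simplifying assumptions (naive suffix sorting, full occurrence tables, `$` as sentinel), not slaMEM's C implementation, and it does not attempt to reproduce the SSILCP structure. The names `build_fm_index` and `backward_search` are hypothetical.

```python
def build_fm_index(text: str):
    """Build a toy FM-Index: suffix array, first-column offsets, occurrence tables."""
    text += "$"  # sentinel, assumed smaller than every other character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: character preceding each sorted suffix (wraps to "$" for suffix 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # first[c]: number of characters in the text strictly smaller than c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][k]: number of occurrences of c in bwt[:k].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return sa, first, occ


def backward_search(pattern: str, sa, first, occ):
    """Return the sorted 0-based positions where pattern occurs in the text."""
    lo, hi = 0, len(sa)  # half-open suffix array interval [lo, hi)
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[k] for k in range(lo, hi))


if __name__ == "__main__":
    sa, first, occ = build_fm_index("ACGTACGT")
    print(backward_search("ACG", sa, first, occ))
```

A real index would sample the occurrence tables and the suffix array to save memory; the full tables are kept here only for readability.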

slaMEM also includes a useful feature to display the locations of the found MEMs, generating images like the one below.

Figure: MEMs of 57 E. coli strains

Reference

If you use slaMEM, please cite:

Fernandes, Francisco and Ana T. Freitas. slaMEM: efficient retrieval of maximal exact matches using a sampled LCP array. Bioinformatics 30.4 (2014): 464-471.

Manual

Install

make

Usage

./slaMEM (<options>) <reference_file> <query_file(s)>
Options:
  • -mem : find MEMs: any number of occurrences in both ref and query (default)
  • -mam : find MAMs: unique in ref but any number in query
  • -l : minimum match length (default=20)
  • -o : output file name (default="*-mems.txt")
  • -b : process both forward and reverse strands
  • -n : discard 'N' characters in the sequences
  • -m : minimum sequence size (e.g. to ignore small scaffolds)
  • -r : load only the reference(s) whose name(s) contain(s) this string
Extra:
  • -v : generate MEMs map image from this MEMs file
Example:
./slaMEM -b -l 10 ./ref.fna ./query.fna
./slaMEM -v ./ref-mems.txt ./ref.fna ./query.fna
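
When slaMEM is called from a script, the first example command can be assembled programmatically. The sketch below is a hypothetical Python wrapper, not part of slaMEM: the helper names are made up for this example, and it uses only the `-b` and `-l` flags exactly as they appear in the example above. It assumes the `slaMEM` binary has been built with `make` and sits in the current directory.

```python
import subprocess


def build_slamem_command(reference, queries, min_length=None,
                         both_strands=False, binary="./slaMEM"):
    """Assemble the argument list for a slaMEM run (hypothetical helper)."""
    cmd = [binary]
    if both_strands:
        cmd.append("-b")  # process both forward and reverse strands
    if min_length is not None:
        cmd += ["-l", str(min_length)]  # minimum match length
    cmd.append(reference)
    cmd.extend(queries)
    return cmd


def run_slamem(reference, queries, **options):
    """Run slaMEM and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_slamem_command(reference, queries, **options),
                          check=True)


if __name__ == "__main__":
    # Prints the same command line as the first example above.
    print(" ".join(build_slamem_command("./ref.fna", ["./query.fna"],
                                        min_length=10, both_strands=True)))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.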

About

Finding Maximal Exact Matches (MEMs) using a Sampled LCP Array

Languages

  • C 99.0%
  • Other 1.0%