Bioinformatics perl module for minimizers

License

MIT (see LICENSE and LICENSE.md)

lskatz/Bio--Minimizer

NAME

Bio::Minimizer - minimizer package

Based on the ideas put forth by Roberts et al. (2004): https://academic.oup.com/bioinformatics/article/20/18/3363/202143

SYNOPSIS

my $minimizer  = Bio::Minimizer->new($sequenceString);
my $kmers      = $minimizer->{kmers};      # hash of minimizer => kmer
my $minimizers = $minimizer->{minimizers}; # hash of kmer => [minimizer1, minimizer2,...]

# hash of minimizer => [start1,start2,...] 
# Start coordinates are on the fwd strand even when
# matched against the rev strand.
my $starts    = $minimizer->{starts}; 

# With more options
my $minimizer2= Bio::Minimizer->new($sequenceString,{k=>31,l=>21});
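
The comment above about forward-strand start coordinates can be illustrated with a small standalone sketch. It does not use Bio::Minimizer, and the helpers `revcom` and `fwd_start` are hypothetical names introduced only for this illustration. It assumes 0-based coordinates, which the module's documentation does not state.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Minimizer): reverse-complement a DNA string.
sub revcom {
  my $s = scalar reverse(uc $_[0]);
  $s =~ tr/ACGT/TGCA/;
  return $s;
}

# Hypothetical helper: convert a 0-based start on the reverse-complement
# strand into the 0-based start of the same bases on the forward strand.
sub fwd_start {
  my ($seq_length, $rev_start, $l) = @_;
  return $seq_length - $rev_start - $l;
}

my $seq       = "GATTACAGG";
my $l         = 3;
my $rc        = revcom($seq);
my $lmer      = substr($rc, 0, $l);        # an l-mer taken from the rev strand
my $rev_start = index($rc, $lmer);         # its start on the rev strand
my $start     = fwd_start(length($seq), $rev_start, $l);

# The forward-strand bases at $start are the reverse complement of the l-mer.
print "rev l-mer $lmer maps to fwd start $start (", substr($seq, $start, $l), ")\n";
```

Reporting every start on the forward strand means a single coordinate system can be used to look up a minimizer in the original sequence, whichever strand it was found on.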

DESCRIPTION

Creates a set of minimizers from a sequence: each kmer of length k is represented by a minimizer of length l.
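
As a rough illustration of the idea, the sketch below maps each kmer to its lexicographically smallest lmer. It is a simplified, forward-strand-only sketch and not the module's implementation. The function name `sketch_minimizers` is hypothetical, and the choice of plain lexicographic ordering is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Minimizer): map each kmer of $seq
# to its lexicographically smallest lmer, forward strand only.
sub sketch_minimizers {
  my ($seq, $k, $l) = @_;
  $seq = uc $seq;
  my %minimizer_of;
  for my $i (0 .. length($seq) - $k) {
    my $kmer = substr($seq, $i, $k);
    my $min;
    # Scan every lmer inside this kmer and keep the smallest one
    for my $j (0 .. $k - $l) {
      my $lmer = substr($kmer, $j, $l);
      $min = $lmer if !defined($min) || $lmer lt $min;
    }
    $minimizer_of{$kmer} = $min;
  }
  return \%minimizer_of;
}

my $m = sketch_minimizers("GATTACAGATTACA", 7, 3);
for my $kmer (sort keys %$m) {
  print "$kmer\t$$m{$kmer}\n";
}
```

Neighboring kmers often share the same minimizer, which is what makes minimizers useful for grouping similar sequences.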

EXAMPLES

Example: sort a fastq file by minimizer, which can shrink its gzip-compressed size.

This is implemented in this package's scripts/sort*.pl scripts.

use strict;
use warnings;
use Bio::Minimizer;

my @entry;

# Read fastq file via stdin, in this example
while(my $id = <>){
  # Grab the rest of the four-line entry
  my($seq,$plus,$qual) = (scalar(<>), scalar(<>), scalar(<>));
  chomp($id,$seq,$plus,$qual);

  # minimizer object
  my $minimizer = Bio::Minimizer->new($seq,{k=>length($seq)});
  # The only minimizer in this entry because k==length(seq)
  my $minMinimizer = (values(%{$$minimizer{minimizers}}))[0];

  # Combine the minimum minimizer with the entry, for sorting later.
  # Save the entry as a string so that we don't have to parse it later.
  push(@entry, [$minMinimizer, "$id\n$seq\n$plus\n$qual\n"]);
}

# Print the entries sorted by their minimizer
for my $e(sort {$$a[0] cmp $$b[0]} @entry){
  print $$e[1];
}

METHODS

Bio::Minimizer->new()

Arguments:
  sequence     A string of ACGT
  settings     A hash reference
    k          Kmer length
    l          Minimizer length (some might call it lmer)
    numcpus    Number of threads to use (currently not used)
