jandot/locustree

LocusTree – Ruby library to search genomic loci

The LocusTree library helps in working with large numbers of genomic features. It
has two main uses.

Fast searching

The naive way of fetching all features in a given genomic locus is to go through
all of them and check whether they are in the target locus or not. This works
fine for smaller datasets, but becomes very slow for large ones, because every
query has to touch every feature.
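
As a point of comparison, the naive scan can be sketched as follows. This is only an illustration: the Feature struct and the naive_overlaps method are hypothetical names and are not part of LocusTree.

```ruby
# Hypothetical feature record, used for illustration only (not the LocusTree API).
Feature = Struct.new(:name, :chromosome, :start, :stop)

# Linear scan: every feature is tested against the target locus,
# so each query costs time proportional to the number of features.
def naive_overlaps(features, chromosome, start, stop)
  features.select do |f|
    f.chromosome == chromosome && f.start <= stop && f.stop >= start
  end
end

features = [
  Feature.new('geneA', '1', 100, 200),
  Feature.new('geneB', '1', 500, 900),
  Feature.new('geneC', '2', 150, 250)
]

hits = naive_overlaps(features, '1', 150, 600)
puts hits.map(&:name).join(', ')
```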

There are several ways of speeding this up. One is to use a binary search on the
sorted features; another is to use a tree. LocusTree uses the second approach.
A whole chromosome is split into, say, 100 chunks. Each of these chunks is split
again into 100 parts, and so on. To get all features in a given region, we do a
top-down search: we check which of the 100 nodes at the top level overlap the
target locus, then for each matching node we check its 100 subnodes, and so on.
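
The top-down search described above can be sketched with a small in-memory tree. This is a minimal illustration under assumptions, not LocusTree's implementation: the Node class, its method names, the min_size stopping rule and the choice to store features in leaf nodes are all hypothetical.

```ruby
# Hypothetical in-memory node; LocusTree's own classes and storage may differ.
class Node
  attr_reader :start, :stop, :children, :features

  def initialize(start, stop)
    @start = start
    @stop = stop
    @children = []
    @features = []
  end

  def overlaps?(from, to)
    @start <= to && @stop >= from
  end

  # Recursively split this node's range into up to `fanout` equal chunks,
  # stopping once a chunk covers no more than `min_size` positions.
  def split(fanout, min_size)
    length = @stop - @start + 1
    return if length <= min_size

    step = (length.to_f / fanout).ceil
    (@start..@stop).step(step) do |s|
      child = Node.new(s, [s + step - 1, @stop].min)
      child.split(fanout, min_size)
      @children << child
    end
  end

  # Store a feature in every leaf node it overlaps.
  def insert(feature)
    return unless overlaps?(feature[:start], feature[:stop])

    if @children.empty?
      @features << feature
    else
      @children.each { |child| child.insert(feature) }
    end
  end

  # Top-down search: only descend into children that overlap the target.
  def query(from, to)
    return [] unless overlaps?(from, to)
    return @features.select { |f| f[:start] <= to && f[:stop] >= from } if @children.empty?

    @children.flat_map { |child| child.query(from, to) }.uniq
  end
end

root = Node.new(1, 1_000_000)
root.split(100, 100) # 100 chunks, each split again into 100 parts
root.insert({ name: 'geneA', start: 2_500, stop: 2_650 })
root.insert({ name: 'geneB', start: 500_000, stop: 500_050 })

puts root.query(2_600, 2_700).map { |f| f[:name] }.join(', ')
```

In this sketch a query for a small region inspects the 100 top-level nodes, then only the subnodes of the one or two chunks that overlap, instead of every stored feature.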

Aggregation

For some purposes we don’t need the results at the highest resolution.
This is especially true for genome browsers. Suppose you want to display 15
million SNPs on a genome and your display is 800 pixels wide. The simplest way
of doing this is to take each of the 15 million SNPs, calculate its pixel
position and draw it. But this means doing 15 million fetches/calculations to
draw only 800 datapoints. Using LocusTree, we can select the level of the tree
that has about 800 nodes. If, while building the tree, we added a value to each
node that states how many SNPs are within that node and its subnodes, we only
have to fetch 800 datapoints and show the SNP density for each of them.
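
The aggregation idea can be sketched with plain arrays of counts, one array per tree level. The method names (leaf_counts, build_levels, level_for_display), the fan-out of 10 and the synthetic positions are assumptions made for this illustration; they are not LocusTree's API or data.

```ruby
# Count how many 1-based positions fall into each of `bins` equal-width bins.
# These counts stand in for the values stored on the finest tree level.
def leaf_counts(positions, chromosome_length, bins)
  bin_width = (chromosome_length.to_f / bins).ceil
  counts = Array.new(bins, 0)
  positions.each { |pos| counts[(pos - 1) / bin_width] += 1 }
  counts
end

# Build coarser levels bottom-up: each parent holds the sum of its children.
# Levels are ordered from finest (first) to coarsest (last).
def build_levels(leaves, fanout)
  levels = [leaves]
  levels << levels.last.each_slice(fanout).map(&:sum) while levels.last.size > 1
  levels
end

# Pick the finest level that has no more nodes than the display has pixels.
def level_for_display(levels, pixels)
  levels.find { |counts| counts.size <= pixels }
end

# Synthetic, deterministic SNP positions on a 100,000 bp chromosome.
positions = (0...5_000).map { |i| (i * 7_919) % 100_000 + 1 }

levels = build_levels(leaf_counts(positions, 100_000, 10_000), 10)
shown = level_for_display(levels, 800)

puts "levels: #{levels.map(&:size).inspect}"
puts "nodes drawn: #{shown.size}, SNPs represented: #{shown.sum}"
```

Because the counts are computed once when the levels are built, drawing the density only needs one value per node on the selected level, regardless of how many SNPs sit underneath.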

More information

For more information, see the project wiki page.

Usage

require 'locus_tree'
container = LocusTree::Container.new(100, 'genes.txt.index100', 'genes.txt')
positive_nodes = container.query('2', 143570750, 143570790, 10)
positive_nodes.each do |node|
  puts node.to_s + "\t" + node.value.to_s
end
puts container.query_single_bin('24', 49_050, 49_500).to_s
