Fast Index search across a large collection of genomic files (BigWigs or BigBeds)

epiviz/EpivizQuindex

EpivizQuindex

Genomic analysis pipelines and workflows often use specialized file formats for manipulating data and quickly finding genomic regions of interest. These file formats include an index as part of their specification, which allows users to perform random access queries. With a collection of such files, however, reading every file to extract the data for a region of interest is time-consuming. The goal of the Quindex approach is to "index the index" of these files and provide fast access to large collections of genomic data across files.

Usage

To import the package, simply run:

from epivizquindex import EpivizQuindex

Create the index

Define the genome range and choose a folder to hold the index. The folder is passed as base_path when the index is created; if it does not exist, Quindex will create it.
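The snippet that accompanied this step is not shown here. As a minimal sketch, assuming `genome` is a plain dict mapping chromosome names to their lengths in base pairs (an assumption, not confirmed by this README) and reusing the constructor call from the loading example further down, the setup might look like the following. `make_index` is a hypothetical helper name, `"./quindex_index/"` is a placeholder path, and the two hg19 chromosome lengths are included only for illustration.

```python
# Assumption: the genome range is a dict of chromosome name -> length in bp.
# The two entries below are hg19 lengths, shown only as an illustration.
genome = {
    "chr1": 249250621,
    "chr2": 243199373,
}

# Folder that will hold the index; Quindex creates it if it does not exist.
base_path = "./quindex_index/"  # placeholder path


def make_index(genome, base_path):
    """Hypothetical helper: construct an empty index for the given genome."""
    # Imported lazily so the sketch can be read without the package installed.
    from epivizquindex import EpivizQuindex

    return EpivizQuindex.EpivizQuindex(genome, base_path=base_path)
```

With the package installed, `index = make_index(genome, base_path)` would give the `index` object used in the rest of this section.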

Add files to the index by calling add_to_index() once per file:

f1 = "/path_to_your_file/some.bigwig"
f2 = "/path_to_your_file/someOther.bigwig"
# adding file to index
index.add_to_index(f1)
index.add_to_index(f2)
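For a large collection, calling `add_to_index` by hand for each file gets tedious. The sketch below loops over a folder instead. `add_files` is a hypothetical helper, not part of the package, and the glob patterns are assumptions about how your files are named; the only Quindex call it relies on is `add_to_index(path)` as shown above.

```python
from pathlib import Path


def add_files(index, folder, patterns=("*.bigwig", "*.bw")):
    """Add every file in `folder` that matches `patterns` to `index`.

    `index` is any object exposing add_to_index(path).
    Returns the sorted list of paths that were added.
    """
    # A set guards against a file being matched by more than one pattern.
    paths = sorted(
        {str(p) for pattern in patterns for p in Path(folder).glob(pattern)}
    )
    for path in paths:
        index.add_to_index(path)
    return paths
```

Returning the list of added paths makes it easy to log or later pass one of them as the `file` argument of a query.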

Perform an in-memory query

Once the index is created, query a specific chromosome and range:

index.query("chr2", 0, 900000)

You can also restrict the query to a single file:

index.query("chr2", 0, 900000, file=f1)
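When several regions are of interest, the same call can be repeated in a loop. The following is a small sketch, not part of the package: `query_regions` is a hypothetical helper that only uses the `query(chrom, start, end)` call and the optional `file` keyword shown above, and it makes no assumption about what `query` returns.

```python
def query_regions(index, regions, file=None):
    """Run index.query for each (chrom, start, end) tuple in `regions`.

    Returns a dict keyed by the region tuple. Each value is whatever
    index.query returns; this sketch does not inspect it.
    """
    results = {}
    for chrom, start, end in regions:
        if file is None:
            # Search across every file in the index.
            results[(chrom, start, end)] = index.query(chrom, start, end)
        else:
            # Restrict the search to a single file.
            results[(chrom, start, end)] = index.query(chrom, start, end, file=file)
    return results
```

For example, `query_regions(index, [("chr2", 0, 900000), ("chr1", 0, 500000)], file=f1)` would issue two queries against `f1` only.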

Store and load computed index to disk

Use to_disk() to write the index to disk and from_disk() to load it back into memory. Both use the folder given as base_path when the index was created.

# storing the precomputed index
index.to_disk()
# reading a precomputed index
index = EpivizQuindex.EpivizQuindex(genome, base_path=base_path)
index.from_disk()

Perform search without loading

You can also search a stored index without loading it into memory:

memory = False
index = EpivizQuindex.EpivizQuindex(genome, base_path=base_path)
index.from_disk(load=memory)
index.query("chr2", 0, 900000, in_memory=memory)
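Note that the example above passes the same flag to both `from_disk` and `query`. One way to keep the two calls in step is to wrap them together, as in the sketch below. `query_saved_index` is a hypothetical helper, not part of the package; it uses only the calls shown in this README and expects an `index` already constructed with the `base_path` it was saved under.

```python
def query_saved_index(index, chrom, start, end, in_memory=True):
    """Load a stored index and query it, keeping both calls in the same mode."""
    if in_memory:
        # In-memory mode: the plain calls from the earlier sections.
        index.from_disk()
        return index.query(chrom, start, end)
    # On-disk mode: both calls are told not to use an in-memory index.
    index.from_disk(load=False)
    return index.query(chrom, start, end, in_memory=False)
```

For example, `query_saved_index(index, "chr2", 0, 900000, in_memory=False)` performs the same two steps as the snippet above.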

Note

This project has been set up using PyScaffold 4.2.3. For details and usage information on PyScaffold see https://pyscaffold.org/.
