Genomic analysis pipelines and workflows often use specialized file formats for manipulating and quickly finding data on potential genomic regions of interest. These file formats include an index as part of their specification, which allows users to perform random access queries. With a collection of these files, reading every single file to extract the data for a region of interest is time consuming. The goal of the Quindex approach is to "index the index" of these files and provide fast access to large collections of genomic data across files.
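To make the "index the index" idea concrete, here is a toy sketch in plain Python. It is not how Quindex is implemented: the class `ToyCollectionIndex`, its methods, and the file names are all hypothetical, and a simple linear scan stands in for whatever data structure the package actually uses. It only shows the principle of keeping one small summary per file so that a query can skip files with no data in the requested region.

```python
from collections import defaultdict


class ToyCollectionIndex:
    """Toy cross-file index: records which files cover which region."""

    def __init__(self):
        # chromosome -> list of (start, end, file_name) summary intervals
        self._intervals = defaultdict(list)

    def add(self, file_name, chrom, start, end):
        """Register that file_name holds data on chrom in [start, end)."""
        self._intervals[chrom].append((start, end, file_name))

    def files_overlapping(self, chrom, start, end):
        """Return the sorted names of files with data overlapping [start, end)."""
        hits = {
            name
            for s, e, name in self._intervals.get(chrom, [])
            if s < end and start < e  # half-open interval overlap test
        }
        return sorted(hits)


# Hypothetical files and coverage, for illustration only.
toy = ToyCollectionIndex()
toy.add("a.bigwig", "chr2", 0, 500000)
toy.add("b.bigwig", "chr2", 2000000, 3000000)
toy.add("c.bigwig", "chr1", 0, 900000)
print(toy.files_overlapping("chr2", 0, 900000))  # ['a.bigwig']
```

Only the files returned by the lookup would then need to be opened and queried through their own per-file index.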
To import the package, simply run:
from epivizquindex import EpivizQuindex
Define the genome range, and set base_path to the folder where you want to hold the index. base_path should be a folder. If the path does not exist, Quindex will create it.
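A minimal sketch of this setup step follows. It assumes the genome is given as a mapping from chromosome name to chromosome length; the two lengths shown are the hg19 values and serve only as an example. The folder name and the helper `create_index` are hypothetical. The constructor call inside the helper is the one used later in this README.

```python
# Assumption: the genome range is a dict of chromosome name -> length in bases.
# Lengths below are the hg19 (GRCh37) values, listed for two chromosomes only.
genome = {
    "chr1": 249250621,
    "chr2": 243199373,
}

# Folder that will hold the index (hypothetical location).
base_path = "./quindex_data/"


def create_index(genome, base_path):
    """Create an empty index with the constructor call shown in this README."""
    # Imported lazily so the definitions above can be checked
    # even when the package is not installed.
    from epivizquindex import EpivizQuindex

    return EpivizQuindex.EpivizQuindex(genome, base_path=base_path)
```

With the package installed, `index = create_index(genome, base_path)` would give the `index` object used in the remaining steps.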
Add files to the index with a simple function call:
f1 = "/path_to_your_file/some.bigwig"
f2 = "/path_to_your_file/someOther.bigwig"
# adding file to index
index.add_to_index(f1)
index.add_to_index(f2)
Once the index is created, query a specific chromosome and range:
index.query("chr2", 0, 900000)
You can also restrict the query to a specific file:
index.query("chr2", 0, 900000, file = f1)
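Since a query takes a chromosome name plus start and end coordinates, it can help to check the region against the genome definition before querying. The helper below is a hypothetical sketch, not part of the package, and it assumes the genome is a dict of chromosome name to length. Whether Quindex performs similar checks itself is not stated here.

```python
def check_region(genome, chrom, start, end):
    """Return (chrom, start, end) if the region fits the genome, else raise ValueError."""
    if chrom not in genome:
        raise ValueError(f"unknown chromosome: {chrom}")
    if not 0 <= start < end <= genome[chrom]:
        raise ValueError(
            f"invalid range {start}-{end} for {chrom} (length {genome[chrom]})"
        )
    return chrom, start, end


# Example genome with a single chromosome (hg19 length, for illustration).
genome = {"chr2": 243199373}
region = check_region(genome, "chr2", 0, 900000)
```

The validated tuple can then be passed on, for example as `index.query(*region)`.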
Store the index to disk with to_disk() and load it into memory with from_disk(). The path used is the base_path parameter given when creating the index.
# storing the precomputed index
index.to_disk()
# reading a precomputed index
index = EpivizQuindex.EpivizQuindex(genome, base_path=base_path)
index.from_disk()
We can also search without loading the index into memory:
memory = False
index = EpivizQuindex.EpivizQuindex(genome, base_path=base_path)
index.from_disk(load = memory)
index.query("chr2", 0, 900000, in_memory = memory)
This project has been set up using PyScaffold 4.2.3. For details and usage information on PyScaffold see https://pyscaffold.org/.