Obgraph is a simple Python package for creating sequence graphs from a reference and a set of variants.
An Obgraph is a graph with "dummy" nodes for all deleted sequences, meaning that every insertion and deletion is represented with two nodes: one empty node and one node for the sequence (the inserted sequence, or the reference sequence in the case of a deletion). This means that any variant can always be represented with two nodes, which is a format suitable for e.g. genotyping.
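To make the two-node representation concrete, here is a minimal toy sketch of a deletion with a dummy node. It uses plain Python dictionaries, not the obgraph API; the node IDs, the sequences, and the `spell_paths` helper are all made up for illustration.

```python
# Toy illustration only: plain dicts, not the obgraph API.
# Reference "ACGTT" with a deletion of "GT", giving the alternative "ACT".
node_sequences = {
    1: "AC",  # shared sequence before the variant
    2: "GT",  # reference sequence that the deletion removes
    3: "",    # empty "dummy" node representing the deletion
    4: "T",   # shared sequence after the variant
}
edges = {1: [2, 3], 2: [4], 3: [4], 4: []}


def spell_paths(node, prefix=""):
    """Return the sequence spelled by every path from `node` to a sink node."""
    sequence = prefix + node_sequences[node]
    if not edges[node]:
        return [sequence]
    paths = []
    for next_node in edges[node]:
        paths.extend(spell_paths(next_node, sequence))
    return paths


haplotypes = spell_paths(1)
print(haplotypes)
```

Both alleles of the variant are single nodes (2 and 3) between the same pair of neighbours, so a genotyper can treat the deletion exactly like a substitution: one node per allele.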
- Clone the repository
cd obgraph
python3 -m pip install .
obgraph make -r ref.fa -c [chromosome] -v variants.vcf -o mygraph
Adding allele frequencies changes the graph in place and does not create a new graph.
obgraph add_allele_frequencies -c [chromosome] -v variants.vcf -g mygraph
Note: When merging graphs, make sure the graphs are given as input in the correct chromosome order.
obgraph merge_graphs -o whole_genome -g graph_chr1 graph_chr2 ...
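If the per-chromosome graphs are collected programmatically, the argument list can be ordered before calling `merge_graphs`. The sketch below is only an illustration: the file names are hypothetical, and it assumes that "correct chromosome order" means numeric chromosomes first (1, 2, ..., 10, ...) followed by named ones such as X and Y.

```python
import re


def chromosome_sort_key(file_name):
    """Sort numeric chromosomes numerically and place named ones (X, Y) last."""
    label = re.search(r"chr(\w+)$", file_name).group(1)
    if label.isdigit():
        return (0, int(label), "")
    return (1, 0, label)


# Hypothetical graph file names, deliberately out of order.
graph_files = ["graph_chr10", "graph_chr2", "graph_chrX", "graph_chr1"]
ordered = sorted(graph_files, key=chromosome_sort_key)

# Argument list using only the flags shown in the merge command above.
command = ["obgraph", "merge_graphs", "-o", "whole_genome", "-g", *ordered]
print(" ".join(command))
```

A plain alphabetical sort would put `graph_chr10` before `graph_chr2`, which is why the key converts the numeric part to an integer.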
This package can also make a genotype matrix from a VCF file containing genotype information. The matrix is basically a NumPy matrix with variants as columns and individuals as rows, and can be used to easily query individuals with given genotypes at variant sites.
obgraph make_genotype_matrix -t N_THREADS -g graph -v variants.vcf -o genotype_matrix -n N_INDIVIDUALS -m N_VARIANTS
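The kind of query described above can be sketched with plain NumPy. This does not load an obgraph genotype matrix: the array is hand-made, and the genotype encoding (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternative) is an assumption for illustration that may differ from what obgraph stores.

```python
import numpy as np

# Hand-made example: 4 individuals (rows) x 3 variants (columns).
# Assumed encoding: 0 = hom. reference, 1 = heterozygous, 2 = hom. alternative.
genotype_matrix = np.array(
    [
        [0, 1, 2],
        [2, 1, 0],
        [0, 2, 2],
        [1, 1, 2],
    ],
    dtype=np.uint8,
)

HOMOZYGOUS_REF = 0
HOMOZYGOUS_ALT = 2

# Individuals that are homozygous alternative at variant 2.
individuals = np.flatnonzero(genotype_matrix[:, 2] == HOMOZYGOUS_ALT)

# Individuals that are also homozygous reference at variant 0.
both = np.flatnonzero(
    (genotype_matrix[:, 0] == HOMOZYGOUS_REF)
    & (genotype_matrix[:, 2] == HOMOZYGOUS_ALT)
)
print(individuals, both)
```

Because each variant is a column, a query for one variant is a single column comparison, and queries across several variants combine as element-wise boolean operations.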
from obgraph import Graph
graph = Graph.from_file("graph")
# Get edges
edges = graph.get_edges(1)  # node id 1; note that all node IDs are numeric