GitHub - lh3/sdg: EXPERIMENTAL implementation of side graph

Side graph

Informally, for a base in a DNA sequence, a side is either the 5'-end or the 3'-end of the base. Each base has two sides. When we join sides in a set of sequences with edges, we get a side graph. A side graph is (implicitly) bidirected, encoding both strands of DNA sequences. It is equivalent to a generic bidirected/string graph with a coordinate system invariable to the addition of new sequences.

Graph format and examples

This repo explores the storage and exchange of side graphs. For now, it only has one functionality "reformat": read a side graph into RAM and write the graph out. It takes a format like the following:

S	R1	    100       # reference R1
J	R1:20:3 R1:23:5   # a deletion
S	R2	    120
J	R2:30:5 R1:50:3   # translocation
S	A       2         # an MNP n R1
J	A:0:5   R1:10:3
J	A:1:3   R1:12:5
S	B       2
J	B:0:5   A:1:3
J	B:1:3   R1:14:5

where a "Sequence" line gives the sequence name and length and a "Join" line gives the two sides that are joined together. In the input, a "J" line may use a sequence that has not been defined on an "S" line. In the output, a "J" line never uses sequences that have not appeared on "S" lines before.

The reformat tool optionally takes a graph in the "insert" representation like:

S	R1      100
S	R2      120
J	R1:50:3 R2:30:5
J	R1:20:3 R1:23:5
I	A       2        R1:10:3  R1:12:5
I	B       2        A:1:3    R1:14:5

The reformat of this graph is the same as the first example.

Performance

So far, the reformat tool has only been tested on toy examples. It probably has bugs that may show up on other graphs. Nonetheless, the underlying implementation is very efficient and should work with huge real-world graphs in principle.

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
examples		examples
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
cmd.c		cmd.c
io.c		io.c
kbtree.h		kbtree.h
khash.h		khash.h
kseq.h		kseq.h
sdg.c		sdg.c
sdg.h		sdg.h

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Side graph

Graph format and examples

Performance

About

Releases

Packages

Languages

lh3/sdg

Folders and files

Latest commit

History

Repository files navigation

Side graph

Graph format and examples

Performance

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages