# Construct a compact De Bruijn graph for the Acidobacterium genome

Here we use a real genome that has been split into 8 equal-sized chunks of about 500,000 bases each. Labels are assigned based on these chunks; the signatures computed from those chunks can then be used to search the catlas, and the search results correlated with those labels.
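The chunk files themselves are not generated in this notebook. As a minimal sketch of the idea, assuming the genome is available as a single string, splitting it into contiguous pieces of near-equal length and labeling them from 1 might look like the following. `split_into_chunks` is a hypothetical helper for illustration, not one of the project scripts.

```python
def split_into_chunks(sequence, n_chunks=8):
    """Split a sequence into n_chunks contiguous pieces of near-equal length.

    Returns a list of (label, chunk) pairs with labels starting at 1.
    Hypothetical helper for illustration only.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(len(sequence), n_chunks)
    chunks = []
    start = 0
    for label in range(1, n_chunks + 1):
        # The first `extra` chunks get one additional base,
        # so chunk lengths differ by at most 1.
        size = base + (1 if label <= extra else 0)
        chunks.append((label, sequence[start:start + size]))
        start += size
    return chunks


# Toy example: a 20-base sequence split into 8 labeled chunks.
toy = "ACGTACGTACGTACGTACGT"
for label, chunk in split_into_chunks(toy):
    print(label, chunk)
```

Numbering the labels from 1 mirrors the `acido-chunk{chunk_id}` file names and the label argument used in the search cells of this notebook.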

In [1]:
!rm -fr acido
!../walk-dbg.py --label -x 1e9 -k 31 ../data/acido-chunk*.fa.gz -o acido


placing output in directory: acido
gxt will be: acido/acido.gxt
mxt will be: acido/acido.mxt

building graphs and loading files
finding high degree nodes
... find_high_degree_nodes: 10000
... find_high_degree_nodes: 20000
... find_high_degree_nodes: 30000
... find_high_degree_nodes: 40000
... find_high_degree_nodes: 50000
... find_high_degree_nodes: 60000
... find_high_degree_nodes: 70000
... find_high_degree_nodes: 80000
... find_high_degree_nodes: 90000
... find_high_degree_nodes: 100000
... find_high_degree_nodes: 110000
... find_high_degree_nodes: 120000
... find_high_degree_nodes: 130000
... find_high_degree_nodes: 140000
... find_high_degree_nodes: 150000
... find_high_degree_nodes: 160000
... find_high_degree_nodes: 170000
... find_high_degree_nodes: 180000
... find_high_degree_nodes: 190000
... find_high_degree_nodes: 200000
... find_high_degree_nodes: 210000
... find_high_degree_nodes: 220000
... find_high_degree_nodes: 230000
... find_high_degree_nodes: 240000
... find_high_

# Construct a catlas for acido


In [2]:
!../build-catlas.py acido 5

Project acido in acido
Found acido.gxt in acido
Graph contains loops. Removing loops for further processing.
Loaded graph with 3150 vertices, 3627 edges and 1 components
Found acido.mxt in acido
Loaded minhashes for graph

Domset computation

Augmenting 1 2 3 4 5 

Catlas computation

Mapping graph vertices to dominators
Processing node 3150/3150
Building catlasses for connected components
Processing component 0/1
Catlas done
0 1
1 6
2 16
3 28
4 66
5 133
6 294


# Search the catlas with a minhash

In [4]:
# search with the minhash of each chunk (1 through 8), using the chunk number as its label
for chunk_id in range(1, 9):
    !../search-for-domgraph-nodes.py --quiet acido 5 ../data/acido-chunk{chunk_id}.fa.sig.dump.txt {chunk_id}


search strategy: bestnode 0
sensitivity: 30.0
specificity: 92.5

search strategy: bestnode 0
sensitivity: 45.5
specificity: 75.6

search strategy: bestnode 0
sensitivity: 38.1
specificity: 88.6

search strategy: bestnode 0
sensitivity: 33.9
specificity: 69.9

search strategy: bestnode 0
sensitivity: 15.1
specificity: 98.6

search strategy: bestnode 0
sensitivity: 13.6
specificity: 97.2

search strategy: bestnode 0
sensitivity: 22.6
specificity: 95.0

search strategy: bestnode 0
sensitivity: 26.0
specificity: 88.6
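The search script prints one `search strategy` / `sensitivity` / `specificity` triple per chunk. A small parser makes those numbers easier to work with. This is a sketch that assumes the output has exactly the three-line shape shown above; `parse_search_output` is a hypothetical helper, and the sample text is copied from the first two results of this cell.

```python
import re


def parse_search_output(text):
    """Extract (strategy, sensitivity, specificity) triples from search output.

    Assumes each result is three consecutive lines in the format printed
    by the search cells in this notebook.
    """
    pattern = re.compile(
        r"search strategy:\s*(?P<strategy>.+?)\s*\n"
        r"sensitivity:\s*(?P<sens>[\d.]+)\s*\n"
        r"specificity:\s*(?P<spec>[\d.]+)"
    )
    return [
        (m.group("strategy"), float(m.group("sens")), float(m.group("spec")))
        for m in pattern.finditer(text)
    ]


# Sample text copied from the first two results above.
sample = """\
search strategy: bestnode 0
sensitivity: 30.0
specificity: 92.5

search strategy: bestnode 0
sensitivity: 45.5
specificity: 75.6
"""

results = parse_search_output(sample)
for chunk_id, (strategy, sens, spec) in enumerate(results, start=1):
    print(chunk_id, strategy, sens, spec)
```

In IPython, assigning a shell command to a variable (`output = !command`) captures its output as a list of lines, which can be joined with `"\n".join(output)` before being passed to a parser like this one.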


In [6]:
for chunk_id in range(1, 9):
    !../search-for-domgraph-nodes.py --quiet acido 5 ../data/acido-chunk{chunk_id}.fa.sig.dump.txt {chunk_id} --strategy gathermins --searchlevel 3


search strategy: gathermins 3
sensitivity: 100.0
specificity: 46.7

search strategy: gathermins 3
sensitivity: 97.0
specificity: 24.9

search strategy: gathermins 3
sensitivity: 100.0
specificity: 29.5

search strategy: gathermins 3
sensitivity: 94.9
specificity: 29.5

search strategy: gathermins 3
sensitivity: 98.8
specificity: 23.6

search strategy: gathermins 3
sensitivity: 91.5
specificity: 22.7

search strategy: gathermins 3
sensitivity: 95.7
specificity: 30.8

search strategy: gathermins 3
sensitivity: 98.4
specificity: 20.4


In [7]:
for chunk_id in range(1, 9):
    !../search-for-domgraph-nodes.py --quiet acido 5 ../data/acido-chunk{chunk_id}.fa.sig.dump.txt {chunk_id} --strategy gathermins2 --searchlevel 3


search strategy: gathermins2 3
sensitivity: 100.0
specificity: 45.3

search strategy: gathermins2 3
sensitivity: 92.1
specificity: 53.9

search strategy: gathermins2 3
sensitivity: 90.5
specificity: 65.7

search strategy: gathermins2 3
sensitivity: 96.6
specificity: 42.6

search strategy: gathermins2 3
sensitivity: 95.3
specificity: 51.9

search strategy: gathermins2 3
sensitivity: 84.7
specificity: 55.7

search strategy: gathermins2 3
sensitivity: 83.9
specificity: 65.2

search strategy: gathermins2 3
sensitivity: 89.8
specificity: 44.9


In [8]:
for chunk_id in range(1, 9):
    !../search-for-domgraph-nodes.py --quiet acido 5 ../data/acido-chunk{chunk_id}.fa.sig.dump.txt {chunk_id} --strategy searchlevel --searchlevel 3


search strategy: searchlevel 3
sensitivity: 100.0
specificity: 15.0

search strategy: searchlevel 3
sensitivity: 100.0
specificity: 9.3

search strategy: searchlevel 3
sensitivity: 100.0
specificity: 17.1

search strategy: searchlevel 3
sensitivity: 99.2
specificity: 9.7

search strategy: searchlevel 3
sensitivity: 100.0
specificity: 5.3

search strategy: searchlevel 3
sensitivity: 93.2
specificity: 9.7

search strategy: searchlevel 3
sensitivity: 96.8
specificity: 14.4

search strategy: searchlevel 3
sensitivity: 100.0
specificity: 1.8
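To compare the four strategies at a glance, the per-chunk numbers can be averaged. The sketch below hard-codes the values printed in this notebook (chunks 1 through 8, in order). The `results` dictionary, the `summarize` helper, and the choice of a simple mean are assumptions made for illustration.

```python
from statistics import mean

# (sensitivity, specificity) per chunk, copied from the outputs above.
results = {
    "bestnode": [(30.0, 92.5), (45.5, 75.6), (38.1, 88.6), (33.9, 69.9),
                 (15.1, 98.6), (13.6, 97.2), (22.6, 95.0), (26.0, 88.6)],
    "gathermins": [(100.0, 46.7), (97.0, 24.9), (100.0, 29.5), (94.9, 29.5),
                   (98.8, 23.6), (91.5, 22.7), (95.7, 30.8), (98.4, 20.4)],
    "gathermins2": [(100.0, 45.3), (92.1, 53.9), (90.5, 65.7), (96.6, 42.6),
                    (95.3, 51.9), (84.7, 55.7), (83.9, 65.2), (89.8, 44.9)],
    "searchlevel": [(100.0, 15.0), (100.0, 9.3), (100.0, 17.1), (99.2, 9.7),
                    (100.0, 5.3), (93.2, 9.7), (96.8, 14.4), (100.0, 1.8)],
}


def summarize(per_chunk):
    """Return (mean sensitivity, mean specificity) over all chunks."""
    return (mean(s for s, _ in per_chunk), mean(p for _, p in per_chunk))


means = {strategy: summarize(values) for strategy, values in results.items()}

print(f"{'strategy':<12} {'sensitivity':>11} {'specificity':>11}")
for strategy, (sens, spec) in means.items():
    print(f"{strategy:<12} {sens:>11.1f} {spec:>11.1f}")
```

On these eight chunks the pattern is a trade-off: `bestnode` has the highest specificity and the lowest sensitivity, `searchlevel` is the reverse, and `gathermins2` gives up some of the sensitivity of `gathermins` in exchange for higher specificity.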
