# NetworkX + Graphviz for DAG dependency analysis

Explorations of how to use NetworkX and Graphviz to parse and subset DAGs (directed acyclic graphs).

The aim is to find all upstream dependencies (the ancestors) of a sink node, and to render only that subset of the graph.

## Prerequisites

`brew install graphviz`

`pip install networkx matplotlib pygraphviz`

In [34]:
import networkx as nx

In [35]:
dg = nx.DiGraph()

In [36]:
dg.add_node(1, **{"name":"a","type":"T1"})
dg.add_node(2, **{"name":"b","type":"T2"})
dg.add_node(3, **{"name":"c","type":"T3"})
dg.add_node(4, **{"name":"d","type":"T1"})
dg.add_node(5, **{"name":"e","type":"T1"})
dg.add_node(6, **{"name":"f","type":"T1"})

In [37]:
dg.add_edge(1,2)
dg.add_edge(2,3)
dg.add_edge(3,6)
dg.add_edge(2,4)
dg.add_edge(4,5)

In [38]:
dg.nodes

NodeView((1, 2, 3, 4, 5, 6))
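Since the goal is to start from a sink node, it can help to confirm that the graph really is acyclic and to list its sinks first. The following is a minimal sketch, assuming a sink is any node with no outgoing edges; the variable names `g`, `is_dag`, and `sinks` are illustrative and not part of the notebook.

```python
import networkx as nx

# Rebuild the example graph from its edge list (node attributes omitted here).
g = nx.DiGraph([(1, 2), (2, 3), (3, 6), (2, 4), (4, 5)])

# Guard against accidental cycles before treating the graph as a DAG.
is_dag = nx.is_directed_acyclic_graph(g)

# Assumption: a "sink" is a node with out-degree 0.
sinks = sorted(n for n, degree in g.out_degree() if degree == 0)

print(is_dag, sinks)
```

For this graph the sinks are nodes 5 and 6, which are the natural starting points for the dependency search.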

Write a Graphviz dot file for visualization

In [39]:
nx.nx_agraph.write_dot(dg,"dg.dot")

In [40]:
! dot -Tjpg dg.dot -o dg.jpg

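`nx.dfs_predecessors` follows edges in their forward direction, so to find the upstream ("parent") dependencies of a node the DAG must first be reversed. The result maps each reached node to the node it was discovered from in the reversed graph.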

In [41]:
parents_5 = nx.dfs_predecessors(dg.reverse(), 5)
parents_5

{4: 5, 2: 4, 1: 2}
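As a possible alternative to reversing the graph, NetworkX also provides `nx.ancestors`, which returns the set of all nodes that have a path to a given node. The sketch below rebuilds the example edges under the hypothetical name `g` and checks that both approaches yield the same upstream nodes for node 5.

```python
import networkx as nx

# Same edges as the example graph (node attributes omitted here).
g = nx.DiGraph([(1, 2), (2, 3), (3, 6), (2, 4), (4, 5)])

# All nodes with a directed path to node 5; no reversal needed.
anc = nx.ancestors(g, 5)

# The reversed-DFS approach: the dict keys are the nodes reached from 5.
rev_keys = set(nx.dfs_predecessors(g.reverse(), 5))

print(anc == rev_keys)
```

Note that `nx.ancestors` returns an unordered set and does not include the target node itself, so the target still has to be added when building a subgraph.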

Create a subgraph from 5 and its predecessors

In [42]:
subg = nx.subgraph(dg,[5]+list(parents_5.keys()))
subg.edges

OutEdgeView([(1, 2), (2, 4), (4, 5)])
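The find-then-subset steps can be wrapped in a small helper so they can be repeated for any sink. This is only a sketch: `upstream_subgraph` is a hypothetical name, and it uses `nx.ancestors` in place of the reversed DFS, which should select the same nodes.

```python
import networkx as nx


def upstream_subgraph(graph: nx.DiGraph, node) -> nx.DiGraph:
    """Return a copy of `graph` restricted to `node` and everything upstream of it.

    Hypothetical helper for illustration; not part of NetworkX.
    """
    keep = nx.ancestors(graph, node) | {node}
    # .copy() detaches the result from the original graph's view.
    return graph.subgraph(keep).copy()


# Same edges as the example graph (node attributes omitted here).
g = nx.DiGraph([(1, 2), (2, 3), (3, 6), (2, 4), (4, 5)])

edges_5 = sorted(upstream_subgraph(g, 5).edges)
edges_6 = sorted(upstream_subgraph(g, 6).edges)
print(edges_5)
print(edges_6)
```

Returning a copy rather than a view is a design choice: it lets the subgraph be modified or written out independently of the original graph.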

In [43]:
nx.nx_agraph.write_dot(subg,"subg.dot")

In [44]:
! dot -Tjpg subg.dot -o subg.jpg