# Extracting paths from temporal networks

[Run notebook in Google Colab](https://colab.research.google.com/github/pathpy/pathpy/blob/master/doc/tutorial/path_extraction.ipynb)  
[Download notebook](https://github.com/pathpy/pathpy/raw/master/doc/tutorial/path_extraction.ipynb)

This short tutorial demonstrates (and explains) how to calculate time-respecting path frequencies in a temporal network.

pip install git+https://github.com/pathpy/pathpy.git

In [None]:
import pathpy as pp
import io
import numpy as np

We first generate a maximally simple temporal network with three (instantaneous) time-stamped edges:

In [None]:
tn = pp.TemporalNetwork()
tn.add_edge('a', 'b', timestamp=1)
tn.add_edge('b', 'c', timestamp=2)
tn.add_edge('b', 'd', timestamp=5)
tn.plot()

As a first step, we can turn this temporal network into a time-unfolded directed acyclic graph (DAG). For this, we have to specify the maximum time difference `delta` between any two time-stamped edges that shall constitute a time-respecting (or causal) path. In addition to occurring within the maximum time difference, time-stamped edges also have to occur in the correct temporal order.

In the resulting time-unfolded directed acyclic graph, each time-unfolded node `v_t` represents a node `v` at a given time stamp `t`. Each edge (`v_t`, `w_{t'}`) between such time-unfolded nodes represents a possible causal influence (i.e. a time-respecting or causal path) by which node `v` at time `t` can influence node `w` at time `t'>t`.

By definition, each time-stamped edge (`v`, `w`, `t`) is a causal path of length one by which node `v` at time `t` can influence node `w` at the next timestamp `t+1` (i.e. we assume that it takes one unit of time for influence to traverse an edge). For a maximum time difference of one between two edges, the only causal path of length two connects node `a` (at time 1) via node `b` (at time 2) to node `c` (at time 3). We can see this in the resulting time-unfolded directed acyclic graph:

In [None]:
dag = pp.DirectedAcyclicGraph.from_temporal_network(tn, delta=1)
dag.plot()

If we increase the maximum time difference to `delta=2`, three additional time-respecting paths of length one emerge (one from `a` at time 1 to `b` at time 3, one from `b` at time 2 to `c` at time 4, and one from `b` at time 5 to `d` at time 7). This further implies one additional time-respecting path of length two, which is represented in the DAG below:

In [None]:
dag = pp.DirectedAcyclicGraph.from_temporal_network(tn, delta=2)
dag.plot()
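The unfolding rule described above can also be sketched in plain Python, independently of pathpy. This is a minimal illustration and not the library's implementation: the helper `unfold` and the `v_t` string labels are hypothetical, and the sketch assumes a finite integer `delta`, with each time-stamped edge (`v`, `w`, `t`) producing one edge from `v_t` to `w_{t+k}` for every lag `k` between 1 and `delta`. That assumption reproduces the edges listed in the text for `delta=1` and `delta=2`.

```python
from typing import Iterable, Set, Tuple

TimedEdge = Tuple[str, str, int]   # (source, target, timestamp)
UnfoldedEdge = Tuple[str, str]     # (label of v at time t, label of w at a later time)


def unfold(edges: Iterable[TimedEdge], delta: int) -> Set[UnfoldedEdge]:
    """Return the edges of a time-unfolded graph for a finite integer delta.

    Hypothetical helper for illustration only; not part of pathpy.
    """
    unfolded = set()
    for v, w, t in edges:
        # Influence leaving v at time t may reach w at t+1, ..., t+delta.
        for lag in range(1, delta + 1):
            unfolded.add((f"{v}_{t}", f"{w}_{t + lag}"))
    return unfolded


# The three time-stamped edges of the example network
edges = [("a", "b", 1), ("b", "c", 2), ("b", "d", 5)]

for max_diff in (1, 2):
    print(f"delta={max_diff}")
    for source, target in sorted(unfold(edges, max_diff)):
        print(f"  {source} -> {target}")
```

Because every unfolded edge points strictly forward in time, the resulting graph cannot contain a cycle, which is what makes the path enumeration in the following steps finite.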

If we set `delta` to infinity (`np.inf`), only the temporal ordering of time-stamped edges is considered, i.e. any time gap between edges is allowed. In the example above, this implies that the state of node `a` at time `t=1` can influence any other node at any later time. In the directed acyclic graph this is represented as:

In [None]:
dag = pp.DirectedAcyclicGraph.from_temporal_network(tn, delta=np.inf)
dag.plot()

Thanks to its acyclicity, a directed acyclic graph can be used to calculate a finite set of all paths from any root node (the potential start node/time of a causal path in a temporal network) to any leaf node (the potential end node/time of a causal path) in the DAG. We can use the `roots` and `leafs` properties of the `DirectedAcyclicGraph` class to return those:

In [None]:
print([v.uid for v in dag.roots])
print([v.uid for v in dag.leafs])

In [None]:
paths = pp.PathCollection()
paths = dag.routes_from(dag.nodes['a_1'], paths)
for p in paths:
    print(' -> '.join([v.uid for v in p.nodes]))
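To make the idea of root-to-leaf enumeration concrete without relying on pathpy, the following plain-Python sketch walks a small DAG depth-first. It is an illustration under stated assumptions, not the library's algorithm: the edge list is written out by hand for the `delta=2` example from the text (to keep the output short), and the function `paths_to_leaves` is a hypothetical name introduced here.

```python
from collections import defaultdict

# Time-unfolded edges for delta=2, written out by hand from the example in the text
dag_edges = [
    ("a_1", "b_2"), ("a_1", "b_3"),
    ("b_2", "c_3"), ("b_2", "c_4"),
    ("b_5", "d_6"), ("b_5", "d_7"),
]

successors = defaultdict(list)
has_predecessor = set()
for source, target in dag_edges:
    successors[source].append(target)
    has_predecessor.add(target)

nodes = {node for edge in dag_edges for node in edge}
roots = sorted(nodes - has_predecessor)                        # no incoming edge
leaves = sorted(node for node in nodes if node not in successors)  # no outgoing edge


def paths_to_leaves(node, prefix=()):
    """Yield every path from `node` to a leaf (hypothetical helper).

    The recursion terminates because the graph has no cycles.
    """
    path = prefix + (node,)
    if node not in successors:
        yield path
        return
    for next_node in successors[node]:
        yield from paths_to_leaves(next_node, path)


all_routes = [route for root in roots for route in paths_to_leaves(root)]
print("roots: ", roots)
print("leaves:", leaves)
for route in all_routes:
    print(" -> ".join(route))
```

Starting only from roots ensures that each enumerated path is maximal, i.e. it cannot be extended backwards in time by another time-unfolded edge.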


In [None]:
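# extract all time-respecting paths for a maximum time difference of delta=2
paths = pp.algorithms.path_extraction.extract_path_collection(tn, delta=2)
for p in paths:
    # print the node sequence of each path, followed by its frequency
    print(' -> '.join([v.uid for v in p.nodes]))
    print(paths[p.uid]['frequency'])
    print('---')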

In [None]:
pp.io.infomap.to_state_file(paths, 'test.state', weight='frequency')

In [None]:
with io.open('test.state', 'r') as f:
    print(f.read())

In [None]:
pc = pp.PathCollection()
a = pp.Node('a')
b = pp.Node('b')
c = pp.Node('c')
d = pp.Node('d')

e1 = pp.Edge(a, b, uid='a-b')
e2 = pp.Edge(b, c, uid='b-c')
e3 = pp.Edge(c, d, uid='c-d')

# 15 observations: a-b
# this path of length one should be ignored as there's no associated state node (empty memory)
pc.add(pp.Path(e1, frequency=15))

# 42 observations: a-b-c
# this leads to state nodes a-b and b-c connected by a link with weight 42
pc.add(pp.Path(e1, e2, frequency=42))

# 41 observations: a-b-c-d
# this leads to state nodes a-b-c and b-c-d connected by a link with weight 41
pc.add(pp.Path(e1, e2, e3, frequency=41))
print(pc)

In [None]:
pp.io.infomap.to_state_file(pc, 'test.state', weight='frequency')
with open('test.state', 'r') as f:
    print(f.read())
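The comments in the cell above describe how observed paths translate into state nodes and weighted links between them. The sketch below generalizes those two commented cases in plain Python. It is an assumption-laden illustration only: `state_links` is a hypothetical helper, it does not reproduce the Infomap state file format, and the actual output of `to_state_file` may differ. The assumed rule is that a path with node sequence `n_0, ..., n_k` (at least two edges) links the state node `n_0-...-n_{k-1}` to the state node `n_1-...-n_k`, weighted by the path frequency, while paths of length one are skipped.

```python
def state_links(paths):
    """Map {node_sequence: frequency} to {(source_state, target_state): weight}.

    Hypothetical helper for illustration; not part of pathpy or Infomap.
    """
    links = {}
    for node_sequence, frequency in paths.items():
        if len(node_sequence) < 3:
            # A single edge has no preceding step to remember, so it is skipped.
            continue
        source_state = node_sequence[:-1]   # all nodes except the last
        target_state = node_sequence[1:]    # all nodes except the first
        key = (source_state, target_state)
        links[key] = links.get(key, 0) + frequency
    return links


# The three observed paths from the cell above, keyed by node sequence
observed = {
    ("a", "b"): 15,
    ("a", "b", "c"): 42,
    ("a", "b", "c", "d"): 41,
}

for (source, target), weight in state_links(observed).items():
    print("-".join(source), "->", "-".join(target), weight)
```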