# Random Walks in Static Networks

[Run notebook in Google Colab](https://colab.research.google.com/github/pathpy/pathpy/blob/master/doc/tutorial/random_walks.ipynb)  
[Download notebook](https://github.com/pathpy/pathpy/raw/master/doc/tutorial/random_walks.ipynb)

Using the general interface provided by the abstract class `pathpy.processes.BaseProcess`, it is easy to simulate random walks. In the following, we demonstrate this for both a toy example and an empirical network. Let us first import pathpy and create a simple toy network.

In [None]:
pip install git+https://github.com/pathpy/pathpy.git

In [1]:
import pathpy as pp
from pprint import pprint

Our toy example is a directed network with four nodes and five weighted edges:

In [2]:
n = pp.Network(directed=True)
n.add_edge('a', 'b', weight=1, uid='a-b')
n.add_edge('b', 'c', weight=1, uid='b-c')
n.add_edge('c', 'a', weight=2, uid='c-a')
n.add_edge('c', 'd', weight=1, uid='c-d')
n.add_edge('d', 'a', weight=1, uid='d-a')
n.plot()

To simulate a random walk in this network, we first create a `RandomWalk` instance. The constructor generates a (sparse) transition matrix and uses it to initialize the random walk process for a specific network, which is why we pass the network instance as the first parameter. The optional second argument `weight` names an edge attribute. Specifying it yields a biased random walk, in which the transition probabilities are proportional to that numerical edge property:

In [3]:
rw = pp.processes.RandomWalk(n, weight='weight')
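For intuition, the construction of the biased transition matrix can be sketched with plain NumPy, assuming that each row is obtained by dividing the outgoing edge weights of a node by their sum, i.e. T[i, j] = w(i, j) / Σ_k w(i, k). This is not pathpy's implementation; the node order `a, b, c, d` is chosen to match the matrix pathpy reports for this network, and the sketch assumes every node has at least one outgoing edge.

```python
import numpy as np

# Edges of the toy network as (source, target, weight)
edges = [('a', 'b', 1), ('b', 'c', 1), ('c', 'a', 2), ('c', 'd', 1), ('d', 'a', 1)]
nodes = ['a', 'b', 'c', 'd']
index = {v: i for i, v in enumerate(nodes)}

# Weighted adjacency matrix: W[i, j] is the weight of edge i -> j
W = np.zeros((len(nodes), len(nodes)))
for source, target, weight in edges:
    W[index[source], index[target]] += weight

# Row-normalise so that each row is a probability distribution over successors
# (assumes every node has at least one outgoing edge, otherwise we divide by zero)
T = W / W.sum(axis=1, keepdims=True)
print(T)
```

For node `c`, the weights 2 and 1 become the probabilities 2/3 and 1/3, in line with the transition matrix shown for this network.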

We can inspect the sparse transition matrix of the random walk process via the attribute `rw.transition_matrix`:

In [4]:
print(rw.transition_matrix)

  (0, 1)	1.0
  (1, 2)	1.0
  (2, 0)	0.6666666666666666
  (2, 3)	0.3333333333333333
  (3, 0)	1.0


The function `transition_matrix_pd` returns the same matrix as a (nicely formatted and properly labelled) pandas DataFrame, which is much easier to read:

In [5]:
print(rw.transition_matrix_pd())

          a    b    c         d
a  0.000000  1.0  0.0  0.000000
b  0.000000  0.0  1.0  0.000000
c  0.666667  0.0  0.0  0.333333
d  1.000000  0.0  0.0  0.000000


To compute the stationary visitation probabilities of the random walk, we can call:

In [6]:
rw.stationary_state()

array([0.3, 0.3, 0.3, 0.1])
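The stationary distribution π satisfies π T = π, so it can be obtained as the eigenvector of the transposed transition matrix for the eigenvalue 1, normalised to sum to one. The following NumPy sketch illustrates this for the toy network; whether `stationary_state` uses exactly this method internally is an assumption we do not rely on.

```python
import numpy as np

# Transition matrix of the toy network (rows and columns ordered a, b, c, d)
T = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [2/3, 0.0, 0.0, 1/3],
    [1.0, 0.0, 0.0, 0.0],
])

# pi @ T = pi means pi is an eigenvector of T.T with eigenvalue 1
eigenvalues, eigenvectors = np.linalg.eig(T.T)
k = np.argmin(np.abs(eigenvalues - 1.0))
pi = np.real(eigenvectors[:, k])

# Normalise to a probability vector (this also fixes an arbitrary sign)
pi = pi / pi.sum()
print(pi)  # approximately [0.3 0.3 0.3 0.1]
```

Because the toy network contains cycles of length 3 (a, b, c) and 4 (a, b, c, d), the chain is aperiodic, so repeatedly applying `pi = pi @ T` to any starting distribution converges to the same vector.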

Following the general design of the class `BaseProcess`, we can use the iterator function `simulation_run` to step through a random walk of a given length. If we specify `seed`, the walk starts from the node with the given uid.

In each step, the iterator yields the current time as well as a tuple containing the nodes whose state has changed. In the random walk process, these are the currently visited node (the first entry in the tuple) and the previous node, which is no longer visited (the second entry in the tuple).
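This iterator protocol can be mimicked with a plain Python generator. The sketch below is a hypothetical stand-in (`walk_steps` is not a pathpy function): it samples the next node with probability proportional to the edge weight and yields the time together with the tuple `(current_node, previous_node)`.

```python
import random

# Successors of each node in the toy network with their (unnormalised) edge weights
transitions = {
    'a': {'b': 1},
    'b': {'c': 1},
    'c': {'a': 2, 'd': 1},
    'd': {'a': 1},
}


def walk_steps(transitions, steps, seed, rng=None):
    """Yield (time, (current_node, previous_node)) for each step of a weighted random walk."""
    rng = rng if rng is not None else random.Random()
    current = seed  # start node, analogous to the `seed` argument described above
    for time in range(1, steps + 1):
        targets = list(transitions[current])
        weights = [transitions[current][t] for t in targets]
        # Pick the next node with probability proportional to the edge weight
        previous, current = current, rng.choices(targets, weights=weights)[0]
        yield time, (current, previous)


# Example: ten steps starting in node 'a', using a fixed random number generator
for time, (current, previous) in walk_steps(transitions, steps=10, seed='a', rng=random.Random(42)):
    print('time = {0}, current node = {1}, previous node = {2}'.format(time, current, previous))
```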

In each step, we can use properties and methods to access the current state of the random walk; e.g. we can output the current node visitation frequencies as well as the total variation distance to the stationary visitation probabilities. Note that the first iteration yields the state after the first transition: in the following example, the random walk starts in node `a` at time 0, so the first output refers to time 1.

In [7]:
for time, updated_nodes in rw.simulation_run(steps=10, seed='a'):
    print('time = {0}, current node = {1}'.format(time, updated_nodes[0]))
    pprint(rw.visitation_frequencies)
    print(rw.total_variation_distance)

time = 1, current node = b
array([0.5, 0.5, 0. , 0. ])
0.39999999999999997
time = 2, current node = c
array([0.33333333, 0.33333333, 0.33333333, 0.        ])
0.10000000000000017
time = 3, current node = a
array([0.5 , 0.25, 0.25, 0.  ])
0.19999999999999996
time = 4, current node = b
array([0.4, 0.4, 0.2, 0. ])
0.19999999999999998
time = 5, current node = c
array([0.33333333, 0.33333333, 0.33333333, 0.        ])
0.10000000000000017
time = 6, current node = d
array([0.28571429, 0.28571429, 0.28571429, 0.14285714])
0.042857142857142684
time = 7, current node = a
array([0.375, 0.25 , 0.25 , 0.125])
0.09999999999999976
time = 8, current node = b
array([0.33333333, 0.33333333, 0.22222222, 0.11111111])
0.07777777777777753
time = 9, current node = c
array([0.3, 0.3, 0.3, 0.1])
2.2898349882893854e-16
time = 10, current node = d
array([0.27272727, 0.27272727, 0.27272727, 0.18181818])
0.08181818181818165
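The numbers printed above can be reproduced by hand, assuming that the visitation frequencies count every visit including the start node, and that the total variation distance is half the L1 distance between the frequency vector and the stationary distribution. The helper names below are hypothetical; the walk is the one shown in the output above.

```python
import numpy as np

nodes = ['a', 'b', 'c', 'd']
stationary = np.array([0.3, 0.3, 0.3, 0.1])

# Walk from the output above, including the start node at time 0
walk = ['a', 'b', 'c', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']


def visitation_frequencies(walk, nodes):
    """Relative number of visits to each node (start node included)."""
    counts = np.array([walk.count(v) for v in nodes], dtype=float)
    return counts / counts.sum()


def total_variation_distance(p, q):
    """Half the L1 distance between two probability vectors."""
    return 0.5 * np.abs(p - q).sum()


for t in range(1, len(walk)):
    freq = visitation_frequencies(walk[:t + 1], nodes)
    print(t, freq, total_variation_distance(freq, stationary))
```

At time 9 the walk has visited `a`, `b` and `c` three times each and `d` once, which matches the stationary distribution exactly and explains the near-zero distance at that step.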


We can use the function `run_experiment` to generate data on multiple runs of a random walker starting in different nodes. This method returns a data frame that contains all node state changes that occurred during the simulations. To generate two runs of a random walk with 10 steps each, starting from node `a` and node `b`, we call:

In [8]:
data = rw.run_experiment(steps=10, runs=['a', 'b'])
print(data)

    run_id seed  time node  state
0        0    a     0    d  False
1        0    a     0    c  False
2        0    a     0    a   True
3        0    a     0    b  False
4        0    a     1    b   True
5        0    a     1    a  False
6        0    a     2    c   True
7        0    a     2    b  False
8        0    a     3    d   True
9        0    a     3    c  False
10       0    a     4    a   True
11       0    a     4    d  False
12       0    a     5    b   True
13       0    a     5    a  False
14       0    a     6    c   True
15       0    a     6    b  False
16       0    a     7    a   True
17       0    a     7    c  False
18       0    a     8    b   True
19       0    a     8    a  False
20       0    a     9    c   True
21       0    a     9    b  False
22       0    a    10    d   True
23       0    a    10    c  False
24       1    b     0    d  False
25       1    b     0    c  False
26       1    b     0    a  False
27       1    b     0    b   True
...
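The layout of this data frame can be reproduced with a small hypothetical helper, `walk_to_records`, assuming one row per node at time 0 (only the seed has `state` True) followed by two rows per transition: the newly visited node switches on and the previous node switches off. For run 0 this gives 4 + 2 × 10 = 24 rows, matching rows 0 to 23 above, although the order of the rows at time 0 may differ from pathpy's. The sketch also assumes the walk contains no self-loops.

```python
def walk_to_records(walk, nodes, run_id=0):
    """Encode a walk as (run_id, seed, time, node, state) state-change rows."""
    seed = walk[0]
    # At time 0 every node gets a row; only the seed node is visited
    records = [(run_id, seed, 0, v, v == seed) for v in nodes]
    for time in range(1, len(walk)):
        # Each transition changes two states: new node on, previous node off
        records.append((run_id, seed, time, walk[time], True))
        records.append((run_id, seed, time, walk[time - 1], False))
    return records


# Run 0 from the data frame above
walk = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'a', 'b', 'c', 'd']
for row in walk_to_records(walk, ['a', 'b', 'c', 'd']):
    print(row)
```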

While this data frame can easily be exported or visualised, we often want to retrieve `Path` objects that capture the trajectories of random walks. This allows us, for instance, to fit higher-order models to observed random walks. The `get_path` function retrieves a single path from the data recorded during an experiment. We can specify the `run_id` of the path to extract; this is a zero-based counter that is automatically generated during the experiment:

In [11]:
p = rw.get_path(data, run_id=1)
print(p)

Uid:		0x247ff9232b0
Type:		Path
Directed:	True
Nodes:		{'b': Empty b, 'c': Empty c, 'a': Empty a, 'd': Empty d}
Relations:	('b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'd', 'a', 'b')
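Conceptually, recovering a path from such a data frame only requires the rows in which a node switches on. The hypothetical function below sketches this on plain tuples rather than pathpy's `get_path`: it keeps the rows of one run whose `state` is True and orders them by time. The input consists of the first ten rows of run 0 from the data frame above.

```python
def get_path_from_records(records, run_id=0):
    """Recover the visited node sequence of one run from state-change rows."""
    visited = [(time, node) for rid, seed, time, node, state in records
               if rid == run_id and state]
    # Exactly one node switches on per time step, so sorting by time yields the walk
    return tuple(node for time, node in sorted(visited))


# First rows of run 0 from the data frame above: (run_id, seed, time, node, state)
records = [
    (0, 'a', 0, 'd', False), (0, 'a', 0, 'c', False),
    (0, 'a', 0, 'a', True), (0, 'a', 0, 'b', False),
    (0, 'a', 1, 'b', True), (0, 'a', 1, 'a', False),
    (0, 'a', 2, 'c', True), (0, 'a', 2, 'b', False),
    (0, 'a', 3, 'd', True), (0, 'a', 3, 'c', False),
]
print(get_path_from_records(records))  # ('a', 'b', 'c', 'd')
```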



If we omit the `run_id`, the first run (with ID 0) will be returned as a path:

In [12]:
p = rw.get_path(data)
print(p)

Uid:		0x247ff923c88
Type:		Path
Directed:	True
Nodes:		{'a': Empty a, 'b': Empty b, 'c': Empty c, 'd': Empty d}
Relations:	('a', 'b', 'c', 'd', 'a', 'b', 'c', 'a', 'b', 'c', 'd')



We can use the `get_paths` function of the `RandomWalk` class to generate a `PathCollection` containing all paths captured by a list of `run_ids` (or all runs in a data frame if we omit the `run_ids` argument):

In [13]:
pc = rw.get_paths(data, run_ids=[0, 1])
print(pc)

{Path ('b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'd', 'a', 'b'), Path ('a', 'b', 'c', 'd', 'a', 'b', 'c', 'a', 'b', 'c', 'd')}


We can use the data frame returned by `run_experiment` to generate an interactive visualisation of the random walk process by passing it to the `plot` function of the random walk instance. The result is a temporal network visualisation with a slider bar that lets us move forward and backward through time.

The `plot` function accepts a `run_id` argument that defines which of the recorded random walks in `data` shall be visualised. If we omit this parameter, the first random walk will be visualised.

In [14]:
rw.plot(data, run_id=0)

Apart from specifying a list of start nodes, we can also pass the number of runs to simulate. In this case, the seed node of each simulation run is chosen uniformly at random.

In [15]:
pc = rw.get_paths(rw.run_experiment(steps=10, runs=20))
print(pc)

{Path ('b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'), Path ('c', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'd'), Path ('c', 'a', 'b', 'c', 'a', 'b', 'c', 'd', 'a', 'b', 'c'), Path ('b', 'c', 'd', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b'), Path ('a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'd', 'a'), Path ('b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'd', 'a', 'b'), Path ('c', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a'), Path ('c', 'd', 'a', 'b', 'c', 'a', 'b', 'c', 'd', 'a', 'b'), Path ('d', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'a', 'b', 'c'), Path ('a', 'b', 'c', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'a'), Path ('d', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b'), Path ('d', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a'), Path ('d', 'a', 'b', 'c', 'a', 'b', 'c', 'd', 'a', 'b', 'c')}
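The two ways of specifying `runs` can be sketched as follows, assuming that an integer draws that many seed nodes uniformly at random with replacement (with only four nodes, 20 runs necessarily reuse start nodes) and that a list is used as given. `seed_nodes` is a hypothetical helper, not part of pathpy.

```python
import random


def seed_nodes(runs, nodes, rng=None):
    """Return one start node per simulation run.

    An int draws that many seeds uniformly at random (with replacement);
    a list is used as-is, giving one run per listed node.
    """
    rng = rng if rng is not None else random.Random()
    if isinstance(runs, int):
        return [rng.choice(nodes) for _ in range(runs)]
    return list(runs)


nodes = ['a', 'b', 'c', 'd']
print(seed_nodes(20, nodes, random.Random(1)))  # 20 random start nodes
print(seed_nodes(['a', 'b'], nodes))            # ['a', 'b']
```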


If `runs` is a list of nodes (`Node` objects or, as in the examples here, node uids), a single random walk is generated for each start node. To generate exactly one random walk starting in each node of the network, we can simply pass the node uids of the network:

In [16]:
pc = rw.get_paths(rw.run_experiment(steps=10, runs=n.nodes.uids))
for p in pc:
    print(p)

Uid:		0x247ff98b400
Type:		Path
Directed:	True
Nodes:		{'d': Empty d, 'a': Empty a, 'b': Empty b, 'c': Empty c}
Relations:	('d', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'a', 'b', 'c')

Uid:		0x247ff98b4a8
Type:		Path
Directed:	True
Nodes:		{'c': Empty c, 'd': Empty d, 'a': Empty a, 'b': Empty b}
Relations:	('c', 'd', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c')

Uid:		0x247ff98b390
Type:		Path
Directed:	True
Nodes:		{'a': Empty a, 'b': Empty b, 'c': Empty c, 'd': Empty d}
Relations:	('a', 'b', 'c', 'd', 'a', 'b', 'c', 'a', 'b', 'c', 'a')

Uid:		0x247ff98b550
Type:		Path
Directed:	True
Nodes:		{'b': Empty b, 'c': Empty c, 'a': Empty a, 'd': Empty d}
Relations:	('b', 'c', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a')



We close this tutorial with a random walk simulation in an empirical network.

In [17]:
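# Load the Game of Thrones network from the Netzschleuder repository (requires internet access)
n = pp.io.graphtool.read_netzschleuder_network('game_thrones')
# Biased random walk using the edge attribute 'weight'
rw = pp.processes.RandomWalk(n, weight='weight')
# Run an experiment with 100 steps and visualise the first recorded walk
rw.plot(rw.run_experiment(steps=100))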