# Basic pathpyG Concepts

## Motivation and Learning Objectives

This first step of our multi-stage introductory tutorial introduces foundational concepts of `pathpyG`. While `pathpyG` is particularly useful for GPU-accelerated analysis of, and graph learning on, time series data on graphs, it also provides tools to represent, analyze and visualize data on static networks. The `Graph` class that we will use for this purpose is built on the `Data` object in `pyG`, which has the advantage that we can directly apply `pyG` transforms.

In this basic tutorial you will learn how to use `pathpyG` to represent simple static graphs. We start with basic features to create directed and undirected graphs with node-, edge-, and graph-level attributes and inspect how graph data is internally stored. We further discuss how we can analyze the centrality of nodes, and how we can read graph data from the `netzschleuder` database. We will finally show how we can implement graph algorithms.

To get started with `pathpyG`, we first import the modules `torch` and `pathpyG`. By setting the device used by `torch`, we can further specify whether we want to run our code on the CPU or on the GPU. To run your code on the CPU, set the `torch.device` configuration to `cpu`. Set the device to `cuda` if you want to run it on the GPU instead.

In [1]:
import torch
import torch_geometric as pyG

import pathpyG as pp
pp.config['torch']['device'] = 'cuda'

## Creating Graphs from Tensors and Edge Lists
 
Let's start by generating a simple, directed graph with three nodes `a`, `b`, `c` and three edges `(a,c)`, `(b,c)` and `(a,b)`. We will represent those nodes by integer indices, where index `0` represents node `a`, index `1` represents node `b` and index `2` represents node `c`. In line with the efficient tensor-based representation of sparse graphs in `pyG`, we use an `edge_index` tensor with shape `(2,m)` to represent the `m` edges of the graph.

The following snippet generates a graph with three nodes, which are referred to by their indices 0, 1, 2, and three edges (0,2), (1,2), (0,1).

In [2]:
g = pp.Graph.from_edge_index(pyG.EdgeIndex([[0,1,0], [2,2,1]]))
print(g)

Graph with 3 nodes and 3 edges

Graph attributes
	num_nodes		<class 'int'>
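
To make the `(2,m)` layout concrete, here is a minimal plain-Python sketch that reads the same edge index column by column. It does not use `torch` or `pathpyG`, and the variable names are purely illustrative: the first row holds the source index of each edge, the second row the target index.

```python
# Plain-Python view of the edge index used above (illustrative only).
edge_index = [
    [0, 1, 0],  # row 0: source node index of each edge
    [2, 2, 1],  # row 1: target node index of each edge
]

# Column j of the edge index describes the j-th edge as (source, target).
edges = list(zip(edge_index[0], edge_index[1]))
num_edges = len(edge_index[0])  # this is m in the shape (2, m)

print(edges)
print(num_edges)
```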



Let's use the generator functions `nodes` and `edges` to iterate through the nodes and edges of this graph:

In [3]:
for v in g.nodes:
    print(v)

for e in g.edges:
    print(e)

0
1
2
(0, 2)
(0, 1)
(1, 2)


While the representation of sparse graphs as integer tensors is easy and efficient, it is often more convenient to use string identifiers to refer to nodes. To simplify the handling of graph data, `pathpyG` provides a transparent mapping of string identifiers to node indices. 

If we want to associate node indices with string IDs, we can create an `IndexMap`. To map the nodes with indices 0, 1, and 2 to string IDs `a`, `b`, and `c`, we can use the following mapping:

In [4]:
m = pp.IndexMap(['a', 'b', 'c'])
g.mapping = m
print(g.mapping)

a -> 0
b -> 1
c -> 2
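
Conceptually, such a mapping is a two-way lookup between string IDs and integer indices. The following sketch uses ordinary dictionaries to illustrate the idea; it is not the `IndexMap` implementation, and the names `id_to_idx` and `idx_to_id` are hypothetical.

```python
# Conceptual sketch of an ID <-> index mapping (not the IndexMap implementation).
node_ids = ['a', 'b', 'c']

# Position in the list determines the integer index of each ID.
id_to_idx = {node_id: idx for idx, node_id in enumerate(node_ids)}
idx_to_id = dict(enumerate(node_ids))

# Translate integer edges into edges labelled with string IDs.
int_edges = [(0, 2), (0, 1), (1, 2)]
labelled_edges = [(idx_to_id[u], idx_to_id[v]) for u, v in int_edges]

print(id_to_idx)
print(labelled_edges)
```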



If we now iterate through the nodes and edges of the graph, we get:

In [5]:
for v in g.nodes:
    print(v)

for e in g.edges:
    print(e)

a
b
c
('a', 'c')
('a', 'b')
('b', 'c')


We can achieve the same result if we pass the `IndexMap` object in the constructor of a graph. This transparently applies the mapping in all future function calls.

In [6]:
g = pp.Graph.from_edge_index(torch.tensor([[0,1,0], [2,2,1]]), mapping=m)
print(g)
print(g.mapping)

Graph with 3 nodes and 3 edges

Graph attributes
	mapping		<class 'pathpyG.core.IndexMap.IndexMap'>
	num_nodes		<class 'int'>




Alternatively, we can construct the same graph based on an edge list that uses string identifiers for nodes. This will automatically generate the sparse integer tensor representation of the edge index, as well as the associated `IndexMap`:

In [7]:
g = pp.Graph.from_edge_list([['a','b'], ['b','c'], ['a','c']])
print(g)

Graph with 3 nodes and 3 edges

Graph attributes
	num_nodes		<class 'int'>



In [8]:
print(g.mapping)

a -> 0
b -> 1
c -> 2



## Traversing Graphs

If we want to implement graph algorithms that need to traverse the graph, we can use the `successors` and `predecessors` functions of the `Graph` object:

In [9]:
for v in g.successors('a'):
    print(v)

b
c


In [10]:
for v in g.predecessors('c'):
    print(v)

b
a
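
As an example of a traversal-based algorithm, the sketch below runs a breadth-first search over a successor lookup. To keep it self-contained, the lookup is built from a plain edge list rather than from a `Graph` object; the function name `bfs_distances` is hypothetical. With a `Graph` object, the inner loop could iterate over `g.successors(u)` instead.

```python
from collections import deque

edge_list = [('a', 'b'), ('b', 'c'), ('a', 'c')]

# Build a successor lookup from the directed edge list.
successors = {}
for u, v in edge_list:
    successors.setdefault(u, []).append(v)
    successors.setdefault(v, [])  # make sure nodes without out-edges exist


def bfs_distances(start):
    """Return the number of hops from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in successors[u]:
            if v not in dist:  # first visit yields the shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


print(bfs_distances('a'))
```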


We can also easily check whether an edge exists in the graph:

In [11]:
g.is_edge('a', 'b')

True

Alternatively, we can use the following code to check whether node `b` is a successor of `a`:

In [12]:
'b' in g.successors('a')

True

By default, a graph object in `pathpyG` is directed, i.e. for the graph above, the edge `(b,a)` does not exist, which we can verify as follows:

In [13]:
'a' in g.successors('b')

False

To check the (directed) in- and out-degrees of nodes, we can use the properties `in_degrees` and `out_degrees`, which return a dictionary that maps node IDs to their degrees:

In [14]:
g.in_degrees

{'a': 0, 'b': 1, 'c': 2}

In [15]:
g.in_degrees['b']

1

In [16]:
g.in_degrees['c']

2
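
For intuition, in- and out-degrees simply count how often a node appears as the target or the source of an edge. A minimal sketch in plain Python, independent of `pathpyG`:

```python
from collections import Counter

edge_list = [('a', 'b'), ('b', 'c'), ('a', 'c')]
nodes = ['a', 'b', 'c']

# Count occurrences as source (out-degree) and as target (in-degree).
out_count = Counter(u for u, _ in edge_list)
in_count = Counter(v for _, v in edge_list)

# A Counter returns 0 for missing keys, so isolated roles are handled.
in_degrees = {v: in_count[v] for v in nodes}
out_degrees = {v: out_count[v] for v in nodes}

print(in_degrees)
print(out_degrees)
```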

Importantly, irrespective of how we have generated the graph object, the actual node and edge data are always stored as a `pyG` data object, which we can access as follows:

In [17]:
g.data

Data(edge_index=[2, 3], num_nodes=3)

This allows us to use the full power of `torch` and `pyG`, including, e.g., the application of transforms, splits, or an easy migration between CPU- and GPU-based computation. In general, `pathpyG` will use the device specified in the `torch.device` configuration (see above) whenever it internally creates a torch tensor. Since we have specified the `cuda` device above, the data object of the graph generated above will reside in GPU memory:

In [18]:
g.data.is_cuda

True

If we instead set the device to `cpu`, the `Data` object will internally be created in main memory instead.

In [19]:
pp.config['torch']['device'] = 'cpu'

g = pp.Graph.from_edge_list([['a','b'], ['b','c'], ['a','c']])
g.data.is_cuda

False

## Node-, Edge- or Graph-Level Attributes

Real-world graphs commonly have node-, edge-, or graph-level attributes. In `pathpyG`, we can simply add attributes as tensors, either by directly assigning them to the `pyG` data object of an existing graph or by adding them as keyword arguments in the constructor. Following the `pyG` semantics of attribute names, we must use the prefixes `node_` and `edge_` to refer to node- and edge-level attributes. Attributes with other names are assumed to be graph-level attributes.

In [20]:
g.data['node_class'] = torch.tensor([[0], [0], [1]])
g.data['edge_weight'] = torch.tensor([[1], [2], [3]])
g.data['graph_feature'] = torch.tensor([3, 2])

Once we have added attributes to nodes, edges, or the graph, those attributes, along with their type and shape, will be shown when we print a string representation of the graph object:

In [21]:
print(g)

Graph with 3 nodes and 3 edges

Node attributes
	node_class		<class 'torch.Tensor'> -> torch.Size([3, 1])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([3, 1])

Graph attributes
	num_nodes		<class 'int'>
	graph_feature		<class 'torch.Tensor'> -> torch.Size([2])



To simplify access to attribute values, the `Graph` class in `pathpyG` provides item getter and setter functions that allow indexed access based on node IDs. To access the attribute `node_class` of node `a`, the attribute `edge_weight` of edge `(a,b)`, or the graph-level attribute `graph_feature`, we can write:

In [22]:
g['node_class', 'a']

tensor([0])

In [23]:
g['edge_weight', 'a', 'b']

tensor([1])

In [24]:
g['graph_feature']

tensor([3, 2])

We can use the setter function to change attributes:

In [25]:
g['node_class'] = torch.tensor([[7], [2], [3]])

In [26]:
g['node_class', 'a']

tensor([7])

To create a sparse adjacency matrix representation of the topology of a graph, we can use the following function:

In [27]:
print(g.get_sparse_adj_matrix())

  (0, 1)	1.0
  (0, 2)	1.0
  (1, 2)	1.0


This returns a `scipy.sparse.coo_matrix` object, which can be turned into a dense `numpy` matrix as follows: 

In [28]:
print(g.get_sparse_adj_matrix().todense())

[[0. 1. 1.]
 [0. 0. 1.]
 [0. 0. 0.]]


By passing the name of the attribute, we can also use edge attributes in the creation of the adjacency matrix. To create a sparse, weighted adjacency matrix that uses the `edge_weight` attribute of our graph object we can simply write:

In [29]:
print(g.get_sparse_adj_matrix(edge_attr='edge_weight').todense())

[[0 1 2]
 [0 0 3]
 [0 0 0]]
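
The weighted matrix above can be reproduced by hand, which also shows how attribute rows line up with edges. The sketch below is plain Python and assumes, based on the printed outputs, that the weights `1, 2, 3` follow the edge order `(0,1), (0,2), (1,2)` of the edge index rather than the order of the original edge list.

```python
# Edges in the order suggested by the unweighted adjacency matrix above.
edges = [(0, 1), (0, 2), (1, 2)]
weights = [1, 2, 3]  # one weight per edge, aligned by position
n = 3                # number of nodes

# Dense n x n matrix, where entry [u][v] holds the weight of edge u -> v.
adj = [[0] * n for _ in range(n)]
for (u, v), w in zip(edges, weights):
    adj[u][v] = w

for row in adj:
    print(row)
```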


To easily apply GNN models to a graph, we can add attributes based on one-hot-encodings of nodes and edges:

In [30]:
g.add_node_ohe(attr_name='node_ohe_feature_1')
g.add_node_ohe(attr_name='node_ohe_feature_2', dim=4)
g.add_edge_ohe(attr_name='edge_ohe_feature_1', dim=5)
print(g)

print(g.data['node_ohe_feature_1'])
print(g.data['node_ohe_feature_2'])
print(g.data['edge_ohe_feature_1'])

Graph with 3 nodes and 3 edges

Node attributes
	node_class		<class 'torch.Tensor'> -> torch.Size([3, 1])
	node_ohe_feature_1		<class 'torch.Tensor'> -> torch.Size([3, 3])
	node_ohe_feature_2		<class 'torch.Tensor'> -> torch.Size([3, 4])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([3, 1])
	edge_ohe_feature_1		<class 'torch.Tensor'> -> torch.Size([3, 5])

Graph attributes
	num_nodes		<class 'int'>
	graph_feature		<class 'torch.Tensor'> -> torch.Size([2])

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
tensor([[1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [0., 0., 1., 0.]])
tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.]])
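
The pattern in these tensors is simple: row `i` has a single one in column `i`, and a larger `dim` pads the rows with zeros. The following plain-Python sketch mimics that pattern. The helper `one_hot` is hypothetical, and raising an error when `dim` is too small is an assumption of this sketch, not documented `pathpyG` behaviour.

```python
def one_hot(num_items, dim=None):
    """Return num_items rows of length dim, with row i set to 1.0 at column i."""
    if dim is None:
        dim = num_items
    if dim < num_items:
        # Assumption of this sketch: too-small dimensions are rejected.
        raise ValueError("dim must be at least num_items")
    return [[1.0 if j == i else 0.0 for j in range(dim)]
            for i in range(num_items)]


print(one_hot(3))         # square encoding, one column per node
print(one_hot(3, dim=4))  # same encoding, padded to four columns
```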


By default, all graphs in `pathpyG` are directed. To represent undirected graphs, we must add all edges in both directions. We can use the `to_undirected()` function to make a directed graph undirected, i.e. to add all (missing) edges that point in the opposite direction. This will automatically duplicate and assign the corresponding edge attributes to the newly formed (directed) edges, i.e. edges are assumed to have the same attributes in both directions.

In [31]:
g_u = g.to_undirected()
print(g_u)

Graph with 3 nodes and 6 edges

Node attributes
	node_class		<class 'torch.Tensor'> -> torch.Size([3, 1])
	node_ohe_feature_1		<class 'torch.Tensor'> -> torch.Size([3, 3])
	node_ohe_feature_2		<class 'torch.Tensor'> -> torch.Size([3, 4])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([6, 1])
	edge_ohe_feature_1		<class 'torch.Tensor'> -> torch.Size([6, 5])

Graph attributes
	num_nodes		<class 'int'>
	graph_feature		<class 'torch.Tensor'> -> torch.Size([2])
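
The doubling of edges and edge attributes from 3 to 6 can be illustrated with a small dictionary-based sketch. It is a conceptual model only, not how `to_undirected()` is implemented: each missing reverse edge is added and inherits the attribute of its forward edge.

```python
# Directed edges with one weight each (illustrative values).
edge_weight = {('a', 'b'): 1, ('a', 'c'): 2, ('b', 'c'): 3}

# Start from the directed edges and add every missing reverse edge,
# copying the attribute of the forward edge.
undirected = dict(edge_weight)
for (u, v), w in edge_weight.items():
    undirected.setdefault((v, u), w)

print(len(undirected))
print(undirected[('c', 'a')])
```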



By default, the `Graph` object can contain multiple identical edges, so the following is possible: 

In [32]:
g = pp.Graph.from_edge_list([('a', 'b'), ('b', 'c'), ('c', 'a'), ('a', 'b')])
print(g.data.edge_index)

EdgeIndex([[0, 0, 1, 2],
           [1, 1, 2, 0]], sparse_size=(3, ?), nnz=4, sort_order=row)


It is often convenient to coalesce multi-edges into weighted single edges, i.e. in the example above we may prefer a graph where each edge occurs once in the edge index, but the edge `a->b` has a weight attribute of two, while the two other edges have a weight of one.

In `pathpyG` we can do this as follows:

In [33]:
g_w = g.to_weighted_graph()
print(g_w.data.edge_index)
print(g_w['edge_weight', 'b', 'c'])
print(g_w['edge_weight', 'a', 'b'])
print(g_w['edge_weight', 'c', 'a'])

EdgeIndex([[0, 1, 2],
           [1, 2, 0]], sparse_size=(3, 3), nnz=3, sort_order=row)
tensor(1.)
tensor(2.)
tensor(1.)
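
The idea behind coalescing is to count how often each `(source, target)` pair occurs. A minimal plain-Python sketch of that counting step, independent of `pathpyG`:

```python
from collections import Counter

edge_list = [('a', 'b'), ('b', 'c'), ('c', 'a'), ('a', 'b')]

# Each distinct edge is kept once; its weight is the number of occurrences.
weights = Counter(edge_list)

for edge, weight in sorted(weights.items()):
    print(edge, weight)
```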


## Graphs and pyG Data


We can easily create a graph from a pyG `Data` object: 

In [35]:
from torch_geometric.data import Data
d = Data(edge_index=torch.LongTensor([[0,0,1,3],[1,2,2,2]]), node_feature_1=torch.Tensor([0,0,0,1]))
d.to(pp.config['torch']['device'])


g = pp.Graph(d, mapping=pp.IndexMap(['a', 'b', 'c', 'd']))
print(g)

Graph with 4 nodes and 4 edges

Node attributes
	node_feature_1		<class 'torch.Tensor'> -> torch.Size([4])

Graph attributes
	num_nodes		<class 'int'>



As we will see in a separate notebook focusing on the advanced (temporal) graph visualization features of `pathpyG`, it is easy to generate (interactive) HTML plots of graphs that are embedded into Jupyter notebooks. Note that, for the time being, visualizations are generally undirected even if the underlying graph object is directed. To plot a graph, simply call the `pp.plot` function on the `Graph` object:

In [36]:
pp.plot(g);

## Node Centralities

To calculate node centralities, we can use a `networkx` delegate mechanism implemented in the module `pathpyG.algorithms.centrality`. Simply put, you can call any function implemented in the `networkx` centrality module whose name ends with the string `_centrality`, for example `closeness_centrality`. The `pathpyG` graph is internally converted to a `networkx.DiGraph` object, the corresponding centrality function (with all of its parameters) is called, and the result is mapped to the nodes based on their IDs.

In order to calculate the closeness centralities of all nodes for the graph above, we can call:

In [37]:
pp.algorithms.centrality.closeness_centrality(g)

{'a': 0.0, 'b': 0.3333333333333333, 'c': 1.0, 'd': 0.0}
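
These values can be checked by hand. The sketch below assumes the default `networkx` behaviour for directed graphs: closeness is based on distances along incoming paths and is scaled by the fraction of nodes that can reach the node (the Wasserman-Faust correction). It is a plain-Python re-computation for this small example, not the code that `pathpyG` or `networkx` runs.

```python
from collections import deque

nodes = ['a', 'b', 'c', 'd']
edge_list = [('a', 'b'), ('a', 'c'), ('b', 'c'), ('d', 'c')]

# Predecessor lookup, so that BFS follows edges backwards.
predecessors = {v: [] for v in nodes}
for u, v in edge_list:
    predecessors[v].append(u)


def closeness(target):
    """Closeness of `target` based on incoming shortest-path distances."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        x = queue.popleft()
        for p in predecessors[x]:
            if p not in dist:
                dist[p] = dist[x] + 1
                queue.append(p)
    reach = len(dist) - 1       # nodes that can reach the target
    total = sum(dist.values())  # sum of their distances
    if total == 0:
        return 0.0
    # Inverse mean distance, scaled by the reachable fraction of nodes.
    return (reach / total) * (reach / (len(nodes) - 1))


result = {v: closeness(v) for v in nodes}
print(result)
```

Node `c` is reached by all three other nodes in one step, giving `(3/3) * (3/3) = 1.0`, while `b` is reached only by `a`, giving `(1/1) * (1/3)`.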