<center><img src=img/MScAI_brand.png width=70%></center>

# Graphs

As we know, *graph* is the word used in maths and computer science for a *network* -- a collection of objects with connections among them.

Graph theory is a beautiful and fascinating branch of mathematics with many applications in computer science and artificial intelligence.

Don't confuse a graph (network) with a graph (plot).

Some of the disparate terminology we use:

object | connection
-------|-----------
node   | edge
vertex | arc
point  | line

<center><img src=img/Konigsberg_bridges.png width=40%></center>
<font size=1><a href="https://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg">Wiki</a></font>

The *founding problem of graph theory*: is it possible to cross every bridge of Königsberg exactly once and end back at the starting point? Here the *nodes* are pieces of land and the *edges* are bridges.

<center><img src=img/tigh_neachtain_map.png width=50%></center>

A practical problem: what is the shortest path from one location to another, via a network of roads?

<center><img src="img/social_network.jpg" width=50%></center>

<font size=1>Credit: http://bitcoinwiki.co/wp-content/uploads/censorship-free-social-network-akasha-aims-to-tackle-internet-censorship-with-blockchain-technology.jpg </font>

A classic social-network problem: who are the most influential people? How many steps, or "degrees of separation", lie between one person and another?

<center><img src=img/pacman_ghost_fsm.svg width=40%></center><font size=1>Inspired by Yannakakis and Togelius, <em>Artificial Intelligence and Games</em></font>

A finite state machine is a graph, where states are nodes, state transitions are *directed edges* and input symbols are edge labels.
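
As a rough sketch of this idea, a finite state machine can be stored as a dictionary of directed, labelled edges. The state names and input symbols below are hypothetical and are not read from the figure; `run_fsm` is also just an illustrative helper.

```python
# Hypothetical ghost FSM: transitions[state][input_symbol] gives the next state.
# Each inner entry is one directed edge, labelled by its input symbol.
transitions = {
    "chase": {"power_pill_eaten": "flee"},
    "flee": {"power_pill_expired": "chase", "caught_by_pacman": "return_home"},
    "return_home": {"reached_home": "chase"},
}

def run_fsm(state, inputs):
    """Follow one labelled edge per input symbol and return the final state."""
    for symbol in inputs:
        # Assumption: if no edge carries this label, stay in the current state.
        state = transitions[state].get(symbol, state)
    return state

final_state = run_fsm("chase", ["power_pill_eaten", "caught_by_pacman", "reached_home"])
print(final_state)
```

Ignoring unknown symbols is only one possible design choice; a stricter machine could raise an error instead.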

### The web

* Web pages (URLs) are nodes; hyperlinks are directed edges.

<center><img src=img/PageRank-hi-res.png width=45%></center>
<font size=1><a href=https://en.wikipedia.org/wiki/PageRank>Wiki</a></font>

The *PageRank* problem: which pages are the most trusted? A page is trusted if other highly trusted pages link to it. (But isn't this circular?)
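
One common way around the circularity is to start every page with equal trust and then repeatedly pass trust along the links until the scores settle. The sketch below illustrates that idea only. The function name `pagerank`, the toy `web` graph, the damping value 0.85 and the fixed number of iterations are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate trust scores; links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1 / n for page in pages}  # start with equal trust everywhere
    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of its in-links.
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # Share this page's current score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Toy web (hypothetical): a -> b, a -> c, b -> c, c -> a, d -> c
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
most_trusted = max(ranks, key=ranks.get)
print(most_trusted)
```

In this toy web, page `c` receives links from every other page and ends up with the highest score, while `d`, which nothing links to, keeps only the baseline.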

### Edge properties

In all the examples we've seen, nodes have unique names or just integer labels.

Usually edges are just plain connections with no names or labels, though they may have a direction. Sometimes edges do have properties, such as a numerical *weight* per edge.

E.g. in a **water grid**, each edge might represent a pipe, and each pipe might have a different capacity represented as a number in litres per second.
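
A minimal sketch of how such weights might be stored, assuming a small made-up grid: each undirected pipe is keyed by its pair of endpoints and maps to a capacity in litres per second. The node names, the numbers and the helper `pipe_capacity` are hypothetical.

```python
# Hypothetical water grid: (node, node) -> capacity in litres per second
capacity = {
    ("reservoir", "pump"): 120.0,
    ("pump", "town"): 80.0,
    ("pump", "farm"): 35.0,
}

def pipe_capacity(u, v):
    """Return the capacity of the pipe between u and v, or None if there is no pipe."""
    # Pipes are undirected here, so try both orderings of the endpoints.
    return capacity.get((u, v), capacity.get((v, u)))

print(pipe_capacity("town", "pump"))
print(pipe_capacity("town", "farm"))
```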

### Collaboration graphs

E.g. in scientific research: authors are nodes; there is an edge where two authors have co-authored a paper.

Here edges are *undirected* because saying "$a$ co-authored with $b$" is the same as saying "$b$ co-authored with $a$".

A classic problem of social networks: can we automatically detect communities in the co-authorship graph?

### Electrical grids

* The electricity network: power stations, transformers, and consumers are nodes; electricity wires are edges.

<center><img src=img/power_plant.svg width=50%></center>

A classic logistics problem: on a small island, we have a power station and a set of consumers. Every metre of electricity wire costs money. Which wires should we build to spend as little as possible while ensuring everyone is connected?

### Family trees

A *tree* is a connected graph with no cycles. Your family tree is a directed tree *rooted* at you, with an edge from each person to each of their parents.

<center><img src=img/jon-snow-family-tree.jpg width=50%></center>
<font size=1><a href="https://time.com/5560753/game-of-thrones-family-tree/">Time.com</a></font>

By the way, it's not *really* a tree, if we go back far enough.

### Abstract syntax tree

An AST is an internal representation of computer code. The Python interpreter, the C++ compiler, and interpreters and compilers in general translate our source code into an AST before compiling or executing it.

```python
def fib(n):
    if n < 2:
        return n
    else:
        return fib(n-1) + fib(n-2)
```

<center><img src=img/ast.svg width=40%></center>

Tangent: Python has some nice built-in tools for *introspection* including looking at the AST of a piece of Python code. It's not quite as clean as the example above because it includes extra technical details.

```python
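import ast

# Read the source file; the with-block closes it automatically
with open("code/fib2.py") as f:
    source = f.read()

tree = ast.parse(source)  # build the AST from the source text
print(ast.dump(tree))     # print the tree as one nested expression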
```

```python
"Module(body=[FunctionDef(name='fib', args=arguments(args=[arg(arg='n', annotation=None)], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]), body=[If(test=Compare(left=Name(id='n', ctx=Load()), ops=[Lt()], comparators=[Num(n=2)]), body=[Return(value=Name(id='n', ctx=Load()))], orelse=[Return(value=BinOp(left=Call(func=Name(id='fib', ctx=Load()), args=[BinOp(left=Name(id='n', ctx=Load()), op=Sub(), right=Num(n=1))], keywords=[]), op=Add(), right=Call(func=Name(id='fib', ctx=Load()), args=[BinOp(left=Name(id='n', ctx=Load()), op=Sub(), right=Num(n=2))], keywords=[])))])], decorator_list=[], returns=None)])"
```

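(The details of this are not examinable. The exact output depends on the Python version: in Python 3.8 and later, for example, the literal `2` appears as `Constant(value=2)` rather than `Num(n=2)`.)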

### Node and graph terminology and properties

* Order: the number of nodes in the graph.
* Size: the number of edges in the graph.

### Cycles

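* Cycle: a sequence of nodes, joined by edges, that lets us travel from a start node along the sequence and back to the start without reusing an edge. In an undirected graph with no self-loops or repeated edges, this needs at least 3 nodes.
* Directed cycle: in a directed graph, a cycle exists only if this travel follows the edge directions. Here 2 nodes are enough, e.g. $a \to b \to a$.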
* A graph with no cycles is called *acyclic*.
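
To make the definition concrete, here is a minimal sketch of a cycle check for an *undirected* graph, assuming the graph is stored as a dictionary mapping each node to a list of its neighbours, with no repeated edges. The function name `has_cycle` and both example graphs are made up for illustration. It uses a depth-first search: if the search reaches an already-visited node by a second route, there must be a cycle.

```python
def has_cycle(graph):
    """Return True if the undirected graph (node -> list of neighbours) has a cycle."""
    visited = set()
    for start in graph:
        if start in visited:
            continue  # this connected component was already explored
        # Each stack entry is (node, the node we arrived from).
        stack = [(start, None)]
        while stack:
            node, parent = stack.pop()
            if node in visited:
                # We reached this node along a second, different route.
                return True
            visited.add(node)
            for neighbour in graph[node]:
                # Do not walk straight back along the edge we just used.
                if neighbour != parent:
                    stack.append((neighbour, node))
    return False

triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
star = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}  # acyclic
print(has_cycle(triangle), has_cycle(star))
```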


### Degree

* The *degree* of a node is the number of neighbours it has.
* In a *directed graph*, each node has an *out-degree* and *in-degree*
* In a graph with *weighted edges*, the degree is the sum of relevant edge weights.
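
These definitions can be sketched in a few lines, assuming each node maps to a list of its neighbours (undirected case) or to a list of the nodes its edges point to (directed case). Both example graphs are made up for illustration.

```python
# Undirected example: each edge appears in both endpoints' lists
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": []}
degree = {node: len(neighbours) for node, neighbours in graph.items()}

# Directed example: each node maps to the nodes its out-edges point to
digraph = {"a": ["b", "c"], "b": ["c"], "c": []}
out_degree = {node: len(targets) for node, targets in digraph.items()}
in_degree = {node: 0 for node in digraph}
for targets in digraph.values():
    for target in targets:
        in_degree[target] += 1  # one more edge arriving at target

print(degree)
print(out_degree)
print(in_degree)
```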

### Graph representations

There are several possible representations for a graph:

* Adjacency matrix
* List of edges
* Adjacency lists
* dict-of-dicts-of-dicts, as used by NetworkX, is a variant of adjacency-list.


### Adjacency matrix

In an adjacency matrix, we represent a graph of $n$ nodes as an $n\times n$ matrix. In a binary adjacency matrix, a 0 in location $(i, j)$ indicates no edge present between nodes $i$ and $j$; a 1 indicates the edge is present. 

If our edges have weights, we just put those instead of the "1"s.
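
As a small illustration, the sketch below builds a binary adjacency matrix as a list of lists for a made-up undirected path graph on 4 nodes, 0 - 1 - 2 - 3. The variable names are arbitrary.

```python
n = 4
edges = [(0, 1), (1, 2), (2, 3)]  # hypothetical path graph 0 - 1 - 2 - 3

# Start with an n x n matrix of zeros (no edges)
matrix = [[0] * n for _ in range(n)]
for i, j in edges:
    matrix[i][j] = 1
    matrix[j][i] = 1  # undirected: record the edge in both directions

for row in matrix:
    print(row)

# Testing for an edge is a single lookup
print(matrix[1][2] == 1, matrix[0][3] == 1)
```

For a weighted graph, the two assignments would store the weight instead of 1.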

### Adjacency lists

The adjacency lists representation is really more like a dictionary, where every node maps to a list of its neighbours, i.e. the nodes which are *adjacent* to it. Here is a graph in adjacency-lists format: 
```python
G = {
    0: [1, 2, 3], 
    1: [0, 3], 
    2: [0], 
    3: [0, 1], 
    4: []}
```
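
The dict-of-dicts-of-dicts variant can be sketched with plain Python dictionaries: the outer key is a node, the middle key is a neighbour, and the innermost dictionary holds that edge's properties. This example does not call NetworkX itself; the graph and the attribute name `"weight"` are chosen for illustration.

```python
# node -> neighbour -> dictionary of edge properties
adj = {
    "a": {"b": {"weight": 4.0}, "c": {"weight": 1.5}},
    "b": {"a": {"weight": 4.0}},
    "c": {"a": {"weight": 1.5}},
}

neighbours_of_a = list(adj["a"])      # the neighbours are the middle keys
weight_ab = adj["a"]["b"]["weight"]   # properties of the edge a - b
b_c_adjacent = "c" in adj["b"]        # fast membership test for an edge

# Weighted degree of a: the sum of the weights of its edges
weighted_degree_a = sum(attrs["weight"] for attrs in adj["a"].values())

print(neighbours_of_a, weight_ab, b_c_adjacent, weighted_degree_a)
```

Compared with plain lists of neighbours, this layout makes edge lookups fast and leaves room for per-edge properties.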


### List of edges

We could also represent a graph as a list of edges, where each edge is a 2-tuple. 
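
A short sketch of moving between adjacency lists and a list of edges, using a made-up undirected graph. Each undirected edge is listed once, with the smaller node first; that ordering is a convention chosen for this example only.

```python
# Hypothetical undirected graph: a triangle 0 - 1 - 2 plus an extra edge 2 - 3
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

# Keep each undirected edge once by requiring u < v
edges = [(u, v) for u, neighbours in graph.items() for v in neighbours if u < v]
print(edges)

# Rebuild adjacency lists from the list of edges
rebuilt = {}
for u, v in edges:
    rebuilt.setdefault(u, []).append(v)
    rebuilt.setdefault(v, []).append(u)
print(rebuilt == graph)
```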

### Conclusion

Graph theory is the study of things that can be represented as graphs. And a *lot* of things can be represented as graphs.

Classroom MScAI students can study more in the module Web and Network Science, CT5113, Dr Conor Hayes, Semester 2.

### Exercises

* Recall this graph in adjacency-lists format: 
```python
G = {
    0: [1, 2, 3], 
    1: [0, 3], 
    2: [0], 
    3: [0, 1], 
    4: []}
```

Draw this on paper. Write down its adjacency matrix on paper. Write it down in a list of edges format.

* Calculate the sum of row 0 in the adjacency matrix. What does it mean?
* What do you observe about the diagonal, and about symmetry?
* Suppose you saw a 1 on the diagonal. How could this be represented on paper? Think of a real-world situation where this would be useful.


Consider again the list of edges format, where each edge is a 2-tuple. What would be the advantages of this, for a very large graph? Are there any disadvantages? (Hint: consider the same graph as before.)

### Solutions

Adjacency matrix
```
0 1 1 1 0
1 0 0 1 0
1 0 0 0 0
1 1 0 0 0
0 0 0 0 0
```

<center><img src=img/exercise_graph.svg width=40%></center>

* The row-sum of a node (e.g. row 0 for node 0) is its degree -- the number of edges from it. Row 0 sums to 3, so node 0 has degree 3.
* In this graph the diagonal is all zeros, because no node is connected to itself, and the matrix is symmetric.
* If there were a 1 on the diagonal, it would indicate an edge from node $i$ to itself -- a *self-loop*. On paper we would draw it as a loop leaving the node and returning to it. Usually these are disallowed. But e.g. if a web page linked to itself, we could represent that as a self-loop.

* In an undirected graph, the adjacency matrix is symmetric. In a directed graph, it may not be.

* The advantage of the list-of-edges representation is that in a *sparse graph* (few edges, relative to the number there *could be*) it saves a lot of space compared to an explicit adjacency matrix, which has to store all those zeros. The possible disadvantage is that there could be isolated nodes (nodes with no neighbours, i.e. degree 0, like node 4 here), which would be omitted entirely if we only list the edges. So we might have to maintain both a list of edges and a list of nodes.
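
The list-of-edges answer, and the isolated-node caveat from the last point, can be checked with a short sketch. Listing each undirected edge once with the smaller node first is just a convention chosen here.

```python
G = {
    0: [1, 2, 3],
    1: [0, 3],
    2: [0],
    3: [0, 1],
    4: []}

# List each undirected edge once by requiring u < v
edges = [(u, v) for u, neighbours in G.items() for v in neighbours if u < v]
print(edges)

# Nodes that never appear in any edge are lost in this format
nodes_in_edges = {node for edge in edges for node in edge}
missing = sorted(set(G) - nodes_in_edges)
print(missing)
```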