# Introduction to Graphs

## 1. Graphs: What and Why

Graphs are a generic way of representing relationships (edges) between entities (nodes).

This makes them useful in a wide variety of applications, including modelling biological pathways, social networks, or even images.

Graphs do not have a fixed structure and the relationships (edges) can be time-varying.


### 1.1. Graph Jargon

Entities are **nodes or vertices** and their relationships are **edges**.

Nodes connected by an edge are **adjacent** and are called **neighbours**.

Graphs generalise trees. A graph is any collection of nodes and edges, whereas a tree is a graph that is *connected* and has *no cycles*.
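The tree definition can be checked mechanically. The sketch below is one possible way to do it, assuming nodes are labelled `0` to `num_nodes - 1` and edges are undirected pairs. The function name `is_tree` is made up for illustration, and treating the empty graph as "not a tree" is a convention chosen here.

```python
def is_tree(num_nodes, edges):
    """Return True if the undirected graph on nodes 0..num_nodes-1 is a tree."""
    # A tree on n nodes always has exactly n - 1 edges.
    if num_nodes == 0 or len(edges) != num_nodes - 1:
        return False

    adjacency = {node: [] for node in range(num_nodes)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    # With n - 1 edges, the graph has no cycles exactly when it is connected,
    # so it is enough to check that every node is reachable from node 0.
    seen = {0}
    stack = [0]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == num_nodes


print(is_tree(3, [(0, 1), (1, 2)]))          # True: connected, no cycles
print(is_tree(3, [(0, 1), (1, 2), (2, 0)]))  # False: contains a cycle
```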

In a **directed graph**, each edge has a direction, e.g. followers in a social network, where A following B does not imply that B follows A.

Edges can be **weighted** or unweighted.

A **path** is the sequence of edges used to get from one vertex to another.
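To make the terms concrete, here is a small sketch of a directed, weighted graph stored as a nested dictionary, with a helper that sums the weights along a path. The node labels, weights, and the `path_weight` helper are all invented for illustration.

```python
# Directed, weighted graph: weights[u][v] is the weight of the edge u -> v.
weights = {
    "A": {"B": 2.0},
    "B": {"C": 1.5},
    "C": {},
}


def path_weight(weights, path):
    """Sum the edge weights along a path given as a sequence of nodes."""
    total = 0.0
    # Consecutive pairs of nodes in the path are the edges being traversed.
    for u, v in zip(path, path[1:]):
        if v not in weights.get(u, {}):
            raise ValueError(f"no edge from {u} to {v}")
        total += weights[u][v]
    return total


print(path_weight(weights, ["A", "B", "C"]))  # 3.5
```

Because the edges are directed, the reverse path `["C", "B", "A"]` does not exist in this graph and raises a `ValueError`.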

## 2. Implementing a Graph From Scratch

A graph is an abstract data structure, so there are multiple ways to represent one in memory.

Some common options are:

1. Objects and pointers
2. Adjacency lists
3. Adjacency matrix


Each approach has its own trade-offs in terms of memory usage, ease of implementation, and efficiency based on the specific characteristics of the graph you're dealing with.

We'll implement the following simple graph using each approach:


```{mermaid}

flowchart LR

A[1] <---> B[2]
B[2] <---> C[3]
```

### 2.1. Objects and Pointers

In this approach, we represent each node in the graph as an object and use pointers to connect the nodes. Each object typically holds a list of pointers to the nodes it is connected to.

For a *connected* graph, if we have access to one node we can traverse the graph to find all other nodes. This is not true for graphs with unconnected nodes.

This approach will be familiar to computer scientists, as it resembles a linked list.
It is rarely used in practical ML because traversing the graph is more cumbersome than with the other representations.


Pros:

- Allows for flexible representation of complex relationships.
- Ideal for graphs where nodes have additional properties beyond just connectivity.

Cons:

- Memory overhead due to object instantiation and pointer storage.
- Can be complex to implement and manage.
- If the graph is not connected, an additional data structure is needed to keep track of isolated nodes.

Python Implementation:

In [1]:
class Node:
    def __init__(self, value):
        self.value = value
        self.neighbors = []

    def __repr__(self) -> str:
        return f"Node {self.value} with neighbors {[k.value for k in self.neighbors]}"

def add_edge(node1, node2):
    node1.neighbors.append(node2)
    node2.neighbors.append(node1)

node1 = Node(1)
node2 = Node(2)
node3 = Node(3)

add_edge(node1, node2)
add_edge(node2, node3)

In [2]:
print(node1)

Node 1 with neighbors [2]


In [3]:
print(node2)

Node 2 with neighbors [1, 3]


In [4]:
print(node3)

Node 3 with neighbors [2]
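The point about connectivity can be illustrated by walking the pointers from a single node. The sketch below redefines a trimmed-down `Node` so that it runs on its own. The `reachable_values` helper and the extra isolated node are additions for illustration only.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.neighbors = []


def add_edge(node1, node2):
    node1.neighbors.append(node2)
    node2.neighbors.append(node1)


def reachable_values(start):
    """Follow neighbour pointers from start and collect every value found."""
    seen = {start}  # Node objects are hashable by identity
    stack = [start]
    values = []
    while stack:
        node = stack.pop()
        values.append(node.value)
        for neighbor in node.neighbors:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return sorted(values)


node1, node2, node3 = Node(1), Node(2), Node(3)
isolated = Node(4)  # not connected to anything
add_edge(node1, node2)
add_edge(node2, node3)

print(reachable_values(node1))     # [1, 2, 3]
print(reachable_values(isolated))  # [4]
```

Node 4 is never reached from node 1, which is why a disconnected graph needs a separate structure to keep track of all its nodes.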


### 2.2. Adjacency List

In this approach, we store a list of edges per node.

We use a list or dictionary where the index or key represents a node, and the value is a list of its adjacent nodes.

Pros:

- Efficient memory usage for sparse graphs.
- Easy to implement and understand.
- Suitable for graphs with varying degrees of connectivity.

Cons:

- Slower than an adjacency matrix for dense graphs.
- Checking whether a specific edge exists may require a linear search through a node's neighbours.

Python Implementation:

In [5]:
class Graph:
    def __init__(self):
        self.adj_list = {}

    def __repr__(self) -> str:
        return str(self.adj_list)

    def add_edge(self, node1, node2):
        if node1 not in self.adj_list:
            self.adj_list[node1] = []
        if node2 not in self.adj_list:
            self.adj_list[node2] = []
        self.adj_list[node1].append(node2)
        self.adj_list[node2].append(node1)


graph = Graph()
graph.add_edge(1, 2)
graph.add_edge(2, 3)

print(graph)


{1: [2], 2: [1, 3], 3: [2]}
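The linear search mentioned in the cons shows up when checking for a specific edge. The following sketch works on a plain dictionary with the same contents as the output above. The `has_edge` helper and the set-based variant are illustrative assumptions, not part of the class defined earlier.

```python
adj_list = {1: [2], 2: [1, 3], 3: [2]}


def has_edge(adj_list, node1, node2):
    # `in` on a list scans it element by element, so the cost grows
    # with the number of neighbours of node1.
    return node2 in adj_list.get(node1, [])


print(has_edge(adj_list, 1, 2))  # True
print(has_edge(adj_list, 1, 3))  # False

# Storing neighbours in sets instead of lists makes the membership check
# constant time on average, at the cost of losing insertion order.
adj_sets = {node: set(neighbours) for node, neighbours in adj_list.items()}
print(3 in adj_sets[2])  # True
```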


### 2.3. Adjacency Matrix

In this approach, we represent the graph as a 2D matrix where rows and columns represent nodes, and matrix cells indicate whether there's an edge between the nodes.

For weighted graphs, the values in the matrix correspond to the weights of the edges.

Pros:

- Efficient for dense graphs.
- Constant time access to check edge existence.

Cons:

- Consumes more memory for sparse graphs.
- Adding/removing nodes can be expensive as it requires resizing the matrix.
- Not ideal for graphs with a large number of nodes, since the matrix needs O(n²) space regardless of how many edges exist.

Python Implementation:

In [6]:
class Graph:
    def __init__(self, num_nodes):
        self.adj_matrix = [[0] * num_nodes for _ in range(num_nodes)]

    def __repr__(self) -> str:
        return str(self.adj_matrix)

    def add_edge(self, node1, node2):
        self.adj_matrix[node1][node2] = 1
        self.adj_matrix[node2][node1] = 1


# Nodes 1, 2, 3 from the diagram map to matrix indices 0, 1, 2.
graph = Graph(num_nodes=3)
graph.add_edge(0, 1)
graph.add_edge(1, 2)

print(graph)

[[0, 1, 0], [1, 0, 1], [0, 1, 0]]
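For a weighted graph, the same structure can hold edge weights instead of 0/1 flags. The sketch below is one way this might look. The class name `WeightedGraph`, the example weights, and the convention that `0.0` means "no edge" are all assumptions made here, and that convention means a genuine zero-weight edge cannot be represented.

```python
class WeightedGraph:
    def __init__(self, num_nodes):
        # Assumption: 0.0 marks the absence of an edge.
        self.adj_matrix = [[0.0] * num_nodes for _ in range(num_nodes)]

    def add_edge(self, node1, node2, weight):
        # Undirected graph, so the matrix stays symmetric.
        self.adj_matrix[node1][node2] = weight
        self.adj_matrix[node2][node1] = weight

    def has_edge(self, node1, node2):
        # A single index lookup, independent of the size of the graph.
        return self.adj_matrix[node1][node2] != 0.0


weighted = WeightedGraph(num_nodes=3)
weighted.add_edge(0, 1, 2.5)
weighted.add_edge(1, 2, 0.5)

print(weighted.adj_matrix)      # [[0.0, 2.5, 0.0], [2.5, 0.0, 0.5], [0.0, 0.5, 0.0]]
print(weighted.has_edge(0, 2))  # False
```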


The adjacency matrix approach is useful in machine learning because it fits naturally into a tensor representation, which most ML libraries (e.g. PyTorch, TensorFlow) work well with. So we'll stick with that going forward.

## 3. Implementing a Graph with networkx

Graphs are common enough that we don't need to implement them from scratch every time we want to use them (although doing so in the previous section is instructive).

`networkx` is a useful third-party library for this task.

In [7]:
import networkx as nx


graph = nx.Graph()
graph.add_edges_from([
    ('A', 'B'),
    ('A', 'C'), 
    ('B', 'D'), 
    ('B', 'E'), 
    ('C', 'F'), 
    ('C', 'G'), 
    ('G', 'G'), 
])

print(graph)


From this Graph object we can see our familiar adjacency list:

In [41]:
graph.adj


AdjacencyView({'A': {'B': {}, 'C': {}}, 'B': {'A': {}, 'D': {}, 'E': {}}, 'C': {'A': {}, 'F': {}, 'G': {}}, 'D': {'B': {}}, 'E': {'B': {}}, 'F': {'C': {}}, 'G': {'C': {}, 'G': {}}})

Or the adjacency matrix:

In [11]:
nx.adjacency_matrix(graph).todense()


matrix([[0, 1, 1, 0, 0, 0, 0],
        [1, 0, 0, 1, 1, 0, 0],
        [1, 0, 0, 0, 0, 1, 1],
        [0, 1, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 1]])

# Old

In [7]:
graph.degree

DegreeView({'A': 2, 'B': 3, 'C': 3, 'D': 1, 'E': 1, 'F': 1, 'G': 3})

In [8]:
nx.degree_centrality(graph)

{'A': 0.3333333333333333,
 'B': 0.5,
 'C': 0.5,
 'D': 0.16666666666666666,
 'E': 0.16666666666666666,
 'F': 0.16666666666666666,
 'G': 0.5}
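These values can be reproduced by hand: degree centrality is a node's degree divided by `n - 1`, the largest degree a node could have in a simple graph with `n` nodes. The sketch below recomputes it in plain Python from the same edges, assuming, as the degree output above suggests, that the self-loop on `G` adds 2 to its degree.

```python
edges = [
    ("A", "B"), ("A", "C"),
    ("B", "D"), ("B", "E"),
    ("C", "F"), ("C", "G"),
    ("G", "G"),  # self-loop
]

degree = {}
for u, v in edges:
    # Each edge adds one to both endpoints, so a self-loop adds two to its node.
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

n = len(degree)
centrality = {node: d / (n - 1) for node, d in degree.items()}

print(degree["G"])                # 3
print(centrality["B"])            # 0.5
print(round(centrality["A"], 4))  # 0.3333
```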

In [9]:
nx.closeness_centrality(graph)

{'A': 0.6,
 'B': 0.5454545454545454,
 'C': 0.5454545454545454,
 'D': 0.375,
 'E': 0.375,
 'F': 0.375,
 'G': 0.375}
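For a connected graph, closeness centrality is `(n - 1)` divided by the sum of shortest-path distances from a node to every other node. The sketch below recomputes it for the same edges using a breadth-first search. It assumes an unweighted, connected graph with at least two nodes, and the `closeness` helper is written here for illustration rather than taken from `networkx`.

```python
from collections import deque

edges = [
    ("A", "B"), ("A", "C"),
    ("B", "D"), ("B", "E"),
    ("C", "F"), ("C", "G"),
    ("G", "G"),
]

adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)


def closeness(adjacency, source):
    """Closeness of source, assuming the graph is connected and unweighted."""
    # Breadth-first search gives shortest-path lengths in an unweighted graph.
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return (len(adjacency) - 1) / sum(distance.values())


print(closeness(adjacency, "A"))  # 0.6
print(closeness(adjacency, "D"))  # 0.375
```

Node `A` sits one step from `B` and `C` and two steps from everything else, giving a distance sum of 10 and a closeness of 6 / 10.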

In [10]:
nx.betweenness_centrality(graph)

{'A': 0.6, 'B': 0.6, 'C': 0.6, 'D': 0.0, 'E': 0.0, 'F': 0.0, 'G': 0.0}

## 3. Graph Metrics

Connectedness


## 4. Graph Search

### 4.1. Depth-First Search
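
A minimal sketch of depth-first search on the adjacency-list representation from section 2.2, assuming the graph is a plain dictionary mapping each node to a list of its neighbours. The example graph and the function name `dfs` are made up for illustration. The search follows one branch as far as it can before backtracking.

```python
def dfs(adj_list, start):
    """Return nodes in the order a recursive depth-first search visits them."""
    seen = set()
    order = []

    def visit(node):
        seen.add(node)
        order.append(node)
        for neighbour in adj_list[node]:
            if neighbour not in seen:
                # Go deeper before looking at the remaining neighbours.
                visit(neighbour)

    visit(start)
    return order


adj_list = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
print(dfs(adj_list, 1))  # [1, 2, 4, 3]
```

The recursive form is short but can hit Python's recursion limit on very deep graphs, in which case an explicit stack is the usual alternative.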


### 4.2. Breadth-First Search
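
A minimal sketch of breadth-first search on the adjacency-list representation from section 2.2, assuming the graph is a plain dictionary mapping each node to a list of its neighbours. The example graph and the function name `bfs` are made up for illustration. The search visits all nodes at one distance from the start before moving to the next distance.

```python
from collections import deque


def bfs(adj_list, start):
    """Return nodes in the order a breadth-first search visits them."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        # Taking from the front of the queue processes nodes in the order
        # they were discovered, i.e. nearest to the start first.
        node = queue.popleft()
        order.append(node)
        for neighbour in adj_list[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


adj_list = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
print(bfs(adj_list, 1))  # [1, 2, 3, 4]
```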


## 5. Graph Algorithms

### 5.1. Dijkstra's Algorithm


### 5.2. A* Algorithm


### 5.3. Kruskal's Algorithm


### 5.4. Prim's Algorithm

## N. Graph Learning

There are several categories of learning tasks we may want to perform on a graph:

1. **Node classification**: Predict the category of each node. E.g. categorising songs by genre.
2. **Link prediction**: Predict missing links between nodes. E.g. friend recommendations in a social network.
3. **Graph classification**: Categorise an entire graph.
4. **Graph generation**: Generate a new graph based on desired properties.


There are several prominent families of graph learning techniques:

1. **Graph signal processing**: Apply traditional signal processing techniques like Fourier analysis to graphs to study their connectivity and structure.
2. **Matrix factorisation**: