# Learning about Graph Neural Networks
These lectures are made in conjunction with CS224W resources from Stanford as well as the Graph Representation Learning book by William L. Hamilton.

This module will go over and explain how graph neural networks work from the ground up.

## Introduction

Graphs are common data structures and serve as a universal language for describing complex systems. A graph is a collection of objects (nodes) together with the interactions (edges) between them.

![Graph](./Images/graph.png)

Graphs capture relationships between data points and are highly generalizable: many different types of data can be represented as a graph. Another major benefit is that graphs have a mathematical foundation that we can use to analyze and understand them, and to make complex predictions.

This matters for deep learning because most modern deep learning tools are specialized for simple data types such as linear sequences and grids.

Graphs and networks are much more complex to process. They have arbitrary sizes and complex topological structure (i.e., no spatial locality like grids), no fixed node ordering, and no reference points. They also often have dynamic and multimodal features, which adds to the complexity.

### Graphs in Depth


A graph $G = (V, E)$ is described as having a set of nodes $V$ and a set of edges $E$ between the nodes. An edge going from node $u ∈ V$ to node $v ∈ V$ is denoted as $(u, v) ∈ E$. 

- **Simple Graphs**: These are graphs where there is at most one edge between each pair of nodes, there are no edges between a node and itself, and all edges are undirected.

One way to represent a graph is through an adjacency matrix $\textbf{A} ∈ \mathbb{R}^{|V|×|V|}$. We order the nodes so that every node indexes a particular row and column of the matrix. We then represent the presence of edges as entries in the matrix: $\textbf{A}[u,v] = 1$ if $(u,v) ∈ E$ and $\textbf{A}[u,v] = 0$ otherwise. If the graph contains only undirected edges, $\textbf{A}$ is symmetric; if the graph is directed, $\textbf{A}$ may not be symmetric. Graphs can also have weighted edges, in which case the entries of the adjacency matrix are arbitrary real values rather than {0,1}.

![Graph](./Images/admat.png)

Since graphs are often sparse, we can also represent them using an adjacency list. **Adjacency lists** are easier to work with when the network is large and sparse, and they allow us to quickly retrieve all neighbors of a given node. For example, for the directed graph above, we can create an adjacency list as follows:

1: 4

2: 1

3:

4: 2, 3
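
As a minimal sketch, the adjacency list above can be stored as a Python dictionary and converted into the adjacency matrix described earlier. The names `adj_list` and `to_adjacency_matrix` are illustrative, and the code assumes the nodes are labeled 1 through 4 as in the list above.

```python
# Adjacency list for the directed graph above: node -> list of out-neighbors.
adj_list = {1: [4], 2: [1], 3: [], 4: [2, 3]}


def to_adjacency_matrix(adj_list):
    """Build a dense 0/1 adjacency matrix (a list of rows) from an adjacency list."""
    nodes = sorted(adj_list)                        # fix a node ordering
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for u, neighbors in adj_list.items():
        for v in neighbors:
            matrix[index[u]][index[v]] = 1          # A[u, v] = 1 if (u, v) is an edge
    return matrix


A = to_adjacency_matrix(adj_list)
for row in A:
    print(row)
```

Looking up the neighbors of a node is a single dictionary access in the list form (`adj_list[4]`), whereas the matrix form stores an entry for every pair of nodes, which is why the list is preferred for large sparse graphs.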

Graphs can also have different kinds of edges beyond undirected, directed and weighted edges. We can extend the edge notation to include an edge or relation type $τ$, e.g., $(u, τ, v) ∈ E$, and define one adjacency matrix $\textbf{A}_{τ}$ per edge type. We call these **multi-relational** graphs, and the entire graph can be summarized by an adjacency tensor $\mathcal{A} ∈ \mathbb{R}^{|V|×|\mathcal{R}|×|V|}$, where $\mathcal{R}$ is the set of relations. There are two important subsets of multi-relational graphs:

1. **Heterogeneous Graphs**: In heterogeneous graphs, nodes also have types, so the graph contains multiple types of objects, multiple types of links, or both. Edges in these graphs typically have to satisfy constraints based on the node types, most commonly that certain edge types can only connect certain types of nodes.
2. **Multiplex Graphs**: Multiplex graphs assume that the graph can be decomposed into a set of $k$ layers. Every node is replicated across layers, but each layer has its own connectivity. Inter-layer edges can also exist, connecting the same node across layers.
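
To make the "one adjacency matrix per edge type" idea concrete, here is a small sketch that groups typed edges $(u, τ, v)$ by relation. The edge list, the relation names (`"treats"`, `"interacts"`) and the helper `adjacency_per_relation` are all hypothetical and chosen only for illustration; edges are treated as directed.

```python
from collections import defaultdict

# Hypothetical typed edges (u, relation, v) over nodes 0, 1, 2.
typed_edges = [
    (0, "treats", 1),
    (2, "treats", 1),
    (0, "interacts", 2),
]
num_nodes = 3


def adjacency_per_relation(typed_edges, num_nodes):
    """Return a dict mapping each relation type to its own 0/1 adjacency matrix."""
    matrices = defaultdict(lambda: [[0] * num_nodes for _ in range(num_nodes)])
    for u, relation, v in typed_edges:
        matrices[relation][u][v] = 1
    return dict(matrices)


A_by_relation = adjacency_per_relation(typed_edges, num_nodes)
for relation, matrix in A_by_relation.items():
    print(relation, matrix)
```

Stacking these per-relation matrices along a middle axis gives the adjacency tensor with one slice per relation type.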

Graphs often also come with attributes or features. Node-level features can be represented with a real-valued matrix $\textbf{X} ∈ \mathbb{R}^{|V|×m}$, where $m$ is the number of features per node and the row ordering is assumed to match the node ordering of the adjacency matrix.

Node degree is another important concept, defined first for undirected graphs. The **node degree** $k_{i}$ is the number of edges adjacent to node $i$. For example:

![Graph](./Images/ND4.png)

$k_{A}=4$

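**Avg. Degree:** $\bar{k}=\langle k \rangle = \frac{1}{N}\sum_{i=1}^{N}{k_{i}}=\frac{2E}{N}$, where $N$ is the number of nodes and $E$ is the number of edges. The factor of 2 appears because each edge contributes to the degree of both of its endpoints.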
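
The degree definitions can be checked on a small example. The sketch below uses a hypothetical undirected edge list (not the graph in the figure) in which node `"A"` has degree 4, and it assumes every node appears in at least one edge, so isolated nodes are not counted.

```python
# Hypothetical undirected graph given as an edge list.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C")]


def degrees(edges):
    """Count how many edges touch each node of an undirected graph."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


deg = degrees(edges)
num_nodes = len(deg)
num_edges = len(edges)
avg_degree = sum(deg.values()) / num_nodes

print(deg["A"])                       # degree of node A
print(avg_degree)                     # average degree
print(2 * num_edges / num_nodes)      # same value via 2E / N
```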

For directed networks, we define an **in-degree** (number of incoming edges) and an **out-degree** (number of outgoing edges). The (total) degree of a node is the sum of its in- and out-degrees.

![Graph](./Images/NDDirected.png)

$k_{C}^{in} = 2$ and $k_{C}^{out}=1$ and thus $k_{C}=3$

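Every directed edge contributes exactly one out-degree and one in-degree, so the average in-degree equals the average out-degree: $\bar{k}^{in} = \bar{k}^{out} = \frac{E}{N}$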
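
As a sketch of the directed case, the code below computes in- and out-degrees from a hypothetical directed edge list. The edges are made up (they are not read from the figure) and were chosen so that node `"C"` has in-degree 2 and out-degree 1, as in the example above.

```python
# Hypothetical directed edges (source, target).
directed_edges = [("A", "C"), ("B", "C"), ("C", "D")]


def in_out_degrees(edges):
    """Return (in_degree, out_degree) dictionaries for a directed edge list."""
    nodes = {node for edge in edges for node in edge}
    in_deg = {node: 0 for node in nodes}
    out_deg = {node: 0 for node in nodes}
    for u, v in edges:
        out_deg[u] += 1     # edge leaves u
        in_deg[v] += 1      # edge enters v
    return in_deg, out_deg


in_deg, out_deg = in_out_degrees(directed_edges)
n = len(in_deg)
e = len(directed_edges)

print(in_deg["C"], out_deg["C"], in_deg["C"] + out_deg["C"])
print(sum(in_deg.values()) / n, sum(out_deg.values()) / n, e / n)
```

Both averages equal `e / n` because each edge is counted once as an outgoing edge and once as an incoming edge.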

**Bipartite graphs** are graphs whose nodes can be divided into two disjoint sets $U$ and $V$ such that every link connects a node in $U$ to one in $V$; that is, $U$ and $V$ are **independent sets**, with no edges inside either set.
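
The bipartite condition is easy to verify for a proposed split. The sketch below uses a hypothetical author-paper graph; the node names and the helper `is_bipartite_split` are illustrative assumptions, and the function only checks a given split rather than searching for one.

```python
def is_bipartite_split(edges, U, V):
    """Check that U and V are disjoint and every edge joins a node in U to a node in V."""
    U, V = set(U), set(V)
    if U & V:
        return False
    return all((a in U and b in V) or (a in V and b in U) for a, b in edges)


# Hypothetical example: authors on one side, papers on the other.
authors = {"alice", "bob"}
papers = {"p1", "p2"}
authorship = [("alice", "p1"), ("bob", "p1"), ("bob", "p2")]

print(is_bipartite_split(authorship, authors, papers))

# Adding an edge between two authors breaks the independent-set property.
print(is_bipartite_split(authorship + [("alice", "bob")], authors, papers))
```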

### Machine Learning On Graphs

For machine learning with graphs, the usual categories of supervised and unsupervised learning are not necessarily the most informative. We'll start by discussing which methods are helpful for machine learning on graphs and how they are implemented. With deep learning on graphs, our main goal is to take a network as input and output predictions, which could be node labels, new links, generated graphs and subgraphs, etc.

![Graph](./Images/GNNDL.png)

In traditional machine learning, a lot of feature engineering is required, but with representation learning the network learns the features automatically. We can think of representation learning as a way to map the nodes of a graph to $d$-dimensional embeddings such that similar nodes in the network are embedded close together.


**Node Classification** is the task of predicting the label $y_{u}$ (which could be a type, category, or attribute) associated with every node $u ∈ V$ when we are only given the true labels on a training set of nodes $V_{train} ⊂ V$. This is the most popular machine learning task on graph data. Node classification may appear similar to standard supervised classification, but there are important differences. The most important is that the nodes in a graph are not independent and identically distributed (i.i.d.). In standard supervised learning, we assume each datapoint is statistically independent of all other datapoints and that all datapoints are identically distributed. For graphs, it's a good idea to instead leverage the connections between nodes through ideas such as **homophily**, **structural equivalence** and **heterophily**.
- Homophily: The tendency for nodes to share attributes with their neighbors in the graph.
- Structural equivalence: The idea that nodes with similar local neighborhood structures will have similar labels.
- Heterophily: The assumption that nodes are preferentially connected to nodes with different labels.
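
One simple way to quantify homophily versus heterophily is the fraction of edges whose two endpoints share a label. This measure is an assumption introduced here for illustration and is not defined in these notes; the graph, the labels and the helper `edge_homophily` are hypothetical.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints have the same label (assumes at least one edge)."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Hypothetical path graph 0 - 1 - 2 - 3 - 4 with two classes.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}

ratio = edge_homophily(edges, labels)
print(ratio)
```

A value near 1 suggests a homophilous graph, where a neighbor's label is a useful hint for node classification, while a value near 0 suggests heterophily.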

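**Link Prediction** or **Relation Prediction** is the task of predicting whether a link is missing between two nodes. The standard setup is that we are given the set of nodes $V$ and an incomplete set of edges $E_{train} ⊂ E$, and the goal is to use this partial information to infer the missing edges.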

**Graph Classification**: The goal is to assign a category to an entire graph rather than to individual nodes.

**Clustering:** The goal is to detect whether groups of nodes form densely interconnected communities.

Here you should pause and take a look at `Notebook 1.ipynb` in the Learning folder to get a better sense of how to program graphs using NetworkX. For more information, take a look at the [documentation](https://networkx.org/documentation/stable/).

## Traditional Methods for Graphs

## Node Embeddings

## Graph Neural Networks 1: GNN Model

## Graph Neural Networks 2: Design Space

## Applications of Graph Neural Networks

## Theory of Graph Neural Networks