## Applications of Graph Neural Networks

#### Over-Smoothing Problem
Stacking too many GNN layers causes the over-smoothing problem: all the node embeddings converge to the same value. This is bad because we want to use node embeddings to differentiate nodes.

The **receptive field** is the set of nodes that determine the embedding of a node of interest. In a K-layer GNN, the receptive field of each node is its K-hop neighborhood. The number of neighbors shared by two nodes grows quickly as we increase the number of hops (the number of GNN layers).
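As a minimal sketch of this growth, the snippet below computes K-hop receptive fields with a breadth-first search and measures how much the receptive fields of two nodes overlap as K increases. The toy path graph, the helper names `receptive_field` and `overlap`, the use of Jaccard overlap as the measure, and the choice to count a node as part of its own receptive field are all assumptions made for illustration.

```python
from collections import deque

def receptive_field(adj, node, k):
    """Return the set of nodes within k hops of `node` (the node itself included)."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in adj[current]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

def overlap(adj, u, v, k):
    """Jaccard overlap between the k-hop receptive fields of u and v."""
    a, b = receptive_field(adj, u, k), receptive_field(adj, v, k)
    return len(a & b) / len(a | b)

# Hypothetical toy graph: the path 0 - 1 - 2 - 3 - 4 - 5.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

# Overlap between the two end nodes for K = 1, ..., 5.
overlaps = [overlap(adj, 0, 5, k) for k in range(1, 6)]
```

Even for the two most distant nodes of this graph, the overlap goes from none at small K to complete once K reaches the diameter of the graph.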

Over-smoothing can be explained via the notion of the receptive field. We know that the embedding of a node is determined by its receptive field. If two nodes have highly overlapping receptive fields, then their embeddings are highly similar.

Stacking many GNN layers leads to nodes having highly overlapping receptive fields, so their embeddings become highly similar. This is exactly the over-smoothing problem. How do we overcome this?
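To make the effect concrete before turning to remedies, here is a deliberately simplified sketch: each "layer" only replaces a node's scalar value with the mean of its own value and its neighbors' values, with no learned weights and no nonlinearity. This is an assumption made to isolate the smoothing effect, not a full GNN layer; the toy graph and the helper names `mean_aggregate` and `spread` are hypothetical.

```python
def mean_aggregate(adj, values):
    """One simplified layer: each node takes the mean of itself and its neighbors."""
    return {
        n: (values[n] + sum(values[m] for m in nbrs)) / (1 + len(nbrs))
        for n, nbrs in adj.items()
    }

def spread(values):
    """Gap between the largest and smallest node value."""
    return max(values.values()) - min(values.values())

# Hypothetical toy graph: the path 0 - 1 - 2 - 3 - 4 - 5.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

# Start with clearly distinguishable node values.
values = {n: float(n) for n in adj}

# Record the spread after each of 50 stacked layers.
spreads = [spread(values)]
for _ in range(50):
    values = mean_aggregate(adj, values)
    spreads.append(spread(values))
```

The entries of `spreads` shrink as layers are stacked, so the node values become harder and harder to tell apart.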

The first lesson is that we need to be cautious when adding GNN layers. Adding more GNN layers does not always help.

We can also make a shallow GNN more expressive. One option is to turn the aggregation/transformation inside each GNN layer into a deep neural network. Another is to add layers that do not pass messages: a GNN does not necessarily contain only GNN layers. For example, we could add MLP layers (applied to each node) before and after the GNN layers as pre-processing and post-processing layers. **Pre-processing layers** are important when encoding node features is necessary (e.g. when nodes represent images/text). **Post-processing layers** are important when reasoning/transformation over node embeddings is needed (e.g. graph classification, knowledge graphs). These layers work really well in practice.
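The pre-process / GNN / post-process layout can be sketched as follows. This is a toy illustration under stated assumptions: the weights are hand-picked rather than learned, the message-passing layer is a plain mean over a node and its neighbors, and every name (`linear`, `relu`, `mlp_per_node`, `mean_gnn_layer`) is hypothetical.

```python
def linear(x, weight, bias):
    """Dense layer on one node's feature vector: weight @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def mlp_per_node(features, weight, bias):
    """Apply the same dense layer + ReLU to every node independently.
    No information moves between nodes here (no message passing)."""
    return {n: relu(linear(x, weight, bias)) for n, x in features.items()}

def mean_gnn_layer(adj, features):
    """One message-passing layer: average a node's own and its neighbors' vectors."""
    out = {}
    for n, nbrs in adj.items():
        group = [features[n]] + [features[m] for m in nbrs]
        out[n] = [sum(col) / len(group) for col in zip(*group)]
    return out

# Hypothetical toy graph (path 0 - 1 - 2) with 2-dimensional raw features.
adj = {0: [1], 1: [0, 2], 2: [1]}
raw = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}

# Hand-picked weights for illustration only; in practice they are learned.
W_pre, b_pre = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]], [0.0, 0.0, 0.0]  # 2 -> 3
W_post, b_post = [[1.0, 1.0, 1.0]], [0.0]                              # 3 -> 1

h = mlp_per_node(raw, W_pre, b_pre)      # pre-processing: encode raw features
h = mean_gnn_layer(adj, h)               # the only step that passes messages
out = mlp_per_node(h, W_post, b_post)    # post-processing: transform embeddings
```

Only `mean_gnn_layer` enlarges the receptive field, so the two MLP steps add capacity without contributing to over-smoothing.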

If we absolutely require many layers, we can also add skip connections. The basic idea is that a layer that computes $F(x)$ without a shortcut computes $F(x) + x$ once the shortcut is added. This effectively creates a mixture of models. N skip connections lead to $2^{N}$ possible paths, because at each layer the signal can either pass through the module or take the shortcut. Each path could have up to N modules, so we automatically get a mixture of shallow GNNs and deep GNNs.
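The path-counting argument can be checked on a toy case. The sketch below assumes each module is a scalar linear map $F_i(x) = a_i x$ (the values in `scales` are hypothetical), because for linear modules the stacked output splits exactly into a sum over all paths. With nonlinear modules the "mixture of paths" view is an interpretation rather than an exact identity.

```python
from itertools import product
from math import prod

# Hypothetical scalar modules F_i(x) = a_i * x, chosen only for illustration.
scales = [0.5, 2.0, 3.0]

def forward_with_skips(x, scales):
    """Apply x <- F_i(x) + x for each module in turn."""
    for a in scales:
        x = a * x + x
    return x

def path_output(x, path, scales):
    """Output of a single path: multiply x by a_i for every module the path uses."""
    return x * prod(a for a, used in zip(scales, path) if used)

# A path either goes through module i (1) or takes its shortcut (0).
paths = list(product([0, 1], repeat=len(scales)))

stacked = forward_with_skips(1.0, scales)
summed = sum(path_output(1.0, p, scales) for p in paths)

# Number of paths that use exactly d modules, for each depth d.
depth_counts = {}
for p in paths:
    depth_counts[sum(p)] = depth_counts.get(sum(p), 0) + 1
```

Here `paths` has $2^{3}$ entries, `stacked` and `summed` agree, and `depth_counts` shows shallow and deep paths coexisting in the same network.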

## Theory of Graph Neural Networks