# Task 10: Graph Convolutional Networks

Assume we have a graph $G$, where:
- $V$ is the vertex set.
- $\mathbf{A}$ is the adjacency matrix (assumed binary).
- $\mathbf{X} \in \mathbb{R}^{m \times |V|}$ is a matrix of node features.
- $v$ is a node in $V$ and $\mathrm{N}(v)$ is the set of neighbors of $v$.
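
As a concrete illustration of this notation, here is a minimal NumPy sketch. The toy graph, its edges, the feature values, and the helper `neighbors` are hypothetical and chosen only for illustration; $\mathbf{X}$ is stored with one column per node, assuming shape $m \times |V|$.

```python
import numpy as np

# Hypothetical toy graph: 4 nodes, undirected edges 0-1, 0-2, 1-2, 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
num_nodes = 4

# Binary adjacency matrix A: A[u, v] = 1 if u and v share an edge.
A = np.zeros((num_nodes, num_nodes), dtype=int)
for u, v in edges:
    A[u, v] = 1
    A[v, u] = 1  # undirected graph, so A is symmetric

# Feature matrix X with m = 2 features per node, one column per node.
X = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 3.0]])

def neighbors(A, v):
    """Return N(v) as a list of node indices adjacent to v."""
    return np.flatnonzero(A[v]).tolist()

nbrs_of_2 = neighbors(A, 2)
```

Here `neighbors(A, 2)` collects the indices of the nonzero entries in row 2 of `A`, which is exactly $\mathrm{N}(2)$ for this toy graph.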


## A Single GNN layer

A single GNN layer consists of the following 3 steps:
- **Message computation**. Each node creates a message that will later be sent to other nodes.

$$\mathbf{m}_u^{(l)}=\text{MSG}^{(l)}\left(\mathbf{h}_u^{(l-1)}\right)$$

where $\mathbf{h}_u^{(l-1)}$ is the embedding of node $u$ from the previous layer (the input features when $l=1$) and $\mathbf{m}_u^{(l)}$ is the message created at layer $l$.
- **Aggregation**. Each node aggregates the messages from its neighbors.

$$\mathbf{h}_v^{(l)} = \text{AGG}^{(l)}\left(\{\mathbf{m}_u^{(l)},u\in N(v)\}\right)$$

- **Nonlinear transformation**. A nonlinearity is applied to the message or to the aggregation output to add expressiveness.
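
The message and aggregation steps can be sketched in NumPy as follows. This is only an illustration under stated assumptions: `msg` is taken to be a linear map with a weight matrix `W`, `agg` is taken to be a sum, the nonlinearity is omitted, embeddings are stored one row per node, and the 3-node path graph is a made-up example.

```python
import numpy as np

def msg(h_u, W):
    """MSG: here a linear map (an assumed, common choice)."""
    return W @ h_u

def agg(messages):
    """AGG: here an element-wise sum (mean or max would also work)."""
    return messages.sum(axis=0)

def gnn_layer(A, H, W):
    """Apply message computation and neighbor aggregation.

    A: (|V|, |V|) binary adjacency matrix.
    H: (|V|, d_in) embeddings, row v holds h_v^{(l-1)}.
    Returns an array of shape (|V|, d_out), row v holds h_v^{(l)}.
    """
    num_nodes = A.shape[0]
    # Step 1: every node u creates its message m_u.
    M = np.stack([msg(H[u], W) for u in range(num_nodes)])
    # Step 2: every node v aggregates the messages of its neighbors.
    H_new = np.zeros_like(M)
    for v in range(num_nodes):
        nbrs = np.flatnonzero(A[v])
        if nbrs.size > 0:
            H_new[v] = agg(M[nbrs])
    return H_new

# Hypothetical path graph 0 - 1 - 2 with 2-dimensional embeddings.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.eye(2)  # identity weights keep the arithmetic easy to follow
H_next = gnn_layer(A, H, W)
```

With identity weights, the new embedding of node 0 is just the old embedding of node 1: nothing from node 0's own previous embedding survives the layer.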

The issue with the aggregation step above is that information from node $v$ itself **could get lost**, because $\mathbf{h}_v^{(l)}$ does not directly depend on $\mathbf{h}_v^{(l-1)}$.

To address this issue, we also compute a message from $\mathbf{h}_v^{(l-1)}$ itself. The message computation then has two components, one for each neighbor $u \in N(v)$ and one for node $v$:

$$\mathbf{m}_u^{(l)}=\mathbf{W}^{(l)}\left(\mathbf{h}_u^{(l-1)}\right)\tag{1}\label{eq:message1}$$

$$\mathbf{m}_v^{(l)}=\mathbf{B}^{(l)}\left(\mathbf{h}_v^{(l-1)}\right)\tag{2}\label{eq:message2}$$

Then, in the aggregation step, we combine the aggregated neighbor messages with the message from node $v$ itself via **concatenation** or **summation**. With concatenation:


$$\mathbf{h}_v^{(l)} = \text{CONCAT}\left(\text{AGG}^{(l)}\left(\{\mathbf{m}_u^{(l)},u\in N(v)\}\right), \mathbf{m}_v^{(l)} \right)\tag{3}\label{eq:aggregation}$$
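
A vectorized NumPy sketch of this self-message variant is shown below. It assumes that $\text{AGG}^{(l)}$ is a sum, that embeddings are stored one row per node, and that `W` and `B` are plain weight matrices; the function name `gnn_layer_with_self` and the example graph are hypothetical.

```python
import numpy as np

def gnn_layer_with_self(A, H, W, B):
    """Neighbor aggregation plus a separate self-message, joined by CONCAT.

    A: (|V|, |V|) binary adjacency matrix without self-loops.
    H: (|V|, d_in) embeddings, row v holds h_v^{(l-1)}.
    W, B: (d_out, d_in) weights for neighbor and self messages.
    Returns an array of shape (|V|, 2 * d_out).
    """
    M_nbr = H @ W.T       # row u is W h_u  (neighbor message)
    M_self = H @ B.T      # row v is B h_v  (self message)
    agg_nbr = A @ M_nbr   # row v sums m_u over u in N(v), since A is binary
    return np.concatenate([agg_nbr, M_self], axis=1)

# Hypothetical path graph 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.eye(2)
B = 2.0 * np.eye(2)
H_next = gnn_layer_with_self(A, H, W, B)
```

The last two columns of `H_next` carry each node's own transformed embedding, so the output for node $v$ now depends directly on $\mathbf{h}_v^{(l-1)}$. Replacing the concatenation with `agg_nbr + M_self` would give the summation variant.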

## GCN

For Graph Convolutional Networks (GCN), the message computation and aggregation are as follows:

- **Message computation**. GCN assumes the existence of self-edges, so $v \in \mathrm{N}(v)$ and information from node $v$ itself is captured in $\mathbf{m}_u^{(l)}$. In addition, each message is normalized by the node degree.

    $$\mathbf{m}_u^{(l)}= \frac{1}{|\mathrm{N}(v)|}\mathbf{W}^{(l)} \mathbf{h}_u^{(l-1)}$$

    

- **Aggregation**. The aggregation function $\text{AGG}^{(l)}$ for GCN is $\text{Sum}$, followed by a nonlinearity $\sigma$.

    $$\mathbf{h}_v^{(l)} = \sigma\left(\text{Sum}\left(\{\mathbf{m}_u^{(l)},u\in N(v)\}\right) \right)$$
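
Putting the two GCN steps together, one layer can be sketched in NumPy as below. Several choices here are assumptions rather than part of the formulas above: self-edges are added explicitly as $\mathbf{A} + \mathbf{I}$, $|\mathrm{N}(v)|$ counts the self-edge, $\sigma$ is taken to be ReLU, and embeddings are stored one row per node. The sketch follows the degree normalization written above, not the symmetric normalization used in some GCN formulations.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: degree-normalized messages, sum aggregation, ReLU.

    A: (|V|, |V|) binary adjacency matrix without self-loops.
    H: (|V|, d_in) embeddings, row v holds h_v^{(l-1)}.
    W: (d_out, d_in) weight matrix.
    """
    A_hat = A + np.eye(A.shape[0])           # add self-edges, so v is in N(v)
    deg = A_hat.sum(axis=1, keepdims=True)   # |N(v)| for every node v
    M = H @ W.T                              # row u is W h_u
    summed = A_hat @ M                       # row v sums W h_u over u in N(v)
    return np.maximum(0.0, summed / deg)     # normalize by |N(v)|, then ReLU

# Hypothetical path graph 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.eye(2)
H_next = gcn_layer(A, H, W)
```

Because the factor $1/|\mathrm{N}(v)|$ is the same for every message arriving at $v$, it can be applied once after the sum, which is what `summed / deg` does. With identity weights each output row is simply the mean of the node's own embedding and its neighbors' embeddings.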

   

## References

\[1\] [CS224W: Machine Learning with Graphs](http://web.stanford.edu/class/cs224w/)  
\[2\] [A General Perspective on Graph Neural Networks Slides. Stanford CS224W: Machine Learning with Graphs | 2023](http://web.stanford.edu/class/cs224w/slides/05-GNN2.pdf)