## Temporal Graph Networks


So far, we have focused on *static* graphs, in which the graph structure and the node features are *fixed* over time. In many domains, however, the graph changes over time.

Temporal graphs can be divided into two categories:
- **Static graphs with temporal signals:** The underlying graph structure does not change over time, but node features and labels evolve over time.
<center><img src="images/static_structure_dynamic_features.png" width="500"></center>

A typical example is traffic forecasting, where the graph is built from traffic sensor data (e.g., the PeMS dataset): each sensor is a node, and edges represent road connections. The geographical distribution of sensors in PeMS is shown below:
        
<center><img src="images/pems.ppm" width="400"></center>

- **Dynamic graphs with temporal signals:** The topology of the graph (the presence of nodes and edges), features, and labels evolve over time.
<center><img src="images/dynamic_structure_dynamic_features.png" width="500"></center>
A typical example is a social network: new edges appear when people become friends, existing edges disappear when friendships end, and node features change when people update their attributes (e.g., when someone changes career, assuming career is one of the node features).
<center><img src="images/Dynamic_Graphs.png" width="500"></center>
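The two categories differ mainly in which arrays carry a time axis. The minimal NumPy sketch below illustrates the first category (static structure, temporal signal) with hypothetical toy data, not real PeMS readings: a single adjacency matrix is shared by all time steps, while the signal is a `(T, N, F)` array. For forecasting, such a signal is commonly cut into windows of past steps paired with the next step as the target; `sliding_windows` and its `lags` parameter are illustrative names, not a library API.

```python
import numpy as np

def sliding_windows(signal, lags):
    """Split a (T, N, F) signal into inputs of shape (lags, N, F) and next-step targets (N, F)."""
    num_windows = signal.shape[0] - lags
    inputs = np.stack([signal[t:t + lags] for t in range(num_windows)])
    targets = np.stack([signal[t + lags] for t in range(num_windows)])
    return inputs, targets

# Hypothetical toy data: 4 sensors on a path-shaped road network, 12 time steps, 1 reading each.
road_adjacency = np.array([[0, 1, 0, 0],
                           [1, 0, 1, 0],
                           [0, 1, 0, 1],
                           [0, 0, 1, 0]])   # (N, N): fixed, shared by every time step
rng = np.random.default_rng(0)
readings = rng.normal(size=(12, 4, 1))      # (T, N, F): only the signal has a time axis
# With a dynamic structure, the adjacency would also need a time axis: shape (T, N, N).

X, y = sliding_windows(readings, lags=3)
print(X.shape, y.shape)  # (9, 3, 4, 1) (9, 4, 1)
```

In the static case, `road_adjacency` would be passed unchanged to the model at every step; only the windows of `readings` vary.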

> Note:\
> Dynamic graphs can also be divided into *discrete-time* and *continuous-time* categories.

- A discrete-time dynamic graph (DTDG) is a sequence $[G^{(1)}, G^{(2)},...,G^{(\tau)}]$ of graph snapshots, where each $G^{(t)} = \left(V^{(t)},A^{(t)},X^{(t)}\right)$ has vertices $V^{(t)}$, adjacency matrix $A^{(t)}$, and feature matrix $X^{(t)}$. DTDGs mainly appear in applications where data is captured at regularly spaced intervals.

<center><img src="images/DTDG.png" width="700"></center>
<center><small>Image from https://graph-neural-networks.github.io/static/file/chapter15.pdf</small></center> 
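As a data structure, a DTDG is simply an ordered list of snapshots. The minimal sketch below stores each $G^{(t)}$ as its node list $V^{(t)}$, adjacency matrix $A^{(t)}$, and feature matrix $X^{(t)}$. The `Snapshot` class and the `toy_dtdg` generator are hypothetical, dense matrices are used only for readability, and one node is added per step to show that $V^{(t)}$ may change between snapshots.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Snapshot:
    """One snapshot G^(t) = (V^(t), A^(t), X^(t)) of a discrete-time dynamic graph."""
    nodes: list            # V^(t): ids of the nodes present at step t
    adjacency: np.ndarray  # A^(t): |V^(t)| x |V^(t)|
    features: np.ndarray   # X^(t): |V^(t)| x d

    def __post_init__(self):
        n = len(self.nodes)
        if self.adjacency.shape != (n, n) or self.features.shape[0] != n:
            raise ValueError("A^(t) must be |V| x |V| and X^(t) must have |V| rows")

def toy_dtdg(num_steps=4, start_nodes=3, feat_dim=2, seed=0):
    """Hypothetical toy DTDG in which one new node joins at every step."""
    rng = np.random.default_rng(seed)
    dtdg = []
    for t in range(num_steps):
        n = start_nodes + t
        # Random undirected edges without self-loops: keep the strict upper triangle, then mirror it.
        upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
        dtdg.append(Snapshot(nodes=list(range(n)),
                             adjacency=upper + upper.T,
                             features=rng.normal(size=(n, feat_dim))))
    return dtdg

dtdg = toy_dtdg()
print([len(g.nodes) for g in dtdg])  # [3, 4, 5, 6]
```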

- A continuous-time dynamic graph (CTDG) is a pair $\left(G^{(t_0)},O\right)$, where $G^{(t_0)}=\left(V^{(t_0)},A^{(t_0)},X^{(t_0)}\right)$ is the initial graph at time $t_0$ and $O$ is a sequence of temporal observations (events). Each observation is a tuple of the form *(event, event type, timestamp)*, where *event type* can be a node or edge addition, a node or edge deletion, a node feature update, etc.; *event* is the actual change that happened; and *timestamp* is the time at which the event occurred:

<center><img src="images/CTDG.png" width="400"></center>
<center><small>Image from https://arxiv.org/pdf/2404.18211v1</small></center>
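Replaying the observations in $O$ on top of $G^{(t_0)}$ recovers the graph at any later time, and recording that state at fixed boundary times turns a CTDG into a DTDG, at the cost of discarding the order of events inside each interval. The pure-Python sketch below illustrates this under assumed conventions: the event-type strings (`add_node`, `add_edge`, `remove_edge`, `update_features`), undirected edges stored as `frozenset`s, and the helper names `apply` and `to_snapshots` are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    event: tuple      # what happened, e.g. ("a", "b") for an edge
    event_type: str   # hypothetical labels: "add_node", "add_edge", "remove_edge", "update_features"
    timestamp: float  # when it happened

@dataclass
class GraphState:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)       # undirected edges as frozensets {u, v}
    features: dict = field(default_factory=dict)  # node -> feature tuple

    def copy(self):
        return GraphState(set(self.nodes), set(self.edges), dict(self.features))

def apply(state, obs):
    """Mutate `state` in place according to one observation."""
    if obs.event_type == "add_node":
        (node,) = obs.event
        state.nodes.add(node)
    elif obs.event_type == "add_edge":
        u, v = obs.event
        state.nodes.update((u, v))
        state.edges.add(frozenset((u, v)))
    elif obs.event_type == "remove_edge":
        state.edges.discard(frozenset(obs.event))
    elif obs.event_type == "update_features":
        node, feats = obs.event
        state.features[node] = feats
    else:
        raise ValueError(f"unknown event type: {obs.event_type}")

def to_snapshots(initial, observations, boundaries):
    """Replay events in timestamp order; record the graph at each (ascending) boundary time."""
    state = initial.copy()  # never mutate the initial graph G^(t0)
    events = sorted(observations, key=lambda o: o.timestamp)
    snapshots, i = [], 0
    for boundary in boundaries:
        while i < len(events) and events[i].timestamp <= boundary:
            apply(state, events[i])
            i += 1
        snapshots.append(state.copy())
    return snapshots

g0 = GraphState(nodes={"a", "b"}, edges={frozenset(("a", "b"))})
O = [
    Observation(("b", "c"), "add_edge", 1.5),
    Observation(("a", "b"), "remove_edge", 2.2),
    Observation(("c", (0.3, 0.7)), "update_features", 2.9),
]
snaps = to_snapshots(g0, O, boundaries=[1.0, 2.0, 3.0])
print([sorted(tuple(sorted(e)) for e in s.edges) for s in snaps])
# [[('a', 'b')], [('a', 'b'), ('b', 'c')], [('b', 'c')]]
```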

In this tutorial, we focus on DTDGs.

## Combining GNNs with Sequence Models

DTDGs are made up of several snapshots arranged in order over time, which can be treated as sequential data. Temporal patterns in DTDGs are identified by looking at the relationships between these snapshots. Recurrent Neural Networks (RNNs) are often combined with GNNs to create dynamic models for DTDGs. These combinations are generally grouped into two types: stacked architectures and integrated architectures.

- **Stacked dynamic GNNs:** The most straightforward way to model a discrete dynamic graph is to process each snapshot with a separate GNN and feed the output of each GNN to a time-series component, such as an RNN. This is illustrated in the following figure:

<center><img src="images/stacked_DTDG.png" width="400"></center>
<center><small>Image from https://arxiv.org/pdf/2404.18211v1</small></center>
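A minimal NumPy sketch of a stacked dynamic GNN is given below. It assumes a fixed node set (row $i$ refers to the same node in every snapshot), a single GCN layer with the symmetric normalization of Kipf & Welling, and a plain Elman RNN in place of an LSTM or GRU. The same GCN weights are applied to every snapshot and the same RNN weights to every node; the weights are random because the point is the data flow, not training. All function names are illustrative.

```python
import numpy as np

def gcn_layer(A, X, W):
    """GCN layer ReLU(D^-1/2 (A + I) D^-1/2 X W), using Kipf & Welling's normalization."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)

def stacked_dynamic_gnn(adjs, feats, W_gnn, W_in, W_rec, b):
    """Same GCN on every snapshot, then a simple (Elman) RNN over time for each node."""
    h = np.zeros((adjs[0].shape[0], W_rec.shape[0]))  # one hidden state per node
    for A_t, X_t in zip(adjs, feats):
        z_t = gcn_layer(A_t, X_t, W_gnn)          # spatial step: N x d_gnn embeddings
        h = np.tanh(z_t @ W_in + h @ W_rec + b)   # temporal step: row i uses only rows i of z_t and h
    return h                                      # N x d_h summary of the whole sequence

# Hypothetical toy sizes: 4 snapshots of a 5-node graph.
rng = np.random.default_rng(0)
T, N, d_in, d_gnn, d_h = 4, 5, 3, 8, 6
adjs = []
for _ in range(T):
    upper = np.triu(rng.integers(0, 2, size=(N, N)), k=1)
    adjs.append(upper + upper.T)
feats = [rng.normal(size=(N, d_in)) for _ in range(T)]
W_gnn = rng.normal(size=(d_in, d_gnn))
W_in = rng.normal(size=(d_gnn, d_h))
W_rec = rng.normal(size=(d_h, d_h))
b = np.zeros(d_h)

h = stacked_dynamic_gnn(adjs, feats, W_gnn, W_in, W_rec, b)
print(h.shape)  # (5, 6)
```

The GCN moves information between neighbors within a snapshot, while the RNN moves it across snapshots. Because the RNN updates each row of `h` independently, relabeling the nodes simply permutes the rows of the output.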

One of the most well-known approaches in this category is the Waterfall Dynamic-GCN. In this architecture, a GCN is stacked with a per-node LSTM. More specifically, GCNs with shared parameters first process each snapshot of the graph separately, and their outputs are then fed, in order, to an LSTM. A separate LSTM is used for each node, although the LSTM weights are shared across nodes. The architecture is illustrated in the following figure:
<center><img src="images/waterfall.png" width="700"></center>
<center><small>Image from https://arxiv.org/pdf/2404.18211v1</small></center>

The figure shows a network operating on sequences of four snapshots of a graph with five vertices. The first GCN layer acts as four copies of a regular GCN layer, each working on one snapshot of the sequence. Its output is processed by the LSTM layer, which acts as five copies of an LSTM, each working on one node of the graph.
The final fully connected (FC) layer produces a $C$-class probability vector for each node at each snapshot of the sequence, so it can be seen as $5 \times 4$ copies of an FC layer.
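The Waterfall Dynamic-GCN data flow can be sketched in NumPy for the setting in the figure (4 snapshots, 5 nodes). This is an untrained illustration, not the original implementation: it assumes a single GCN layer with Kipf & Welling normalization, a standard LSTM cell (the gate order inside `gates` is an arbitrary choice), and a softmax FC layer, with hypothetical parameter names in the `params` dictionary. One set of GCN weights serves all snapshots, one set of LSTM weights serves all nodes, and one FC layer serves all node-snapshot pairs, so the output has shape `(T, N, C)`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def gcn(A, X, W):
    """ReLU(D^-1/2 (A + I) D^-1/2 X W); the same W is reused for every snapshot."""
    A_hat = A + np.eye(A.shape[0])
    d = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return np.maximum((A_hat * d[:, None] * d[None, :]) @ X @ W, 0.0)

def waterfall_dgcn(adjs, feats, p):
    """Returns class probabilities of shape (T, N, C): one vector per node per snapshot."""
    N, H = adjs[0].shape[0], p["U"].shape[0]
    h, c = np.zeros((N, H)), np.zeros((N, H))          # LSTM state, one row per node
    outputs = []
    for A_t, X_t in zip(adjs, feats):
        z = gcn(A_t, X_t, p["W_gcn"])                    # GCN on snapshot t: N x d_gcn
        gates = z @ p["W"] + h @ p["U"] + p["b"]         # shared LSTM weights, all nodes at once
        i, f, o, g = np.split(gates, 4, axis=1)          # input, forget, output gates; candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(softmax(h @ p["W_fc"] + p["b_fc"]))  # shared FC layer: N x C
    return np.stack(outputs)                              # T x N x C

# Toy setting matching the figure: T = 4 snapshots of an N = 5 node graph (random, untrained).
rng = np.random.default_rng(0)
T, N, d_in, d_gcn, H, C = 4, 5, 3, 8, 6, 2
params = {
    "W_gcn": rng.normal(scale=0.5, size=(d_in, d_gcn)),
    "W": rng.normal(scale=0.5, size=(d_gcn, 4 * H)),
    "U": rng.normal(scale=0.5, size=(H, 4 * H)),
    "b": np.zeros(4 * H),
    "W_fc": rng.normal(scale=0.5, size=(H, C)),
    "b_fc": np.zeros(C),
}
adjs = []
for _ in range(T):
    upper = np.triu(rng.integers(0, 2, size=(N, N)), k=1)
    adjs.append(upper + upper.T)
feats = [rng.normal(size=(N, d_in)) for _ in range(T)]

probs = waterfall_dgcn(adjs, feats, params)
print(probs.shape)  # (4, 5, 2)
```

Since the parameter shapes depend only on the feature sizes, the same `params` work for any number of snapshots or nodes. Because the LSTM runs forward in time, the prediction at snapshot $t$ depends only on snapshots $1, \dots, t$.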

- **Integrated dynamic GNNs**:  
