# Introduction to Graph data with PyG
by @moaziat 

### Graphs?

Graphs are a kind of data structure that models a set of objects (nodes) and their relationships (edges). Graphs are a non-Euclidean data structure, and graph analysis in ML focuses on tasks such as node classification, link prediction, and clustering. Graph Neural Networks (GNNs) are deep-learning-based methods that operate on the graph domain.
We denote a graph as $G = (V, E)$, where $|V| = N$ is the number of nodes in the graph and $|E| = N^e$ is the number of edges.
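
To make the notation concrete, here is a minimal sketch in plain Python. The toy graph and the names `nodes`, `edges`, and `adjacency` are made up for illustration and are not part of any library.

```python
# Hypothetical toy graph: four nodes connected in a ring.
nodes = [0, 1, 2, 3]                      # V
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # E, one tuple per undirected edge

N = len(nodes)    # |V| = N, the number of nodes
N_e = len(edges)  # |E| = N^e, the number of edges

# Adjacency list: for every node, the nodes it shares an edge with.
adjacency = {v: [] for v in nodes}
for u, v in edges:
    adjacency[u].append(v)
    adjacency[v].append(u)

print(f"N = {N}, N^e = {N_e}")
print(adjacency)
```

An edge list like this is close in spirit to how graph libraries usually store connectivity: a list of (source, target) pairs rather than a dense matrix.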

### Graph types?

- **Directed/Undirected graphs:** Directed graphs are the ones where edges are directed from one node to another, while each edge of an undirected graph can be seen as two directed edges, one in each direction.
- **Homogeneous/Heterogeneous graphs:** If all nodes and edges are of the same type, the graph is homogeneous. Otherwise, the graph is heterogeneous.
- **Static/Dynamic graphs:** Graphs are dynamic when input features or the topology of the graph vary with time. 
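
The directed/undirected distinction can be sketched in a few lines of plain Python. The helpers `to_undirected` and `is_undirected` below are hypothetical, written only for this example: they treat an undirected edge as a pair of directed edges, one in each direction.

```python
def to_undirected(directed_edges):
    """Add the reverse of every edge so each link is stored in both directions."""
    reversed_edges = {(v, u) for u, v in directed_edges}
    return sorted(set(directed_edges) | reversed_edges)


def is_undirected(edges):
    """A graph is undirected if every edge (u, v) has a matching edge (v, u)."""
    edge_set = set(edges)
    return all((v, u) in edge_set for u, v in edge_set)


# Hypothetical directed triangle: 0 -> 1 -> 2 -> 0
directed = [(0, 1), (1, 2), (2, 0)]
undirected = to_undirected(directed)

print(is_undirected(directed))    # the reverse edges are missing
print(is_undirected(undirected))  # every edge now has its reverse
print(undirected)
```

Storing both directions doubles the edge count, which is worth remembering when reading statistics such as the number of edges or the average node degree.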

### Code?

In [4]:
import torch
import networkx as nx 
import matplotlib.pyplot as plt

In [5]:
def visualize_graph(G, color):
    """Visualize a NetworkX graph, colouring the nodes according to `color`."""
    plt.figure(figsize=(7,7))
    plt.xticks([])
    plt.yticks([])
    nx.draw_networkx(G, pos=nx.spring_layout(G, seed=42), with_labels=False,
                     node_color=color, cmap="Set2")
    plt.show()

PyTorch Geometric provides a collection of graph datasets, listed here:
https://pytorch-geometric.readthedocs.io/en/2.6.0/modules/datasets.html 

We are going to use the GeometricShapes dataset (I love geometry)

In [None]:
from torch_geometric.datasets import GeometricShapes

# GeometricShapes requires one positional argument: the root directory
# where the dataset should be saved.
root_dir = "mo/home/Desktop"
dataset = GeometricShapes(root=root_dir)


### Explore the graph dataset (fancy wording huh!)

In [10]:
print('Dataset properties')
print('==============================================================')
print(f'Dataset: {dataset}') #This prints the name of the dataset
print(f'Number of graphs in the dataset: {len(dataset)}')
print(f'Number of features: {dataset.num_features}') #Number of features each node in the dataset has
print(f'Number of classes: {dataset.num_classes}') # Number of classes a sample can be classified into


# The dataset contains several graphs, so we select the first one and explore its properties

data = dataset[0]
print('Graph properties')
print('==============================================================')

# Gather some statistics about the graph.
print(f'Number of nodes: {data.num_nodes}') #Number of nodes in the graph
print(f'Number of edges: {data.num_edges}') #Number of edges in the graph
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}') # Average number of edges per node
print(f'Contains isolated nodes: {data.has_isolated_nodes()}') # Does the graph contain nodes that are not connected to any other node?

print(f'Contains self-loops: {data.has_self_loops()}') # Does the graph contain nodes that are linked to themselves?

print(f'Is undirected: {data.is_undirected()}') # Is the graph undirected?

Dataset properties
Dataset: GeometricShapes(40)
Number of graphs in the dataset: 40
Number of features: 0
Number of classes: 40
Graph properties
Number of nodes: 32
Number of edges: 0


AttributeError: 'GlobalStorage' object has no attribute 'edge_index'
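
Why the error? The samples in GeometricShapes are stored as meshes (node positions in `pos` and triangles in `face`), not as graphs with an `edge_index`, which is why the edge count is 0 and the isolated-nodes check fails. The PyG documentation points to the `FaceToEdge` transform for turning such meshes into graphs. As a rough sketch of what that kind of conversion does, here is a plain-Python version; the helper `faces_to_edges` and the toy mesh are hypothetical and only illustrate the idea, they are not PyG's implementation.

```python
def faces_to_edges(faces):
    """Return the sorted directed edges implied by a list of triangular faces.

    Each face (a, b, c) contributes the links a-b, b-c and a-c, and every
    link is stored in both directions so the resulting graph is undirected.
    """
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (a, c)):
            edges.add((u, v))
            edges.add((v, u))
    return sorted(edges)


# Hypothetical mesh: a square with corners 0..3, split into two triangles
# that share the diagonal 0-2.
faces = [(0, 1, 2), (0, 2, 3)]
num_nodes = 4

edges = faces_to_edges(faces)
print(f"Number of edges: {len(edges)}")
print(f"Average node degree: {len(edges) / num_nodes:.2f}")
```

The shared diagonal is only counted once per direction because the edges are collected in a set. Once a sample has edges like these, statistics such as the average node degree or the isolated-nodes check become meaningful.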

### Implement?

A simple GNN? Like a simple Graph? whut? 
When building a GNN model there are three modules to take into account: 
* **Propagation module**: Propagates information between nodes so that the aggregated information captures both feature and topological information. How? We use a convolution operator to aggregate information from neighbours.
* **Sampling module**: Needed to conduct propagation when the graph is large. T