<a href="https://colab.research.google.com/github/martin-fabbri/colab-notebooks/blob/master/gnn/01_introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction
In this notebook, we go through the basics of graphs and introduce PyTorch Geometric (PyG) and NetworkX.


## What Are Graphs?

<center><img src="https://ai.science/api/authorized-images/UYdgNW5993ssQoBT1A6CLFw7FacErIJuHs%2Ft9wOKPUn4apc7P48JgpOch%2BNocLQpTQKC6SVZG27biEoYaT2pNAeE5bH7vtgErt6KSAY1VWcFpoiNteAWcyph3s3ewuYyNWRPzUjrzpQUiGRiylIc3HW3RIrRbs4oKYcSJkB9h25R%2BXUUPyjwcGKz58aanUhAmIzeAL1CxNg2zfZ5n8r4uQxDLQeDNEZYonz6QqSMbCtTed%2Fvw6Q9cTViejVJXQsgLCcxGNg9E1rQ1X4PdhIb%2FZsBE1p4hodBVWVh12NU%2FtZYnoTw34Bf0zEi3fTjS8M3MKn6m6XE0r8aWHMhHdoVYSV3ZKeZ6ulySsnB8cvyv8pPUApD%2FOmuLscL7u6ALjBB5UU2h3yRGiPjB5EzCnk4Re52wjtv8SNPaubgl6JfJCLZVHeo2B%2FqYLIYo6k%2BzrUSOSKBryfREMsItBnd%2B5uN54RjokaZpHRlBFqRhM0lqVtHDnmDgXQYTmZZAEny8lxZu5EaUKkhxyN2qYj54aNKYF%2BiQmxrmhrJTG%2FNK%2F8457W3FP%2FXov4zjOgNEXFP9ukwRZCc8uxJzxNE%2FhJQBMyDjsrYqg%2BaOG8dv8KvsFQXSdJ3X1FzCgNaJZGDnBIKxFESPrloSJs9XjGojo8JzzXpbkNjhddvFGYNni%2BHo3PkfAU%3D0" width="30%" > </center>



A variety of real-world data is best represented as **graphs** (networks), which are composed of a set of **nodes** (representing objects) and a set of **edges** (representing relationships between these objects). Common examples include:

*   social networks: nodes represent users, and edges represent their friendships
*   molecular graphs: nodes represent atoms, and edges represent chemical bonds
*   transportation networks: nodes represent intersections, and edges represent routes


<center><img src="https://ai.science/api/authorized-images/cflQ9xMQtkM0scsgwytr52zm%2Fdkpe39ejVqOLmHCJU%2F1dQN5b2U62ZiJ%2Bse6L00Sx331TAJh7qybBHRacQvbGu33CH81IAgMv7SPYmEyA4ZnpPdC%2FLq1ZP2r%2Fl%2F7ESugF1SZATFb8RTdzaZIXYTIVPVYj6p%2FpJX2BUA4h5zMfl%2BfxifhSX0tqiKjmREgdIDN26gRphxYl6aKTrxnxG782qogTDs96%2BuZZj3%2B7ZXwvrat%2Fp2hjBuOxlqldw5gwiuZpnWHbHB1iOyGinbDGktqtHCumZfPvffA%2FXcrzAeWrm7HVdfvGZITvHvdIvnguYU3%2BiKhwKSzH%2BRVrhCEVHm6WZ7cOpnbMO18sEoWdq1qDCIWmC46VGihBB8kvojl4jeseo3cOumIB%2F%2BPYc4WYwULHL%2FbJHzb0wRZfdbS5ZPp5xmez5H6OpIpueHixQ5nuDJEjXDU%2F%2FgdbKmm2rcM5xmoMjCZlUOM4XWR1qax65ZzXz7ieFtjwgz%2BczqDSdhuHAGg0De5%2BYI8uMeHYHYrj9NbVMED16eRSDhlXGOIsYdyOCa%2FsmXzeYiA%2B6881U8AD%2Bt50RD3BAcZ2%2B%2Bk2RoDp7cBOAzARG0rjJVJ3erdfNdhJfNA6sZ9befead0%2BronwPg497Y9PKDkAQMUnJKBw%2FdOXan%2FTfh0pnStloqa8HLgPuWg%3D" width="33%"><img src="https://ai.science/api/authorized-images/l%2Bi6WAna%2F%2B9EEfr8%2FzIJefpAmW38nkoWLptpUtVuGlxNXqgmwyLB%2Ft%2BH7vn1oe5TbewZRITh%2BjPQqjVo9%2By9nM8efwGXjDzfUNRGRwmhCwEkQPoEq7BOAYnjioczqITreu4x3nXxFd%2FZmwxwM8Dx%2Bqy03XOBAjoZwwprOpXdeDVHXgxMaosjcZFCzY97ByQBx7fJLIwsA4ha3h24HC6%2BLRCN%2F1QbMlqlzK%2BtDuvoAhWVZYEh8aKU%2F4dOSEiPCsn97yk0uexXfzDtdwnTL58V%2F2dcq%2FqSwKHi%2BwrDLHFxG%2FhjAHVH8Pki32%2Fu%2Fh7AnZIHcoBhQD71WuCJl88ow3OUdIGL%2FAT99RQgFPOCBSlu7bOFjaufKEXpnsBRH2lTaGUld9qVjsg%2BgwHwhFJx31XJVhBAF9Lie1a614nR2I0IaRNg3GLq0YoHmDM2fXYLA4bVtpRAQWCU7KjyK5GNyxl4gosncovNtLKyMWM%2BGnsUaFtqcG58TGsUi1rkE2rsWsyzYoJje5mYpyFvDz91XSoHsvoT8DDreDmm5n3tBZR8ICruGee8JwfqG4Y7LveA2MlCEiBW63FT%2F9pKBUC8kjQ%2F1vo7NeEQ0qb6BjccZW%2FbY2uUIDmf%2FlNR0H7QjwhQxvyK01GnBZq6gXf2v%2FqIf3Oka1LypfbKagV4smpFPpvcQxs%3D" width="33%"><img 
src="https://ai.science/api/authorized-images/Ha7%2B7AGuRds1ctJWbqt9iJnhIsJ9ETX6IHmGfjmf1iNWc5wwZvpViFHr0QiNfCuIjK4AsEz9A6xYfjW4Ya1dUR0AAdp%2FLlT23DjnwHfAHQ7wEbL1KI%2BHH5mRFCN88XAsEC2KRx9lwmile11ZP%2BmD6Jcm5M2J1VPfduefWdNMTmF9wjYbgFyi97mewoHlUI5%2B7iWajhsqvrr14bwG20LxvFqzkCWf0FFkMhCPFgHDmhki7Fc2XSvAh%2FzpXKHdpCDCiztzQMxakOnasD85uSBFo6WLLzNXbE9IsYTT6b2qhwUDa2VTYXXDci4%2BUVWyY0r95PCBNEBFr6rM0t%2B6N1w3U5ILNkBmegWRzP8mkGp7Xe9qWQu%2F1PsxpdUEmPkXvmHVziQFHaNAaOZRtGnF%2FSyOEuszzpraUH5P8TRI3RVTEB3klnO0qTndIRfhViB6EXxvUkLynLpRgh5o5A6LJjGTnoD%2B%2B7pea20vGefhfBMGRsvYdOnHMZc2moBIC4EwqPhgbn2r%2F1gJtmArNXZfphcKpUOkQD3aqc6mvRDSA314QaCTi%2FOPMVTuBJq4stF8LKV%2FxsoTHUkag3OREXcdpEKoFSFc4c75evHrhfanLLoXuS7py0z%2FdLXp19nukwVI3cpXudDQ4qP2FZEObvuSv6w3TL3IKCi9z16lUPFmJEKN%2FpY%3D" width="33%"></center>

The nodes and edges in a graph may have attributes. For example, the nodes in a social network graph may have information about each user (e.g. date of birth, current location etc.), and the edges may contain information about the relation between the users (e.g. date of connection, number of messages sent between the users, etc.).
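As a minimal sketch of node and edge attributes, the snippet below builds a two-user social network with NetworkX. The user names and attribute values are made up for illustration only.

```python
import networkx as nx

# Toy social network; all names and attribute values are hypothetical.
social = nx.Graph()

# Node attributes: information about each user.
social.add_node("alice", location="Toronto", birth_year=1990)
social.add_node("bob", location="Montreal", birth_year=1985)

# Edge attributes: information about the relation between two users.
social.add_edge("alice", "bob", connected_since=2018, messages_sent=42)

print(social.nodes["alice"])         # attribute dict of one node
print(social.edges["alice", "bob"])  # attribute dict of one edge
```

Attributes are stored as plain dictionaries, so any key/value pair can be attached to a node or an edge.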








# Definitions

## Graph
We will define a graph as a tuple $G=(V,E,\mathbf{u})$.

*   $V$ is a set of node features  $V=\{\mathbf{v}_i\}_{i=1:N^v}$, where $\mathbf{v}_i$ is the feature vector of node $v_i$ and $N^v$ is the number of nodes (Note: The use of $v$ comes from the term "**vertex**" which can be used interchangeably with the term "node"). 
*   $E$ is a set of edges  $E=\{(r_k, s_k)\}_{k=1:N^e}$ where $r_k$ is the index of the target (or receiver) node, $s_k$ is the index of the source node, and $N^e$ is the number of edges. Note that another common convention is to define edges as  $E=\{(s_k, r_k)\}_{k=1:N^e}$.

 If the edges have features, then $E$ can be written as $E=\{(\mathbf{e}_k, r_k, s_k)\}_{k=1:N^e}$  where $\mathbf{e}_k$ is the feature vector of edge $k$. 

*   $\mathbf{u}$ is the feature vector of the graph.
<center><img src="https://ai.science/api/authorized-images/w2vNrEQostOQWI7NKXYlPTdN3YGpEe%2FE5DkfULAz4VtFltXMiDuQmCwRDJ2soT6q%2B1G29Wf3adihAbikxK0Lz%2B4wH8dhXIlbgcVqd%2Br95456yMNLUcAsHJSrm89iCJdGmMdafLmx8UrLt3upApSXCPJatcpEJEA4jQUB0tsNocpaA%2BACfntZ8A19BUoUPRm570%2BgHYBcpfeoEXF9O6bQR%2FcUcrU%2FqiM6wffWzvA4sMsCcBHz3Q%2BvhG4RvCXbNIczohgJBTyxYcMsDBzRMQXFvRdE%2FwpvFr1zFdiIvY7NITGv9KykjzCk2N6%2BoYKz4ltEhV1IWhAsWDLz51qL5SucwiAywCf99rwLor7lqreYp9gRnjL9EkOqamLTtD9kYQ%2BG5iYT8kV4bJbjKAsFGqyn35wYNFMViMx3TiCccdiufWS%2BI0Fls8yZDTWwTlvT7qH1jq1%2BqLPUDsRH2X0GJ%2FndYrEY1T3L3o299HtgFE5sjkPBXjUKK3v%2BpI4Qm1DxUkPFuaMeqJjoR7BQWLmzapA7gDjv2isxme%2BQ2DG87JdahsadvN1cV%2BclGZOt6%2FOw22yZiTd40q3C%2BeCyYQ8KC7Fkodr6vPKKsgl%2F%2FK5i5CXHxFhSFrCyy3TJPGlc8vG%2Fe7BVLooyPJdrzDiTuzrZXBF5DgL%2FjhU3qozjQoTbraI21W8%3D" width="60%" > </center>
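The tuple definition above can be mirrored directly in plain Python. The sketch below is only an illustration: the class name `Graph` and the feature values are assumptions, and node indices start at 0 rather than 1 as in the formulas.

```python
from typing import List, NamedTuple, Tuple


class Graph(NamedTuple):
    V: List[List[float]]      # V[i] is the feature vector of node i
    E: List[Tuple[int, int]]  # each edge is (r_k, s_k) = (receiver, sender)
    u: List[float]            # feature vector of the whole graph


G = Graph(
    V=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    E=[(1, 0), (2, 1), (0, 2), (1, 2)],
    u=[0.5],
)

num_nodes = len(G.V)  # N^v
num_edges = len(G.E)  # N^e

# In-degree: count how many edges each node receives.
in_degree = [0] * num_nodes
for receiver, sender in G.E:
    in_degree[receiver] += 1

print(num_nodes, num_edges, in_degree)
```

Because each edge stores the receiver first, quantities such as the in-degree only need the first element of every pair.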

## Directed vs Undirected
Graphs may be either directed or undirected. In a **directed graph**, an edge going from node $i$ to node $j$ is distinct from an edge going from node $j$ to node $i$. In an **undirected graph**, there is no such distinction.



<center>
<img src="https://ai.science/api/authorized-images/r3NAc%2BfB5uR03G22kuetjVJrMc%2FZQQ3AcyGKHAWo2ejC5WZxKCiK4PaWD%2F3sO3u8r2EcUh7vSpaiuvGVVhZO4aLinIZetcdosj5vQ6g9%2BRlvxia0P8pb0cLAnbIOx9Ar9l92IeBCrNAy6eISY2a2fqeBGrHRz1IUhM5gdtaszuJq2pypk%2BmXQ6j1VoGNN6J0yBVuQKh7l8Q1QBBltkI%2Ft%2FeIjgkcoj0zf%2Bmg3Sq%2FoBM9jyeiGt6olsuFR%2BxI3giahAd2SSbKFCqhCLpQtNw9OeJByHz8TJgsABh%2Bn0gnvtzZnIT9ts55Zes%2Fyt2%2FhAhTJa07OXrveYV9%2BbYSnSrNLorTHcAaa0eWJ678qs4I5S9qjo%2BMzfe%2BtK%2FUV9ju2TycHe%2B9yiZ1YRUXQNxcueE50veNJgXeky3LSeFbmZ%2BKbmoWgoNqTh0Z2eGcR1MMKtjjMgZPnlD203cQq2x5oIY0p2qlxEu8g%2B%2Fh16HXIPwNFikS2CAJmj5wRD%2BbkXZrTRRibahYRzOSXwi1gELFNRmX2G9svh1TlEtIKFLnopPXJolCjEDeYjsN3OufBMTpwr87tlbYZmneQSJ4Cf6jDZXdujyAX%2FX93BLrT1ZBzo0HXU%2FRn3nVh8r0zwQr%2BWdxfhD4gglWpfA6QyR4fnBFfEAIMt9MwszStPRKTA0S4%2BnAwFw%3D" width="30%" >          &emsp;&emsp;
<img src="https://ai.science/api/authorized-images/2ScMF5Kpr6ttHKSR9w29blKs43dcbp04%2B84KyuRByLajG4ewXNEuVCle3ZvjoXw8ZmeoUIXWHU2k4oEyiWrJqPR3aQrZTDEDk95pdAYbKcLCvSe7I3bZUKHhiK3W33%2BBZQwZJgl%2Bdu8eQuzl4RqDvXyut%2BwB%2BNZzX0X%2B1SJWKBR8XJ%2F1C%2BnwKbwCG%2FlNbTOZKo%2FCIct5OJtHjfSYakSzqMtmc7eV81KpJF1razUhZZj8TQSMbCKBQEoOy914byHrh95b7jNyLv5VFCiWZ6aX%2FrEYR1VJ%2FJgfXf3H%2Buk3jgQKjrdhq8VTglwc0wtWvxFcKhnuT95AYB40%2BsMPJKy%2F9DtaZELmRUX3%2BXhojBlfOc0dbgXXium4LsLM8lEM27IYsFtjsmWzryA7zSfqw26QVY9ESjqMs0HfoqA7DIEp45zuAgi3e%2FKNbz0Ty6lZoc0k167KYki5yNwPU7Jaexg%2F%2FTitYhcdLnmIm%2FI5PK0CnfJp2X4EB14s4v7Z2fuxa4l03%2Bx6WmT%2FAZCcyuhUqazV0I5%2FWKBPJiwnGUGVPX9Y1dK390agzRDL2R2d%2FVp8EBXRPtmfrUHYH3oQuFZm0DZChgK1nqt8MMKNK6Yn7JcRSGyujFYzDBK84oXZKip0N8eDROjrb3bBcrI2tsY2al5wtgQAd%2BvIEke%2BtmZEABflVnU%3D" width="30%" > </center>
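The difference can be checked with a small NetworkX sketch: the same edge is added to a directed and to an undirected graph, and each graph is then queried in both directions.

```python
import networkx as nx

directed = nx.DiGraph()
directed.add_edge(0, 1)    # only the edge 0 -> 1 exists

undirected = nx.Graph()
undirected.add_edge(0, 1)  # the edge has no direction

print(directed.has_edge(0, 1), directed.has_edge(1, 0))
print(undirected.has_edge(0, 1), undirected.has_edge(1, 0))
```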






## Homogeneous vs. Heterogeneous


The nodes in a graph may all be of the same type. Such graphs are called **homogeneous graphs**. A graph may also have different types of nodes. For example, in a knowledge graph, entities may be of different types, such as "person", "location", "institute", etc. Such graphs are called **heterogeneous graphs**.



<center>
<img src="https://ai.science/api/authorized-images/LIuVM0%2BPeTeNmAPxjsIVH8SbyLdpJS9H9VwVZ5nEBaWmjcjz2NoiH5yYBcynjNgSRZ8tCpGpuTJWDzmU6871Q3ApTsr1MJ3GWYzTjME5aRH3YZOxJrU6pakPsReYp9Zq5s%2FXzwtozPdjKoSLRQVlfjIMVaBQ3x8CjctfM49GjhQR74Caq%2BvSA001CHQYBhCbXJrPGYhvC1AuogNgJYk7%2FxTPsnLGcqNM3vAWZ%2BLAzi%2F%2FQuFKaFQOqyUkr5vaKIt%2FC9U%2FZtXmv4fYoJ6SEuohrdp5k9hr4SLlaJjRMkTQvStMlsRpQVRlhf9Jto5%2B81V%2BmgHrx3NJaxN3ZSLUMIj3ot6M69RhZf9%2FLj5kNlx%2FOMZyBiMGiAdc9SBELb3wYqXOQePQXO4EPAA3pId0clNDx8s4vRFWqjZELP2g5c53mq3TOePeEakCkwO63%2B0%2B1uhtx%2Bh%2Fvh0%2F86KIBtW%2FJDm8WcgB1uQdbKNlqo7JSgTOPZVsI4youIeKCP%2BQcohOqG1ZAY3iIYHISTmTddann0xIhtOQ7iIvj%2Ff52ae6Hdh%2BmTUTbBK8dRbBLs3Pn%2FlN%2FTGE6fCgGY6wAtzYnXYuL2dejPNyEY0Qc4NnIyAiqzOdaPFhG%2FN5imZpPbzHFrDwl1E2dVXHu7xUw6rNte8mqDgmD48uvbrX%2BQQbOrc451SwMKw%3D" width="30%" > &emsp;&emsp;&emsp;
<img src="https://ai.science/api/authorized-images/Igfn%2F3D4cgkS170E2y2MyaL1hLNcugW1Mvs7OCcoPmFa0I1eM93MVIma73mN0Ox%2B3%2Bc9rnwTgO3ub9JfQx7%2BGJarD1RpdmlW%2F6ZMPUPUb50Kjax99V3q5nr3HBgQcUOlx%2Be17u4lMYpHfNu7IHdqiddXiVOQCcAyscKeGjWM%2F3jaA6yLfqY2nhAXkmZFiuTRYR2Su9CQdoIxrfIr4Y6xBMrdKBj0V1QuK2ai%2BvMfPeR1%2BQETbNEF5t0OPQwDSZntBfJpYR4wxTE7rbv6Eh%2BUPSSmXXHb%2FJf68K5IcKnuf%2BwgNxfVPmNkBQN4ohfrieEALITpAoZCaF5JqQmHJS%2Fyjg%2Bwz22bEDWWu1FajDQsNCBJuKLhzWaFGBnW9ZdqErcIvTDOdK39Qnv6aCL%2FqWTttVMrmMv%2FOYpgrZ31ZVGVkg8b9EmkgxwWxCaCagvrAqWpOCGhjfUwkQJ7neLStTRvcOKiP1Om7vb8RilW9NmTOdkYCGhpEv3POQ29RQhpTSZW9wytyiqGl%2Bd7n6gf8zFxZg%2BxzObuzsd9vS%2FN4uz1rlsGC%2BQEwxaIgE3aFzkofTZ4JO1LJz9DMykAO34UMqIwJY2VJ9RtKotIX1dbYeg%2BH0ZQTGnOSc6yXd7e4O364c22eT7eMD7bMt577Od%2F1XjmRVPowpo%2Fd1DGdYUXtolJAwM%3D" width="30%" > </center>
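One simple way to think about a heterogeneous graph is to attach a type to every node (and, optionally, to every edge). The plain-Python sketch below uses a tiny hand-written knowledge graph; the entity and relation names are chosen for illustration only.

```python
from collections import Counter

# Each node carries a type.
node_type = {
    "Marie Curie": "person",
    "Warsaw": "location",
    "Paris": "location",
    "Sorbonne": "institute",
}

# Each edge is (source, edge type, target).
edges = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "worked_at", "Sorbonne"),
    ("Sorbonne", "located_in", "Paris"),
]

type_counts = Counter(node_type.values())

# A graph is homogeneous when all of its nodes share a single type.
is_homogeneous = len(type_counts) == 1

print(type_counts, is_homogeneous)
```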



# PyTorch Geometric



## Installation

PyTorch Geometric (**PyG**) is an extension library for **PyTorch** that facilitates learning on datasets represented by graphs. It provides:

* Common [convolutional operators](https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html) on graphs.
* Common benchmark [graph datasets](https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html).
* Operations on graphs.


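If PyTorch is already installed, PyG can be installed using the following commands, which pin the versions used in this notebook (PyG 1.6.1 with wheels built for PyTorch 1.6.0 and CUDA 10.1). See the GitHub [page](https://github.com/rusty1s/pytorch_geometric) for the latest installation instructions.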

In [None]:
!pip install torch-scatter==latest+cu101 torch-sparse==latest+cu101 -f https://s3.eu-central-1.amazonaws.com/pytorch-geometric.com/whl/torch-1.6.0.html
!pip install torch-geometric==1.6.1


## Creating a graph

A graph in PyG is described by an instance of the [`torch_geometric.data.Data`](https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html#torch_geometric.data.Data) class.


The edges of a graph are represented by an edge list `Data.edge_index`, which is a `torch.Tensor` of shape ($2 \times N^e$). The first row of `Data.edge_index` contains the indices of the source nodes of each edge, and the second row contains the indices of the target nodes. Note that all edges are directed in PyG. To represent an undirected graph, both directions of each edge should be included in `Data.edge_index`.
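As a minimal sketch of the last point, the snippet below starts from an edge list in which each undirected edge appears once and appends the reversed edges using plain PyTorch operations. The toy edges are an assumption made for illustration.

```python
import torch

# Three undirected edges, each listed once: 0-1, 1-2, 2-3.
edge_index = torch.tensor([[0, 1, 2],   # source nodes
                           [1, 2, 3]],  # target nodes
                          dtype=torch.long)

# flip(0) swaps the source and target rows, which yields the reversed edges.
undirected_edge_index = torch.cat([edge_index, edge_index.flip(0)], dim=1)

print(undirected_edge_index)
```

PyG also ships a helper, `torch_geometric.utils.to_undirected`, which serves the same purpose. The manual version is shown here only to make the representation explicit.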

A graph may also have a node feature tensor `Data.x` of size ($N^v \times N^{f_{nodes}}$), where $N^{f_{nodes}}$ is the number of features of each node.

In addition to the `edge_index` and `x` attributes, an instance of `torch_geometric.data.Data` may contain other attributes such as

*   the labels of each node (for supervised learning tasks),
*   boolean masks that specify which nodes belong to the training, validation, and test splits,
*   edge features (analogous to node features), typically stored in PyG as `Data.edge_attr` with size ($N^e \times N^{f_{edges}}$), where $N^{f_{edges}}$ is the number of features of each edge,
*   node types and edge types (for heterogeneous graphs).

For these attributes to be useful, their dimensions have to be consistent with the number of nodes and the number of edges implied by `edge_index`.
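The consistency requirement can be made concrete with a small check written in plain PyTorch. The helper `is_consistent`, the labels `y`, and the mask values are hypothetical and exist only for this sketch. They are not part of PyG.

```python
import torch

num_nodes, num_edges = 5, 6

x = torch.zeros(num_nodes, 2)          # node features, N^v x 2
edge_index = torch.tensor([[0, 1, 1, 2, 4, 3],
                           [1, 0, 2, 3, 3, 1]], dtype=torch.long)
edge_attr = torch.zeros(num_edges, 1)  # edge features, N^e x 1
y = torch.tensor([0, 1, 0, 1, 1])      # hypothetical node labels
train_mask = torch.tensor([True, True, True, False, False])


def is_consistent(x, edge_index, edge_attr, y, mask):
    """Check that every attribute agrees on the number of nodes and edges."""
    n = x.shape[0]
    return (
        edge_index.shape[0] == 2
        and int(edge_index.max()) < n                  # indices refer to existing nodes
        and edge_attr.shape[0] == edge_index.shape[1]  # one feature row per edge
        and y.shape[0] == n                            # one label per node
        and mask.shape[0] == n                         # one mask entry per node
    )


print(is_consistent(x, edge_index, edge_attr, y, train_mask))

# A boolean mask selects the labels of the training nodes.
train_labels = y[train_mask]
print(train_labels)
```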




Here we show an example of defining a graph in PyG. The graph is a directed graph with 5 nodes and 6 edges:

In [None]:
import torch
from torch_geometric.data import Data

# edge_index of size 2x6 corresponding to 6 edges
edge_index = torch.tensor([
                           [0, 1, 1, 2, 4, 3], # source nodes
                           [1, 0, 2, 3, 3, 1]  # target nodes
                           ],dtype=torch.long)

# feature matrix that stores the features of the nodes of size 5x2
x = torch.tensor([[-1, 0],
                  [ 0, 1],
                  [ 1,-1],
                  [-1, 0],
                  [ 0, 1]], dtype=torch.float)

# feature matrix that stores the features of the edges of size 6x1
edge_attr = torch.tensor([[170.],
                          [200.],
                          [120.],
                          [100.],
                          [230.],
                          [100.]], dtype=torch.float)

# Initialize a Data object
data = Data(x=x, edge_index=edge_index, edge_attr=edge_attr)


Let us look at some of the graph's attributes

In [None]:
print("Number of nodes:", data.num_nodes)
print("Number of edges:", data.num_edges)
print("Is this an undirected graph?", data.is_undirected())
print("Number of features per node (length of feature vector):", data.num_node_features)
print("Number of features per edge (length of feature vector):", data.num_edge_features)

Let's quickly visualize the graph using NetworkX (more on NetworkX below).

In [None]:
from torch_geometric.utils.convert import to_networkx
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np

# Convert to a NetworkX graph (a directed DiGraph by default) and draw it with node labels
G = to_networkx(data)
nx.draw_networkx(G)

## Loading datasets

Let's explore some of the datasets that PyG provides convenient access to. We will first look at the [PubMed Diabetes](https://linqs.soe.ucsc.edu/data) citation network dataset.

Each node in the PubMed graph represents a publication pertaining to diabetes, and each publication is classified into one of three categories: "Diabetes Mellitus, Experimental", "Diabetes Mellitus Type 1", "Diabetes Mellitus Type 2".

In addition, each node contains a feature vector which is a [TF/IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) weighted word vector from a dictionary of 500 unique words. 
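To give a feel for what a TF-IDF weighted word vector is, here is a small sketch in plain Python. It uses one common variant (term frequency times the logarithm of the inverse document frequency) on a made-up corpus. The exact weighting used to build the PubMed features may differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of words.
docs = [
    ["insulin", "glucose", "insulin"],
    ["glucose", "diet"],
    ["insulin", "therapy"],
]

# The dictionary of unique words (500 words in PubMed, 4 words here).
vocab = sorted({word for doc in docs for word in doc})


def tfidf_vector(doc):
    """Return one TF-IDF weight per dictionary word for a single document."""
    counts = Counter(doc)
    vector = []
    for word in vocab:
        tf = counts[word] / len(doc)               # term frequency
        df = sum(1 for d in docs if word in d)     # document frequency
        idf = math.log(len(docs) / df)             # inverse document frequency
        vector.append(tf * idf)
    return vector


features = [tfidf_vector(doc) for doc in docs]
print(vocab)
print(features[0])
```

Each document ends up with a vector whose length equals the dictionary size, which is the role played by the node feature vectors in the dataset.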

In [None]:
from torch_geometric.datasets import Planetoid

In [None]:
dataset_pubmed = Planetoid(root="./tmp", name="PubMed")

Let us look at some of the dataset's attributes

In [None]:
print("Name of dataset", dataset_pubmed.name)
print("Number of graphs", dataset_pubmed.len())
print("Number of node features per node in a graph", dataset_pubmed.num_node_features)
print("Number of edge features per edge in a graph", dataset_pubmed.num_edge_features)
print("Number of possible node classes", dataset_pubmed.num_classes)


## Graph Representation in PyG

The graphs in a dataset can be accessed by indexing the dataset, and each graph is represented by an instance of [`torch_geometric.data.Data`](https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html#torch_geometric.data.Data). The PubMed dataset contains only one graph, which can be accessed as follows:

In [None]:
graph_pubmed = dataset_pubmed[0]
print(graph_pubmed)

The fields available in a PyG graph can be accessed using the `keys` attribute

In [None]:
graph_pubmed.keys 

Let us look at some of the graph's attributes

In [None]:
print("Number of nodes: ",graph_pubmed.num_nodes)
print("Is this an undirected graph?",graph_pubmed.is_undirected())
print("Number of edges (Note: divide by 2 to get number of undirected edges): ",graph_pubmed.num_edges)
print("Unique labels of the nodes: ", graph_pubmed.y.unique())
print("Number of features per node (length of feature vector)",graph_pubmed.num_node_features,"\n")
print("Edge Index representation: \n", graph_pubmed.edge_index,"\n")
print("Example of Node feature (first 50 features only): \n", graph_pubmed.x[0][:50], "\n") 
print("Train mask (first 100 entries only): \n", graph_pubmed.train_mask[:100]) 
print("Labels of training nodes: \n", graph_pubmed.y[graph_pubmed.train_mask] ) 
print("Labels of validation nodes (first 100 entries only): \n", graph_pubmed.y[graph_pubmed.val_mask][:100] ) 
print("Labels of testing nodes (first 100 entries only): \n", graph_pubmed.y[graph_pubmed.test_mask][:100] ) 


# NetworkX

PyG comes with utilities to convert `Data` objects to [NetworkX](https://networkx.github.io/) graphs. This is useful for graph analysis, since NetworkX provides many graph analysis [functions and algorithms](https://networkx.github.io/documentation/stable/reference/index.html). The NetworkX package also provides many visualization tools.
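As a brief illustration of the kind of analysis NetworkX offers, the sketch below computes node degrees, connected components, and a shortest-path length on a small hand-made graph. The toy graph is an assumption made for illustration.

```python
import networkx as nx

# A triangle (0, 1, 2) plus a separate edge (3, 4).
toy = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4)])

degrees = dict(toy.degree())
components = sorted(sorted(c) for c in nx.connected_components(toy))
distance = nx.shortest_path_length(toy, source=0, target=2)

print(degrees)
print(components)
print(distance)
```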


## Converting graphs from PyG to NetworkX format

PyG provides the `to_networkx` function to convert a PyG `Data` object to a NetworkX graph (a directed `DiGraph` by default). Let's convert the PubMed graph to NetworkX format.

In [None]:
from torch_geometric.utils.convert import to_networkx
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np

graph_pubmed_nx = to_networkx(graph_pubmed)


## Visualizing (Sub-)Graphs


It is often easier to visualize a subgraph than the complete graph. We can use the [`ego_graph`](https://networkx.github.io/documentation/stable/reference/generators.html?highlight=ego%20graph#module-networkx.generators.ego) function in NetworkX to extract the neighborhood around a random node and draw a layout of that subgraph.

---



In [None]:
from torch_geometric.utils.convert import to_networkx
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np

# Convert the class label of the nodes from pytorch to numpy format 
node_labels = graph_pubmed.y.numpy()

# Convert the PubMed graph from PyG format to NetworkX format
graph_pubmed_nx = to_networkx(graph_pubmed)

# Plot 3 random subgraphs
for i in range(3):

  # Pick a random node id
  node_id = np.random.randint(graph_pubmed.num_nodes)

  # Create an ego graph of radius 2 around that node
  graph_pubmed_nx_ego = nx.ego_graph(graph_pubmed_nx, node_id,
                                     radius=2, undirected=True)
  
  # Get the class labels of the nodes in the ego graph
  node_labels_ego = node_labels[list(graph_pubmed_nx_ego.nodes())]

  plt.figure(1,figsize=(5,5)) 

  # Compute the position of each node for the purpose of plotting
  pos = nx.spring_layout(graph_pubmed_nx_ego)
  
  # Draw the ego graph
  nx.draw(graph_pubmed_nx_ego, pos=pos, cmap=plt.get_cmap('Set1'),
          node_color = node_labels_ego, node_size=50,linewidths=2)

  # Draw the randomly chosen node alone using a bigger size
  nx.draw_networkx_nodes(graph_pubmed_nx_ego,  cmap=plt.get_cmap('Set1'), pos=pos,
                        nodelist=[node_id], node_size=1000,  node_color = "black")
  
  plt.title("Ego network of node #{} in the Pubmed dataset. ".format(node_id))
  plt.show()
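Because the PubMed subgraphs are sampled at random, the effect of the `radius` argument is easier to verify by hand on a small deterministic graph. The sketch below uses a path graph, chosen only for illustration.

```python
import networkx as nx

# A path graph with 7 nodes: 0 - 1 - 2 - 3 - 4 - 5 - 6
path = nx.path_graph(7)

# Keep every node within 2 hops of node 3, together with the edges among them.
ego = nx.ego_graph(path, 3, radius=2)

print(sorted(ego.nodes()))
print(ego.number_of_edges())
```

Nodes 0 and 6 are three hops away from node 3, so they are left out of the ego graph.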


# Optional Exercise

As an exercise, have a look at the various [dataset loaders](https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html) provided in PyG, then load and explore the ones you find interesting.

# Additional Resources

Basic graph data structures:

[Graph Data Structure Intro from freeCodeCamp ](https://www.youtube.com/watch?v=DBRW8nwZV-g)


Overview of Graph Neural Networks: 

[Recent Advances in Graph Neural Network](https://www.youtube.com/watch?v=YhKUgh0XY50)

