# Stanford Graph Learning Workshop 2022

### Introduction

Given a graph, when we want to make a prediction at a given node, the network needs to account for the graph structure, such as its connectivity. GNNs do this through message passing and aggregation.

Views:
- GNNs learn to combine features from neighbouring nodes
- GNNs learn the graph patterns and relations
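
A minimal plain-Python sketch of one round of message passing with mean aggregation, for illustration only (PyG implements this with tensor scatter/gather operations). It uses the same COO `edge_index` as the PyG snippet at the end of these notes, where row 0 holds source nodes and row 1 holds target nodes, but with toy features. The function name `mean_message_passing` is hypothetical.

```python
from typing import List


def mean_message_passing(x: List[List[float]], edge_index: List[List[int]]) -> List[List[float]]:
    """One round of message passing: each node averages the features of its in-neighbours."""
    num_nodes, dim = len(x), len(x[0])
    sums = [[0.0] * dim for _ in range(num_nodes)]
    counts = [0] * num_nodes
    sources, targets = edge_index
    for src, dst in zip(sources, targets):
        # Gather: the message along edge (src -> dst) is the source node's feature vector.
        message = x[src]
        # Scatter: accumulate the message into the target node's slot.
        for k in range(dim):
            sums[dst][k] += message[k]
        counts[dst] += 1
    # Aggregate: divide by in-degree (nodes without neighbours get a zero vector).
    return [[s / counts[i] if counts[i] else 0.0 for s in sums[i]] for i in range(num_nodes)]


edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
x = [[1.0], [2.0], [4.0]]
print(mean_message_passing(x, edge_index))  # [[2.0], [2.5], [2.0]]
```

Node 1 receives messages from nodes 0 and 2, so its new feature is (1.0 + 4.0) / 2 = 2.5; nodes 0 and 2 each hear only from node 1.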

Benefits:
- Adapt to the shape of the data

CNNs and transformers can be seen as special cases of GNNs!

Use-cases:
- Drug discovery, recommender systems

This is one of the hottest ML topics right now (a top-4 keyword at ICLR 2022).

Example:
- Financial networks: describe financial entities and their relations
- Tasks: fraud detection, anomaly detection, credit scoring, etc.

Predictions can be made at the node, edge, and graph levels. Their algorithms helped a small European country identify transaction patterns over time.

#### ROLAND: Tool for dynamic graphs
- Provides scalable and adaptive training for high-accuracy models
- Built on top of PyG's GraphGym
- PyG: open-source GNN library that provides state-of-the-art graph representation learning
    - Most widely used framework for GNNs
- pyg-lib: a low-level GNN engine to accelerate PyG
    - Dedicated sparsity-aware CUDA kernels (for higher performance)

PyG is used all the way from research to large companies in industry. Stanford also offers many Graph ML tutorials and courses, and the team can be reached through their Slack channel.
- Course: http://web.stanford.edu/class/cs224w/
- Tutorials: https://medium.com/stanford-cs224w

#### Scaling-up Graph Learning
- Introduced the concepts of a _Graph Store_ and a _Feature Store_ to make graph storage more efficient
- Partnered with NVIDIA (GPUs) and Intel (CPUs) for further speed increases

They are also releasing a few Graph Data Benchmarks: datasets to compare different GNN structures and to understand performance.

#### Knowledge Graphs
A way to capture human knowledge in a machine-understandable form, ranging from common-sense to industry knowledge. They have worked on predicting a node that is linked to two other given nodes. In collaboration with Google, they have scaled up computation to handle large-scale knowledge graphs.
- Framework: SMORE --> scalable framework for multi-hop knowledge tasks
- Knowledge graphs can provide additional supervision to language models (e.g. reasoning, unstructured information)
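
To make "multi-hop" concrete: a path query starts at an anchor entity and follows a chain of relations. The sketch below answers one by exact symbolic traversal over a toy list of triples; the data and the helper names `follow` and `path_query` are hypothetical. SMORE targets embedding-based multi-hop reasoning instead, which can also predict answers that are only reachable through missing edges, so this only illustrates what such a query means.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def follow(triples: List[Triple], sources: Set[str], relation: str) -> Set[str]:
    """One hop: every tail reachable from any source entity via `relation`."""
    return {tail for head, rel, tail in triples if head in sources and rel == relation}


def path_query(triples: List[Triple], anchor: str, relations: List[str]) -> Set[str]:
    """Multi-hop path query: start at `anchor` and follow `relations` in order."""
    current = {anchor}
    for relation in relations:
        current = follow(triples, current, relation)
    return current


# Toy knowledge graph (illustrative data only).
kg = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "treats", "fever"),
    ("headache", "symptom_of", "migraine"),
    ("fever", "symptom_of", "flu"),
]
# "Which conditions have a symptom that aspirin treats?"
print(sorted(path_query(kg, "aspirin", ["treats", "symptom_of"])))  # ['flu', 'migraine']
```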


## What's new in PyG
__Author__: Matthias Fey

Neural message passing schemes:
- Data-dependent computation graphs
- Generalization of any neural network to GNNs
- GNNs are very challenging and general

Graphs are typically sparse and irregular, which makes them tricky to implement. They can also describe numerical, categorical, and many other types of data simultaneously! Graphs may change over time. We may want to learn on one large graph, or on many small graphs. We also want the library to be applicable to various tasks.
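
Because graphs are sparse and irregular, they are usually stored as edge lists rather than dense adjacency matrices. A minimal sketch, assuming a COO `edge_index` in the PyG style, of converting it into CSR form (a row-pointer array plus a column array), which makes "all neighbours of node i" a contiguous slice. The helper name `coo_to_csr` is hypothetical.

```python
from typing import List, Tuple


def coo_to_csr(edge_index: List[List[int]], num_nodes: int) -> Tuple[List[int], List[int]]:
    """Convert COO (sources, targets) into CSR (rowptr, col), grouped by source node."""
    sources, targets = edge_index
    # Count the out-degree of each node.
    degree = [0] * num_nodes
    for src in sources:
        degree[src] += 1
    # rowptr[i] .. rowptr[i + 1] delimits node i's neighbours inside `col`.
    rowptr = [0] * (num_nodes + 1)
    for i in range(num_nodes):
        rowptr[i + 1] = rowptr[i] + degree[i]
    # Fill each node's slice; `cursor` (a copy via slicing) tracks the next free slot per node.
    col = [0] * len(sources)
    cursor = rowptr[:num_nodes]
    for src, dst in zip(sources, targets):
        col[cursor[src]] = dst
        cursor[src] += 1
    return rowptr, col


rowptr, col = coo_to_csr([[0, 1, 1, 2], [1, 0, 2, 1]], num_nodes=3)
print(rowptr, col)                # [0, 1, 3, 4] [1, 0, 2, 1]
print(col[rowptr[1]:rowptr[2]])   # neighbours of node 1: [0, 2]
```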

PyG was created on top of PyTorch to unify all these different requirements. It is built on 4 main principles:
- Graph-based neural network building blocks
- In-memory graph storage, datasets & loaders (supports graphs with various data types)
- Provides graph transformations & augmentations (e.g. graph diffusion, missing feature value imputation, etc)
- Examples & tutorials (including videos and blog posts) to learn more about GNNs

To use:
- First, define a dataset
- Then, as in PyTorch, create a `DataLoader`
- Create a normal PyTorch module, but now using PyG's GNN layers (e.g. `SAGEConv`)
- Train
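
The steps above use PyG's `SAGEConv` layer. As a plain-Python sketch of the update it computes with its default mean aggregation, x_i' = W_1 x_i + W_2 · mean of x_j over the neighbours j of i (bias omitted here). The helper names and the weights are hypothetical, illustrative values; in PyG the weights are learned during training.

```python
from typing import Dict, List

Matrix = List[List[float]]


def matvec(w: Matrix, v: List[float]) -> List[float]:
    """Multiply matrix w (out_dim x in_dim) by vector v (in_dim)."""
    return [sum(w_rk * v_k for w_rk, v_k in zip(row, v)) for row in w]


def sage_layer(x: List[List[float]], neighbours: Dict[int, List[int]],
               w_self: Matrix, w_neigh: Matrix) -> List[List[float]]:
    """GraphSAGE-style update: W_self x_i + W_neigh * mean(x_j for j in N(i)); bias omitted."""
    out = []
    for i, x_i in enumerate(x):
        nbrs = neighbours.get(i, [])
        if nbrs:
            mean = [sum(x[j][k] for j in nbrs) / len(nbrs) for k in range(len(x_i))]
        else:
            mean = [0.0] * len(x_i)  # isolated node: no neighbour contribution
        self_part, neigh_part = matvec(w_self, x_i), matvec(w_neigh, mean)
        out.append([a + b for a, b in zip(self_part, neigh_part)])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbours = {0: [1], 1: [0, 2], 2: [1]}
w_self = [[1.0, 0.0], [0.0, 1.0]]    # identity, for illustration
w_neigh = [[0.5, 0.0], [0.0, 0.5]]   # halves the neighbour mean, for illustration
print(sage_layer(x, neighbours, w_self, w_neigh))  # [[1.0, 0.5], [0.5, 1.25], [1.0, 1.5]]
```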

#### PyG Progress/Evolution
- Open-sourced in 2017
- Released a paper based on it in 2019
- Began collaborating with OGB
- Last year, introduced the Stanford partnership
- The next release will come in November and aims to improve acceleration and scalability

#### Announcements
- Major architecture change: a new GNN engine
- New optimizations: principled aggregations and improved scalability
- pyg-lib: a unified GNN engine for optimized low-level graph routines

#### Accelerating Heterogeneous GNNs
- HeteroData: in-memory storage
- Heterogeneous graph samplers
- Heterogeneous GNN layers
- to_hetero() is powerful, but lacks parallelism across node/edge types
- pyg-lib supports parallel type-dependent transformations via NVIDIA CUTLASS integration
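
To illustrate what a "type-dependent transformation" is: in a heterogeneous graph each node type has its own weight matrix. The plain-Python sketch below groups nodes by type and applies each type's matrix to its group; an engine that parallelises across types would process these groups concurrently (conceptually a grouped matrix multiply) instead of one after another. It does not use pyg-lib or CUTLASS, and the node types, weights, and the name `typed_linear` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Matrix = List[List[float]]


def typed_linear(x: List[List[float]], node_type: List[str],
                 weights: Dict[str, Matrix]) -> List[List[float]]:
    """Apply a per-type weight matrix to each node's feature vector."""
    # Group node indices by type, so each group could be handled as one batched matmul.
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, t in enumerate(node_type):
        groups[t].append(i)
    out: List[List[float]] = [[] for _ in x]
    for t, idx in groups.items():  # sequential here; a parallel engine runs groups concurrently
        w = weights[t]
        for i in idx:
            out[i] = [sum(w_rk * v for w_rk, v in zip(row, x[i])) for row in w]
    return out


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
node_type = ["user", "item", "user"]
weights = {"user": [[1.0, 0.0]], "item": [[0.0, 1.0]]}  # project to 1-D, differently per type
print(typed_linear(x, node_type, weights))  # [[1.0], [4.0], [5.0]]
```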

#### Principled Aggregations
- Choosing the neighbourhood aggregation is a central topic in Graph ML
    - mean-distribution: global features
- An adaptable softmax aggregator (from the DeeperGCN paper) can act as both mean and max aggregation
- Aggregations are a first-class principle in PyG
   
- PyG now also simplifies the implementation of scalable link prediction tasks
- PyG aims to support any backend by providing FeatureStore and GraphStore abstractions
- "in-memory": storage is in the main memory of the computer
- "in-memory": database
- "single-node in-memory"-->
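
Returning to the softmax aggregator from DeeperGCN: it weights each neighbour's value by a softmax over the temperature-scaled values, so a temperature of 0 recovers the mean and a large temperature approaches the max. A per-channel plain-Python sketch; the function name `softmax_aggregate` and the parameter name `t` are illustrative, not PyG's API.

```python
import math
from typing import List


def softmax_aggregate(values: List[float], t: float) -> float:
    """Softmax aggregation of one feature channel: sum_i softmax(t * v)_i * v_i."""
    scaled = [t * v for v in values]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


neighbour_values = [1.0, 2.0, 6.0]
print(softmax_aggregate(neighbour_values, t=0.0))    # 3.0, the mean
print(softmax_aggregate(neighbour_values, t=100.0))  # ~6.0, approaching the max
```

Intermediate temperatures interpolate between the two, which is why a learnable temperature lets the model choose its aggregation.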

- __Captum__: Enables prediction explainability for any GNN model!

## Building PyG Open Source Community
__Author__: Ivaylo Bahtchevanov (product manager of PyG)

- PyG has recently been gaining lots of popularity across research and industry!
- Common use-cases:
    - Financial transactions (model interactions between entities)
    - Fraud and risk detection using anomaly/outlier detection
    - Validate smart contracts on existing blockchains
    - Security
        - Identify compromised systems
    - Recommenders
    - Know-your-customer (KYC)
    - Drug discovery

- Various libraries have been developed based on PyG, e.g.:
    - PyTorch Geometric Temporal (signals that vary across space and time)
    - Quiver
    - PyTorch Geometric Signed Directed
    - PyGOD: outlier detection
    - Graphein: protein and RNA sequences
    - GraphFramEx: systematic evaluation of GNN explainability methods

## Kumo.ai – Scaling-up PyG
__Author__: Manan Shah & Dong Wang

- Will introduce GraphStores
- Then will talk about how Kumo has leveraged this

#### Graph Learning:
- Graphs (edges and nodes) contain features (tensors)
- Message passing performs scatter/gather on features between node space and edge space
- Scaling to data larger than GPU VRAM requires training on sampled subgraphs instead of the entire graph
- Adds stochasticity but reduces GPU memory requirements to those of the sampled subgraphs
- Data parallelism requires replicating the graph and features in each compute node
- Scalability: only processing sampled subgraphs
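
Sampling keeps memory bounded: with per-hop fanouts f1, f2, ..., the subgraph around one seed node has at most 1 + f1 + f1·f2 + ... nodes, regardless of graph size. A plain-Python sketch of GraphSAGE-style neighbour sampling; the function name and the toy graph are hypothetical, and PyG's `NeighborLoader` provides an optimized version of this idea.

```python
import random
from typing import Dict, List, Set


def sample_subgraph(neighbours: Dict[int, List[int]], seeds: List[int],
                    fanouts: List[int], rng: random.Random) -> Set[int]:
    """At hop k, keep at most fanouts[k] randomly chosen neighbours per frontier node."""
    nodes = set(seeds)
    frontier = list(seeds)
    for fanout in fanouts:
        next_frontier = []
        for v in frontier:
            nbrs = neighbours.get(v, [])
            for u in rng.sample(nbrs, min(fanout, len(nbrs))):  # sampling without replacement
                if u not in nodes:
                    nodes.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return nodes


# Toy graph: 1000 nodes, each linked to the next 50 (mod 1000).
# The full 2-hop neighbourhood of node 0 would be nodes 0..100 (101 nodes).
neighbours = {i: [(i + d) % 1000 for d in range(1, 51)] for i in range(1000)}
sub = sample_subgraph(neighbours, seeds=[0], fanouts=[5, 5], rng=random.Random(0))
print(len(sub))  # at most 1 + 5 + 5 * 5 = 31
```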

- Independent scale-out: no longer constrained to single-node, in-memory datasets
- Feature store: contains features of different nodes
- Graph store: contains the graph structure and supports efficient sampling
- Sampler: operates on a graph store to sample a subgraph from root nodes and related parameters
- Data loader: lives on the compute node and fetches samples from the graph store through the sampler
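
A toy, single-process sketch of how these pieces fit together. The class and function names below are hypothetical and are not PyG's actual `FeatureStore`/`GraphStore` API; the point is the separation of concerns: the sampler touches only graph structure, and features are fetched afterwards for just the sampled nodes.

```python
from typing import Dict, List, Tuple


class ToyFeatureStore:
    """Hypothetical in-memory feature store: node id -> feature vector."""

    def __init__(self, features: Dict[int, List[float]]):
        self._features = features

    def get(self, node_ids: List[int]) -> List[List[float]]:
        return [self._features[n] for n in node_ids]


class ToyGraphStore:
    """Hypothetical in-memory graph store: node id -> neighbour ids."""

    def __init__(self, adjacency: Dict[int, List[int]]):
        self._adj = adjacency

    def neighbours(self, node: int) -> List[int]:
        return self._adj.get(node, [])


def one_hop_sampler(graph: ToyGraphStore, seeds: List[int],
                    fanout: int) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Take the first `fanout` neighbours of each seed (deterministic, for illustration)."""
    nodes, edges = list(seeds), []
    for s in seeds:
        for u in graph.neighbours(s)[:fanout]:
            edges.append((u, s))  # message flows from neighbour u to seed s
            if u not in nodes:
                nodes.append(u)
    return nodes, edges


def load_batch(graph: ToyGraphStore, features: ToyFeatureStore, seeds: List[int], fanout: int):
    """Loader step: sample structure first, then fetch only the features that are needed."""
    nodes, edges = one_hop_sampler(graph, seeds, fanout)
    return nodes, edges, features.get(nodes)


graph = ToyGraphStore({0: [1, 2, 3], 1: [0], 2: [0], 3: [0]})
features = ToyFeatureStore({0: [0.0], 1: [1.0], 2: [2.0], 3: [3.0]})
print(load_batch(graph, features, seeds=[0], fanout=2))
# ([0, 1, 2], [(1, 0), (2, 0)], [[0.0], [1.0], [2.0]])  -- node 3's features are never fetched
```

Because each store only has to answer small requests, either one could be swapped for a remote or database-backed implementation without changing the loader logic.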

- Putting it all together (new version):
    - Looks very similar to previous version
    - Adding features to a custom feature store is easy, just define the store, and let PyG handle the syntactic sugar
    - Adding edges is simple: specify the edge tensor, its type and layout, plus the custom graph store implementation
- Summary: they make remote, distributed computation behave very similarly to single-node computation
    - Be sure to monitor throughputs of feature fetching, sampling and other bottlenecks

- How Kumo.ai builds large graphs at scale
    - An in-memory graph store needs a very large amount of memory

- __Alternative to PyG: DGL__

## Podcast Recommendations with GNNs (Spotify)
__Author__: Andreas Damianou

- Recommendation systems: connect users to products based on user features and product features
- GNNs enable explicit use of the graph structure present in these interactions
- Combined graph: bringing together heterogeneous knowledge from different parts of the platform!
    - Holistic representations of users and content
- They deal with heterogeneous graphs!
- Graph design:
    - Semi-automated vs. hand-crafted?
    - Focus on entities (KG approach) or consumption patterns (RecSys approach)
- Their approach:
    - First, generate possible node candidates
    - Second, rank the candidates --> chooses features for podcasts and users
    - Finally, the model can be calibrated
- Predictions: the networks learn an __embedding space__, in which nearest-neighbour (NN) search picks a suggestion based on a query
- Need to capture both collaborative filtering and representation learning effects!
- Often better to enhance rather than replace the production recommender system (serve embeddings over link predictions)
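
A brute-force sketch of querying such an embedding space: rank items by cosine similarity to a user's embedding and return the top k. The embeddings are made-up toy values and the function names are hypothetical; production systems would typically use an approximate nearest-neighbour index rather than a full sort, and this is not Spotify's pipeline.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recommend(query: List[float], items: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k items whose embeddings are most cosine-similar to the query embedding."""
    ranked = sorted(items, key=lambda name: cosine(query, items[name]), reverse=True)
    return ranked[:k]


user_embedding = [1.0, 0.0]
podcast_embeddings = {"pod_a": [0.9, 0.1], "pod_b": [0.0, 1.0], "pod_c": [0.7, 0.7]}
print(recommend(user_embedding, podcast_embeddings, k=2))  # ['pod_a', 'pod_c']
```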

## Enabling Enterprises to Query the Future using PyG
__Authors:__ Hema Raghavan & Tin-Yun Ho, Kumo.ai

- ML Lifecycle:
    - Input data cleaning, curation
    - Target label engineering
    - Feature engineering
    - Architecture & Hyperparameter search
    - ML Ops
    
- Machine learning productivity is a bottleneck
- Finding new high ROI problems: where to mine for gold!
- Most data in enterprises are not text or images, but graph-based!
- There is a publicly available Kaggle dataset on H&M meta-data (unstructured)
- Kumo platform: predictive querying --> Data cleaning and curation
    - E.g. for customers who haven't had a transaction in the last 60 days, predict which ones won't make any transactions in the next 30 days, and rank which of them are least likely to churn
- Kumo seems like a really interesting product!
    - They don't need their input dataset to be in a 'training' table!
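
The "target label engineering" step behind a query like the churn example can be sketched in plain Python over a toy transaction log. This is not Kumo's predictive query language; the function name, the `(customer, date)` record layout, and the data are hypothetical. It selects customers with no transaction in the 60 days before a cutoff date and labels them by whether they also stay inactive for the following 30 days.

```python
from datetime import date, timedelta
from typing import Dict, List, Set, Tuple


def churn_labels(transactions: List[Tuple[str, date]], cutoff: date,
                 inactive_days: int = 60, horizon_days: int = 30) -> Dict[str, bool]:
    """Among customers with no transaction in the `inactive_days` before `cutoff`,
    label True (churned) if they also make no transaction in the `horizon_days` after it."""
    last_before: Dict[str, date] = {}
    active_after: Set[str] = set()
    horizon_end = cutoff + timedelta(days=horizon_days)
    for customer, day in transactions:
        if day < cutoff:
            # Track each customer's most recent transaction before the cutoff.
            if customer not in last_before or day > last_before[customer]:
                last_before[customer] = day
        elif day < horizon_end:
            active_after.add(customer)
    inactive_since = cutoff - timedelta(days=inactive_days)
    return {customer: customer not in active_after
            for customer, last in last_before.items() if last < inactive_since}


cutoff = date(2022, 6, 1)
transactions = [
    ("alice", date(2022, 1, 10)),  # inactive for 60+ days before the cutoff...
    ("alice", date(2022, 6, 15)),  # ...but returns within 30 days -> not churned
    ("bob", date(2022, 2, 1)),     # inactive and never returns -> churned
    ("carol", date(2022, 5, 20)),  # recently active -> not in the queried population
]
print(churn_labels(transactions, cutoff))  # {'alice': False, 'bob': True}
```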

## Graph AI to Enable Precision Medicine
__Author:__ Marina Zitnik (Broad Institute)

Applications: diagnostics and treatment!
- $>$ 7000 rare diseases in the US alone
- Patients with these diseases have very different phenotypes, and clinicians may never have seen a similar patient
- Diagnostic delay is pervasive and leads to problems for patients
- SHEPHERD is a few-shot learning AI for multi-faceted diagnosis of patients with rare diseases
- Base-model: self-supervised pre-training to embed a biomedical KG
- Individual patient information is overlaid on the KG!
- SHEPHERD is trained on a cohort of simulated patients, and was evaluated on two external patient cohorts
- For 70% of patients, we would not have been able to classify them based on their disease phenotypes!
- Allows us to query the knowledge graph to find out more about the disease
- Graphs: necessary, not only beneficial!

- The way of thinking about disease/drugs:
    - Disease == perturbation in normal functioning of individual
    - Find medicine that will remove perturbation
- Networks could be used to repurpose drugs!
    - In fact, there are direct-target drugs, which directly target the cause of the issue
    - Network-based drugs: cause a cascade of reactions that treat the underlying condition!
- All their datasets are publicly available on their website!

## Challenges and Solutions in Applying Graph Neural Networks at Google
__Author:__ Bryan Perozzi

- Research scientist at Google
- One of the pioneers of graph embeddings work
- Co-authored work on DeepWalk (took word2vec to graphs)
- Started with PageRank, now they're trying to use ML with the sheer volume of data they have!
- He's the head of the GNN team
- Big graphs are very complex and aren't homophilous (connections don't necessarily mean similarity for a downstream task)
- Challenges for GNNs (over time):
    - Heterogeneity:
    - Scale: Big GNNs are really slow! Models get slower as they go deeper in the graph
    - What kind of graph do we use?
        - Given a partially labelled set of nodes, how do we predict things for the remaining nodes?
        - Graph design problem: given a multi-modal feature space and partial labelling, can we learn an algorithm that will give us the right graph?
        - Grale: scalable solution
    - Generative/
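
One common way to quantify the homophily mentioned above is the edge homophily ratio: the fraction of edges whose two endpoints share a label (1.0 means perfectly homophilous, values near 0 mean connected nodes tend to differ). A plain-Python sketch with toy labels; the function name is hypothetical.

```python
from typing import List


def edge_homophily(edge_index: List[List[int]], labels: List[int]) -> float:
    """Fraction of edges whose two endpoints have the same label."""
    sources, targets = edge_index
    same = sum(labels[s] == labels[t] for s, t in zip(sources, targets))
    return same / len(sources)


edge_index = [[0, 0, 1, 2], [1, 2, 3, 3]]
labels = [0, 0, 1, 1]
print(edge_homophily(edge_index, labels))  # 0.5: edges 0-1 and 2-3 agree, 0-2 and 1-3 do not
```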

- Message passing allows flexibility:
    - typically use custom message passing operation for each particular task
    - TF-GNN: TensorFlow GNN framework
    
- GraphWorld:
    - Simulate millions of GNN task datasets
    - Can be used to benchmark GNN tasks!
    - Very nice to benchmark by just dropping an algorithm in
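
A minimal sketch of the kind of generator a simulator like this can sweep over: a basic stochastic block model, where nodes in the same block connect with probability `p_in` and nodes in different blocks with probability `p_out`. Varying these parameters yields graphs ranging from strongly homophilous to heterophilous. This is a simplification under my own assumptions, not GraphWorld's code, and `sbm_graph` is a hypothetical name.

```python
import random
from typing import List, Tuple


def sbm_graph(block_sizes: List[int], p_in: float, p_out: float,
              seed: int = 0) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Sample an undirected SBM: edge probability p_in within a block, p_out across blocks."""
    rng = random.Random(seed)
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    edges = []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return edges, labels


# Two parameter settings: mostly within-block edges vs. mostly cross-block edges.
for p_in, p_out in [(0.5, 0.01), (0.01, 0.5)]:
    edges, labels = sbm_graph([20, 20], p_in, p_out)
    same = sum(labels[i] == labels[j] for i, j in edges)
    print(p_in, p_out, len(edges), round(same / len(edges), 2))  # last value: edge homophily
```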
    
- Biased samples can really affect GNN performance
- Shift-robust GNNs

## Dynamic and Signed GNNs for Web Safety and Integrity - Applications to Bad Actor Detection on Social Media Platforms
__Author:__ Srijan Kumar

- 145M fake accounts on FB
- $>$ 300M fake reviews on Amazon
- They create dynamic GNNs for web safety and integrity
- Temporal interaction networks: flexible way to represent time-evolving relations
- JODIE: mutually-recursive RNN framework
    - User RNN and Item RNN --> update component for 'users' and 'items'
    - From Kalman filters: forecasting the embeddings
- The model is already used at Facebook
- Signed dynamic networks
    - Signed networks: edges are positive or negative
    - Very useful to detect things like conflict and toxicity
- Prediction: signed links with GNNs
    - SEMBA: balanced aggregation
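
Signed GNNs commonly build on structural balance theory: a triangle is balanced when the product of its three edge signs is positive ("the enemy of my enemy is my friend"). A plain-Python sketch of that check on a toy signed graph; it illustrates the underlying theory only and is not SEMBA's aggregation scheme. The function name and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def balanced_triangles(signed_edges: Dict[FrozenSet[int], int]) -> List[Tuple[Tuple[int, int, int], bool]]:
    """For every triangle in an undirected signed graph, report whether it is balanced
    (the product of its three edge signs is positive)."""
    nodes = sorted({n for edge in signed_edges for n in edge})
    result = []
    for a, b, c in combinations(nodes, 3):
        pairs = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(p in signed_edges for p in pairs):
            product = 1
            for p in pairs:
                product *= signed_edges[p]
            result.append(((a, b, c), product > 0))
    return result


edges = {
    frozenset((0, 1)): +1,  # 0 and 1 are friends
    frozenset((1, 2)): -1,  # 1 and 2 are in conflict
    frozenset((0, 2)): -1,  # 0 and 2 are in conflict
    frozenset((2, 3)): -1,
    frozenset((1, 3)): -1,
}
print(balanced_triangles(edges))  # [((0, 1, 2), True), ((1, 2, 3), False)]
```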

## Graph Mining for Next-Generation Intelligent Assistants on AR/VR Devices
__Author:__ Luna Dong (Meta)

- Aim: Meta's assistant
- Meta has Smart Glasses
- Assistant goes from sound-only to multi-modal
- ASR + CV --> GNN for integrating multi-modal information
- They want to create a knowledge graph based on knowledge, behaviour and social behaviour!

## Graph Learning in NLP Applications
__Author:__ Michi Yasunaga

- Corpus is not a list of documents, but a __GRAPH__ of documents!
- Knowledge graphs can capture lots of latent relations about entities
- His aim is to create a language-knowledge model
- Looked into LinkBERT and DRAGON models to combine them
- All his papers and code are available online

AT THIS POINT I GOT TIRED AND STOPPED ATTENDING, ALMOST MIDNIGHT HERE :'(

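```python
import torch
from torch_geometric.data import Data

# COO connectivity: row 0 = source nodes, row 1 = target nodes.
# Each undirected edge (0-1, 1-2) is stored in both directions.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
# One scalar feature per node (3 nodes).
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)

data = Data(x=x, edge_index=edge_index)
print(data)  # Data(edge_index=[2, 4], x=[3, 1])
```

Running this in my environment raised:

```
OSError: /home/joao/Desktop/GNN-Tutorial/env/lib/python3.8/site-packages/torch_sparse/_spmm_cuda.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE
```

An `undefined symbol` error from a `torch_sparse` shared library usually means the installed `torch-sparse` wheel was built against a different PyTorch version than the one installed.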