# ML Tasks

## Operate on:
  * Node
  * Edge
  * Graph

## Task:
  * Classify: Troll or not?
  * Regress: How trollish?
  * Predict: Likely edge?
  
# Problem: How can powerful legacy tabular ML (neural networks, random forests) make use of graph relationships?

Ex: Twitter misinfo - text analysis looks at content, but not at who is sharing it

  * **Neural nets**: NLP, Vision, ..
  * **Random forest**: Tables


# Standard Solution: Pipeline!
## Enrich features with precomputed graph stats

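```python
import pandas as pd

df = pd.read_csv('inputs.csv')

# 1. Enrich entities with graph stats features
#    (each `...` stands for a per-entity value precomputed on the graph)
df['pagerank'] = ...
df['degree'] = ...
df['community'] = ...

# 2. Call random forest or neural net as usual
#    (`xgboost(df)` is shorthand for scoring with any trained tabular model)
df['score'] = xgboost(df)
```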
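
As a rough illustration of step 1, the sketch below computes two of the graph stats (degree and PageRank) from a plain edge list using only the standard library. The edge list, the user names, and the helper names `degree` and `pagerank` are made up for this example; in practice a graph library would normally do this work.

```python
from collections import defaultdict

def degree(edges):
    """Count how many edges touch each node (incoming + outgoing)."""
    counts = defaultdict(int)
    for src, dst in edges:
        counts[src] += 1
        counts[dst] += 1
    return dict(counts)

def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Nodes with no out-links spread their rank evenly over everyone.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Hypothetical "who retweets whom" edges: (retweeter, original author)
edges = [("ann", "cat"), ("bob", "cat"), ("dan", "cat"), ("cat", "ann")]

degree_by_user = degree(edges)
pagerank_by_user = pagerank(edges)

# One feature row per user, ready to join onto the tabular inputs
features = [
    {"user": u, "degree": degree_by_user[u], "pagerank": pagerank_by_user[u]}
    for u in sorted(degree_by_user)
]
```

Here `cat` ends up with the highest degree and PageRank, which is the "influencer" signal a tabular model can then pick up as an ordinary column.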

## Powerful signal: Give more weight to influencers, ...

## But ignores more subtle relationships

* Ex: Twitter graph with Follow vs Block actions


# Emerging solution: Graph neural nets!

Instead of encoding a word as a vector, why not a node?


  * Started to scale in the last ~year!
  * Initially: Pinterest/Stanford: GraphSAGE for recommendations
  * Now popular: DeepMind (Google's top deep learning R&D team) - AlphaFold, ...
  
## GCN: Graph Convolutional Network

Like a normal neural network, except it operates on nodes instead of text/pictures/..!

* Encode each node as a vector based on itself + neighbors
  * "You are your 5 closest friends"
  * Can reuse classic encodings, like NLP word counts, just need to combine them ("sum")
  * Homophily: If all your friends talk about conspiracy X...
* Network layer: One update pass over all the nodes
  * More network layers, more hops out on the graph
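
The bullets above can be sketched as a single simplified layer: sum each node's vector with its neighbors' vectors, multiply by a shared weight matrix, apply ReLU. This is a minimal sketch, not a full GCN: real implementations typically also normalize the sum by node degree and learn the weights. The toy graph, the feature meanings, and the name `gcn_layer` are assumptions for illustration.

```python
def gcn_layer(features, neighbors, weights):
    """One simplified GCN layer: sum self + neighbor vectors, project, ReLU."""
    updated = {}
    for node, own in features.items():
        # Aggregate: element-wise sum over the node itself and its neighbors
        total = list(own)
        for other in neighbors.get(node, []):
            total = [t + x for t, x in zip(total, features[other])]
        # Transform: multiply by the shared weight matrix, then apply ReLU
        updated[node] = [
            max(0.0, sum(t * w for t, w in zip(total, column)))
            for column in zip(*weights)
        ]
    return updated

# Hypothetical path graph a - b - c; features are word counts [conspiracy, sports]
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0, 0.0], "b": [0.0, 0.0], "c": [0.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned weights

hop1 = gcn_layer(features, neighbors, identity)  # each node sees 1 hop out
hop2 = gcn_layer(hop1, neighbors, identity)      # stacking layers reaches 2 hops
```

After one layer, `a` still has a sports count of 0 because `c` is two hops away; after the second layer, `c`'s signal has reached `a` through `b`.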


## RGCN: Relational Graph Convolutional Networks for real-world graphs

Focuses on solving scale + heterogeneity

* Heterogeneous graphs - quality: 
  * Multiple node/edge types, attributes, ...
  * Following vs Blocking very different!
  * Each layer, instead of 1 weight matrix, ...
  * ... many, per relationship
* Heterogeneous graphs - scale: 
  * ... minification tricks: block vs follow might be a coefficient instead of a full weight matrix
  * GPU implementations
  * + multi-GPU
* ... but still quite tricky in practice: APIs pretty difficult!
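
A minimal sketch of the two ideas above, in plain Python: a layer where each relationship type gets its own weight matrix, and the "coefficient" trick where those matrices are built from shared basis matrices. The toy graph, the 1-dimensional "trollishness" feature, and the names `rgcn_layer` and `from_bases` are assumptions for illustration. Real RGCN implementations also normalize messages per relation and learn both the bases and the coefficients.

```python
def matvec(vector, matrix):
    """Row vector times matrix (matrix given as a list of rows)."""
    return [sum(v * w for v, w in zip(vector, column)) for column in zip(*matrix)]

def rgcn_layer(features, typed_edges, relation_weights, self_weight):
    """One simplified RGCN layer: each relation uses its own weight matrix."""
    updated = {node: matvec(vec, self_weight) for node, vec in features.items()}
    for src, relation, dst in typed_edges:
        message = matvec(features[src], relation_weights[relation])
        updated[dst] = [u + m for u, m in zip(updated[dst], message)]
    return {node: [max(0.0, x) for x in vec] for node, vec in updated.items()}

def from_bases(bases, coefficients):
    """Basis trick: build each relation's matrix as a weighted sum of shared bases."""
    return {
        relation: [
            [sum(c * basis[i][j] for c, basis in zip(coeffs, bases))
             for j in range(len(bases[0][0]))]
            for i in range(len(bases[0]))
        ]
        for relation, coeffs in coefficients.items()
    }

# Hypothetical 1-dimensional "trollishness" feature per user
features = {"ann": [1.0], "bob": [1.0], "cat": [0.0]}
typed_edges = [("ann", "follow", "cat"), ("bob", "block", "cat")]

# One shared basis matrix; follow and block differ only by a coefficient
bases = [[[1.0]]]
relation_weights = from_bases(bases, {"follow": [1.0], "block": [-0.5]})

out = rgcn_layer(features, typed_edges, relation_weights, self_weight=[[1.0]])
```

With a single untyped weight, `cat` would receive the same signal from the follower and the blocker. Here the block edge pushes `cat`'s score down while the follow edge pushes it up, and only one coefficient per relation is stored on top of the shared basis.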
  

## Next steps

If interested, get in touch about contributing to the anti-COVID-misinformation effort at Project Domino (100M+ tweets, ...)

* Small graph ML: NetworkX extension https://github.com/benedekrozemberczki/karateclub
* Big graph ML, esp. RGCN:
  * https://www.dgl.ai/
  * https://github.com/stellargraph/stellargraph
  * PyTorch Geometric