# Task 7: Label Propagation for Node Classification
Given a network with labels on some nodes, how do we assign labels to all
other nodes in the network?

We can potentially exploit the fact that correlations exist in networks: nearby nodes tend to be similar. Two effects give rise to these correlations:

- Homophily: The tendency of individuals to associate and bond with similar others

- Influence: Social connections can influence the individual characteristics of a person. 

To leverage this correlation to predict node labels, we can classify a node $v$ in the network using:

- Features of $v$
- Labels of the nodes in $v$’s neighborhood
- Features of the nodes in $v$’s neighborhood

## Relational Classification

### Probabilistic Relational Classifier

Class probability $Y_v$ of node $v$ is a weighted average of class probabilities of its neighbors.

1. For labeled nodes $v$, initialize label $Y_v$ with ground-truth label $Y_v^*$. For unlabeled nodes, initialize $Y_v = 0.5$.
2. For each node $v$ and label $c$

    $$P(Y_v=c)= \frac{1} {\sum_{(v,u) \in E} A_{v,u}}\sum_{(v,u) \in E} A_{v,u}P(Y_u=c)$$

    If edges carry strength/weight information, $A_{v,u}$ is the edge weight between $v$ and $u$; otherwise it is the 0/1 adjacency entry. $P(Y_u=c)$ is the probability that node $u$ has label $c$.

3. Update all nodes in a random order until convergence or until a maximum number of iterations is reached.
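
The three steps above can be sketched in plain Python for the binary case. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `relational_classify`, the dict-of-dicts graph format, and the tolerance-based stopping rule are all invented for the example, and labeled nodes are assumed to stay fixed at their ground-truth value throughout.

```python
import random


def relational_classify(adj, labels, max_iters=100, tol=1e-6, seed=0):
    """Propagate P(Y_v = 1) over a weighted undirected graph.

    adj:    dict node -> dict neighbor -> edge weight A[v][u]
    labels: dict node -> 0 or 1, for the labeled nodes only
    """
    rng = random.Random(seed)
    # Step 1: ground truth for labeled nodes, 0.5 for unlabeled nodes.
    prob = {v: float(labels[v]) if v in labels else 0.5 for v in adj}
    unlabeled = [v for v in adj if v not in labels]

    for _ in range(max_iters):
        rng.shuffle(unlabeled)  # Step 3: visit nodes in a random order.
        max_change = 0.0
        for v in unlabeled:
            total = sum(adj[v].values())
            if total == 0:  # Isolated node: nothing to average over.
                continue
            # Step 2: weighted average of the neighbors' probabilities.
            new = sum(w * prob[u] for u, w in adj[v].items()) / total
            max_change = max(max_change, abs(new - prob[v]))
            prob[v] = new  # Updated in place, so later nodes see it.
        if max_change < tol:  # Assumed stopping rule for "convergence".
            break
    return prob


# Path graph a - b - c - d, with a labeled 1 and d labeled 0.
adj = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
prob = relational_classify(adj, {"a": 1, "d": 0})
```

On this path graph the two unlabeled nodes settle near $2/3$ (for `b`) and $1/3$ (for `c`), which is the fixed point of the update rule: each is the average of its two neighbors.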

There are two issues with the method:
- Convergence is not guaranteed
- Node feature information is not used.

## Iterative Classification

### Iterative Classifier
To leverage node-level features, iterative classification classifies
node $v$ based on its attributes $f_v$ as well as
the labels $z_v$ of its neighbor set $\mathbf{N}_v$.

This method involves training two classifiers on **labelled training data**:
- **Base classifier** $\phi_1 (f_v)$ predicts node label $Y_v$ based on node feature vector $f_v$. 
- **Relational classifier** $\phi_2 (f_v,z_v)$ predicts label $Y_v$ based on node feature vector $f_v$ and summary $z_v$ of labels of $v$’s neighbors.

At inference time:

1. For each node $v$ in the test data:
    - set the label $Y_v$ based on the base classifier $\phi_1 (f_v)$
    - compute $z_v$
    - predict the label with the relational classifier $\phi_2(f_v,z_v)$
2. Repeat for each node $v$:
    - update $z_v$ based on $Y_u$ for all $u \in \mathbf{N}_v$
    - update $Y_v$ based on the new $z_v$, using $\phi_2$

3. Update all nodes in a random order until convergence or until a maximum number of iterations is reached. Again, convergence is not guaranteed.
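
The inference loop can be sketched as follows. Everything named here is an assumption made for illustration: `iterative_classify` and `neighbor_summary` are hypothetical helpers, $z_v$ is taken to be the fraction of neighbors currently labeled 1 (one of many possible summaries), and the two hand-written threshold rules stand in for classifiers that would normally be trained on labeled data.

```python
import random


def neighbor_summary(v, adj, y):
    """z_v: fraction of v's neighbors currently labeled 1 (assumed summary)."""
    nbrs = adj[v]
    return sum(y[u] for u in nbrs) / len(nbrs) if nbrs else 0.0


def iterative_classify(adj, features, known, phi1, phi2, max_iters=10, seed=0):
    """adj: dict node -> list of neighbors; known: dict node -> 0/1 label."""
    rng = random.Random(seed)
    test_nodes = [v for v in adj if v not in known]
    y = dict(known)
    # Step 1: bootstrap test labels from node features alone.
    for v in test_nodes:
        y[v] = phi1(features[v])
    # Steps 2-3: recompute z_v and re-predict with phi2 until labels stabilize.
    for _ in range(max_iters):
        rng.shuffle(test_nodes)
        changed = False
        for v in test_nodes:
            z_v = neighbor_summary(v, adj, y)
            new_label = phi2(features[v], z_v)
            changed = changed or new_label != y[v]
            y[v] = new_label
        if not changed:
            break
    return y


# Stand-ins for trained classifiers (hypothetical threshold rules).
phi1 = lambda f: int(f > 0.5)
phi2 = lambda f, z: int(0.5 * f + 0.5 * z > 0.5)

# x is attached to three nodes labeled 1; w hangs off x.
adj = {"a": ["x"], "b": ["x"], "c": ["x"], "x": ["a", "b", "c", "w"], "w": ["x"]}
features = {"a": 0.9, "b": 0.8, "c": 0.7, "x": 0.4, "w": 0.2}
y = iterative_classify(adj, features, {"a": 1, "b": 1, "c": 1}, phi1, phi2)
```

In this toy graph the base classifier alone would label both `x` and `w` as 0, but the relational classifier flips `x` to 1 because most of its neighbors are labeled 1, and `w` then follows `x`.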


## Collective Classification

### Correct & Smooth



Correct & Smooth takes the following steps:
1. Train a base predictor that predicts soft labels (class probabilities) over all nodes.
    - Labeled nodes are used as train/validation data.
    - The base predictor can be simple, for example a linear model or a multi-layer perceptron (MLP) over node features.

2. Given a trained base predictor, we apply it to obtain soft labels for all the nodes. We expect these soft labels to be decently accurate.

3. Post-process the soft predictions with a two-step procedure.
    - **Correct step**: The soft labels err by different amounts on different nodes, and these errors are biased. We need to correct for this error bias.
        - ***Compute training errors of nodes***. The training error is calculated as the ground-truth label minus the soft label. It is defined as 0 for unlabeled nodes.
        - ***Diffuse training errors $E^{(0)}$ along the edges***. The assumption here is that errors are similar for nearby nodes.
        
        
            $$E^{(t+1)}\leftarrow (1-\alpha) \cdot E^{(t)} +\alpha \cdot \tilde A E^{(t)}$$
            
          where $\alpha$ is a hyperparameter and $\tilde A$ is the normalized diffusion matrix, defined as follows:
          
            $$ \tilde  A \equiv D^{-1/2} A D^{-1/2} $$
          
          where $A$ is the adjacency matrix and $D \equiv \mathrm{Diag}(d_1,..,d_N)$ is the degree matrix.
          
          For more details on this step, please refer to [Zhu et al., ICML 2003](https://mlg.eng.cam.ac.uk/zoubin/papers/zgl.pdf).
        - ***Add the scaled diffused training errors into the predicted soft labels***
          
    - **Smooth step**: The corrected soft labels may not be smooth over the graph, so we smooth them along the edges. The assumption here is that neighboring nodes tend to share the same labels.
        - ***Diffuse label $Z^{(0)}$ along the graph structure*** 
           
           $$Z^{(t+1)}\leftarrow (1-\alpha) \cdot Z^{(t)} +\alpha \cdot \tilde A Z^{(t)}$$
         
           
C&S achieves strong performance on semi-supervised node classification.
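
The post-processing in step 3 can be sketched in plain Python on dense matrices stored as lists of lists. This is a small illustration under stated assumptions rather than the authors' implementation: the helper names (`normalized_adjacency`, `diffuse`, `correct_and_smooth`) and the `scale` and `num_iters` parameters are invented here, the same $\alpha$ is reused for both steps, and $Z^{(0)}$ is assumed to hold the ground-truth labels on labeled nodes and the corrected soft labels elsewhere, which the text above does not spell out.

```python
import math


def normalized_adjacency(A):
    """Return D^{-1/2} A D^{-1/2} for a dense adjacency matrix."""
    n = len(A)
    deg = [sum(row) for row in A]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * A[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def diffuse(S, X0, alpha, num_iters):
    """Iterate X <- (1 - alpha) * X + alpha * S X, starting from X0 (n x c)."""
    n, c = len(X0), len(X0[0])
    X = [row[:] for row in X0]
    for _ in range(num_iters):
        SX = [[sum(S[i][k] * X[k][j] for k in range(n)) for j in range(c)]
              for i in range(n)]
        X = [[(1 - alpha) * X[i][j] + alpha * SX[i][j] for j in range(c)]
             for i in range(n)]
    return X


def correct_and_smooth(A, soft, onehot, alpha=0.8, scale=1.0, num_iters=20):
    """soft: n x c base predictions; onehot: dict node index -> one-hot label."""
    n, c = len(soft), len(soft[0])
    S = normalized_adjacency(A)
    # Correct: E0 = ground truth minus soft label on labeled nodes, else 0.
    E0 = [[onehot[i][j] - soft[i][j] for j in range(c)] if i in onehot
          else [0.0] * c for i in range(n)]
    E = diffuse(S, E0, alpha, num_iters)
    corrected = [[soft[i][j] + scale * E[i][j] for j in range(c)]
                 for i in range(n)]
    # Smooth: Z0 = ground truth on labeled nodes (assumption), corrected elsewhere.
    Z0 = [[float(x) for x in onehot[i]] if i in onehot else corrected[i]
          for i in range(n)]
    return diffuse(S, Z0, alpha, num_iters)


# Two connected nodes; node 0 is labeled as class 0, node 1 is unlabeled.
A = [[0, 1], [1, 0]]
soft = [[0.6, 0.4], [0.5, 0.5]]
Z = correct_and_smooth(A, soft, {0: [1, 0]}, alpha=0.5, num_iters=1)
```

Here the base predictor is undecided about node 1, and after one round of correcting and smoothing its scores lean toward class 0, the class of its labeled neighbor. The output rows are scores rather than normalized probabilities, so the predicted class is read off with an argmax.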

## References

\[1\][Label Propagation for Node Classification Youtube. Stanford CS224W: Machine Learning with Graphs | 2021](https://www.youtube.com/watch?v=6g9vtxUmfwM&list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn&index=14)

\[2\][Label Propagation for Node Classification Slides. Stanford CS224W: Machine Learning with Graphs | 2021](http://snap.stanford.edu/class/cs224w-2021/slides/05-message.pdf)