Tutorial: 7
Describe the bug
I think there's a math error in this claim: "Finally, to take the average instead of summing, we calculate the matrix $\hat{D}$ which is a diagonal matrix with $\hat{D}_{ii}$ denoting the number of neighbors node $i$ has."
I read this and couldn't get it to be an average. When I did the math out I got

$$\left(\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}X\right)_i = \sum_j \frac{\hat{A}_{ij}}{\sqrt{\hat{D}_{ii}\hat{D}_{jj}}}\, x_j,$$

which scales each neighbor by a pairwise factor rather than the uniform $1/\hat{D}_{ii}$ a true average would use (only $\hat{D}^{-1}\hat{A}$ gives a row average). That got me to implement the math directly instead of just averaging the rows, to see:
```python
import torch
import torch.nn as nn


class GCNLayer(nn.Module):

    def __init__(self, c_in, c_out):
        super().__init__()
        self.projection = nn.Linear(c_in, c_out)

    def forward(self, node_feats, adj_matrix):
        """
        Inputs:
            node_feats - Tensor with node features of shape [batch_size, num_nodes, c_in]
            adj_matrix - Batch of adjacency matrices of the graph. If there is an edge
                         from i to j, adj_matrix[b,i,j]=1 else 0. Supports directed edges
                         by non-symmetric matrices. Assumes the identity connections have
                         already been added. Shape: [batch_size, num_nodes, num_nodes]
        """
        # Num neighbours = number of incoming edges (including the self-connection)
        num_neighbours = adj_matrix.sum(dim=-1, keepdim=True)
        # Avoid division by zero for isolated nodes
        num_neighbours = torch.clamp(num_neighbours, min=1)
        d_inv_sqrt = torch.pow(num_neighbours, -0.5)
        # Build the diagonal matrix D^{-1/2}: broadcasting the [N, N] identity against
        # d_inv_sqrt of shape [B, N, 1] places the per-node factors on the diagonal
        identity = torch.eye(num_neighbours.shape[1], device=node_feats.device)
        d_inv_sqrt_matrix = identity * d_inv_sqrt
        # Perform D^{-1/2} A D^{-1/2}
        adj_matrix = torch.matmul(d_inv_sqrt_matrix, adj_matrix)
        adj_matrix = torch.matmul(adj_matrix, d_inv_sqrt_matrix)
        node_feats = self.projection(node_feats)
        node_feats = torch.bmm(adj_matrix, node_feats)
        return node_feats
```

This yielded results that still line up with the claims presented afterwards, mainly that the last two nodes output the same values because they have the same set of neighbors.
Adjacency matrix

```
tensor([[[1., 1., 0., 0.],
         [1., 1., 1., 1.],
         [0., 1., 1., 1.],
         [0., 1., 1., 1.]]])
```

Input features

```
tensor([[[0., 1.],
         [2., 3.],
         [4., 5.],
         [6., 7.]]])
```

Output features

```
tensor([[[0.7071, 1.5607],
         [3.3868, 4.5677],
         [3.9107, 4.8660],   <- this node output is the same
         [3.9107, 4.8660]]]) <- as this node, which was the point of the example.
```
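For reference, a minimal driver sketch that reproduces these printouts. It assumes the projection is pinned to the identity with zero bias, which is consistent with the values shown above; with random weights the exact numbers would differ:

```python
layer = GCNLayer(c_in=2, c_out=2)
# Pin the projection to the identity so the output reflects only the normalization
layer.projection.weight.data = torch.Tensor([[1., 0.], [0., 1.]])
layer.projection.bias.data = torch.Tensor([0., 0.])

adj_matrix = torch.Tensor([[[1., 1., 0., 0.],
                            [1., 1., 1., 1.],
                            [0., 1., 1., 1.],
                            [0., 1., 1., 1.]]])
node_feats = torch.arange(8, dtype=torch.float32).view(1, 4, 2)

with torch.no_grad():
    out_feats = layer(node_feats, adj_matrix)

print("Adjacency matrix", adj_matrix)
print("Input features", node_feats)
print("Output features", out_feats)
```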
So I guess the main thing is that the 'averaging' claim is not fully on the nose: under the symmetric normalization each neighbor $j$ is scaled by the pairwise factor $1/\sqrt{\hat{D}_{ii}\hat{D}_{jj}}$ rather than the uniform $1/\hat{D}_{ii}$ an average would use. At least that's my intuition of the situation.
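To make that concrete, here is a quick numerical check (a sketch of my own, not from the tutorial): the rows of $\hat{D}^{-1}\hat{A}$ sum to 1, so it is a true average, while the rows of $\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}$ generally do not:

```python
import torch

adj = torch.Tensor([[1., 1., 0., 0.],
                    [1., 1., 1., 1.],
                    [0., 1., 1., 1.],
                    [0., 1., 1., 1.]])
deg = adj.sum(dim=-1)

mean_norm = torch.diag(1.0 / deg) @ adj                                  # D^{-1} A
sym_norm = torch.diag(deg.pow(-0.5)) @ adj @ torch.diag(deg.pow(-0.5))   # D^{-1/2} A D^{-1/2}

print(mean_norm.sum(dim=-1))  # tensor([1., 1., 1., 1.]) -> every row is an average
print(sym_norm.sum(dim=-1))   # tensor([0.8536, 1.1809, 0.9553, 0.9553]) -> not an average
```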
Runtime environment (please complete the following information):
Colab - GPU