Tutorial 7 Bug In D^{-1/2} A D^{-1/2} Math? #154

@johanos

Description

Tutorial: 7

Describe the bug
I think there's a math error for this claim: "Finally, to take the average instead of summing, we calculate the matrix $\hat{D}$ which is a diagonal matrix with $D_{ii}$ denoting the number of neighbors node $i$ has."

I read this and couldn't get it to come out as an average. When I did the math out, I got this result:

$$\left( D^{-1/2}AD^{-1/2}\right)_{ij} = \frac{A_{ij}}{\sqrt{D_{ii}} \sqrt{D_{jj}}}$$
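
For completeness, expanding the matrix product index-wise (using that $D^{-1/2}$ is diagonal, so only the $k=i$ and $l=j$ terms survive):

$$\left(D^{-1/2} A D^{-1/2}\right)_{ij} = \sum_{k}\sum_{l} D^{-1/2}_{ik} A_{kl} D^{-1/2}_{lj} = D_{ii}^{-1/2}\, A_{ij}\, D_{jj}^{-1/2}$$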

That result led me to try implementing the math directly, instead of just averaging over the rows, to see what would happen:

```python
import torch
import torch.nn as nn


class GCNLayer(nn.Module):

    def __init__(self, c_in, c_out):
        super().__init__()
        self.projection = nn.Linear(c_in, c_out)

    def forward(self, node_feats, adj_matrix):
        """
        Inputs:
            node_feats - Tensor with node features of shape [batch_size, num_nodes, c_in]
            adj_matrix - Batch of adjacency matrices of the graph. If there is an edge from i to j, adj_matrix[b,i,j]=1 else 0.
                         Supports directed edges by non-symmetric matrices. Assumes to already have added the identity connections.
                         Shape: [batch_size, num_nodes, num_nodes]
        """
        # Num neighbours per node (row sum; equals the degree for symmetric adjacency matrices)
        num_neighbours = adj_matrix.sum(dim=-1, keepdim=True)

        # Avoid division by zero for isolated nodes
        num_neighbours = torch.clamp(num_neighbours, min=1)
        d_inv_sqrt = torch.pow(num_neighbours, -0.5)

        # Create the diagonal matrix D^{-1/2}: broadcasting [num_nodes, num_nodes]
        # against [batch_size, num_nodes, 1] puts d_inv_sqrt[i] on the diagonal of row i
        identity = torch.eye(num_neighbours.shape[1], device=node_feats.device)
        d_inv_sqrt_matrix = identity * d_inv_sqrt

        # Perform D^{-1/2} adj_matrix D^{-1/2}
        adj_matrix = torch.matmul(d_inv_sqrt_matrix, adj_matrix)
        adj_matrix = torch.matmul(adj_matrix, d_inv_sqrt_matrix)

        node_feats = self.projection(node_feats)
        node_feats = torch.bmm(adj_matrix, node_feats)

        return node_feats
```

This yielded results that still line up with the claims presented afterwards, mainly that nodes with the same set of neighbors output the same values:

```
Adjacency matrix
 tensor([[[1., 1., 0., 0.],
         [1., 1., 1., 1.],
         [0., 1., 1., 1.],
         [0., 1., 1., 1.]]])
Input features
 tensor([[[0., 1.],
         [2., 3.],
         [4., 5.],
         [6., 7.]]])
Output features
 tensor([[[0.7071, 1.5607],
         [3.3868, 4.5677],
         [3.9107, 4.8660],   <- this node output is the same
         [3.9107, 4.8660]]]) <- as this node, which was the point of the example
```
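
For reference, here is a minimal driver that reproduces those numbers. The identity projection weights and zero bias are my assumption (mirroring the tutorial's setup) so that the output isolates the normalization step:

```python
layer = GCNLayer(c_in=2, c_out=2)

# Assumed for reproducibility: identity weights and zero bias, so the
# output reflects only the D^{-1/2} A D^{-1/2} aggregation
layer.projection.weight.data = torch.eye(2)
layer.projection.bias.data = torch.zeros(2)

adj_matrix = torch.Tensor([[[1., 1., 0., 0.],
                            [1., 1., 1., 1.],
                            [0., 1., 1., 1.],
                            [0., 1., 1., 1.]]])
node_feats = torch.arange(8, dtype=torch.float32).view(1, 4, 2)

with torch.no_grad():
    out_feats = layer(node_feats, adj_matrix)
print(out_feats)
```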

So I guess the main thing is that the 'averaging' claim is not fully on the nose; each adjacency entry is really getting scaled by a pairwise factor:

$$D_{ii}^{-1/2}\, D_{jj}^{-1/2}\, A_{ij}$$

At least, that's my intuition of the situation.
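
One quick sanity check that makes this concrete (my addition, continuing from the adjacency matrix above): the rows of $D^{-1/2} A D^{-1/2}$ generally do not sum to 1, whereas the rows of the true mean aggregator $D^{-1} A$ always do:

```python
deg = adj_matrix.sum(dim=-1, keepdim=True)
d_inv_sqrt = torch.diag_embed(deg.squeeze(-1).pow(-0.5))

sym_norm = d_inv_sqrt @ adj_matrix @ d_inv_sqrt  # D^{-1/2} A D^{-1/2}
row_norm = adj_matrix / deg                      # D^{-1} A, a true row average

print(sym_norm.sum(dim=-1))  # e.g. row 0 sums to ~0.85, so not an average
print(row_norm.sum(dim=-1))  # every row sums to exactly 1
```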

Runtime environment (please complete the following information):
Colab - GPU
