Bug in feature aggregation #4

Open · lkct opened this issue Feb 9, 2021 · 1 comment

lkct commented Feb 9, 2021

Hi @gordicaleksa. Thank you for your implementation of GAT.

I'm new to GNNs, so I'm not sure whether I understood your code correctly, but I think there is a bug in the feature aggregation in your GATLayer: the direction of aggregation appears to be target -> source.

In your implementation 1, attention scores are calculated as follows:

# shape = (NH, N, 1) + (NH, 1, N) -> (NH, N, N) with the magic of automatic broadcast <3
# In Implementation 3 we are much smarter and don't have to calculate all NxN scores! (only E!)
# Tip: it's conceptually easier to understand what happens here if you delete the NH dimension
all_scores = self.leakyReLU(scores_source + scores_target.transpose(1, 2))
# connectivity mask will put -inf on all locations where there are no edges, after applying the softmax
# this will result in attention scores being computed only for existing edges
all_attention_coefficients = self.softmax(all_scores + connectivity_mask)

The three dimensions of all_attention_coefficients mean (head, src, tgt), and you apply softmax on dim=-1, i.e. dim=2, making the scores sum to 1 for each attention head and each source node.
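A quick way to see which dimension that softmax normalizes over (a standalone sketch with placeholder sizes, assuming self.softmax is an nn.Softmax(dim=-1) as described above):

import torch

NH, N = 2, 4  # placeholder sizes: number of heads, number of nodes
all_scores = torch.randn(NH, N, N)
coeffs = torch.softmax(all_scores, dim=-1)  # same normalization as self.softmax above

# Each row sums to 1: for a fixed head h and source index i,
# coeffs[h, i, :] is a probability distribution over the target dimension.
print(coeffs.sum(dim=-1))  # all ones, shape (NH, N)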

And then in aggregation:

# shape = (NH, N, N) * (NH, N, FOUT) -> (NH, N, FOUT)
out_nodes_features = torch.bmm(all_attention_coefficients, nodes_features_proj)

Ignoring the head dimension, this calculates:

out_nodes_features[i, :] = sum_over_j( all_attention_coefficients[i, j] * nodes_features_proj[j, :] )

The dims of all_attention_coefficients are (head, src, tgt) and those of nodes_features_proj are (node, feat), where "node" plays the role of the "tgt" dim here, so the two dims of out_nodes_features should mean (src, feat).
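This can be checked numerically. A small sketch (placeholder shapes, not code from the repo) confirming that row i of the bmm output mixes the features indexed by the last dimension of the coefficient tensor:

import torch

NH, N, FOUT = 1, 3, 5  # placeholder sizes
coeffs = torch.rand(NH, N, N)
feats = torch.rand(NH, N, FOUT)

out = torch.bmm(coeffs, feats)

# Row i of the output is sum_j coeffs[:, i, j] * feats[:, j, :],
# i.e. it aggregates the features indexed by the *last* dim of coeffs.
i = 0
manual = (coeffs[:, i, :, None] * feats).sum(dim=1)
print(torch.allclose(out[:, i, :], manual))  # True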

All of the code above therefore does the following: it calculates attention scores for each node as the source of its edges, and aggregates the features of all its neighboring target nodes into it.
However, based on my understanding, the feature aggregation in GAT should run in the opposite direction: collecting source nodes into each target, as sketched below.
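For concreteness, a minimal sketch of that opposite direction in the dense formulation of implementation 1 (my own sketch under the (head, src, tgt) reading above, reusing the variable names from the quoted code; not the repository's code and untested against it):

# Normalize over the *source* dim so that, for each target node, the
# coefficients of its incoming edges sum to 1 (assumes self-loops, i.e. every
# node has at least one incoming edge, otherwise a whole column is -inf).
all_attention_coefficients = torch.softmax(all_scores + connectivity_mask, dim=1)

# Aggregate source features into each target: transpose so the reduction in
# bmm runs over the source index instead of the target index.
out_nodes_features = torch.bmm(all_attention_coefficients.transpose(1, 2), nodes_features_proj)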

Implementation 2 comes with the same problem. I'm still working through implementation 3, so I don't know whether the bug persists there.


decoherencer commented Apr 30, 2021

"feature aggregation in GAT should be in the opposite direction: collecting source nodes into each target"

I think this is not the case in GNNs.
[Image: equation from the GRL book]
If we think in terms of the adjacency-matrix form of that equation, aggregation driven by the row vectors of A effectively aggregates the target nodes for each source node.
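A tiny numeric illustration of that row-vector view (my own toy example, not code from the repo):

import torch

# Directed edges 0->1, 0->2, 1->2; rows index the source node, columns the target.
A = torch.tensor([[0., 1., 1.],
                  [0., 0., 1.],
                  [0., 0., 0.]])
H = torch.arange(9.).reshape(3, 3)  # one 3-dim feature vector per node

AH = A @ H
# Row 0 of AH equals H[1] + H[2]: node 0 (the source) has gathered the features
# of the nodes its outgoing edges point to, which is exactly the row-wise
# aggregation described above.
print(torch.allclose(AH[0], H[1] + H[2]))  # True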
