
Detail on softmax #4

Closed
DevinKreuzer opened this issue Feb 11, 2021 · 2 comments

DevinKreuzer commented Feb 11, 2021

Great work!

I have a question concerning the implementation of softmax in graph_transformer_edge_layer.py.

When you define the softmax, you use the following function:

def exp(field):
    def func(edges):
        # clamp for softmax numerical stability
        return {field: torch.exp((edges.data[field].sum(-1, keepdim=True)).clamp(-5, 5))}
    return func

Shouldn't the attention weights/scores be scalars? From what I see, each head has an 8-dimensional score vector on which you then compute .sum(). The corresponding function in graph_transformer_layer.py does not have this .sum():

def scaled_exp(field, scale_constant):
    def func(edges):
        # clamp for softmax numerical stability
        return {field: torch.exp((edges.data[field] / scale_constant).clamp(-5, 5))}

    return func
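For concreteness, here is a tiny shape check of what I mean (the sizes are illustrative, not taken from your config):

import torch

# illustrative sizes: 10 edges, 8 heads, 8-dim per-head scores
num_edges, num_heads, head_dim = 10, 8, 8

# per-dimension scores, as they appear in graph_transformer_edge_layer.py
score_vec = torch.randn(num_edges, num_heads, head_dim)

# exp(): sum across the feature dimension, then clamp and exponentiate
exp_score = torch.exp(score_vec.sum(-1, keepdim=True).clamp(-5, 5))
print(exp_score.shape)  # torch.Size([10, 8, 1]) -> one scalar per head per edge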

Would appreciate any clarification on this :)

Best,
Devin

@vijaydwivedi75 (Member)

Hi @DevinKreuzer,

The .sum() is done in graph_transformer_layer.py, inside the scoring function itself:

def src_dot_dst(src_field, dst_field, out_field):
    def func(edges):
        return {out_field: (edges.src[src_field] * edges.dst[dst_field]).sum(-1, keepdim=True)}
    return func
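To see that this already gives one scalar per attention head, here is a quick toy check (graph size, field names, and feature sizes are illustrative):

import dgl
import torch

g = dgl.graph(([0, 1, 2], [1, 2, 0]))            # 3 nodes, 3 edges
num_heads, head_dim = 8, 8
g.ndata['Q_h'] = torch.randn(3, num_heads, head_dim)
g.ndata['K_h'] = torch.randn(3, num_heads, head_dim)

# uses the src_dot_dst defined above
g.apply_edges(src_dot_dst('K_h', 'Q_h', 'score'))
print(g.edata['score'].shape)                     # torch.Size([3, 8, 1])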

@DevinKreuzer: Shouldn't the attention weights/scores be scalars? From what I see, each head has an 8-dimensional score vector

  • In graph_transformer_edge_layer.py, the injection of the available edge features is done feature-dimension-wise, i.e. the implicit attention scores (per feature dimension) are multiplied with the available edge features (per feature dimension), as in Eqn. 12 of the paper, and implemented as:

    def func(edges):
        return {implicit_attn: (edges.data[implicit_attn] * edges.data[explicit_edge])}

  • Eqn. 12 outputs a d-dim feature vector (say d is the feature dimension). This d-dim edge feature vector is critical since it is passed on to the edge feature pipeline (maintained at every layer), starting from Eqn. 10 and leading to Eqns. 16-18 in the paper. In Eqn. 11 the features of \hat{w}_{i,j} are summed across the d dimensions to obtain scalars, which is the .sum() you mention in your query (see the short shape sketch after the code below).

def exp(field):
    def func(edges):
        # clamp for softmax numerical stability
        return {field: torch.exp((edges.data[field].sum(-1, keepdim=True)).clamp(-5, 5))}
    return func
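Putting the two steps together, here is a condensed sketch of the flow (tensor names and sizes are illustrative, not the exact code of the layer):

import torch

num_edges, num_heads, d = 5, 8, 8              # illustrative sizes

score = torch.randn(num_edges, num_heads, d)   # implicit K*Q scores, per feature dim
edge_feat = torch.randn(num_edges, num_heads, d)

# Eqn. 12: inject edge features dimension-wise
score = score * edge_feat

# this d-dim result is what flows into the edge feature pipeline (Eqns. 10, 16-18)
e_out = score

# Eqn. 11: sum across the d dimensions -> one scalar per head, then clamp and exp
exp_score = torch.exp(score.sum(-1, keepdim=True).clamp(-5, 5))
print(e_out.shape, exp_score.shape)            # [5, 8, 8] and [5, 8, 1]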

Hope this helps with understanding the implementation.
Vijay

@vijaydwivedi75 (Member)

Closing the issue for now. Feel free to reopen if you need any further clarification.
