The graph adjacency matrix format for the model #1

Closed
Punchwes opened this issue Apr 28, 2021 · 1 comment

Punchwes commented Apr 28, 2021

Hi @alirezamshi, thanks very much for sharing this work, very interesting indeed.

I've got a question on the exact labelled graph format when using your example input:

# sample input
input = torch.tensor([[1, 2], [3, 4]])
graph = torch.tensor([[[2, 0], [0, 3]], [[0, 1], [4, 0]]])
output = encoder(input_ids=input, graph_arc=graph)

As you can see above, the input sequence shape is (2, 2), assuming batch size 2 and sentence length 2.
The corresponding graph shape is (2, 2, 2), assuming batch size 2 with a seq_len × seq_len matrix per example, where each value represents a specific relation.

I am a little bit confused about this graph adjacency matrix format: does 0 represent the relation indexed at 0, or simply no connection? I cannot see how you distinguish between these two scenarios, i.e. how you would represent both "no connection" and relation 0 in the same matrix. (If you don't have a relation 0, then I guess the embedding layer in your graph model should have padding_idx=0, so that the 0 position always stays zero and is not attended to. Currently it seems that the None/0 entry gets an embedding that is updated at every step, which behaves more like a connected relation.)

It would be great if you could give me some information on this matter.

(Currently, in order to allow both "no connection" and relation type 0, I reserve a padding index, which I also use when constructing the nn.Embedding layers for dp_relation_k and dp_relation_v, e.g.:
self.dp_relation_k = nn.Embedding(2*config.label_size+2, self.attention_head_size, padding_idx=2*config.label_size+1)
I am not sure whether this is necessary, or whether I have simply misunderstood your code w.r.t. the labelled graph.)
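
For concreteness, here is a minimal sketch of how padding_idx behaves in nn.Embedding (toy sizes; rel_emb and the dimensions are my own names, not from the repo): the padding row is initialized to zeros and its gradient is always zero, so it is never updated during training:

import torch
import torch.nn as nn

label_size = 5                  # hypothetical label-set size
pad_idx = 2 * label_size + 1    # reserved "no connection" index, as above

# 2*label_size directed relations + "no relation" at 0 + one padding slot.
# The padding row stays a zero vector and receives no gradient.
rel_emb = nn.Embedding(2 * label_size + 2, 8, padding_idx=pad_idx)

out = rel_emb(torch.tensor([0, pad_idx]))
out.sum().backward()
print(out[1])                        # zero vector (the padding row)
print(rel_emb.weight.grad[pad_idx])  # all zeros: never updated
print(rel_emb.weight.grad[0])        # nonzero: relation 0 *is* learned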

Many thanks

@alirezamshi-zz (Collaborator) commented

Thanks, @Punchwes. That's a great question. Just to clarify: the sample input mentioned in the readme file is a general graph, not necessarily a valid dependency graph. You can find an example of a graph matrix for a dependency graph in Figure 2 of the paper.

For your second question, as mentioned after Equation 4 of the paper, the element r_ij is a one-hot vector representing the relation between tokens x_i and x_j. For example, if there is a relation from x_i to x_j, then r_ij equals the id of that label in the lookup table, and r_ji is the same id plus the size of the label set, which models the direction. If there is no relation between x_i and x_j, then we put 0 in both r_ij and r_ji, which denotes the "no relation" label, and it is learned during training. The reason we also learn the "no relation" label is that it gives the model information about the independence of these two tokens. But you can freeze it in your application if you think there is no need to provide this information to the model.
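
To make the scheme concrete, here is a minimal sketch with toy numbers (my own, not from the repo): entry (i, j) holds the label id of a relation from x_i to x_j, entry (j, i) holds the same id plus the label-set size, and 0 everywhere else marks "no relation":

import torch

label_size = 10   # hypothetical size of the dependency label set
seq_len = 3

# One arc x_0 -> x_2 with label id 4 (ids start at 1 so that 0 is free
# to mean "no relation", itself a learned label).
graph = torch.zeros(seq_len, seq_len, dtype=torch.long)
graph[0, 2] = 4               # forward direction: the label id
graph[2, 0] = 4 + label_size  # reverse direction: id + label_size

If you would rather freeze the "no relation" label, one simple option (an assumption on my part, not something the repo necessarily does) is to zero out that row's gradient before each optimizer step, e.g. rel_emb.weight.grad[0].zero_().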

If you have more questions, feel free to post them here or email me.

Best regards,
