The graph adjacency matrix format for the model #1

Closed
Punchwes opened this issue Apr 28, 2021 · 1 comment

Punchwes commented Apr 28, 2021

Hi @alirezamshi, thanks very much for sharing this work, very interesting indeed.

I've got a question on the exact labelled graph format when using your example input:

# sample input
input = torch.tensor([[1, 2], [3, 4]])
graph = torch.tensor([[[2, 0], [0, 3]], [[0, 1], [4, 0]]])
output = encoder(input_ids=input, graph_arc=graph)

As you can see above, the input sequence shape is (2, 2), assuming batch size 2 and sentence length 2.
The corresponding graph shape is (2, 2, 2), assuming batch size 2 with a seq_len × seq_len matrix per example, where each value represents a specific relation.

I am a little bit confused about this graph adjacency matrix format: does 0 represent the relation indexed at 0, or simply no connection? I cannot see how you distinguish between these two scenarios, i.e. how you would represent both "no connection" and relation 0 in the same matrix. (If you don't have a relation 0, then I guess the embedding layer in your graph model should have padding_idx=0, so that the 0 position always stays zero and is not attended to. Currently it seems that the None/0 entry gets an embedding that is updated at every step, which behaves more like a connected relation.)

It would be great if you could give me some information on this matter.

(Currently, in order to allow both "no connection" and relation type 0, I reserve a padding index, which I also use when constructing the nn.Embedding layers for dp_relation_k and dp_relation_v, e.g.:
self.dp_relation_k = nn.Embedding(2*config.label_size+2, self.attention_head_size, padding_idx=2*config.label_size+1)
I am not sure whether this is necessary, or whether I have simply misunderstood your code w.r.t. the labelled graph.)
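
For concreteness, here is a minimal sketch of how padding_idx behaves in nn.Embedding (toy sizes; rel_emb and the dimensions are my own names, not from the repo): the padding row is initialized to zeros and its gradient is always zero, so it is never updated during training:

import torch
import torch.nn as nn

label_size = 5                  # hypothetical label-set size
pad_idx = 2 * label_size + 1    # reserved "no connection" index, as above

# 2*label_size directed relations + "no relation" at 0 + one padding slot.
# The padding row stays a zero vector and receives no gradient.
rel_emb = nn.Embedding(2 * label_size + 2, 8, padding_idx=pad_idx)

out = rel_emb(torch.tensor([0, pad_idx]))
out.sum().backward()
print(out[1])                        # zero vector (the padding row)
print(rel_emb.weight.grad[pad_idx])  # all zeros: never updated
print(rel_emb.weight.grad[0])        # nonzero: relation 0 *is* learned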

Many thanks

@alirezamshi-zz (Collaborator) commented

Thanks, @Punchwes. That's a great question. Just to clarify: the sample input mentioned in the readme file is a general graph, not necessarily a valid dependency graph. You can find an example of a graph matrix for a dependency graph in Figure 2 of the paper.

For your second question, as mentioned after Equation 4 of the paper, the element r_ij is a one-hot vector representing the relation between tokens x_i and x_j. For example, if there is a relation from x_i to x_j, then r_ij equals the id of that label in the lookup table, and r_ji is the same id plus the size of the label set, which models the direction. If there is no relation between x_i and x_j, then we put 0 in both r_ij and r_ji, which denotes the "no relation" label, and it is learned during training. The reason we also learn the "no relation" label is that it gives the model information about the independence of these two tokens. But you can freeze it in your application if you think there is no need to provide this information to the model.
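
To make the scheme concrete, here is a minimal sketch with toy numbers (my own, not from the repo): entry (i, j) holds the label id of a relation from x_i to x_j, entry (j, i) holds the same id plus the label-set size, and 0 everywhere else marks "no relation":

import torch

label_size = 10   # hypothetical size of the dependency label set
seq_len = 3

# One arc x_0 -> x_2 with label id 4 (ids start at 1 so that 0 is free
# to mean "no relation", itself a learned label).
graph = torch.zeros(seq_len, seq_len, dtype=torch.long)
graph[0, 2] = 4               # forward direction: the label id
graph[2, 0] = 4 + label_size  # reverse direction: id + label_size

If you would rather freeze the "no relation" label, one simple option (an assumption on my part, not something the repo necessarily does) is to zero out that row's gradient before each optimizer step, e.g. rel_emb.weight.grad[0].zero_().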

If you have more questions, feel free to post them here or email me.

Best regards,
