
Question about alphabet #20

Closed · ghost opened this issue May 28, 2018 · 4 comments

@ghost commented May 28, 2018

Hello, I have a question about the alphabet.
In alphabet.py, the function size returns len(self.instances) + 1; I think the extra slot is for the padding token /pad. But in seqmodel.py, why do we have to add two more labels for the downstream LSTM? The comment says the original label size is used for the CRF, yet in the CRF model the transition matrix still adds "START" and "END". This confuses me.
Also, if I do not use the CRF, _, tag_seq = torch.max(outs, 1) may produce a wrong index.
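Concretely, here is the bookkeeping I mean (a rough sketch with made-up names, not the repo's exact code):

# Rough sketch of the size bookkeeping (hypothetical, not the repo's code).
instances = ["label1", "label2"]       # the real labels
alphabet_size = len(instances) + 1     # +1: index 0 is reserved for padding
output_size = alphabet_size + 2        # +2: extra slots for START/END
print(alphabet_size, output_size)      # prints: 3 5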
Thank you~

@ghost (Author) commented May 28, 2018

Sorry, maybe I have not described the problem clearly.
For example, I have a label set {"label1": 1, "label2": 2}, so the function size returns 3, since there are 2 instances plus one slot for padding.
When use_crf=False:

# add two more label for downlayer lstm, use original label size for CRF
label_size = data.label_alphabet_size
data.label_alphabet_size += 2
# so the hidden2tag 
self.hidden2tag = nn.Linear(data.HP_hidden_dim, data.label_alphabet_size)

The output size of hidden2tag is therefore 5. When I then use _, tag_seq = torch.max(outs, 1), the result may be any index in 0-4, but the label alphabet only has two real labels, at index 1 and index 2.
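A tiny runnable illustration of the concern (the shapes and scores here are made up, not taken from the repo):

import torch

label_alphabet_size = 3                          # 2 real labels + 1 padding slot
outs = torch.randn(4, label_alphabet_size + 2)   # 4 tokens, 5 output scores each

_, tag_seq = torch.max(outs, 1)                  # argmax over the class dimension
print(tag_seq)   # with untrained/random scores, any index 0-4 can appear,
                 # but only indices 1 and 2 map to real labels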
Thank you.

@jiesutd (Owner) commented May 28, 2018

I think I get your point. I also had this concern while writing this framework.

The "START" and "END" labels are there for the CRF calculation. In the CRF layer, I set some of the default transition scores to -10000 so that "START" and "END" are never output.

To keep the code simple, "START" and "END" are also added in the model with the softmax output. In theory that model could decode the invalid labels "START"/"END", but this is almost impossible on real data: once the model is trained, it will not decode labels that never appear in the training data.
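Roughly, the masking works like this (a sketch of the idea only, with hypothetical indices; I assume here that transitions[from_tag, to_tag] holds the transition score):

import torch

# Sketch only: 3 real tags plus START and END (indices are hypothetical).
tagset_size = 5
START_TAG, END_TAG = 3, 4

transitions = torch.randn(tagset_size, tagset_size)  # transitions[from, to]
transitions[:, START_TAG] = -10000.0   # no tag may transition INTO "START"
transitions[END_TAG, :] = -10000.0     # no tag may transition OUT OF "END"
# With scores this low, Viterbi decoding will never pick these transitions.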

jiesutd closed this as completed May 28, 2018
@ghost (Author) commented May 28, 2018

Thank you.
In other words, once training is done, _, tag_seq = torch.max(outs, 1) will almost never produce an invalid output.

@jiesutd (Owner) commented May 28, 2018

Exactly!
