
Test and Travis-CI #20

Merged
merged 18 commits into espnet:master on Dec 22, 2017

Conversation

ShigekiKarita (Member)

I'm working on Travis-CI #18

@ShigekiKarita ShigekiKarita mentioned this pull request Dec 21, 2017
@ShigekiKarita ShigekiKarita changed the title [WIP] Test and Travis-CI Test and Travis-CI Dec 21, 2017
@ShigekiKarita (Member Author)

@sw005320 Please merge this if it looks OK.

@ShigekiKarita (Member Author)

Just a moment, I found Travis-CI is failing. https://travis-ci.org/ShigekiKarita/espnet

@ShigekiKarita (Member Author)

ShigekiKarita commented Dec 21, 2017

Now Travis is OK. Many tests are skipped because torch is not installed, but we can wait for that (pytorch/pytorch#4178 (comment)).

acc = 0
pad_pred = y_all.data.view(pad_target.size(0), pad_target.size(1), y_all.size(1)).max(2)[1]
mask = pad_target.data != ignore_label
return torch.sum(pad_pred.masked_select(mask) == pad_target.data.masked_select(mask)) / torch.sum(mask)
Member

The final calculation torch.sum(pad_pred.masked_select(mask) == pad_target.data.masked_select(mask)) / torch.sum(mask) is int / int, so in Python 2, acc becomes 0 (I confirmed this).

We should cast to float.
I will show my modified version.

import torch

def th_accuracy(y_all, pad_target, ignore_label):
    # reshape flattened outputs to (batch, time, odim), argmax over classes
    pad_pred = y_all.data.view(pad_target.size(0), pad_target.size(1), y_all.size(1)).max(2)[1]
    mask = pad_target.data != ignore_label
    numerator = torch.sum(pad_pred.masked_select(mask) == pad_target.data.masked_select(mask))
    denominator = torch.sum(mask)
    # cast to float: in Python 2, int / int truncates to 0
    return float(numerator) / float(denominator)
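For illustration, a minimal standalone sketch (no torch; the counts are hypothetical stand-ins for the torch.sum results) of the Python 2 pitfall this cast fixes:

```python
# In Python 2, `/` on two ints truncates (3 / 4 == 0), and torch.sum on
# the comparison result yields a plain int, so acc silently became 0.
numerator = 3        # e.g. number of correct unmasked predictions
denominator = 4      # e.g. number of unmasked target positions
truncated = numerator // denominator           # what Python 2's `/` did
fixed = float(numerator) / float(denominator)  # the proposed cast
print(truncated, fixed)  # -> 0 0.75
```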

Member Author

Thanks!

-for i in range(len(ys)):
-    acc += torch.sum(pred_pad[i, :ys[i].size(0)] == ys[i].data)
-acc /= sum(map(len, ys))
+acc = th_accuracy(y_all, pad_ys_out, ignore_label=self.ignore_id)
logging.info('att loss:' + str(self.loss.data))

# show predicted character sequence for debug
Member

In the case of Chainer, the ignore label is -1.
Therefore, we should change it as follows.

# now 
                 idx_hat = np.argmax(y_hat_[y_true_ != -1], axis=1)
                 idx_true = y_true_[y_true_ != -1]
# proposed
                 idx_hat = np.argmax(y_hat_[y_true_ != self.ignore_id], axis=1)
                 idx_true = y_true_[y_true_ != self.ignore_id]
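A standalone sketch of the proposed masking, with hypothetical toy arrays (one-hot fake posteriors) in place of the real y_hat_ / y_true_:

```python
import numpy as np

ignore_id = -1                                        # Chainer's padding label
y_true_ = np.array([2, 5, 1, ignore_id, ignore_id])   # padded target labels
y_hat_ = np.eye(7)[[2, 5, 1, 0, 0]]                   # fake (T x odim) posteriors
# keep only the positions whose label is not the ignore id
idx_hat = np.argmax(y_hat_[y_true_ != ignore_id], axis=1)
idx_true = y_true_[y_true_ != ignore_id]
print(idx_hat.tolist(), idx_true.tolist())  # -> [2, 5, 1] [2, 5, 1]
```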

Member Author

OK. I think you are talking about the "show predicted character sequence for debug" part in Decoder.forward.

Member

Yes, that's right.

Reply

Sorry, but I noticed that in the current version self.ignore_id = 0. Is 0 for CTC?

Contributor

Are you talking about this?

self.ignore_id = 0 # NOTE: 0 for CTC?

This may have some problems.
@ShigekiKarita, can you tell me why you set 0 here?

Member

In the current implementation, the character indices for the decoder run from 1 to odim - 1.
Therefore, 0 is not used in the decoder, so using 0 as the ignore id might have no effect, I think.

Member

But to make it clearer, I agree with changing it to -1.

Reply

@kan-bayashi Many thanks for your reply!

@kan-bayashi (Member)

@ShigekiKarita Hi Shigeki. I added some comments.
Could you reflect them?

self.h_length = self.enc_h.shape[1]
# utt x frame x att_dim
-self.pre_compute_enc_h = F.tanh(linear_tensor(self.mlp_enc, self.enc_h))
+self.pre_compute_enc_h = linear_tensor(self.mlp_enc, self.enc_h)
Member

Do we need to apply tanh here?

Member Author

Hmm, I'm not sure why I deleted it. However, I will follow the Chainer reference implementation.

Member Author

In eq. 9 of https://arxiv.org/pdf/1609.06773.pdf, pre_compute_h corresponds to V h_l, which does not have tanh. Anyway, it might not be a big deal, because we cannot see a big difference in #9 (comment).

@ShigekiKarita (Member Author)

Thank you for many comments.

@@ -469,7 +469,7 @@ def forward(self, enc_hs_pad, enc_hs_len, dec_z, att_prev, scaling=2.0):
     self.enc_h = enc_hs_pad  # utt x frame x hdim
     self.h_length = self.enc_h.shape[1]
     # utt x frame x att_dim
-    self.pre_compute_enc_h = linear_tensor(self.mlp_enc, self.enc_h)
+    self.pre_compute_enc_h = torch.tanh(linear_tensor(self.mlp_enc, self.enc_h))
Member

The tanh op is performed only for AttDot in Chainer.
I think it is not needed for AttLoc.
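For reference, a rough numpy sketch (hypothetical shapes and variable names) of why the two attention types differ here: AttDot scores with a plain dot product, so the Chainer reference squashes both projections with tanh first, while AttLoc already passes the summed projections through a tanh inside the scoring MLP, so an extra tanh on pre_compute_enc_h is unnecessary:

```python
import numpy as np

rng = np.random.default_rng(0)
T, att_dim = 4, 3
enc_proj = rng.standard_normal((T, att_dim))  # linear_tensor(mlp_enc, enc_h)
dec_proj = rng.standard_normal(att_dim)       # projected decoder state

# AttDot-style: score is a plain dot product, so tanh both sides first
scores_dot = np.tanh(enc_proj) @ np.tanh(dec_proj)

# AttLoc-style: tanh is applied inside the scoring MLP itself,
# so enc_proj needs no tanh of its own
g = rng.standard_normal(att_dim)              # scoring vector
scores_loc = np.tanh(enc_proj + dec_proj) @ g

print(scores_dot.shape, scores_loc.shape)     # one score per frame
```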

Member Author

Oh, that's right.

@ShigekiKarita (Member Author)

Does everything look good? @kan-bayashi @sw005320

@sw005320 (Contributor)

Can you include pytorch and the others in the all target of the Makefile?

@ShigekiKarita (Member Author)

done

@sw005320 sw005320 merged commit 0c6cbdc into espnet:master Dec 22, 2017
@sw005320 sw005320 mentioned this pull request Jun 15, 2018
zhichaowang pushed a commit to zhichaowang/espnet that referenced this pull request Jan 20, 2021
mergify bot pushed a commit that referenced this pull request Jul 21, 2023
tjysdsg referenced this pull request in tjysdsg/espnet Sep 8, 2023
tjysdsg referenced this pull request in tjysdsg/espnet Dec 13, 2023
Full-size training and wav2vec2 + mbart training
4 participants