How to use CE regularizer in chain-model training? #11

Open
glynpu opened this issue Oct 23, 2019 · 1 comment

@glynpu

glynpu commented Oct 23, 2019

Hi,
I wonder whether it is appropriate to add the CE regularizer (`grad_xent`) directly to `grad` in chain-model training, as the current code does:
`grad.add_mat(chain_opts.xent_regularize, grad_xent)`
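For reference, here is a minimal sketch of what that line computes. It assumes `add_mat` follows the semantics of Kaldi's C++ `MatrixBase::AddMat(alpha, M)`, i.e. `grad += alpha * grad_xent` in place. The function below is a hypothetical pure-Python stand-in, not the project's code. The addition only type-checks because the chain output and the CE output both have one column per pdf. Once the two matrices are summed, the combined derivative is applied entirely to the chain `output` layer.

```python
# Sketch only: assumes grad.add_mat(alpha, M) behaves like Kaldi's C++
# MatrixBase::AddMat(alpha, M), i.e. grad <- grad + alpha * M (in place).
# Matrices are plain lists of rows with shape (num_frames, num_pdfs).

def add_mat(grad, alpha, grad_xent):
    """In-place grad += alpha * grad_xent; shapes must match exactly."""
    if len(grad) != len(grad_xent) or any(
        len(row) != len(xrow) for row, xrow in zip(grad, grad_xent)
    ):
        raise ValueError("grad and grad_xent must have the same shape")
    for row, xrow in zip(grad, grad_xent):
        for j, x in enumerate(xrow):
            row[j] += alpha * x
    return grad


# Two frames, two pdfs; 0.25 is just an example xent_regularize value.
grad = [[0.5, -0.5], [0.0, 0.25]]
grad_xent = [[1.0, 0.0], [0.0, 1.0]]
add_mat(grad, 0.25, grad_xent)
print(grad)  # [[0.75, -0.5], [0.0, 0.5]]
```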

In kaldi's chain model recipe, e.g. aishell s5. The network architecture has two branches after layer tdnn6, one for chain-model(output layer), the other one for CE(output-xent layer).
Derivative matrix grad is applied to "output", while grad_xent is applied to "output-xent".
If grad_xent is merged into grad, there will be no prefinal-xent--> output-xent branch at all.
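The difference between the two designs can be illustrated with a toy backward pass. This is a hypothetical sketch, not Kaldi's or this project's code. Both heads read one shared hidden vector `h` standing in for the tdnn6 output, the per-branch prefinal layers are folded away, and all numbers are made up. In the merged version the scaled CE derivative also updates the `output` weights. With two branches, `output` is trained only on the chain derivative, and the CE term reaches the shared layers through `output-xent`.

```python
# Toy comparison of where the scaled CE derivative lands. Hypothetical setup:
# both heads read one shared hidden vector h (standing in for the tdnn6 output),
# each head is a bias-free affine map, and every value below is made up.
# The real recipe also has a prefinal layer per branch, omitted here.

def transpose_matvec(W, g):
    """Return W^T g for W of shape (out_dim, in_dim) and g of length out_dim."""
    return [sum(W[i][j] * g[i] for i in range(len(g))) for j in range(len(W[0]))]

def outer(g, h):
    """Weight gradient of a bias-free affine layer: the outer product g h^T."""
    return [[gi * hj for hj in h] for gi in g]

def merged(h, W_out, grad, grad_xent, alpha):
    # Single head: the CE derivative is folded into the chain derivative,
    # so the 'output' weights are also updated by the CE term.
    d_out = [g + alpha * x for g, x in zip(grad, grad_xent)]
    return {"dW_out": outer(d_out, h), "dh": transpose_matvec(W_out, d_out)}

def two_branch(h, W_out, W_xent, grad, grad_xent, alpha):
    # Separate heads: 'output' sees only the chain derivative and
    # 'output-xent' sees only the scaled CE derivative; they meet in h.
    d_xent = [alpha * x for x in grad_xent]
    dh_chain = transpose_matvec(W_out, grad)
    dh_xent = transpose_matvec(W_xent, d_xent)
    return {
        "dW_out": outer(grad, h),
        "dW_xent": outer(d_xent, h),
        "dh": [a + b for a, b in zip(dh_chain, dh_xent)],
    }


h = [1.0, 2.0]
W_out = [[1.0, 0.0], [0.0, 1.0]]
W_xent = [[2.0, 0.0], [0.0, 2.0]]
grad, grad_xent, alpha = [0.5, -0.5], [1.0, 0.0], 0.25

m = merged(h, W_out, grad, grad_xent, alpha)
t = two_branch(h, W_out, W_xent, grad, grad_xent, alpha)
print(m["dW_out"])  # [[0.75, 1.5], [-0.5, -1.0]]  (CE term leaks into 'output')
print(t["dW_out"])  # [[0.5, 1.0], [-0.5, -1.0]]   (chain derivative only)
```

In the two-branch version the CE term still regularizes the shared layers (it contributes to `dh`), but it never touches the `output` weights directly.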


@jzlianglu
Owner

That is a good question; I missed that. Indeed, with the current implementation, CE regularization does not work in my experiments. It is a bit unclear to me why CE regularization needs a separate branch. For lattice-based sequence training, I did not use a second branch, and CE regularization worked well. I'll run some comparisons and update the code. Thanks.
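One possible reading of why the single-output approach can work for lattice-based training is sketched below. It assumes, though the thread does not confirm this, that the sequence-trained model ends in a softmax over pdfs. In that case the CE gradient with respect to the same logits is `softmax(z) - onehot(target)`, so adding it to the sequence gradient is a consistent interpolated objective on one output. In Kaldi's chain xconfig recipes, by contrast, the `output` layer is typically declared with `include-log-softmax=false`, while `output-xent` keeps the log-softmax. All function names and values here are hypothetical.

```python
import math

# Sketch under an assumption the thread does not confirm: the sequence-trained
# model ends in a softmax over pdfs, and gradients are of a loss to minimize.

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def ce_grad_wrt_logits(z, target):
    """Gradient of -log softmax(z)[target] w.r.t. z: softmax(z) - onehot(target)."""
    p = softmax(z)
    return [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]

def interpolated_grad(seq_grad, z, target, alpha):
    """Single-output regularization: seq_grad + alpha * CE gradient on the same logits."""
    ce = ce_grad_wrt_logits(z, target)
    return [g + alpha * c for g, c in zip(seq_grad, ce)]


# Hypothetical values for one frame with two pdfs.
print(interpolated_grad([0.1, -0.1], [0.0, 0.0], target=0, alpha=0.5))
```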
