
New beam-search framework: ScorerInterface, CPU/GPU float16/32/64 decoding, and new language models (SeqRNNLM and TransformerLM) #1092

Merged: 52 commits, Aug 19, 2019

Conversation

@ShigekiKarita (Member) commented Aug 15, 2019

This PR refactors beam search as discussed in #489

I will implement these essential features so as not to break the current ones:

  • custom language model training with lm_train.py --model-module xxx
  • ASR decoding with a custom language model via asr_recog.py --lm xxx (the new option is --lm rather than --rnnlm; I will leave the existing --rnnlm as-is)

In this PR, I will not support the other existing features with custom LMs, such as ASR training with custom LMs, GPU batch decoding, streaming decoding, and word LMs.
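For context, every decoder and LM in the new framework implements a common scorer API that beam search queries step by step. A minimal sketch of the idea (illustrative, not the exact code; the actual definition lives in espnet/nets/scorer_interface.py):

import torch

class ScorerInterface:
    """What beam search expects from any scorer (decoder, LM, CTC, ...)."""

    def init_state(self, x: torch.Tensor):
        """Return the initial decoding state for encoder features x."""
        return None

    def score(self, y: torch.Tensor, state, x: torch.Tensor):
        """Score the hypothesis prefix y (token ids).

        Returns (log-probabilities over the vocabulary, updated state).
        """
        raise NotImplementedError

    def final_score(self, state) -> float:
        """Score added once when a hypothesis is finalized (default: 0)."""
        return 0.0

Beam search can then combine any set of such scorers with per-scorer weights, which is what makes custom LMs pluggable.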

@ShigekiKarita added the WIP (work in process) and New Features labels Aug 15, 2019
codecov bot commented Aug 15, 2019

Codecov Report

Merging #1092 into master will increase coverage by 2.29%.
The diff coverage is 87.11%.


@@            Coverage Diff             @@
##           master    #1092      +/-   ##
==========================================
+ Coverage   76.36%   78.65%   +2.29%     
==========================================
  Files          83       91       +8     
  Lines        7810     8045     +235     
==========================================
+ Hits         5964     6328     +364     
+ Misses       1846     1717     -129
Impacted Files | Coverage Δ
espnet/nets/scorers/ctc.py | 100% <100%> (ø)
espnet/nets/pytorch_backend/transformer/decoder.py | 95.74% <100%> (+0.62%) ⬆️
espnet/nets/pytorch_backend/ctc.py | 96.07% <100%> (ø) ⬆️
espnet/nets/pytorch_backend/e2e_asr_transformer.py | 82.52% <100%> (+12.68%) ⬆️
espnet/nets/pytorch_backend/transformer/mask.py | 100% <100%> (ø)
espnet/nets/scorers/length_bonus.py | 100% <100%> (ø)
espnet/nets/pytorch_backend/rnn/attentions.py | 98.15% <100%> (ø) ⬆️
espnet/nets/pytorch_backend/e2e_asr.py | 68.42% <100%> (+0.08%) ⬆️
...pnet/nets/pytorch_backend/transformer/attention.py | 100% <100%> (ø) ⬆️
espnet/lm/chainer_backend/lm.py | 34.58% <33.33%> (-0.05%) ⬇️
... and 20 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@sw005320 (Contributor) commented

> and I will not support the other existing features with custom LMs like ASR training with custom LMs, GPU batch ASR decoding, etc.

Actually, the current GPU batch ASR decoding does nothing special for rnnlm. See:

def buff_predict(self, state, x, n):
    if self.predictor.__class__.__name__ == 'RNNLM':
        return self.predict(state, x)
    new_state = []
    new_log_y = []
    for i in range(n):
        state_i = None if state is None else state[i]
        state_i, log_y = self.predict(state_i, x[i].unsqueeze(0))
        new_state.append(state_i)
        new_log_y.append(log_y)
    return new_state, torch.cat(new_log_y)

So, you could just do the same thing for now.
I want to revisit it after #980.

@sw005320 changed the base branch from master to v.0.6.0 on August 15, 2019 16:53
@sw005320 added this to the v.0.6.0 milestone Aug 15, 2019
@ShigekiKarita (Member, Author) commented

@sw005320 As there is no breaking change, I think master is better.

@sw005320 (Contributor) commented

> @sw005320 As there is no breaking change, I think master is better.

OK.

@sw005320 changed the base branch from v.0.6.0 to master on August 15, 2019 17:22
@sw005320 modified the milestones: v.0.6.0, v.0.5.2 Aug 15, 2019
@ShigekiKarita (Member, Author) commented

I have finished the easy ones, Transformer and length penalty, because they are stateless in the current implementation. Tomorrow I will implement the stateful ones (RNN ASR, RNN LM, CTC prefix score). Currently, the old and new implementations produce exactly the same scores. However, IMO, I do not want to reimplement this local pruning with ctc_beam (its value is hard-coded):

if lpz is not None:
    local_best_scores, local_best_ids = torch.topk(
        local_att_scores, ctc_beam, dim=1)
    ctc_scores, ctc_states = ctc_prefix_score(
        hyp['yseq'], local_best_ids[0], hyp['ctc_state_prev'])
    local_scores = \
        (1.0 - ctc_weight) * local_att_scores[:, local_best_ids[0]] \
        + ctc_weight * torch.from_numpy(ctc_scores - hyp['ctc_score_prev'])
    if rnnlm:
        local_scores += recog_args.lm_weight * local_lm_scores[:, local_best_ids[0]]
    local_best_scores, joint_best_ids = torch.topk(local_scores, beam, dim=1)
    local_best_ids = local_best_ids[:, joint_best_ids[0]]
else:
    local_best_scores, local_best_ids = torch.topk(local_scores, beam, dim=1)

I have no idea how to share scores between the decoder and CTC modules because every decoder runs independently in my implementation:
# scoring
for k, (d, w) in dec_weights.items():
    scores[k], states[k] = d.score(hyp.yseq, hyp.states[k], x)
    wscores += w * scores[k]

Does this ctc_beam really matter? If so, I will reconsider how to do that...

@sw005320 (Contributor) commented

> Does this ctc_beam really matter? If so, I will reconsider how to do that...

ctc_beam has a big impact when we have large numbers of output units (e.g., Japanese, Chinese, and BPE). Without it, the beam search becomes significantly slower. Actually, the current batch beam search does not have ctc_beam and suffers significant speed degradation for Japanese, Chinese, and BPE. One of the benefits of #980 is to fix this.

@ShigekiKarita (Member, Author) commented

OK, I see how to do it. For simplicity, I will add a new option like pre_beam_ratio that matches this:

ctc_beam = min(lpz.shape[-1], int(beam * CTC_SCORING_RATIO))

So beam_search first performs top-k with the pre-beam size on the pre-decoder scores (s2s, NN-LM), then adds the post-decoder scores (CTC, FST-LM?), and finally performs top-k with the full beam size.
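In other words (a self-contained toy sketch of the two-stage pruning; the function name, the 1.5 default, and the dummy CTC scorer are mine for illustration, not the PR's actual code):

import torch

def two_stage_topk(att_scores, ctc_score_fn, beam, pre_beam_ratio=1.5, ctc_weight=0.3):
    """Prune with cheap scores first, then rescore only the survivors."""
    # 1) pre-beam: top-k over the cheap scores (s2s decoder, NN-LM)
    pre_beam = min(att_scores.size(1), int(pre_beam_ratio * beam))
    _, part_ids = torch.topk(att_scores, pre_beam, dim=1)
    # 2) run the expensive scorer (e.g., CTC prefix score) only on the survivors
    joint = (1.0 - ctc_weight) * att_scores[:, part_ids[0]] \
        + ctc_weight * ctc_score_fn(part_ids[0])
    # 3) final top-k with the full beam size over the combined scores
    best_scores, joint_ids = torch.topk(joint, beam, dim=1)
    return best_scores, part_ids[:, joint_ids[0]]

# toy usage: random attention scores and a dummy CTC scorer
def dummy_ctc(ids):
    return torch.log_softmax(torch.randn(1, ids.numel()), dim=1)

att = torch.log_softmax(torch.randn(1, 5000), dim=1)
scores, ids = two_stage_topk(att, dummy_ctc, beam=10)

This keeps the expensive CTC prefix scoring at O(pre_beam) candidates per step instead of the full vocabulary, which is why it matters for large output units.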

@ShigekiKarita (Member, Author) commented

@kan-bayashi I have a suggestion about flake8-docstrings. I like it, but it is not checked by our CI, so I want CI to check it as follows:

  1. pip install flake8-docstrings in ci/install.sh
  2. change ci/test_python.sh like
# --extend-ignore for wip files for flake8-docstrings
flake8 --extend-ignore=D espnet test utils

# white list of files that should support flake8-docstrings
flake8 \
  espnet/nets/beam_search.py \
  espnet/nets/lm_interface.py \
  espnet/nets/scorer_interface.py \
  ...

See how to temporarily disable flake8-docstrings: https://stackoverflow.com/questions/41860123/how-to-ignore-installed-flake8-plugin-easily-for-one-time

@kan-bayashi (Member) commented

That's a nice idea.
Maybe a blacklist is better?
Anyway, let us introduce it.

@ShigekiKarita (Member, Author) commented

I agree. I made the blacklist. No more invalid docstrings 🍺

Now I need to fix the perplexity (ppl) things in lm.py. Just a moment.

@ShigekiKarita (Member, Author) commented

I believe it is done. Additionally, I split the flake8 part of ci/test_python.sh into ci/test_flake8.sh because it also helps rapid docstring annotation locally. I also updated doc/README.md accordingly.

@ShigekiKarita (Member, Author) commented

Fixed!

@kan-bayashi (Member) left a review:

LGTM

@sw005320 (Contributor) commented

Many thanks!

@sw005320 merged commit 4dbb467 into espnet:master Aug 19, 2019
@ShigekiKarita changed the title from "New beam-search framework: ScorerInterface" to "New beam-search framework: ScorerInterface and new language models (SeqRNNLM and TransformerLM)" Aug 20, 2019
@ShigekiKarita changed the title from "New beam-search framework: ScorerInterface and new language models (SeqRNNLM and TransformerLM)" to "New beam-search framework: ScorerInterface, CPU/GPU float16/32/64 decoding, and new language models (SeqRNNLM and TransformerLM)" Aug 21, 2019