Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time sync decoding for asr #4792

Merged
merged 20 commits into from Dec 24, 2022
Merged

Conversation

brianyan918
Copy link
Contributor

@brianyan918 brianyan918 commented Nov 28, 2022

This PR supports time synchronous decoding for ASR.

Librispeech_100 results:

WER

dataset Snt Wrd Corr Sub Del Ins Err S.Err
beam20_ctc0.3/dev_clean 2703 54402 94.5 5.1 0.4 0.8 6.3 56.3
beam20_ctc0.3/dev_other 2864 50948 84.6 13.9 1.5 2.1 17.4 79.9
beam20_ctc0.3/test_clean 2620 52576 94.3 5.3 0.4 0.8 6.5 57.0
beam20_ctc0.3/test_other 2939 52343 84.7 13.7 1.6 2.0 17.3 81.6
timesync_beam20_ctc0.3/dev_clean 2703 54402 94.4 5.1 0.5 0.7 6.3 56.6
timesync_beam20_ctc0.3/dev_other 2864 50948 83.9 13.4 2.7 1.8 17.8 80.3
timesync_beam20_ctc0.3/test_clean 2620 52576 94.3 5.2 0.5 0.7 6.5 57.3
timesync_beam20_ctc0.3/test_other 2939 52343 84.1 13.4 2.4 1.8 17.7 82.2

RTF

image

Note on RTF:

  • I have not yet implemented batching for the attentional decoder's fwd pass. This means that compared to the batchified label synchronous algorithm, this time synchronous algorithm is currently slower.
  • This gap is widened on GPU inference, as shown above.

@sw005320 sw005320 added this to the v.202211 milestone Nov 28, 2022
@sw005320 sw005320 added New Features ASR Automatic speech recogntion labels Nov 28, 2022
Copy link
Contributor

@sw005320 sw005320 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • can you add the result in egs2/librispeech_100/asr1/README.md?
  • does this support the GPU inference? (if so, please add some comments)
  • add some test functions. Please check some examples in the normal beam search

sos=asr_model.sos,
decoder=asr_model.decoder,
lm=scorers["lm"] if "lm" in scorers else None,
ctc_weight=ctc_weight,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the weight and scores to be the same interface as the normal BeamSearch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if time_sync:
beam_search = TimeSyncBeamSearch(
beam_size=beam_size,
ctc=asr_model.ctc,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some error handling when there is no ctc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

lambda: (float("-inf"), float("-inf"))
) # (p_nb, p_b)
tmp = []
for l in hyps:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend you add some logging.debug functions of how the hypothesis grows in each time step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

beam_size: num hyps
sos: sos index
ctc: CTC module
pre_beam_ratio: pre_beam_ratio * beam_size = pre_beam
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to have more explanations of pre_beam

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@sw005320
Copy link
Contributor

@YosukeHiguchi, could you also review this PR?

@mergify
Copy link
Contributor

mergify bot commented Nov 30, 2022

This pull request is now in conflict :(

@mergify mergify bot added the conflicts label Nov 30, 2022
@kan-bayashi kan-bayashi modified the milestones: v.202211, v.202301 Dec 11, 2022
@mergify mergify bot added the README label Dec 12, 2022
@mergify mergify bot removed the conflicts label Dec 12, 2022
@codecov
Copy link

codecov bot commented Dec 16, 2022

Codecov Report

Merging #4792 (796b000) into master (7a203d5) will increase coverage by 0.04%.
The diff coverage is 93.95%.

@@            Coverage Diff             @@
##           master    #4792      +/-   ##
==========================================
+ Coverage   80.62%   80.66%   +0.04%     
==========================================
  Files         535      536       +1     
  Lines       47379    47517     +138     
==========================================
+ Hits        38197    38329     +132     
- Misses       9182     9188       +6     
Flag Coverage Δ
test_integration_espnet1 66.38% <ø> (ø)
test_integration_espnet2 48.62% <18.79%> (-0.16%) ⬇️
test_python 69.17% <93.95%> (+0.07%) ⬆️
test_utils 23.34% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
espnet2/bin/asr_inference.py 85.18% <57.89%> (-1.21%) ⬇️
espnet/nets/beam_search_timesync.py 99.23% <99.23%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@brianyan918
Copy link
Contributor Author

@sw005320 I think this passed CI except for an unrelated test

@sw005320
Copy link
Contributor

@sw005320 I think this passed CI except for an unrelated test

I'm working on it #4826

@sw005320
Copy link
Contributor

Can you add a unit test?

@brianyan918
Copy link
Contributor Author

Can you add a unit test?

Sorry for the delay. I added a unit test; it is very basic and just checks that the beam search produces some result. I do not compare to any hand computed values.

Copy link
Contributor

@sw005320 sw005320 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two comments

  • naming: by following beam_search.py and beam_search_transducer.py, it would be better to name it as beam_search_timesync.py. Please also change the other names (e.g., the class name TimeSyncBeamSearch --> BeamSearchTimeSync accordingly.
  • I'm thinking about where is the best place to put this function. We generally want to add a new function under the espnet2 directory (e.g., espnet2/asr). But at the same time, putting it in its current place also makes sense in terms of putting all beam search-related functions in the same directory. I think we can leave it based on your idea, but I just added this remark for the future beam search design.

@brianyan918
Copy link
Contributor Author

Thanks @sw005320. This is fixed now.

@sw005320 sw005320 added the auto-merge Enable auto-merge label Dec 24, 2022
@sw005320
Copy link
Contributor

Thanks, @brianyan918 !

@mergify mergify bot merged commit 2096f15 into espnet:master Dec 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ASR Automatic speech recogntion auto-merge Enable auto-merge ESPnet1 ESPnet2 New Features README
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants