
Add Libriheavy small and medium ASR2 recipes #5512

Merged (3 commits) on Nov 8, 2023

Conversation

@akreal (Contributor) commented Oct 30, 2023

What?

ASR2 recipe for Libriheavy medium subset.

Why?

I'm running experiments with a causal LM and ASR2 on this dataset and would like to have a comparison point without a causal LM.

See also

https://github.com/k2-fsa/libriheavy

I'll add results and a model link in about a week.

@akreal marked this pull request as draft October 30, 2023 13:14
codecov bot commented Oct 30, 2023

Codecov Report

Merging #5512 (10c3bb8) into master (0d0ab98) will decrease coverage by 12.34%.
Report is 5 commits behind head on master.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           master    #5512       +/-   ##
===========================================
- Coverage   70.31%   57.97%   -12.34%     
===========================================
  Files         711      710        -1     
  Lines       65757    65670       -87     
===========================================
- Hits        46237    38074     -8163     
- Misses      19520    27596     +8076     
Flag                       Coverage Δ
test_integration_espnet2   48.61% <ø> (ø)
test_python_espnet1        ?
test_python_espnet2        51.36% <ø> (+<0.01%) ⬆️
test_utils                 22.19% <ø> (ø)

Flags with carried forward coverage won't be shown.

see 130 files with indirect coverage changes


@ftshijt added the Recipe and ASR (Automatic speech recognition) labels Oct 30, 2023
@ftshijt added this to the v.202312 milestone Oct 30, 2023
@akreal changed the title from "[WIP] Add Libriheavy medium ASR2 recipe" to "[WIP] Add Libriheavy small and medium ASR2 recipes" Nov 3, 2023
@akreal changed the title from "[WIP] Add Libriheavy small and medium ASR2 recipes" to "Add Libriheavy small and medium ASR2 recipes" Nov 3, 2023
@akreal marked this pull request as ready for review November 3, 2023 13:18
@akreal (Contributor, Author) commented Nov 3, 2023

I also added the recipe for the small subset. It should be ready for review now.

The official baseline seems to do case-sensitive scoring for the non-normalized text, so I added -s to score_opts in these recipes.
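
To illustrate why case sensitivity matters for the non-normalized text, here is a minimal sketch that counts word-level edit errors with and without case folding. It is a plain Levenshtein distance over words, not sclite's actual alignment, and the function name `word_errors` is hypothetical.

```python
def word_errors(ref: str, hyp: str, case_sensitive: bool = True) -> int:
    """Count word-level edits (substitutions, insertions, deletions).

    Illustrative only: a simple Levenshtein distance, not sclite's scorer.
    """
    if not case_sensitive:
        # Fold case so that "The" and "the" compare equal.
        ref, hyp = ref.lower(), hyp.lower()
    ref_words, hyp_words = ref.split(), hyp.split()

    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]


reference = "The cat sat"
hypothesis = "the cat sat"
print(word_errors(reference, hypothesis, case_sensitive=True))
print(word_errors(reference, hypothesis, case_sensitive=False))
```

With case-sensitive scoring, a hypothesis that differs from the reference only in capitalization is counted as a substitution, which is the behaviour wanted when comparing against a baseline scored on cased text.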

sclite skips all lines starting with ** and prints a warning for all lines starting with * (it's hardcoded as a comment character). I've added removal of leading * characters from texts in the scoring stage of asr2.sh. I've manually verified on the dev set that it works as intended, and it should have no effect on other recipes.
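
The asterisk handling can be sketched roughly as follows. This is not the code added to asr2.sh; it is a hypothetical Python equivalent that assumes Kaldi-style "utt_id transcript" lines, and the helper names `strip_leading_asterisks` and `clean_text_line` are made up for illustration.

```python
import re

# Matches one or more asterisks at the very start of a transcript,
# plus any whitespace around them. Texts without a leading "*" never match.
_LEADING_STARS = re.compile(r"^\s*\*+\s*")


def strip_leading_asterisks(text: str) -> str:
    """Remove leading '*' characters so the line is not read as a comment."""
    return _LEADING_STARS.sub("", text)


def clean_text_line(line: str) -> str:
    """Clean one 'utt_id transcript' line (assumed format), keeping utt_id as-is."""
    utt_id, _, text = line.rstrip("\n").partition(" ")
    text = strip_leading_asterisks(text)
    return f"{utt_id} {text}" if text else utt_id


print(clean_text_line("utt1 *Chapter one\n"))
print(clean_text_line("utt2 No asterisk here\n"))
```

Because only asterisks at the start of a transcript are touched, texts that contain no leading asterisk pass through unchanged, which matches the expectation that other recipes are unaffected.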

@sw005320 (Contributor) left a comment


Thanks, @akreal!

@sw005320 added the auto-merge (Enable auto-merge) label Nov 8, 2023
mergify bot merged commit d610dbc into espnet:master Nov 8, 2023
25 checks passed
Labels: ASR (Automatic speech recognition), auto-merge (Enable auto-merge), ESPnet2, README, Recipe

3 participants