Add test for LibriSpeech dataset #825
Codecov Report
@@ Coverage Diff @@
## master #825 +/- ##
=======================================
Coverage 89.99% 89.99%
=======================================
Files 35 35
Lines 2719 2719
=======================================
Hits 2447 2447
Misses 272 272
Looks good. Left some comments for further improvement
test/datasets/librispeech_test.py
Outdated
        f.write('\n'.join(trans_content))

    def test_librispeech(self):
        dataset = librispeech.LIBRISPEECH(self.root_dir, ext_audio='.wav')
Instead of adding an ext_audio argument, you can simply set an instance attribute, which shadows the corresponding class attribute:
    dataset = librispeech.LIBRISPEECH(self.root_dir)
    dataset._ext_audio = 'flac'
This way, the LIBRISPEECH class can stay as is.
I tried this originally, but _ext_audio is used by the walker in __init__, so it appears to look for the files before the extension can be updated.
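The ordering problem described above can be reproduced with a small stand-in class. This is only a sketch: StandInDataset and its file list are hypothetical and merely mimic a constructor that reads _ext_audio while building its walker. It is not the actual LIBRISPEECH implementation.

```python
class StandInDataset:
    """Hypothetical stand-in: filters files by extension inside __init__."""

    _ext_audio = ".flac"

    def __init__(self, filenames):
        # The extension is read once, here, while the walker is built.
        self.walker = [name for name in filenames if name.endswith(self._ext_audio)]


files = ["utt-0001.wav", "utt-0002.wav"]

dataset = StandInDataset(files)
# Setting the instance attribute now is too late: the walker already exists
# and was built with the class default (".flac").
dataset._ext_audio = ".wav"

walker_after_late_override = dataset.walker
```

The instance attribute does shadow the class attribute afterwards, but the walker was computed during construction, so the .wav files are never picked up.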
You are right. In that case, you can temporarily override the class attribute, e.g. librispeech.LIBRISPEECH._ext_audio = 'wav', before instantiating it. Do this in the setUp method, then revert it in the tearDown method so that a test failure does not leave the default extension changed.
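A minimal sketch of that setUp/tearDown pattern follows. It uses a hypothetical StandInDataset class instead of the real LIBRISPEECH so that it runs on its own. The attribute name _ext_audio comes from the discussion, while the class, the test names, and the extension values are assumptions.

```python
import unittest


class StandInDataset:
    """Hypothetical stand-in for a dataset that reads _ext_audio in __init__."""

    _ext_audio = ".flac"

    def __init__(self):
        # The extension is consumed at construction time.
        self.ext_seen_at_init = self._ext_audio


class OverrideExtensionTest(unittest.TestCase):
    def setUp(self):
        # Remember the default, then override it on the class itself
        # so that __init__ sees the new value.
        self._original_ext = StandInDataset._ext_audio
        StandInDataset._ext_audio = ".wav"

    def tearDown(self):
        # Runs even when the test fails, so the default is always restored.
        StandInDataset._ext_audio = self._original_ext

    def test_init_sees_override(self):
        dataset = StandInDataset()
        self.assertEqual(dataset.ext_seen_at_init, ".wav")
```

Saving the original value in setUp, rather than hard-coding it in tearDown, keeps the test correct even if the library's default extension changes later.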
test/datasets/librispeech_test.py
Outdated
        utterance = ' '.join(
            [NUMBERS[int(x)] for x in list(
                str(speaker_id) + str(chapter_id) + str(utterance_id)
Instead of converting the integers to a list of characters and then back to integers, you can simply do NUMBERS[x] for x in [speaker_id, chapter_id, utterance_id]
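For comparison, the two ways of building the utterance string might look like the sketch below. The contents of NUMBERS and both function names are assumptions made for illustration, since the PR's actual table is not shown here. Note that the direct-indexing variant only works while every ID is a valid index into NUMBERS.

```python
# Hypothetical lookup table; the real NUMBERS in the test may differ.
NUMBERS = ["zero", "one", "two", "three", "four",
           "five", "six", "seven", "eight", "nine"]


def utterance_from_digits(speaker_id, chapter_id, utterance_id):
    """Original approach: spell out every digit of the concatenated IDs."""
    digits = str(speaker_id) + str(chapter_id) + str(utterance_id)
    return " ".join(NUMBERS[int(d)] for d in digits)


def utterance_from_ids(speaker_id, chapter_id, utterance_id):
    """Suggested approach: index NUMBERS with each ID directly.

    Assumes each ID is smaller than len(NUMBERS).
    """
    return " ".join(NUMBERS[i] for i in (speaker_id, chapter_id, utterance_id))
```

With single-digit IDs the two functions produce the same string; with larger IDs only the digit-based version stays within the table.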
Looks good. Thanks!
Co-authored-by: Holly Sweeney <77758406+holly1238@users.noreply.github.com>
LibriSpeech test using emulated data as part of #821.