Added test for CMUArctic Dataset #829
Thanks, overall looks good. Added some comments for improvement.
test/datasets/cmuarctic_test.py
Outdated
backend = "default"

root_dir = None
URL = "aew"  # default url in CMUARCTIC
This `URL` is used in only one place, and I do not see a need for it to be a class variable, so you can inline it as a string literal.
test/datasets/cmuarctic_test.py
Outdated
utterance = "This is a test utterance."

base_dir = os.path.join(cls.root_dir, "ARCTIC", "cmu_us_" + cls.URL + "_arctic")
# Contains utterance ID & sentence prompts
Comments like this one, which add no information beyond what the code already expresses, are not necessary. You can remove them.
test/datasets/cmuarctic_test.py
Outdated
with open(txt_file, "w") as txt:
    for i in range(10):
        # Write audio file
        utterance_id = f"arctic_a{i:04d}"
Can you also add some `f"arctic_b{i:04d}"` patterns?
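One way to cover both prefixes is to build the IDs up front and loop over them. This is only a sketch: `make_utterance_ids` and `n_per_prefix` are hypothetical names, not part of the test file, and the split of five IDs per prefix is an assumption.

```python
def make_utterance_ids(n_per_prefix=5):
    """Hypothetical helper: IDs for both the "a" and "b" prompt sets."""
    return [
        f"arctic_{prefix}{i:04d}"
        for prefix in ("a", "b")
        for i in range(n_per_prefix)
    ]


utterance_ids = make_utterance_ids()
for utterance_id in utterance_ids:
    # In the real test, the audio file and the prompt line for this ID
    # would be written here.
    pass
```

Iterating over the prepared list keeps the file-writing loop unchanged apart from where the ID comes from.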
test/datasets/cmuarctic_test.py
Outdated
    assert utterance == expected_sample[2]
    assert utterance_id == expected_sample[3]
    self.assertEqual(expected_sample[0], waveform, atol=5e-5, rtol=1e-8)
assert (i + 1) == len(self.samples)
Please do not use a loop variable outside the loop in which it was defined and meant to be used. This works, but it is easy for developers who work on this code later to miss. Define a dedicated variable for the count instead.
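A minimal sketch of the dedicated-counter pattern, using plain lists in place of the dataset. The names `check_all_samples` and `n_iter` are illustrative assumptions, not taken from the PR.

```python
def check_all_samples(dataset, expected_samples):
    """Compare each sample with its expectation and verify the total count."""
    n_iter = 0  # dedicated counter, independent of any loop variable
    for sample in dataset:
        assert sample == expected_samples[n_iter]
        n_iter += 1
    # The final check does not rely on a variable leaked from the loop,
    # and it also works when the dataset is empty.
    assert n_iter == len(expected_samples)
    return n_iter
```

A side benefit is that the count check no longer raises `NameError` when the loop body never runs.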
@@ -32,10 +32,6 @@ def test_speechcommands(self):
        data = SPEECHCOMMANDS(self.path)
        data[0]

    def test_cmuarctic(self):
Can you remove the corresponding asset and `import` statement?
duration=3,
n_channels=1,
dtype="int16",
seed=seed,
Please use a different `seed` value for each generated sample. Otherwise all the loaded Tensors have exactly the same shape and values, which defeats the purpose of comparing the loaded Tensor objects.
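To illustrate the point, here is a small sketch that uses the standard library instead of the project's noise generator. `fake_noise` is a hypothetical stand-in, and `base_seed` and the sample sizes are arbitrary assumptions.

```python
import random


def fake_noise(n_samples, seed):
    """Hypothetical stand-in for the noise generator used in the test."""
    rng = random.Random(seed)
    return [rng.randint(-32768, 32767) for _ in range(n_samples)]


base_seed = 12

# Reusing one seed yields identical data for every sample, so a loader
# that mixed up the files would still pass the comparison.
same_seed = [fake_noise(16, seed=base_seed) for _ in range(3)]

# Offsetting the seed by the sample index gives each sample distinct data.
distinct = [fake_noise(16, seed=base_seed + i) for i in range(3)]
```

With distinct data per sample, a mismatch between file and expectation shows up as a failed comparison.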
Could you rebase onto the latest master?
Codecov Report
@@           Coverage Diff           @@
##           master    #829   +/-   ##
=======================================
  Coverage   89.99%   89.99%
=======================================
  Files          35       35
  Lines        2719     2719
=======================================
  Hits         2447     2447
  Misses        272      272
Co-authored-by: lawrencechen <lawrencechen@devvm3189.vll0.facebook.com>
* Add test for CommonVoice dataset
* Migrate the existing tests for `bg_iterator` and `diskcache_iterator` to `test/datasets/utils_test.py`

Co-authored-by: Leon Gao <legao@linkedin.com>
…to cmuarctic_test
Almost good, but the seed value is not quite right.
Looks good. Thanks!
PR linked to #821
I have used the same dummy utterance for all the emulated samples.
Requested review: @mthrok