Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added test for CMUArctic Dataset #829

Merged
merged 17 commits into from Jul 28, 2020
Merged

Conversation

subramen
Copy link
Contributor

@subramen subramen commented Jul 27, 2020

PR linked to #821
I have used the same dummy utterance for all the emulated samples.

Requested review: @mthrok

@subramen subramen marked this pull request as draft July 27, 2020 17:37
@subramen subramen marked this pull request as ready for review July 27, 2020 18:41
Copy link
Collaborator

@mthrok mthrok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, overall looks good. Added some comments for improvement.

backend = "default"

root_dir = None
URL = "aew" # default url in CMUARCTIC
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this URL is used in only one place, and I do not see a need this to be class variable, so you can merge this with string literal.

utterance = "This is a test utterance."

base_dir = os.path.join(cls.root_dir, "ARCTIC", "cmu_us_" + cls.URL + "_arctic")
# Contains utterance ID & sentence prompts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of comments, ones not adding more information than what code expresses are not necessary. You can remove them.

with open(txt_file, "w") as txt:
for i in range(10):
# Write audio file
utterance_id = f"arctic_a{i:04d}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add some of f"arctic_b{i:04d}" patterns?

assert utterance == expected_sample[2]
assert utterance_id == expected_sample[3]
self.assertEqual(expected_sample[0], waveform, atol=5e-5, rtol=1e-8)
assert (i + 1) == len(self.samples)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not use temporary variable outside of the loop in which the variable was defined and meant to be used. This works, but this is easy to miss for the other developers who work on this code later. Define a dedicated variable for this.

@@ -32,10 +32,6 @@ def test_speechcommands(self):
data = SPEECHCOMMANDS(self.path)
data[0]

def test_cmuarctic(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the corresponding asset and import statement??

duration=3,
n_channels=1,
dtype="int16",
seed=seed,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use different seed value for each generated samples, otherwise all the loaded Tensors have the exactly same shape and value, and that will lose the point of comparing loaded Tensor object.

@mthrok
Copy link
Collaborator

mthrok commented Jul 28, 2020

Could you rebase onto the latest master?

@codecov
Copy link

codecov bot commented Jul 28, 2020

Codecov Report

Merging #829 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #829   +/-   ##
=======================================
  Coverage   89.99%   89.99%           
=======================================
  Files          35       35           
  Lines        2719     2719           
=======================================
  Hits         2447     2447           
  Misses        272      272           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db38fc8...d340fb5. Read the comment docs.

subramen and others added 7 commits July 28, 2020 15:54
Co-authored-by: lawrencechen <lawrencechen@devvm3189.vll0.facebook.com>
* Add test for CommonVoice dataset

* Migrate the existing tests for `bg_iterator` and `diskcache_iterator` to `test/datasets/utils_test.py`

Co-authored-by: Leon Gao <legao@linkedin.com>
Copy link
Collaborator

@mthrok mthrok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost good, but seed value is not quite right.

test/datasets/cmuarctic_test.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@mthrok mthrok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks!

@mthrok mthrok merged commit 33f762f into pytorch:master Jul 28, 2020
@subramen subramen deleted the cmuarctic_test branch July 29, 2020 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants