Add test for LibriSpeech dataset #825

edwardhdlu · 2020-07-24T17:54:31Z

Adds a LibriSpeech test that uses emulated data, as part of #821.

pytest test/datasets/datasets_test.py

codecov · 2020-07-24T22:12:37Z

Codecov Report

Merging #825 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #825   +/-   ##
=======================================
  Coverage   89.99%   89.99%           
=======================================
  Files          35       35           
  Lines        2719     2719           
=======================================
  Hits         2447     2447           
  Misses        272      272

Last update 45761f0...a97c508.

mthrok

Looks good. Left some comments for further improvement.

mthrok · 2020-07-24T20:59:59Z

test/datasets/librispeech_test.py

+                    f.write('\n'.join(trans_content))
+
+    def test_librispeech(self):
+        dataset = librispeech.LIBRISPEECH(self.root_dir, ext_audio='.wav')


Instead of adding an ext_audio argument, you can simply set an instance attribute, which shadows the corresponding class attribute:

dataset = librispeech.LIBRISPEECH(self.root_dir)
dataset._ext_audio = 'flac'

This way, the LIBRISPEECH class can stay as is.

edwardhdlu · 2020-07-26

I tried this originally, but _ext_audio is used by the walker in __init__, so the files are looked up before the extension can be updated.

mthrok · 2020-07-26

You are right. In that case, you can temporarily override the class attribute, e.g. librispeech.LIBRISPEECH._ext_audio = 'wav', before instantiating the dataset. Do this in the setUp method, then revert it in the tearDown method so that a test failure does not leave the default extension changed.
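The override-and-restore pattern described above can be sketched as follows. This is only an illustration: `FakeDataset` is a hypothetical stand-in for a dataset class that reads `_ext_audio` inside `__init__`, not the actual LIBRISPEECH implementation, and the file names are made up.

```python
import unittest


class FakeDataset:
    """Hypothetical stand-in for a dataset that reads a class attribute in __init__."""

    _ext_audio = ".flac"

    def __init__(self, files):
        # The file list is filtered during construction, so setting
        # _ext_audio on the instance afterwards would be too late.
        self.walker = [f for f in files if f.endswith(self._ext_audio)]


class FakeDatasetTest(unittest.TestCase):
    files = ["a.wav", "b.wav", "c.flac"]

    def setUp(self):
        # Remember the default, then override it at class level
        # before any instance is created.
        self._original_ext = FakeDataset._ext_audio
        FakeDataset._ext_audio = ".wav"

    def tearDown(self):
        # Restore the default so a failing test does not leak the override.
        FakeDataset._ext_audio = self._original_ext

    def test_walker_uses_overridden_extension(self):
        dataset = FakeDataset(self.files)
        self.assertEqual(dataset.walker, ["a.wav", "b.wav"])
```

unittest calls tearDown whenever setUp succeeded, whether or not the test itself passes, which is what keeps the default extension intact. `unittest.mock.patch.object` would be another way to get the same temporary override.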

mthrok · 2020-07-25T19:51:49Z

test/datasets/librispeech_test.py

+
+                    utterance = ' '.join(
+                        [NUMBERS[int(x)] for x in list(
+                            str(speaker_id) + str(chapter_id) + str(utterance_id)


Instead of converting the integers to a list of characters and then back to integers, you can simply do NUMBERS[x] for x in [speaker_id, chapter_id, utterance_id]
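A small sketch comparing the two approaches, under the assumption that NUMBERS is a list of ten digit words and that each id is a single digit. The contents of `NUMBERS` and both function names are hypothetical and used only for illustration.

```python
# Hypothetical lookup table; the real NUMBERS list in the test may differ.
NUMBERS = ["ZERO", "ONE", "TWO", "THREE", "FOUR",
           "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]


def utterance_via_digits(speaker_id, chapter_id, utterance_id):
    # Original approach: concatenate the ids into one string, then
    # convert each character back to an int to index NUMBERS.
    digits = str(speaker_id) + str(chapter_id) + str(utterance_id)
    return " ".join(NUMBERS[int(x)] for x in digits)


def utterance_via_ids(speaker_id, chapter_id, utterance_id):
    # Suggested approach: index NUMBERS with the ids directly.
    # Assumes every id is a valid index into NUMBERS (here, 0 to 9).
    return " ".join(NUMBERS[x] for x in [speaker_id, chapter_id, utterance_id])
```

The two agree whenever every id is a single digit. With a multi-digit id, the first version spells out each digit, while the second would index past the end of this ten-element table, so the simplification relies on the emulated ids staying in range.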

mthrok

Looks good. Thanks!

Co-authored-by: Holly Sweeney <77758406+holly1238@users.noreply.github.com>

Edward Lu added 2 commits July 24, 2020 10:50

add new LibriSpeech test

8205bff

fix style

12a539e

mthrok reviewed Jul 25, 2020

address comments

a97c508

mthrok approved these changes Jul 27, 2020

mthrok merged commit 577796b into pytorch:master Jul 27, 2020

mthrok pushed a commit to mthrok/audio that referenced this pull request Dec 13, 2022

OSS Automated Fix: Addition of Contributing (pytorch#825)

2cfac4a

Co-authored-by: Holly Sweeney <77758406+holly1238@users.noreply.github.com>
