
Apply functional batch consistency tests to batches of different items #1315

Merged: 15 commits, Feb 28, 2021

Conversation

@jcaw (Contributor) commented Feb 25, 2021

The batch_consistency_test.py files check that behavior is consistent between batches and single items, but the current infrastructure does this with batches of identical items. This approach may miss bugs caused by data leakage between items within a batch, for example #994.

This PR changes functional/batch_consistency_test.py so that (most) batches are generated with items that are not identical; the entire batch is then passed to the consistency check. Batch sizes are parameterized, so tests are generated for single-item and three-item batches. I've also changed the white noise generator to generate different data in each channel (rather than duplicating one wave), and the consistency check now checks both inputs and outputs, to detect inconsistencies with in-place operations.
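To make that concrete, here is a rough sketch of the shape of such a check (hypothetical helper body; the actual implementation in the test suite differs in its details):

```python
import torch

def assert_batch_consistency(functional, batch, *args, **kwargs):
    """Apply `functional` per-item and batched; both the outputs and the
    (possibly mutated) inputs must match."""
    per_item_input = batch.clone()
    batched_input = batch.clone()

    # Expected result: process each item independently.
    expected = torch.stack([
        functional(per_item_input[i], *args, **kwargs)
        for i in range(batch.size(0))
    ])
    # Actual result: process the whole batch at once.
    actual = functional(batched_input, *args, **kwargs)

    # Outputs must be identical...
    torch.testing.assert_allclose(actual, expected)
    # ...and any in-place mutation of the inputs must be identical too.
    torch.testing.assert_allclose(batched_input, per_item_input)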

I am still repeating a single item in a couple of places: with a one-channel signal loaded from disk, and for pitch frequency. Using different frequencies within the same batch may be a more robust way to test pitch frequency.

I haven't touched the root batch_consistency_test.py yet, but it seems that this approach could be extended to that file too. Let me know if that sounds sensible.

Closes #1294

Issues that occur when different items in a batch influence one another
will not surface when a batch is composed of identical items. To catch
these issues when checking the consistency of batched behavior, the
items should be different.

Thus, use different items for the `functional` batch consistency tests
wherever possible.
Don't duplicate a single channel multiple times. Since this generator is
used for testing, generate different noise in each channel so that data
leakage between channels can be detected.
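As a rough illustration of the idea (the actual torchaudio test helper takes more parameters; the names here are simplified):

```python
import torch

def get_whitenoise(n_channels: int, n_frames: int, seed: int = 0) -> torch.Tensor:
    """Draw independent noise for every channel instead of expanding one wave."""
    generator = torch.Generator()
    generator.manual_seed(seed)
    # Expanding a single draw, e.g. torch.randn(1, n_frames).expand(n_channels, -1),
    # would make every channel identical and hide leakage between channels.
    # Drawing the full (n_channels, n_frames) tensor keeps each channel independent.
    return torch.randn(n_channels, n_frames, generator=generator)
```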
Rather than creating a batch of 3 items in each test and slicing it to
test two different batch sizes at once, parameterize the batch size on
the TestFunctional class itself. This generates a separate set of tests
for each batch size (better isolating failures) and removes a leaky
abstraction where the test calling `assert_batch_consistencies` had to
know to give it a batch size greater than 1.
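A minimal sketch of that pattern, using the `parameterized` package's `parameterized_class` decorator (test body simplified for illustration):

```python
import unittest

import torch
from parameterized import parameterized_class


@parameterized_class([
    {"batch_size": 1},
    {"batch_size": 3},
])
class TestFunctional(unittest.TestCase):
    """Generated once per batch size, so a failure names the batch size."""

    def test_example(self):
        # Every test derives its batch from the class-level batch size,
        # instead of slicing a hard-coded 3-item batch.
        waveforms = torch.rand(self.batch_size, 2, 100) - 0.5
        self.assertEqual(waveforms.shape[0], self.batch_size)
```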
Check the inputs to the batch consistency operations too, to ensure that
any in-place operations act the same on single items as on batches, not
just that they produce the same output.
Using a 5-second signal for the phaser test takes a long time on CPU,
much longer than the other batch consistency tests. Use a shorter signal
instead.
@mthrok (Collaborator) commented Feb 25, 2021

> Batch sizes are parameterized so tests are generated for single-item and three-item batches.

What do you think about the single-item test? I know it has been there since the early days, and I kept it in the first major test refactoring, but I cannot convince myself that it serves well as a base case. I feel something like 2 as a base case and 32 as a common use case might suit the purpose of the test better; of course, 32 increases the computational cost of running the tests.

> I've also changed the white noise generator to generate different data in each channel (rather than duplicating one wave).

Sounds good. Nice to know that it does not disturb the other tests.

> and the consistency check now checks both inputs and outputs, to detect inconsistencies with in-place operations.

Nice!

> I am still repeating a single item in a couple of places: with a one-channel signal loaded from disk, and for pitch frequency. Using different frequencies within the same batch may be a more robust way to test pitch frequency.

Testing multiple frequencies sounds like a good suggestion and improvement.

> I haven't touched the root batch_consistency_test.py yet, but it seems that this approach could be extended to that file too. Let me know if that sounds sensible.

I am open to updating the other batch_consistency_test as well, but let's keep it as a separate PR. Smaller PRs are easier to review, and thus faster to approve.

```python
        self.assert_batch_consistency(
            F.sliding_window_cmn, waveforms, center=False, norm_vars=False)

    def test_vad_from_file(self):
        common_utils.set_audio_backend('default')
        filepath = common_utils.get_asset_path("vad-go-mono-32000.wav")
        waveform, sample_rate = torchaudio.load(filepath)
```
@mthrok (Collaborator) commented Feb 25, 2021
How about using vad-go-stereo-44100.wav? Its second channel is slightly delayed, so we can consider it a different waveform from the first channel.

[Screenshot: the two channels of vad-go-stereo-44100.wav, the second slightly delayed]

Also, could you replace torchaudio.load with common_utils.load_wav, then remove backend='default' from the class? Using an I/O function makes this test depend on both F.vad and torchaudio.load, which is not ideal for a unit test, as it makes it harder to grasp what caused a test failure.
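For context, the suggested replacement might look roughly like this (assuming `common_utils.load_wav` returns the same `(waveform, sample_rate)` pair as `torchaudio.load`; the helper names are taken from elsewhere in the diff):

```python
    def test_vad_from_file(self):
        # No backend setup needed once common_utils.load_wav is used.
        filepath = common_utils.get_asset_path("vad-go-stereo-44100.wav")
        waveform, sample_rate = common_utils.load_wav(filepath)
        self.assert_batch_consistencies(F.vad, waveform, sample_rate=sample_rate)
```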

@jcaw (Contributor, Author) replied:
That's a good idea. I'll use that instead (and switch the loading method).

@jcaw (Contributor, Author) commented Feb 26, 2021

I've updated this to use the dual-channel wav. For now I'm ignoring the parameterized batch dimension and passing a two-item batch, but that means the test is generated twice, so it might be worth extracting it. (I could also manipulate the two-item batch depending on the parameterized dimension.)

The stereo wav has two channels, slightly offset, so they'll count as
different items.
@jcaw (Contributor, Author) commented Feb 26, 2021

> What do you think about the single-item test?

I assumed that was there to isolate cases where issues came specifically from adding a dimension, rather than from processing multiple items. I could see the distinction coming up (e.g. reshaping issues), but I can't imagine dropping it would really impede diagnosis.

> something like 2 as a base case and 32 as a common use case might suit the purpose of the test better; of course, 32 increases the computational cost of running the tests.

That's up to you guys. I just ran a quick test, and it pushes the run time for the batch_consistency_test.py suite to about 4 minutes on this laptop (i7-3630QM @ 2.40GHz), which seems like a long time, but I haven't benchmarked the other suites. Do you think the overhead is worth it for catching issues that crop up between a batch size of 3 and 32?

> I am open to updating the other batch_consistency_test as well, but let's keep it as a separate PR. Smaller PRs are easier to review, and thus faster to approve.

Awesome. In that case I'll get this done first.

The pitch frequency batch test was using the same frequency for each
item, which may not catch data leakage between items within a batch. Use
different frequencies so these kinds of issues are triggered, just like
in the other batch consistency tests.
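A sketch of how such a batch could be built (hypothetical helper; the real test generates its sinusoids differently):

```python
import math

import torch

def get_sinusoid_batch(frequencies, sample_rate=44100, duration=1.0):
    """One sine wave per batch item, each at its own frequency, so any
    leakage between items shifts a detected pitch."""
    t = torch.arange(int(sample_rate * duration)) / sample_rate
    return torch.stack([
        torch.sin(2 * math.pi * f * t) for f in frequencies
    ])

# Distinct pitches per item, rather than repeating one frequency:
batch = get_sinusoid_batch([100., 440., 880.])
```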
```python
        self.assert_batch_consistencies(
            F.contrast, waveform, enhancement_amount=80.)
        torch.random.manual_seed(0)
        waveforms = torch.rand(self.batch_size, 2, 100) - 0.5
```
@mthrok (Collaborator) commented:

Could you also take this opportunity to replace this waveform generation with get_whitenoise?

@jcaw (Contributor, Author) replied:

Sure thing

@jcaw (Contributor, Author) commented Feb 27, 2021

Sorry about that: I just posted a comment that had a couple of things wrong. I've deleted it while I double-check everything.

@mthrok (Collaborator) commented Feb 28, 2021

@jcaw

Thanks for the update. I will go ahead and merge this PR so that you can start working on the other batch consistency test. If you come up with an update on top of this PR, you can add it in the new PR.

Regarding my comment about replacing randn with white noise, I think you are right. The inputs are supposed to be spectrograms, so using get_whitenoise would look strange and confusing to other maintainers. I think we can add a new helper function, get_spectrogram, which generates a spectrogram from the given waveform (or from white noise, if none is provided) using librosa.
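A hypothetical sketch of what that helper could look like (the eventual torchaudio implementation may well differ):

```python
import librosa
import numpy as np
import torch

def get_spectrogram(waveform=None, *, n_fft=400, n_channels=1,
                    n_frames=16000, seed=0):
    """Return a magnitude spectrogram of `waveform`, generating white
    noise to transform when no waveform is given."""
    if waveform is None:
        rng = np.random.default_rng(seed)
        waveform = torch.from_numpy(
            rng.standard_normal((n_channels, n_frames)).astype(np.float32))
    # librosa.stft processes one channel at a time.
    specs = [np.abs(librosa.stft(channel.numpy(), n_fft=n_fft))
             for channel in waveform]
    return torch.from_numpy(np.stack(specs))
```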


Successfully merging this pull request may close these issues:

- Should batch consistency be tested with a batch of non-identical items? (#1294)