Conversation

mthrok
Contributor

@mthrok mthrok commented Feb 27, 2020

This PR follows up #366 and adds tests for InverseMelScale (and MelScale) for librosa compatibility.

For the MelScale compatibility test:

  1. Generate a spectrogram
  2. Feed the spectrogram to a torchaudio.transforms.MelScale instance
  3. Feed the spectrogram to the librosa.feature.melspectrogram function.
  4. Compare the results from 2 and 3 element-wise.
    Element-wise numerical comparison is possible because, under the hood, both implementations use the same algorithm.
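    Both implementations build their mel filterbanks from the same HTK mel formula (the test passes htk=True to librosa), which is why element-wise agreement can be expected. A pure-Python sketch of that conversion (the helper names here are illustrative, not the libraries' actual functions):

```python
import math

# HTK mel scale shared by torchaudio and librosa (with htk=True);
# hz_to_mel / mel_to_hz are hypothetical names for illustration only.
def hz_to_mel(freq_hz):
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# round-trip sanity check: converting back and forth recovers the frequency
assert abs(mel_to_hz(hz_to_mel(1000.0)) - 1000.0) < 1e-6
```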

The InverseMelScale compatibility test is more elaborate than that.

  1. Generate the original spectrogram
  2. Convert the original spectrogram to Mel scale using a torchaudio.transforms.MelScale instance
  3. Reconstruct the spectrogram using the torchaudio implementation
    3.1. Feed the Mel spectrogram to a torchaudio.transforms.InverseMelScale instance and get the reconstructed spectrogram.
    3.2. Compute the element-wise P1 distance between the original spectrogram and that from 3.1.
  4. Reconstruct the spectrogram using librosa
    4.1. Feed the Mel spectrogram to the librosa.feature.inverse.mel_to_stft function and get the reconstructed spectrogram.
    4.2. Compute the element-wise P1 distance between the original spectrogram and that from 4.1. (This is the reference.)
  5. Check that the resulting P1 distances are in roughly the same range.
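The comparison in steps 3–5 can be sketched in plain Python (toy vectors standing in for the spectrograms; the threshold factor of 2 is illustrative, not the value used in the test):

```python
def p1_distance(a, b):
    # sum of element-wise absolute differences,
    # i.e. what torch.dist(a, b, p=1) computes
    return sum(abs(x - y) for x, y in zip(a, b))

# toy stand-ins for the original and the two reconstructed spectrograms
spec_orig = [1.0, 2.0, 3.0, 4.0]
spec_ta = [1.1, 1.8, 3.2, 4.1]   # torchaudio-style reconstruction
spec_lr = [0.9, 2.1, 2.9, 4.2]   # librosa-style reconstruction (reference)

dist_ta = p1_distance(spec_orig, spec_ta)
dist_lr = p1_distance(spec_orig, spec_lr)

# the test only requires the two reconstruction losses
# to be in roughly the same range, not to match element-wise
assert dist_ta < 2.0 * dist_lr
```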

Element-wise numerical comparison is not possible because different algorithms are used to compute the inverse; the reconstructed spectrograms can contain values that differ greatly in magnitude. Therefore, the strategy here is to check that the P1 distance (reconstruction loss) is not far from the value obtained with librosa. For this purpose, the threshold was chosen empirically:

```
print('p1 dist (orig <-> ta):', torch.dist(spec_orig, spec_ta, p=1))
print('p1 dist (orig <-> lr):', torch.dist(spec_orig, spec_lr, p=1))
>>> p1 dist (orig <-> ta): tensor(1482.1917)
>>> p1 dist (orig <-> lr): tensor(1420.7103)
```

This value varies with the length and kind of signal being processed, so it was hand-picked.

Closes #366

@mthrok
Contributor Author

mthrok commented Feb 27, 2020

@vincentqb Once we settle on the test, we can do:

  1. Squash merge InverseMelScale Implementation #366 to master
  2. Rebase this branch onto master
  3. Squash merge this.
    That way, @jaeyeun97's credit is preserved properly.

@mthrok
Contributor Author

mthrok commented Feb 28, 2020

For the record:
the following is what the reconstructed spectrograms look like in the test_InverseMelScale test.
As mentioned in my comment and in the original PR, these values are mostly close, but in some places they can be extremely different.

test_InverseMelScale

Code for the plot:

```python
# based on https://gist.github.com/jaeyeun97/8651dff509d5b084636ac6c3a7547108
# thanks @jaeyeun97

def _plot_melspecs(sample_rate, spec_original, spec_ta, spec_lr):
    def log_mag(spec):
        # convert to dB relative to the peak, floored at -80 dB
        ref = spec.max().clamp(1e-16).log10_()
        spec = spec.clamp(1e-16).log10().sub_(ref).mul_(20).clamp(min=-80)
        return spec.squeeze().numpy()

    spec_original = log_mag(spec_original)
    spec_ta = log_mag(spec_ta)
    spec_lr = log_mag(spec_lr)

    import librosa.display
    import matplotlib
    matplotlib.use('TkAgg')
    import matplotlib.pyplot as plt

    plt.figure(figsize=(20, 10))

    plt.subplot(3, 1, 1)
    librosa.display.specshow(spec_original, sr=sample_rate)
    plt.title('Original')

    plt.subplot(3, 1, 2)
    librosa.display.specshow(spec_lr, sr=sample_rate)
    plt.title('Librosa')

    plt.subplot(3, 1, 3)
    librosa.display.specshow(spec_ta, sr=sample_rate)
    plt.title('Torchaudio')
    plt.savefig('test_InverseMelScale.png')
```

jaeyeun97 and others added 5 commits February 28, 2020 14:01
@vincentqb
Contributor

As you noticed, the test is failing.

```
        # This threshold was choosen empirically, based on the following observations
        #
        # torch.dist(spec_orig, spec_ta, p=1)
        # >>> tensor(1482.1917)
        # torch.dist(spec_orig, spec_lr, p=1)
        # >>> tensor(1420.7103)
        # torch.dist(spec_lr, spec_ta, p=1)
        # >>> tensor(881.7889)
>       assert torch.dist(spec_orig, spec_ta, p=1) < threshold
E       AssertionError: assert tensor(1658.8206) < 1500.0
E        +  where tensor(1658.8206) = <built-in method dist of type object at 0x7fe5e2937880>(tensor([[[1.5051e+00, 7.9520e-01, 3.0737e-01,  ..., 6.0686e-01,\n          8.8888e-01, 2.8056e+00],\n         [7.7557e-0...e-04, 2.0129e-03],\n         [2.0623e-04, 4.6134e-05, 6.3226e-05,  ..., 6.4552e-05,\n          1.0691e-03, 2.0032e-03]]]), tensor([[[0.3844, 0.4208, 0.1461,  ..., 0.9873, 0.7092, 0.8993],\n         [0.7756, 1.1142, 0.9477,  ..., 0.8303, 1.985...0466, 0.0493, 0.0557,  ..., 0.0246, 0.0529, 0.0000],\n         [0.8467, 0.9353, 0.5868,  ..., 0.6988, 0.8108, 0.9290]]]), p=1)
E        +    where <built-in method dist of type object at 0x7fe5e2937880> = torch.dist

test/test_transforms.py:620: AssertionError
```

```
        self.assertTrue(torch.allclose(computed, expected))

    def test_batch_InverseMelScale(self):
        n_fft = 8
```
Contributor

I'm wondering if it makes sense to vary some of these parameters a bit in a few subsequent tests. In particular to also include edge cases to see what the error behavior is. Unless this is verified through other tests.

Contributor Author

I agree with running the tests with multiple parameters.
However, I think that can be accomplished better by reorganizing the whole test suite.

Right now the Tester class contains all kinds of tests (batch tests, TorchScript tests, librosa compatibility tests, etc.), and it was hard to tell what type of test I should add and where to add it.
Creating separate test suites for the different kinds of tests will make this much easier.
So I would add that kind of parameterized test later, and add similar ones to the existing tests.
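As a sketch of what such parameterization could look like using the standard library alone (hypothetical test names; real tests would run the actual transforms):

```python
import unittest

class ParameterizedShapeTest(unittest.TestCase):
    # hypothetical sketch: iterate over several parameter sets with subTest
    # so each combination is reported separately on failure
    def test_param_combinations(self):
        for n_fft, n_mels in [(400, 128), (512, 64), (1024, 80)]:
            with self.subTest(n_fft=n_fft, n_mels=n_mels):
                n_freq = n_fft // 2 + 1
                # sanity: the mel filterbank must not have more bands
                # than frequency bins
                self.assertGreater(n_freq, n_mels)
```

Run with `python -m unittest` (or pytest) as usual; subTest keeps each parameter combination visible in the failure report.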

```
            S=spec_lr, sr=sample_rate, n_fft=n_fft, hop_length=hop_length,
            win_length=n_fft, center=True, window='hann', n_mels=n_mels, htk=True, norm=None)
        # Note: Using relaxed rtol instead of atol
        assert torch.allclose(melspec_ta, torch.from_numpy(melspec_lr[None, ...]), rtol=1e-3)
```
Contributor

using self.assertTrue might yield a nicer message if this fails.

In the future, and in a separate PR, we might want to look into introducing some of the unittest extensions that PyTorch implements, which would enable things such as self.assertAllClose and also do torch.Tensor-specific checks such as dtype, memory layout, etc. allclose might do upcasting, broadcasting, etc., but we actually care that those properties match. cc @vincentqb
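To make the rtol/atol semantics concrete, here is the criterion allclose-style functions use, sketched in plain Python on flat lists (a stand-in for illustration, not the real torch helper):

```python
def all_close(a, b, rtol=1e-5, atol=1e-8):
    # stand-in for torch.allclose on flat lists of floats:
    # element-wise check of |x - y| <= atol + rtol * |y|
    if len(a) != len(b):
        return False  # a stricter helper would also check dtype, layout, etc.
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

# a relaxed rtol tolerates errors that scale with the reference magnitude,
# which suits spectrogram values spanning many orders of magnitude
assert all_close([1000.001], [1000.0], rtol=1e-3)
assert not all_close([1.001], [1.0])
```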

Contributor Author

using self.assertTrue might yield a nicer message if this fails.

I found the opposite. Using assertTrue on torch.allclose only says:

```
>       self.assertTrue(torch.allclose(spec_ta, spec_lr, atol=threshold))
E       AssertionError: False is not true

test/test_transforms.py:618: AssertionError
```

whereas assert says:
(although this is still hard to read due to the combination of multi-line messages and pytest's annotations)

```
>       assert torch.allclose(spec_ta, spec_lr, atol=threshold)
E       AssertionError: assert False
E        +  where False = <built-in method allclose of type object at 0x121552eb0>(tensor([[[0.8752, 0.8655, 0.6858,  ..., 0.7232, 0.3609, 0.2115],\n         [0.7756, 1.1142, 0.9477,  ..., 0.8303, 1.985...0338, 0.0434, 0.0437,  ..., 0.0581, 0.0294, 0.0445],\n         [0.4310, 0.7263, 0.4167,  ..., 0.1131, 0.5628, 0.8183]]]), tensor([[[0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 0.0000e+00,\n          0.0000e+00, 0.0000e+00],\n         [7.7557e-0...e-04, 2.5366e-04],\n         [3.0709e-11, 5.1357e-12, 3.0357e-12,  ..., 0.0000e+00,\n          4.2634e-11, 1.0426e-10]]]), atol=1.0)
E        +    where <built-in method allclose of type object at 0x121552eb0> = torch.allclose

test/test_transforms.py:618: AssertionError
```

Contributor Author

Combined with your comment on parameterized tests, I think reorganizing the test structure and using PyTorch's helper functions to show a good example of how to write a test will be of great benefit for all developers.

Contributor

@vincentqb vincentqb left a comment

The broader discussion about testing is good to have. I'd say we should definitely reorganize the tests, use standard PyTorch tools for this, and open issues upstream to improve them too.

Given the original scope of this PR, this is ready to merge. Thanks for working on this @jaeyeun97 and @mthrok!

@vincentqb vincentqb merged commit babc24a into pytorch:master Feb 28, 2020
@mthrok mthrok deleted the inv-mel-spec branch March 2, 2020 15:14
mthrok pushed a commit to mthrok/audio that referenced this pull request Feb 26, 2021
Fix typo in word_embeddings_tutorial.py. Thanks Zhiqiang.