Conversation

dvisockas

@dvisockas dvisockas commented Sep 30, 2020

TODO:

  • Point to the README for tests
  • Test for correctness against librosa/sox/scipy/kaldi/etc., for torchscriptability, and for batchability (no reference found)
  • Add documentation

@dvisockas dvisockas mentioned this pull request Sep 30, 2020
@dvisockas dvisockas marked this pull request as draft September 30, 2020 07:29
@dvisockas dvisockas marked this pull request as ready for review October 2, 2020 14:18
@vincentqb
Contributor

vincentqb commented Oct 7, 2020

Thanks for working on this :)

The main thing we need is a way of knowing that the algorithm is correct (now and in the future). Do you have a reference you used, e.g. a paper, a book, or a piece of software? Looking at the Wikipedia article you pointed to here, I see two graphs that could be used for testing. A first test would consist of a python/numpy implementation of the graph "Plot of F(x) for A-Law for A = 87.6", which would then be compared against the pytorch implementation. Maybe the other graph can also be turned into a second test.

The first test would be enough, along with a link to wikipedia, which does provide the equation.
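For concreteness, the compression function F(x) from the Wikipedia article can be written in a few lines of pure Python (the function name `alaw_compress` is my own, for illustration only; it is not a torchaudio API):

```python
import math

def alaw_compress(x, A=87.6):
    """A-law compression F(x) from the Wikipedia definition, for x in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        # Linear segment near zero.
        y = A * ax / (1.0 + math.log(A))
    else:
        # Logarithmic segment for larger amplitudes.
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)  # F is odd: F(-x) = -F(x)
```

A test along these lines would evaluate this on a grid of inputs and compare the curve against the pytorch output.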

@vincentqb
Contributor

vincentqb commented Oct 7, 2020

As mentioned here, let me clarify the things that need to be done:

  • open an issue first to get feedback before coding :)
  • test for correctness against librosa/sox/scipy/kaldi/etc
  • test for torchscriptability on CPU and GPU, if possible
  • test the operation is batchable
  • add documentation
  • avoid additional dependencies in the library (e.g. numpy, scipy, etc)

Does the readme for tests give you enough clarity on what each test line above means?

@dvisockas
Author

For the implementation, I used the Wikipedia formula as a reference. I have now found that Python has A-law encoding in its stdlib (audioop).

The question, then, is: should I just run a sample wave through both audioop.lin2alaw and torchaudio in transforms_test.py, test that the results match, and then do the same for decoding?

How can I test "for torchscriptability on CPU and GPU" and "the operation is batchable"?

Regarding the documentation: I added an A-Law entry in transforms.rst. Will that be enough, or are there additional things I have to do?

@vincentqb
Contributor

Thanks again for working on this :)

For the implementation, I used the Wikipedia formula as a reference. I have now found that Python has A-law encoding in its stdlib (audioop).

The question, then, is: should I just run a sample wave through both audioop.lin2alaw and torchaudio in transforms_test.py, test that the results match, and then do the same for decoding?

Good catch! Yes, that is a great way to test for correctness. :)
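A complementary sanity check needs neither torchaudio nor audioop: encode with the continuous A-law formula, quantize to 256 integer codes, decode, and bound the round-trip error. This is only a sketch of the idea with hypothetical helper names, not the actual test code:

```python
import math

A = 87.6  # standard A-law compression parameter

def alaw_encode(x, channels=256):
    """Compress x in [-1, 1] with the continuous A-law curve, then quantize."""
    ax = min(abs(x), 1.0)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    y = math.copysign(y, x)
    return int((y + 1.0) / 2.0 * (channels - 1) + 0.5)  # codes in [0, channels - 1]

def alaw_decode(code, channels=256):
    """Invert the quantization and the A-law curve."""
    y = 2.0 * code / (channels - 1) - 1.0
    ay = abs(y)
    if ay < 1.0 / (1.0 + math.log(A)):
        x = ay * (1.0 + math.log(A)) / A
    else:
        x = math.exp(ay * (1.0 + math.log(A)) - 1.0) / A
    return math.copysign(x, y)

samples = [i / 100.0 for i in range(-100, 101)]
max_err = max(abs(s - alaw_decode(alaw_encode(s))) for s in samples)
assert max_err < 0.05  # coarse 8-bit quantization, but the error stays bounded
```

The bound is loose because 8-bit quantization is coarse, but a round trip like this catches sign errors and swapped branches.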

How can I test "for torchscriptability on CPU and GPU" and "the operation is batchable"?

For torchscript-ability, all you have to do is add a test like this one. The test will then be run automatically on CPU and GPU in CircleCI :) It checks that the python version and the jitted version of a transform give the same result.

For batch-ability, all you need to do is add a test like this one. The test simply checks that running the same transformation 3 times gives the same results as running it once on a batch tensor with 3 copies.
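The batch-consistency idea can be illustrated with a toy elementwise transform in plain Python (no torch needed for the illustration; a real test would use an actual torchaudio transform and torch.stack):

```python
def toy_transform(waveform):
    # Stand-in for an elementwise transform such as A-law encoding:
    # each sample is mapped independently, so batching is just stacking.
    return [[2.0 * sample for sample in channel] for channel in waveform]

# One (channel, time) "tensor" with a single channel.
waveform = [[0.1, -0.5, 0.9]]

# Run the transform 3 times on the same input...
individual = [toy_transform(waveform)[0] for _ in range(3)]

# ...versus running it once on a "batch" of 3 copies.
batched = toy_transform([waveform[0]] * 3)

assert individual == batched  # batch consistency
```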

Regarding the documentation: I added an A-Law entry in transforms.rst. Will that be enough, or are there additional things I have to do?

Adding an entry for the functional here and for the transform here, as you pointed out, will be enough.

@vincentqb vincentqb mentioned this pull request Oct 9, 2020
@dvisockas
Author

Added documentation and tested for torchscriptability and batchability, yet the tests seem to be failing. Is there a way to inspect why?

The only thing left to do is the compliance test. Sadly, audioop seems to require that the wave chunks be in binary; I have to think about how to do that.

@vincentqb
Contributor

vincentqb commented Oct 13, 2020

Added documentation and tested for torchscriptability and batchability, yet the tests seem to be failing. Is there a way to inspect why?

There should be a "details" button next to the unittest section in the checks list (I believe this is public :) ). On the following page, we can select a system and click it. This should lead here, for instance. Does that help?

The tests can also be run locally with python -m pytest against a freshly compiled version of the code, see here.

@vincentqb
Contributor

btw the errors below are fixed on master (#934), so rebasing will fix them :)

self = <torchaudio.datasets.tedlium.TEDLIUM object at 0x000002028AD91B50>
path = 'C:\\Users\\circleci\\AppData\\Local\\Temp\\tmpobcfx03l\\tedlium\\TEDLIUM_release2\\train\\sph\\release2.sph'
start_time = 0, end_time = 32000, sample_rate = 16000

    def _load_audio(self, path: str, start_time: float, end_time: float, sample_rate: int = 16000) -> [Tensor, int]:
        """Default load function used in TEDLIUM dataset, you can overwrite this function to customize functionality
        and load individual sentences from a full ted audio talk file.
    
        Args:
            path (str): Path to audio file
            start_time (int, optional): Time in seconds where the sample sentence stars
            end_time (int, optional): Time in seconds where the sample sentence finishes
    
        Returns:
            [Tensor, int]: Audio tensor representation and sample rate
        """
        start_time = int(float(start_time) * sample_rate)
        end_time = int(float(end_time) * sample_rate)
        if torchaudio.get_audio_backend() == "sox_io":
            return torchaudio.load(path, frame_offset=start_time, num_frames=end_time - start_time)
>       return torchaudio.load(path)[:, start_time:end_time]
E       TypeError: tuple indices must be integers or slices, not tuple

..\env\lib\site-packages\torchaudio-0.7.0a0+8896b41-py3.8.egg\torchaudio\datasets\tedlium.py:171: TypeError

@vincentqb
Contributor

I'm sure you've noticed :) but for reference: there are some lint errors

torchaudio/functional.py:409:1: E302 expected 2 blank lines, found 1
torchaudio/transforms.py:573:1: E302 expected 2 blank lines, found 1
flake8 failed. Check the format of Python files.

and the new test is failing

self = <torchaudio_unittest.transforms_test.Tester testMethod=test_a_law_companding>

    def test_a_law_companding(self):
    
        quantization_channels = 256
        compression_param = 83.7
    
        waveform = self.waveform.clone()
        if not waveform.is_floating_point():
            waveform = waveform.to(torch.get_default_dtype())
        waveform /= torch.abs(waveform).max()
    
        self.assertTrue(waveform.min() >= -1. and waveform.max() <= 1.)
    
        waveform_a = transforms.ALawEncoding(quantization_channels, compression_param)(waveform)
        self.assertTrue(waveform_a.min() >= 0. and waveform_a.max() <= quantization_channels)
    
        waveform_exp = transforms.ALawDecoding(quantization_channels, compression_param)(waveform_a)
        self.assertTrue(waveform_exp.min() >= -1. and waveform_exp.max() <= 1.)
    
        segment_length = 1
        small_int_waveform = waveform.to(torch.uint8)
        waveform_bytes = bytearray(small_int_waveform[0, :])
    
        encoded = audioop.lin2alaw(waveform_bytes, segment_length)
        torch_encoded = transforms.ALawEncoding(quantization_channels, compression_param)(small_int_waveform)
>       self.assertEqual(torch.tensor(list(encoded)), torch_encoded[0])

torchaudio_unittest/transforms_test.py:71: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../env/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py:1139: in assertEqual
    self.assertTrue(result, msg=msg)
E   AssertionError: False is not true : Tensors failed to compare as equal! Found 64000 different element(s) (out of 64000), with the greatest difference of 295 (90 vs. 385) occuring at index 200.

@dvisockas
Author

Yes, I noticed. I just wanted to push some changes so I could work on another computer.

Regarding the tests: audioop gives different results when encoding with ALaw. The same is true for MuLaw. Was MuLaw tested for correctness?

@vincentqb
Contributor

Yes, I noticed. I just wanted to push some changes so I could work on another computer.

Regarding the tests: audioop gives different results when encoding with ALaw. The same is true for MuLaw. Was MuLaw tested for correctness?

No, it was not tested against audioop. In fact, if you notice a difference, could you open an issue with a code snippet so we can look into it? Thanks!

@facebook-github-bot
Contributor

Hi @dvisockas!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

mthrok pushed a commit to mthrok/audio that referenced this pull request Feb 26, 2021
Added RPC parameter server tutorial
@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@arlofaria-cartesia

Yes, I noticed. I just wanted to push some changes so I could work on another computer.
Regarding the tests: audioop gives different results when encoding with ALaw. The same is true for MuLaw. Was MuLaw tested for correctness?

No, it was not tested against audioop. In fact, if you notice a difference, could you open an issue with a code snippet so we can look into it? Thanks!

I have also noticed this discrepancy. Here's a test:

import audioop  # NOTE: will be removed from Python stdlib at v3.13!

import torchaudio

# Load as normalized fp32.
waveform, _ = torchaudio.load("https://mod9.io/hi.wav", normalize=True)
ulaw_encoded_torchaudio = torchaudio.functional.mu_law_encoding(waveform, quantization_channels=256)

# Load as int16.
waveform, _ = torchaudio.load("https://mod9.io/hi.wav", normalize=False)
ulaw_encoded_audioop = audioop.lin2ulaw(waveform.numpy().tobytes(), 2)

print("torchaudio", ulaw_encoded_torchaudio[0, :16].tolist())
print("audioop   ", list(ulaw_encoded_audioop)[:16])

This prints:

torchaudio [147, 148, 143, 138, 137, 136, 132, 129, 128, 128, 127, 124, 123, 123, 123, 121]
audioop    [236, 236, 239, 245, 247, 248, 251, 254, 254, 254, 126, 124, 123, 123, 123, 121]

I think the difference might be because TorchAudio uses the Wikipedia definition of the mu-law algorithm, while other audio tools tend to use the reference C implementation of G.711, e.g.:
https://github.com/escrichov/G711/blob/master/g711.c#L202-L263
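For context, the reference implementation is segment-based rather than a direct evaluation of the continuous formula. A rough pure-Python transcription of that segment/mantissa logic (my own port for illustration; not guaranteed bit-exact with audioop.lin2ulaw) looks like:

```python
def g711_ulaw_encode(sample):
    """Segment-based mu-law encoding in the style of the G.711 reference code.

    `sample` is a signed 16-bit linear PCM value; the result is one mu-law byte.
    """
    BIAS, CLIP = 0x84, 32635
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment number = position of the top set bit above bit 7
    # (magnitude >= 132 after biasing, so bit_length() >= 8).
    exponent = magnitude.bit_length() - 8
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # Pack sign, segment, and mantissa, then invert all bits per G.711.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF
```

This piecewise-linear segment encoding approximates the continuous mu-law curve but does not match a straight quantization of it element for element, which would explain differences like the ones printed above.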
