Add contrast to functional #551

Merged 3 commits on Apr 16, 2020
35 changes: 35 additions & 0 deletions docs/source/functional.rst
@@ -93,6 +93,41 @@ Functions to perform common audio operations.

.. autofunction:: equalizer_biquad

:hidden:`bandpass_biquad`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: bandpass_biquad

:hidden:`bandreject_biquad`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: bandreject_biquad

:hidden:`band_biquad`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: band_biquad

:hidden:`treble_biquad`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: treble_biquad

:hidden:`deemph_biquad`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: deemph_biquad

:hidden:`riaa_biquad`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: riaa_biquad

:hidden:`contrast`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: contrast

:hidden:`mask_along_axis`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

4 changes: 4 additions & 0 deletions test/test_batch_consistency.py
@@ -65,6 +65,10 @@ def test_istft(self):
        ])
        _test_batch(F.istft, stft, n_fft=4, length=4)

    def test_contrast(self):
Collaborator

Can you use a simple random tensor with values in the range [-1.0, 1.0], e.g. torch.rand(2, 100) - 0.5?

The reason is that we do not want this unit test to rely on another library function, torchaudio.load.
That way, if torchaudio.load is broken, this test is not affected.
I know that test_detect_pitch_frequency does this, but that is only because of a recent refactoring, and in my opinion it should not.

Contributor Author

@mthrok I have modified this.
Do torchaudio functionals assume normalized input?

  • In some effects, SoX internally clips values according to the max and min of the input.
  • I assumed a normalized waveform when writing the contrast function.
  • In the tests, I also did not see normalization=False anywhere.
  • If an unnormalized waveform is passed, the outputs of the existing SoxEffects and the functional contrast will differ.
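
To illustrate why the input scale matters, here is a minimal scalar sketch of the formula used by the functional in this PR, evaluated on one normalized sample and on a few raw integer-valued samples (assumed here to be on a 16-bit PCM scale). The helper name `contrast_sample` is hypothetical and not part of torchaudio, and the sketch makes no claim about what SoX does internally.

```python
import math


def contrast_sample(x: float, enhancement_amount: float = 75.0) -> float:
    """Hypothetical scalar version of the contrast formula from this PR."""
    contrast = enhancement_amount / 750.0
    temp1 = x * (math.pi / 2)
    temp2 = contrast * math.sin(temp1 * 4)
    return math.sin(temp1 + temp2)


# Normalized sample: the output is boosted but stays on the same [-1, 1] scale.
normalized_out = contrast_sample(0.5)

# Raw integer-valued samples (assumption: 16-bit PCM scale). The sine wraps
# around many times, so the outputs no longer resemble the inputs.
raw_outs = [contrast_sample(float(n)) for n in (12345, 16384, -12345)]

print(normalized_out)
print([round(v, 6) for v in raw_outs])
```

For an integer-valued sample n, the inner term sin(2 * pi * n) vanishes and the result collapses to sin(n * pi / 2), so the output can only be -1, 0, or 1 (up to floating-point error). This suggests the formula is only meaningful on normalized input.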

Collaborator

@bhargavkathivarapu

I am not sure whether there is or was a design principle for audio normalization, but since torchaudio.load has normalization=True by default and all the tests are written with normalization=True, I think that is the de facto standard.

As for the batch consistency test specifically, its purpose is to ensure that the function returns the same result regardless of batching, not that the result matches SoX. So any tensor would do, but to be closer to real usage, I thought torch.rand - 0.5 would be slightly better than torch.rand (which produces values in [0, 1)). You can normalize it to get even closer if you like.
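
The value ranges mentioned above can be sketched without torch. This is only an illustration under stated assumptions: `random.random()` stands in for `torch.rand`, and the peak normalization at the end is one possible reading of "normalize it", not something the PR prescribes.

```python
import random

random.seed(0)
num_channels, num_frames = 2, 100

# Stand-in for torch.rand(2, 100): uniform values in [0, 1).
unit = [[random.random() for _ in range(num_frames)] for _ in range(num_channels)]

# Equivalent of `torch.rand(2, 100) - 0.5`: zero-centred values in [-0.5, 0.5).
centred = [[v - 0.5 for v in row] for row in unit]

# Optional peak normalization: rescale so the largest absolute sample is 1.0.
peak = max(abs(v) for row in centred for v in row)
normalized = [[v / peak for v in row] for row in centred]

print(min(min(row) for row in centred), max(max(row) for row in centred))
print(min(min(row) for row in normalized), max(max(row) for row in normalized))
```

With torch, scaling the uniform sample as `torch.rand(2, 100) * 2 - 1` would be another way to cover [-1, 1) directly.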

Contributor

> I am not sure whether there is or was a design principle for audio normalization, but since torchaudio.load has normalization=True by default and all the tests are written with normalization=True, I think that is the de facto standard.

Yes, we assume that waveforms are in the range [-1, 1] in torchaudio.

We should consider documenting this convention in the README and guaranteeing it in some form. A caveat is that when a transformation (e.g. gain) pushes the waveform outside this range, there may be more than one way to bring the signal back into [-1, 1].
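
As a rough illustration of that caveat, the sketch below applies a gain and then restores the [-1, 1] range in two different ways. The helper names (`apply_gain`, `hard_clip`, `peak_rescale`) are hypothetical and not torchaudio functions; they only show that the two strategies give different signals.

```python
def apply_gain(samples, gain):
    """Multiply every sample by a linear gain factor."""
    return [s * gain for s in samples]


def hard_clip(samples, limit=1.0):
    """Clamp each sample to [-limit, limit]; samples within range are unchanged."""
    return [max(-limit, min(limit, s)) for s in samples]


def peak_rescale(samples, limit=1.0):
    """Scale the whole signal so its largest absolute sample equals limit."""
    peak = max(abs(s) for s in samples)
    if peak <= limit:
        return list(samples)
    return [s * limit / peak for s in samples]


waveform = [0.1, -0.4, 0.8, -0.2]
boosted = apply_gain(waveform, 2.0)  # one sample now exceeds 1.0

clipped = hard_clip(boosted)      # distorts only the out-of-range sample
rescaled = peak_rescale(boosted)  # keeps the shape, changes overall level

print(clipped)
print(rescaled)
```

Clipping preserves the level of in-range samples but distorts peaks, while rescaling preserves the waveform shape but undoes part of the gain, so a documented convention would need to say which behaviour is intended.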

        waveform = torch.rand(2, 100) - 0.5
        _test_batch(F.contrast, waveform, enhancement_amount=80.)


class TestTransforms(unittest.TestCase):
    """Test suite for classes defined in `transforms` module"""
18 changes: 18 additions & 0 deletions test/test_sox_compatibility.py
@@ -300,6 +300,24 @@ def test_riaa(self):

        torch.testing.assert_allclose(output_waveform, sox_output_waveform, atol=1e-4, rtol=1e-5)

    @unittest.skipIf("sox" not in BACKENDS, "sox not available")
    @AudioBackendScope("sox")
    def test_contrast(self):
        """
        Test contrast effect, compare to SoX implementation
        """
        enhancement_amount = 80.
        noise_filepath = common_utils.get_asset_path('whitenoise.wav')
        E = torchaudio.sox_effects.SoxEffectsChain()
        E.set_input_file(noise_filepath)
        E.append_effect_to_chain("contrast", [enhancement_amount])
        sox_output_waveform, sr = E.sox_build_flow_effects()

        waveform, sample_rate = torchaudio.load(noise_filepath, normalization=True)
        output_waveform = F.contrast(waveform, enhancement_amount)

        torch.testing.assert_allclose(output_waveform, sox_output_waveform, atol=1e-4, rtol=1e-5)

    @unittest.skipIf("sox" not in BACKENDS, "sox not available")
    @AudioBackendScope("sox")
    def test_equalizer(self):
10 changes: 10 additions & 0 deletions test/test_torchscript_consistency.py
@@ -406,6 +406,16 @@ def func(tensor):

        self._assert_consistency(func, waveform)

    def test_contrast(self):
        filepath = common_utils.get_asset_path("whitenoise.wav")
        waveform, _ = torchaudio.load(filepath, normalization=True)

        def func(tensor):
            enhancement_amount = 80.
            return F.contrast(tensor, enhancement_amount)

        self._assert_consistency(func, waveform)


class _TransformsTestMixin:
    """Implements test for Transforms that are performed for different devices"""
33 changes: 33 additions & 0 deletions torchaudio/functional.py
@@ -31,6 +31,7 @@
    "deemph_biquad",
    "riaa_biquad",
    "biquad",
    "contrast",
    'mask_along_axis',
    'mask_along_axis_iid'
]
@@ -1160,6 +1161,38 @@ def riaa_biquad(
    return biquad(waveform, b0, b1, b2, a0, a1, a2)


def contrast(
        waveform: Tensor,
        enhancement_amount: float = 75.
) -> Tensor:
    r"""Apply contrast effect. Similar to SoX implementation.
    Comparable with compression, this effect modifies an audio signal to make it sound louder

    Args:
        waveform (Tensor): audio waveform of dimension of `(..., time)`
        enhancement_amount (float): controls the amount of the enhancement
            Allowed range of values for enhancement_amount : 0-100
            Note that enhancement_amount = 0 still gives a significant contrast enhancement

    Returns:
        Tensor: Waveform of dimension of `(..., time)`

    References:
        http://sox.sourceforge.net/sox.html
    """

    if not 0 <= enhancement_amount <= 100:
        raise ValueError("Allowed range of values for enhancement_amount : 0-100")

    contrast = enhancement_amount / 750.

    temp1 = waveform * (math.pi / 2)
    temp2 = contrast * torch.sin(temp1 * 4)
    output_waveform = torch.sin(temp1 + temp2)

    return output_waveform
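
For intuition, the body of `contrast` can be evaluated on single samples with the standard library. The sketch below mirrors the arithmetic in the diff; `contrast_scalar` is a hypothetical name used only here, and it assumes the input sample is already in [-1, 1].

```python
import math


def contrast_scalar(x: float, enhancement_amount: float = 75.0) -> float:
    """Hypothetical scalar mirror of the tensor formula, for one sample in [-1, 1]."""
    if not 0 <= enhancement_amount <= 100:
        raise ValueError("Allowed range of values for enhancement_amount : 0-100")
    contrast = enhancement_amount / 750.0
    temp1 = x * (math.pi / 2)
    temp2 = contrast * math.sin(temp1 * 4)
    return math.sin(temp1 + temp2)


# Compare a few amplitudes at the minimum, default, and maximum settings.
for amount in (0.0, 75.0, 100.0):
    outputs = [round(contrast_scalar(x, amount), 4) for x in (0.1, 0.25, 0.5, 1.0)]
    print(amount, outputs)
```

With `enhancement_amount = 0` the formula reduces to sin(x * pi / 2), which already lifts small amplitudes above their input value. That is consistent with the docstring's note that 0 still gives a significant enhancement.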


def mask_along_axis_iid(
specgrams: Tensor,
mask_param: int,