Skip to content

Conversation

@jdariasl
Copy link
Contributor

@jdariasl jdariasl commented Nov 2, 2022

I have added BarkScale transform, which can transform a regular Spectrogram into a BarkSpectrograms similar to MelScale. @ahmed-fau opened this requirement in December 2021 with the number (#2103). The new functionality includes three different well-known approximations of the Bark scale.

@jdariasl
Copy link
Contributor Author

jdariasl commented Nov 2, 2022

I'm unsure if the failing tests are related to the functionalities I'm trying to add.

@jdariasl jdariasl marked this pull request as ready for review November 2, 2022 14:07
Copy link
Contributor

@carolineechen carolineechen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jdariasl, thanks for working on this! I left a couple of comments.

I also wanted to note that as this is a new feature, the TorchAudio team would like to first support it as a prototype feature (feature classifications) available in our nightly builds, and will reevaluate moving it to beta in one of the future releases, pending use cases and API stability (it would be great if you could provide any additional context/existing implementations of this feature we could refer to!). If that sounds good to you, we can help move it to torchaudio.prototype after your PR is reviewed and merged, lmk if you have any questions. Thanks again!

specgram = torch.randn(3, 2, 201, 256)

atol = 1e-6 if os.name == "nt" else 1e-8
atol = 1e-4 if os.name == "nt" else 1e-6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you revert back this change? these values should be fine for the original melscale tests

Copy link
Contributor Author

@jdariasl jdariasl Nov 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes of course. When I ran the test locally, MelScale didn't satisfy these values, so I played a little bit with the values and forgot to revert the changes. You can test it on your own.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot_20221103_103546

This is the output of batch_consistency_test with the original tolerance values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any feedback about this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, yea it is also erroring out for me with those values locally, but seems to run okay on our GitHub CircleCI tests -- I think we can leave as is for now, and I can also test out to see if bark scale is passing on CircleCI tests with smaller tolerances after this PR is merged.

def test_batch_BarkScale(self):
specgram = torch.randn(3, 2, 201, 256)

atol = 1e-4 if os.name == "nt" else 1e-6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know why these tolerances are increased from the working melscale ones above?

@jdariasl
Copy link
Contributor Author

jdariasl commented Nov 3, 2022

The bark scale is just another auditory scale used in speech and audio processing. Such scale is used to estimate RASTA-PLP coefficients (the acronym stands for Relative Spectral Transform - Perceptual Linear Prediction), which is also an alternative to MFCC used in different contexts, including biomedical applications. Most of the available implementations are developed in Matlab: https://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat/, https://es.mathworks.com/help/audio/ref/designauditoryfilterbank.html, even though there is also an implementation in Praat. In this sense, I think a better integration of the Bark scale into torchaudio would be to develop an auditory_fbanks function that subsumes melscale_fbanks and barkscale_fbanks, but I didn't want to modify existing functions.

On the other hand, I'm unsure whether it is possible to propose a comparison with Matlab implementations.

Copy link
Contributor

@carolineechen carolineechen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of small comments, and could you also add the new functions to docs/source/functional.rst and docs/source/transforms.rst to have it reflected in the pytorch documentation?

Otherwise, the implementation looks good, and should be ready to merge soon! Once merged, I'll create a PR to move this into prototype and investigate the bark spectrogram tolerance, thanks for the contribution :)

specgram = torch.randn(3, 2, 201, 256)

atol = 1e-6 if os.name == "nt" else 1e-8
atol = 1e-4 if os.name == "nt" else 1e-6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, yea it is also erroring out for me with those values locally, but seems to run okay on our GitHub CircleCI tests -- I think we can leave as is for now, and I can also test out to see if bark scale is passing on CircleCI tests with smaller tolerances after this PR is merged.

Copy link
Contributor

@carolineechen carolineechen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the feature contribution, I think this implementation looks good!

follow ups on my end after merging

  • move it to a prototype feature
  • experiment with test tolerance
  • add assets image

@facebook-github-bot
Copy link
Contributor

@carolineechen has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@carolineechen has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Nov 14, 2022
Summary:
follow up to #2823
- move bark spectrogram to prototype
- decrease autograd test tolerance (passing on circle ci)
- add diagram for bark fbanks

cc jdariasl

Pull Request resolved: #2843

Reviewed By: nateanl

Differential Revision: D41199522

Pulled By: carolineechen

fbshipit-source-id: 8e6c2e20fb7b14f39477683b3c6ed8356359a213
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants