Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix docstring of sliding_window_cmn #1380

Closed
jackyguo624 opened this issue Mar 10, 2021 · 10 comments
Closed

Fix docstring of sliding_window_cmn #1380

jackyguo624 opened this issue Mar 10, 2021 · 10 comments

Comments

@jackyguo624
Copy link

jackyguo624 commented Mar 10, 2021

specgram (Tensor): Tensor of audio of dimension (..., freq, time)

the input specgram shoud be in shape ( ..., time, freq), instead of (..., freq, time) as doc says.

Right?

@mthrok
Copy link
Collaborator

mthrok commented Mar 10, 2021

Hi @jackyguo624

Thanks for the report. I think you are right about it and we should fix it.

@mthrok mthrok changed the title The docment is misleading ? Fix docstring of sliding_window_cmn Mar 10, 2021
@mthrok
Copy link
Collaborator

mthrok commented Mar 10, 2021

CC @jcaw Sorry we missed it in the review process.

@yoyololicon
Copy link
Collaborator

@jackyguo624
Hmm, I think the docstring is right, (..., freq, time) means you can drop the batch dimension, directly feed a matrix into it.

if len(input_shape) == 2:

This line handle when input is a 2-dimensional matrix.

@mthrok
Copy link
Collaborator

mthrok commented Mar 10, 2021

@jackyguo624
Hmm, I think the docstring is right, (..., freq, time) means you can drop the batch dimension, directly feed a matrix into it.

if len(input_shape) == 2:

This line handle when input is a 2-dimensional matrix.

@yoyololicon

Well, the batch/channel dimension is fine. I think the issue is about the order of the last two dimensions. The expected order is (..., time, freq) whereas the docstring is (..., freq, time)

@yoyololicon
Copy link
Collaborator

Oh, I see it, nevermind.😅

@jcaw
Copy link
Contributor

jcaw commented Mar 10, 2021

Spectrograms returned by torchaudio do have the order (..., freq, time) though. Should sliding_window_cmn be operating on a different order? That seems like a bug.

@mthrok
Copy link
Collaborator

mthrok commented Mar 10, 2021

Spectrograms returned by torchaudio do have the order (..., freq, time) though. Should sliding_window_cmn be operating on a different order? That seems like a bug.

This is the inconsistent design principles between the libraries torchaudio followed. sliding_window_cmn is from Kaldi while spectrogram is from another different source. We could think of changing it but it will be a bc-breaking change, so it needs a separate discussion. For now, let's make the docstring of sliding_window_cmn accurately reflect the operation.

@mohamedirfansh
Copy link
Contributor

Hello, can I work on this and fix this issue?

@mthrok
Copy link
Collaborator

mthrok commented Mar 10, 2021

Thanks. The docstring itself has changed. I'll think of what we can do about the inconsistent design. There are bunch of inconsistent interface design around Kaldi-originated functions.

@mthrok mthrok closed this as completed Mar 10, 2021
@mthrok
Copy link
Collaborator

mthrok commented Mar 10, 2021

Thanks. The docstring itself has changed. I'll think of what we can do about the inconsistent design. There are bunch of inconsistent interface design around Kaldi-originated functions.

and if you have any suggestion, we are happy to hear.

mthrok pushed a commit to mthrok/audio that referenced this issue Dec 13, 2022
* add tensorboard_profiler tutorial

* Update intermediate_source/tensorboard_profiler_tutorial.py

Co-authored-by: maxluk <maxluk@microsoft.com>

* update title

* remove testing on windows because kineto doesn't support windows now

* rename

* Update with the help of Ilia

Co-authored-by: Teng Gao <tegao@microsoft.com>
Co-authored-by: maxluk <maxluk@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants