AxisError when signal contains silence #21

Closed
thequilo opened this issue Aug 20, 2020 · 5 comments

@thequilo

The stoi function produces an error if a reference signal only contains a short piece of speech. This seems to be caused by the removal of silent frames.

This is a minimal example using WSJ0-2mix data. Replace wsj0_2mix_root with the root directory of the WSJ0-2mix data. You might have to remove the suffix _2 if you have a newer version of the WSJ0-2mix database:

from pathlib import Path
from pystoi.stoi import stoi
import soundfile as sf

wsj0_2mix_root = Path('<path to WSJ0-2mix root dir>')

observation = sf.read(str(wsj0_2mix_root / 'data/2speakers/wav8k/min/cv/mix/40ba0112_1.2757_01nc0218_-1.2757.wav'))[0]
target = sf.read(str(wsj0_2mix_root / 'data/2speakers/wav8k/min/cv/s2/40ba0112_1.2757_01nc0218_-1.2757_2.wav'))[0]

stoi(target, observation, 8000)
---------------------------------------------------------------------------
AxisError                                 Traceback (most recent call last)
<ipython-input-167-eb5a1701f57b> in <module>
      9 
     10 
---> 11 stoi(target, observation, 8000)

.../python3.7/site-packages/pystoi/stoi.py in stoi(x, y, fs_sig, extended)
     75         # Find normalization constants and normalize
     76         normalization_consts = (
---> 77             np.linalg.norm(x_segments, axis=2, keepdims=True) /
     78             (np.linalg.norm(y_segments, axis=2, keepdims=True) + utils.EPS))
     79         y_segments_normalized = y_segments * normalization_consts

.../python3.7/site-packages/numpy/linalg/linalg.py in norm(x, ord, axis, keepdims)
   2479             # special case for speedup
   2480             s = (x.conj() * x).real
-> 2481             return sqrt(add.reduce(s, axis=axis, keepdims=keepdims))
   2482         else:
   2483             try:

AxisError: axis 2 is out of bounds for array of dimension 1

Is this a bug in the implementation or a general flaw of the STOI metric? Do you have a suggestion on how to handle this issue?

@mpariente
Owner

Thanks for raising the issue.
Can you show me your pystoi version please?

@thequilo
Author

I used version 0.2. A fresh install at least doesn't crash, thank you!

But still, I doubt that returning a small number is the right thing to do. In the above example, stoi(target, target, 8000) returns 1e-5, where I would expect a value of 1 when the reference and estimated signals are equal. Especially in the WSJ0-2mix database, there are some examples that always result in a bad STOI value even if the reconstruction is perfect.

@mpariente
Owner

There are not enough frames to build an intermediate intelligibility index, so we cannot assess intelligibility with STOI in this case. In wsj0-2mix, there is one example for which this always happens for me, but only one. Do you have more than one?
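
To make the "not enough frames" condition concrete, here is a rough pre-check sketch. It follows the published STOI description (256-sample Hann-windowed frames with 50 % overlap at 10 kHz, a 40 dB dynamic range for silent-frame removal, and 30-frame segments); it is not pystoi's code. The function names and the exact threshold are assumptions made for illustration, and the input is assumed to be already resampled to 10 kHz. It uses only the standard library so it runs without extra dependencies.

```python
import math

# Constants taken from the published STOI description; names are illustrative.
FRAME_LEN = 256        # samples per analysis frame at 10 kHz
HOP = FRAME_LEN // 2   # 50 % overlap between frames
DYN_RANGE_DB = 40.0    # frames more than 40 dB below the loudest one are "silent"
SEGMENT_FRAMES = 30    # frames needed for one short-time segment


def count_active_frames(signal, frame_len=FRAME_LEN, hop=HOP,
                        dyn_range_db=DYN_RANGE_DB):
    """Count frames whose energy lies within dyn_range_db of the loudest frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    energies_db = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        norm = math.sqrt(sum((w * s) ** 2 for w, s in zip(window, frame)))
        energies_db.append(20 * math.log10(norm + 1e-12))
    if not energies_db:
        return 0  # signal shorter than a single frame
    threshold = max(energies_db) - dyn_range_db
    return sum(1 for e in energies_db if e > threshold)


def has_enough_speech(signal, min_frames=SEGMENT_FRAMES):
    """Hypothetical helper: True if at least one full segment could be formed."""
    return count_active_frames(signal) >= min_frames
```

A reference signal for which has_enough_speech returns False is a likely candidate for the degenerate case discussed here, although the exact boundary in pystoi may differ slightly from this approximation.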

@thequilo
Author

This problem occurs for one example in the test (tt min) data, two in the training (tr min) data, and one in the cross-validation (cv min) data. I think it is not a big deal to ignore them.

@mpariente
Owner

Yes, I meant for testing, but you're right that there are examples in train and val.
I also think they should be ignored; that's why I decided to output a small number.
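
One way to ignore such utterances when reporting an average is to filter out the small placeholder value before taking the mean. The sketch below assumes the placeholder is exactly the 1e-5 reported in this thread; the function name and the returned tuple are hypothetical, not part of pystoi.

```python
import math

# Placeholder value reported in this thread for utterances STOI cannot score.
SENTINEL = 1e-5


def mean_stoi_ignoring_unscorable(scores, sentinel=SENTINEL):
    """Return (mean of scorable values, number of skipped values).

    A score counts as unscorable if it is numerically equal to the sentinel.
    If nothing is scorable, the mean is NaN.
    """
    valid = [s for s in scores
             if not math.isclose(s, sentinel, rel_tol=1e-9, abs_tol=1e-12)]
    skipped = len(scores) - len(valid)
    if not valid:
        return float("nan"), skipped
    return sum(valid) / len(valid), skipped
```

Returning the number of skipped utterances alongside the mean keeps the exclusion visible in reported results, which matters because a genuinely poor estimate and an unscorable reference are different situations.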
