AxisError when signal contains silence #21

thequilo · 2020-08-20T12:30:58Z

The stoi function produces an error if a reference signal only contains a short piece of speech. This seems to be caused by the removal of silent frames.

This is a minimal example using WSJ0-2mix data. Replace wsj0_2mix_root with the root directory of the WSJ0-2mix data. You might have to remove the suffix _2 from the target file name if you have a newer version of the WSJ0-2mix database:

from pathlib import Path
from pystoi.stoi import stoi
import soundfile as sf

wsj0_2mix_root = Path('<path to WSJ0-2mix root dir>')

observation = sf.read(str(wsj0_2mix_root / 'data/2speakers/wav8k/min/cv/mix/40ba0112_1.2757_01nc0218_-1.2757.wav'))[0]
target = sf.read(str(wsj0_2mix_root / 'data/2speakers/wav8k/min/cv/s2/40ba0112_1.2757_01nc0218_-1.2757_2.wav'))[0]

stoi(target, observation, 8000)

---------------------------------------------------------------------------
AxisError                                 Traceback (most recent call last)
<ipython-input-167-eb5a1701f57b> in <module>
      9 
     10 
---> 11 stoi(target, observation, 8000)

.../python3.7/site-packages/pystoi/stoi.py in stoi(x, y, fs_sig, extended)
     75         # Find normalization constants and normalize
     76         normalization_consts = (
---> 77             np.linalg.norm(x_segments, axis=2, keepdims=True) /
     78             (np.linalg.norm(y_segments, axis=2, keepdims=True) + utils.EPS))
     79         y_segments_normalized = y_segments * normalization_consts

.../python3.7/site-packages/numpy/linalg/linalg.py in norm(x, ord, axis, keepdims)
   2479             # special case for speedup
   2480             s = (x.conj() * x).real
-> 2481             return sqrt(add.reduce(s, axis=axis, keepdims=keepdims))
   2482         else:
   2483             try:

AxisError: axis 2 is out of bounds for array of dimension 1

Is this a bug in the implementation or a general flaw of the STOI metric? Do you have a suggestion on how to handle this issue?


mpariente · 2020-08-20T15:27:46Z

Thanks for raising the issue.
Can you show me your pystoi version please?

thequilo · 2020-08-23T19:56:39Z

I used version 0.2. A fresh install at least doesn't crash, thank you!

But I still doubt that returning a small number is the right thing to do. In the above example, stoi(target, target) returns 1e-5, whereas I would expect a value of 1 when the reference and estimated signals are equal. In the WSJ0-2mix database in particular, some examples always yield a bad STOI value even if the reconstruction is perfect.

mpariente · 2020-08-24T07:27:30Z

There are not enough frames to build an intermediate intelligibility index, so we cannot assess intelligibility with STOI in this case. In wsj0-2mix, there is one example for which this always happens for me, but only one. Do you have more than one?
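
To make the "not enough frames" condition concrete, here is a rough sketch that counts how many frames of a reference signal survive silence removal. It is not pystoi's code: the constants (256-sample frames, 50% overlap, 40 dB dynamic range, 30-frame segments) are assumptions based on the usual description of STOI, the helper names are hypothetical, and the signal is assumed to be already sampled at 10 kHz.

```python
import numpy as np

# Assumed values based on the usual description of STOI, not read out of
# pystoi: 256-sample frames with 50% overlap at 10 kHz, a 40 dB dynamic
# range for silence removal, and 30-frame analysis segments.
FRAME_LEN = 256
HOP = FRAME_LEN // 2
DYN_RANGE_DB = 40.0
SEGMENT_FRAMES = 30
EPS = np.finfo(float).eps


def count_active_frames(x):
    """Count frames whose energy is within DYN_RANGE_DB of the loudest frame."""
    x = np.asarray(x, dtype=float)
    if len(x) < FRAME_LEN:
        return 0
    # Hann window without the zero-valued end points.
    window = np.hanning(FRAME_LEN + 2)[1:-1]
    starts = range(0, len(x) - FRAME_LEN + 1, HOP)
    frames = np.array([window * x[i:i + FRAME_LEN] for i in starts])
    energies_db = 20 * np.log10(np.linalg.norm(frames, axis=1) + EPS)
    return int(np.sum(energies_db > energies_db.max() - DYN_RANGE_DB))


def has_enough_frames(x):
    """True if at least one full analysis segment survives silence removal."""
    return count_active_frames(x) >= SEGMENT_FRAMES
```

A reference that is mostly silence with only a short burst of speech fails this check, which is the situation described in this issue.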

thequilo · 2020-08-26T08:15:02Z

This problem occurs for one example in the test (tt min) data, two in the training (tr min) data, and one in the cross-validation (cv min) data. I think it is not a big deal to ignore them.

mpariente · 2020-08-26T09:50:32Z

Yes, I meant for testing, but you're right that there are examples in train and val.
I also think they should be ignored, that's why I decided to output a small number.
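
One possible way to ignore these examples when averaging over a dataset is sketched below. The helper name is hypothetical, and the sentinel value 1e-5 is taken from this thread as an assumption; check what your pystoi version actually returns before relying on it.

```python
import math

# Assumption: the degenerate case is reported as the small constant 1e-5
# mentioned in this thread; adjust if your pystoi version differs.
DEGENERATE_SCORE = 1e-5


def mean_valid_stoi(scores, sentinel=DEGENERATE_SCORE):
    """Average STOI scores, skipping those equal to the sentinel value.

    Returns a (mean, number_of_valid_scores) tuple; the mean is NaN when
    every score was skipped.
    """
    valid = [
        s for s in scores
        if not math.isclose(s, sentinel, rel_tol=1e-9, abs_tol=0.0)
    ]
    if not valid:
        return float("nan"), 0
    return sum(valid) / len(valid), len(valid)
```

Reporting the number of valid scores alongside the mean makes it visible how many examples were dropped.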

mpariente closed this as completed Aug 26, 2020
