Do not exclude segments where one/several estimates are all-zero #4

Open
StefanUhlich-sony opened this issue Apr 16, 2018 · 1 comment
Currently, segments in which one or several estimates are all-zero are excluded from the BSSEval computation:

https://github.com/sigsep/sigsep-mus-eval/blob/05d52e4962660417801b78aa82ac598dd8c7b25a/museval/metrics.py#L300
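
For context, what happens at that line is that a framewise window is discarded for all sources as soon as any single reference or estimate is entirely silent in it (its metrics are set to NaN and skipped during aggregation). Here is a simplified paraphrase, not the verbatim museval code; the helper name and the exact silence test are illustrative:

```python
import numpy as np


def any_source_silent(sources):
    """True if any source is all-zero over the whole window.

    sources: array of shape (nsrc, nsamples, nchannels).
    """
    return np.any(np.all(sources == 0, axis=(1, 2)))

# In effect, inside the framewise evaluation loop:
#     if any_source_silent(ref_win) or any_source_silent(est_win):
#         sdr[:, t] = isr[:, t] = sir[:, t] = sar[:, t] = np.nan
# i.e. the frame drops out of the aggregate for *every* source at once.
```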

This leads to the effect that the SDR value, which is defined for the j-th instrument as

SDR_j = 10 \log_{10} \frac{\sum_{i,n} s_{ij}(n)^2}{\sum_{i,n} \left( s_{ij}(n) - \hat{s}_{ij}(n) \right)^2},

depends on the other estimates \hat{s}_{ik}(n) for k \neq j. Here is a quick example that shows the effect:
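
As written, this definition only involves the j-th source and its own estimate. A direct NumPy transcription (a hypothetical helper for illustration, not part of museval):

```python
import numpy as np


def sdr(ref, est):
    """SDR in dB following the formula above.

    ref, est: arrays of shape (nsamples, nchannels) holding the true
    source s_j and its estimate. No other source's estimate enters
    the computation.
    """
    num = np.sum(ref ** 2)          # signal energy
    den = np.sum((ref - est) ** 2)  # error energy
    return 10 * np.log10(num / den)
```

So any dependence on the other estimates can only come from which segments are included in the average.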

import musdb
import museval

import numpy as np


def estimate_and_evaluate1(track):
    """ Simple baseline system using mixture as estimate """
    estimates = {}
    estimates['vocals'] = 0.25 * track.audio
    estimates['accompaniment'] = 0.75 * track.audio

    scores = museval.eval_mus_track(track, estimates, output_dir='.')
    print('Score for `estimate_and_evaluate1`:')
    print(scores)

    return estimates


def estimate_and_evaluate2(track):
    """ Modified baseline system, which sets the second half of `vocals` to zero """
    estimates = {}
    estimates['vocals'] = 0.25 * track.audio
    estimates['accompaniment'] = 0.75 * track.audio

    estimates['vocals'] *= np.vstack((np.ones((track.audio.shape[0] // 2, 2)),
                                      np.zeros((track.audio.shape[0] - track.audio.shape[0] // 2, 2))))

    scores = museval.eval_mus_track(track, estimates, output_dir='.')
    print('Score for `estimate_and_evaluate2`:')
    print(scores)

    return estimates


def estimate_and_evaluate3(track):
    """ Modified baseline system, which sets the first half of `vocals` to zero """
    estimates = {}
    estimates['vocals'] = 0.25 * track.audio
    estimates['accompaniment'] = 0.75 * track.audio

    estimates['vocals'] *= np.vstack((np.zeros((track.audio.shape[0] // 2, 2)),
                                      np.ones((track.audio.shape[0] - track.audio.shape[0] // 2, 2))))

    scores = museval.eval_mus_track(track, estimates, output_dir='.')
    print('Score for `estimate_and_evaluate3`:')
    print(scores)

    return estimates


mus = musdb.DB(root_dir='/speech/db/mul/separ4/sisec/data2018/', is_wav=True)
mus.run(estimate_and_evaluate1, estimates_dir=".", tracks=[mus.load_mus_tracks(subsets='test')[0]])
mus.run(estimate_and_evaluate2, estimates_dir=".", tracks=[mus.load_mus_tracks(subsets='test')[0]])
mus.run(estimate_and_evaluate3, estimates_dir=".", tracks=[mus.load_mus_tracks(subsets='test')[0]])

The three `estimate_and_evaluate*` functions are simple baseline systems that use the mixture as the estimate. Only the vocals estimate differs between the versions, yet the BSSEval values for accompaniment change as well:

$ python separ_and_evaluate.py 
Score for `estimate_and_evaluate1`:
vocals              => SDR:-10.161dB, SIR:-16.848dB, ISR:2.421dB, SAR:28.828dB, 
accompaniment       => SDR:6.991dB, SIR:12.551dB, ISR:11.751dB, SAR:28.828dB, 

Score for `estimate_and_evaluate2`:
vocals              => SDR:-12.816dB, SIR:-15.727dB, ISR:0.177dB, SAR:-1.699dB, 
accompaniment       => SDR:7.181dB, SIR:14.078dB, ISR:11.783dB, SAR:27.795dB, 

Score for `estimate_and_evaluate3`:
vocals              => SDR:-7.410dB, SIR:-11.257dB, ISR:0.695dB, SAR:2.519dB, 
accompaniment       => SDR:6.783dB, SIR:10.938dB, ISR:11.722dB, SAR:29.830dB, 

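A minimal, self-contained toy (synthetic signals and a simplified framewise loop; not the actual museval internals) reproduces the mechanism: zeroing part of one estimate changes which frames survive the silence filter, and therefore which frames enter the other source's aggregate:

```python
import numpy as np

rng = np.random.default_rng(0)
refs = rng.standard_normal((2, 1000, 2))              # two stereo sources
ests = refs + 0.1 * rng.standard_normal(refs.shape)   # noisy estimates


def framewise_sdr(refs, ests, win=100):
    """Framewise SDR per source; frames where any estimate is all-zero
    are set to NaN for *all* sources, mimicking the exclusion above."""
    frames = []
    for start in range(0, refs.shape[1] - win + 1, win):
        r = refs[:, start:start + win]
        e = ests[:, start:start + win]
        if np.any(np.all(e == 0, axis=(1, 2))):       # any estimate silent?
            frames.append([np.nan] * refs.shape[0])
            continue
        num = np.sum(r ** 2, axis=(1, 2))
        den = np.sum((r - e) ** 2, axis=(1, 2))
        frames.append(list(10 * np.log10(num / den)))
    return np.array(frames)                           # (nframes, nsrc)


print(np.nanmean(framewise_sdr(refs, ests), axis=0))

ests2 = ests.copy()
ests2[0, 500:] = 0.0                                  # silence half of source 0
# Source 1's aggregate changes as well, because half of its frames were
# discarded together with source 0's silent frames.
print(np.nanmean(framewise_sdr(refs, ests2), axis=0))
```

In this toy the shift on source 1 is only sampling variation, but on real music the surviving frames differ systematically from the discarded ones, which is why the accompaniment scores above move by several tenths of a dB.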

@faroit @aliutkus What do you think? Should this be changed for a future version of BSSEval?

faroit (Member) commented Apr 20, 2018

I remember we did this on purpose, but can't remember why we did that. Maybe @aliutkus can jump in here?

faroit transferred this issue from sigsep/sigsep-mus-eval May 21, 2019