Problem when running sre10/v1 run.sh #1014

Closed
JinmingZhao opened this issue Aug 31, 2016 · 7 comments

@JinmingZhao

The problem happens when running this part:

dep pooled: 2.16

echo "GMM-$num_components EER"
for x in ind dep; do
  for y in female male pooled; do
    eer=$(compute-eer <(python local/prepare_for_eer.py $trials exp/scores_gmm_${num_components}_${x}_${y}/plda_scores) 2> /dev/null)
    echo "${x} ${y}: $eer"
  done
done

and the error output is:
GMM-1024 EER
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr

Environment:
Python 2.7, Ubuntu 16.04, latest version of Kaldi

I don't know what caused it. Thanks for your reply.

@david-ryan-snyder
Contributor

Have you modified the sre10/v1 example? Are you using the same datasets?

Could you copy the first few lines of exp/scores_gmm_${num_components}_${x}_${y}/plda_scores and $trials here?

@JinmingZhao
Author

First, thanks for your reply. I used the TIMIT dataset. The first lines of scores_gmm_1024_ind_female/plda_scores are:
dr1_faks0 dr1_faks0_sx313 21.01163
dr1_faks0 dr1_faks0_sx403 3.098445
dr1_faks0 dr1_faks0_sx43 14.22429
dr1_fcjf0 dr1_fcjf0_sx307 -12.61255
dr1_fcjf0 dr1_fcjf0_sx37 -7.53771

Thanks.

@JinmingZhao
Author

And the first lines of data/test/trials are:
dr1_faks0 dr1_faks0_sx43 target
dr1_fcjf0 dr1_fcjf0_sx307 target
dr1_fcjf0 dr1_fcjf0_sx37 target
dr1_fcjf0 dr1_fcjf0_sx397 target
dr1_fdac1 dr1_fdac1_sx304 target
Because I don't have the LDC datasets, I generated the trials file myself with:
make_sre_2010_test.pl: print OUT_TRIALS "$spkr ${utt}_${side} $is_target\n";

Could you copy the first few lines of your trials file for me?

@david-ryan-snyder
Contributor

david-ryan-snyder commented Sep 1, 2016

Mine looks like this:

32707 tcbbl_B nontarget
32707 tkrqm_B nontarget
32707 tkwut_A nontarget
32707 tllyh_B nontarget
32707 tofhd_A nontarget
32707 trurf_A nontarget
32707 trwyy_B nontarget
32707 tsyww_A target

etc

If you look at local/prepare_for_eer.py you'll see that it's very simple. All it does is prepare the input to the binary compute-eer. The expected input is of the form

<score1> <target/nontarget>
<score2> <target/nontarget>

etc

Could you try running python local/prepare_for_eer.py by itself, and copy some of the output here?
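For readers without the script at hand, here is a minimal Python 3 sketch of the kind of join described above: it looks up each scored (speaker, utterance) pair in the trials file and emits "<score> <target/nontarget>". It is an illustration under assumptions, not the contents of local/prepare_for_eer.py; the function name join_scores_with_labels is hypothetical, and the sample lines are taken from the files posted earlier in this thread.

```python
def join_scores_with_labels(trial_lines, score_lines):
    """Pair each score with the target/nontarget label of its trial.

    trial_lines: lines of the form "<speaker> <utterance> <target|nontarget>"
    score_lines: lines of the form "<speaker> <utterance> <score>"
    Returns a list of "<score> <label>" strings.
    Hypothetical helper for illustration only.
    """
    label_of = {}
    for line in trial_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        spk, utt, label = parts
        label_of[(spk, utt)] = label

    joined = []
    for line in score_lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        spk, utt, score = parts
        # Scores without a matching trial are skipped in this sketch.
        if (spk, utt) in label_of:
            joined.append(f"{score} {label_of[(spk, utt)]}")
    return joined


# Sample lines copied from the trials and plda_scores excerpts in this thread.
trials = [
    "dr1_faks0 dr1_faks0_sx43 target",
    "dr1_fcjf0 dr1_fcjf0_sx307 target",
    "dr1_fcjf0 dr1_fcjf0_sx37 target",
]
scores = [
    "dr1_faks0 dr1_faks0_sx313 21.01163",
    "dr1_faks0 dr1_faks0_sx43 14.22429",
    "dr1_fcjf0 dr1_fcjf0_sx307 -12.61255",
    "dr1_fcjf0 dr1_fcjf0_sx37 -7.53771",
]
for row in join_scores_with_labels(trials, scores):
    print(row)
```

Every row produced from these sample lines carries the label "target", which is worth checking before passing the output on to compute-eer.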

@JinmingZhao
Author

JinmingZhao commented Sep 2, 2016

First, thank you very much!
I think I've found the cause of the problem. I generated the trials from all of the data (train set and test set), in the form "speaker-id utt-id istarget", like this:
dr1_faks0 dr1_faks0_sa1 target
dr1_faks0 dr1_faks0_sa2 target
dr1_faks0 dr1_faks0_si1573 target
dr1_faks0 dr1_faks0_si2203 target
dr1_faks0 dr1_faks0_si943 target

As you can see, the "istarget" column is "target" on every line and there are no "nontarget" entries. The reason is that I don't have the "$db_base/keys/coreext-coreext.trialkey.csv" file, so I don't know whether "istarget" should be "target" or "nontarget".
The trials file is used by compute-eer. If I change some of the "target" labels to "nontarget", the program runs, but the error rate is 70%, which is not the result I want.
So I have a question: if I don't use the NIST dataset, how can I build the trials file? Is this file required? How much does it affect the accuracy of the results?
Another question: if I only have the TIMIT dataset, how can I split it into the four sets "sre", "train", "sre10_train" and "sre10_test"?

Thank you very much!

@david-ryan-snyder
Contributor

david-ryan-snyder commented Sep 2, 2016

A "target" trial is one where the utterance is spoken by the speaker named in the trial. A "nontarget" trial is one where the utterance is spoken by a different speaker. A verification system with no errors would accept every target trial and reject every nontarget trial.
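To make the accept/reject idea concrete, here is a rough Python 3 sketch of how the two error rates and an approximate equal error rate can be computed from target and nontarget scores. It is not the algorithm used by the compute-eer binary; the function names and the simple threshold scan are assumptions made for illustration. It also shows why a trials file with only "target" lines is a problem: without nontarget scores there is no false-accept rate to balance against.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) when a trial is
    accepted if its score is >= threshold. Hypothetical helper."""
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one nontarget score")
    # False reject: a target trial scoring below the threshold.
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    # False accept: a nontarget trial scoring at or above the threshold.
    far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return frr, far


def approximate_eer(target_scores, nontarget_scores):
    """Scan every observed score as a threshold and return (eer, threshold)
    at the point where the two error rates are closest to each other."""
    best = None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr, far = error_rates(target_scores, nontarget_scores, t)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]


# Made-up scores, for illustration only.
targets = [2.0, 3.0, 4.0, 5.0]
nontargets = [-1.0, 0.0, 1.0, 2.5]
eer, thr = approximate_eer(targets, nontargets)
print(f"approximate EER: {eer:.2%} at threshold {thr}")
```

Calling approximate_eer with an empty list of nontarget scores raises ValueError in this sketch, which mirrors the all-target trials file discussed above.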

What I suggest is that you look at the spk2utt file, and write a script that generates a trials file for you. For each speaker, you can take the speaker id and all the utterances that belong to it and pair them up to create the "target" trials. For example:

<spkA> <spkA-utt1> target
<spkA> <spkA-utt2> target

To create the nontarget trials, you can randomly pair a speaker with utterances belonging to another speaker. For example:

<spkA> <spkC-utt2> nontarget
<spkB> <spkF-utt1> nontarget
<spkB> <spkH-utt3> nontarget

Since you'll need to write a script to do this, you can create an option that controls the probability of forming nontarget trials, and you can generate several trials files with a different percent of nontargets and see how the results vary between them. Since the scripts evaluate on the equal error-rate (EER), most likely it won't make a big difference. E.g., if your EER is 10% with 50% nontarget trials, it will likely continue to be 10% with 80% nontarget trials.
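A minimal Python 3 sketch of such a script follows. It assumes the usual spk2utt layout of one line per speaker, "<speaker> <utt1> <utt2> ...". The function names, the nontarget_prob option and the seed argument are hypothetical choices for this illustration, not part of the Kaldi recipe.

```python
import random


def parse_spk2utt(lines):
    """Parse lines of the form "<speaker> <utt1> <utt2> ..." into a dict."""
    spk2utt = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            spk2utt[parts[0]] = parts[1:]
    return spk2utt


def make_trials(spk2utt, nontarget_prob=0.5, seed=0):
    """Build (speaker, utterance, label) trials.

    Every speaker is paired with each of its own utterances as a "target"
    trial. Each utterance of every other speaker becomes a "nontarget"
    trial with probability nontarget_prob (a hypothetical option).
    """
    rng = random.Random(seed)  # fixed seed so the trials file is reproducible
    speakers = sorted(spk2utt)
    trials = []
    for spk in speakers:
        for utt in spk2utt[spk]:
            trials.append((spk, utt, "target"))
        for other in speakers:
            if other == spk:
                continue
            for utt in spk2utt[other]:
                if rng.random() < nontarget_prob:
                    trials.append((spk, utt, "nontarget"))
    return trials


# Tiny made-up spk2utt, for illustration only.
example = parse_spk2utt([
    "spkA spkA-utt1 spkA-utt2",
    "spkB spkB-utt1 spkB-utt2",
])
for spk, utt, label in make_trials(example, nontarget_prob=0.5):
    print(spk, utt, label)
```

Running it several times with different nontarget_prob values gives the set of trials files described above. With many speakers, the number of candidate nontarget pairs grows quickly, so a small probability may be enough.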

If I only have the TIMIT dataset, how can I split it into the four sets "sre", "train", "sre10_train" and "sre10_test"?

In your case you would just have three datasets, train, enroll (called sre10_train in this example), and test (called sre10_test in this example). The SRE dataset is used to train the PLDA model, since the bulk of the training data is out of domain. In your situation, you would train the PLDA model directly on the training data, since it is in domain.

I would do the following:

  • train should consist of a set of speakers that do not overlap with the enroll and test data
  • Utterances from the remaining speakers should be divided into enroll and test sets. Although enroll and test share speakers, they should not contain utterances from the same recordings.
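The two points above could be sketched in Python 3 roughly as follows. This is only an illustration: the function name split_speakers and the parameters train_fraction and enroll_utts are hypothetical, and it assumes that each utterance id corresponds to a separate recording, so that disjoint utterance lists also mean disjoint recordings.

```python
import random


def split_speakers(spk2utt, train_fraction=0.5, enroll_utts=5, seed=0):
    """Split a {speaker: [utterances]} dict into train, enroll and test.

    - train gets a set of speakers that appear nowhere else.
    - each remaining speaker has its utterances divided between enroll
      and test, with no utterance in both.
    Assumes one recording per utterance id (an assumption of this sketch).
    """
    rng = random.Random(seed)
    speakers = sorted(spk2utt)
    rng.shuffle(speakers)

    n_train = int(len(speakers) * train_fraction)
    train = {s: list(spk2utt[s]) for s in speakers[:n_train]}

    enroll, test = {}, {}
    for s in speakers[n_train:]:
        utts = list(spk2utt[s])
        rng.shuffle(utts)
        enroll[s] = utts[:enroll_utts]   # used to build the speaker model
        test[s] = utts[enroll_utts:]     # held out for the trials
    return train, enroll, test


# Made-up data: 4 speakers with 10 utterances each.
data = {f"spk{i}": [f"spk{i}-utt{j}" for j in range(10)] for i in range(4)}
train, enroll, test = split_speakers(data, train_fraction=0.5, enroll_utts=5)
print(len(train), len(enroll), len(test))
```

The resulting train dict would be used for the UBM, i-vector extractor and PLDA training, while enroll and test play the roles of sre10_train and sre10_test in the recipe.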

In my opinion, if you have more questions about this, it is best to move this to the Kaldi help page at http://kaldi-asr.org/forums.html.

@JinmingZhao
Author

Thanks very much for your detailed answers. Now I know how to modify my script.
Thanks again!
