Cholesky decomposition failed when training plda #3328

czy97 · 2019-05-16T12:06:57Z

Hello,
When I train PLDA using some features I extracted, this error occurred. The tail of the log is shown below:

LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 146608
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 209565
LOG (ivector-compute-plda[5.5.0~3-327d]:Estimate():plda.cc:529) Plda estimation iteration 5 of 10
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 140.852
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 197.8
LOG (ivector-compute-plda[5.5.0~3-327d]:Estimate():plda.cc:529) Plda estimation iteration 6 of 10
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 105.141
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 157.448
LOG (ivector-compute-plda[5.5.0~3-327d]:Estimate():plda.cc:529) Plda estimation iteration 7 of 10
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 1117.95
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 5858.14
LOG (ivector-compute-plda[5.5.0~3-327d]:Estimate():plda.cc:529) Plda estimation iteration 8 of 10
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 140.276
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 395.243
LOG (ivector-compute-plda[5.5.0~3-327d]:Estimate():plda.cc:529) Plda estimation iteration 9 of 10
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 12926.4
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 12056.3
LOG (ivector-compute-plda[5.5.0~3-327d]:GetOutput():plda.cc:540) Norm of mean of iVector distribution is 0.745405
WARNING (ivector-compute-plda[5.5.0~3-327d]:Cholesky():tp-matrix.cc:110) Cholesky decomposition failed. Maybe matrix is not positive definite. Throwing error
Cholesky decomposition failed.# Accounting: begin_time=1558008300



When I added some very small random noise to my features, as you suggested for the same problem in LDA computation, the error was still there:
LOG (ivector-compute-plda[5.5.0~3-327d]:Estimate():plda.cc:529) Plda estimation iteration 7 of 10
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 101755
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 141957
LOG (ivector-compute-plda[5.5.0~3-327d]:Estimate():plda.cc:529) Plda estimation iteration 8 of 10
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 3136.84
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 17124.3
LOG (ivector-compute-plda[5.5.0~3-327d]:Estimate():plda.cc:529) Plda estimation iteration 9 of 10
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 1.78053e+08
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 1.22304e+08
LOG (ivector-compute-plda[5.5.0~3-327d]:GetOutput():plda.cc:540) Norm of mean of iVector distribution is 0.745405
WARNING (ivector-compute-plda[5.5.0~3-327d]:Cholesky():tp-matrix.cc:110) Cholesky decomposition failed. Maybe matrix is not positive definite. Throwing error
Cholesky decomposition failed.# Accounting: begin_time=1558007075

danpovey · 2019-05-16T18:50:09Z

Are you sure your features aren't limited to a subspace of the space they live in? Or maybe you have fewer features than the PLDA dimension?

czy97 · 2019-05-17T01:06:27Z

Thanks for your reply. My feature dimension is 256, and more than 1 million feature vectors are used to train the PLDA. Also, what do you mean by asking whether my features are limited to a subspace of the space they live in?

danpovey · 2019-05-17T01:37:57Z

I mean something like the feature values sum to one, so the covariance would not be full rank. Or one feature is always zero, something like that.
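
A minimal sketch of that point, using made-up toy data rather than anything from this issue: when every feature vector sums to one, or one coordinate is always zero, the covariance matrix is exactly singular, so a Cholesky-style factorization hits a non-positive pivot. The helper names (`covariance`, `is_positive_definite`) are hypothetical, and the check uses the square-root-free variant of Cholesky (LDL^T pivots) in exact rational arithmetic so that rounding cannot hide the singularity. It is not how Kaldi implements its check.

```python
from fractions import Fraction


def covariance(rows):
    """Exact sample covariance (dividing by N) of equal-length rows."""
    n = len(rows)
    dim = len(rows[0])
    data = [[Fraction(x) for x in row] for row in rows]
    mean = [sum(r[j] for r in data) / n for j in range(dim)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in data) / n
             for j in range(dim)] for i in range(dim)]


def is_positive_definite(matrix):
    """Symmetric elimination without pivoting: positive definite iff every pivot > 0."""
    dim = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for k in range(dim):
        pivot = a[k][k]
        if pivot <= 0:
            return False  # the point where a Cholesky routine would give up
        for i in range(k + 1, dim):
            factor = a[i][k] / pivot
            for j in range(k, dim):
                a[i][j] -= factor * a[k][j]
    return True


# Toy data (hypothetical): five 3-dimensional vectors that span the full space.
free = [[1, 2, 3], [2, 1, 4], [4, 3, 1], [3, 5, 2], [5, 4, 6]]
# Same vectors, each scaled so its entries sum to exactly one.
sum_to_one = [[Fraction(x, sum(row)) for x in row] for row in free]
# Same vectors with an extra coordinate that is always zero.
always_zero = [row + [0] for row in free]

print(is_positive_definite(covariance(free)))
print(is_positive_definite(covariance(sum_to_one)))
print(is_positive_definite(covariance(always_zero)))
```

With real floating-point features the pivot is usually a tiny positive or negative number rather than an exact zero, which is why such a failure can come and go between runs.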

czy97 · 2019-05-17T02:16:32Z

Ok, thanks. You can see that the traces of the within- and between-class variance get very large at iteration 9 of 10. If I set the parameter --num-em-iters to 9 instead of 10, the error disappears (the within/between-class variance is small after 9 iterations). When I check logs from normal PLDA training runs, the within/between-class variance always stays at a relatively low value (around one hundred). Does this mean that the EM algorithm is not converging well? Is it the fault of my data, or is there some other cause?
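
One way to spot this kind of instability without reading the log by eye is to pull the trace values out and flag sudden jumps. The sketch below runs on the "Trace of ..." lines from the first log excerpt in this issue; the function names and the jump threshold of 5x are assumptions chosen for illustration, not anything Kaldi provides.

```python
import re

# Matches lines such as "... Trace of within-class variance is 140.852".
TRACE_RE = re.compile(r"Trace of (within|between)-class variance is ([-+0-9.eE]+)")

SAMPLE_LOG = """\
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 146608
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 209565
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 140.852
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 197.8
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 105.141
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 157.448
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 1117.95
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 5858.14
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 140.276
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 395.243
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:511) Trace of within-class variance is 12926.4
LOG (ivector-compute-plda[5.5.0~3-327d]:EstimateFromStats():plda.cc:512) Trace of between-class variance is 12056.3
"""


def read_traces(log_text):
    """Collect the within- and between-class traces in the order they were logged."""
    traces = {"within": [], "between": []}
    for kind, value in TRACE_RE.findall(log_text):
        traces[kind].append(float(value))
    return traces


def find_jumps(values, max_ratio=5.0):
    """Return positions where a value exceeds max_ratio times the previous one."""
    return [i for i in range(1, len(values))
            if values[i - 1] > 0 and values[i] / values[i - 1] > max_ratio]


traces = read_traces(SAMPLE_LOG)
for kind, values in traces.items():
    print(kind, "jumps at positions", find_jumps(values))
```

The positions are indices into the list of logged values in the excerpt, not EM iteration numbers, because the excerpt starts partway through training.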

danpovey · 2019-05-17T02:24:37Z

I'll try to look into it at some point.

stale · 2020-06-19T07:37:06Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

shayxurui · 2020-09-22T01:50:46Z

When I run the aishell2 recipe script (run.sh) from Kaldi's egs, I get an error when it reaches steps/online/nnet2/train_ivector_extractor.sh, and the log file says "Cholesky decomposition failed. Maybe matrix is not positive definite." I did not modify any data or code.

danpovey · 2020-09-22T03:22:22Z

Sometimes that error is harmless. In any case it's quite generic, so I would need to see more info (e.g. more of the log).

shayxurui · 2020-09-23T01:56:23Z

I am very sorry, I re-ran the code and the log file was overwritten. I then hit a new error, still in the train_ivector_extractor.sh script; the log says: expected token "", got instead "BLAS"

danpovey · 2020-09-23T05:35:06Z

You should learn to paste as text. My guess is that you probably ran out of memory; that part can use up a great deal of memory. You could reduce --num-processes to 1 for train_ivector_extractor.sh, which should help, and/or reduce --num-jobs too if you are using run.pl.

shayxurui · 2020-09-23T08:40:29Z

Thank you for your reply. I wanted to paste the full log, but I cannot copy text out of the virtual machine because the company has restricted it.

danpovey · 2020-09-23T08:49:36Z

Likely something in your system is printing 'BLAS' to stdout every time a shell is created, e.g. in one of your .xxxrc files. Either that or (somehow) when a certain BLAS library gets loaded it prints BLAS.
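
A rough way to test the first possibility is to start a shell that runs a no-op command and see whether anything arrives on stdout. The sketch below is an assumption-laden illustration: `startup_output` is a hypothetical helper, and which startup files are actually read depends on how the shell is invoked on your system, so the two invocations in the comment are only examples.

```python
import subprocess


def startup_output(command):
    """Run a command that should print nothing and return whatever reached stdout."""
    result = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.stdout


def report(command):
    """Print a one-line verdict for a single shell invocation."""
    noise = startup_output(command)
    if noise:
        print(" ".join(command), "wrote to stdout:", noise[:80])
    else:
        print(" ".join(command), "is quiet")


# Example invocations (assumed, adjust to your setup):
#   report(["bash", "-lc", "true"])   # login shell, reads profile files
#   report(["bash", "-ic", "true"])   # interactive shell, reads ~/.bashrc
```

If one of these reports unexpected output such as "BLAS", the startup file read by that kind of shell is the first place to look.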

shayxurui · 2020-09-25T01:44:36Z

When I removed the ivector-related parts and trained again, there was no error. It seems that training aishell2 does not necessarily require ivectors.

stale · 2020-11-24T02:23:47Z

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.

czy97 added the bug label May 16, 2019

stale bot added the stale Stale bot on the loose label Jun 19, 2020

stale bot removed the stale Stale bot on the loose label Sep 22, 2020

stale bot added the stale Stale bot on the loose label Nov 24, 2020
