
Kaldi decoding error with new release of CNTK #65

Closed

lahiruts opened this issue Jan 31, 2016 · 14 comments

@lahiruts

Hi All,
The Kaldi decoding fails with the new version of CNTK. If I use an older version of CNTK, the decoding works fine. It is difficult to infer the issue from the error message, which is shown below. Please advise. Thank you.

```
Post-processing network complete.
HTKMLFWriter::Init: reading output script file data-lda/test_eval92/split8/1/cntk_test.counts ... 560 entries

Allocating matrices for forward and/or backward propagation.
evaluate: reading 571 frames of 440c02010
evaluate: reading 571 frames of 440c02010

[CALL STACK]
/home/lahiru/Devinstall/cntk_github/CNTK/build/release/lib/libcntkmath.so ( Microsoft::MSR::CNTK::DebugUtil::PrintCallStack() + 0xbf ) [0x7ff296ba6cdf]
cntk ( void Microsoft::MSR::CNTK::ThrowFormatted<std::logic_error>(char const*, ...) + 0xdd ) [0x53d5dd]
cntk ( Microsoft::MSR::CNTK::ComputationNode<float>::NotifyFunctionValuesMBSizeModified() + 0x41c ) [0x53e57c]
cntk ( ) [0x758d37]
cntk ( Microsoft::MSR::CNTK::SimpleOutputWriter<float>::WriteOutput(Microsoft::MSR::CNTK::IDataReader<float>&, unsigned long, Microsoft::MSR::CNTK::IDataWriter<float>&, std::vector<std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, std::allocator<std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > > > const&, unsigned long, bool) + 0x363 ) [0x75bb63]
cntk ( void DoWriteOutput<float>(Microsoft::MSR::CNTK::ConfigParameters const&) + 0x669 ) [0x760849]
cntk ( void DoCommands<float>(Microsoft::MSR::CNTK::ConfigParameters const&) + 0xc07 ) [0x593c47]
cntk ( wmainOldCNTKConfig(int, wchar_t**) + 0x909 ) [0x535519]
cntk ( wmain1(int, wchar_t**) + 0x68 ) [0x535be8]
cntk ( main + 0xd8 ) [0x529518]
/lib/x86_64-linux-gnu/libc.so.6 ( __libc_start_main + 0xf5 ) [0x7ff29582fec5]
cntk ( ) [0x52d4b7]
Closed Kaldi writer
```

@yqwangustc
Contributor

Hi lahiruts,

We are aware of this issue. A fix is on the way and should land soon.

Thanks,
Yongqiang

@mravanelli

I have the same problem....
Please, let us know when it is fixed! ;)

Thanks,
Mirco

@frankseide
Contributor

Is there a chance you could run this in Debug mode? The call stack is missing a critical entry, between WriteOutput() and NotifyFunctionValuesMBSizeModified(). I guess that function got inlined in the Release build.

@frankseide
Contributor

Actually, could you just try the latest? We did fix something related to this a few days ago.

Would you mind letting me know if it works now?

@mravanelli

I tried with the latest downloadable sources but the error (see below) still persists.
Any idea about it?

Thank you!

Mirco

```
Allocating matrices for forward and/or backward propagation.
evaluate: reading 250 frames of SimMalespk06-usphdevmalespk06snt3852

[CALL STACK]
/home/mirco/CNTK-master/build/release/lib/libcntkmath.so ( Microsoft::MSR::CNTK::DebugUtil::PrintCallStack() + 0xbf ) [0x7f156f3061ff]
cntk ( void Microsoft::MSR::CNTK::ThrowFormatted<std::logic_error>(char const*, ...) + 0xdd ) [0x53d41d]
cntk ( Microsoft::MSR::CNTK::ComputationNode<float>::NotifyFunctionValuesMBSizeModified() + 0x496 ) [0x53e436]
cntk ( ) [0x758857]
cntk ( Microsoft::MSR::CNTK::SimpleOutputWriter<float>::WriteOutput(Microsoft::MSR::CNTK::IDataReader<float>&, unsigned long, Microsoft::MSR::CNTK::IDataWriter<float>&, std::vector<std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, std::allocator<std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > > > const&, unsigned long, bool) + 0x363 ) [0x75b9d3]
cntk ( void DoWriteOutput<float>(Microsoft::MSR::CNTK::ConfigParameters const&) + 0x669 ) [0x7606f9]
cntk ( void DoCommands<float>(Microsoft::MSR::CNTK::ConfigParameters const&) + 0xc07 ) [0x5942e7]
cntk ( wmainOldCNTKConfig(int, wchar_t**) + 0x909 ) [0x535539]
cntk ( wmain1(int, wchar_t**) + 0x68 ) [0x535b88]
cntk ( main + 0xd8 ) [0x529548]
/lib/x86_64-linux-gnu/libc.so.6 ( __libc_start_main + 0xf5 ) [0x7f156df7dec5]
cntk ( ) [0x52d4e7]
Closed Kaldi writer
LOG (latgen-faster-mapped:main():latgen-faster-mapped.cc:163) Time taken 1.2866s: real-time factor assuming 100 frames/sec is inf
LOG (latgen-faster-mapped:main():latgen-faster-mapped.cc:166) Done 0 utterances, failed for 0
LOG (latgen-faster-mapped:main():latgen-faster-mapped.cc:168) Overall log-likelihood per frame is -nan over 0 frames.
```

@frankseide
Contributor

Hmm... I think the first step would be to see the actual error. Somehow it gets masked.

Are you building your own binary? If so, could you run it in Debug mode, or, if that does not work, enable the fprintf() call in ThrowFormatted() in Basics.h?

```cpp
template <class E>
__declspec_noreturn static inline void ThrowFormatted(const char* format, ...)
{
    va_list args;
    char buffer[1024];

    va_start(args, format);
    vsprintf(buffer, format, args);

#ifdef _DEBUG // print this to log before throwing, so we can see what the error is
    fprintf(stderr, "\nAbout to throw exception '%s'\n", buffer);
#endif
    Microsoft::MSR::CNTK::DebugUtil::PrintCallStack();
    throw E(buffer);
};
```
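If rebuilding in Debug is not an option, one way to follow this suggestion (my reading of it, not an official patch) is to drop the `#ifdef _DEBUG` guard around the fprintf so the message is also printed in Release builds; only the tail of the function changes:

```cpp
    // Always print the formatted message before throwing (no #ifdef _DEBUG guard),
    // so the underlying error is not masked in Release builds:
    fprintf(stderr, "\nAbout to throw exception '%s'\n", buffer);

    Microsoft::MSR::CNTK::DebugUtil::PrintCallStack();
    throw E(buffer);
```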

@mravanelli

If I enable that fprintf in Basics.h, I see the following right before the error reported in the previous post:

```
Allocating matrices for forward and/or backward propagation.
evaluate: reading 425 frames of SimFemalespk01-usphdevfemalespk01snt2717
About to throw exception 'NotifyFunctionValuesMBSizeModified: labels InputValue operation had its col dimension 425 changed by the reader to 1, but different from MBLayout.'
```

It seems that for some reason the reader is doing something weird.

The reader section of CNTK2_write.cntk is the following:

```
reader=[
    # reader to use
    readerType=Kaldi2Reader
    readMethod=blockRandomize
    frameMode=false
    miniBatchMode=Partial
    randomize=Auto
    verbosity=0
    features=[
        dim=$featDim$
        scpFile=$inputCounts$
        rx=$inputFeats$
    ]
```
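For context, here is a minimal sketch of the kind of consistency check behind that message (an illustration with stand-in types, not the actual CNTK source): each input node's value matrix must have one column per frame described by the minibatch layout, but here the labels input ends up with 1 column while the layout, driven by the features, describes 425 frames.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Stand-ins for the real CNTK types, for illustration only.
struct LayoutInfo  { std::size_t numCols; }; // frames (columns) the minibatch layout describes
struct ValueMatrix { std::size_t numCols; }; // columns the reader actually filled for this node

// Sketch of the check that yields "...col dimension 425 changed by the reader to 1,
// but different from MBLayout": the value must have one column per layout frame.
inline void VerifyMinibatchSize(const std::string& nodeName,
                                const ValueMatrix& value, const LayoutInfo& layout)
{
    if (value.numCols != layout.numCols)
        throw std::logic_error(nodeName + ": col dimension " + std::to_string(value.numCols) +
                               " is different from MBLayout (expected " +
                               std::to_string(layout.numCols) + " columns)");
}
```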

@frankseide
Contributor

We have discovered a tricky problem which may be related to this. I will let you know when the fix lands.

frankseide added a commit that referenced this issue Feb 5, 2016
…uteNodes that have already been computed, addressing Issue #65;

cleaned up some unnecessary NULL checks before delete
@mravanelli

Thank you! I'll stay tuned!

Mirco


@frankseide
Contributor

Yongqiang tracked it down: in decoding, the labels should not be referenced, but they were, because the logLLs formally depend on them indirectly, through the priors. The fix is to skip PreComputeNodes that have already been computed when analyzing which inputs an output depends on.

This is in master now, but I do not have a positive test case for this. Would you mind trying it and letting me know whether this fixes it?
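A minimal sketch of that idea, for illustration only (the names here, e.g. `CollectRequiredInputs`, are assumptions and this is not the actual CNTK code): while collecting the reader inputs an output depends on, the traversal simply does not descend into a PreComputeNode whose value is already final, so the labels feeding the prior are no longer requested during decoding.

```cpp
#include <set>
#include <vector>

// Stand-in for a computation-graph node; only the fields needed for the sketch.
struct Node
{
    bool isPreComputeNode = false; // e.g. the mean/prior node filled in during training
    bool alreadyComputed  = false; // its value was saved with the model
    bool isInput          = false; // fed by the reader (features, labels)
    std::vector<Node*> inputs;
};

// Collect the reader-provided inputs that 'node' actually needs at write/decode time.
static void CollectRequiredInputs(const Node* node, std::set<const Node*>& required)
{
    // The fix described above: a PreComputeNode whose value is already computed does
    // not pull its own inputs (here: the labels behind the prior) into the set.
    if (node->isPreComputeNode && node->alreadyComputed)
        return;

    if (node->isInput)
        required.insert(node);

    for (const Node* input : node->inputs)
        CollectRequiredInputs(input, required);
}
```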

@mravanelli

Sure, tomorrow I will try it ;)


@lahiruts
Author

lahiruts commented Feb 6, 2016

Hi Frank,
It seems fixed. My decoding is running as expected now.
Thanks,
Lahiru

@mravanelli

I confirm that now it works.
Thank you!
Mirco

@wolfma61
Contributor

wolfma61 commented Feb 7, 2016

Thank you for your verification and feedback.

wolfma61 closed this as completed Feb 7, 2016
jpauwels pushed a commit to jpauwels/CNTK that referenced this issue Feb 7, 2016
…uteNodes that have already been computed, addressing Issue microsoft#65;

cleaned up some unnecessary NULL checks before delete