Decoded string still have words that are not in lexicon with KenLMScorer #10

ybzhou · 2017-07-13T21:15:50Z

When using the LM-based scorer, I still see words that are not in the lexicon appearing in the decoded output. Is this the expected behavior? Which beam search strategy is used? Is it the same as the one described in https://arxiv.org/pdf/1408.2873.pdf?

Thanks.

ryanleary · 2017-07-14T03:49:42Z

There is a low, but non-zero, probability assigned to words not in the lexicon. See https://github.com/ryanleary/pytorch-ctc/blob/master/pytorch_ctc/src/ctc_beam_scorer_klm.h#L95
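
To illustrate why a non-zero OOV probability can still let such words through, here is a minimal Python sketch. It is not the pytorch-ctc implementation: the lexicon, the scores, and every function name below are hypothetical, and the real scorer works on beam states in C++ rather than on whole hypotheses.

```python
# Hypothetical toy lexicon with made-up unigram log-probabilities.
LEXICON_LOG_PROBS = {"the": -1.0, "cat": -2.0, "sat": -2.5}


def word_score(word, oov_log_prob):
    """LM log-probability of a word; unknown words get a fixed OOV score."""
    return LEXICON_LOG_PROBS.get(word, oov_log_prob)


def hypothesis_score(acoustic_log_prob, words, oov_log_prob, lm_weight=1.0):
    """Combine the acoustic score with the weighted sum of word LM scores."""
    lm_log_prob = sum(word_score(w, oov_log_prob) for w in words)
    return acoustic_log_prob + lm_weight * lm_log_prob


def best_hypothesis(hypotheses, oov_log_prob):
    """Return the word sequence with the highest combined score."""
    best = max(
        hypotheses,
        key=lambda h: hypothesis_score(h[0], h[1], oov_log_prob),
    )
    return best[1]


# Assumed candidates: the acoustic model prefers the out-of-lexicon "catt".
HYPOTHESES = [
    (-3.0, ["the", "catt", "sat"]),
    (-6.0, ["the", "cat", "sat"]),
]

# A mild OOV penalty lets the acoustic preference win.
mild = best_hypothesis(HYPOTHESES, oov_log_prob=-4.0)
# A very harsh OOV penalty makes the in-lexicon hypothesis win.
harsh = best_hypothesis(HYPOTHESES, oov_log_prob=-1000.0)

print(mild)
print(harsh)
```

In this toy setup, the OOV word survives whenever the acoustic advantage outweighs the OOV penalty, so pushing the OOV score much lower (a much smaller probability) suppresses it.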

ybzhou · 2017-07-14T18:21:35Z

I see, thanks for the reply. After changing it to a much smaller value, it worked.

ryanleary · 2017-07-17T02:07:22Z

I'll expose this in a future version as a configuration parameter in Python. See #11.

ryanleary closed this as completed Jul 17, 2017

Chen1399 mentioned this issue Sep 2, 2019: load_lm appear Segmentation fault (core dumped) #115 (Closed)

olesyaksyon mentioned this issue Aug 18, 2020: segmentation fault using side arpa or binary #159 (Closed)
