Worse performance with language model #173

Open · jafioti opened this issue Nov 23, 2020 · 4 comments

jafioti commented Nov 23, 2020

I am trying to use a KenLM language model to improve my results, but whenever I enable it, the decoder produces garbage. Since alpha is supposed to weight the LM, I tried setting it lower, but even at 0 it still produces garbage. The only way I can get sensible output is to remove the LM path entirely. Shouldn't the output be identical with and without the LM if alpha is set to 0?

Here is the decoder initialization:
decoder = CTCBeamDecoder(labels="".join([local_vocab.index2word[i][0] for i in range(local_vocab.num_words)]), model_path="test.arpa", alpha=0.5, beta=0.9, beam_width=100, blank_id=local_vocab.num_words - 1)

Here is the usage:
beam_results, beam_scores, timesteps, out_lens = decoder.decode(F.softmax(output, dim=-1).transpose(0, 1))

Is this expected behavior?
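
A minimal way to check this, sketched below assuming the parlance/ctcdecode API (the vocabulary and tensor shapes are dummies; only test.arpa comes from the snippet above), is to decode the same batch once with the LM loaded at alpha=0 and once with no LM, and compare the top beams:

```python
# Sanity-check sketch (assumes parlance/ctcdecode is installed and "test.arpa" is a
# valid character-level KenLM model; the character set and tensors are dummy values).
# If alpha=0 really disables the LM weight, the two top beams should match.
import torch
import torch.nn.functional as F
from ctcdecode import CTCBeamDecoder

labels = "abcdefghijklmnopqrstuvwxyz' _"   # last char stands in for the CTC blank
blank_id = len(labels) - 1

# Dummy acoustic output: (time, batch, n_labels), as in the snippet above
output = torch.randn(50, 2, len(labels))
probs = F.softmax(output, dim=-1).transpose(0, 1)   # -> (batch, time, n_labels)

with_lm = CTCBeamDecoder(labels, model_path="test.arpa", alpha=0.0, beta=0.0,
                         beam_width=100, blank_id=blank_id)
no_lm = CTCBeamDecoder(labels, model_path=None, alpha=0.0, beta=0.0,
                       beam_width=100, blank_id=blank_id)

res_lm, _, _, lens_lm = with_lm.decode(probs)
res_no, _, _, lens_no = no_lm.decode(probs)

for b in range(probs.size(0)):
    top_lm = res_lm[b][0][:int(lens_lm[b][0])].tolist()
    top_no = res_no[b][0][:int(lens_no[b][0])].tolist()
    print("match" if top_lm == top_no else "mismatch", top_lm, top_no)
```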

2000ZRL commented Dec 15, 2020

I ran into the same problem. It also seems that each token must be a single character.

@Yushi-Hu

Same problem here

Sologa commented Apr 14, 2021

It seems that #31 has not been solved. My workaround is to replace the tokens in my LM training file with Chinese characters.
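
The same remapping trick should work for any multi-character token set: assign each token a unique placeholder character, build both the decoder labels and the KenLM training text from those placeholders, and map the decoded ids back afterwards. A rough sketch (the token list, the CJK placeholder range, and the space-separated KenLM format are all just illustrative choices, not part of this issue):

```python
# Sketch of the workaround described above: map each multi-character token (e.g. a
# subword vocabulary) to a unique single placeholder character, and use those
# characters both as the decoder labels and in the text used to train the KenLM model.
vocab = ["<blank>", "hel", "lo", "wor", "ld", " "]           # made-up multi-char tokens

placeholders = [chr(0x4E00 + i) for i in range(len(vocab))]  # one unique CJK char per token
tok2char = dict(zip(vocab, placeholders))

labels = "".join(placeholders)        # what would be passed to CTCBeamDecoder(labels=...)
blank_id = vocab.index("<blank>")

def encode_line(tokens):
    """Turn a tokenized training sentence into space-separated placeholder chars for KenLM."""
    return " ".join(tok2char[t] for t in tokens)

def decode_beam(beam, length):
    """Map a decoded beam (label indices) back to the original multi-character tokens."""
    return "".join(vocab[i] for i in beam[:length])

print(encode_line(["hel", "lo", " ", "wor", "ld"]))
```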

@Jatin-WIAI

@jafioti were you able to solve the issue?
