preprocess.lua #64

vince62s · 2017-01-11T11:35:45Z

This command line throws an error.

th preprocess.lua -src_vocab_size 50000 -tgt_vocab_size 50000
-train_src data/europarl-v7.fr-en.$sl.tok
-train_tgt data/europarl-v7.fr-en.$tl.tok
-valid_src data/generic_valid.$sl.tok
-valid_tgt data/generic_valid.$tl.tok -save_data exp/model-$sl$tl

I checked that all 4 tok files were tokenized the same way, with -case_feature and -sep_annotate.

Any clue?

torch/install/bin/luajit: preprocess.lua:66: all sentences must have the same numbers of additional features
stack traceback:
[C]: in function 'assert'
preprocess.lua:66: in function 'makeVocabulary'
preprocess.lua:124: in function 'initVocabulary'
preprocess.lua:276: in function 'main'
preprocess.lua:311: in main chunk
[C]: in function 'dofile'
...oses/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

vince62s · 2017-01-11T11:44:37Z

Okay, I think it happens when the corpus contains empty lines.
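
One way to confirm this before re-running preprocess.lua is to scan the parallel files for blank lines. The following is a minimal sketch, not part of OpenNMT: the helper names `find_empty_pairs` and `drop_empty_pairs` are hypothetical, and it assumes the source and target files are line-aligned and that a line counts as empty when it contains only whitespace.

```python
from typing import Iterable, List, Tuple


def find_empty_pairs(src_lines: Iterable[str], tgt_lines: Iterable[str]) -> List[int]:
    """Return 1-based line numbers where the source or target side is blank.

    Hypothetical helper: assumes both inputs are line-aligned.
    """
    return [
        i
        for i, (src, tgt) in enumerate(zip(src_lines, tgt_lines), start=1)
        if not src.strip() or not tgt.strip()
    ]


def drop_empty_pairs(
    src_lines: Iterable[str], tgt_lines: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Keep only the sentence pairs where both sides are non-blank.

    Pairs are dropped together so the two files stay aligned.
    """
    kept = [(s, t) for s, t in zip(src_lines, tgt_lines) if s.strip() and t.strip()]
    return [s for s, _ in kept], [t for _, t in kept]
```

To use it on real data, read each pair of files (for example the train_src and train_tgt files from the command above) with `open(path, encoding="utf-8").readlines()`, pass the two lists to `find_empty_pairs`, and check whether the returned list is non-empty. Dropping pairs rather than single lines matters here, because removing a blank line from only one side would shift the alignment of every sentence after it.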

guillaumekln · 2017-01-11T12:55:28Z

Commit 08f4c15 should fix this.

vince62s · 2017-01-11T13:07:52Z

Thanks.

vince62s closed this as completed Jan 11, 2017
