This repository has been archived by the owner on Jun 10, 2021. It is now read-only.

preprocess.lua #64

Closed
vince62s opened this issue Jan 11, 2017 · 3 comments

Comments

@vince62s
Member

This command line throws an error:

th preprocess.lua -src_vocab_size 50000 -tgt_vocab_size 50000 \
  -train_src data/europarl-v7.fr-en.$sl.tok \
  -train_tgt data/europarl-v7.fr-en.$tl.tok \
  -valid_src data/generic_valid.$sl.tok \
  -valid_tgt data/generic_valid.$tl.tok -save_data exp/model-$sl$tl

I checked that all 4 tok files were tokenized the same way, with -case_feature and -sep_annotate.

Any clue?

torch/install/bin/luajit: preprocess.lua:66: all sentences must have the same numbers of additional features
stack traceback:
[C]: in function 'assert'
preprocess.lua:66: in function 'makeVocabulary'
preprocess.lua:124: in function 'initVocabulary'
preprocess.lua:276: in function 'main'
preprocess.lua:311: in main chunk
[C]: in function 'dofile'
...oses/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670
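
The assertion message says that every sentence must carry the same number of additional features. A rough way to look for offending lines outside of preprocess.lua is to count the features per token in each file. This is only a sketch: `feature_counts` is a hypothetical helper, and the separator that attaches features to tokens is an assumption (the default below is U+FFE8, but it may differ between OpenNMT versions, so pass `sep` explicitly if needed).

```python
from collections import Counter

# Assumed feature separator (U+FFE8); not confirmed anywhere in this thread.
DEFAULT_SEP = "\uffe8"


def feature_counts(lines, sep=DEFAULT_SEP):
    """Count lines by the number of additional features per token.

    Lines without any token are recorded under 0.
    Lines whose tokens disagree on the feature count are recorded under -1.
    """
    counts = Counter()
    for line in lines:
        tokens = line.split()
        if not tokens:
            counts[0] += 1
            continue
        # Set of distinct feature counts seen on this line.
        per_token = {tok.count(sep) for tok in tokens}
        counts[per_token.pop() if len(per_token) == 1 else -1] += 1
    return counts
```

For a file tokenized with one extra feature, a healthy result has a single key, for example `feature_counts(open(path, encoding="utf-8"))` returning only the key `1`. Any other key points at lines worth inspecting.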

@vince62s
Member Author

Okay, I think it happens when the corpus contains empty lines.
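
To confirm a diagnosis like this, one can list the empty lines in each tok file and, as a possible workaround, drop the affected sentence pairs before preprocessing. The helpers below are hypothetical and are not part of OpenNMT. They treat whitespace-only lines as empty, which is an assumption.

```python
def empty_line_numbers(lines):
    """Return the 1-based numbers of lines that are empty or whitespace-only."""
    return [i for i, line in enumerate(lines, start=1) if not line.strip()]


def drop_empty_pairs(src_lines, tgt_lines):
    """Keep only the sentence pairs where both sides are non-empty.

    Source and target are filtered together so the two sides stay aligned.
    """
    return [
        (src, tgt)
        for src, tgt in zip(src_lines, tgt_lines)
        if src.strip() and tgt.strip()
    ]
```

Filtering both sides together matters for a parallel corpus: removing a line from only one file would shift every following sentence pair out of alignment.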

@guillaumekln
Collaborator

Commit 08f4c15 should fix this.

@vince62s
Member Author

thanks.
