Error: invalid line 2 in BPE codes file when running apply_bpe.py #114

Closed
jo0704 opened this issue Apr 22, 2022 · 1 comment

jo0704 commented Apr 22, 2022

Hi,

When running apply_bpe.py to segment text with the generated vocabulary, I get the following error:

Error: invalid line 2 in BPE codes file: bpeout/vocab
The line should exist of exactly two subword units, separated by whitespace

The exact commands I used:

echo "#version: 0.2" > bpeout/vocab.seg # add version info
echo bpeout/vocab >> bpeout/vocab.seg
python3 subword-nmt/subword_nmt/apply_bpe.py -c bpeout/vocab.seg <X-EN/de_en/train.de >bpeout/train_out.de
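In the commands above, `echo bpeout/vocab >> bpeout/vocab.seg` appends the literal string `bpeout/vocab` rather than the file's contents (`cat bpeout/vocab >> bpeout/vocab.seg` would append the contents). This is consistent with the error quoting `bpeout/vocab` as line 2. Here is a minimal sketch of prepending the header in Python, assuming the codes file is plain UTF-8 text. `with_version_header` and `add_version_header` are hypothetical helpers, not part of subword-nmt:

```python
from pathlib import Path


def with_version_header(codes_text: str, version: str = "0.2") -> str:
    """Return codes_text with a single '#version:' header as its first line."""
    lines = codes_text.splitlines()
    # Drop an existing header so it is not written twice.
    if lines and lines[0].startswith("#version:"):
        lines = lines[1:]
    return "\n".join([f"#version: {version}", *lines]) + "\n"


def add_version_header(src: str, dst: str, version: str = "0.2") -> None:
    """Write dst as a version header followed by the *contents* of src.

    By contrast, `echo src >> dst` appends the path string itself.
    """
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(with_version_header(text, version), encoding="utf-8")
```

For example, `add_version_header("bpeout/vocab", "bpeout/vocab.seg")` writes the header followed by the lines stored in `bpeout/vocab`. A header alone does not make the file valid, because every merge line must still contain exactly two symbols.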

I have attached the vocab file and the training file I'm trying to segment:
bpe_vocab.zip

A similar issue was reported in #46, but the solution there doesn't resolve the error in my case.

rsennrich (Owner) commented:

Hi,

Your vocab seems to have been produced by a different script, and it is invalid. Specifically, some lines contain just one symbol (__en__), and it also doesn't distinguish between word-internal and word-final merge operations.
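Both problems described above can be checked before running apply_bpe.py. The sketch below is a hypothetical helper (`check_bpe_codes` is not part of subword-nmt) that assumes the `#version: 0.2` format. In that format, each line after the header holds exactly two whitespace-separated symbols, and word-final symbols carry a `</w>` suffix. The helper reports lines that do not split into two symbols, numbered by file line as in the error message above. It also reports whether any `</w>` marker occurs at all. It is a heuristic, not a substitute for the library's own parsing.

```python
def check_bpe_codes(lines):
    """Return (bad_lines, has_end_of_word) for the lines of a BPE codes file.

    bad_lines lists (line_number, text) for every merge line that does not
    contain exactly two whitespace-separated symbols. has_end_of_word is
    True if any merge line contains a symbol ending in '</w>'.
    Line numbers are 1-based and count the '#version:' header if present.
    """
    bad_lines = []
    has_end_of_word = False
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if number == 1 and text.startswith("#version:"):
            continue  # header line, not a merge operation
        symbols = text.split()
        if len(symbols) != 2:
            bad_lines.append((number, text))
        if any(s.endswith("</w>") for s in symbols):
            has_end_of_word = True
    return bad_lines, has_end_of_word
```

It accepts any iterable of lines, so an open file works directly: `with open("bpeout/vocab.seg", encoding="utf-8") as f: bad, eow = check_bpe_codes(f)`. A non-empty `bad` or `eow == False` points to the two problems described in the reply.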
