Fix changes of file locations of subword-nmt #1219

Closed
docs/getting_started.rst (2 changes: 1 addition & 1 deletion)

@@ -11,7 +11,7 @@ This model uses a `Byte Pair Encoding (BPE)
 vocabulary <https://arxiv.org/abs/1508.07909>`__, so we'll have to apply
 the encoding to the source text before it can be translated. This can be
 done with the
-`apply\_bpe.py <https://github.com/rsennrich/subword-nmt/blob/master/apply_bpe.py>`__
+`apply\_bpe.py <https://github.com/rsennrich/subword-nmt/blob/master/subword_nmt/apply_bpe.py>`__
 script using the ``wmt14.en-fr.fconv-cuda/bpecodes`` file. ``@@`` is
 used as a continuation marker and the original text can be easily
 recovered with e.g. ``sed s/@@ //g`` or by passing the ``--remove-bpe``
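For illustration only (not part of this diff): a minimal sketch of the workflow the updated link points to, assuming a local clone of subword-nmt and a downloaded wmt14.en-fr.fconv-cuda model directory; the output file name is a placeholder.

```bash
# Apply BPE with the relocated script, then recover the original text by
# stripping the @@ continuation markers.
echo "Why is it rare to discover new marine mammal species ?" \
    | python subword-nmt/subword_nmt/apply_bpe.py -c wmt14.en-fr.fconv-cuda/bpecodes \
    > source.bpe.en
sed 's/@@ //g' source.bpe.en
```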
examples/translation/prepare-iwslt14.sh (2 changes: 1 addition & 1 deletion)

@@ -12,7 +12,7 @@ SCRIPTS=mosesdecoder/scripts
 TOKENIZER=$SCRIPTS/tokenizer/tokenizer.perl
 LC=$SCRIPTS/tokenizer/lowercase.perl
 CLEAN=$SCRIPTS/training/clean-corpus-n.perl
-BPEROOT=subword-nmt
+BPEROOT=subword-nmt/subword_nmt
 BPE_TOKENS=10000

 URL="https://wit3.fbk.eu/archive/2014-01/texts/de/en/de-en.tgz"
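A sketch of how $BPEROOT is typically used further down in this script (those lines are collapsed in the diff, so the file names below are placeholders); the two WMT prepare scripts that follow use the same pattern.

```bash
# Learn a BPE code and apply it to each split; $BPEROOT now points at the
# package directory inside the subword-nmt clone, where learn_bpe.py and
# apply_bpe.py live after the upstream reorganization.
BPE_CODE=code   # placeholder output path
python $BPEROOT/learn_bpe.py -s $BPE_TOKENS < train.de-en > $BPE_CODE
for f in train.de train.en valid.de valid.en; do
    python $BPEROOT/apply_bpe.py -c $BPE_CODE < "$f" > "$f.bpe"
done
```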
examples/translation/prepare-wmt14en2de.sh (2 changes: 1 addition & 1 deletion)

@@ -12,7 +12,7 @@ TOKENIZER=$SCRIPTS/tokenizer/tokenizer.perl
 CLEAN=$SCRIPTS/training/clean-corpus-n.perl
 NORM_PUNC=$SCRIPTS/tokenizer/normalize-punctuation.perl
 REM_NON_PRINT_CHAR=$SCRIPTS/tokenizer/remove-non-printing-char.perl
-BPEROOT=subword-nmt
+BPEROOT=subword-nmt/subword_nmt
 BPE_TOKENS=40000

 URLS=(
examples/translation/prepare-wmt14en2fr.sh (2 changes: 1 addition & 1 deletion)

@@ -12,7 +12,7 @@ TOKENIZER=$SCRIPTS/tokenizer/tokenizer.perl
 CLEAN=$SCRIPTS/training/clean-corpus-n.perl
 NORM_PUNC=$SCRIPTS/tokenizer/normalize-punctuation.perl
 REM_NON_PRINT_CHAR=$SCRIPTS/tokenizer/remove-non-printing-char.perl
-BPEROOT=subword-nmt
+BPEROOT=subword-nmt/subword_nmt
 BPE_TOKENS=40000

 URLS=(
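For context (not shown in this diff): these prepare scripts obtain subword-nmt by cloning its repository, and upstream moved the scripts into a subword_nmt/ package directory, which is what the new BPEROOT reflects. A sketch of the assumed setup:

```bash
# Clone the BPE tooling; after the upstream reorganization, apply_bpe.py and
# learn_bpe.py live under subword_nmt/ inside the clone.
git clone https://github.com/rsennrich/subword-nmt.git
ls subword-nmt/subword_nmt/apply_bpe.py subword-nmt/subword_nmt/learn_bpe.py
```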
examples/translation_moe/README.md (1 change: 0 additions & 1 deletion)

@@ -52,7 +52,6 @@ wget dl.fbaipublicfiles.com/fairseq/data/wmt14-en-de.extra_refs.tok
 
 Next apply BPE on the fly and run generation for each expert:
 ```bash
-BPEROOT=examples/translation/subword-nmt/
 BPE_CODE=examples/translation/wmt17_en_de/code
 for EXPERT in $(seq 0 2); do \
 cat wmt14-en-de.extra_refs.tok \
fairseq/data/encoders/subword_nmt_bpe.py (2 changes: 1 addition & 1 deletion)

@@ -24,7 +24,7 @@ def __init__(self, args):
             raise ValueError('--bpe-codes is required for --bpe=subword_nmt')
         codes = file_utils.cached_path(args.bpe_codes)
         try:
-            from subword_nmt import apply_bpe
+            from subword_nmt.subword_nmt import apply_bpe
             bpe_parser = apply_bpe.create_parser()
             bpe_args = bpe_parser.parse_args([
                 '--codes', codes,
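For illustration (not fairseq's exact code): which import path works depends on how subword-nmt is available, so a standalone sketch might fall back between the two layouts; the codes path below is a placeholder.

```python
# Illustrative sketch: a pip-installed subword-nmt exposes subword_nmt.apply_bpe,
# while a repository clone added to sys.path exposes subword_nmt.subword_nmt.apply_bpe.
import codecs

try:
    from subword_nmt.subword_nmt import apply_bpe  # repository clone on sys.path
except ImportError:
    from subword_nmt import apply_bpe  # pip-installed package

# Build a BPE encoder from a codes file (placeholder path) and segment a line.
with codecs.open("wmt14.en-fr.fconv-cuda/bpecodes", encoding="utf-8") as codes:
    bpe = apply_bpe.BPE(codes)

print(bpe.process_line("Hello world"))  # segmentation depends on the learned codes
```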