[s2s] dont document packing because it hurts performance (huggingface…
sshleifer committed Jul 27, 2020
1 parent 9d0d3a6 commit 1e00ef6
Showing 1 changed file with 1 addition and 11 deletions.
`examples/seq2seq/README.md`: 12 changes (1 addition, 11 deletions)
@@ -27,17 +27,7 @@ this should make a directory called `cnn_dm/` with files like `test.source`.
```

WMT16 English-Romanian Translation Data:

This dataset comes in two formats. The "packed" version merges short training examples into examples of fewer than 200 tokens, which increases GPU utilization and also improves validation performance.

```bash
cd examples/seq2seq
wget https://s3.amazonaws.com/datasets.huggingface.co/translation/wmt_en_ro_packed_train_200.tgz
tar -xzvf wmt_en_ro_packed_train_200.tgz
export ENRO_DIR=wmt_en_ro_packed_train_200
```
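The diff does not show how the packed files were produced. The following is only a minimal sketch of the greedy packing idea described above, assuming line-aligned `.source`/`.target` files (like those in `cnn_dm/`) and using whitespace splitting as a stand-in for the model's real tokenizer. `pack_examples` is a hypothetical helper, not part of the repository.

```python
from typing import Callable, List, Tuple


def pack_examples(
    sources: List[str],
    targets: List[str],
    max_tokens: int = 200,
    # ASSUMPTION: whitespace splitting stands in for the model tokenizer.
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> Tuple[List[str], List[str]]:
    """Greedily merge consecutive (source, target) pairs.

    A pair is appended to the current packed example only if both the merged
    source and the merged target stay under ``max_tokens`` tokens; otherwise
    the current example is emitted and a new one is started with this pair.
    A single pair that is already too long is kept as-is, not dropped.
    """
    if len(sources) != len(targets):
        raise ValueError("sources and targets must be line-aligned")

    packed_src: List[str] = []
    packed_tgt: List[str] = []
    cur_src: List[str] = []  # pieces of the packed example being built
    cur_tgt: List[str] = []

    for src, tgt in zip(sources, targets):
        cand_src = " ".join(cur_src + [src])
        cand_tgt = " ".join(cur_tgt + [tgt])
        fits = (
            count_tokens(cand_src) < max_tokens
            and count_tokens(cand_tgt) < max_tokens
        )
        if cur_src and not fits:
            # Flush the current packed example and start a new one.
            packed_src.append(" ".join(cur_src))
            packed_tgt.append(" ".join(cur_tgt))
            cur_src, cur_tgt = [], []
        cur_src.append(src)
        cur_tgt.append(tgt)

    if cur_src:  # emit the last partially filled example
        packed_src.append(" ".join(cur_src))
        packed_tgt.append(" ".join(cur_tgt))
    return packed_src, packed_tgt


if __name__ == "__main__":
    src = ["a b c", "d e", "f g h i"]
    tgt = ["x", "y z", "w"]
    print(pack_examples(src, tgt, max_tokens=6))
    # (['a b c d e', 'f g h i'], ['x y z', 'w'])
```

Greedy merging of neighbouring lines keeps the source/target alignment intact, since each packed source is built from exactly the same lines as its packed target; the real preprocessing may differ (for example, in how it counts tokens or joins sentences).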

The original data can also be downloaded with this command:
```bash
wget https://s3.amazonaws.com/datasets.huggingface.co/translation/wmt_en_ro.tar.gz
tar -xzvf wmt_en_ro.tar.gz
```
