English-Chinese neural machine translation sample data format.

This directory contains sample data for English-to-Chinese neural machine translation (NMT). Use it as a reference for the expected data format when preparing your own data.

Note that this sample data should not be used to train the NMT model.

  • en_yttm.model: English YouTokenToMe model for the source (src) tokenizer.
  • zh_vocab.txt: Chinese character vocabulary for the target (tgt) tokenizer.
  • train.en: English sentence data for training.
  • train.zh: Chinese sentence data for training.
  • valid.en: English sentence data for validation.
  • valid.zh: Chinese sentence data for validation.
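
When preparing your own data in this layout, a quick sanity check can catch misaligned files early. The sketch below is not part of this repository. It assumes that each `.en`/`.zh` file holds one sentence per line, that line i of the English file pairs with line i of the Chinese file, and that `zh_vocab.txt` lists one entry per line. The helper names `count_parallel_pairs` and `load_char_vocab` are hypothetical.

```python
def read_lines(path):
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def count_parallel_pairs(src_path, tgt_path):
    """Return the number of sentence pairs in two parallel files.

    Assumption (not confirmed by this README): one sentence per line,
    with line i of src_path aligned to line i of tgt_path.
    """
    src_lines = read_lines(src_path)
    tgt_lines = read_lines(tgt_path)
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"line count mismatch: {src_path} has {len(src_lines)}, "
            f"{tgt_path} has {len(tgt_lines)}"
        )
    for i, (src, tgt) in enumerate(zip(src_lines, tgt_lines), start=1):
        if not src.strip() or not tgt.strip():
            raise ValueError(f"empty sentence on line {i}")
    return len(src_lines)


def load_char_vocab(vocab_path):
    """Map each vocabulary entry to an integer ID in order of first appearance.

    Assumption (not confirmed by this README): one entry per line.
    Blank lines and repeated entries are skipped.
    """
    vocab = {}
    for token in read_lines(vocab_path):
        if token and token not in vocab:
            vocab[token] = len(vocab)
    return vocab


if __name__ == "__main__":
    # File names follow the list above; run from inside this directory.
    for split in ("train", "valid"):
        n = count_parallel_pairs(f"{split}.en", f"{split}.zh")
        print(f"{split}: {n} sentence pairs")
    print(f"zh vocabulary size: {len(load_char_vocab('zh_vocab.txt'))}")
```

The check uses only the standard library, so it does not exercise `en_yttm.model`; loading that file requires the YouTokenToMe package.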