Attention-based sequence to sequence learning
Dependencies:
- TensorFlow for Python 3
- YAML and Matplotlib modules for Python 3:
sudo apt-get install python3-yaml python3-matplotlib
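TensorFlow itself can be installed with pip, for example (the package name below is the standard PyPI build; pick the variant that matches your setup):
pip3 install --user tensorflow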
Train a model (CONFIG is a YAML configuration file, such as config/default.yaml):
python3 -m translate CONFIG --train -v
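The configuration file defines the model architecture, data paths and training hyper-parameters. The snippet below is only an illustrative sketch with made-up option names; refer to config/default.yaml for the real parameters:
# hypothetical settings, the actual option names are defined in config/default.yaml
model_dir: models/my_experiment    # where checkpoints and logs are written
data_dir: data/my_corpus           # location of the training and dev files
cell_size: 512                     # size of the RNN hidden state
embedding_size: 256                # size of the word embeddings
batch_size: 64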
Translate text using an existing model:
python3 -m translate CONFIG --decode FILE_TO_TRANSLATE --output OUTPUT_FILE
or for interactive decoding:
python3 -m translate CONFIG --decode
Example model:
experiments/WMT14/download.sh # download WMT14 data into data/raw
experiments/WMT14/prepare.sh # preprocess the data, and copy the files to experiments/WMT14/data
python3 -m translate experiments/WMT14/baseline.yaml --train -v # train a baseline model on this data
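Once training has produced a checkpoint, the same configuration file can be reused for decoding (the file names below are placeholders):
python3 -m translate experiments/WMT14/baseline.yaml --decode experiments/WMT14/data/test.src --output test.out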
Features:
- YAML configuration files
- Beam-search decoder
- External language model
- Ensemble decoding
- Multiple encoders
- Hierarchical encoder
- Bidirectional encoder
- Local attention model
- Convolutional attention model
- Detailed logging
- Periodic BLEU evaluation
- Periodic checkpoints
- Multi-task training: train on several tasks at once (e.g. French->English and German->English MT)
- Training and decoding with subword units
- Binary input features instead of text
- Pre-processing script: we provide a fully-featured Python script for data pre-processing (vocabulary creation, lowercasing, tokenizing, splitting, etc.)
- Dynamic RNNs: we use symbolic loops instead of statically unrolled RNNs, which means faster model creation and no need for buckets
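This is not the project's actual code, just a minimal TensorFlow 1.x sketch of the idea: tf.nn.dynamic_rnn builds a symbolic while-loop over time, so a single graph handles padded batches of any length (with sequence_length masking), instead of one statically unrolled graph per bucket.
import tensorflow as tf

# Padded batch of sequences: [batch_size, max_time, input_dim], plus true lengths
inputs = tf.placeholder(tf.float32, shape=[None, None, 128])
lengths = tf.placeholder(tf.int32, shape=[None])

cell = tf.nn.rnn_cell.LSTMCell(num_units=256)

# dynamic_rnn runs a symbolic while-loop over the time dimension, so the same
# graph works for any sequence length: no static unrolling, no bucketing
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, sequence_length=lengths, dtype=tf.float32)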
- To replicate the results from arxiv.org/abs/1612.01744, you should look at the speech branch. The default branch, baseline, implements a different model, from arxiv.org/abs/1409.0473.
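For example, after cloning the repository:
git checkout speech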
Credits:
- This project is based on TensorFlow's reference implementation of sequence-to-sequence models
- We include some of the pre-processing scripts from Moses
- The scripts for subword units come from github.com/rsennrich/subword-nmt