
ByteNet with masking

A TensorFlow implementation of machine translation based on the paper Neural Machine Translation in Linear Time.

Notes

  • A few parts of the model structure differ from the paper:
    • I used the IWSLT 2016 de-en dataset; the code that processes the dataset has been changed slightly from Kyubyung's original code.
    • I didn't implement 'Dynamic Unfolding'.
    • I apply masking to all residual blocks to eliminate the influence of the pad embeddings (see the sketch after this list).
    • I apply dropout just before the summation of the residual block output.
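
The masking mentioned above amounts to multiplying activations by a 0/1 pad mask so that pad embeddings never leak into neighbouring time steps, with dropout applied right before the residual summation. Below is a minimal sketch of the idea in TensorFlow 1.x; the function name, kernel sizes, and dropout rate are illustrative assumptions, not the repository's exact code.

```python
import tensorflow as tf

def masked_residual_block(x, pad_mask, dilation, hidden_units, is_training):
    """A sketch of one masked residual block (not the repository's exact code).

    x        : [batch, time, channels] input activations
    pad_mask : [batch, time, 1] float tensor, 1.0 for real tokens, 0.0 for <PAD>
    """
    in_channels = x.get_shape().as_list()[-1]
    h = tf.layers.conv1d(x, hidden_units, kernel_size=1, activation=tf.nn.relu)
    # Zero out pad positions before the dilated convolution so that pad
    # embeddings cannot influence neighbouring time steps.
    h = h * pad_mask
    h = tf.layers.conv1d(h, hidden_units, kernel_size=3, padding="same",
                         dilation_rate=dilation, activation=tf.nn.relu)
    h = tf.layers.conv1d(h, in_channels, kernel_size=1)
    # Dropout just before the residual summation, as described in the notes.
    h = tf.layers.dropout(h, rate=0.1, training=is_training)
    # Mask again so the block's output is exactly zero at pad positions.
    return (x + h) * pad_mask
```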

Requirements

  • TensorFlow >= 1.0.0
  • NumPy >= 1.11.1
  • nltk > 3.2.2

Steps

  1. Download the IWSLT 2016 German–English parallel corpus and extract it to the data/ folder.
  2. Run train.py with your chosen hyperparameters.
  3. Run translate.py with the same hyperparameters as above.

Results

I got a BLEU score of 8.44 after 20 epochs. However, I got a BLEU score of 44.69 on in-sample data with an embedding size of 512, which I take to mean that the model trained well but overfitted. I therefore suggest running this model on a larger dataset.
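
For reference, corpus-level BLEU can be computed with nltk (hence the requirement above). The snippet below is only a sketch of such an evaluation, assuming tokenized sentences; the actual scoring in translate.py may differ.

```python
from nltk.translate.bleu_score import corpus_bleu

def evaluate_bleu(hypotheses, references):
    """Compute corpus BLEU on tokenized sentences.

    hypotheses : list of token lists produced by the model
    references : list of token lists (one reference per sentence)
    """
    # corpus_bleu expects a list of reference sets per hypothesis,
    # so wrap each single reference in its own list.
    list_of_references = [[ref] for ref in references]
    return corpus_bleu(list_of_references, hypotheses) * 100  # 0-100 scale
```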
