
ByteNet with masking

A TensorFlow implementation of machine translation based on the paper Neural Machine Translation in Linear Time.

Notes

  • A few parts of the model structure differ from the paper:
    • I used the IWSLT 2016 de-en dataset; the code that processes the dataset has been changed slightly from Kyubyung's original code.
    • I didn't implement 'Dynamic Unfolding'.
    • I apply masking to all residual blocks to eliminate the influence of the pad embeddings (see the sketch after this list).
    • I apply dropout just before the summation of the residual block output.
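
The masking mentioned above amounts to multiplying activations by a 0/1 pad mask so that pad embeddings never leak into neighbouring time steps, with dropout applied right before the residual summation. Below is a minimal sketch of the idea in TensorFlow 1.x; the function name, kernel sizes, and dropout rate are illustrative assumptions, not the repository's exact code.

```python
import tensorflow as tf

def masked_residual_block(x, pad_mask, dilation, hidden_units, is_training):
    """A sketch of one masked residual block (not the repository's exact code).

    x        : [batch, time, channels] input activations
    pad_mask : [batch, time, 1] float tensor, 1.0 for real tokens, 0.0 for <PAD>
    """
    in_channels = x.get_shape().as_list()[-1]
    h = tf.layers.conv1d(x, hidden_units, kernel_size=1, activation=tf.nn.relu)
    # Zero out pad positions before the dilated convolution so that pad
    # embeddings cannot influence neighbouring time steps.
    h = h * pad_mask
    h = tf.layers.conv1d(h, hidden_units, kernel_size=3, padding="same",
                         dilation_rate=dilation, activation=tf.nn.relu)
    h = tf.layers.conv1d(h, in_channels, kernel_size=1)
    # Dropout just before the residual summation, as described in the notes.
    h = tf.layers.dropout(h, rate=0.1, training=is_training)
    # Mask again so the block's output is exactly zero at pad positions.
    return (x + h) * pad_mask
```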

Requirements

  • TensorFlow >= 1.0.0
  • NumPy >= 1.11.1
  • nltk > 3.2.2

Steps

  1. Download the IWSLT 2016 German–English parallel corpus and extract it to the data/ folder.
  2. Run train.py with your chosen hyperparameters.
  3. Run translate.py with the same hyperparameters as above.

Results

I got a BLEU score of 8.44 after 20 epochs. However, I got a BLEU score of 44.69 on in-sample data with an embedding size of 512, which I take to mean that the model trained well but overfitted. I therefore suggest running this model on a larger dataset.
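
For reference, corpus-level BLEU can be computed with nltk (hence the requirement above). The snippet below is only a sketch of such an evaluation, assuming tokenized sentences; the actual scoring in translate.py may differ.

```python
from nltk.translate.bleu_score import corpus_bleu

def evaluate_bleu(hypotheses, references):
    """Compute corpus BLEU on tokenized sentences.

    hypotheses : list of token lists produced by the model
    references : list of token lists (one reference per sentence)
    """
    # corpus_bleu expects a list of reference sets per hypothesis,
    # so wrap each single reference in its own list.
    list_of_references = [[ref] for ref in references]
    return corpus_bleu(list_of_references, hypotheses) * 100  # 0-100 scale
```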
