This is an implementation of the transformer for the paper
- Features supported:
  - Multi-layer transformer encoder-decoder networks
  - Multi-GPU / single-GPU training (multi-GPU support is currently outdated)
  - Checkpointing models for a better memory/speed trade-off
  - Research ideas
- The code is based on several modules (Dictionary and Loss functions) of "OpenNMT-py"