I undertook this project in order to better understand how Transformer models work. I became interested in the technology when I set out to build a language translation model using an RNN. However, while learning about RNNs and LSTMs I came across Transformers, and as I learned more I realized that I needed to shift my project to this architecture.
Since the paper Attention Is All You Need was published in 2017, introducing the Transformer architecture, Transformers and their many variants have become the models of choice for Natural Language Processing (NLP). They are used to solve many types of sequence-to-sequence problems, including language translation, information retrieval, text classification, document summarization, image captioning, and genome analysis. More recently, they have also shown strong results in image recognition and object detection.
The dataset comes from the European Parliament Proceedings Parallel Corpus 1996-2011, found at the Statistical Machine Translation website. Specifically, we use the French-English parallel corpus.
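As a rough sketch of how such a parallel corpus can be read in, the helper below pairs up the French and English files line by line. The function name, file paths, and the `max_pairs` parameter are illustrative assumptions, not part of the original project; the actual corpus files should be substituted in.

```python
def load_parallel_corpus(src_path, tgt_path, max_pairs=None):
    """Load aligned sentence pairs from two parallel text files.

    Each line in src_path is assumed to correspond to the same-numbered
    line in tgt_path, as in the Europarl parallel corpus release.
    """
    with open(src_path, encoding="utf-8") as f_src, \
         open(tgt_path, encoding="utf-8") as f_tgt:
        # zip stops at the shorter file, so a truncated file fails quietly;
        # strip trailing newlines from each sentence.
        pairs = [(s.strip(), t.strip()) for s, t in zip(f_src, f_tgt)]
    if max_pairs is not None:
        pairs = pairs[:max_pairs]  # optionally cap the dataset size
    return pairs
```

Capping the number of pairs is useful in Colab, where memory and training time are limited.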
This project was created using Google Colab. All of the required libraries are included in the Colab environment. You will need a Google account in order to use Colab.
Attention Is All You Need - The paper that started it all
The Illustrated Transformer - Blog by Jay Alammar
Transformer model for language understanding - Tensorflow Tutorial
What Are Transformer Models in Machine Learning? - Blog by Rahul Agarwal