
NLP Transformer

Part of a Student Project at TU Berlin
By: Florian Peters
last change: 13.03.2019

Architecture

This is an implementation of the Transformer architecture, as described in the 2017 paper "Attention Is All You Need" (https://arxiv.org/abs/1706.03762).
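
The central operation of that architecture is scaled dot-product attention. As a rough illustration (not code taken from Main.ipynb), a minimal PyTorch version could look like this:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k); mask broadcasts to the score shape.
    d_k = q.size(-1)
    # Dot-product similarity, scaled by sqrt(d_k) as in the paper.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask is False are excluded from the softmax.
        scores = scores.masked_fill(~mask, float('-inf'))
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Example: 2 sequences, 4 heads, length 10, head dimension 16.
q = k = v = torch.randn(2, 4, 10, 16)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 4, 10, 16])
```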

Tokenization

We use the Byte Pair Encoding (BPE) scheme described in "Language Models are Unsupervised Multitask Learners" (2019), better known as the GPT-2 paper.
The BPE implementation is taken directly from https://github.com/huggingface/, although the exact source we used appears to be no longer available.
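
The exact interface of textencoder.py isn't documented here, but the same GPT-2 style byte-pair encoding can be tried out with the Hugging Face transformers tokenizer; the snippet below only illustrates the scheme and is not the code this repository uses:

```python
# Illustration of GPT-2 style BPE using the Hugging Face `transformers` package;
# the repository itself bundles its own copy of the encoder in textencoder.py.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

text = "Attention is all you need."
ids = tokenizer.encode(text)     # text -> list of BPE token ids
print(ids)
print(tokenizer.decode(ids))     # ids -> text again (lossless round trip)
```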

Code

All the code needed to run the model is in Main.ipynb; only the tokenizer / encoder / decoder is separated out into textencoder.py.
We use the fast.ai deep learning library, which is built on PyTorch.
The implementation was originally inspired by "The Annotated Transformer" (2018).
The three-step training process described below was inspired by "Universal Language Model Fine-tuning for Text Classification" (2018) and "Improving Language Understanding by Generative Pre-Training" (2018).

How To Use:

  • Open the Jupyter notebook Main.ipynb.
  • You should be able to run all the cells up to Step 1: LM (wiki103).
  • From here on, three steps are needed to complete the training of the model (or fewer, depending on what you want the model to do).
  1. Step 1: LM (wiki103): Training a language model to predict the next word in a sentence on the wikitext-103 dataset.
  2. Step 2: LM (dstc7): Taking the pretrained model from step one and fine-tuning it on the same task on the dstc7 dataset.
  3. Step 3: CL (dstc7): Taking the pretrained model from step two and fine-tuning it on a classification task on the dstc7 dataset. The classification task we trained on is Subtask 4 of the "NOESIS: Noetic End-to-End Response Selection Challenge" (2018).
  • All the data is available as a custom dataset downloadable from Kaggle here. To run the model, only data.7z is needed. The other files are the raw text sources, in case you want to try a different text representation.
  • Download the data and change the defined paths at the top of each step to point to the correct location within the data folder. If you extract data.7z to the same directory as Main.ipynb, the paths should work as they are.
  • At the end of each training step, use learner.save('file_name') to save the weights to disk (see the sketch after this list). The files are written to data/models: the data/ part of the path is whatever you pass as the path variable to the databunch, and the models folder is created if it doesn't already exist. The file name is whatever you pass to the function. Be careful, though: saving with the same file name twice overwrites the old file.
  • At the start of each step you can specify a pretrained_path, which should point to the weights saved in the previous step.
  • Be aware that when changing from step 2 to step 3, the task changes, and so does the architecture of the model. When the learner is created, the weights from the pretrained file are converted to fit the new architecture. Saving and restarting within this step might not work properly, though.
  • After each step you have a fully working model. During steps 1 and 2 you can use the predict() function to make the model generate text from an arbitrary input primer. The last step (classification) never worked as intended, and we didn't get around to fixing it; we leave it to the reader to improve on what has already been done.
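
As a rough summary of the workflow above in fastai v1 terms, the sketch below shows the save/load pattern shared by the three steps. It assumes learn is whatever Learner the corresponding section of Main.ipynb builds; run_step and the weight-file names are illustrative placeholders, not identifiers from the notebook:

```python
from typing import Optional
from fastai.basic_train import Learner

def run_step(learn: Learner, save_name: str,
             pretrained_name: Optional[str] = None, epochs: int = 1) -> None:
    # Generic pattern shared by the three training steps (illustrative only).
    if pretrained_name is not None:
        # Restore the weights saved by the previous step
        # (stored under <databunch path>/models/<pretrained_name>.pth).
        learn.load(pretrained_name)
    learn.fit_one_cycle(epochs)   # fastai's one-cycle training schedule
    learn.save(save_name)         # overwrites <save_name>.pth if it already exists
```

For example, step 2 would roughly correspond to run_step(learn, 'lm_dstc7', pretrained_name='lm_wiki103'), assuming step 1 was saved as 'lm_wiki103'. During steps 1 and 2, a call in the style of learn.predict('some primer text', n_words=50) (fastai's language-model prediction signature) generates a continuation, assuming the notebook's learner exposes the same predict() interface.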
