mariushobbhahn/transformers_from_scratch

Transformers from scratch

We build multiple small transformers from scratch. More concretely, we start by building the attention mechanism, single-head attention, multi-head attention, and a decoder block. We then build a small sentiment classifier, a BERT model, and a GPT model using existing PyTorch functions.
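The core building block mentioned above, scaled dot-product attention, can be sketched in a few lines of PyTorch. This is a generic illustration of the mechanism, not the repository's actual implementation; the function name and tensor shapes are our own choices for the example.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k). Illustrative sketch, not the repo's code.
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / d_k**0.5  # (batch, seq_len, seq_len)
    if mask is not None:
        # Positions where mask == 0 get -inf so softmax assigns them weight 0
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    # Each output position is a weighted average of the value vectors
    return weights @ v, weights

q = torch.randn(2, 5, 8)
k = torch.randn(2, 5, 8)
v = torch.randn(2, 5, 8)
out, weights = scaled_dot_product_attention(q, k, v)
```

A single attention head wraps this function with learned linear projections for the queries, keys, and values; multi-head attention runs several such heads in parallel and concatenates their outputs.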

In all cases, the goal is only to get the pipeline working at a basic level; performance does not need to match the state of the art.

You can find a more detailed description of the project and the rules we followed in this LessWrong post: https://www.lesswrong.com/posts/98jCNefEaBBb7jwu6/building-a-transformer-from-scratch-ai-safety-up-skilling.