mariushobbhahn/transformers_from_scratch

Transformers from scratch

We build multiple small transformers from scratch. More concretely, we start by building the attention mechanism, single-head attention, multi-head attention, and a decoder block. We then build a small sentiment classifier, a BERT model, and a GPT model using existing PyTorch functions.
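The core building block mentioned above, scaled dot-product attention, can be sketched in a few lines of PyTorch. This is a generic illustration of the mechanism, not the repository's actual implementation; the function name and tensor shapes are our own choices for the example.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k). Illustrative sketch, not the repo's code.
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / d_k**0.5  # (batch, seq_len, seq_len)
    if mask is not None:
        # Positions where mask == 0 get -inf so softmax assigns them weight 0
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    # Each output position is a weighted average of the value vectors
    return weights @ v, weights

q = torch.randn(2, 5, 8)
k = torch.randn(2, 5, 8)
v = torch.randn(2, 5, 8)
out, weights = scaled_dot_product_attention(q, k, v)
```

A single attention head wraps this function with learned linear projections for the queries, keys, and values; multi-head attention runs several such heads in parallel and concatenates their outputs.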

In all cases, the goal is only to get the pipeline working at a basic level; performance does not need to match the state of the art.

You can find a more detailed description of the project and the rules we followed in this LessWrong post: https://www.lesswrong.com/posts/98jCNefEaBBb7jwu6/building-a-transformer-from-scratch-ai-safety-up-skilling.