An experimental playground project for building and exploring transformer architectures from scratch.
- Intuition behind the Attention Mechanism | Notebook | Sketch below
- Intuition behind individual Transformer Blocks | Notebook | Sketch below
- Intuition behind Chunked Cross-Attention in DeepMind's RETRO | Notebook | Sketch below
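To make the first two notebook topics concrete, here is a minimal PyTorch sketch of scaled dot-product attention and a single pre-norm encoder block. It is a simplified illustration, not the notebook code; the names and default hyperparameters are arbitrary.

```python
# Minimal sketch of attention and one encoder block (illustrative, not the notebook code).
import math
import torch
import torch.nn as nn

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_model); weights = softmax(QK^T / sqrt(d_k))
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

class EncoderBlock(nn.Module):
    # One pre-norm encoder block: self-attention + feed-forward, each with a
    # residual connection. Uses the built-in attention module for brevity;
    # the function above shows the core formula.
    def __init__(self, d_model=256, num_heads=4, d_ff=1024, dropout=0.2):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.dropout(self.attn(h, h, h, need_weights=False)[0])
        h = self.norm2(x)
        return x + self.dropout(self.ff(h))

x = torch.randn(2, 16, 256)        # (batch, seq_len, d_model)
print(EncoderBlock()(x).shape)     # torch.Size([2, 16, 256])
```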
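The third notebook covers RETRO-style chunked cross-attention, where the sequence is split into fixed-size chunks and each chunk attends to its own retrieved neighbour tokens. The sketch below is a heavy simplification (it omits RETRO's causal one-chunk shift and relative positional encodings); all shapes and names are assumptions.

```python
# Simplified chunked cross-attention (illustrative; omits RETRO's causal chunk shift).
import torch
import torch.nn as nn

class ChunkedCrossAttention(nn.Module):
    def __init__(self, d_model=256, num_heads=4, chunk_size=4):
        super().__init__()
        self.chunk_size = chunk_size
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, x, neighbours):
        # x: (batch, seq_len, d_model), seq_len divisible by chunk_size
        # neighbours: (batch, num_chunks, retrieved_len, d_model), retrieved per chunk
        b, L, d = x.shape
        n = L // self.chunk_size
        q = x.reshape(b * n, self.chunk_size, d)   # one query block per chunk
        kv = neighbours.reshape(b * n, -1, d)      # that chunk's retrieved tokens
        out, _ = self.attn(q, kv, kv)
        return out.reshape(b, L, d)

cca = ChunkedCrossAttention()
x = torch.randn(2, 16, 256)               # 4 chunks of 4 tokens each
neighbours = torch.randn(2, 4, 10, 256)   # 10 retrieved tokens per chunk
print(cca(x, neighbours).shape)           # torch.Size([2, 16, 256])
```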
Create a virtual environment:

```bash
conda create -n applied-transformers python=3.10
conda activate applied-transformers
```
Install dependencies:

```bash
pip install -r requirements.txt
```
- Transformer Model from Scratch {Vaswani et al., 2017} | Dataset Sample | Python Code
```bash
# example training run
python transformer_architectures/vanilla/run.py --num_layers=5 \
    --d_model=256 --d_ff=1024 --num_heads=4 --dropout=0.2 \
    --train_path=<PATH_TO_TRAIN_DATASET>.csv --valid_path=<PATH_TO_VALIDATION_DATASET>.csv
```
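For reference, this configuration is smaller than the base model of Vaswani et al., 2017 (6 layers, d_model=512, d_ff=2048, 8 heads, dropout=0.1); the path placeholders must point to your own train/validation CSV files.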
- GPT Model from Scratch {Radford et al., 2018} | Coming Soon
- BERT Model from Scratch {Devlin et al., 2018} | Coming Soon
- RETRO Model from Scratch {Borgeaud et al., 2021} | Coming Soon
- BART Model from Scratch {Lewis et al., 2019} | Coming Soon
- Text Generation Schemes | Sampling sketch below
- Text Generation Eval Metrics | Perplexity sketch below
- Sequence Tokenization Algorithms | BPE sketch below
- Optimized Einsum Implementation | Einsum sketch below
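As a preview of the generation-schemes item above, here is a minimal sketch of temperature plus top-k sampling from next-token logits (greedy and nucleus/top-p decoding follow the same pattern); all names here are illustrative.

```python
# Minimal temperature + top-k sampling from next-token logits (illustrative).
import torch

def sample_next_token(logits, temperature=1.0, top_k=50):
    # logits: (vocab_size,) unnormalised scores for the next token
    logits = logits / temperature                    # sharpen (<1) or flatten (>1)
    if top_k is not None:
        values, indices = torch.topk(logits, top_k)  # keep the k most likely tokens
        probs = torch.softmax(values, dim=-1)
        return indices[torch.multinomial(probs, 1)].item()
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, 1).item()

logits = torch.randn(32000)
print(sample_next_token(logits, temperature=0.8, top_k=50))
```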
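For the eval-metrics item, the most common intrinsic metric is perplexity: the exponential of the mean per-token negative log-likelihood. A minimal sketch:

```python
# Perplexity = exp(mean negative log-likelihood per token) (illustrative).
import torch
import torch.nn.functional as F

def perplexity(logits, targets):
    # logits: (num_tokens, vocab_size), targets: (num_tokens,) token ids
    nll = F.cross_entropy(logits, targets, reduction="mean")
    return torch.exp(nll).item()

logits = torch.randn(100, 32000)
targets = torch.randint(0, 32000, (100,))
print(perplexity(logits, targets))   # roughly the vocab size for random logits
```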
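For the tokenization item, a toy sketch of one byte-pair-encoding (BPE) training step: count adjacent symbol pairs and merge the most frequent one. Production tokenizers add many details; this is purely illustrative.

```python
# One BPE training step: merge the most frequent adjacent symbol pair (toy sketch).
from collections import Counter

def bpe_step(words):
    # words: list of symbol sequences, e.g. [["l","o","w"], ["l","o","w","e","r"]]
    pairs = Counter()
    for w in words:
        pairs.update(zip(w, w[1:]))
    if not pairs:
        return words, None
    best = max(pairs, key=pairs.get)   # most frequent adjacent pair
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                out.append(w[i] + w[i + 1])   # merge the pair into one symbol
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(out)
    return merged, best

words, merge = bpe_step([list("lower"), list("lowest"), list("low")])
print(merge, words)   # ('l', 'o') [['lo','w','e','r'], ['lo','w','e','s','t'], ['lo','w']]
```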
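Finally, for the einsum item: attention contractions can be written compactly with torch.einsum, which keeps the batch and head dimensions explicit and leaves the contraction order to the backend. A minimal multi-head example (shapes are arbitrary):

```python
# Multi-head attention scores and output via einsum (illustrative).
import math
import torch

b, h, n, d = 2, 4, 16, 64   # batch, heads, seq_len, head_dim
q = torch.randn(b, h, n, d)
k = torch.randn(b, h, n, d)
v = torch.randn(b, h, n, d)

# scores[b,h,i,j] = sum_d q[b,h,i,d] * k[b,h,j,d] / sqrt(d)
scores = torch.einsum("bhid,bhjd->bhij", q, k) / math.sqrt(d)
weights = torch.softmax(scores, dim=-1)

# out[b,h,i,d] = sum_j weights[b,h,i,j] * v[b,h,j,d]
out = torch.einsum("bhij,bhjd->bhid", weights, v)
print(out.shape)   # torch.Size([2, 4, 16, 64])
```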