
# Nano-GPT: Decoder-only Transformer

A simple GPT with multi-head attention over character-level tokens, inspired by Andrej Karpathy's video lectures: https://github.com/karpathy/ng-video-lecture
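Character-level tokenization means the vocabulary is simply the set of unique characters in the training text, with each character mapped to an integer id. Below is a minimal, self-contained sketch of that idea (not the repository's exact code; the `text` string stands in for the real training corpus):

```python
# A minimal sketch of char-level tokenization (illustrative, not the repo's code).
text = "hello world"                      # stand-in for the training corpus
chars = sorted(set(text))                 # vocabulary = unique characters
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> token id
itos = {i: ch for ch, i in stoi.items()}       # token id -> char

encode = lambda s: [stoi[c] for c in s]             # string -> list of ids
decode = lambda ids: "".join(itos[i] for i in ids)  # list of ids -> string

print(decode(encode("hello")))            # round-trips back to "hello"
```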

## Features

  1. Multi-head self-attention
  2. Layer normalization
  3. Skip (residual) connections
  4. Feed-forward layers
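
These four pieces combine into a single decoder block: pre-LayerNorm, causal multi-head self-attention with a skip connection, then a feed-forward layer with another skip connection. The sketch below illustrates that arrangement using PyTorch's built-in `nn.MultiheadAttention` rather than the hand-rolled attention in the lecture code, and the dimensions (`n_embd=64`, `n_head=4`, `block_size=32`) are assumed values for illustration:

```python
# A hedged sketch of one decoder block, assuming PyTorch and the hyperparameters above.
import torch
import torch.nn as nn

class Block(nn.Module):
    """Multi-head self-attention + feed-forward, each with pre-LayerNorm and a skip connection."""

    def __init__(self, n_embd=64, n_head=4, block_size=32, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ffwd = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )
        # Causal mask: True entries are blocked, so each position only attends to earlier ones.
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("causal_mask", mask)

    def forward(self, x):
        T = x.size(1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=self.causal_mask[:T, :T])
        x = x + attn_out                  # skip connection around attention
        x = x + self.ffwd(self.ln2(x))    # skip connection around feed-forward
        return x

# Quick shape check: (batch=2, time=16, channels=64) in, same shape out.
x = torch.randn(2, 16, 64)
print(Block()(x).shape)  # torch.Size([2, 16, 64])
```

A full model would stack several of these blocks between a token/position embedding layer and a final linear head that projects back to vocabulary-size logits.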