In this notebook, we implemented a decoder-based Transformer model to generate text from a given .txt file. The key components of our implementation included token embedding, positional encoding, multi-head self-attention, and feedforward layers, which together form the foundation of transformer-based text generation.
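To make the architecture concrete, here is a minimal sketch of how those components fit together in PyTorch. The class names (`DecoderBlock`, `TinyDecoder`) and hyperparameters are illustrative assumptions, not the exact code or settings used above.

```python
import math
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One decoder block: masked multi-head self-attention + feedforward, with residuals."""
    def __init__(self, d_model=128, n_heads=4, d_ff=512, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Causal mask: each position may only attend to itself and earlier positions.
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, device=x.device), diagonal=1).bool()
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.ln1(x + attn_out)      # residual connection + layer norm
        x = self.ln2(x + self.ff(x))    # residual connection + layer norm
        return x

class TinyDecoder(nn.Module):
    """Token embedding + sinusoidal positional encoding + stacked decoder blocks + LM head."""
    def __init__(self, vocab_size, d_model=128, n_layers=2, max_len=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Fixed sinusoidal positional encodings (illustrative choice; learned embeddings also work).
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        self.blocks = nn.ModuleList(DecoderBlock(d_model) for _ in range(n_layers))
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, idx):                      # idx: (batch, seq_len) token ids
        x = self.tok_emb(idx) + self.pe[: idx.size(1)]
        for block in self.blocks:
            x = block(x)
        return self.head(x)                      # logits over the vocabulary
```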
We also explored the self-attention mechanism, which allows the model to dynamically weigh different parts of the input sequence to capture long-range dependencies. Unlike traditional sequence models such as RNNs, self-attention enables efficient parallelization and better context modeling, leading to more coherent text generation.
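As a quick illustration of that mechanism, the sketch below implements scaled dot-product attention with a causal mask, i.e. Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The function name and tensor shapes are illustrative; the notebook's multi-head layer applies the same operation per head.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, causal=True):
    """Weigh each value by the softmax-normalized similarity between queries and keys."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5           # (..., seq, seq) similarity scores
    if causal:
        seq_len = q.size(-2)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))  # block attention to future tokens
    weights = F.softmax(scores, dim=-1)                    # attention weights over the sequence
    return weights @ v

# Example: one sequence of 5 tokens with 16-dimensional queries/keys/values.
q = k = v = torch.randn(1, 5, 16)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 5, 16])
```

Because every position's weights are computed from a single matrix product, all positions are processed in parallel, in contrast to the step-by-step recurrence of an RNN.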