
Decoder-Only Transformer

This repository contains an implementation of a Decoder-Only Transformer, a foundational architecture for large language models (LLMs). The model processes input text by passing it through several key components:

  • Word Embedding: Converts input tokens (e.g., words) into dense vector representations.
  • Positional Encoding: Adds positional information to embeddings, allowing the model to understand token order.
  • Masked Self-Attention: Restricts each token's attention to earlier tokens in the sequence, which is what enables autoregressive text generation (see the sketch after this list).
  • Residual Connections and Fully Connected Layers: Residual connections keep gradients flowing through deeper stacks, while the fully connected (feed-forward) layers add per-token transformation capacity.
  • Softmax Layer: Outputs probabilities for the next token in the sequence.
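
As a rough illustration of the masked self-attention step above (not the exact code in this repository), a causal mask can be built from a lower-triangular matrix so that each position attends only to itself and earlier tokens:

```python
import torch
import torch.nn.functional as F

def causal_self_attention(q, k, v):
    """Scaled dot-product attention with a causal (lower-triangular) mask.
    q, k, v: tensors of shape (batch, seq_len, d_model). Illustrative only."""
    d_model = q.size(-1)
    seq_len = q.size(1)
    scores = q @ k.transpose(-2, -1) / d_model ** 0.5                  # (batch, seq_len, seq_len)
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device))
    scores = scores.masked_fill(~mask, float("-inf"))                  # block attention to future tokens
    weights = F.softmax(scores, dim=-1)                                # each row sums to 1 over past tokens
    return weights @ v
```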

This implementation is intended for experimenting with LLM concepts and serves as a learning resource for understanding decoder-only transformer models. The model is built and trained with PyTorch + Lightning.
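
Below is a minimal sketch of how these pieces might be wired together as a Lightning module. The class name, layer sizes, and hyperparameters here are assumptions for illustration, not the repository's actual identifiers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl

class DecoderOnlySketch(pl.LightningModule):
    """Hypothetical minimal decoder-only transformer; names and sizes are illustrative."""

    def __init__(self, vocab_size=100, d_model=64, n_head=4, max_len=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)       # word embedding
        self.pos = nn.Embedding(max_len, d_model)            # learned positional encoding
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.out = nn.Linear(d_model, vocab_size)            # projects to next-token logits

    def forward(self, tokens):
        seq_len = tokens.size(1)
        positions = torch.arange(seq_len, device=tokens.device)
        x = self.embed(tokens) + self.pos(positions)
        # Causal mask: -inf above the diagonal blocks attention to future tokens.
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                       device=tokens.device), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=causal)   # masked self-attention
        x = x + attn_out                                      # residual connection
        x = x + self.ff(x)                                    # residual + fully connected layers
        return self.out(x)                                    # softmax is applied inside the loss

    def training_step(self, batch, batch_idx):
        tokens, targets = batch                               # targets are tokens shifted by one
        logits = self(tokens)
        return F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```

Training such a module then amounts to passing it, together with a token dataloader, to a `pl.Trainer` and calling `fit`.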

Feel free to explore, modify, and contribute!
