
Add ArlowGPT #36988

@yuchenxie4645

Description


Add Arlow Model & Tokenizer Support

Adds support for the Arlow model and its corresponding ArlowTokenizer.

Features:

  • Flash Attention 2 for fast, memory-efficient training
  • Rotary Position Embeddings (RoPE) with rope_theta=100000.0
  • Grouped Query Attention (GQA)
  • Cross-Attention for future multimodal extensions
  • RMSNorm, SiLU activations, tied embeddings
  • Supports full causal language modeling (ArlowForCausalLM)
  • ArlowTokenizerFast (fast tokenizer, vocab size 131072)
  • ArlowTokenizer (slow, Python-based tokenizer)
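Two of the features above are easy to illustrate in isolation. The sketch below shows the standard rotary-position-embedding (RoPE) frequency schedule with the listed `rope_theta=100000.0`; it is a minimal plain-Python illustration of the general RoPE formulation, not code from the Arlow implementation (the function names here are hypothetical):

```python
import math

def rope_inv_freq(head_dim: int, theta: float = 100000.0):
    # Standard RoPE inverse frequencies: inv_freq[i] = theta^(-2i / head_dim)
    # for i in 0 .. head_dim/2 - 1. theta=100000.0 matches the value listed above.
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def rope_rotate(x, pos, inv_freq):
    # Rotate consecutive pairs (x[0], x[1]), (x[2], x[3]), ... of a head vector
    # by the position-dependent angle pos * inv_freq[i].
    out = []
    for i, f in enumerate(inv_freq):
        angle = pos * f
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[2 * i], x[2 * i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out
```

At position 0 the rotation is the identity, and rotation preserves the vector norm; real implementations apply this per attention head to queries and keys.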

Includes:

  • ArlowConfig, ArlowModel, ArlowForCausalLM, ArlowPreTrainedModel
  • ArlowTokenizerFast as fast tokenizer (tokenization_arlow_fast.py)
  • ArlowTokenizer as slow tokenizer (tokenization_arlow.py)
  • Auto mapping + lazy loading registration
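The grouped query attention (GQA) listed in the features can also be sketched independently of the Arlow code: several consecutive query heads share one key/value head, and the K/V heads are repeated to match the query-head count before attention. A minimal plain-Python illustration with hypothetical head counts (not the actual Arlow configuration):

```python
def kv_head_for_query(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    # In GQA, query heads are split into num_kv_heads groups; each group of
    # num_q_heads // num_kv_heads consecutive query heads shares one K/V head.
    assert num_q_heads % num_kv_heads == 0
    return q_head // (num_q_heads // num_kv_heads)

def repeat_kv(kv_heads, num_q_heads: int):
    # Expand the K/V heads so each query head sees its shared K/V head,
    # mirroring the "repeat_kv" step used by GQA implementations.
    n_rep = num_q_heads // len(kv_heads)
    return [h for h in kv_heads for _ in range(n_rep)]
```

With 8 query heads and 2 K/V heads, query heads 0-3 attend through K/V head 0 and heads 4-7 through K/V head 1, cutting the K/V cache to a quarter of the full multi-head size.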

@ArthurZucker

PR Link: Here

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

#36899
