Add Arlow Model & Tokenizer Support
Adds support for the Arlow model and its corresponding ArlowTokenizer.
Features:
- Flash Attention 2 for fast, memory-efficient training (usage sketched below)
- Rotary Position Embeddings (RoPE) with rope_theta=100000.0
- Grouped Query Attention (GQA)
- Cross-attention for future multimodal extensions
- RMSNorm, SiLU activations, tied embeddings
- Full causal language modeling support (ArlowForCausalLM)
- ArlowTokenizerFast (fast tokenizer, vocab size 131072) and ArlowTokenizer (slow tokenizer)
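Once merged, usage should follow the standard Transformers pattern. The sketch below is illustrative only: the checkpoint id `arlow/arlow-base` is hypothetical, and the Llama-style config field names (`num_key_value_heads`, `tie_word_embeddings`) are assumptions based on the feature list, not confirmed by this issue.

```python
# Minimal usage sketch, assuming the Arlow PR is merged and a checkpoint
# is published. The repo id and config field names are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arlow/arlow-base"  # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)  # should resolve to ArlowTokenizerFast
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # the FA2 path listed above
)

# Fields the feature list implies, assuming standard Transformers naming:
cfg = model.config
print(cfg.rope_theta)             # expected: 100000.0
print(cfg.num_attention_heads,    # GQA: more query heads than KV heads
      cfg.num_key_value_heads)
print(cfg.tie_word_embeddings)    # expected: True (tied embeddings)

inputs = tokenizer("Hello, Arlow!", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```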
Includes:
- ArlowConfig, ArlowModel, ArlowForCausalLM, ArlowPreTrainedModel
- ArlowTokenizerFast as the fast tokenizer (tokenization_arlow_fast.py)
- ArlowTokenizer as the slow tokenizer (tokenization_arlow.py)
- Auto mapping + lazy loading registration (see the sketch below)
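For context on the last item: in-tree models wire this up through the auto mapping dicts and `_LazyModule` imports, but the equivalent public hooks look like the sketch below. The import path `transformers.models.arlow` is what the PR would add and is assumed here, not taken from this issue.

```python
# Equivalent of "auto mapping + lazy loading registration" via the public
# register hooks (the PR itself would use the in-tree mapping dicts).
from transformers import AutoConfig, AutoModel, AutoModelForCausalLM, AutoTokenizer
from transformers.models.arlow import (  # assumed module path added by the PR
    ArlowConfig,
    ArlowModel,
    ArlowForCausalLM,
    ArlowTokenizer,
    ArlowTokenizerFast,
)

AutoConfig.register("arlow", ArlowConfig)
AutoModel.register(ArlowConfig, ArlowModel)
AutoModelForCausalLM.register(ArlowConfig, ArlowForCausalLM)
AutoTokenizer.register(
    ArlowConfig,
    slow_tokenizer_class=ArlowTokenizer,
    fast_tokenizer_class=ArlowTokenizerFast,
)
```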
PR Link: Here
Open source status
- The model implementation is available
- The model weights are available