A pure NumPy implementation of the LLaMA model for inference and educational purposes. Supports LLaMA 1, 2, and 3 architectures (LLaMA 4 is not supported).
This repository demonstrates how to run LLaMA inference using only NumPy, making it ideal for learning and understanding transformer internals without heavy dependencies.
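As a rough illustration of the transformer internals involved, here is a minimal NumPy sketch of a few LLaMA-style building blocks: RMSNorm, rotary position embeddings (RoPE), causal self-attention, and a SwiGLU feed-forward layer, combined into one pre-norm block. It is not taken from llama.py. All function names and weight-dictionary keys are hypothetical. It uses a single attention head with no KV cache or grouped-query attention. RoPE here pairs interleaved (even, odd) dimensions, which is one of two layouts in common use.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Divide each row by its root-mean-square, then apply a learned per-feature gain.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight

def rope(x, base=10000.0):
    # Rotary embedding for x of shape (seq_len, head_dim); head_dim must be even.
    # Each (even, odd) dimension pair is rotated by a position-dependent angle.
    seq_len, dim = x.shape
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def causal_attention(q, k, v):
    # Single-head scaled dot-product attention; position i attends only to positions <= i.
    seq_len, head_dim = q.shape
    scores = q @ k.T / np.sqrt(head_dim)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # Gated feed-forward: SiLU(x W_gate) * (x W_up), projected back with W_down.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

def transformer_block(x, p):
    # Pre-norm residual layout: h = x + Attn(RMSNorm(x)); out = h + FFN(RMSNorm(h)).
    # `p` is a hypothetical dict of weights; the key names are illustrative only.
    h_in = rms_norm(x, p["attn_norm"])
    q, k, v = h_in @ p["wq"], h_in @ p["wk"], h_in @ p["wv"]
    h = x + causal_attention(rope(q), rope(k), v) @ p["wo"]
    return h + swiglu_ffn(rms_norm(h, p["ffn_norm"]), p["w_gate"], p["w_up"], p["w_down"])
```

A full model stacks many such blocks between a token-embedding lookup and a final RMSNorm plus output projection to vocabulary logits.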
Example usage:

python llama.py "I have a dream"

The example uses a small model trained by Andrej Karpathy for demonstration purposes.
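The command above generates text that continues a prompt. Conceptually, this is an autoregressive loop. The following sketch shows greedy decoding around a hypothetical `forward` function that maps an array of token IDs to per-position logits. Tokenization and weight loading are out of scope. Whether llama.py itself uses greedy decoding or sampling is not stated here, and the toy model exists only to make the loop runnable.

```python
import numpy as np

def generate(forward, prompt_ids, max_new_tokens, eos_id=None):
    # Greedy autoregressive decoding: run the model on the sequence so far,
    # pick the highest-scoring next token, append it, and repeat.
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = forward(np.array(ids))  # shape (seq_len, vocab_size)
        next_id = int(np.argmax(logits[-1]))
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids

def toy_forward(ids, vocab_size=10):
    # Hypothetical stand-in model: always predicts (current token + 1) mod vocab_size.
    logits = np.zeros((len(ids), vocab_size))
    logits[np.arange(len(ids)), (ids + 1) % vocab_size] = 1.0
    return logits

print(generate(toy_forward, [3], max_new_tokens=4))
```

This loop recomputes the whole sequence at every step for simplicity. Practical implementations usually keep a key/value cache so each step only processes the newest token.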
Inspired by llama3.np and Hugging Face Transformers; those projects are licensed under their respective terms.
License: MIT