Skip to content

This repository contains an implementation of the LLaMA 3 (Large Language Model Meta AI) model using the MLX framework. The implementation focuses on the model architecture and the inference process. The code is restructured and heavily commented to facilitate easy understanding of the key parts of the architecture.

License

Notifications You must be signed in to change notification settings

mattdepaolis/Llama-3-from-scratch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Llama 3 8B Instruct from Scratch

Overview

This repository hosts a custom implementation of the "Llama 3 8B Instruct" model using the MLX framework, designed specifically for Apple's silicon, ensuring optimal performance on Apple hardware. Additionally, it integrates a tokenizer created by Meta. The code is restructured and heavily commented to facilitate easy understanding of the key parts of the architecture.

The implementation utilizes the robust MLX library, distinguished for its efficiency and flexibility in handling machine learning tasks on Apple silicon. This compatibility ensures that users can leverage the full capabilities of their hardware for machine learning applications.

Features

  • RMS-Normalization: RMSNorm is a simplification of the original layer normalization (LayerNorm). LayerNorm is a regularization technique that might handle the internal covariate shift issue so as to stabilize the layer activations and improve model convergence. It has been proved quite successful in LLaMA 3.
  • Activation Function: LLaMA 3 uses the SwiGLU activation function instead of ReLU, leading to improved training performance.
  • Rotary Positional Embeddings (RoPE): Inspired by the GPT-Neo-X project, LLaMA 3 incorporates rotary positional embeddings at each layer, enhancing the model's positional understanding.
  • Increased Context Length and Grouped-Query Attention (GQA) LLaMA 3 model has a doubled context window (from 4096 to 8192 tokens) and employs grouped-query attention. This allows for better processing of long documents, chat histories, and summarization tasks.
  • Optimized for Apple Silicon: Fully integrates with the MLX framework, exploiting the performance optimizations available on Apple silicon for superior model efficiency.

Usage

Setup

To utilize this model, clone the repository and install the necessary dependencies as outlined in the requirements.txt file.

Running the Model

Execute the model via the command line with various configurable options:

  • --model-path: Path to the model weights and tokenizer.
  • --prompt: The initial text to kickstart the generation process.
  • --max-tokens: The maximum number of tokens to generate.
  • --write-every: The interval of token detokenization to check the output during generation.
  • --temp: The sampling temperature, which affects the randomness of the output.
  • --seed: The seed for the pseudo-random number generator, ensuring reproducibility.

Example command:

python llama.py --model-path ./mlx_model --prompt "Hello, world!" --max-tokens 50

Text Generation

The generate function manages text generation, orchestrating the model's capabilities to produce text from input prompts. It oversees the conversion of text to tokens, processes through the model, and transforms tokens back to text.

Tokenizer

The tokenizer is integral for converting text into a format the neural network can process, segmenting language into manageable components and facilitating the model to generate human-like text.

Contributions

Contributions are welcome! If you have improvements or bug fixes, please submit a pull request or open an issue.

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

This repository contains an implementation of the LLaMA 3 (Large Language Model Meta AI) model using the MLX framework. The implementation focuses on the model architecture and the inference process. The code is restructured and heavily commented to facilitate easy understanding of the key parts of the architecture.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages