Running an LLM on the ESP32-S3

Summary

This project demonstrates running a Large Language Model (LLM) directly on an ESP32-S3 microcontroller. It implements an inference engine for small transformer models, optimized for the memory and compute limits of embedded systems.

The model is a 260K-parameter TinyLlamas checkpoint trained on the TinyStories dataset. Despite its small size, it generates simple, coherent text suitable for embedded applications.

The engine is based on llama2.c, with ESP32-specific optimizations.

Hardware Requirements

ESP32-S3 with 2MB PSRAM (ESP32-S3FH4R2)
8MB flash (7.9MB app partition)
~1.5MB available RAM

Performance

The engine generates about 32.83 tokens/second, achieved through:

SIMD Acceleration: ESP32-S3 vector instructions via ESP-DSP
Memory Alignment: 16-byte aligned allocations for SIMD operations
Dual-Core Processing: Parallel computation across both cores
Optimized Clocks: CPU at 240MHz, PSRAM at 80MHz
Assembly Optimizations: Custom float division routines
Efficient Math: Lookup tables for activation functions

Features

Web Interface

Real-time token streaming via WebSocket
Adjustable temperature and max tokens
Mobile-responsive design
Access at http://<device-ip> (the address the device reports on the serial monitor after it joins your WiFi network)

Core Components

LLM Engine: Complete transformer implementation with top-p sampling
Memory Management: Custom aligned allocators for SIMD efficiency
WiFi Manager: Station mode connectivity
Embedded Model: 1MB model + 6KB tokenizer built into firmware
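
To make the aligned-allocator idea concrete, here is a minimal portable sketch using C11 `aligned_alloc`. The helper names are hypothetical and this is not the project's allocator. On the device, ESP-IDF's `heap_caps_aligned_alloc()` would be the likely counterpart, since it also lets the caller choose between internal RAM and PSRAM, but that choice is an assumption here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGNMENT 16u /* 16-byte alignment for SIMD loads and stores */

/* Allocate `count` floats in a 16-byte aligned buffer; returns NULL on failure. */
float *alloc_aligned_floats(size_t count) {
    assert(count > 0);
    size_t bytes = count * sizeof(float);
    /* C11 aligned_alloc expects the size to be a multiple of the alignment. */
    bytes = (bytes + SIMD_ALIGNMENT - 1) & ~(size_t)(SIMD_ALIGNMENT - 1);
    return (float *)aligned_alloc(SIMD_ALIGNMENT, bytes);
}

/* Check that a pointer satisfies the SIMD alignment requirement. */
int is_simd_aligned(const void *ptr) {
    return ((uintptr_t)ptr % SIMD_ALIGNMENT) == 0;
}
```

Buffers obtained this way are released with `free()`. Rounding the size up to a multiple of 16 also means a vector loop can process the tail of the buffer in full 16-byte steps without reading past the allocation.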

Project Structure

├── main/                  # Application entry point
├── components/
│   ├── llm/              # LLM inference engine
│   │   ├── assets/       # Embedded model & tokenizer
│   │   └── src/          # Core implementation + ASM
│   └── web/              # Web server & WiFi
│       └── src/          # HTTP/WebSocket server
└── partitions.csv        # 8MB partition table

Setup and Installation

Requires ESP-IDF v5.3.2 or later.

1. Configure WiFi

# Copy the template
cp components/web/include/wifi_config.h.template components/web/include/wifi_config.h

# Edit with your WiFi credentials
# Set WIFI_SSID and WIFI_PASS in wifi_config.h
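
For orientation, the edited header might look like the sketch below. Only the names WIFI_SSID and WIFI_PASS come from this README; the include guard, the use of preprocessor macros, and the placeholder values are assumptions, and the real template may contain additional fields.

```c
/* components/web/include/wifi_config.h (sketch with placeholder values) */
#ifndef WIFI_CONFIG_H
#define WIFI_CONFIG_H

#define WIFI_SSID "your-network-name"
#define WIFI_PASS "your-network-password"

#endif /* WIFI_CONFIG_H */
```

Because this file holds credentials, it is best kept out of version control.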

2. Build and Flash

# Set up ESP-IDF environment
. ~/esp/esp-idf/export.sh

# Configure for ESP32-S3
idf.py set-target esp32s3

# Build the project
idf.py build

# Flash and monitor (replace /dev/ttyUSB0 with your device's serial port)
idf.py -p /dev/ttyUSB0 flash monitor

Usage

1. After flashing, the ESP32-S3 connects to your WiFi network
2. Find the device IP in the serial monitor output
3. Navigate to http://<device-ip> in your browser
4. Enter prompts and watch tokens stream in real time

Serial Output

The device provides detailed logs including:

Memory usage statistics
Performance metrics per layer
Token generation progress

Configuration

Key options in idf.py menuconfig:

I2C GPIO pins (if using external peripherals)
I2C clock speed

Technical Details

Model: 260K parameters, 6 layers, 288 dimensions
Vocabulary: 512 tokens optimized for simple stories
Context: 512 token maximum sequence length
Sampling: Top-p (nucleus) sampling with temperature control
Memory: ~1.5MB runtime memory requirement

Limitations

Simple vocabulary suitable for basic stories
512 token context window
No dynamic model loading (embedded in firmware)
WiFi required for web interface

Contributing

Contributions welcome! Areas for improvement:

Further SIMD optimizations
Model quantization support
External storage for larger models
Additional sampling methods

License

This project is intended to be open source. Please add a LICENSE file to clarify terms.

Repository: eric-humane/esp32-llm