OpenUMA

OpenUMA (Unified Memory Abstraction) is a Rust middleware for detecting shared memory hardware (AMD APUs, Intel iGPUs), configuring unified memory pools, and generating optimal configs for AI inference engines.

┌─────────────────────────────────────────────────────────────────────────┐
│                          OpenUMA v0.6.2                                  │
│              Unified Memory Abstraction for AI Inference                    │
└─────────────────────────────────────────────────────────────────────────┘

Key Features

Hardware Detection - Automatic detection of AMD APUs and Intel iGPUs
Memory Partitioning - Intelligent iGPU/CPU memory allocation for LLM inference
Zero-Copy DMA-BUF - Direct memory transfers between CPU and iGPU
Multiple Engines - Generate configs for llama.cpp, Ollama, and KTransformers
Interactive TUI - Full terminal UI for hardware monitoring and configuration
Benchmarking - Real inference benchmarks with llama.cpp

Supported Hardware

Vendor	Series	Examples
AMD	Zen 3 (Cezanne, Renoir)	Ryzen 5 5600G, Ryzen 7 5700G
AMD	Zen 4 (Phoenix, Hawk Point)	Ryzen 7 7840HS, Ryzen AI 9 HX 370
AMD	Zen 5 (Strix Point)	Ryzen AI 9 HX 370, Ryzen AI 7 350
Intel	Alder Lake, Raptor Lake	Core i5-1240P, Core i7-12700H
Intel	Meteor Lake, Lunar Lake	Core Ultra 5 125H, Core Ultra 7 258V

Quick Start

# Build
cargo build --release

# Detect hardware
./target/release/openuma probe

# Launch interactive TUI
./target/release/openuma tui

# Generate config for llama.cpp
./target/release/openuma configure --engine llamacpp --model model.gguf

Terminal UI

┌─────────────────────────────────────────────────────────────────────────┐
│  [D]ashboard  [M]emory  [B]enchmark  [P]rofiles  [C]onfigure  [S]ettings│
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ╔═══════════════════════════════════════════════════════════════════╗  │
│  ║                     Hardware Overview                             ║  │
│  ╠═══════════════════════════════════════════════════════════════════╣  │
│  ║  CPU    AMD Ryzen 5 5600G (Cezanne)                             ║  │
│  ║          6 cores (12 threads), AVX2, 16MB L3                     ║  │
│  ║  iGPU   AMD Vega7 (Raven Ridge)                                  ║  │
│  ║          7 CUs, 512MB / 16384MB shared VRAM                      ║  │
│  ║          Vulkan ✓  OpenCL ✓  Zero-copy ✓                         ║  │
│  ║  RAM    32GB DDR4-3200 (Dual-channel)                            ║  │
│  ║          51.2 GB/s theoretical, 46.8 GB/s measured               ║  │
│  ╠═══════════════════════════════════════════════════════════════════╣  │
│  ║  ✓ Unified Memory Available    Tier: CONSUMER_UMA                 ║  │
│  ╚═══════════════════════════════════════════════════════════════════╝  │
│                                                                         │
│  Memory Partition:  iGPU: 7168 MB (35.0%)  CPU: 13312 MB (65.0%)       │
│  [████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 35%            │
│                                                                         │
│  Strategy: HybridIgpu  Zero-copy: Available                            │
│                                                                         │
│  [r] Refresh    [q] Quit                                                 │
└─────────────────────────────────────────────────────────────────────────┘

Commands

Command	Description
`openuma probe`	Detect hardware profile
`openuma tui`	Launch interactive terminal UI
`openuma partition --model <path>`	Show memory partition for model
`openuma configure --engine <engine> --model <path>`	Generate engine config
`openuma benchmark --model <path>`	Run inference benchmark
`openuma zerocopy --test`	Test DMA-BUF zero-copy
`openuma serve`	Start REST API server
`openuma profile list`	List known hardware profiles

Supported Inference Engines

llama.cpp

openuma configure --engine llamacpp --model llama3-8b-q4_k_m.gguf

Ollama

openuma configure --engine ollama --model llama3-8b-q4_k_m.gguf

KTransformers (MoE models)

openuma configure --engine ktransformers --model deepseek-v3-q4km.gguf

How It Works

Memory Model

┌────────────────────────────────────────────────────────────────┐
│                      Unified Memory Pool                       │
│                                                                │
│  ┌──────────────┐                    ┌──────────────────────┐  │
│  │   iGPU VRAM  │  ◄── Zero-Copy ──►│   System RAM        │  │
│  │   (Shared)   │      DMA-BUF        │   (DDR4/DDR5)       │  │
│  └──────────────┘                    └──────────────────────┘  │
│                                                                │
│  Attention layers benefit from iGPU                             │
│  MoE experts stay on CPU                                        │
└────────────────────────────────────────────────────────────────┘

Key Insight

For LLM inference on APUs:

Attention layers → benefit from iGPU (parallel matrix ops)
MoE expert layers → should stay on CPU (sparse activation)
KV cache → benefits from unified memory zero-copy

Benchmarking

# Quick benchmark
openuma benchmark --model llama3-8b-q4_k_m.gguf

# Full multi-backend comparison
openuma benchmark --model model.gguf --full

╔════════════════════════════════════════════════════════════════════╗
║                     OpenUMA Benchmark Report                          ║
╠════════════════════════════════════════════════════════════════════╣
║ Best Backend: vulkan (12.5 t/s)
║ Average TPS: 8.2
╠════════════════════════════════════════════════════════════════════╣
║ Test 1: model.gguf [vulkan]
║   └── 12.5 tokens/sec | 8000 ms
║ Test 2: model.gguf [opencl]
║   └── 10.2 tokens/sec | 9800 ms
║ Test 3: model.gguf [cpu]
║   └── 4.8 tokens/sec | 20800 ms
╠════════════════════════════════════════════════════════════════════╣
║ Recommendations:
║ • Best performing backend: vulkan (~12.5 tokens/sec)
║ • GPU acceleration provides 2.6x speedup over CPU
╚════════════════════════════════════════════════════════════════════╝

Architecture

openuma/
├── crates/
│   ├── hw_probe/      # Hardware detection
│   ├── mem_mgr/       # Memory partitioning + zero-copy
│   ├── config_gen/    # Model metadata (GGUF)
│   ├── profile_db/    # Hardware profile database
│   ├── benchmark/     # Inference benchmarking
│   ├── api_server/    # REST API
│   ├── cli/           # CLI interface
│   └── tui/           # Terminal UI
└── profiles/          # Hardware profiles

Installation

Option A — Download Binary (Linux x86_64)

# Download latest release
curl -L https://github.com/hamtun24/openuma/releases/latest/download/openuma-linux-x86_64.tar.gz \
  | tar xz

# Run it
./openuma probe

Option B — Build from Source

# Prerequisites: Rust 1.70+
git clone https://github.com/hamtun24/openuma.git
cd openuma
cargo build --release
./target/release/openuma probe

System Requirements

Requirement	Notes
OS	Linux (kernel 5.10+)
CPU	Any x86_64 with AMD APU or Intel iGPU
RAM	16GB minimum, 32GB recommended
Optional	llama.cpp in PATH for real benchmarks
Optional	Vulkan drivers for iGPU acceleration

Install Vulkan Drivers (if missing)

# AMD iGPU
sudo apt install mesa-vulkan-drivers

# Intel iGPU  
sudo apt install intel-media-va-driver mesa-vulkan-drivers

Install llama.cpp (optional)

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake .. -DLLAMA_BUILD_EXAMPLES=ON
make -j$(nproc)
export PATH="$PATH:$(pwd)/bin"

Real World Results

OpenUMA's value is in the configuration it generates — not just detecting hardware, but knowing the exact flags that extract maximum performance from it.

Example: AMD Ryzen 5 5600G + 32GB DDR4

Setup	Command	Tokens/sec
llama.cpp defaults	`llama-cli -m model.gguf`	~3.1 t/s
OpenUMA-configured	`openuma configure --engine llamacpp --model model.gguf`	~7.2 t/s
Improvement		+132%

What OpenUMA changed:

Enabled Vulkan backend (default is CPU)
Set correct --n-gpu-layers for available shared VRAM
Configured dual-channel memory-aware thread count
Disabled mmap in favor of zero-copy DMA-BUF path

Note: Numbers above are estimates from the profile database for this hardware. Run openuma benchmark --model your-model.gguf --full on your machine to get real measured numbers and contribute them to the community database.

Community Benchmarks

This section will grow as users submit hardware profiles. Submit your results →

Contributing

Contributions welcome! Open issues and pull requests.

License

MIT License - see LICENSE for details.

OpenUMA - Making every x86 machine a first-class AI citizen.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.github		.github
crates		crates
docs/outreach		docs/outreach
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OpenUMA

Key Features

Supported Hardware

Quick Start

Terminal UI

Commands

Supported Inference Engines

llama.cpp

Ollama

KTransformers (MoE models)

How It Works

Memory Model

Key Insight

Benchmarking

Architecture

Installation

Option A — Download Binary (Linux x86_64)

Option B — Build from Source

System Requirements

Install Vulkan Drivers (if missing)

Install llama.cpp (optional)

Real World Results

Example: AMD Ryzen 5 5600G + 32GB DDR4

Community Benchmarks

Contributing

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

OpenUMA

Key Features

Supported Hardware

Quick Start

Terminal UI

Commands

Supported Inference Engines

llama.cpp

Ollama

KTransformers (MoE models)

How It Works

Memory Model

Key Insight

Benchmarking

Architecture

Installation

Option A — Download Binary (Linux x86_64)

Option B — Build from Source

System Requirements

Install Vulkan Drivers (if missing)

Install llama.cpp (optional)

Real World Results

Example: AMD Ryzen 5 5600G + 32GB DDR4

Community Benchmarks

Contributing

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages