oRKLLM

              )       (
             ( \     / )          ██████╗ ██████╗ ██╗  ██╗██╗     ██╗     ███╗   ███╗
              \_\   /_/          ██╔═══██╗██╔══██╗██║ ██╔╝██║     ██║     ████╗ ████║
            .-----------.        ██║   ██║██████╔╝█████╔╝ ██║     ██║     ██╔████╔██║
           /  [*]   [*]  \       ██║   ██║██╔══██╗██╔═██╗ ██║     ██║     ██║╚██╔╝██║
          |    \  ω  /    |      ╚██████╔╝██║  ██║██║  ██╗███████╗███████╗██║ ╚═╝ ██║
           \  .-------.  /        ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝╚══════╝╚══════╝╚═╝     ╚═╝
          _/\/  #####  \/\_
         /  /   #####   \  \      Pronounced "ORC-EL-EL-EM"
        / ,/    #####    \, \     OpenAI-compatible LLM inference for Rockchip NPU.
       | / |  .-------.  | \ |    No cloud. No nonsense. Just efficient NPU inference.
       |/  '--[=======]--'  \|
       |       |     |       |
        \   ,  |     |  ,   /
         \  \. |     | ./  /
          '--' |     | '--'
               |     |
              / \   / \
             '   '-'   '

oRKLLM is an energy-efficient, OpenAI API-compatible local LLM inference server with a built-in admin console, designed specifically for Rockchip NPU-powered platforms such as the RK3576 (found in the NanoPi M5) and RK3588-series SBCs.

Inspired by jundot/oMLX (which does the same for Apple Silicon), oRKLLM is re-engineered for the Rockchip RKLLM runtime (librkllmrt.so) and its hardware and concurrency constraints.

🚀 Key Features

OpenAI API Compatibility: Drop-in /v1/chat/completions, /v1/models, and /v1/embeddings endpoints — works with Open WebUI, Claude Code, and any OpenAI-compatible client.
Full Admin Console: Built with Vue 3 and Vuetify 3 — six dedicated pages:
- Dashboard — live CPU/NPU/RAM/Temperature gauges, serving stats, inference playground
- Models — local model manager, HuggingFace search, collection browser, direct downloader
- Settings — inference defaults, HF token, prefix cache config, trusted proxy
- Logs — full-page real-time log terminal over WebSocket
- Bench — inference benchmark (TTFT, prefill tok/s, generation tok/s)
- Chat — full streaming chat UI with system prompt, model selector, and parameter controls
Multi-User Auth & RBAC: Local accounts or federated SSO via OIDC/SAML (Keycloak, Google, Azure AD). Two roles: admin and user. Site Management UI for user CRUD, auth provider config, and audit log.
OIDC / SAML SSO: Standard Flow with PKCE for public clients (no secret required). Group-to-role mapping from IdP claims. Routes at /auth/oidc/* and /auth/saml/*.
HuggingFace Integration: Search the HF Hub, browse collections (e.g. huggingface.co/collections/Qwen/qwen3-...), download .rkllm models directly from the admin console.
Prefix KV Cache: Tiered SSD hot/cold LRU cache saves KV state between conversation turns, skipping re-prefill of repeated prefixes. Sliding context window prevents NPU OOM on long conversations.
Process-Isolated Execution: Inference engine runs in a dedicated child process. Model unload/swap terminates the process, guaranteeing full NPU driver memory cleanup.
Smart Resource Management: Single active model lock, auto-swap, configurable idle timeout.
Database Migrations: PRAGMA user_version migration runner — schema changes apply automatically on startup, safe across upgrades from any previous version.
Seamless Mock Fallback: On non-ARM64/non-Linux platforms, oRKLLM falls back to a JS mock engine — rapid UI development on macOS/Windows without a board.
Dynamic N-API Bindings: C++ addon uses dlopen/dlsym — no compile-time dependency on librkllmrt.so.
Secure Auth: PBKDF2-HMAC-SHA256 password hashing, signed session cookies (userId|username|role|expires|HMAC), backward-compatible with single-user installs.
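
The signed cookie layout named under Secure Auth (userId|username|role|expires|HMAC) could be produced and checked roughly as follows. This is a minimal sketch, assuming HMAC-SHA256 over the pipe-joined fields, a hex-encoded MAC, and a millisecond expiry timestamp. The function names signSession and verifySession are hypothetical, and the real implementation may differ.

```javascript
// Sketch only: the field order follows the README; encoding, expiry units,
// and function names are assumptions made for illustration.
const { createHmac, timingSafeEqual } = require('node:crypto');

// Build "userId|username|role|expires|HMAC". Assumes no field contains '|'.
function signSession({ userId, username, role, expires }, secret) {
  const payload = [userId, username, role, expires].join('|');
  const mac = createHmac('sha256', secret).update(payload).digest('hex');
  return `${payload}|${mac}`;
}

// Return the session fields if the MAC matches and the cookie has not
// expired, otherwise null.
function verifySession(cookie, secret, now = Date.now()) {
  const parts = String(cookie).split('|');
  if (parts.length !== 5) return null;
  const [userId, username, role, expires, mac] = parts;
  const payload = [userId, username, role, expires].join('|');
  const expected = createHmac('sha256', secret).update(payload).digest();
  const given = Buffer.from(mac, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    return null;
  }
  if (!(Number(expires) > now)) return null;
  return { userId, username, role, expires: Number(expires) };
}
```

Comparing the MAC with timingSafeEqual rather than === avoids leaking how many leading characters matched, and checking the signature before the expiry means a tampered timestamp is rejected before it is ever interpreted.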

🛠️ Architecture & Tech Stack

graph TD
    Client[HTTP Client / Open WebUI] -->|REST API| Fastify[Fastify Server]
    Fastify -->|Admin SPA| Admin[Vue 3 / Vuetify Admin]
    Fastify -->|OpenAI Routes| API[OpenAI API Router]

    API -->|Queue Request| Pool[Engine Pool & Resource Manager]
    Pool -->|Spawn / Message| Worker[Worker Process]
    Worker -->|N-API Addon| Addon[orkllm_napi.node]
    Addon -->|Dynamic dlopen| C_API[librkllmrt.so C API]
    C_API -->|NPU Driver| NPU[Rockchip NPU Hardware]

    Admin -->|WebSocket Telemetry| Monitor[Telemetry Monitor]
    Monitor -->|/sys/kernel/debug/rknpu| Linux[Linux Kernel]

Layer	Technology
API Server	Node.js + Fastify (ES Modules)
Native Bindings	C++ N-API addon (`node-addon-api`) with `dlopen`/`dlsym`
Mock Fallback	Pure JS mock engine (auto-enabled on non-ARM64/non-Linux)
Frontend	Vue 3 + Vuetify 3 SPA, built with Vite, route-based code splitting
Database	SQLite via `node:sqlite` (Node ≥22.5) or `better-sqlite3` (Node 20)
Auth	Local PBKDF2 + OIDC (PKCE) + SAML 2.0
Testing	Playwright E2E (33 tests), mock OIDC service container in CI
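
The PRAGMA user_version approach mentioned under Database Migrations can be illustrated with a small runner that works against either SQLite driver in the table above. This is a sketch under stated assumptions: the runMigrations name and the example SQL are hypothetical and do not reflect oRKLLM's real schema. It relies only on exec(sql) and prepare(sql).get(), which both node:sqlite's DatabaseSync and better-sqlite3 expose.

```javascript
// Sketch only: the migration statements are placeholders, not the real schema.
const MIGRATIONS = [
  'CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE, role TEXT)',
  'ALTER TABLE users ADD COLUMN created_at INTEGER',
];

// Apply every migration whose index is >= the stored user_version.
// Returns the number of migrations applied.
function runMigrations(db, migrations = MIGRATIONS) {
  const { user_version: current } = db.prepare('PRAGMA user_version').get();
  for (let v = current; v < migrations.length; v++) {
    db.exec('BEGIN');
    try {
      db.exec(migrations[v]);
      // PRAGMA values cannot be bound as parameters; v + 1 is an integer
      // generated here, never user input.
      db.exec(`PRAGMA user_version = ${v + 1}`);
      db.exec('COMMIT');
    } catch (err) {
      db.exec('ROLLBACK');
      throw err;
    }
  }
  return Math.max(0, migrations.length - current);
}
```

Because the version bump is written in the same transaction as the schema change, a failed migration leaves both the schema and user_version untouched, so the next startup retries from the same step.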

📦 Installing from a Release Package (Ubuntu / Armbian ARM64)

Pre-built .deb packages for ARM64 are available via the oRKLLM APT repository or directly from the GitHub Releases page.

Option A — APT repository (recommended)

# Trust the oRKLLM signing key
curl -fsSL https://mafischer.github.io/oRKLLM/orkllm.gpg \
  | sudo gpg --dearmor -o /usr/share/keyrings/orkllm.gpg

# Add the repository
echo "deb [arch=arm64 signed-by=/usr/share/keyrings/orkllm.gpg] \
  https://mafischer.github.io/oRKLLM stable main" \
  | sudo tee /etc/apt/sources.list.d/orkllm.list

sudo apt update && sudo apt install orkllm

Option B — Direct download

# Replace VERSION with the release version shown on the Releases page
wget https://github.com/mafischer/oRKLLM/releases/latest/download/orkllm_VERSION_arm64.deb
sudo dpkg -i orkllm_VERSION_arm64.deb

Configure

sudo nano /etc/orkllm/orkllm.conf

ORKLLM_HOST=0.0.0.0
ORKLLM_PORT=8000
ORKLLM_LIB_PATH=/usr/lib/librkllmrt.so
ORKLLM_MODELS_DIR=/var/lib/orkllm/models
ORKLLM_DB_PATH=/var/lib/orkllm/orkllm.db

Add models and start

sudo cp your_model.rkllm /var/lib/orkllm/models/
sudo systemctl start orkllm

Admin console: http://<device-ip>:8000/admin
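
Once the service is running, the /v1/chat/completions endpoint can be exercised from any HTTP client. The following is a minimal sketch using Node's built-in fetch. The base URL and model id are placeholders, the helper names buildChatRequest and chat are hypothetical, and the request and response shapes follow the standard OpenAI chat format. Whether an API key header is needed depends on how the server is configured.

```javascript
// Sketch only: host, port, and model id are placeholders to adapt.
function buildChatRequest(baseUrl, model, userText) {
  return {
    url: new URL('/v1/chat/completions', baseUrl).href,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: userText }],
        stream: false,
      }),
    },
  };
}

// Send one non-streaming chat request and return the assistant's reply text.
async function chat(baseUrl, model, userText) {
  const { url, init } = buildChatRequest(baseUrl, model, userText);
  const res = await fetch(url, init); // global fetch, Node >= 18
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

// Example (requires a running server with a loaded model):
// chat('http://localhost:8000', 'your_model', 'Hello').then(console.log);
```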

Service management

# Use one of: start, stop, restart, status
sudo systemctl restart orkllm
# Follow the service log
journalctl -u orkllm -f

⚙️ Installation from Source

Prerequisites

Node.js ≥ 18 (≥ 22.5 preferred for native node:sqlite)
node-gyp dependencies: Python 3, C++ compiler (Xcode CLT on macOS, build-essential on Linux)
A compiled .rkllm model (use rkllm-toolkit to convert from HuggingFace)
librkllmrt.so on the target board (typically at /usr/lib/librkllmrt.so)

Setup & Run

# Install all dependencies (compiles native addon)
npm install

# Build Vue frontend
npm run build:frontend

# Start development server (mock engine auto-enabled on non-ARM64 or non-Linux hosts such as macOS)
npm run dev:server
# → http://localhost:8000/admin

Environment Variables

Variable	Default	Description
`ORKLLM_HOST`	`127.0.0.1`	Listen address (`0.0.0.0` for LAN)
`ORKLLM_PORT`	`8000`	Listen port
`ORKLLM_LIB_PATH`	`/usr/lib/librkllmrt.so`	Path to Rockchip RKLLM runtime
`ORKLLM_MODELS_DIR`	`./models`	Directory scanned for `.rkllm` files
`ORKLLM_DB_PATH`	`~/.config/orkllm/auth.db`	SQLite database path
`ORKLLM_TRUSTED_PROXY`	(unset)	`true` or CIDR to trust `X-Forwarded-*` headers
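
To make the precedence concrete, here is one way these variables and their defaults could be resolved at startup. This is a hedged sketch: the loadConfig name and the shape of the returned object are assumptions, not oRKLLM's actual configuration module. Only the variable names and default values come from the table above.

```javascript
// Sketch only: defaults mirror the table; everything else is illustrative.
const DEFAULTS = {
  ORKLLM_HOST: '127.0.0.1',
  ORKLLM_PORT: '8000',
  ORKLLM_LIB_PATH: '/usr/lib/librkllmrt.so',
  ORKLLM_MODELS_DIR: './models',
  ORKLLM_DB_PATH: '~/.config/orkllm/auth.db', // '~' is not expanded here
};

// Resolve settings from an environment object, falling back to defaults
// for unset or empty values.
function loadConfig(env = process.env) {
  const get = (key) => env[key] || DEFAULTS[key];
  const port = Number(get('ORKLLM_PORT'));
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid ORKLLM_PORT: ${get('ORKLLM_PORT')}`);
  }
  return {
    host: get('ORKLLM_HOST'),
    port,
    libPath: get('ORKLLM_LIB_PATH'),
    modelsDir: get('ORKLLM_MODELS_DIR'),
    dbPath: get('ORKLLM_DB_PATH'),
    // Unset means X-Forwarded-* headers are not trusted.
    trustedProxy: env.ORKLLM_TRUSTED_PROXY || null,
  };
}
```

Validating the port up front turns a typo in the environment into an immediate startup error instead of a confusing bind failure later.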

🧪 Running Tests

# Full E2E suite (mock mode, no board required)
npm test

# SSO integration tests using local Keycloak container (same as CI)
npm run test:sso        # starts Keycloak + runs SSO tests
npm run test:sso:down   # tear down Keycloak when done

CI runs the full suite including OIDC SSO via a containerised Keycloak instance with a pre-configured orkllm realm.

Test environment variables

Set these in a local .env file (gitignored) or as GitHub Actions secrets/variables. The .env file is loaded automatically when the Playwright suite runs.

Variable	Where	Description
`ORKLLM_TEST_ADMIN_USER`	Secret	Admin username created during test setup
`ORKLLM_TEST_ADMIN_PASS`	Secret	Admin password
`ORKLLM_TEST_OIDC_ISSUER`	Secret	Real Keycloak issuer URL (for `ORKLLM_TEST_LIVE=1`)
`ORKLLM_TEST_OIDC_CLIENT_ID`	Secret	OIDC client ID (`orkllm-oidc`)
`ORKLLM_TEST_SAML_METADATA_URL`	Secret	Real Keycloak SAML metadata URL
`ORKLLM_TEST_OIDC_USER`	Secret	Keycloak test user (`testuser`)
`ORKLLM_TEST_OIDC_USER_PASS`	Secret	Keycloak test user password
`ORKLLM_TEST_OIDC_ADMIN_USER`	Secret	Keycloak admin test user (`testadminuser`)
`ORKLLM_TEST_OIDC_ADMIN_PASS`	Secret	Keycloak admin test user password
`ORKLLM_TEST_MOCK_OIDC_URL`	Auto-set	Issuer URL of CI Keycloak container (`http://localhost:8080/realms/orkllm`)
`ORKLLM_TEST_REDIRECT_BASE`	Auto-set	Base URL from which the OIDC `redirect_uri` is derived, so the protocol is correct (`http://` in CI, `https://` live)
`ORKLLM_TEST_LIVE`	Variable	Set to `1` to run SSO tests against real Keycloak on LAN
`ORKLLM_TEST_LIVE_URL`	Variable	Live server URL (e.g. `https://orkllm.fischerapps.com`)

Debugging failed CI tests

When E2E tests fail in CI, Playwright uploads screenshots and error context as an artifact named playwright-report (retained for 7 days).

Download via CLI:

gh run download <run-id> --name playwright-report -D /tmp/report
# Find the run ID with: gh run list --limit 5

Download via browser: GitHub Actions run → Summary → Artifacts section at the bottom → download playwright-report.zip.

Each failed test has a test-failed-1.png screenshot and an error-context.md with the stack trace, making it easy to see exactly what the browser showed at the point of failure.

🤝 Credits & Acknowledgements

jundot/oMLX: Inspired the dashboard layout, metrics design, single-model lifecycle, and OpenAI compatibility structures.
Rockchip: SDKs and runtime libraries (librkllmrt.so) powering local NPU inference.
